Title

Capturing user intent for information retrieval

Date of Completion

January 2005

Keywords

Computer Science

Degree

Ph.D.

Abstract

This dissertation addresses the problem of employing a cognitive user model for information retrieval (IR), in which knowledge about a user is captured and used to improve the user's performance in an information-seeking task. The research improves a user's search effectiveness by developing a hybrid user model that captures user intent dynamically and combines it with the elements of an IR system. The term hybrid refers to the methodology of combining the attributes describing a user's intent with the attributes describing an IR system in a decision-theoretic framework. First, the model capturing a user's intent, called the IPC user model, is built. Then, the hybrid model is created by combining the captured user intent with the elements of an IR system in that framework. In the hybrid user model, a multi-attribute utility model is used to evaluate the values of the attributes describing a user's intent in combination with the attributes describing an IR system. The functions that evaluate these chosen attributes draw on research on predicting query performance and on determining dissemination thresholds. Two contributions are the integration of user intent and system elements in a decision-theoretic framework and the unified evaluation framework. This approach also offers a fine-grained representation of the model and the ability to learn a user's knowledge dynamically. I compare this approach with the best traditional approach in the IR community, Ide dec-hi using term frequency-inverse document frequency (TFIDF) weighting, on selected collections from the IR community such as CRANFIELD, MEDLINE, and CACM. The results of the evaluation with the IPC model show that this approach retrieves more relevant documents in the initial run and performs competitively with Ide dec-hi in the feedback run.
The evaluations with the hybrid model show that this approach retrieves more relevant documents among the first 15 returned documents than the IPC model and the TFIDF approach. A user study with human intelligence analysts shows that this approach tracked an individual's interests and helped retrieve more relevant documents than a commercial keyword-based system.