48 research outputs found
A Longitudinal Study of Exploratory and Keyword Search
Digital libraries are concerned with improving access to collections to make their service more effective and valuable to users. In this paper, we present the results of a four-week longitudinal study investigating the use of both exploratory and keyword forms of search within an online video archive, where both forms of search were available concurrently in a single user interface. While we expected early use to be more exploratory and subsequent use to be more directed, over the whole period there was a balance of exploratory and keyword searches, and they were often used together. Further, supporting the notion that facets support exploration, there were more than five times as many facet clicks as uses of the more complex forms of keyword search (boolean and advanced). From these results, we can conclude that there is real value in investing in exploratory search support, which was shown to be both popular and useful for extended use of the system.
A Validated Framework for Measuring Interface Support for Interactive Information Seeking
In this paper we present the validation of an evaluation framework that models the support provided by search systems for different types of users and their expected types of seeking behavior. Factors determining the types of users include previous knowledge and goals. After an overview is presented, the framework is validated in two ways. First, the novel integration of the two existing information-seeking models used in the framework is validated by the correlation of multiple expert and novice analyses. Second, the framework is validated against the results produced by two separate user studies. Further, the refinements made by the first validation technique are shown to increase the accuracy of the framework through the second technique. The successful validation process has shown that the framework can identify both strong and weak areas of search interface design in only a few hours. The results produced can be used either to revise and strengthen designs or to inform the structure of a user study.
Semantic Distinction of Lexical Compounds in Information Retrieval
Taking phrases into account does not appear to produce significant improvements in classic Information Retrieval models. It is generally accepted that proximity criteria give better results than an adjacency criterion. This work explores the hypothesis that not all lexical compounds should be treated in the same way. We propose an automatic procedure for the semantic classification of the lexical compounds in WordNet on the basis of their components, and study how this distinction affects Information Retrieval. This work has been partially funded by the Ministry of Science and Technology (Ministerio de Ciencia y Tecnología) through the Hermes project (TIC2000-0335-C03-01).
Extracting Conceptual Terms from Medical Documents
Automated biomedical concept recognition is important for biomedical document retrieval and text mining research. In this paper, we describe a two-step concept extraction technique for documents in the biomedical domain. Step one is noun phrase extraction, which automatically extracts noun phrases from medical documents. The extracted noun phrases serve as candidate concept terms and become the input to the next step. Step two is keyphrase extraction, which automatically identifies important topical terms among the candidate terms. Experiments were conducted to evaluate the results of both steps. The experimental results show that our noun phrase extractor is effective in identifying noun phrases in medical documents, and that the keyphrase extractor is effective in identifying the conceptual terms of a document.
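The two-step shape described in this abstract (candidate phrases first, then selection of the important ones) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: splitting on stopwords and punctuation is only a crude stand-in for real noun phrase extraction, raw frequency is only a stand-in for the keyphrase extractor, and the names `STOPWORDS`, `candidate_phrases`, and `key_phrases` are hypothetical, not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "is", "are", "has", "have", "and", "or",
             "of", "in", "on", "to", "with", "for"}

def candidate_phrases(text):
    """Step 1 stand-in: maximal runs of non-stopwords within a clause."""
    candidates = []
    for clause in re.split(r"[.;:,!?()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z][a-z-]*", clause):
            if word in STOPWORDS:
                if current:
                    candidates.append(" ".join(current))
                    current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current))
    return candidates

def key_phrases(text, top_k=2):
    """Step 2 stand-in: keep the most frequent candidate phrases."""
    counts = Counter(candidate_phrases(text))
    return [phrase for phrase, _ in counts.most_common(top_k)]

text = ("Chronic kidney disease is common. The patient has chronic "
        "kidney disease and high blood pressure.")
print(candidate_phrases(text))
print(key_phrases(text, top_k=1))
```

A real system would replace step 1 with a part-of-speech tagger and chunker, and step 2 with a trained or statistically weighted ranker; the sketch only shows how the output of one step feeds the other.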
More Effective Web Search Using Bigrams and Trigrams
This paper investigates the effectiveness of quoted bigrams and trigrams as query terms to target web search. Prior research in this area has largely focused on static corpora, each containing only a few million documents, and has reported mixed (usually negative) results. We investigate the bigram/trigram extraction problem and present an extraction algorithm that shows promising results when applied to real-time web search. We also present a prototype augmented search software package that can leverage the results provided by a web search engine to help the web searcher identify important phrases and related documents quickly. This software has received favourable feedback in a recent user survey.
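As a rough illustration of what "quoted bigrams and trigrams as query terms" means, the sketch below pulls the most frequent two- and three-word sequences out of a piece of text and wraps them in double quotes, the phrase syntax most web search engines accept. It is not the extraction algorithm from the paper; the stopword filter, the frequency ranking, and the names `STOPWORDS`, `ngrams`, and `quoted_phrases` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list; phrases may not start or end with these.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"}

def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """All contiguous n-token sequences, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def quoted_phrases(text, top_k=3):
    """Return the top_k most frequent bigrams/trigrams as quoted strings."""
    tokens = tokenize(text)
    counts = Counter()
    for n in (2, 3):
        for gram in ngrams(tokens, n):
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue  # skip phrases with a stopword at either edge
            counts[gram] += 1
    return ['"' + " ".join(gram) + '"' for gram, _ in counts.most_common(top_k)]

snippet = "web search engine results; the web search engine ranks pages"
print(quoted_phrases(snippet))
```

The quoted strings could then be appended to a query so that the engine matches the words as an exact phrase rather than as independent terms.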
Evaluating the Potential of Explicit Phrases for Retrieval Quality
This paper evaluates the potential impact of explicit phrases on retrieval quality through a case study with the TREC Terabyte benchmark. It compares the performance of user- and system-identified phrases with a standard score and a proximity-aware score, and shows that an optimal choice of phrases, including term permutations, can significantly improve query performance.
Social Search with Missing Data: Which Ranking Algorithm?
Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services which can do naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who can best match a user's search requirements specified in a term-based query, even in the absence of stored user-profiles. We deploy and compare five statistical measures, namely, our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods
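Two of the association measures named in this abstract, mutual information and phi-squared, have standard textbook forms computed from co-occurrence counts. The sketch below uses those textbook forms (pointwise mutual information and the phi-squared statistic of a 2x2 contingency table); whether BuddyFinder uses exactly these variants is an assumption, and CORDER, "improved MI", and the Z score are not reproduced here. The parameter names (`n_xy`, `n_x`, `n_y`, `n`) are hypothetical: pages mentioning both the user and the query term, pages mentioning each alone, and total pages.

```python
import math

def pmi(n_xy, n_x, n_y, n):
    """Pointwise mutual information: log2( P(x,y) / (P(x) * P(y)) )."""
    if n_xy == 0:
        return float("-inf")  # never observed together
    return math.log2((n_xy * n) / (n_x * n_y))

def phi_squared(n_xy, n_x, n_y, n):
    """Phi-squared of the 2x2 table built from the co-occurrence counts."""
    a = n_xy                    # both x and y
    b = n_x - n_xy              # x without y
    c = n_y - n_xy              # y without x
    d = n - n_x - n_y + n_xy    # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: one margin is empty
    return (a * d - b * c) ** 2 / denom

# Perfect association: x and y always appear together.
print(pmi(10, 10, 10, 100), phi_squared(10, 10, 10, 100))
# Statistical independence: co-occurrence matches chance.
print(pmi(1, 10, 10, 100), phi_squared(1, 10, 10, 100))
```

Both measures are zero under independence; phi-squared is bounded by 1, while pointwise mutual information grows for rare pairs, which is one reason rankings produced by different measures can disagree.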
Extraction of Keyphrases from Text: Evaluation of Four Algorithms
This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each algorithm was evaluated by the degree to which its keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.
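One common way to quantify how well extracted keyphrases match a hand-built target set is precision, recall, and F1 over normalized exact matches. The sketch below shows that calculation; it is an assumption that this resembles the report's scoring, since the abstract does not state the matching criterion, and the names `normalize` and `match_scores` are hypothetical.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(phrase.lower().split())

def match_scores(extracted, target):
    """Precision, recall, and F1 of extracted phrases against the target set."""
    ext = {normalize(p) for p in extracted}
    tgt = {normalize(p) for p in target}
    hits = len(ext & tgt)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(tgt) if tgt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

extracted = ["Information Retrieval", "keyphrase  extraction", "neural nets"]
target = ["information retrieval", "keyphrase extraction",
          "machine learning", "text mining"]
print(match_scores(extracted, target))
```

Exact matching is strict: "neural nets" and "neural networks" would not count as a match, so evaluations of this kind often add stemming or other normalization before comparing.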