    A Longitudinal Study of Exploratory and Keyword Search

    Digital libraries are concerned with improving access to collections to make their service more effective and valuable to users. In this paper, we present the results of a four-week longitudinal study investigating the use of both exploratory and keyword forms of search within an online video archive, where both forms of search were available concurrently in a single user interface. While we expected early use to be more exploratory and subsequent use to be more directed, over the whole period there was a balance of exploratory and keyword searches, and the two were often used together. Further, supporting the notion that facets support exploration, there were more than five times as many facet clicks as uses of more complex forms of keyword search (Boolean and advanced). From these results, we conclude that there is real value in investing in exploratory search support, which proved both popular and useful over extended use of the system.

    A Validated Framework for Measuring Interface Support for Interactive Information Seeking

    In this paper we present the validation of an evaluation framework that models the support provided by search systems for different types of users and their expected types of seeking behavior. The factors that determine user type include prior knowledge and goals. After an overview of the framework, we validate it in two ways. First, the novel integration of the two existing information-seeking models used in the framework is validated by correlating multiple expert and novice analyses. Second, the framework is validated against the results of two separate user studies. Further, the refinements made through the first validation technique are shown, via the second technique, to increase the framework's accuracy. The successful validation shows that the framework can identify both strong and weak areas of search interface design in only a few hours. Its results can be used either to revise and strengthen designs or to inform the structure of a user study.

    Semantic Distinction of Lexical Compounds in Information Retrieval (Distinción semántica de compuestos léxicos en recuperación de información)

    Taking phrases into account does not appear to produce significant improvements in classical Information Retrieval models. It is generally accepted that proximity criteria give better results than an adjacency criterion. The work presented here explores the hypothesis that not all lexical compounds should be treated in the same way. We propose an automatic procedure for semantically classifying the lexical compounds in WordNet on the basis of their components, and we study how this distinction affects Information Retrieval. This work was partially funded by the Ministerio de Ciencia y Tecnología through the Hermes project (TIC2000-0335-C03-01).
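
    The adjacency and proximity criteria contrasted in this abstract can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and a fixed token window for proximity; the function names `adjacent_match` and `proximity_match` are hypothetical and are not taken from the paper, which does not describe its matching code here.

```python
from typing import List


def adjacent_match(tokens: List[str], compound: List[str]) -> bool:
    """Adjacency criterion: the compound's words appear consecutively, in order."""
    n = len(compound)
    return any(tokens[i:i + n] == compound for i in range(len(tokens) - n + 1))


def proximity_match(tokens: List[str], compound: List[str], window: int) -> bool:
    """Proximity criterion (hypothetical form): every word of the compound
    appears, in any order, within some run of `window` consecutive tokens."""
    needed = set(compound)
    return any(needed <= set(tokens[i:i + window]) for i in range(len(tokens)))


doc = "the bank of the river flooded".split()
print(adjacent_match(doc, ["river", "bank"]))      # False: words are not adjacent
print(proximity_match(doc, ["river", "bank"], 5))  # True: both fall within 5 tokens
```

    A paraphrase such as "the bank of the river" matches "river bank" under proximity but not adjacency, which hints at why a single criterion may not suit every compound.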

    Extracting Conceptual Terms from Medical Documents

    Automated biomedical concept recognition is important for biomedical document retrieval and text mining research. In this paper, we describe a two-step concept extraction technique for documents in the biomedical domain. Step one is noun phrase extraction, which automatically extracts noun phrases from medical documents; the extracted noun phrases serve as candidate concept terms and become the inputs of the next step. Step two is keyphrase extraction, which automatically identifies important topical terms among the candidate terms. Experiments were conducted to evaluate the results of both steps. The results show that our noun phrase extractor is effective at identifying noun phrases in medical documents, as is the keyphrase extractor at identifying document conceptual terms.
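
    The two-step structure described above (generate candidate terms, then rank them) can be sketched as follows. This is not the authors' extractor. As an explicit assumption, step one uses a crude stopword-boundary chunker instead of a real noun phrase extractor, and step two ranks candidates by frequency and length instead of a keyphrase model. `STOPWORDS`, `candidate_phrases` and `top_keyphrases` are hypothetical names.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately small stopword list used as phrase boundaries.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "are",
             "was", "were", "with", "for", "to", "by", "as", "at", "from"}


def candidate_phrases(text: str, max_len: int = 3) -> List[str]:
    """Step 1 (stand-in for noun phrase extraction): split at punctuation and
    stopwords, keeping runs of 1..max_len content words as candidates."""
    candidates: List[str] = []

    def flush(run: List[str]) -> None:
        if 1 <= len(run) <= max_len:
            candidates.append(" ".join(run))

    for chunk in re.split(r"[.,;:!?()]", text.lower()):
        run: List[str] = []
        for word in re.findall(r"[a-z0-9-]+", chunk):
            if word in STOPWORDS:
                flush(run)
                run = []
            else:
                run.append(word)
        flush(run)
    return candidates


def top_keyphrases(text: str, k: int = 3) -> List[str]:
    """Step 2 (stand-in for keyphrase extraction): rank candidates by frequency,
    preferring longer phrases on ties, then alphabetically."""
    counts = Counter(candidate_phrases(text))
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:k]


text = ("Insulin resistance is common in type 2 diabetes. "
        "Insulin resistance and obesity are linked. "
        "Obesity raises the risk of type 2 diabetes.")
print(top_keyphrases(text, k=2))  # ['type 2 diabetes', 'insulin resistance']
```

    Keeping the steps as separate functions mirrors the abstract's design: either stage could be swapped for a stronger component without touching the other.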

    More Effective Web Search Using Bigrams and Trigrams

    This paper investigates the effectiveness of quoted bigrams and trigrams as query terms for targeting web search. Prior research in this area has largely focused on static corpora, each containing only a few million documents, and has reported mixed (usually negative) results. We investigate the bigram/trigram extraction problem and present an extraction algorithm that shows promising results when applied to real-time web search. We also present a prototype augmented search software package that leverages the results provided by a web search engine to help the web searcher quickly identify important phrases and related documents. This software received favourable feedback in a recent user survey.
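
    As a rough illustration of bigram/trigram extraction for quoted queries, the sketch below counts two- and three-word sequences across result snippets and returns the frequent ones as quoted phrases. It is not the extraction algorithm from the paper; the stopword filter, the `min_count` threshold and the function names are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical stopword list: n-grams may not start or end with these words.
STOP = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "with"}


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-word sequences in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_phrases(snippets: List[str], min_count: int = 2) -> List[str]:
    """Count bigrams and trigrams over all snippets and return those seen at
    least `min_count` times, most frequent (then longest) first, quoted."""
    counts: Counter = Counter()
    for snippet in snippets:
        tokens = re.findall(r"[a-z0-9]+", snippet.lower())
        for n in (2, 3):
            for gram in ngrams(tokens, n):
                if gram[0] not in STOP and gram[-1] not in STOP:
                    counts[gram] += 1
    kept = [g for g, c in counts.items() if c >= min_count]
    kept.sort(key=lambda g: (-counts[g], -len(g), g))
    return ['"' + " ".join(g) + '"' for g in kept]


snippets = ["Climate change policy in Europe",
            "New climate change policy announced",
            "Europe debates climate change"]
print(frequent_phrases(snippets))
# ['"climate change"', '"climate change policy"', '"change policy"']
```

    The quoted strings could then be appended to a query; raising `min_count` trades coverage for precision.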

    Evaluating the Potential of Explicit Phrases for Retrieval Quality

    This paper evaluates the potential impact of explicit phrases on retrieval quality through a case study with the TREC Terabyte benchmark. It compares the performance of user- and system-identified phrases under a standard score and a proximity-aware score, and shows that an optimal choice of phrases, including term permutations, can significantly improve query performance.
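
    Two ideas from this abstract, phrase candidates formed from term permutations and a proximity-aware score, can be sketched in a few lines. The scoring function here (number of distinct query terms divided by the shortest token window containing all of them) is a hypothetical stand-in chosen for clarity, not the proximity score used in the paper's TREC Terabyte experiments.

```python
from itertools import permutations
from typing import List, Optional


def phrase_variants(terms: List[str]) -> List[str]:
    """Every ordering of the query terms, each as a quoted phrase."""
    return ['"' + " ".join(p) + '"' for p in permutations(terms)]


def min_span(tokens: List[str], terms: List[str]) -> Optional[int]:
    """Length of the shortest token window containing every query term,
    or None if some term never occurs."""
    needed = set(terms)
    best: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok not in needed:
            continue
        seen = set()
        for j in range(i, len(tokens)):
            if tokens[j] in needed:
                seen.add(tokens[j])
            if seen == needed:
                span = j - i + 1
                if best is None or span < best:
                    best = span
                break
    return best


def proximity_score(tokens: List[str], terms: List[str]) -> float:
    """Hypothetical score: 1.0 when the terms are adjacent, smaller as they
    spread out, 0.0 when any term is missing."""
    span = min_span(tokens, terms)
    return 0.0 if span is None else len(set(terms)) / span


print(phrase_variants(["terabyte", "track"]))  # ['"terabyte track"', '"track terabyte"']
doc = "the track record of the terabyte track".split()
print(proximity_score(doc, ["terabyte", "track"]))  # 1.0
print(proximity_score(doc, ["record", "terabyte"]))  # 0.5
```

    Note that the number of permutations grows factorially with query length, so enumerating them is only practical for short queries.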

    Social Search with Missing Data: Which Ranking Algorithm?

    Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services that do naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who best match a user's search requirements specified in a term-based query, even in the absence of stored user profiles. We deploy and compare five statistical measures, namely, our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF-based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods.
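
    Several of the measures named above score the association between a user and a query term from co-occurrence counts. The sketch below computes two textbook forms, pointwise mutual information and phi-squared, from a 2x2 contingency table over scavenged pages. CORDER, improved MI and the Z score are not reproduced, and the exact MI variant used in the paper may differ; the user names and counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pmi(a: int, b: int, c: int, d: int) -> float:
    """Pointwise mutual information (base 2) from a 2x2 table:
    a = pages mentioning user and term, b = user only,
    c = term only, d = neither."""
    if a == 0:
        return float("-inf")
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """Phi-squared association: (ad - bc)^2 over the product of the margins."""
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if den == 0 else (a * d - b * c) ** 2 / den


def rank_users(tables: Dict[str, Tuple[int, int, int, int]]) -> List[str]:
    """Order users by phi-squared association with the query term."""
    return sorted(tables, key=lambda u: phi_squared(*tables[u]), reverse=True)


# Hypothetical counts over 100 scavenged pages for one query term.
tables = {"alice": (8, 2, 12, 78), "bob": (2, 8, 18, 72)}
print(rank_users(tables))             # ['alice', 'bob']
print(pmi(*tables["alice"]))          # 2.0
print(phi_squared(*tables["alice"]))  # 0.25
```

    Because phi-squared squares ad - bc, a user strongly negatively associated with a term would also score high; a practical ranker might check the sign of ad - bc first.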

    Extraction of Keyphrases from Text: Evaluation of Four Algorithms

    This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithm’s keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft’s Word 97, (2) an algorithm based on Eric Brill’s part-of-speech tagger, (3) the Summarize feature in Verity’s Search 97, and (4) NRC’s Extractor algorithm. For all five document collections, NRC’s Extractor yields the best match with the manually generated keyphrases.
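
    The evaluation described above scores each algorithm by how well its keyphrases match the hand-generated targets. A minimal sketch of such a scorer follows; the matching rule (lowercasing plus crude removal of a trailing "s") and the precision/recall summary are assumptions for illustration, not the exact procedure used in the report.

```python
from typing import List, Tuple


def normalize(phrase: str) -> str:
    """Lowercase, collapse whitespace, and strip a trailing 's' from words
    longer than three letters (a crude, hypothetical stand-in for stemming)."""
    words = phrase.lower().split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def match_scores(extracted: List[str], target: List[str]) -> Tuple[float, float]:
    """Precision and recall of extracted keyphrases against target keyphrases."""
    found = {normalize(p) for p in extracted}
    wanted = {normalize(p) for p in target}
    hits = len(found & wanted)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(wanted) if wanted else 0.0
    return precision, recall


extracted = ["Keyphrases", "search engine", "document"]
target = ["keyphrase", "search engines", "text summarization", "evaluation"]
p, r = match_scores(extracted, target)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

    Normalizing both sides before comparison keeps trivial surface differences (case, plurals) from counting as mismatches; a real evaluation would use a proper stemmer.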