103 research outputs found

    The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

    The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry. Comment: SIGIR 2023 resource paper, 13 pages

    Query Suggestion and Data Fusion in Contextual Disambiguation


    Synsets improve short text clustering for search support: combining LDA and WordNet

    In this study, I proposed a short text clustering approach that uses WordNet as an external resource to cluster documents from corpus.byu.edu. Experimental results show that the approach substantially improved clustering performance. The factors that influence the performance of the topic model are the total number of documents, the distribution of Synsets among topics, and the word overlap between the query's Synsets. Performance is also affected by Synsets missing from WordNet. Finally, I outline an idea for using clustering approaches to generate ranked query suggestions that disambiguate the query. Combined with the Synsets of the query, text document clustering can provide an effective way to disambiguate a user's search query by organizing a large set of search results into a small number of groups labeled with Synsets from WordNet. Master of Science in Information Science

    Exploratory information searching in the enterprise: a study of user satisfaction and task performance.

    No prior research has been identified that investigates the causal factors for workplace exploratory search task performance. The impact of user, task, and environmental factors on user satisfaction and task performance was investigated through a mixed methods study with 26 experienced information professionals using enterprise search in an oil and gas enterprise. Some participants found 75% of high-value items, others found none, with an average of 27%. No association was found between self-reported search expertise and task performance, with a tendency for many participants to overestimate their search expertise. Successful searchers may have more accurate mental models of both search systems and the information space. Organizations may not have effective exploratory search task performance feedback loops, a lack of learning. This may be caused by management bias towards technology, not capability, a lack of systems thinking. Furthermore, organizations may not “know” they “don't know” their true level of search expertise, a lack of knowing. A metamodel is presented identifying the causal factors for workplace exploratory search task performance. Semistructured qualitative interviews with search staff from the defense, pharmaceutical, and aerospace sectors indicate the potential transferability of the finding that organizations may not know their search expertise levels.

    The Use of Social Tags in Text and Image Searching on the Web.

    In recent years, tags have become a standard feature on a diverse range of sites on the Web, accompanying blog posts, photos, videos, and online news stories. Tags are descriptive terms attached to Internet resources. Despite the rapid adoption of tagging, how people use tags during the search process is not well understood. There is little empirical data on the use and perceptions of tags created by those other than the searcher. Previous research on tags focused on the motivations and behaviors of taggers, although non-taggers represent a larger proportion of Web users than taggers. This study examines how people use tags, created by others, during the search process. Forty-eight subjects were each assigned four search tasks in a within-subjects study. Subjects searched for text documents and images in a controlled laboratory setting, using information retrieval interfaces differing in their incorporation of tags. User behavior and perception data were collected through search logs and interviews. Both direct and indirect uses of tags across the search process were examined. Tags are used directly when they are clicked on, resulting in a new query, while tags are used indirectly when used for judgments of relevance or to obtain additional terms for query reformulation. Tags increased interactions with the information retrieval system, as subjects issued more queries and saw more search results when using the tagged interface. For both text and image searches, tags were used for query reformulation, predictive judgment, and evaluative judgment of relevance. Subjects interacted most frequently with tags on the search results page, using them for query reformulation and predictive judgment. Tags were more likely to be used for predictive judgment in text searches than in image searches. Subjects’ understanding of tags focused on the role of tags in search, especially findability through a search engine. 
Tags were not uniformly perceived as being user-generated; site owners and automatic generation were mentioned as sources of tags. Several implications for the design of search interfaces and presentation of tags to support information interactions are discussed in the conclusion. Ph.D., Information, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89816/1/kimym_1.pd

    DIR 2011: Dutch-Belgian Information Retrieval Workshop Amsterdam


    Named entity recognition and classification in search queries

    Named Entity Recognition and Classification is the task of extracting instances of different entity classes, such as person, location, or company, from text. This task has recently been applied to web search queries in order to better understand their semantics, where a search query consists of linguistic units that users submit to a search engine to convey their search need. Discovering and analysing the linguistic units comprising a search query enables search engines to reveal and meet users' search intents. As a result, recent research has concentrated on analysing the constituent units comprising search queries. However, since search queries are short, unstructured, and ambiguous, an approach to detect and classify named entities is presented in this thesis, in which queries are augmented with the text snippets of search results for search queries. The thesis makes the following contributions: 1. A novel method for detecting candidate named entities in search queries, which utilises both query grammatical annotation and query segmentation. 2. A novel method to classify the detected candidate entities into a set of target entity classes, by using a seed expansion approach; the method presented exploits the representation of the sets of contextual clues surrounding the entities in the snippets as vectors in a common vector space. 3. An exploratory analysis of three main categories of search refiners: nouns, verbs, and adjectives, that users often incorporate in entity-centric queries in order to further refine the entity-related search results. 4. A taxonomy of named entities derived from a search engine query log. By using a large commercial query log, experimental evidence is provided that the work presented herein is competitive with the existing research in the field of entity recognition and classification in search queries.