22,316 research outputs found

    Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

    Get PDF
    Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.Comment: ECIR 2020 Short Pape

    Generating Synthetic Data for Neural Keyword-to-Question Models

    Full text link
    Search typically relies on keyword queries, but these are often semantically ambiguous. We propose to overcome this by offering users natural language questions, based on their keyword queries, to disambiguate their intent. This keyword-to-question task may be addressed using neural machine translation techniques. Neural translation models, however, require massive amounts of training data (keyword-question pairs), which is unavailable for this task. The main idea of this paper is to generate large amounts of synthetic training data from a small seed set of hand-labeled keyword-question pairs. Since natural language questions are available in large quantities, we develop models to automatically generate the corresponding keyword queries. Further, we introduce various filtering mechanisms to ensure that synthetic training data is of high quality. We demonstrate the feasibility of our approach using both automatic and manual evaluation. This is an extended version of the article published with the same title in the Proceedings of ICTIR'18.Comment: Extended version of ICTIR'18 full paper, 11 page

    User experiments with the Eurovision cross-language image retrieval system

    Get PDF
    In this paper we present Eurovision, a text-based system for cross-language (CL) image retrieval. The system is evaluated by multilingual users for two search tasks with the system configured in English and five other languages. To our knowledge this is the first published set of user experiments for CL image retrieval. We show that: (1) it is possible to create a usable multilingual search engine using little knowledge of any language other than English, (2) categorizing images assists the user's search, and (3) there are differences in the way users search between the proposed search tasks. Based on the two search tasks and user feedback, we describe important aspects of any CL image retrieval system

    Query recovery of short user queries: on query expansion with stopwords

    Get PDF
    User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate a linguistically well-formed query from a given user query as input to provide natural language processing tasks and cross-language information retrieval (CLIR). The evaluation of query translation shows that translation scores (NIST and BLEU) decrease after case folding, stopword removal, and stemming. A baseline method for query recovery reconstructs capitalization and stopwords, which considerably increases translation scores and significantly increases mean average precision for a standard CLIR task

    From Questions to Effective Answers: On the Utility of Knowledge-Driven Querying Systems for Life Sciences Data

    Get PDF
    We compare two distinct approaches for querying data in the context of the life sciences. The first approach utilizes conventional databases to store the data and intuitive form-based interfaces to facilitate easy querying of the data. These interfaces could be seen as implementing a set of "pre-canned" queries commonly used by the life science researchers that we study. The second approach is based on semantic Web technologies and is knowledge (model) driven. It utilizes a large OWL ontology and same datasets as before but associated as RDF instances of the ontology concepts. An intuitive interface is provided that allows the formulation of RDF triples-based queries. Both these approaches are being used in parallel by a team of cell biologists in their daily research activities, with the objective of gradually replacing the conventional approach with the knowledge-driven one. This provides us with a valuable opportunity to compare and qualitatively evaluate the two approaches. We describe several benefits of the knowledge-driven approach in comparison to the traditional way of accessing data, and highlight a few limitations as well. We believe that our analysis not only explicitly highlights the specific benefits and limitations of semantic Web technologies in our context but also contributes toward effective ways of translating a question in a researcher's mind into precise computational queries with the intent of obtaining effective answers from the data. While researchers often assume the benefits of semantic Web technologies, we explicitly illustrate these in practice

    How to Find Suitable Ontologies Using an Ontology-based WWW Broker

    Get PDF
    Knowledge reuse by means of outologies now faces three important problems: (1) there are no standardized identifying features that characterize ontologies from the user point of view; (2) there are no web sites using the same logical organization, presenting relevant information about ontologies; and (3) the search for appropriate ontologies is hard, time-consuming and usually fruitless. To solve the above problems, we present: (1) a living set of features that allow us to characterize ontologies from the user point of view and have the same logical organization; (2) a living domain ontology about ontologies (called ReferenceOntology) that gathers, describes and has links to existing ontologies; and (3) (ONTO)2Agent, the ontology-based www broker about ontologies that uses the Reference Ontology as a source of its knowledge and retrieves descriptions of ontologies that satisfy a given set of constraints. (ONTO)~Agent is available at http://delicias.dia.fi.upm.es/REFERENCE ONTOLOGY
    corecore