22,316 research outputs found
Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Translating verbose information needs into crisp search queries is a
phenomenon that is ubiquitous but hardly understood. Insights into this process
could be valuable in several applications, including synthesizing large
privacy-friendly query logs from public Web sources which are readily available
to the academic research community. In this work, we take a step towards
understanding query formulation by tapping into the rich potential of community
question answering (CQA) forums. Specifically, we sample natural language (NL)
questions spanning diverse themes from the Stack Exchange platform, and conduct
a large-scale conversion experiment where crowdworkers submit search queries
they would use when looking for equivalent information. We provide a careful
analysis of this data, accounting for possible sources of bias during
conversion, along with insights into user-specific linguistic patterns and
search behaviors. We release a dataset of 7,000 question-query pairs from this
study to facilitate further research on query understanding.Comment: ECIR 2020 Short Pape
Generating Synthetic Data for Neural Keyword-to-Question Models
Search typically relies on keyword queries, but these are often semantically
ambiguous. We propose to overcome this by offering users natural language
questions, based on their keyword queries, to disambiguate their intent. This
keyword-to-question task may be addressed using neural machine translation
techniques. Neural translation models, however, require massive amounts of
training data (keyword-question pairs), which is unavailable for this task. The
main idea of this paper is to generate large amounts of synthetic training data
from a small seed set of hand-labeled keyword-question pairs. Since natural
language questions are available in large quantities, we develop models to
automatically generate the corresponding keyword queries. Further, we introduce
various filtering mechanisms to ensure that synthetic training data is of high
quality. We demonstrate the feasibility of our approach using both automatic
and manual evaluation. This is an extended version of the article published
with the same title in the Proceedings of ICTIR'18.Comment: Extended version of ICTIR'18 full paper, 11 page
User experiments with the Eurovision cross-language image retrieval system
In this paper we present Eurovision, a text-based system for cross-language (CL) image retrieval.
The system is evaluated by multilingual users for two search tasks with the system configured in
English and five other languages. To our knowledge this is the first published set of user
experiments for CL image retrieval. We show that: (1) it is possible to create a usable multilingual
search engine using little knowledge of any language other than English, (2) categorizing images
assists the user's search, and (3) there are differences in the way users search between the proposed
search tasks. Based on the two search tasks and user feedback, we describe important aspects of
any CL image retrieval system
Query recovery of short user queries: on query expansion with stopwords
User queries to search engines are observed to predominantly contain inflected content words but lack stopwords and capitalization. Thus, they often resemble natural language queries after case folding and stopword removal. Query recovery aims to generate a linguistically well-formed query from a given user query as input to provide natural language processing tasks and cross-language information retrieval (CLIR). The evaluation of query translation shows that translation scores (NIST and BLEU) decrease after case folding, stopword removal, and stemming. A baseline method for query recovery reconstructs capitalization and stopwords, which considerably increases translation scores and significantly increases mean average precision for a standard CLIR task
From Questions to Effective Answers: On the Utility of Knowledge-Driven Querying Systems for Life Sciences Data
We compare two distinct approaches for querying data in the context of the
life sciences. The first approach utilizes conventional databases to store the
data and intuitive form-based interfaces to facilitate easy querying of the
data. These interfaces could be seen as implementing a set of "pre-canned"
queries commonly used by the life science researchers that we study. The second
approach is based on semantic Web technologies and is knowledge (model) driven.
It utilizes a large OWL ontology and same datasets as before but associated as
RDF instances of the ontology concepts. An intuitive interface is provided that
allows the formulation of RDF triples-based queries. Both these approaches are
being used in parallel by a team of cell biologists in their daily research
activities, with the objective of gradually replacing the conventional approach
with the knowledge-driven one. This provides us with a valuable opportunity to
compare and qualitatively evaluate the two approaches. We describe several
benefits of the knowledge-driven approach in comparison to the traditional way
of accessing data, and highlight a few limitations as well. We believe that our
analysis not only explicitly highlights the specific benefits and limitations
of semantic Web technologies in our context but also contributes toward
effective ways of translating a question in a researcher's mind into precise
computational queries with the intent of obtaining effective answers from the
data. While researchers often assume the benefits of semantic Web technologies,
we explicitly illustrate these in practice
How to Find Suitable Ontologies Using an Ontology-based WWW Broker
Knowledge reuse by means of outologies now faces three important problems: (1) there are no standardized identifying features that characterize ontologies from the user point of view; (2) there are no web sites using the same logical organization, presenting relevant information about ontologies; and (3) the search for appropriate ontologies is hard, time-consuming and usually fruitless. To solve the above problems, we present: (1) a living set of features that allow us to characterize ontologies from the user point of view and have the same logical organization; (2) a living domain ontology about ontologies (called ReferenceOntology) that gathers, describes and has links to existing ontologies; and (3) (ONTO)2Agent, the ontology-based www broker about ontologies that uses the Reference Ontology as a source of its knowledge and retrieves descriptions of ontologies that satisfy a given set of constraints. (ONTO)~Agent is available at http://delicias.dia.fi.upm.es/REFERENCE ONTOLOGY
- …