A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems
Search-oriented conversational systems rely on information needs expressed in
natural language (NL). We focus here on the understanding of NL expressions for
building keyword-based queries. We propose a reinforcement-learning-driven
translation model framework able to 1) learn the translation from NL
expressions to queries in a supervised way, and 2) overcome the lack of
large-scale datasets by framing the translation model as a word selection
approach and injecting relevance feedback in the learning process. Experiments
are carried out on two TREC datasets and outline the effectiveness of our
approach.
Comment: This is the author's pre-print version of the work. It is posted here
for your personal use, not for redistribution. Please cite the definitive
version which will be published in Proceedings of the 2018 EMNLP Workshop
SCAI: The 2nd International Workshop on Search-Oriented Conversational AI -
ISBN: 978-1-948087-75-
Overview of BioCreAtIvE: critical assessment of information extraction for biology
Background: The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented in a workshop held in Granada, Spain, March 28–31, 2004. The articles collected in this BMC Bioinformatics supplement entitled "A critical assessment of text mining methods in molecular biology" describe the BioCreAtIvE tasks, systems, results and their independent evaluation.
Results: BioCreAtIvE focused on two tasks. The first dealt with extraction of gene or protein names from text, and their mapping into standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed issues of functional annotation, requiring systems to identify specific text passages that supported Gene Ontology annotations for specific proteins, given full text articles.
Conclusion: The first BioCreAtIvE assessment achieved a high level of international participation (27 groups from 10 countries). The assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved a balanced 80% precision / recall or better, which potentially makes them suitable for real applications in biology. The results for the advanced task (functional annotation from free text) were significantly lower, demonstrating the current limitations of text-mining approaches where knowledge extrapolation and interpretation are required. In addition, an important contribution of BioCreAtIvE has been the creation and release of training and test data sets for both tasks. There are 22 articles in this special issue, including six that provide analyses of results or data quality for the data sets, including a novel inter-annotator consistency assessment for the test set used in task 2.
CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning
Compared to standard retrieval tasks, passage retrieval for conversational
question answering (CQA) poses new challenges in understanding the current user
question, as each question needs to be interpreted within the dialogue context.
Moreover, it can be expensive to re-train well-established retrievers such as
search engines that are originally developed for non-conversational queries. To
facilitate their use, we develop a query rewriting model CONQRR that rewrites a
conversational question in the context into a standalone question. It is
trained with a novel reward function to directly optimize towards retrieval
using reinforcement learning and can be adapted to any off-the-shelf retriever.
We show that CONQRR achieves state-of-the-art results on a recent open-domain
CQA dataset containing conversations from three different sources, and is
effective for two different off-the-shelf retrievers. Our extensive analysis
also shows the robustness of CONQRR to out-of-domain dialogues as well as to
zero query rewriting supervision.
Corpus-Level End-to-End Exploration for Interactive Systems
A core interest in building Artificial Intelligence (AI) agents is to let
them interact with and assist humans. One example is Dynamic Search (DS), which
models the process by which a human works with a search engine agent to accomplish
a complex and goal-oriented task. Early DS agents using Reinforcement Learning
(RL) have achieved only limited success because of (1) their lack of direct control
over which documents to return and (2) the difficulty of recovering from wrong
search trajectories. In this paper, we present a novel corpus-level end-to-end
exploration (CE3) method to address these issues. In our method, an entire text
corpus is compressed into a global low-dimensional representation, which
enables the agent to gain access to the full state and action spaces, including
the under-explored areas. We also propose a new form of retrieval function,
whose linear approximation allows end-to-end manipulation of documents.
Experiments on the Text REtrieval Conference (TREC) Dynamic Domain (DD) Track
show that CE3 outperforms the state-of-the-art DS systems.
Comment: Accepted into AAAI 202
Patent Retrieval in Chemistry based on semantically tagged Named Entities
Gurulingappa H, Müller B, Klinger R, et al. Patent Retrieval in Chemistry based on semantically tagged Named Entities. In: Voorhees EM, Buckland LP, eds. The Eighteenth Text REtrieval Conference (TREC 2009) Proceedings. Gaithersburg, Maryland, USA; 2009.
This paper reports on the work that has been conducted
by Fraunhofer SCAI for the TREC Chemistry
(TREC-Chem) track 2009. The team of Fraunhofer
SCAI participated in two tasks, namely Technology
Survey and Prior Art Search. The core of the framework
is an index of 1.2 million chemical patents provided
as a data set by TREC. For the technology
survey, three runs were submitted based on semantic
dictionaries and noun phrases. For the prior art
search task, several fields were introduced into the index
that contained normalized noun phrases, biomedical
as well as chemical entities. Altogether, 36 runs
were submitted for this task that were based on automatic
querying with tokens, noun phrases and entities
along with different search strategies.
Integrating structure in the probabilistic model for Information Retrieval
In databases or in the World Wide Web, many documents are in a structured format (e.g. XML). We propose in this article to extend the classical IR probabilistic model in order to take into account the structure through the weighting of tags. Our approach includes a learning step in which the weight of each tag is computed. This weight estimates the probability that the tag distinguishes the terms which are the most relevant. Our model has been evaluated on a large collection during INEX IR evaluation campaigns.