Extracting Relations from Large Plain-Text Collections
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn yield new tuples extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
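The bootstrapping loop described above (seed tuples produce patterns, patterns produce new tuples, and the cycle repeats) can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical illustration and not Snowball's implementation: Snowball's own patterns are richer (term-vector contexts around named-entity tags, matched by similarity), whereas this sketch keeps only the literal text between the two entities and uses a capitalized-word regex (`ENTITY`) in place of a named-entity tagger. Pattern quality is scored automatically as positive / (positive + negative) matches against tuples already known, in the spirit of evaluating patterns without human intervention. All names (`generate_patterns`, `apply_pattern`, `pattern_confidence`, `bootstrap`, `min_conf`) and the example sentences are invented for illustration.

```python
import re

# Hypothetical stand-in for a named-entity tagger: a run of capitalized words.
ENTITY = r"\b([A-Z]\w*(?: [A-Z]\w*)*)"


def generate_patterns(sentences, known, max_words=4):
    """Collect the literal 'middle' text between a known organization and its location."""
    patterns = set()
    for s in sentences:
        for org, loc in known.items():
            start = s.find(org)
            if start == -1:
                continue
            org_end = start + len(org)
            loc_start = s.find(loc, org_end)  # location must follow the organization
            if loc_start == -1:
                continue
            middle = s[org_end:loc_start]
            # Skip empty or overly long contexts, which tend to be unreliable.
            if middle.strip() and len(middle.split()) <= max_words:
                patterns.add(middle)
    return patterns


def apply_pattern(middle, sentences):
    """Return every (organization, location) pair matched by one pattern."""
    regex = re.compile(ENTITY + re.escape(middle) + ENTITY)
    return [(m.group(1), m.group(2)) for s in sentences for m in regex.finditer(s)]


def pattern_confidence(pairs, known):
    """positive / (positive + negative), counted only over organizations already known."""
    positive = negative = 0
    for org, loc in pairs:
        if org in known:
            if known[org] == loc:
                positive += 1
            else:
                negative += 1
    total = positive + negative
    return positive / total if total else 0.0


def bootstrap(sentences, seeds, min_conf=0.5, max_iters=5):
    """Iterate pattern generation and tuple extraction until no new tuples appear."""
    known = dict(seeds)  # assumption: each organization maps to a single location
    for _ in range(max_iters):
        new = {}
        for middle in sorted(generate_patterns(sentences, known)):  # sorted for determinism
            pairs = apply_pattern(middle, sentences)
            if pattern_confidence(pairs, known) < min_conf:
                continue  # discard patterns that contradict known tuples too often
            for org, loc in pairs:
                if org not in known and org not in new:
                    new[org] = loc
        if not new:
            break  # converged: this iteration produced no new tuples
        known.update(new)
    return known


sentences = [
    "Microsoft's headquarters in Redmond opened early.",
    "Exxon's headquarters in Irving employs thousands.",
    "Boeing, based in Seattle, builds aircraft.",
    "Intel, based in Santa Clara, makes chips.",
    "IBM's headquarters in Armonk is large.",
]
seeds = [("Microsoft", "Redmond"), ("Boeing", "Seattle")]
print(bootstrap(sentences, seeds))
```

Starting from two seed pairs, the two learned contexts ("'s headquarters in" and ", based in") pick up the remaining organizations. Raising `min_conf` makes the loop more conservative, trading recall for precision, which is the same trade-off that automatic pattern and tuple scoring is meant to manage.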
Extracting Synonymous Gene and Protein Terms From Biological Literature
Genes and proteins are often associated with multiple names. More names are added as new functional or structural information is discovered. Because authors can use any one of the known names for a gene or protein, information retrieval and extraction would benefit from identifying the gene and protein terms that are synonyms of the same substance.
Combining Strategies for Extracting Relations from Text Collections
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. Our Snowball system extracts these relations from document collections starting with only a handful of user-provided example tuples. Based on these tuples, Snowball generates patterns that are used, in turn, to find more tuples. In this paper we introduce a new pattern and tuple generation scheme for Snowball, with strengths and weaknesses different from those of our original system. We also show preliminary results on how we can combine the two versions of Snowball to extract tuples more accurately.
An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback
While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially when users are unfamiliar with a domain, are searching for documents in other languages, or are looking for complex information such as events, which are not easily expressed as queries. Providing example documents or passages of interest may be easier for a user; however, such query-by-example scenarios are prone to concept drift and are highly sensitive to the query generation method. This demo illustrates complementary approaches to using LLMs interactively, assisting the user and enabling them to provide edits and feedback at all stages of the query formulation process. The proposed Query Generation Assistant is a novel search interface that supports automatic and interactive query generation over a monolingual or multilingual document collection. Specifically, the assistive interface enables users to refine the queries generated by different LLMs and to provide feedback on the retrieved documents or passages, and it incorporates the users' feedback into prompts to generate more effective queries. The interface is a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation, for qualitatively evaluating the effectiveness of retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments on complex search tasks where users struggle to formulate queries without such assistance.
Comment: Intelligence Advanced Research Projects Activity (IARPA) BETTER Research Program