Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus
In Web search, entity-seeking queries often trigger a special Question
Answering (QA) system. It may use a parser to interpret the question to a
structured query, execute that on a knowledge graph (KG), and return direct
entity responses. QA systems based on precise parsing tend to be brittle: minor
syntax variations may dramatically change the response. Moreover, KG coverage
is patchy. At the other extreme, a large corpus may provide broader coverage,
but in an unstructured, unreliable form. We present AQQUCN, a QA system that
gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of
query syntax, from well-formed questions to short `telegraphic' keyword
sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals
from KGs and large corpora to directly rank KG entities, rather than commit to
one semantic interpretation of the query. AQQUCN models the ideal
interpretation as an unobservable or latent variable. Interpretations and
candidate entity responses are scored as pairs, by combining signals from
multiple convolutional networks that operate collectively on the query, KG and
corpus. On four public query workloads, amounting to over 8,000 queries with
diverse query syntax, we see 5--16% absolute improvement in mean average
precision (MAP), compared to the entity ranking performance of recent systems.
Our system is also competitive at entity set retrieval, almost doubling F1
scores for challenging short queries. Comment: Accepted to Information Retrieval Journal
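The latent-interpretation idea in the abstract above can be sketched roughly as follows: instead of committing to one parse, candidate (interpretation, entity) pairs are scored jointly, and entities are ranked by aggregating over interpretations. This is only an illustrative sketch; `score_pair` stands in for AQQUCN's convolutional scorers, which are not reproduced here.

```python
# Hedged sketch: rank entities while treating the query interpretation
# as a latent variable, by taking the best pair score per entity.
from collections import defaultdict

def rank_entities(query, interpretations, candidates, score_pair):
    """score_pair(query, interpretation, entity) -> float is assumed;
    in AQQUCN it would combine KG and corpus signals."""
    best = defaultdict(float)
    for interp in interpretations:
        for entity in candidates:
            s = score_pair(query, interp, entity)
            best[entity] = max(best[entity], s)  # max over latent interpretations
    return sorted(best, key=best.get, reverse=True)

# Toy usage with a trivial word-overlap stand-in scorer:
toy_score = lambda q, i, e: len(set(q.split()) & set(e.split()))
ranking = rank_entities("capital of france",
                        ["parse_a", "parse_b"],
                        ["paris france", "berlin"],
                        toy_score)
```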
A Method for Short Message Contextualization: Experiments at CLEF/INEX
This paper presents the approach we developed for automatic multi-document summarization applied to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduce an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method has been evaluated at the INEX/CLEF Tweet Contextualization track; we provide the evaluation results over the four years of the track. The method was also adapted to snippet retrieval and query expansion. The evaluation results indicate good performance of the approach.
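The part-of-speech weighting mentioned in this abstract can be illustrated with a small sketch. The tag set and weight values below are assumptions for illustration, not the authors' actual parameters:

```python
# Hedged sketch of POS-weighted, length-normalized sentence scoring.
# The weights are illustrative assumptions, not the published method's values.
POS_WEIGHTS = {"NOUN": 2.0, "PROPN": 3.0, "VERB": 1.5}  # assumed values

def sentence_score(tagged_sentence, query_terms):
    """tagged_sentence: list of (token, pos) pairs; query_terms: lowercase set."""
    score = 0.0
    for token, pos in tagged_sentence:
        if token.lower() in query_terms:
            score += POS_WEIGHTS.get(pos, 1.0)  # default weight for other tags
    return score / max(len(tagged_sentence), 1)  # normalize by sentence length

s = [("Twitter", "PROPN"), ("announced", "VERB"), ("a", "DET"), ("feature", "NOUN")]
score = sentence_score(s, {"twitter", "feature"})
```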
INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned
Microblogging platforms such as Twitter are increasingly used for on-line client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help a user understand a tweet by providing them with a short explanatory summary (500 words). This summary should be built automatically using resources like Wikipedia, generated by extracting relevant passages and aggregating them into a coherent summary. Over the four years the track ran, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, a diversity content measure, and sentence reordering. This paper provides a full summary report on the four-year task. While the yearly overviews focused on system results, here we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important outcome of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why and describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned, which are worth considering when designing a similar task.
Social Book Search: A Methodology that Combines both Retrieval and Recommendation
University of Minnesota M.S. thesis. August 2014. Major: Computer Science. Advisor: Carolyn Crouch. 1 computer file (PDF); vii, 43 pages.
Information Retrieval as an area of research aims at satisfying the information need of a user. Retrieval in the Information Age has expanded exponentially as its underlying technologies have expanded. Traditional IR systems that respond to a user's natural language search query are combined with recommendation through collaborative filtering [6]. This research focuses on a methodology that combines traditional IR and recommender systems. It was done as part of the Suggestion task of the Social Book Search (SBS) track of INEX (INitiative for the Evaluation of XML Retrieval) 2014 [3]. The Social Book Search track was introduced by INEX in 2011 with the purpose of supporting users with easy search and access to books by using metadata. One complexity of the task lies in handling both professional and social metadata, which differ in both kind and quantity. The methodology and experiments discussed are inspired by background research [1,2,4,5,6] on the Social Book Search track. Our IR team submitted six runs for the track to the INEX 2014 competition, five of which use a recommender system that re-ranks the otherwise traditional set of results. Background work done to establish a good foundation for the methodology used in the SBS 2014 task includes experiments performed on both the 2011 and 2013 Social Book Search tracks. This research focuses on the 2013 experiments and their impact on the results produced for SBS 2014.
Knowledge graph exploration for natural language understanding in web information retrieval
In this thesis, we study methods to leverage information from fully-structured knowledge bases
(KBs), in particular the encyclopedic knowledge graph (KG) DBpedia, for different text-related
tasks from the area of information retrieval (IR) and natural language processing (NLP). The
key idea is to apply entity linking (EL) methods that identify mentions of KB entities in text,
and then exploit the structured information within KGs. Developing entity-centric methods for
text understanding using KG exploration is the focus of this work.
We aim to show that structured background knowledge is a means for improving performance in
different IR and NLP tasks that traditionally only make use of the unstructured text input itself.
Thereby, the KB entities mentioned in text act as connection between the unstructured text and
the structured KG. We focus in particular on how to best leverage the knowledge as contained in
such fully-structured (RDF) KGs like DBpedia with their labeled edges/predicates, in contrast
to the previous Wikipedia-based work we build upon, which typically relies
on unlabeled graphs only. The contribution of this thesis can be structured along its three parts:
In Part I, we apply EL and semantify short text snippets with KB entities. Although we retrieve
only types and categories from DBpedia for each entity, we are able to leverage this information
to create semantically coherent clusters of text snippets. This pipeline of connecting text to
background knowledge via the mentioned entities will be reused in all following chapters.
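The Part I pipeline (link entities, fetch DBpedia types, cluster snippets) could look roughly like this. The Jaccard similarity measure, the greedy clustering, and the threshold are assumptions for illustration; the thesis's actual clustering method is not specified in this summary:

```python
# Hedged sketch: cluster text snippets by the DBpedia types of their
# linked entities. Similarity measure and threshold are assumed.
def type_overlap(types_a, types_b):
    """Jaccard similarity over DBpedia type sets (assumed measure)."""
    if not types_a and not types_b:
        return 0.0
    return len(types_a & types_b) / len(types_a | types_b)

# Snippets represented only by the DBpedia types of their linked entities:
snippets = {
    "s1": {"dbo:City", "dbo:Place"},
    "s2": {"dbo:City", "dbo:Place", "dbo:Capital"},
    "s3": {"dbo:Band", "dbo:Organisation"},
}

# Greedy single-pass clustering with an assumed threshold of 0.5:
clusters = []
for sid, types in snippets.items():
    for cluster in clusters:
        if type_overlap(types, snippets[cluster[0]]) >= 0.5:
            cluster.append(sid)
            break
    else:
        clusters.append([sid])  # no similar cluster found; start a new one
```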
In Part II, we focus on semantic similarity and extend the idea of semantifying text with entities
by proposing in Chapter 5 a model that represents whole documents by their entities. In this
model, comparing documents semantically with each other is viewed as the task of comparing
the semantic relatedness of the respective entities, which we address in Chapter 4. We propose
an unsupervised graph weighting schema and show that weighting the DBpedia KG leads to
better results on an existing entity ranking dataset. The exploration of weighted KG paths turns
out to be also useful when trying to disambiguate the entities from an open information extraction
(OIE) system in Chapter 6. With this weighting schema, the integration of KG information
for computing semantic document similarity in Chapter 5 becomes the task of comparing the two
KG subgraphs with each other, which we address by an approximate subgraph matching. Based
on a well-established evaluation dataset for semantic document similarity, we show that our
unsupervised method achieves performance competitive with other state-of-the-art methods.
Our results from this part indicate that KGs can contain helpful background knowledge, in particular
when exploring KG paths, but that selecting the relevant parts of the graph is an important
yet difficult challenge.
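One simple family of unsupervised weighting schemas for labeled KG edges is frequency-based, e.g. IDF-style weights on predicates. Whether this matches the thesis's exact schema is not stated in this summary, so the following is only an illustrative sketch:

```python
# Hedged sketch: weight labeled KG edges inversely to how common their
# predicate is (IDF-style). An assumed schema for illustration only.
import math
from collections import Counter

def weight_edges(triples):
    """triples: list of (subject, predicate, object). Rare predicates
    get higher weights; frequent ones get lower weights."""
    n = len(triples)
    pred_freq = Counter(p for _, p, _ in triples)
    return {t: math.log(n / pred_freq[t[1]]) for t in triples}

kg = [("Paris", "capitalOf", "France"),
      ("Lyon", "locatedIn", "France"),
      ("Nice", "locatedIn", "France"),
      ("Marseille", "locatedIn", "France")]
weights = weight_edges(kg)  # "capitalOf" outweighs the common "locatedIn"
```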
In Part III, we shift to the task of relevance ranking and first study in Chapter 7 how to best
retrieve KB entities for a given keyword query. Combining again text with KB information, we
extract entities from the top-k retrieved, query-specific documents and then link the documents
to two different KBs, namely Wikipedia and DBpedia. In a learning-to-rank setting, we study
extensively which features from the text, the Wikipedia KB, and the DBpedia KG can be helpful
for ranking entities with respect to the query. Experimental results on two datasets, which build
upon existing TREC document retrieval collections, indicate that the document-based mention
frequency of an entity and the Wikipedia-based query-to-entity similarity are both important
features for ranking. The KG paths in contrast play only a minor role in this setting, even when
integrated with a semantic kernel extension. In Chapter 8, we further extend the integration of
query-specific text documents and KG information, by extracting not only entities, but also relations
from text. In this exploratory study based on a self-created relevance dataset, we find that
not all extracted relations are relevant with respect to the query, but that they often contain information
not contained within the DBpedia KG. The main insight from the research presented in
this part is that in a query-specific setting, established IR methods for document retrieval provide
an important source of information even for entity-centric tasks, and that a close integration of
relevant text document and background knowledge is promising.
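The feature combination described in Part III (mention frequency, query-to-entity similarity, KG path score, combined in a learning-to-rank setting) can be sketched in its simplest pointwise linear form. The feature names and weights below are illustrative assumptions; the thesis's learned model is not reproduced here:

```python
# Hedged sketch of pointwise linear entity ranking from assumed features.
FEATURES = ("mention_freq", "query_entity_sim", "kg_path_score")
WEIGHTS = {"mention_freq": 0.5, "query_entity_sim": 0.4, "kg_path_score": 0.1}
# Weights are hypothetical; in the thesis they would be learned to rank.

def score_entity(features):
    """features: dict mapping feature name -> value in [0, 1]."""
    return sum(WEIGHTS[f] * features.get(f, 0.0) for f in FEATURES)

entities = {
    "DBpedia:Paris":  {"mention_freq": 0.9, "query_entity_sim": 0.8},
    "DBpedia:Berlin": {"mention_freq": 0.2, "query_entity_sim": 0.3},
}
ranked = sorted(entities, key=lambda e: score_entity(entities[e]), reverse=True)
```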
Finally, in the concluding chapter we argue that future research should further address the integration
of KG information with entities and relations extracted from (specific) text documents,
as their potential does not yet seem to be fully explored. The same also holds true for better KG
exploration, which has gained scientific interest in recent years. It seems to us that both aspects
will remain interesting problems in the coming years, also because of the growing importance
of KGs for web search and knowledge modeling in industry and academia.