Toward Entity-Aware Search

Author: Cheng Tao
Publication date: 01/12/2010

As the Web has evolved into a data-rich repository, current search engines, with their standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In this Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we distill and abstract the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, for efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results obtained so far show clear promise for entity-aware search in its usefulness, effectiveness, efficiency, and scalability.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

Author: Sawant Uma
Garg Saurabh
Chakrabarti Soumen
Ramakrishnan Ganesh
Publication date: 06/12/2018

In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question as a structured query, execute that query on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, from well-formed questions to short "telegraphic" keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than committing to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable, or latent, variable. Interpretations and candidate entity responses are scored as pairs by combining signals from multiple convolutional networks that operate collectively on the query, KG, and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see a 5-16% absolute improvement in mean average precision (MAP) compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries.

Comment: Accepted to Information Retrieval Journal
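For readers unfamiliar with the metric cited in this abstract, mean average precision (MAP) averages, over all queries, the precision measured at each rank where a relevant entity appears. The following is a minimal sketch of the standard definition only. The function names and the toy data are hypothetical and are not taken from the paper or its evaluation code, and each ranked list is assumed to contain no duplicate entities.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant items.

    Assumes `ranked` contains no duplicates.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank = relevant items seen so far / rank.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example with made-up entity identifiers.
toy_runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_runs)
```

Under this definition, a "5-16% absolute improvement" means the MAP value itself rises by 0.05 to 0.16, not that it rises by that fraction of the baseline.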

arXiv.org e-Print Archive

Biblioteca Digital de la Comunidad de Madrid

Tailored deep learning techniques for information retrieval

Author: Nie Yifan
Publication date: 01/12/2021

Information retrieval aims to find documents relevant to a query. Many traditional information retrieval models have been proposed. They either encode the query and documents into vectors in term space and estimate relevance by computing the similarity of the two vectors, or estimate relevance with probabilistic models. However, for vector space models, encoding queries and documents into term space has its limitations: for example, it is difficult to identify document terms whose meanings are similar to the exact query terms. It is also difficult to represent the text at different levels of abstraction to better match the information need expressed in the query. With the rapid development of deep learning techniques, it is possible to learn useful representations through a series of neural layers. This paves the way to better representations in a dense latent space rather than in term space, which may help to match terms that are not identical but have similar meanings. It also allows us to create different layers of representation for the query and the document, enabling matching between query and documents at different levels of abstraction, which may better serve the information needs behind different types of queries. Finally, deep learning techniques also allow us to learn a better matching function. In this thesis, we explore several deep learning techniques to deal with the above problems. First, we study the effectiveness of building multiple abstraction layers between query and document, for representation-based and interaction-based models. Then we propose a model that cross-matches query and document representations at different layers to better serve the need for term-phrase matching. Finally, we propose an integrated learning framework for the ranking function and the neural features of the query and document. Experiments on public datasets demonstrate that the methods proposed in this thesis are more effective than existing ones.
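The term-space limitation described in this abstract can be made concrete with a small sketch. This is an illustration of a generic bag-of-words cosine similarity, not the thesis's models. The function names and example strings are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector in term space (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


query = term_vector("car repair")
exact_doc = term_vector("car repair")
synonym_doc = term_vector("automobile maintenance guide")

exact_score = cosine_similarity(query, exact_doc)      # identical terms
synonym_score = cosine_similarity(query, synonym_doc)  # no shared terms
```

Because the second document shares no surface terms with the query, its score is zero even though it is topically related. This is the kind of vocabulary mismatch that dense latent representations are meant to reduce.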

Dépôt Institutionnel Numérique

Table Search, Generation and Completion

Author: Zhang Shuo
Publication venue: University of Stavanger, Norway
Publication date: 01/01/2019

PhD thesis in Information technology

Tables are one of those "universal tools" that are practical and useful in many application scenarios. Tables can be used to collect and organize information from multiple sources and then turn that information into knowledge (and, ultimately, support decision-making) by performing various operations, like sorting, filtering, and joins. Because of this, a large number of tables already exist on the Web, representing a vast and rich source of structured information that could be utilized. The focus of the thesis is on developing methods that help the user complete a complex task by providing intelligent assistance for working with tables. Specifically, our interest is in relational tables, which describe a set of entities along with their attributes. Imagine a scenario in which a user is working with a table and has already entered some data in it. Intelligent assistance can include providing recommendations for the empty table cells, searching for similar tables that can serve as a blueprint, or even automatically generating the entire table that the user needs. The table-making task can thus be reduced to just a few button clicks. Motivated by the above scenario, we propose a set of novel tasks: table search, table generation, and table completion. Table search is the task of returning a ranked list of tables in response to a query. Google, for instance, can now provide tables as direct answers to many queries, especially when users are searching for a list of things. Figure 1.1 shows an example. Table generation is about automatically organizing entities and their attributes in a tabular format to facilitate a better overview. Table completion is concerned with augmenting the input table with additional tabular data. Figure 1.2 illustrates a scenario that recommends row and column headings to populate the table with and automatically completes table values from verifiable sources. In this thesis, we propose methods and evaluation resources for addressing these tasks.

NORA - Norwegian Open Research Archives

UiS Brage

Neural Networks for Building Semantic Models and Knowledge Graphs

Author: Futia Giuseppe
Publication venue: Politecnico di Torino
Publication date: 30/10/2020

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Complex Knowledge Base Question Answering: A Survey

Author: He Gaole
Jiang Jing
Jiang Jinhao
Lan Yunshi
Wen Ji-Rong
Zhao Wayne Xin
Publication date: 01/01/2022

Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years, researchers have proposed a large number of novel methods that address the challenges of answering complex questions. In this survey, we review recent advances in KBQA with a focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin by introducing the complex KBQA task and relevant background. Then, we describe benchmark datasets for the complex KBQA task and introduce the construction process of these datasets. Next, we present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. Specifically, we illustrate their procedures with flow designs and discuss their major differences and similarities. After that, we summarize the challenges that these two categories of methods encounter when answering complex questions, and explain the advanced solutions and techniques used in existing work. Finally, we conclude and discuss several promising directions related to complex KBQA for future research.

Comment: 20 pages, 4 tables, 7 figures. arXiv admin note: text overlap with arXiv:2105.1164

arXiv.org e-Print Archive

TU Delft Repository

Institutional Knowledge at Singapore Management University