5 research outputs found

    Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

    In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question into a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, from well-formed questions to short 'telegraphic' keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG, and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see 5-16% absolute improvement in mean average precision (MAP) compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries. Comment: Accepted to Information Retrieval Journal.
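    A minimal sketch of the pair-scoring idea described above, assuming a single convolutional signal over the query (the paper combines several networks over query, KG, and corpus); the dimensions, the fusion by simple addition, and all class and function names are illustrative assumptions, not the authors' code:

```python
# Sketch only: score (interpretation, entity) pairs with a convolutional
# signal over the query, treating the interpretation as a latent variable
# that is maximized out when ranking entities.
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, emb_dim=64, n_filters=32):
        super().__init__()
        # Convolution over the query's token embeddings.
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.score = nn.Linear(n_filters, 1)

    def forward(self, query_emb, interp_emb, entity_emb):
        # query_emb: (1, emb_dim, seq_len); interp_emb, entity_emb: (1, n_filters)
        h = torch.relu(self.conv(query_emb)).max(dim=2).values  # pooled query signal
        # The real system fuses KG and corpus evidence from several networks;
        # here the signals are simply summed before scoring.
        return self.score(h + interp_emb + entity_emb).squeeze(-1)

def rank_entities(scorer, query_emb, interpretations, entities):
    """Rank entities by their best pair score over all latent interpretations."""
    with torch.no_grad():
        best = [max(scorer(query_emb, i, e).item() for i in interpretations)
                for e in entities]
    return sorted(range(len(entities)), key=lambda k: -best[k])
```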

    SMAPH: A Piggyback Approach for Entity-Linking in Web Queries

    We study the problem of linking the terms of a web-search query to a semantic representation given by the set of entities (a.k.a. concepts) mentioned in it. We introduce SMAPH, a system that performs this task using the information coming from a web search engine, an approach we call “piggybacking.” We employ search engines to alleviate the noise and irregularities that characterize the language of queries. Snippets returned as search results also provide a context for the query that makes it easier to disambiguate its meaning. From the search results, SMAPH builds a set of candidate entities with high coverage. This set is filtered by linking the candidate entities back to the terms occurring in the input query, ensuring high precision. A greedy disambiguation algorithm performs this filtering; it maximizes the coherence of the solution by iteratively discovering the pertinent entities mentioned in the query. We propose three versions of SMAPH that outperform state-of-the-art solutions on the known benchmarks and on GERDAQ, a novel dataset that we built specifically for this problem via crowd-sourcing and that we make publicly available.
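    The greedy filtering step can be pictured with the following control-flow sketch; the `coherence` function and the candidate set are placeholders for whatever scoring SMAPH actually uses, so this illustrates only the iteration scheme, not the system's exact objective:

```python
# Sketch of greedy coherence maximization: starting from the high-coverage
# candidate set, repeatedly add the entity that most improves the coherence
# of the current solution, and stop when no candidate helps.
def greedy_disambiguate(candidates, coherence):
    # coherence(entity_list) is caller-supplied and should return a
    # baseline score (e.g. 0.0) for the empty solution.
    solution = []
    remaining = set(candidates)
    while remaining:
        # Candidate whose addition maximizes the coherence of the solution.
        best = max(remaining, key=lambda e: coherence(solution + [e]))
        if coherence(solution + [best]) <= coherence(solution):
            break  # no remaining candidate improves the solution
        solution.append(best)
        remaining.discard(best)
    return solution
```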

    Joint models for information and knowledge extraction

    Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved, such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, and discourse parsing. Solving these tasks is challenging, as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities as well as discovering relations between them, include:
    • High NERD quality. Named entity recognition and disambiguation (NERD for short) are performed first in the extraction pipeline, so their results affect all downstream tasks.
    • Coverage vs. quality of relation extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities but suffer in quality from non-canonicalized and noisy output. These limitations need to be overcome.
    • On-the-fly knowledge acquisition. Real-world applications such as question answering and the monitoring of content streams demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage.
    This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipedia-style text. The third contribution is an end-to-end system for constructing query-driven, on-the-fly knowledge bases.
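    To illustrate what joint inference between recognition and disambiguation buys over a pipeline, here is a hedged sketch in which mention spans and entity labels are scored together, so strong disambiguation evidence can rescue a borderline span; the function names and the additive scoring scheme are assumptions for illustration, not the dissertation's actual model:

```python
# Sketch of a joint decoder: score (span, entity) pairs with combined
# recognition and disambiguation evidence, then keep the best
# non-overlapping pairs. A pipeline would instead fix the spans before
# any entity evidence is consulted.
def joint_ner_nerd(spans, candidates, recog_score, disamb_score):
    # spans: list of (start, end) token offsets of candidate mentions
    # candidates: dict mapping each span to its candidate entities
    scored = [(recog_score(s) + disamb_score(s, e), s, e)
              for s in spans for e in candidates.get(s, [])]
    scored.sort(key=lambda t: t[0], reverse=True)
    chosen, used = [], set()
    for score, (start, end), entity in scored:
        if score <= 0 or any(p in used for p in range(start, end)):
            continue  # drop negatively scored or overlapping mentions
        chosen.append(((start, end), entity))
        used.update(range(start, end))
    return chosen
```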