762 research outputs found

    PowerAqua: fishing the semantic web

    The Semantic Web (SW) offers an opportunity to develop novel, sophisticated forms of question answering (QA). Specifically, the availability of distributed semantic markup on a large scale opens the way to QA systems that can use such semantic information to provide precise, formally derived answers to questions. At the same time, the distributed, heterogeneous, large-scale nature of the semantic information introduces significant challenges. In this paper we describe the design of PowerAqua, a QA system that exploits semantic markup on the web to answer questions posed in natural language. PowerAqua does not assume that the user has any prior information about the semantic resources. The system takes a natural language query as input and translates it into a set of logical queries, which are then answered by consulting and aggregating information derived from multiple heterogeneous semantic sources.
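    A minimal sketch of the pipeline the abstract describes (natural language question -> logical triple queries -> answers aggregated across heterogeneous sources). All names here (Triple, SemanticSource, parse_to_triples) are hypothetical illustrations, not the actual PowerAqua API:

        from dataclasses import dataclass
        from typing import Optional

        @dataclass(frozen=True)
        class Triple:
            subject: Optional[str]
            relation: Optional[str]
            obj: Optional[str]

        class SemanticSource:
            """One heterogeneous semantic repository (e.g. an ontology store)."""
            def __init__(self, name, facts):
                self.name = name
                self.facts = set(facts)

            def query(self, pattern):
                """Return facts matching `pattern`; None fields act as wildcards."""
                return {
                    t for t in self.facts
                    if pattern.subject in (None, t.subject)
                    and pattern.relation in (None, t.relation)
                    and pattern.obj in (None, t.obj)
                }

        def parse_to_triples(question):
            """Stand-in for the NL -> logical-query translation step.
            Returns a canned pattern for illustration only."""
            # e.g. "Which lakes are located in Africa?" -> (?, located_in, Africa)
            return [Triple(None, "located_in", "Africa")]

        def answer(question, sources):
            answers = set()
            for pattern in parse_to_triples(question):
                for source in sources:                 # consult each source...
                    for t in source.query(pattern):
                        answers.add(t.subject)         # ...and aggregate results
            return answers

        geo = SemanticSource("geo", {Triple("Lake_Victoria", "located_in", "Africa")})
        print(answer("Which lakes are located in Africa?", [geo]))  # {'Lake_Victoria'}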

    Taxonomy Induction using Hypernym Subsequences

    We propose a novel, semi-supervised approach to domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach uses a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from the extracted subsequences is cast as an instance of the minimum-cost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms state-of-the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, no previous approach has been empirically shown to be robust to noise in the input vocabulary.
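    To make the minimum-cost flow formulation concrete, here is a toy sketch using networkx. The graph construction is a simplification (the paper operates on hypernym subsequences, not single edges), and the candidate edges and costs below are invented example values:

        import networkx as nx

        seed_terms = ["beagle", "poodle", "dog", "cat"]
        root = "animal"

        # Candidate hypernym edges (hyponym -> hypernym); cost is roughly
        # proportional to 1 - P(hypernymy), so lower cost = more plausible.
        candidates = [
            ("beagle", "dog", 1), ("poodle", "dog", 1),
            ("dog", "animal", 1), ("cat", "animal", 1),
            ("beagle", "animal", 5),  # plausible but costlier direct edge
        ]

        G = nx.DiGraph()
        for hypo, hyper, cost in candidates:
            G.add_edge(hypo, hyper, weight=cost, capacity=len(seed_terms))

        # Each seed term injects one unit of flow; the root absorbs all of it.
        for t in seed_terms:
            G.nodes[t]["demand"] = -1
        G.nodes[root]["demand"] = len(seed_terms)

        flow = nx.min_cost_flow(G)  # dict: u -> {v: flow on edge (u, v)}

        # Edges carrying flow form the induced taxonomy: here beagle->dog,
        # poodle->dog, dog->animal, cat->animal (the costly shortcut is unused).
        taxonomy = [(u, v) for u, f in flow.items() for v, x in f.items() if x > 0]
        print(taxonomy)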

    Smoothing Entailment Graphs with Language Models

    The diversity and Zipfian frequency distribution of natural language predicates in corpora leads to sparsity in Entailment Graphs (EGs) built by Open Relation Extraction (ORE). EGs are computationally efficient and explainable models of natural language inference, but as symbolic models, they fail if a novel premise or hypothesis vertex is missing at test time. We present theory and methodology for overcoming such sparsity in symbolic models. First, we introduce a theory of optimal smoothing of EGs by constructing transitive chains. We then demonstrate an efficient, open-domain, and unsupervised smoothing method that uses an off-the-shelf Language Model to find approximations of missing premise predicates. This improves recall by 25.1 and 16.3 percentage points on two difficult directional entailment datasets, while raising average precision and maintaining model explainability. Further, in a QA task we show that EG smoothing is most useful for answering questions with less supporting text, where missing premise predicates are more costly. Finally, controlled experiments with WordNet confirm our theory and show that hypothesis smoothing is difficult, but possible in principle.
    Comment: Published at AACL 202
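    A simplified rendering of the smoothing step: when a premise predicate is missing from the symbolic graph at test time, back off to its nearest in-graph predicate under a language-model embedding. The trigram-based `embed` below is a toy stand-in for a real off-the-shelf LM encoder, and the nearest-neighbor backoff is our reading of the idea, not the paper's exact procedure:

        import numpy as np

        def embed(predicate):
            """Toy stand-in for an LM encoder: hashed character-trigram counts.
            Swap in a real sentence/predicate embedding model in practice."""
            vec = np.zeros(64)
            s = f"  {predicate}  "
            for i in range(len(s) - 2):
                vec[hash(s[i:i + 3]) % 64] += 1.0
            return vec

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

        def smoothed_entails(premise, hypothesis, eg_edges, eg_vertices):
            """eg_edges: known (premise_pred, hypothesis_pred) entailment pairs;
            eg_vertices: predicates the symbolic EG has seen."""
            if premise not in eg_vertices:
                # Smooth: substitute the closest known predicate for the missing
                # premise. (Per the abstract, smoothing the hypothesis side is
                # harder, so only the premise is smoothed here.)
                p_vec = embed(premise)
                premise = max(eg_vertices, key=lambda v: cosine(p_vec, embed(v)))
            return (premise, hypothesis) in eg_edges

    With a real encoder, a novel premise such as "buy up" would land near an in-graph predicate like "purchase", recovering stored entailments such as ("purchase", "own"); the toy embedding above only illustrates the plumbing.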