    Ontology-based question answering systems over knowledge bases: a survey

    Searching for relevant, specific information in large volumes of data is quite a challenging task. Despite the numerous strategies in the literature for tackling this problem, the task is usually carried out by resorting to Question Answering (QA) systems. There are many ways to build a QA system, such as heuristic approaches, machine learning, and ontologies. Recent research has focused on ontology-based methods, since the resulting QA systems can benefit from knowledge modeling. In this paper, we present a systematic literature survey on ontology-based QA systems. We also detail the evaluation process carried out in these systems and discuss how each approach differs from the others in terms of the challenges faced and strategies employed. Finally, we present the most prominent research issues still open in the field.
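
    At their core, the ontology-based QA systems surveyed here share a pipeline that maps a natural-language question onto a structured query over the knowledge base. The sketch below illustrates only that idea; the single question template, the toy ex: ontology and the rdflib lookup are assumptions made for illustration, not the architecture of any surveyed system.

        # Minimal sketch of an ontology-based QA step: a question is matched
        # against one hand-written template and rewritten as a SPARQL query
        # over a toy knowledge base (all names here are illustrative).
        import re
        from rdflib import Graph

        TTL = """
        @prefix ex: <http://example.org/> .
        ex:TheHobbit ex:title  "The Hobbit" ;
                     ex:author ex:Tolkien .
        ex:Tolkien   ex:name   "J. R. R. Tolkien" .
        """

        TEMPLATE = re.compile(r"who wrote (.+)\?", re.IGNORECASE)

        def answer(question, graph):
            match = TEMPLATE.match(question.strip())
            if not match:
                return []
            title = match.group(1)
            # Naive rewriting: the title literal is spliced straight into the
            # query; a real system grounds phrases against ontology labels.
            sparql = f"""
                PREFIX ex: <http://example.org/>
                SELECT ?name WHERE {{
                    ?work ex:title "{title}" ; ex:author ?person .
                    ?person ex:name ?name .
                }}"""
            return [str(row[0]) for row in graph.query(sparql)]

        g = Graph().parse(data=TTL, format="turtle")
        print(answer("Who wrote The Hobbit?", g))   # ['J. R. R. Tolkien']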

    Ontology-based Information Extraction with SOBA

    In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for the automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, the knowledge base and the information extraction component. The originality of SOBA lies in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base and, in turn, uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
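
    The linking behaviour described above, in which facts extracted from tables, running text and image captions are reconciled against entities already present in the knowledge base, can be pictured with a small sketch. Everything below (the Entity and KnowledgeBase classes, the soccer example) is an illustrative assumption, not SOBA's actual code.

        # Toy knowledge base: new facts are attached to an existing entity when
        # the (normalised) name matches, so the same player mentioned in a
        # match table and in an image caption ends up as one entity.
        from dataclasses import dataclass, field

        @dataclass
        class Entity:
            name: str
            facts: dict = field(default_factory=dict)

        class KnowledgeBase:
            def __init__(self):
                self._entities = {}                  # normalised name -> Entity

            @staticmethod
            def _key(name):
                return " ".join(name.lower().split())

            def integrate(self, name, facts, source):
                """Link new facts to an existing entity if one matches, else create it."""
                entity = self._entities.setdefault(self._key(name), Entity(name))
                for attribute, value in facts.items():
                    # keep provenance so conflicting sources can be inspected later
                    entity.facts.setdefault(attribute, []).append((value, source))
                return entity

        kb = KnowledgeBase()
        kb.integrate("Michael Ballack", {"team": "Germany"}, source="match table")
        kb.integrate("michael ballack", {"position": "midfielder"}, source="image caption")
        print(kb.integrate("Michael Ballack", {}, source="text").facts)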

    PowerAqua: fishing the semantic web

    The Semantic Web (SW) offers an opportunity to develop novel, sophisticated forms of question answering (QA). Specifically, the availability of distributed semantic markup on a large scale opens the way to QA systems that can make use of such semantic information to provide precise, formally derived answers to questions. At the same time, the distributed, heterogeneous, large-scale nature of the semantic information introduces significant challenges. In this paper we describe the design of PowerAqua, a QA system that exploits semantic markup on the web to provide answers to questions posed in natural language. PowerAqua does not assume that the user has any prior information about the semantic resources. The system takes a natural language query as input and translates it into a set of logical queries, which are then answered by consulting and aggregating information derived from multiple heterogeneous semantic sources.
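
    The final step the abstract mentions, answering each logical query by consulting several semantic sources and aggregating what they return, is sketched below. The triple pattern, the two toy sources and the voting-style merge are assumptions made for illustration, not PowerAqua internals.

        # Each "semantic source" maps a (subject, relation) pair to answers; the
        # same logical query is sent to all of them and the answers are merged,
        # ranked by how many sources agree.
        from collections import Counter

        SOURCE_A = {("russia", "capital"): ["Moscow"]}
        SOURCE_B = {("russia", "capital"): ["Moscow", "Moskva"]}

        def answer_triple(subject, relation, sources):
            votes = Counter()
            for source in sources:
                for answer in source.get((subject, relation), []):
                    votes[answer] += 1
            return [answer for answer, _ in votes.most_common()]

        # "What is the capital of Russia?" would first be translated into the
        # logical query (russia, capital, ?x), then answered as follows:
        print(answer_triple("russia", "capital", [SOURCE_A, SOURCE_B]))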

    Learning to harvest information for the semantic web

    This work was carried out within the AKT project (www.aktors.org), sponsored by the UK Engineering and Physical Sciences Research Council (grant GR/N15764/01), and the Dot.Kom project (www.dot-kom.org), sponsored by EU IST as part of Framework V (grant IST-2001-34038). In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon. Retrieved information is then used to partially annotate documents. Annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn produce more annotations for more documents, which are then used to train more complex IE engines, and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future work.
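
    The bootstrapping loop described above, where seed annotations feed a simple extractor whose output in turn feeds a more capable one, can be reduced to a few lines. The trivial "extractor" below (harvesting capitalised words near known terms) and the two example sentences are stand-ins chosen for illustration; they are not part of Armadillo.

        # Crude bootstrapping sketch: annotate documents with the current lexicon,
        # then "learn" new candidate terms from the annotated documents and repeat.
        def bootstrap(seed_terms, documents, rounds=2):
            known = set(seed_terms)
            for _ in range(rounds):
                annotated = [doc for doc in documents
                             if any(term in doc for term in known)]
                for doc in annotated:
                    for token in doc.split():
                        word = token.strip(".,;()")
                        # stand-in learner: capitalised words co-occurring with
                        # known terms become new (noisy) lexicon entries
                        if word[:1].isupper() and word not in known:
                            known.add(word)
            return known

        docs = ["Armadillo was developed in Sheffield within the AKT project.",
                "The AKT project also involves the University of Southampton."]
        print(sorted(bootstrap({"AKT"}, docs)))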

    Grouping axioms for more coherent ontology descriptions

    Ontologies and datasets for the Semantic Web are encoded in OWL formalisms that are not easily comprehended by people. To make ontologies accessible to human domain experts, several research groups have developed ontology verbalisers using Natural Language Generation. In practice, ontologies are usually composed of simple axioms, so realising them separately is relatively easy; there remains, however, the problem of producing texts that are coherent and efficient. We describe in this paper some methods for producing sentences that aggregate over sets of axioms that share the same logical structure. Because these methods are based on logical structure rather than domain-specific concepts or language-specific syntax, they are generic with regard to both domain and language.
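
    The aggregation idea, grouping axioms that share a logical structure and realising each group as a single sentence, is easy to picture on subclass axioms. The axiom tuples and the English template below are assumptions for illustration, not the paper's verbaliser.

        # SubClassOf axioms sharing the same superclass are verbalised together.
        from collections import defaultdict

        axioms = [("SubClassOf", "Cat", "Animal"),
                  ("SubClassOf", "Dog", "Animal"),
                  ("SubClassOf", "Horse", "Animal"),
                  ("SubClassOf", "Animal", "LivingThing")]

        def aggregate_subclass_axioms(axioms):
            groups = defaultdict(list)
            for kind, sub, sup in axioms:
                if kind == "SubClassOf":
                    groups[sup].append(sub)            # group by shared superclass
            sentences = []
            for sup, subs in groups.items():
                if len(subs) == 1:
                    sentences.append(f"{subs[0]} is a kind of {sup}.")
                else:
                    listed = ", ".join(subs[:-1]) + " and " + subs[-1]
                    sentences.append(f"{listed} are kinds of {sup}.")
            return sentences

        print("\n".join(aggregate_subclass_axioms(axioms)))
        # Cat, Dog and Horse are kinds of Animal.
        # Animal is a kind of LivingThing.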

    fr2sql : Interrogation de bases de données en français

    Databases are increasingly common and play an increasingly important role in today's applications and Web sites. They are often used by people who have little expertise in the domain and who do not know the database structure precisely. This is why translators from natural language to SQL queries are being developed. Unfortunately, most of these translators are confined to a single database because of the specificity of that database's architecture. In this paper, we propose a method for querying any database using questions in French. We evaluate our application on two databases with different structures, and we also show that it supports more operations than most other translators.
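
    A schema-driven translator in the spirit described above can be sketched in a few lines: table and column names are read from the database itself, so nothing is hard-coded to one schema. The toy SQLite schema, the French example question and the bare substring matching are illustrative assumptions; fr2sql's actual grammar and coverage are far richer.

        # Hedged sketch of question-to-SQL translation driven by schema
        # introspection rather than a hand-coded mapping for one database.
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE etudiant (nom TEXT, age INTEGER, ville TEXT);
            INSERT INTO etudiant VALUES ('Alice', 22, 'Paris'), ('Bob', 25, 'Lyon');
        """)

        def read_schema(conn):
            tables = {}
            for (table,) in conn.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'table'"):
                columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
                tables[table] = columns
            return tables

        def question_to_sql(question, conn):
            q = question.lower()
            for table, columns in read_schema(conn).items():
                if table in q:                             # is the table mentioned?
                    wanted = [c for c in columns if c in q] or ["*"]
                    return f"SELECT {', '.join(wanted)} FROM {table}"
            return None

        sql = question_to_sql("Quel est le nom de chaque etudiant ?", conn)
        print(sql)                                         # SELECT nom FROM etudiant
        print(conn.execute(sql).fetchall())                # [('Alice',), ('Bob',)]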

    Computer Supported Indexing: A History and Evaluation of NASA's MAI System

    Computer supported indexing systems may be categorized in several ways. One classification scheme refers to them as statistical, syntactic, semantic or knowledge-based. While a system may emphasize one of these aspects, most systems actually combine two or more of these mechanisms to maximize system efficiency. Statistical systems can be based on counts of words or word stems, statistical association, and correlation techniques that assign weights to word locations or provide lexical disambiguation, calculations regarding the likelihood of word co-occurrences, clustering of word stems and transformations, or any other computational method used to identify pertinent terms. If words are counted, the ones of median frequency become candidate index terms. Syntactical systems stress grammar and identify parts of speech. Concepts found in designated grammatical combinations, such as noun phrases, generate the suggested terms. Semantic systems are concerned with the context sensitivity of words in text. The primary goal of this type of indexing is to identify, without regard to syntax, the subject matter and the context-bearing words in the text being indexed. Knowledge-based systems provide a conceptual network that goes past thesaurus or equivalent relationships to knowing (e.g., in the National Library of Medicine (NLM) system) that because the tibia is part of the leg, a document relating to injuries to the tibia should be indexed to LEG INJURIES, not the broader MeSH term INJURIES, or knowing that the term FEMALE should automatically be added when the term PREGNANCY is assigned, and also that the indexer should be prompted to add either HUMAN or ANIMAL. Another way of categorizing indexing systems is to identify them as producing either assigned- or derived-term indexes.
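
    The simplest statistical mechanism mentioned above, counting words and proposing the ones of roughly median frequency as candidate index terms, fits in a few lines. The stopword list, the crude stemming and the example text are illustrative assumptions, not part of any of the systems discussed.

        # Words of median frequency tend to be content-bearing: very frequent
        # ones are function words, very rare ones are noise.
        import re
        import statistics
        from collections import Counter

        STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in"}

        def candidate_index_terms(text, window=0):
            words = [w for w in re.findall(r"[a-z]+", text.lower())
                     if w not in STOPWORDS]
            stems = [w.rstrip("s") for w in words]      # crude stand-in for stemming
            counts = Counter(stems)
            median = statistics.median(counts.values())
            return sorted(stem for stem, n in counts.items()
                          if abs(n - median) <= window)

        text = ("The tibia is a bone of the leg. Injuries to the tibia are leg "
                "injuries. The leg supports the body, and the body protects bones.")
        print(candidate_index_terms(text))   # ['body', 'bone', 'injurie', 'tibia']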

    Computer Supported Indexing: A History and Evaluation of NASA's MAI System

    Computer supported or machine aided indexing (MAI) can be categorized in multiple ways. The system used by the National Aeronautics and Space Administration's (NASA's) Center for AeroSpace Information (CASI) is described as semantic and computational. It is based on the co-occurrence of domain-specific terminology in parts of a sentence, and on the probability that an indexer will assign a particular index term when a given word or phrase is encountered in text. The NASA CASI system is run on demand by the indexer and responds in 3 to 9 seconds with a list of suggested, authorized terms. The system was originally based on a syntactic system used in the late 1970s by the Defense Technical Information Center (DTIC). The NASA mainframe-supported system consists of three components: two programs and a knowledge base (KB). The evolution of the system is described and flow charts illustrate the MAI procedures. Tests used to evaluate NASA's MAI system were limited to those that would not slow production. A very early test indicated that MAI saved about 3 minutes and provided several additional terms for each document indexed. It was also determined that time and other resources spent in careful construction of the KB pay off with high-quality output and indexer acceptance of MAI results.
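
    The lookup at the heart of the system described above, a knowledge base relating text phrases to authorized index terms weighted by the likelihood that an indexer would assign them, can be sketched as follows. The phrases, terms and probabilities are invented for illustration and are not NASA CASI data.

        # Scan the text for known phrases and return the authorized terms whose
        # assignment probability clears a threshold, strongest first.
        KNOWLEDGE_BASE = {
            "wind tunnel":     ("WIND TUNNEL TESTS", 0.85),
            "boundary layer":  ("BOUNDARY LAYERS",   0.90),
            "reynolds number": ("REYNOLDS NUMBER",   0.95),
        }

        def suggest_terms(text, threshold=0.80):
            text = text.lower()
            suggestions = [(term, probability)
                           for phrase, (term, probability) in KNOWLEDGE_BASE.items()
                           if phrase in text and probability >= threshold]
            return sorted(suggestions, key=lambda pair: pair[1], reverse=True)

        abstract = ("Boundary layer transition was measured in the wind tunnel "
                    "at a high Reynolds number.")
        print(suggest_terms(abstract))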

    Composite Semantic Relation Classification

    Get PDF
    Different semantic interpretation tasks, such as text entailment and question answering, require the classification of semantic relations between terms or entities within text. However, in most cases it is not possible to assign a direct semantic relation between the entities or terms. This paper proposes an approach for composite semantic relation classification, extending the traditional semantic relation classification task. Unlike existing approaches, which use machine learning models built over lexical and distributional word vector features, the proposed model combines a large commonsense knowledge base of binary relations, a distributional navigational algorithm and sequence classification to provide a solution to the composite semantic relation classification problem.
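
    The notion of a composite relation, a chain of binary relations through the commonsense knowledge base standing in for a missing direct relation, can be illustrated with a small graph search. The tiny knowledge base below and the plain breadth-first search (in place of the paper's distributional navigation algorithm) are simplifying assumptions for illustration.

        # Find the shortest chain of binary relations linking two terms; the
        # returned relation sequence is what a downstream classifier would label.
        from collections import deque

        KB = {  # head -> list of (relation, tail)
            "car":     [("HasA", "engine"), ("IsA", "vehicle")],
            "engine":  [("UsedFor", "locomotion")],
            "vehicle": [("UsedFor", "transport")],
        }

        def composite_relation(source, target, kb, max_len=3):
            queue = deque([(source, [])])
            seen = {source}
            while queue:
                node, path = queue.popleft()
                if node == target:
                    return path
                if len(path) == max_len:
                    continue
                for relation, neighbour in kb.get(node, []):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append((neighbour, path + [relation]))
            return None

        print(composite_relation("car", "locomotion", KB))   # ['HasA', 'UsedFor']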