3,562 research outputs found
Ontology Population for Open-Source Intelligence
We present an approach based on GATE (General Architecture for Text Engineering) for the automatic population of ontologies from text documents. We describe some experimental results, which are encouraging in terms of extracted correct instances of the ontology. We then focus on a phase of our pipeline and discuss a variant thereof, which aims at reducing the manual effort needed to generate pre-defined dictionaries used in document annotation. Our additional experiments show promising results also in this case
Ontology population for open-source intelligence: A GATE-based solution
Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, forums, etc. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast amount of documents available, automatic mechanisms for their population are needed, starting from the crawled text. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering system, which is interesting for both researchers and practitioners in the field. Some experimental results that are encouraging in terms of extracted correct instances of the ontology are also reported. Furthermore, the paper also describes an alternative approach and provides additional experiments for one of the phases of our pipeline, which requires the use of predefined dictionaries for relevant entities. Through such a variant, the manual workload required in this phase was reduced, still obtaining promising results
Recommended from our members
AQUA: an ontology driven question answering system
This paper describes AQUA our question answering over the Web. AQUA was designed to work over heterogeneous sources. This means that AQUA is equipped to work as closed domain and in addition to open-domain question answering. As a first instance, AQUA tries to answer a question using a Knowledge base. If a query cannot be satisfied over a knowledge base/database. Then, AQUA tries to find an answer on web pages (i.e. it uses as corpus the internet as resource). Our system uses NLP (Natural Language Processing), First order logic and Information Extraction technologies. AQUA has been tested using an ontology which describes academic life. Keywords Ontologies, Information Extraction, Machine Learnin
Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon
This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are
extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented
Analysing the problem and main approaches for ontology population
Knowledge systems are a suitable computational
approach to solve complex problems and to provide decision
support. Ontologies are an approach for knowledge
representation and Ontology Population looks for instantiating
the constituent elements of an ontology, like properties and
non-taxonomic relationships. Manual population by domain
experts and knowledge engineers is an expensive and time
consuming task. Thus, automatic or semi-automatic
approaches are needed. This paper discusses the problem of
Automatic Ontology Population and proposes a generic
process specifying its phases and what kind of techniques can
be used to perform the activities of each phase. Some
techniques representing the state of the art of this field are also
described along with the solutions they adopt for each phase of
the AOP process with their advantages and limitations. This
work is part of HERMES, a Brazil/Portugal research
cooperation project looking for techniques and tools for
automating the process of ontology learning and population.This work is supported by CNPq, CAPES and FAPEMA, research funding agencies of the Brazilian government
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
- …