12 research outputs found

    Development of an ontology construction component for the OBCIE (ontology-based components for information extraction) approach

    No full text
    Information extraction systems identify and retrieve certain types of information from natural language text. A recent development in the field of information extraction is the emergence of ontology-based information extraction as a sub-field, where ontologies are used to guide the information extraction process and to present the extracted information. One of the challenges faced by the fields of ontology-based information extraction and information extraction is the difficulty of reusing prior work when developing new systems. A component-based approach for information extraction named OBCIE (Ontology-Based Components for Information Extraction) has been previously developed to address this issue. This paper presents the progress in developing an ontology construction component for the OBCIE approach, which identifies classes and relationships for a given domain. It is centered on discovering the information contained within the loose structure of Wikipedia pages

    Lexical Enrichment and Sense Disambiguation of Ontology Concepts

    No full text
    This paper presents a model to measure semantic similarity between custom ontology concepts and the taxonomy of WordNet and introduces a new ontology concept similarity measure. The similarity measure is based on a measure of weighted overlap of semantic cotopy of a concept in two taxonomies. The model can be applied to automatically enhance the vocabulary of terms in ontologies embedding equivalence classes of terms and other linguistic information directly in the ontology. This model is applied to the products and services domain where a Product Ontology is lexically enhanced and the effectiveness of the model is evaluated

    Lexical enrichment and sense disambiguation of ontology concepts

    No full text
    This paper presents a model to measure semantic similarity between custom ontology concepts and the taxonomy of WordNet and introduces a new ontology concept similarity measure. The similarity measure is based on a measure of weighted overlap of semantic cotopy of a concept in two taxonomies. The model can be applied to automatically enhance the vocabulary of terms in ontologies embedding equivalence classes of terms and other linguistic information directly in the ontology. This model is applied to the products and services domain where a Product Ontology is lexically enhanced and the effectiveness of the model is evaluated

    Ontology-based information extraction and reservoir computing for topic detection from blogosphere's content : a case study about BBC backstage

    No full text
    This research study aims at detecting topics and extracting themes (subtopics) from the blogosphere’s content while bridging the gap between the Social Web and the Semantic Web. The goal is to detect certain types of information from collections of blogs’ and microblogs’ narratives that lack explicit semantics. The study introduces a novel approach that blends together two young paradigms: Ontology-Based Information Extraction (OBIE) and Reservoir Computing (RC). The novelty of the work lies in integrating ontologies and RC as well as the pioneering use of RC with social media data. Experiments with retrospective data from blogs and Twitter microblogs provide valuable insights into the BBC Backstage initiative and prove the viability of the approach presented in terms of scalability, computational complexity, and performance

    Building a WordNet for Sinhala

    No full text
    Sinhala is one of the official languages of Sri Lanka and is used by over 19 million people. It belongs to the Indo-Aryan branch of the Indo-European languages and its origins date back at least 2000 years. It has developed into its current form over a long period of time with influences from a wide variety of languages including Tamil, Portuguese and English. As for any other language, a WordNet is extremely important for Sinhala to take it into the digital era. This paper is based on the project to develop a WordNet for Sinhala based on the English (Princeton) WordNet. It describes how we overcame the challenges in adding Sinhala-specific characteristics which were deemed important by Sinhala language experts to the WordNet while keeping the structure of the original English WordNet. It also presents the details of the crowdsourcing system we developed as a part of the project, consisting of a NoSQL database in the backend and a web-based frontend. We conclude by discussing the possibility of adapting this architecture for other languages and the road ahead for the Sinhala WordNet and Sinhala NLP

    A framework for automatic population of ontology-based digital libraries

    No full text
    Maintaining up-to-date ontology-based digital libraries faces two main issues. First, documents are often unstructured and in heterogeneous data formats, making it even more difficult to extract information from them and to search them. Second, manual ontology population is time-consuming and therefore automatic methods to support this process are needed. In this paper, we present an ontology-based framework aiming at populating ontologies. In particular, we propose an approach for triplet extraction from heterogeneous and unstructured documents in order to automatically populate ontology-based digital libraries. Finally, we evaluate the proposed framework on a real-world case study