Development of an ontology construction component for the OBCIE (ontology-based components for information extraction) approach
Information extraction systems identify and retrieve specific types of information from natural language text. A recent development in the field is the emergence of ontology-based information extraction as a sub-field, in which ontologies are used both to guide the extraction process and to represent the extracted information.
One of the challenges faced by the fields of information extraction and ontology-based information extraction is the difficulty of reusing prior work when developing new systems. A component-based approach named OBCIE (Ontology-Based Components for Information Extraction) was previously developed to address this issue. This paper presents progress in developing an ontology construction component for the OBCIE approach, which identifies classes and relationships for a given domain. It is centered on discovering the information contained within the loose structure of Wikipedia pages.
Lexical Enrichment and Sense Disambiguation of Ontology Concepts
This paper presents a model to measure semantic similarity between custom ontology concepts and the WordNet taxonomy, and introduces a new ontology concept similarity measure. The similarity measure is based on a weighted overlap of the semantic cotopy of a concept in two taxonomies. The model can be applied to automatically enhance the vocabulary of terms in ontologies, embedding equivalence classes of terms and other linguistic information directly in the ontology. The model is applied to the products and services domain, where a Product Ontology is lexically enhanced and the effectiveness of the model is evaluated.
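The cotopy-overlap idea from the abstract can be illustrated in a few lines. The sketch below is a minimal, unweighted variant: the semantic cotopy of a concept (the concept plus all of its ancestors and descendants) is computed in each taxonomy and compared by Jaccard overlap. The taxonomies and the plain overlap are illustrative assumptions; the paper's actual weighting scheme is not reproduced here.

```python
# Minimal sketch of cotopy-based concept similarity across two taxonomies.
# Data and the unweighted Jaccard overlap are hypothetical; the paper
# uses a weighted overlap measure.

def semantic_cotopy(concept, parents, children):
    """Return the concept together with all its ancestors and descendants."""
    cotopy = {concept}
    # Walk upward through the child -> parent map to collect ancestors.
    node = concept
    while node in parents:
        node = parents[node]
        cotopy.add(node)
    # Walk downward through the parent -> children map to collect descendants.
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), []):
            cotopy.add(child)
            stack.append(child)
    return cotopy

def cotopy_similarity(concept, parents_a, children_a, parents_b, children_b):
    """Jaccard overlap of the concept's cotopies in two taxonomies."""
    sc_a = semantic_cotopy(concept, parents_a, children_a)
    sc_b = semantic_cotopy(concept, parents_b, children_b)
    return len(sc_a & sc_b) / len(sc_a | sc_b)

# Toy taxonomies: a custom ontology vs. a WordNet-like hierarchy.
parents_a = {"laptop": "computer", "computer": "device"}
children_a = {"device": ["computer"], "computer": ["laptop"]}
parents_b = {"laptop": "computer", "computer": "machine"}
children_b = {"machine": ["computer"], "computer": ["laptop"]}

print(cotopy_similarity("laptop", parents_a, children_a, parents_b, children_b))
```

The two cotopies here share "laptop" and "computer" but differ on the top concept, so the overlap is 0.5; a weighted variant would additionally discount matches that are far from the concept itself.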
Ontology-based information extraction and reservoir computing for topic detection from blogosphere's content: a case study about BBC Backstage
This research study aims at detecting topics and extracting themes (subtopics) from the blogosphere's content while bridging the gap between the Social Web and the Semantic Web. The goal is to detect certain types of information from collections of blog and microblog narratives that lack explicit semantics. The work introduces a novel approach that blends together two young paradigms: Ontology-Based Information Extraction (OBIE) and Reservoir Computing (RC). The novelty lies in integrating ontologies with RC, as well as in the pioneering use of RC with social media data. Experiments with retrospective data from blogs and Twitter microblogs provide valuable insights into the BBC Backstage initiative and demonstrate the viability of the approach in terms of scalability, computational complexity, and performance.
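Reservoir Computing is the less familiar of the two paradigms. Its core is a fixed, randomly connected recurrent network (a "reservoir") whose states are driven by the input; only a linear readout is trained. The pure-Python sketch below shows one leaky-integrator echo state network update with hypothetical sizes, weights, and inputs; the study's actual reservoir configuration, spectral-radius scaling, and readout training are not shown.

```python
import math
import random

# Minimal echo state network (ESN) reservoir, the standard Reservoir
# Computing model. All sizes, weight ranges, and inputs here are
# hypothetical; a real ESN rescales W to a spectral radius below 1 and
# trains a linear readout (e.g. ridge regression) on the states.
random.seed(0)
N_IN, N_RES = 3, 20   # input and reservoir dimensions
LEAK = 0.3            # leaking rate of the neurons

W_in = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_RES)]
W = [[random.uniform(-0.1, 0.1) for _ in range(N_RES)] for _ in range(N_RES)]

def step(x, u):
    """One leaky update: x <- (1 - a) * x + a * tanh(W x + W_in u)."""
    return [(1 - LEAK) * x[i]
            + LEAK * math.tanh(sum(W[i][j] * x[j] for j in range(N_RES))
                               + sum(W_in[i][k] * u[k] for k in range(N_IN)))
            for i in range(N_RES)]

# Drive the reservoir with a toy input sequence and collect its states;
# a topic detector would train a readout on these state vectors.
x = [0.0] * N_RES
states = []
for u in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    x = step(x, u)
    states.append(x)
```

Because the reservoir itself is never trained, RC sidesteps the cost of backpropagation through time, which is one reason it scales well to streams such as blog and microblog content.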
Building a WordNet for Sinhala
Sinhala is one of the official languages of Sri Lanka and is used by over 19 million people. It belongs to the Indo-Aryan branch of the Indo-European languages, and its origins date back at least 2,000 years. It has developed into its current form over a long period of time with influences from a wide variety of languages, including Tamil, Portuguese, and English. As for any other language, a WordNet is extremely important for taking Sinhala into the digital era. This paper is based on the project to develop a WordNet for Sinhala based on the English (Princeton) WordNet. It describes how we overcame the challenges of adding Sinhala-specific characteristics, deemed important by Sinhala language experts, to the WordNet while keeping the structure of the original English WordNet. It also presents the details of the crowdsourcing system we developed as part of the project, consisting of a NoSQL database in the backend and a web-based frontend. We conclude by discussing the possibility of adapting this architecture for other languages and the road ahead for the Sinhala WordNet and Sinhala NLP.
SkipCor: Skip-Mention Coreference Resolution Using Linear-Chain Conditional Random Fields
Applying Information Extraction for Abstracting and Automating CLI-Based Configuration of Network Devices in Heterogeneous Environments
A framework for automatic population of ontology-based digital libraries
Maintaining up-to-date ontology-based digital libraries faces two main issues. First, documents are often unstructured and stored in heterogeneous data formats, which makes it difficult to extract information from them and to search within them. Second, manual ontology population is time-consuming, so automatic methods are needed to support this process. In this paper, we present an ontology-based framework for populating ontologies. In particular, we propose an approach for triplet extraction from heterogeneous and unstructured documents in order to automatically populate ontology-based digital libraries. Finally, we evaluate the proposed framework on a real-world case study.
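The triplet-extraction step named in the abstract maps text to (subject, predicate, object) facts that can be asserted into an ontology. The sketch below is only a toy illustration under strong assumptions (a fixed predicate list and simple subject-verb-object sentences); the paper's framework handles heterogeneous, unstructured documents with far more sophistication.

```python
import re

# Naive subject-verb-object triplet extraction. The predicate list and
# sample text are hypothetical; real ontology population would use
# linguistic analysis rather than a single regular expression.
SVO = re.compile(r"^(\w+) (wrote|contains|cites) (.+?)\.?$")

def extract_triplets(text):
    """Return (subject, predicate, object) triples from simple sentences."""
    triples = []
    for sentence in text.split(". "):
        m = SVO.match(sentence.strip())
        if m:
            triples.append(m.groups())
    return triples

doc = "Alice wrote ThesisX. ThesisX cites PaperY."
print(extract_triplets(doc))
```

Each extracted triple would then be checked against the ontology's classes and properties before being inserted as an instance assertion.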