494 research outputs found

Ontology Population via NLP Techniques in Risk Management

Author: Anne-Marie Alquier
Jawad Makki
Violaine Prince

In this paper we propose an NLP-based method for ontology population from texts and apply it to semi-automatically instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and relies on a domain expert for validation. It uses a set of Instance Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent, since it relies heavily on linguistic knowledge. We describe an experiment performed on a part of the ontology of the PRIMA project (supported by the European Community). A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves performance.

Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic Analysis
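The idea of letting a verb decide which relation to instantiate can be illustrated with a minimal sketch. This is not the authors' rule set: the verb table, the relation names, and the function `propose_instances` are hypothetical, and a plain string pattern stands in for the syntactic analysis the paper relies on. The output is a list of candidates meant for expert validation, matching the semi-automatic setting described above.

```python
import re

# Hypothetical verb-to-relation table: each trigger verb names the
# ontology relation it is assumed to predicate.
VERB_RELATIONS = {
    "causes": "hasEffect",
    "contaminates": "contaminates",
    "is used in": "usedIn",
}


def propose_instances(sentence):
    """Return candidate (subject, relation, object) triples for expert review."""
    candidates = []
    for verb, relation in VERB_RELATIONS.items():
        # Rule: <text before verb> <trigger verb> <text up to punctuation>.
        pattern = re.compile(
            r"^(?P<subj>.+?)\s+" + re.escape(verb) + r"\s+(?P<obj>[^.,;]+)",
            re.IGNORECASE,
        )
        match = pattern.search(sentence)
        if match:
            candidates.append(
                (match.group("subj").strip(), relation, match.group("obj").strip())
            )
    return candidates


if __name__ == "__main__":
    for triple in propose_instances("Benzene causes anemia."):
        print(triple)
```

Keeping the verb table separate from the matching logic mirrors the claim that the approach is not domain dependent: only the table would change from one domain to another.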

Research Papers in Economics

Ontologies and Information Extraction

Author: Nazarenko Adeline
Nédellec Claire
Publication date: 01/01/2005

This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. The report shows that the amount of knowledge involved depends on the nature and depth of the interpretation needed to extract the information. It is mainly illustrated with examples from biology, a domain in which there is a critical need for content-based exploration of the scientific literature and which is becoming a major application domain for IE.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

HOLMES: A Hybrid Ontology-Learning Materials Engineering System

Author: Remolona Miguel Francisco Miravite
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Designing and discovering novel materials is a challenging problem in many domains such as fuel additives, composites, pharmaceuticals, and so on. At the core of all this are models that capture how the different domain-specific data, information, and knowledge regarding the structures and properties of the materials are related to one another. This dissertation explores the difficult task of developing an artificial intelligence-based knowledge modeling environment, called the Hybrid Ontology-Learning Materials Engineering System (HOLMES), that can assist humans in populating a materials science and engineering ontology through automatic information extraction from journal article abstracts. While what we propose may be adapted for a generic materials engineering application, our focus in this thesis is on the needs of the pharmaceutical industry. We develop the Columbia Ontology for Pharmaceutical Engineering (COPE), which is a modification of the Purdue Ontology for Pharmaceutical Engineering. COPE serves as the basis for HOLMES. The HOLMES framework starts with journal articles in the Portable Document Format (PDF) and ends with the assignment of the entries in the journal articles to ontologies. While this might seem to be a simple information extraction task, extracting the information so that the ontology is filled as completely and correctly as possible is not easy for a fully developed ontology. In developing the information extraction tasks, we note new problems that have not arisen in previous information extraction work in the literature. The first is the need to extract auxiliary information in the form of concepts such as actions, ideas, problem specifications, properties, etc. The second is the existence of multiple labels for a single token, which follows from the existence of these concepts. These two problems are the focus of this dissertation.
In this work, the HOLMES framework is presented as a whole, describing our successful progress as well as unsolved problems, which might help future research on this topic. The ontology is then presented to help identify the relevant information that needs to be retrieved. Next, the annotations are developed to create the data sets needed to train the machine learning algorithms. Then, the current level of information extraction for these concepts is explored and expanded. This is done by introducing entity feature sets that are based on entities previously extracted in the entity recognition task. Finally, the new task of handling multiple labels when tagging a single entity is explored using multiple-label algorithms that are used primarily in image processing.
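To make the multiple-label problem concrete, here is a minimal sketch of one common way to represent a token that carries several labels at once: a fixed-length indicator vector with one slot per label. The label inventory and the example tokens are hypothetical, and the abstract does not state which multi-label algorithms the dissertation actually uses.

```python
# Hypothetical label inventory; the real annotation scheme is not given here.
LABELS = ["material", "property", "action"]


def to_indicator(label_set):
    """Encode a set of labels as a 0/1 vector ordered like LABELS."""
    return [1 if label in label_set else 0 for label in LABELS]


def from_indicator(vector):
    """Decode a 0/1 vector back into the set of labels it marks."""
    return {label for label, flag in zip(LABELS, vector) if flag}


if __name__ == "__main__":
    # Illustrative tokens only: one carries two labels, the other carries one.
    tagged = [("dissolution", {"property", "action"}), ("ibuprofen", {"material"})]
    for token, labels in tagged:
        print(token, to_indicator(labels))
```

With this encoding a tagger predicts each slot independently, so a single token can receive any combination of labels rather than exactly one.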

Columbia University Academic Commons

Textpresso for Neuroscience: Searching the Full Text of Thousands of Neuroscience Research Papers

Author: Mueller Hans-Michael
Rangarajan Arun
Sternberg Paul W.
Teal Tracy K.
Publication venue: Humana Press Inc.
Publication date: 01/01/2008

Textpresso is a text-mining system for scientific literature. Its two major features are access to the full text of research papers and the development and use of categories of biological concepts as well as categories that describe or relate objects. A search engine enables the user to search for one or a combination of these categories and/or keywords within an entire literature. Here we describe Textpresso for Neuroscience, part of the core Neuroscience Information Framework (NIF). The Textpresso site currently consists of 67,500 full text papers and 131,300 abstracts. We show that using categories in literature can make a pure keyword query more refined and meaningful. We also show how semantic queries can be formulated with categories only. We explain the build and content of the database and describe the main features of the web pages and the advanced search options. We also give detailed illustrations of the web service developed to provide programmatic access to Textpresso. This web service is used by the NIF interface to access Textpresso. The standalone website of Textpresso for Neuroscience can be accessed at http://www.textpresso.org/neuroscience
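The way categories can refine a keyword query may be sketched as follows. This is an illustration only, not Textpresso's implementation or its web service: the category lexicon, the function names, and the sample sentences are all hypothetical, and matching is done on single lowercase tokens.

```python
# Hypothetical category lexicon: category name -> terms assigned to it.
CATEGORIES = {
    "brain_region": {"hippocampus", "cortex", "amygdala"},
    "regulation": {"regulates", "inhibits", "activates"},
}


def tokenize(sentence):
    """Lowercase the sentence and strip simple punctuation from each word."""
    return [word.strip(".,;:()").lower() for word in sentence.split()]


def matches(sentence, categories=(), keywords=()):
    """True if the sentence has a term from every category and every keyword."""
    tokens = set(tokenize(sentence))
    has_categories = all(tokens & CATEGORIES[name] for name in categories)
    has_keywords = all(keyword.lower() in tokens for keyword in keywords)
    return has_categories and has_keywords


def search(sentences, categories=(), keywords=()):
    """Return the sentences that satisfy the combined query."""
    return [s for s in sentences if matches(s, categories, keywords)]


if __name__ == "__main__":
    corpus = [
        "Dopamine activates neurons in the hippocampus.",
        "Dopamine levels were measured.",
        "The cortex was imaged.",
    ]
    print(search(corpus, categories=["brain_region", "regulation"], keywords=["dopamine"]))
```

A keyword-only query for "dopamine" returns the first two sentences of the sample corpus, while adding the two categories narrows the result to the first one. Passing categories without keywords gives a category-only query.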

Springer - Publisher Connector

Caltech Authors

Building Quranic stories ontology using MappingMaster domain-specific language

Author: Abdullah Abdulhussein Mohsin
Alsalhee Rusul Yousif
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/02/2022

Because the Holy Quran is full of inspiring stories and lessons that need to be understood, it requires additional attention when it comes to search and information retrieval. Much work has been carried out on the Holy Quran, but some of it dealt with only a part of the Quran or covered it in general, and some did not support semantic search techniques or make Quranic knowledge understandable to both people and computers. Other works adopted data analysis, processing, and ontology techniques, which directed them toward linguistic aspects more than semantic ones. Another weakness of previous works is that they entered the ontology manually, which is costly and time-consuming. In this paper, we construct an ontology of Quranic stories. Its construction relies on the MappingMaster domain-specific language (MappingMaster DSL), through which concepts and individuals can be created and linked to the ontology automatically from Excel sheets. The conceptual structure was built using the object role modeling (ORM) language. The SPARQL query language was used to test and evaluate the proposed ontology by asking many competency questions; the ontology answered all of these questions well.

ZENODO

Institute of Advanced Engineering and Science

Information extraction from medication leaflets

Author: Aguiar Bruno Lage
Publication date: 01/01/2012

Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201

Repositório Aberto da Universidade do Porto

Social and Semantic Web Technologies for the Text-to-Knowledge Translation Process in Biomedicine

Author: Alberto Labarga
Armando Blanco
Carlos Cano
Leonid Peshkin
Publication venue: 'IntechOpen'
Publication date: 08/01/2011

IntechOpen

Social and Semantic Web Technologies for the Text-To-Knowledge Translation Process in Biomedicine

Author: Blanco Morón Armando
Cano Gutiérrez Carlos
Labarga Alberto
Publication venue: 'IntechOpen'
Publication date: 01/01/2011

Currently, biomedical research critically depends on knowledge availability for flexible re-analysis and integrative post-processing. The voluminous biological data already stored in databases, together with the abundant molecular data resulting from the rapid adoption of high-throughput techniques, have shown the potential to generate new biomedical discoveries through integration with knowledge from the scientific literature. Reliable information extraction applications have been a long-sought goal of the biomedical text mining community. Both named entity recognition and conceptual analysis are needed in order to map the objects and concepts represented by natural language texts into a rigorous encoding, with direct links to online resources that explicitly expose those concepts' semantics (see Figure 1).

P08-TIC-4299 of J. ASevilla and TIN2009-13489 of DGICT, Madri

Repositorio Institucional Universidad de Granada

SCRE: special cargo relation extraction using representation learning

Author: Akcay Alp
de Jong Eelco
Reshadat Vahideh
Zervanou Kalliopi
Zhang Yingqian
Publication date: 01/09/2023

The airfreight industry of shipping goods with special handling needs, also known as special cargo, often deals with non-transparent data and outdated technology, resulting in significant inefficiency. A special cargo ontology is a means of extracting, structuring, and storing domain knowledge and representing the concepts and relationships in a form that can be processed by computers. This ontology can be used as the basis of semantic data retrieval in many artificial intelligence applications, such as planning for special cargo shipments. Domain information extraction is an essential task in implementing and maintaining a special cargo ontology. However, the absence of domain information makes instantiating the cargo ontology challenging. We propose a relation representation learning approach based on a hierarchical attention-based multi-task model and leverage it in the special cargo domain. The proposed relation representation learning architecture is applied to identify and categorize samples of various relation types in the special cargo ontology. The model is trained with domain-specific documents on a number of semantic tasks that vary from lightweight tasks in the bottom layers to heavyweight tasks in the top layers of the model in a hierarchical setting. Therefore, it conveys complementary input features and learns a rich representation. We also train a domain-specific relation representation model that relies only on an entity-linked corpus of the cargo shipment domain. These two relation representation models are then employed in a supervised multi-class classifier called Special Cargo Relation Extractor (SCRE). The results of the experiments show that the proposed relation representation models can represent the complex semantic information of the special cargo domain efficiently.

Pure OAI Repository