
1,293 research outputs found

Ontologies and Information Extraction

Author: Nazarenko Adeline
Nédellec Claire
Publication venue
Publication date: 01/01/2005
Field of study

This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. This report shows that, depending on the nature and depth of the interpretation required to extract the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which is becoming a major application domain for IE.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Information Retrieval Systems Adapted to the Biomedical Domain

Author: Ai Kawazoe
Alexa McCray
Alexander Morgan
Christian Jacquemin
Cohen Aaron
Cornelius Rosse
GuoDong Zhou
Hamish Cunningham
Holger Stenzhorn
Hongfang Liu
Irena Spasic
Larisa Soldatova
Matthias Samwald
Poibeau Thierry
Ricardo Baeza-Yates
Robert Gaizauskas
Shipra Dingare
Sophia Ananiadou
Steffen Schulze-Kremer
Ulf Leser
Yoshimasa Tsuruoka
Publication venue: 'Ediciones Profesionales de la Informacion SL'
Publication date: 01/05/2010
Field of study

The terminology used in Biomedicine shows lexical peculiarities that have required the elaboration of terminological resources and information retrieval systems with specific functionalities. The main characteristics are the high rates of synonymy and homonymy, due to phenomena such as the proliferation of polysemic acronyms and their interaction with common language. Information retrieval systems in the biomedical domain use techniques oriented to the treatment of these lexical peculiarities. In this paper we review some of the techniques used in this domain, such as the application of Natural Language Processing (BioNLP), the incorporation of lexical-semantic resources, and the application of Named Entity Recognition (BioNER). Finally, we present the evaluation methods adopted to assess the suitability of these techniques for retrieving biomedical resources. Comment: 6 pages, 4 tables

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Temaria - Revistas digitales de biblioteconomía y documentación

Universidad Carlos III de Madrid e-Archivo

Reuse of terminological resources for efficient ontological engineering in Life Sciences

Author: A Anjum
A Jimeno-Yepes
A Kalyanpur
A Miles
A Tsymbal
Antonio Jimeno-Yepes
B Cuenca Grau
C Caracciolo
C Rosse
CM Duffy
Dietrich Rebholz-Schuhmann
E Beisswanger
E Jimenez-Ruiz
EM Ogilvie
Ernesto Jiménez-Ruiz
G Hirst
HS Pinto
I Horrocks
I Spasić
J Freund
K Frantzi
M Fernandez
O Bodenreider
P Bouquet
P Lambrix
P Shvaiko
R Berlanga
R Berlanga-Llavori
Rafael Berlanga-Llavori
S Schlobach
S Zillner
T Hauer
TR Gruber
V Nebot
V Viswanath
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

This paper explores how to use terminological resources for ontology engineering. Nowadays there are several biomedical ontologies describing overlapping domains, but there is no clear correspondence between the concepts that are supposed to be equivalent or just similar. These resources are quite precious, but their integration and further development are expensive. Terminologies may support ontological development in several stages of the ontology lifecycle, e.g. ontology integration. In this paper we investigate the use of terminological resources during the ontology lifecycle. We claim that the proper creation and use of a shared thesaurus is a cornerstone for the successful application of Semantic Web technology within the life sciences. Moreover, we have applied our approach to a real scenario, the Health-e-Child (HeC) project, and we have evaluated the impact of filtering and re-organizing several resources. As a result, we have created a reference thesaurus for this project, named HeCTh.

Crossref

City Research Online

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

PubMed Central

Repositori Institucional de la Universitat Jaume I

Oxford University Research Archive

University of Melbourne Institutional Repository

Biomedical ontology alignment: An approach based on representation learning

Author: Kalousis Alexandros
Kiritsis Dimitris
Kolyvakis Prodromos
Smith Barry
Publication venue
Publication date: 01/01/2018
Field of study

While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results.

PhilPapers

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Directory of Open Access Journals
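
The F-scores quoted in the ontology alignment abstract above are computed against a reference alignment. The following is a minimal sketch of that kind of measurement, assuming an alignment is represented as a set of (source term, target term) pairs; the function name and the toy data are hypothetical and are not taken from the paper or from the OAEI tooling.

```python
def alignment_f_score(predicted, reference):
    """Balanced F-score of a predicted alignment against a reference alignment.

    Both arguments are iterables of (source_term, target_term) pairs.
    This representation is an assumption made for illustration only.
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example with invented term pairs: two of three predictions are correct.
toy_predicted = {("heart", "Heart"), ("lung", "Lung"), ("tail", "Coccyx")}
toy_reference = {("heart", "Heart"), ("lung", "Lung"), ("liver", "Liver")}
toy_score = alignment_f_score(toy_predicted, toy_reference)
```

With the toy data, precision and recall are both 2/3, so the score is also 2/3.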

Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

Author: Liu Kaihong
Publication venue
Publication date: 03/01/2012
Field of study

While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontology from free-text documents. In this dissertation, I extended these methodologies into biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This previously unconducted review was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of the two types of OL approaches that include three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method revealed an average of 21% and 11% new concept acceptance rates, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in the Journal of Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough that it can analyze the performance of ontology enrichment methods for many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships would be missed as domain knowledge evolves.

D-Scholarship@Pitt
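
The symbolic approach named in the dissertation abstract above (the Hearst method) relies on lexico-syntactic patterns such as "X such as A, B and C" to propose candidate is-a relations. The sketch below illustrates a single pattern of this kind. It assumes a one-word hypernym and a list that runs to the end of the clause, and it is not the implementation evaluated in the dissertation; the function name and the example sentence are hypothetical.

```python
import re


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) candidates from one "such as" pattern.

    Simplifying assumptions for illustration: the hypernym is the single
    word before "such as", and the hyponym list extends to the next
    period or semicolon.
    """
    match = re.search(r"(\w+) such as ([^.;]+)", sentence)
    if not match:
        return []
    hypernym = match.group(1)
    # Split the list on commas and on the coordinators "and" / "or".
    parts = re.split(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+", match.group(2))
    return [(part.strip(), hypernym) for part in parts if part.strip()]


# Invented example sentence, not taken from the clinical corpus.
candidates = hearst_such_as(
    "The scan ruled out tumors such as glioma, meningioma and lymphoma."
)
```

A real system would normally match noun phrases from a parser or chunker rather than raw words, and would pass the candidates to a domain expert for acceptance, as the acceptance rates in the abstract suggest.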

$OntoMath^{PRO}$ Ontology: A Linked Data Hub for Mathematics

Author: C. Bizer
C. David
C. Lange
E. Sirin
E.V. Biryaltsev
F. Kamareddine
H. Barendregt
H.S. Barrows
M. Doerr
M. Kohlhase
N. Sloane
O. Nevzorova
O.A. Nevzorova
Publication venue
Publication date: 01/01/2014
Field of study

In this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE. Comment: 15 pages, 6 images, 1 table, Knowledge Engineering and the Semantic Web - 5th International Conference

arXiv.org e-Print Archive

Crossref

A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain

Author: Tudhope Douglas
Vlachidis Andreas
Publication venue: 'Wiley'
Publication date: 01/01/2015
Field of study

The paper presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. Performance is evaluated via the Gold Standard method. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. Relation Extraction performance benefits from a syntax-based definition of relation extraction patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive NLP modules relating to word-sense disambiguation, negation detection and noun phrase validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven relation extraction rules for the recognition of semantic relationships from phrases of unstructured text. The semantic annotations have proven capable of supporting semantic query, document study and cross-searching via the ontology framework.

Crossref

University of South Wales Research Explorer

UWE Bristol Research Repository

UCL Discovery

A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

Author: Aubin Sophie
Derivière Julien
Hamon Thierry
Nazarenko Adeline
Poibeau Thierry
Publication venue
Publication date: 30/05/2007
Field of study

Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to linguistic specific statistical distribution, or as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and development of a text processing platform, Ogmios, which has been developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains, and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled together.

arXiv.org e-Print Archive

CiteSeerX

HAL-Paris 13

Semantic Indexing via Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological Grey Literature

Author: Vlachidis Andreas
Publication venue: University of Glamorgan
Publication date: 01/07/2012
Field of study

The volume of archaeological reports being produced since the introduction of PPG16 has significantly increased, as a result of the increased volume of archaeological investigations conducted by academic and commercial archaeology. It is highly desirable to be able to search effectively within and across such reports in order to find information that promotes quality research. A potential dissemination of information via semantic technologies offers the opportunity to improve archaeological practice, not only by enabling access to information but also by changing how information is structured and the way research is conducted. This thesis presents a method for automatic semantic indexing of archaeological grey-literature reports using rule-based Information Extraction techniques in combination with domain-specific ontological and terminological resources. This semantic annotation of contextual abstractions from archaeological grey literature is driven by Natural Language Processing (NLP) techniques which are used to identify “rich” meaningful pieces of text, thus overcoming barriers in document indexing and retrieval imposed by the use of natural language. The semantic annotation system (OPTIMA) performs the NLP tasks of Named Entity Recognition, Relation Extraction, Negation Detection and Word Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the ISO Standard (ISO 21127:2006) CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH, together with concepts from English Heritage thesauri and glossaries. The results demonstrate that the techniques can deliver semantic annotations of archaeological grey-literature documents with respect to the domain conceptual models. Such semantic annotations have proven capable of supporting semantic query, document study and cross-searching via web-based applications.
The research outcomes have provided semantic annotations for the Semantic Technologies for Archaeological Resources (STAR) project, which explored the potential of semantic technologies in the integration of archaeological digital resources. The thesis represents the first discussion on the employment of CIDOC CRM and CRM-EH in semantic annotation of grey-literature documents using rule-based Information Extraction techniques driven by a supplementary exploitation of domain-specific ontological and terminological resources. It is anticipated that the methods can be generalised in the future to the broader field of Digital Humanities.

University of South Wales Research Explorer

UCL Discovery

Alignment of the UMLS semantic network with BioTop: Methodology and assessment

Author: Ashburner
E. Beisswanger
E. M. van Mulligen
Heja
L. van den Hoek
McCray
Mons
Mulder
O. Bodenreider
S. Schulz
Sagotsky
Schulz
Smith
Smith
Stevens
Stix
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2009
Field of study

Motivation: For many years, the Unified Medical Language System (UMLS) semantic network (SN) has been used as an upper-level semantic framework for the categorization of terms from terminological resources in biomedicine. BioTop has recently been developed as an upper-level ontology for the biomedical domain. In contrast to the SN, it is founded upon strict ontological principles, using OWL DL as a formal representation language, which has become standard in the semantic Web. In order to make logic-based reasoning available for the resources annotated or categorized with the SN, a mapping ontology was developed aligning the SN with BioTop. Methods: The theoretical foundations and the practical realization of the alignment are being described, with a focus on the design decisions taken, the problems encountered and the adaptations of BioTop that became necessary. For evaluation purposes, UMLS concept pairs obtained from MEDLINE abstracts by a named entity recognition system were tested for possible semantic relationships. Furthermore, all semantic-type combinations that occur in the UMLS Metathesaurus were checked for satisfiability. Results: The effort-intensive alignment process required major design changes and enhancements of BioTop and brought up s

Crossref

PubMed Central

EUR Research Repository

Erasmus University Digital Repository