BACKGROUND: Free-text documents on science and technology are growing rapidly on the web. However, most of these documents are not immediately processable by computers, which slows the acquisition of useful information. Computational ontologies might offer a solution by enabling semantically machine-readable data sets. However, ontology creation, instantiation, and maintenance still rely on manual methodologies and are therefore time- and cost-intensive. METHOD: We focused on a large corpus containing information on researchers, research fields, and institutions. We based our strategy on traditional entity recognition, social computing, and correlation. We devised a semi-automatic approach for the recognition, correlation, and extraction of named entities and relations from textual documents, which are then used to create, instantiate, and maintain an ontology. RESULTS: We present a prototype demonstrating the applicability of the proposed strategy, along with a case study describing how direct and indirect relations can be extracted from academic and professional activities registered in a database of curricula vitae in free-text format. We present evidence that this system can identify entities to assist in knowledge extraction and representation in support of ontology maintenance. We also demonstrate the extraction of relationships among ontology classes and their instances. CONCLUSION: We have demonstrated that our system can be used to convert research information in free-text format into a database with a semantic structure. Future studies should test this system using the growing volume of free-text information available at the institutional and national levels.
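The recognition, correlation, and extraction steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' prototype: it assumes a simple gazetteer lookup for entity recognition and document-level co-occurrence counts as the correlation measure, and every name in it (`GAZETTEER`, `recognize`, `correlate`, `to_relations`, the `relatedTo` predicate, and the sample curriculum vitae sentences) is hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer: surface form -> ontology class (illustrative data only).
GAZETTEER = {
    "Ana Silva": "Researcher",
    "João Souza": "Researcher",
    "Federal University": "Institution",
    "ontology learning": "ResearchField",
}


def recognize(text, gazetteer=GAZETTEER):
    """Return the set of (name, class) pairs whose name occurs in the text."""
    lowered = text.lower()
    return {(name, cls) for name, cls in gazetteer.items() if name.lower() in lowered}


def correlate(documents, gazetteer=GAZETTEER):
    """Count, for each unordered entity pair, the documents that mention both."""
    counts = Counter()
    for doc in documents:
        # Sorting gives every pair a stable orientation across documents.
        entities = sorted(recognize(doc, gazetteer))
        counts.update(combinations(entities, 2))
    return counts


def to_relations(counts, min_support=1):
    """Keep pairs seen in at least min_support documents as candidate relations."""
    return [
        (a, "relatedTo", b, n)  # "relatedTo" is a placeholder predicate
        for (a, b), n in counts.items()
        if n >= min_support
    ]


# Made-up free-text CV fragments, standing in for the corpus.
cvs = [
    "Ana Silva works on ontology learning at Federal University.",
    "João Souza co-authored papers on ontology learning with Ana Silva.",
]
relations = to_relations(correlate(cvs), min_support=2)
```

In this sketch the recognized entities would become candidate instances of ontology classes and the surviving pairs candidate relations, which a curator could then review; that review step is what would make such a pipeline semi-automatic rather than fully automatic.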

Alexandre Leopoldo Gonçalves

Andrey Rzhetsky

Flávio Ceci

Ricardo Pietrobon

English

PubMed

Directory of Open Access Journals

PLoS ONE

Turning text into research networks: information retrieval and computational ontologies in the creation of scientific databases.

RIUNI Institucional Repository

Public Library of Science (PLOS)

FigShare

Crossref
