Search CORE

3 research outputs found

StemNet: An Evolving Service for Knowledge Networking in the Life Sciences

Author: Bajwa A.
Blasczyk R.
DeLuca D.
Hahn U.
Horn P.
Poprat M.
Wermter J.
Publication venue
Publication date: 02/05/2007
Field of study

Up until now, crucial life science information resources, whether bibliographic or factual databases, are isolated from each other. Moreover, semantic metadata intended to structure their contents is supplied in a manual form only. In the StemNet project we aim at developing a framework for semantic interoperability for these resources. This will facilitate the extraction of relevant information from textual sources and the generation of semantic metadata in a fully automatic manner. In this way, (from a computational perspective) unstructured life science documents are linked to structured biological fact databases, in particular to the identifiers of genes, proteins, etc. Thus, life scientists will be able to seamlessly access information from a homogeneous platform, despite the fact that the original information was unlinked and scattered over the whole variety of heterogeneous life science information resources and, therefore, almost inaccessible for integrated systematic search by academic, clinical, or industrial users

MPG.PuRe

A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC

Author: Akhondi Saber A.
Clematide Simon
Kors Jan A.
Rebholz-Schuhmann Dietrich
van Mulligen Erik M.
Publication venue
Publication date: 02/08/2017
Field of study

Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotate

RERO DOC Digital Library

Online assessment of protein interaction information extraction systems

Author: Leitner Florian
Publication venue
Publication date: 01/01/2012
Field of study

Tesis doctoral inédita. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Fecha de lectura: 01-03-201

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo