BioC: a minimalist approach to interoperability for biomedical text processing
A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create human-labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language (XML) format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented, including sentences, tokens, parts of speech, named entities such as genes or diseases, and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to XML files, and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net
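The BioC structure described above (a collection of documents, each holding passages with stand-off annotations) can be sketched with nothing but the standard library. Element and attribute names below follow the published BioC conventions (collection, document, passage, annotation, infon, location); the concrete ids, offsets and entity values are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Minimal BioC-style collection: one document, one passage, one
# named-entity annotation. Ids, offsets and infon values are examples.
collection = ET.Element("collection")
ET.SubElement(collection, "source").text = "PubMed"

document = ET.SubElement(collection, "document")
ET.SubElement(document, "id").text = "12345"

passage = ET.SubElement(document, "passage")
ET.SubElement(passage, "offset").text = "0"
ET.SubElement(passage, "text").text = "BRCA1 is linked to breast cancer."

# Stand-off annotation: type carried in an infon, span in a location.
annotation = ET.SubElement(passage, "annotation", id="T1")
ET.SubElement(annotation, "infon", key="type").text = "gene"
ET.SubElement(annotation, "location", offset="0", length="5")
ET.SubElement(annotation, "text").text = "BRCA1"

xml_str = ET.tostring(collection, encoding="unicode")
print(xml_str)
```

Because the annotations are stand-off (offsets into the passage text rather than inline tags), any number of independent annotation layers can be added without disturbing the source text.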
Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm
Background: Shared tasks and community challenges represent key instruments to promote research, collaboration
and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks
relied on the comparison of automatically generated results against a so-called Gold Standard dataset of manually
labelled textual data, regardless of efficiency and robustness of the underlying implementations. Due to the rapid
growth of unstructured data collections, including patent databases and particularly the scientific literature, there is a
pressing need to generate, assess and expose robust big data text mining solutions to semantically enrich documents
in real time. To address this pressing need, a novel track called 'Technical interoperability and performance of annotation
servers' was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track
was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online
biomedical named entity recognition systems of interest for medicinal chemistry applications.
Results: A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned
predictions during a two-month period in predefined formats and were evaluated through the BeCalm evaluation
platform, specifically developed for this track. The track encompassed three levels of evaluation, i.e. data format
considerations, technical metrics and functional specifications. Participating annotation servers were implemented
in seven different programming languages and covered 12 general entity types. The continuous evaluation of server
responses accounted for testing periods of low activity and moderate to high activity, encompassing overall 4,092,502
requests from three different document provider settings. The median response time was below 3.74 s, with a median
of 10 annotations/document. Most of the servers showed great reliability and stability, being able to process over
100,000 requests in a 5-day period.
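The per-document work such an annotation server performs can be sketched as follows. The tagger here is a trivial dictionary lookup, and the field names in the returned payload are hypothetical illustrations, not the actual BeCalm request/response schema.

```python
import json
import time

# Toy lexicon standing in for a real named-entity recognizer.
LEXICON = {"aspirin": "CHEMICAL", "BRCA1": "GENE"}

def annotate(document_id, text):
    """Annotate one document and report the response time that an
    evaluation platform like BeCalm would measure externally."""
    start = time.perf_counter()
    annotations = []
    for term, etype in LEXICON.items():
        pos = text.find(term)
        while pos != -1:
            annotations.append({"document_id": document_id,
                                "start": pos,
                                "end": pos + len(term),
                                "text": term,
                                "type": etype})
            pos = text.find(term, pos + 1)
    elapsed = time.perf_counter() - start
    return {"annotations": annotations, "response_time_s": elapsed}

print(json.dumps(annotate("PMID:1", "BRCA1 mutations alter aspirin response.")))
```

In the track itself this logic sat behind an HTTP endpoint, so that throughput and latency could be measured continuously under varying request load.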
Conclusions: The presented track was a novel experimental task that systematically evaluated the technical performance
aspects of online entity recognition systems. It raised the interest of a significant number of participants.
Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text
documents.
Funding: Portuguese Foundation for Science and Technology | Ref. UID/BIO/04469/2013; Portuguese Foundation for Science and Technology | Ref. COMPETE 2020 (POCI-01-0145-FEDER-006684); Xunta de Galicia | Ref. ED431C2018/55-GRC; European Commission | Ref. H2020, n. 65402
The biomedical abbreviation recognition and resolution (BARR) track: Benchmarking, evaluation and importance of abbreviation recognition systems applied to Spanish biomedical abstracts
Healthcare professionals are generating a substantial volume of clinical data in narrative form. As healthcare providers are confronted with serious time constraints, they frequently use telegraphic phrases, domain-specific abbreviations and shorthand notes. Efficient clinical text processing tools need to cope with the recognition and resolution of abbreviations, a task that has been extensively studied for English documents. Despite the outstanding number of clinical documents written worldwide in Spanish, only a marginal amount of studies has been published on this subject. In clinical texts, as opposed to the medical literature, abbreviations are generally used without their definitions or expanded forms. The aim of the first Biomedical Abbreviation Recognition and Resolution (BARR) track, posed at the IberEval 2017 evaluation campaign, was to assess and promote the development of systems for generating a sense inventory of medical abbreviations. The BARR track required the detection of mentions of abbreviations or short forms and their corresponding long forms or definitions from Spanish medical abstracts. For this track, the organizers provided the BARR medical document collection, the BARR corpus of manually annotated abstracts labelled by domain experts and the BARR-Markyt evaluation platform. A total of 7 teams submitted 25 runs for the two BARR subtasks: (a) the identification of mentions of abbreviations and their definitions and (b) the correct detection of short form-long form pairs. Here we describe the BARR track setting, the obtained results and the methodologies used by participating systems. The BARR task summary, corpus, resources and evaluation tool for testing systems beyond this campaign are available at: http://temu.inab.org
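Detecting short form-long form pairs of the kind targeted by subtask (b) is often approached with character-matching heuristics in the spirit of the classic Schwartz-Hearst algorithm. The sketch below is illustrative of that general technique, not of any particular BARR participant's system: it finds 'long form (SF)' patterns and matches the short form's characters right-to-left inside the preceding text.

```python
import re

def best_long_form(sf, candidate):
    """Match the short form's characters right-to-left inside the
    candidate text; the character matching sf[0] must start a word.
    Returns the long form, or None if no consistent match exists."""
    s, l = len(sf) - 1, len(candidate) - 1
    while s >= 0:
        c = sf[s].lower()
        if not c.isalnum():          # skip punctuation in the short form
            s -= 1
            continue
        while l >= 0 and (candidate[l].lower() != c or
                          (s == 0 and l > 0 and candidate[l - 1].isalnum())):
            l -= 1
        if l < 0:
            return None
        s -= 1
        l -= 1
    return candidate[l + 1:]

def extract_pairs(sentence):
    """Find 'long form (SF)' patterns and resolve each short form
    against the text preceding its parentheses."""
    pairs = []
    for m in re.finditer(r"\(([^()]{1,10})\)", sentence):
        lf = best_long_form(m.group(1), sentence[:m.start()].rstrip())
        if lf:
            pairs.append((m.group(1), lf))
    return pairs

print(extract_pairs("Se realizó un electrocardiograma (ECG) al ingreso."))
# → [('ECG', 'electrocardiograma')]
```

Clinical Spanish text is harder than this pattern suggests, since abbreviations there typically appear without any parenthesized definition, which is precisely why the BARR corpus pairs abstracts (where definitions occur) with expert annotation.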
We acknowledge the Encomienda MINETAD-CNIO/OTG Sanidad Plan TL and the Open-Minted (654021) H2020 project for funding.
BC4GO: a full-text corpus for the BioCreative IV GO task
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ~10% of the relevant evidence sentences and 30% of the distinct GO terms, while the Results/Experiment section has nearly 60% of the relevant sentences and >70% of the GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need to use full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community.
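When agreement is reported as an F1-measure, one annotator's selections are scored against the other's as if they were gold-standard predictions. A minimal sketch of that strict (exact-match) computation, assuming sentence ids as the unit of selection (illustrative, not the exact BC4GO tooling):

```python
def f1_agreement(ann_a, ann_b):
    """Pairwise F1 as an inter-annotator agreement measure:
    annotator B's selections scored against annotator A's."""
    a, b = set(ann_a), set(ann_b)
    tp = len(a & b)                      # sentences both annotators marked
    if tp == 0:
        return 0.0
    precision = tp / len(b)
    recall = tp / len(a)
    return 2 * precision * recall / (precision + recall)

# Two annotators marking evidence sentences (by id) in the same article:
print(f1_agreement({"s1", "s2", "s3"}, {"s2", "s3", "s4"}))
```

A strict score of 9.3% thus means the two annotators rarely picked the exact same evidence sentences, even when (per the relaxed and hierarchical scores) they broadly agreed on the annotation content.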
Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference
No abstract available
Benchmarking biomedical text mining web servers at BioCreative V.5: the technical Interoperability and Performance of annotation Servers - TIPS track
The TIPS track was a novel experimental task under the umbrella of the BioCreative
text mining challenges. Its aim was to carry out, for the first time, a text mining
challenge focused on the continuous assessment of technical aspects of text
annotation web servers, specifically online biomedical named entity recognition systems.
A total of 13 teams registered annotation servers, implemented in various programming
languages, supporting up to 12 different general annotation types. The
continuous evaluation period took place from February to March 2017. The systematic
and continuous evaluation of server responses accounted for testing periods
of low activity and moderate to high activity. Moreover, three document
provider settings were covered, including NCBI PubMed. For a total of
4,092,502 requests, the median response time for most servers was below 3.74 s
with a median of 10 annotations/document. Most of the servers showed great
reliability and stability, being able to process 100,000 requests in 5 days.
Challenges of Digitalisation in the Aerospace and Aviation Sectors
This report describes digital transformation in aerospace and aviation, and identifies some challenges that are likely to have parallels with the architecture, engineering and construction (AEC) sector.
- …