7 research outputs found

MultiGBS: A multi-layer graph approach to biomedical summarization

Author: Davoodijam Ensieh
Ghadiri Nasser
Rinaldi Fabio
Shahreza Maryam Lotfi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values

arXiv.org e-Print Archive

Western Sydney ResearchDirect

Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm

Author: Blanco Míguez Aitor
Fernández Riverola Florentino
GARCIA LOURENÇO Analia Maria
Krallinger Martin
Pérez Pérez Martín
Pérez Rodríguez Gael
Valencia Alfonso
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/11/2022

Background: Shared tasks and community challenges represent key instruments to promote research, collaboration and determine the state of the art of biomedical and chemical text mining technologies. Traditionally, such tasks relied on the comparison of automatically generated results against a so-called Gold Standard dataset of manually labelled textual data, regardless of efficiency and robustness of the underlying implementations. Due to the rapid growth of unstructured data collections, including patent databases and particularly the scientific literature, there is a pressing need to generate, assess and expose robust big data text mining solutions to semantically enrich documents in real time. To address this pressing need, a novel track called “Technical interoperability and performance of annotation servers” was launched under the umbrella of the BioCreative text mining evaluation effort. The aim of this track was to enable the continuous assessment of technical aspects of text annotation web servers, specifically of online biomedical named entity recognition systems of interest for medicinal chemistry applications. Results: A total of 15 out of 26 registered teams successfully implemented online annotation servers. They returned predictions during a two-month period in predefined formats and were evaluated through the BeCalm evaluation platform, specifically developed for this track. The track encompassed three levels of evaluation, i.e. data format considerations, technical metrics and functional specifications. Participating annotation servers were implemented in seven different programming languages and covered 12 general entity types. The continuous evaluation of server responses accounted for testing periods of low activity and moderate to high activity, encompassing overall 4,092,502 requests from three different document provider settings. The median response time was below 3.74 s, with a median of 10 annotations/document. 
Most of the servers showed great reliability and stability, being able to process over 100,000 requests in a 5-day period. Conclusions: The presented track was a novel experimental task that systematically evaluated the technical performance aspects of online entity recognition systems. It raised the interest of a significant number of participants. Future editions of the competition will address the ability to process documents in bulk as well as to annotate full-text documents.

Portuguese Foundation for Science and Technology | Ref. UID/BIO/04469/2013
Portuguese Foundation for Science and Technology | Ref. COMPETE 2020 (POCI-01-0145-FEDER-006684)
Xunta de Galicia | Ref. ED431C2018/55-GRC
European Commission | Ref. H2020, n. 65402

Investigo

Comparison of Text Mining Models for Food and Dietary Constituent Named-Entity Recognition

Author: Dehmer Matthias
Emmert-Streib Frank
Nguyen Thi Thuy Linh
Perera Nadeesha
Publication venue: 'MDPI AG'
Publication date: 01/03/2022

publishedVersion; Peer reviewed

Multidisciplinary Digital Publishing Institute

Trepo - Institutional Repository of Tampere University

OGER++: hybrid multi-type entity recognition

Author: Anna Jancso
Fabio Rinaldi
Lenz Furrer
Nicola Colic
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step. Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively. Conclusions: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.

Directory of Open Access Journals

OGER++: hybrid multi-type entity recognition

Author: Colic Nicola
Furrer Lenz
Jancso Anna
Rinaldi Fabio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/01/2019

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step. Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively. Conclusions: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.

ZORA