
Development of a machine learning framework for biomedical text mining

Author: Costa Hugo
Rocha Miguel
Rodrigues Rúben
Publication date: 01/01/2016

Biomedical text mining (BTM) aims to create methods for searching and structuring knowledge extracted from biomedical literature. Named entity recognition (NER), a BTM task, seeks to identify mentions of biological entities in texts. Dictionaries, regular expressions, natural language processing and machine learning (ML) algorithms are used in this task. Over recent years, @Note2, an open-source software framework that includes user-friendly interfaces for important BTM tasks, has been developed, but it did not include ML-based methods. This work proposes BioTML, a framework with a number of ML-based approaches for NER, to fill the gap between @Note2 and state-of-the-art ML approaches. BioTML was integrated into @Note2 as a novel plug-in, in which Hidden Markov Models, Conditional Random Fields and Support Vector Machines were implemented to address NER tasks, working with a set of over 60 feature types used to train ML models. The implementation builds on open-source software such as MALLET, LibSVM, ClearNLP and OpenNLP. Several manually annotated corpora were used to validate BioTML. The results are promising, although there is room for improvement.

This work is co-funded by the North Portugal Regional Operational Programme, under “Portugal 2020”, through the European Regional Development Fund (ERDF), within project SISBI- Ref NORTE-01-0247-FEDER-0
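As a rough illustration of what token-level "feature types" for ML-based NER can look like, the sketch below computes a handful of orthographic and context features per token. It is an assumption-laden example, not BioTML's actual feature set or API: the function names `token_features` and `sentence_features`, the chosen features, and the sample sentence are all hypothetical.

```python
# Minimal sketch of token-level feature extraction for NER.
# Assumption: these features are illustrative only; the abstract does not
# list BioTML's 60+ feature types, and none are reproduced here.

def token_features(tokens, i):
    """Return simple orthographic and context features for tokens[i]."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),                          # normalized surface form
        "is_capitalized": tok[:1].isupper(),           # first character uppercase
        "is_all_caps": tok.isupper(),                  # e.g. gene symbols
        "has_digit": any(c.isdigit() for c in tok),    # digits are common in gene names
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),                   # last three characters
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Build one feature dict per token in a tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical example sentence, tokenized on whitespace.
tokens = "Mutations in BRCA1 increase cancer risk".split()
features = sentence_features(tokens)
```

Feature dictionaries of this shape are the kind of input a sequence labeler (for example a Conditional Random Field) or a per-token classifier (for example a Support Vector Machine) would typically be trained on, paired with manually annotated entity labels.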

Universidade do Minho: RepositoriUM
