226 research outputs found

Mining associations and roles: role of feature extraction

Author: Nenadic Goran
Publication venue: Dagstuhl Seminar Proceedings. 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Publication date: 01/01/2008

One of the ultimate aims of biomedical text mining would be to extract both explicit and implicit associations between different types of entities. In addition, assigning roles that entities have or may have in biological processes is also of interest. In this talk I will discuss our experience in selecting and engineering textual features that can help in mining associations and roles from the literature. Depending on the tasks and entities involved, we have used four types of features: from simple words and terms, to words and semantic classes, to textual contexts, to contexts augmented with additional background attributes. The main conclusion is that both NLP-driven and domain-knowledge-driven feature engineering are needed for successful mining of associations and roles.

Dagstuhl Research Online Publication Server

Inferring Methodological Meta-knowledge from Large Biomedical Corpora

Author: Nenadic Goran
Publication venue: Hankookmunhwasa
Publication date: 01/01/2016

Waseda University Repository

A cascaded approach to normalising gene mentions in biomedical literature

Authors: Keane John A.; Nenadic Goran; Yang Hui
Publication venue: Biomedical Informatics
Publication date: 01/01/2007

Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first pre-processed. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. The results of the experiments on the BioCreAtIvE2 data sets (identification of human gene names) demonstrated that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.

Crossref

University of Birmingham Research Portal

Open Research Online (The Open University)

PubMed Central

The University of Manchester - Institutional Repository
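
The cascade described in the abstract above (exact, then exact-like, then token-based approximate matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the normalisation rules, the Jaccard overlap score, the `min_overlap` threshold, the function names and the `GENE:1`/`GENE:2` identifiers are all assumptions made for illustration.

```python
import re

def normalise(name: str) -> str:
    """Exact-like key (assumed rule): lowercase, drop spaces and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def tokens(name: str) -> frozenset:
    """Order-free token set; letters and digits are split apart."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))

def build_indexes(synonyms):
    """Pre-process the synonym dictionary into one index per cascade stage."""
    exact, exact_like, token_sets = {}, {}, []
    for gene_id, names in synonyms.items():
        for name in names:
            exact.setdefault(name, set()).add(gene_id)
            exact_like.setdefault(normalise(name), set()).add(gene_id)
            token_sets.append((tokens(name), gene_id))
    return exact, exact_like, token_sets

def link_mention(mention, indexes, min_overlap=0.6):
    """Return (candidate ids, stage that matched); stop at the first stage that hits."""
    exact, exact_like, token_sets = indexes
    if mention in exact:                      # stage 1: exact string match
        return exact[mention], "exact"
    key = normalise(mention)
    if key in exact_like:                     # stage 2: match after normalisation
        return exact_like[key], "exact-like"
    mention_tokens = tokens(mention)          # stage 3: token overlap (Jaccard)
    best, best_ids = 0.0, set()
    for synonym_tokens, gene_id in token_sets:
        union = mention_tokens | synonym_tokens
        score = len(mention_tokens & synonym_tokens) / len(union) if union else 0.0
        if score > best:
            best, best_ids = score, {gene_id}
        elif score == best and score > 0:
            best_ids.add(gene_id)
    if best >= min_overlap:
        return best_ids, "token"
    return set(), "none"

# Hypothetical dictionary; the identifiers are placeholders, not database ids.
SYNONYMS = {
    "GENE:1": ["BRCA1", "breast cancer 1"],
    "GENE:2": ["TP53", "tumor protein p53"],
}
INDEXES = build_indexes(SYNONYMS)

for mention in ["BRCA1", "brca-1", "protein p53 tumor", "xyz"]:
    print(mention, link_mention(mention, INDEXES))
```

Ordering the stages from strictest to loosest means a mention is resolved by the most precise rule that fires, while the token stage recovers permuted name components at some cost in precision.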

Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

Authors: Belousov Maksim; Dixon William; Milosevic Nikola; Nenadic Goran
Publication date: 01/01/2018

Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies to annotate the data. The systems were submitted to the adverse drug reaction shared task, organised during the Text Analysis Conference in 2017 by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61 respectively. Comment: Paper describing submission for TAC ADR shared task.

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Table mining and data curation from biomedical literature

Authors: Milosevic Nikola; Nenadic Goran
Publication venue: University of Manchester
Publication date: 01/12/2014

The University of Manchester - Institutional Repository

Clinical text data in machine learning: Systematic review

Authors: Nenadic Goran; Spasic Irena
Publication venue: JMIR Publications Inc.
Publication date: 31/03/2020

Background: Clinical narratives represent the main form of communication within healthcare, providing a personalized account of patient history and assessments and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective: The main aim of this study is to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigate the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multi-faceted interface, to perform a literature search against MEDLINE. We identified a total of 110 relevant studies and extracted information about the text data used to support machine learning, the NLP tasks supported and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation and any relevant statistics. Results: The vast majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free-text notes in electronic health records were utilized as class labels. Similarly, distant supervision used an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable due to the sensitive nature of the data considered. Besides their small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The vast majority of studies focused on the task of text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management and surveillance. Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of reducing annotation effort. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which does not require data annotation.

Online Research @ Cardiff
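
The active learning step mentioned in the abstract above (iteratively sampling a subset of data for manual annotation) can be sketched in Python as follows. The review does not say which sampling criterion the surveyed studies used; entropy-based uncertainty sampling is assumed here, and the function names, document ids and probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, budget):
    """Pick the `budget` unlabelled documents the current model is least sure about.

    predictions: dict mapping a document id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda doc: entropy(predictions[doc]), reverse=True)
    return ranked[:budget]

# Hypothetical model output for four unlabelled clinical notes (binary task).
predictions = {
    "note_a": [0.98, 0.02],
    "note_b": [0.55, 0.45],
    "note_c": [0.80, 0.20],
    "note_d": [0.50, 0.50],
}

to_annotate = select_for_annotation(predictions, budget=2)
print(to_annotate)
```

In a full loop, the selected notes would be labelled by an annotator, added to the training set, and the model retrained before the next round of selection.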

Biomedical Semantics: the Hub for Biomedical Research 2.0

Authors: Nenadic Goran; Rebholz-Schuhmann Dietrich
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository