Search CORE

27 research outputs found

Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

Author: Bhasuran Balu
Murugesan Gurusamy
Natarajan Jeyakumar
Publication date: 03/10/2023

Biomedical knowledge is growing at an astounding pace, and the majority of it is represented as scientific publications. Text mining tools and methods are automatic approaches for extracting hidden patterns and trends from this semi-structured and unstructured data. In biomedical text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms that are otherwise mentioned only in disjoint literature sets. LBD approaches have proven successful in reducing the discovery time of potential associations hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms such as a disease or symptom and connecting them with a drug and treatment based on the statistical significance of the shared profiles. This knowledge discovery approach, introduced in 1989, still remains a core task in text mining. Currently, the two approaches based on the ABC principle, namely open discovery and closed discovery, are the most explored in the LBD process. This review starts with a general introduction to text mining, followed by biomedical text mining, and introduces various literature resources such as MEDLINE, UMLS, MeSH, and SemMedDB. This is followed by a brief introduction to the core ABC principle and its two associated approaches, open discovery and closed discovery, in the LBD process. The review also discusses deep learning applications in LBD by reviewing the role of transformer models and neural-network-based LBD models and their future aspects. Finally, it reviews the key biomedical discoveries generated through LBD approaches in biomedicine and concludes with the current limitations and future directions of LBD.

Comment: 43 pages, 5 figures, 4 tables
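The ABC principle mentioned in this abstract can be illustrated with a minimal sketch. It assumes term pairs that co-occur in the literature are already available, and it ranks candidates by the number of linking intermediate terms, a simplification standing in for the statistical scoring the abstract refers to. The function names and the toy pairs (loosely modelled on the well-known fish oil and Raynaud example) are illustrative assumptions, not taken from the review.

```python
from collections import defaultdict


def build_neighbours(cooccurrences):
    """Map each term to the set of terms it co-occurs with."""
    neighbours = defaultdict(set)
    for left, right in cooccurrences:
        neighbours[left].add(right)
        neighbours[right].add(left)
    return neighbours


def open_discovery(cooccurrences, a_term):
    """Open discovery: start from A, walk to B terms, collect unseen C terms."""
    neighbours = build_neighbours(cooccurrences)
    b_terms = set(neighbours.get(a_term, set()))
    candidates = defaultdict(set)
    for b in b_terms:
        for c in neighbours[b]:
            # Keep C only if it is not A and never co-occurs directly with A.
            if c != a_term and c not in b_terms:
                candidates[c].add(b)
    # Rank by number of distinct linking B terms (assumed scoring), then by name.
    return sorted(candidates.items(), key=lambda item: (-len(item[1]), item[0]))


def closed_discovery(cooccurrences, a_term, c_term):
    """Closed discovery: A and C are given, return the B terms linking them."""
    neighbours = build_neighbours(cooccurrences)
    return sorted(neighbours.get(a_term, set()) & neighbours.get(c_term, set()))


# Toy co-occurrence pairs, for illustration only.
pairs = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud disease"),
    ("platelet aggregation", "Raynaud disease"),
    ("platelet aggregation", "aspirin"),
]
ranked = open_discovery(pairs, "fish oil")
links = closed_discovery(pairs, "fish oil", "Raynaud disease")
```

In the toy data, open discovery starts from one term and proposes targets, whereas closed discovery takes a hypothesised pair and returns the intermediate terms that would explain it.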

arXiv.org e-Print Archive

A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience

Author: B Bhasuran
C O’Reilly
Christian O’Reilly
CJ Crasto
DLK Yamins
E Underwood
Elisabetta Iavarone
H Pan
H-M Müller
I Spasic
John McNaught
K Ambert
L French
M Habibi
MA Driel Van
Maolin Li
Matthew Shardlow
Meizhi Ju
N Okazaki
PF Balan
R Richardet
S Hochreiter
S Tokui
S Tripathy
Sophia Ananiadou
The UniProt Consortium
X Vasques
Y Chen
Y Lecun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/11/2018

The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those being deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about modelled entities is a painstaking and low-reward task. Text mining can be used to help a curator extract relevant information from this literature in a systematic way. We propose the application of text mining methods for the neuroscience literature. Specifically, two computational neuroscientists annotated a corpus of entities pertinent to neuroscience using active learning techniques to enable swift, targeted annotation. We then trained machine learning models to recognise the entities that have been identified. The entities covered are Neuron Types, Brain Regions, Experimental Values, Units, Ion Currents, Channels, and Conductances and Model organisms. We tested a traditional rule-based approach, a conditional random field and a model using deep learning named entity recognition, finding that the deep learning model was superior. Our final results show that we can detect a range of named entities of interest to the neuroscientist with a macro average precision, recall and F1 score of 0.866, 0.817 and 0.837 respectively. The contributions of this work are as follows: 1) We provide a set of Named Entity Recognition (NER) tools that are capable of detecting neuroscience entities with performance above or similar to prior work. 2) We propose a methodology for training NER tools for neuroscience that requires very little training data to get strong performance. This can be adapted for any sub-domain within neuroscience. 3) We provide a small corpus with annotations for multiple entity types, as well as annotation guidelines to help others reproduce our experiments.
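The macro-averaged scores reported in this abstract can be illustrated with a short sketch. It assumes one common convention, in which precision, recall and F1 are computed per entity type and then averaged with equal weight. The abstract does not state which convention the authors used, and the counts below are invented for illustration.

```python
def macro_scores(counts):
    """Macro-average precision, recall and F1.

    counts maps an entity type to a (true_pos, false_pos, false_neg) tuple.
    """
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts.values():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(counts)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Invented counts for two entity types, not taken from the paper.
toy_counts = {
    "Brain Region": (8, 2, 2),
    "Neuron Type": (3, 1, 3),
}
macro_p, macro_r, macro_f1 = macro_scores(toy_counts)
```

Under this convention the macro F1 is the mean of the per-type F1 scores, so it need not equal the harmonic mean of the macro precision and macro recall.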

Infoscience - École polytechnique fédérale de Lausanne

Crossref

E-space: Manchester Metropolitan University's Research Repository

ZENODO

The University of Manchester - Institutional Repository

Full feature set used in gene-disease relation extraction.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Full feature set used in gene-disease relation extraction.

FigShare

Automatic extraction of gene-disease associations from literature using joint ensemble learning

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

A wealth of knowledge concerning relations between genes and their associated diseases is present in the biomedical literature. Mining these biological associations from the literature can provide immense support to research ranging from drug-targetable pathways to biomarker discovery. However, the time and cost of manual curation slow this work down heavily. In this scenario, biomedical text mining is one of the crucial technologies, and relation extraction shows promising results for exploring research on genes associated with diseases. We addressed this problem from a text mining perspective by developing automatic extraction of gene-disease associations from the literature using joint ensemble learning. In the proposed work, we employ a supervised machine learning approach in which a rich feature set covering conceptual, syntactic and semantic properties, jointly learned with word embeddings, is used to train an ensemble support vector machine for extracting gene-disease relations from four gold standard corpora. On evaluation, the machine learning approach shows promising results, with F-measures of 85.34%, 83.93%, 87.39% and 85.57% on the EU-ADR, GAD, CoMAGC and PolySearch corpora respectively. We strongly believe that the presented novel approach, combining a rich syntactic and semantic feature set with domain-specific word embeddings through ensemble support vector machines and evaluated on four gold standard corpora, can act as a new baseline for future work in gene-disease relation extraction from the literature.

FigShare

Feature representation of gene-disease relation extraction.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

a) The sentence is tagged with both the LOXL1 gene and the exfoliation glaucoma disease from the EU-ADR corpus (PMCID: PMC2605423). b) Word window representation of syntactic and semantic features. c) Tokens positioned to the left and right (n-gram) of the candidates (LOXL1 and exfoliation glaucoma). d) Locating the words between the entities for relational and trigger words. e) Phrasal feature from the relational word. f) Finding context-specific words using trigger word templates.
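The window, between-entity and trigger-word features listed in this caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' feature extractor: the function name, the window size, the trigger list and the example sentence are all invented here, and entity positions are assumed to be given as token spans.

```python
def context_features(tokens, gene_span, disease_span, window=2,
                     triggers=("associated", "risk")):
    """Collect simple context features around a gene-disease candidate pair.

    Spans are (start, end) token indices with end exclusive.
    """
    # Order the two entities by position so the slices work either way round.
    first, second = sorted([gene_span, disease_span])
    left = tokens[max(0, first[0] - window):first[0]]
    right = tokens[second[1]:second[1] + window]
    between = tokens[first[1]:second[0]]
    # Trigger words are looked up only in the text between the entities.
    found = [tok for tok in between if tok.lower() in triggers]
    return {"left": left, "right": right, "between": between, "triggers": found}


# Invented example sentence (not the corpus sentence shown in the figure).
sentence = ("Variants in LOXL1 are strongly associated with "
            "exfoliation glaucoma in several populations")
tokens = sentence.split()
features = context_features(tokens, gene_span=(2, 3), disease_span=(7, 9))
```

Whitespace tokenisation is used only to keep the sketch self-contained; a real pipeline would typically rely on a proper tokenizer and entity tagger.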

FigShare

Corpus characteristics of full set corpora.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Corpus characteristics of full set corpora.

FigShare

Performance comparison of the proposed system with the BeFree [26] system.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Performance comparison of the proposed system with the BeFree [26] system.

FigShare

Performance comparison of the proposed system with the PKDE4J [28] system.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Performance comparison of the proposed system with the PKDE4J [28] system.

FigShare

ROC with respect to FPR and TPR on four corpora upon 10-fold cross-validation.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

In this figure, a, b, c, and d represent the receiver operating characteristic (ROC) curves of the EU-ADR, GAD, CoMAGC and PolySearch corpora respectively.
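A ROC curve of the kind described in this caption plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold is swept. The sketch below shows one way to compute the points and the area under the curve from classifier scores. The labels and scores are invented, and the 10-fold cross-validation step is not reproduced here.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels are 1 (positive) or 0 (negative); both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all examples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented labels and scores for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(toy_labels, toy_scores)
area = auc(curve)
```

In a cross-validated setting, one option is to compute such a curve per fold, or to pool the held-out scores from all folds before computing a single curve.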

FigShare

Schematic architecture of the gene-disease relation extraction system.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Schematic architecture of the gene-disease relation extraction system.

FigShare