25 research outputs found

Performance comparison of gene disease relation extraction on four different corpora.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

The Francis Crick Institute

Full feature set used in gene-disease relation extraction.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

The Francis Crick Institute

Automatic extraction of gene-disease associations from literature using joint ensemble learning

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

A wealth of knowledge concerning relations between genes and their associated diseases is present in the biomedical literature. Mining these biological associations from the literature can support research ranging from drug-targetable pathways to biomarker discovery. However, the time and cost of manual curation slow this work considerably. Biomedical text mining is therefore a crucial technology, and relation extraction shows promising results for exploring genes associated with diseases. We addressed this problem from a text mining perspective by developing automatic extraction of gene-disease associations from the literature using joint ensemble learning. In the proposed work, we employ a supervised machine learning approach in which a rich feature set covering conceptual, syntactic and semantic properties is learned jointly with word embeddings and trained using an ensemble of support vector machines to extract gene-disease relations from four gold standard corpora. On evaluation, the approach shows promising results, with F-measures of 85.34%, 83.93%, 87.39% and 85.57% on the EUADR, GAD, CoMAGC and PolySearch corpora, respectively. We strongly believe that the presented novel approach, which combines a rich syntactic and semantic feature set with domain-specific word embeddings through ensemble support vector machines and is evaluated on four gold standard corpora, can act as a new baseline for future work on gene-disease relation extraction from the literature.
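The abstract reports its results as F-measure. As a reminder of what that number summarizes, the following is a minimal sketch of the standard precision, recall and F1 computation for a binary relation label. The function name `f_measure` and the toy label lists are illustrative assumptions; they are not taken from the paper or its corpora.

```python
from typing import Sequence, Tuple


def f_measure(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (label 1).

    Hypothetical helper for illustration; label 1 means "sentence expresses
    a gene-disease relation", label 0 means it does not.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for this example only.
gold = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = f_measure(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Whether the reported figures are micro- or macro-averaged, or come from cross-validation, is not stated in this listing, so the sketch covers only the single-class definition.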

The Francis Crick Institute

Schematic architecture of the gene-disease relation extraction system.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

The Francis Crick Institute

Performance evaluation of gene disease relation extraction on four different corpora.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

The Francis Crick Institute

No. of positive and negative sentences annotated in each corpus.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

The Francis Crick Institute

Corpus characteristics of full set corpora.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

The Francis Crick Institute

Performance comparison of the proposed system with the BeFree [26] system.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Performance comparison of the proposed system with the BeFree [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200699#pone.0200699.ref026" target="_blank">26</a>] system.

The Francis Crick Institute

Feature representation of gene-disease relation extraction.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

a) The sentence is tagged with both the LOXL1 gene and the exfoliation glaucoma disease, from the EU-ADR corpus (PMCID: PMC2605423). b) Word window representation of syntactic and semantic features. c) Tokens positioned to the left and right (n-gram) of the candidates (LOXL1 and exfoliation glaucoma). d) Locating the words between the entities for relational and trigger words. e) Phrasal feature from the relational word. f) Finding context-specific words using trigger word templates.
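Panels c) and d) of the caption describe tokens to the left and right of the candidate entities and the words between them. A minimal sketch of that kind of context extraction is shown below. The function `context_features`, the window size, and the example sentence are assumptions made for illustration; the sentence is invented and is not the EU-ADR sentence shown in the figure.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def context_features(tokens: List[str], gene_span: Span, disease_span: Span,
                     window: int = 2) -> Dict[str, List[str]]:
    """Collect left-context, between-entity and right-context tokens.

    Hypothetical helper: it only mirrors the idea in the caption and is not
    the feature extractor used by the authors.
    """
    # Order the two entity spans by position in the sentence.
    first, second = sorted([gene_span, disease_span])
    left = tokens[max(0, first[0] - window):first[0]]
    between = tokens[first[1]:second[0]]
    right = tokens[second[1]:second[1] + window]
    return {"left": left, "between": between, "right": right}


# Invented example sentence, tokenized on whitespace.
tokens = "Variants in LOXL1 are associated with exfoliation glaucoma in patients".split()
features = context_features(tokens, gene_span=(2, 3), disease_span=(6, 8))
print(features)
```

In this toy case the words between the entities ("are associated with") are where a relational or trigger word would be looked for, as in panel d).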

The Francis Crick Institute

Performance comparison of the proposed system with the PKDE4J [28] system.

Author: Balu Bhasuran (5562725)
Jeyakumar Natarajan (20914)

Performance comparison of the proposed system with the PKDE4J [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200699#pone.0200699.ref028" target="_blank">28</a>] system.

The Francis Crick Institute