447 research outputs found

PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations

Author: Ananiadou S
Björne J
Ginter F
Ohta T
Pyysalo S
Salakoski T
Van de Peer Y
Van Landeghem S
Publication date: 01/01/2012

The University of Manchester - Institutional Repository

Large-scale event extraction from literature with multi-level gene normalization

Author: Ananiadou Sophia
Björne Jari
Ginter Filip
Hakala Kai
Kao Hung-Yu
Lu Zhiyong
Pyysalo Sampo
Salakoski Tapio
Van de Peer Yves
Van Landeghem Sofie
Wei Chih-Hsuan
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/).
Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.

Crossref

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

FigShare

University of Turku in the BioNLP'11 Shared Task

Author: A Jimeno Yepes
D McClosky
de Marneffe
E Buyko
E Charniak
Filip Ginter
H Kilicoglu
I Tsochantaridis
J Björne
J Heimonen
J Jourde
Jari Björne
JD Kim
JP Euzéby
M Miwa
MC de Marneffe
MF Porter
N Nguyen
P Stenetorp
R Bossy
S Pyysalo
S Riedel
S Van Landeghem
T Ohta
Tapio Salakoski
Y Kim
Z Ratkovic
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Biomedical Event Extraction with Machine Learning

Author: Björne Jari
Publication venue: Turku Centre for Computer Science
Publication date: 07/08/2014

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative to simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures.
We show that this event extraction system performs well, taking first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, and has shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
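The nested event structure described in the abstract above, CAUSE(A, BIND(B, C)), can be illustrated with a small data structure. This is a minimal sketch only and is not the TEES graph format: the class names `Entity` and `Event`, the helper functions, and the role labels "Cause" and "Theme" are assumptions chosen for illustration. Only the example sentence and the event notation come from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity mention, e.g. a protein (hypothetical class)."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a type, a trigger word, and typed arguments.

    Each argument is a (role, node) pair, where the node is either an
    Entity or another Event; this is what allows nesting.
    The role names used below are illustrative assumptions.
    """
    type: str
    trigger: str
    args: Tuple[Tuple[str, "Node"], ...]


Node = Union[Entity, Event]


def render(node: Node) -> str:
    """Render a node in the TYPE(arg, ...) notation used in the abstract."""
    if isinstance(node, Entity):
        return node.name
    inner = ", ".join(render(child) for _, child in node.args)
    return f"{node.type}({inner})"


def depth(node: Node) -> int:
    """Nesting depth: 0 for an entity, 1 for a flat event, and so on."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(child) for _, child in node.args), default=0)


# "Protein A causes protein B to bind protein C"
bind = Event("BIND", "bind", (("Theme", Entity("B")), ("Theme", Entity("C"))))
cause = Event("CAUSE", "causes", (("Cause", Entity("A")), ("Theme", bind)))

event_text = render(cause)
event_depth = depth(cause)
print(event_text)
```

Because an argument may itself be an `Event`, a pairwise relation such as BIND(B, C) is simply the special case of an event whose arguments are all entities.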

UTUPub

Biomedical Event Extraction with Machine Learning

Author: Björne Jari
Publication venue: TUCS Dissertations
Publication date: 28/10/2022

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative to simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures.
We show that this event extraction system performs well, taking first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, and has shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.

UTUPub

Large-scale event extraction from literature with multi-level gene normalization

Author: Ananiadou Sophia
Björne Jari
Ginter Filip
Hakala Kai
Kao Hung-Yu
Lu Zhiyong
Pyysalo Sampo
Salakoski Tapio
Van de Peer Yves
Van Landeghem Sofie
Wei Chih-Hsuan
Publication venue: Public Library of Science (PLoS)
Publication date: 28/10/2022

UTUPub

Epigenetic modelling: DNA methylation and working towards model parameterisation

Author: Porwal Jyoti
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2011

The main focus of the research in this thesis is the investigation of DNA methylation mechanisms in epigenetics and the study of a specific database. As part of the latter work, the role of curation is described, and a new knowledge management system, PathEpigen, is reported that is currently being developed for colon cancer in the Sci-Sym centre. The database deals with genetic and epigenetic interactions and contains considerable data on molecular events such as genetic and epigenetic events. The data curation includes biomedical and biological information. An efficient method was devised to extract biological information from the literature to process, manage and upgrade data. We present a Deterministic Finite Automaton (DFA) model for the DNA methylation mechanism controlled by DNA methyltransferase (DNMT) enzymes. This thesis provides a brief introduction to epigenetics, a survey of ongoing research on computational epigenetics and a description of the DNA methylation database. Furthermore, it also gives an overview of DNA methylation and its importance in cancer. The DFA models three states of methylation frequency (normal, de-novo and hypermethylated) in the cell. It has been executed on random input strings of length 100. Of the strings considered, we found that 26%, 37% and 37% correspond to the normal, de-novo (cancer initiation) and hypermethylated (cancer) states, respectively.

Irish Universities

DCU Online Research Access Service

Playing hide and seek on the genomic playground: unveiling biological function from literature

Author: Van Landeghem Sofie
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2012

Ghent University Academic Bibliography

Tumour heterogeneity: the key advantages of single-cell analysis

Author: Benjamin Ory
Chekhun
De Luca
Dominique Heymann
Francois Lamoureux
Gasch
Jiang
Marie-Francoise Heymann
Marta Tellez-Gabriel
Siyar Ekinci
Somasundaram
Toss
Wlodkowic
Publication venue: MDPI AG
Publication date: 01/12/2016

Tumour heterogeneity refers to the fact that different tumour cells can show distinct morphological and phenotypic profiles, including cellular morphology, gene expression, metabolism, motility, proliferation and metastatic potential. This phenomenon occurs both between tumours (inter-tumour heterogeneity) and within tumours (intra-tumour heterogeneity), and it is caused by genetic and non-genetic factors. The heterogeneity of cancer cells introduces significant challenges in using molecular prognostic markers as well as in classifying patients who might benefit from specific therapies. Thus, research efforts to characterize heterogeneity would be useful for a better understanding of the causes and progression of disease. It has been suggested that the study of heterogeneity within Circulating Tumour Cells (CTCs) could also reflect the full spectrum of mutations of the disease more accurately than a single biopsy of a primary or metastatic tumour. In recent years, many high-throughput methodologies have arisen for the study of heterogeneity at different levels (i.e., RNA, DNA, protein and epigenetic events). The aim of the current review is to stress the clinical implications of tumour heterogeneity, as well as the currently available methodologies for its study, paying specific attention to those able to assess heterogeneity at the single-cell level.

Multidisciplinary Digital Publishing Institute

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

White Rose Research Online

Systems Analytics and Integration of Big Omics Data

Author: Hardiman Gary
Publication venue: MDPI AG
Publication date: 01/01/2020

A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.

Directory of Open Access Books (DOAB)