Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text

Authors: Galstyan, Aram; Garg, Sahil; Hermjakob, Ulf; Marcu, Daniel
Publication date: 04/12/2015

We advance the state of the art in biomolecular interaction extraction with three contributions: (i) we show that deep Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) in contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) we further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions.

Comment: Appearing in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16).

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Finding the Core-Genes of Chloroplasts

Authors: AlKindy, Bassam; Bahi, Jacques M.; Couchot, Jean-François; Guyeux, Christophe; Mouly, Arnaud; Salomon, Michel
Publication date: 01/01/2014

Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, making large-scale genomic comparisons between sets of closely related species possible. An interesting question to answer is: which genes are common to a collection of species, or conversely, what is specific to a given species when compared to others belonging to the same genus, family, etc.? Investigating this problem means finding both the core and pan genomes of a collection of species, i.e., the genes common to all the species vs. the set of all genes in all species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task: it involves a large amount of computation and requires a rigorous methodology. Surprisingly, as far as we know, the methodology for finding core and pan genomes has not been deeply investigated. This research work tries to fill this gap by focusing only on chloroplastic genomes, whose reasonable sizes allow a deep study. To achieve this goal, a collection of 99 chloroplasts is considered in this article. Two methodologies have been investigated, based respectively on sequence similarities and on gene names taken from annotation tools. The obtained results are finally evaluated in terms of biological relevance.
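
The core/pan distinction in the abstract above comes down to a set intersection versus a set union once every genome is reduced to a set of gene identifiers. The sketch below illustrates only that definition; it is not the authors' pipeline. The function name `core_and_pan`, the species labels, and the toy gene lists are all assumptions made for illustration, and it presumes gene names have already been normalized so that equal names mean the same gene.

```python
def core_and_pan(genomes):
    """Return (core, pan) gene sets for a mapping species -> iterable of gene names.

    core: genes present in every species (intersection)
    pan:  genes present in at least one species (union)
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


# Toy input (hypothetical species labels and gene lists, for illustration only).
toy_genomes = {
    "species_A": ["rbcL", "psbA", "matK"],
    "species_B": ["rbcL", "psbA"],
    "species_C": ["rbcL", "psbA", "ndhF"],
}

core, pan = core_and_pan(toy_genomes)
print(sorted(core))  # ['psbA', 'rbcL']
print(sorted(pan))   # ['matK', 'ndhF', 'psbA', 'rbcL']
```

In practice the hard part is deciding when two genes count as "the same" (by sequence similarity or by annotated name, the two routes the abstract mentions); this sketch assumes that step is already done.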

arXiv.org e-Print Archive

HAL - Université de Franche-Comté

Bayesian sequence learning for predicting protein cleavage points

Author: Mayo, Michael
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

A challenging problem in data mining is the application of efficient techniques to automatically annotate the vast databases of biological sequence data. This paper describes one such application: the prediction of the position of signal peptide cleavage points along protein sequences. It is shown that the method, based on Bayesian statistics, is comparable in terms of accuracy to the existing state-of-the-art neural network techniques while providing explanatory information for its predictions.

Research Commons@Waikato

A Labeled Graph Kernel for Relationship Extraction

Authors: Galhardas, Helena; Matos, David; Simões, Gonçalo
Publication date: 20/02/2013

In this paper, we propose an approach for Relationship Extraction (RE) based on labeled graph kernels. The kernel we propose is a particularization of a random walk kernel that exploits two properties previously studied in the RE literature: (i) the words between the candidate entities or connecting them in a syntactic representation are particularly likely to carry information regarding the relationship; and (ii) combining information from distinct sources in a kernel may help the RE system make better decisions. We performed experiments on a dataset of protein-protein interactions and the results show that our approach obtains effectiveness values that are comparable with the state-of-the-art kernel methods. Moreover, our approach is able to outperform the state-of-the-art kernels when combined with other kernel methods.
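
For readers unfamiliar with the kernel family the abstract above builds on, here is a minimal sketch of a generic random walk kernel on node-labeled directed graphs. It is not the particularized kernel proposed in the paper: it only counts pairs of equal-length walks whose node labels match step by step, down-weighting longer walks. The function names, the `max_len` and `decay` parameters, and the graph encoding (a label dictionary plus an edge list) are assumptions chosen for illustration.

```python
from collections import defaultdict


def successors(edges):
    """Map each source node to the list of nodes it points to."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    return out


def random_walk_kernel(labels1, edges1, labels2, edges2, max_len=3, decay=0.5):
    """Weighted count of label-matching walk pairs with 0..max_len edges.

    labels*: dict node -> label; edges*: iterable of (src, dst) pairs.
    A walk pair with k edges contributes decay**k to the kernel value.
    """
    out1, out2 = successors(edges1), successors(edges2)
    # counts[(u, v)] = number of matching walk pairs of the current length
    # that end at node u in graph 1 and node v in graph 2.
    counts = {(u, v): 1.0
              for u in labels1 for v in labels2
              if labels1[u] == labels2[v]}
    total = sum(counts.values())
    weight = 1.0
    for _ in range(max_len):
        weight *= decay
        step = defaultdict(float)
        for (u, v), c in counts.items():
            for a in out1.get(u, ()):
                for b in out2.get(v, ()):
                    if labels1[a] == labels2[b]:
                        step[(a, b)] += c
        counts = step
        total += weight * sum(counts.values())
    return total
```

A kernel of this kind can be passed to any kernel-based learner as a precomputed similarity matrix; restricting which nodes and edges take part (for example, only words on the path between two candidate entities) is the sort of specialization the abstract alludes to.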

arXiv.org e-Print Archive

CiteSeerX