Search CORE

601 research outputs found

Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling

Author: A Paccanaro
AP Dempster
AY Ng
C Caragea
C Yan
Cornelia Caragea
Drena Dobbs
H Berman
IS Dhillon
J Allers
J Davis
J Shi
JH Kim
Jivko Sinapov
M Terribilini
MI Jordan
N Qian
P Baldi
R Duda
S Russell
TG Dietterich
TM Mitchell
Vasant Honavar
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Identification of functionally important sites in biomolecular sequences has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Experimental determination of such sites lags far behind the number of known biomolecular sequences. Hence, there is a need to develop reliable computational methods for identifying functionally important sites from biomolecular sequences. Results: We present a mixture of experts approach to biomolecular sequence labeling that takes into account the global similarity between biomolecular sequences. Our approach combines unsupervised and supervised learning techniques. Given a set of sequences and a similarity measure defined on pairs of sequences, we learn a mixture of experts model by using spectral clustering to learn the hierarchical structure of the model and by using Bayesian techniques to combine the predictions of the experts. We evaluate our approach on two biomolecular sequence labeling problems: RNA-protein and DNA-protein interface prediction. The results of our experiments show that global sequence similarity can be exploited to improve the performance of classifiers trained to label biomolecular sequence data. Conclusion: The mixture of experts model helps improve the performance of machine learning methods for identifying functionally important sites in biomolecular sequences. This is a proceeding from IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 10 (2009): S4, doi: 10.1186/1471-2105-10-S4-S4. Posted with permission.

Digital Repository @ Iowa State University (ISU)

Crossref

Springer - Publisher Connector

PubMed Central


Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling

Author: Caragea Cornelia
Dobbs Drena
Honavar Vasant
Sinapov Jivko
Publication venue: 'Springer Science and Business Media LLC'

Article discussing models for increasing the reliability of computational methods for identifying functionally important sites from biomolecular sequences.

UNT Digital Library


Natural and Accelerated Bioremediation Research Program

Author: Lawrence Berkeley National Laboratory
Publication venue: eScholarship, University of California
Publication date: 01/03/2002

eScholarship - University of California

UNIPred: Unbalance-aware Network Integration and Prediction of protein functions

Author: Alberto Bertoni
Giorgio Valentini
Marco Frasca
Publication date: 24/04/2020

The proper integration of multiple sources of data and the unbalance between annotated and unannotated proteins represent two of the main issues of the Automated Function Prediction (AFP) problem. Most supervised and semi-supervised learning algorithms for AFP proposed in the literature do not jointly consider these issues, with a negative impact on both sensitivity and precision, due to the unbalance between annotated and unannotated proteins that characterizes the majority of functional classes and to the specific and complementary information content embedded in each available source of data. We propose UNIPred (Unbalance-aware Network Integration and Prediction of protein functions), an algorithm that properly combines different biomolecular networks and predicts protein functions using parametric semi-supervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins both to construct the integrated network and to predict protein annotations for each functional class. Full-genome and ontology-wide experiments with three Eukaryotic model organisms show that the proposed method compares favourably with state-of-the-art learning algorithms for AFP.

CiteSeerX

From Text to Knowledge

Author: Bundschus Markus
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 21/07/2010

The global information space provided by the World Wide Web has dramatically changed the way knowledge is shared all over the world. To make this unbelievably huge information space accessible, search engines index the uploaded contents and provide efficient algorithmic machinery for ranking the importance of documents with respect to an input query. All major search engines such as Google, Yahoo or Bing are keyword-based, which is indisputably a very powerful tool for serving information needs centered around documents. However, this unstructured, document-oriented paradigm of the World Wide Web has serious drawbacks when searching for specific knowledge about real-world entities. When asked for advanced facts about entities, today's search engines are not very good at providing accurate answers. Hand-built knowledge bases such as Wikipedia or its structured counterpart DBpedia are excellent sources that provide common facts. However, these knowledge bases are far from complete, and most of the knowledge still lies buried in unstructured documents. Statistical machine learning methods have great potential to help bridge the gap between text and knowledge by (semi-)automatically transforming the unstructured representation of today's World Wide Web into a more structured representation. This thesis is devoted to reducing this gap with Probabilistic Graphical Models. Probabilistic Graphical Models play a crucial role in modern pattern recognition as they merge two important fields of applied mathematics: Graph Theory and Probability Theory. The first part of the thesis presents a novel system called Text2SemRel that is able to (semi-)automatically construct knowledge bases from textual document collections. The resulting knowledge base consists of facts centered around entities and their relations.
An essential part of the system is a novel algorithm for extracting relations between entity mentions that is based on Conditional Random Fields, which are Undirected Probabilistic Graphical Models. In the second part of the thesis, we use the power of Directed Probabilistic Graphical Models to solve important knowledge discovery tasks in semantically annotated large document collections. In particular, we present extensions of the Latent Dirichlet Allocation framework that are able to learn, in an unsupervised way, the statistical semantic dependencies between unstructured representations such as documents and their semantic annotations. Semantic annotations of documents might refer to concepts originating from a thesaurus or ontology but also to user-generated informal tags in social tagging systems. These forms of annotations represent a first step towards the conversion to a more structured form of the World Wide Web. In the last part of the thesis, we prove the large-scale applicability of the proposed fact extraction system Text2SemRel. In particular, we extract semantic relations between genes and diseases from a large biomedical textual repository. The resulting knowledge base contains far more potential disease genes than are currently stored in curated databases. Thus, the proposed system is able to unlock knowledge currently buried in the literature. The literature-derived human gene-disease network is the subject of further analysis with respect to existing curated state-of-the-art databases. We analyze the derived knowledge base quantitatively by comparing it with several curated databases with regard to the size of the databases and the properties of known disease genes, among other things. Our experimental analysis shows that the facts extracted from the literature are of high quality.

Digitale Hochschulschriften der LMU

Artificial intelligence for natural product drug discovery

Newcastle University E-Prints

A Study on Generalization of Features for Named Entity Recognition (固有表現抽出のための素性の一般化の研究)

Author: Cho Han-Cheol
趙漢哲
Publication venue: Department of Computer Science, Graduate School of Information Science and Technology
Publication date: 24/03/2014

Degree type: Doctorate (course-based). University of Tokyo (東京大学)

Machine Learning

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Machine learning can be defined in various ways, all related to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience.

Directory of Open Access Books (DOAB)


Variational Multi-Task Models for Image Analysis: Applications to Magnetic Resonance Imaging

Author: Corona Veronica
Publication venue: University of Cambridge
Publication date: 17/04/2020

This thesis deals with the study and development of several variational multi-task models for solving inverse problems in imaging, with a particular focus on Magnetic Resonance Imaging (MRI). In most image processing problems, one usually deals with the reconstruction task, i.e., the task of reconstructing an image from indirect measurements, and then performs various operations, one after the other (i.e., sequentially), to improve the quality of the reconstruction and to extract useful information. However, recent developments in a variational context have shown that performing those tasks jointly (i.e., in a multi-task framework) offers great benefits, and this is the perspective that we follow in this thesis. We go beyond traditional sequential approaches and set a new basis for variational multi-task methods for MRI analysis. We demonstrate that by sharing representation between tasks and carefully interconnecting them, one can create synergies across challenging problems and reduce error propagation. More precisely, we first propose a multi-task variational model to tackle the problems of image reconstruction and image segmentation using non-convex Bregman iteration. We describe theoretical and numerical details of the problem and its optimisation scheme. Moreover, we show that our multi-task model achieves better results in several examples and MRI applications than existing approaches in the same context. Secondly, we show that our approach can be extended to a multi-task reconstruction and segmentation model for the nonlinear inverse problem of velocity-encoded MRI. In this context, the aim is to estimate not only the magnitude from MRI data, but also the phase and its flow information, whilst simultaneously identifying regions of interest through the segmentation task. Finally, we go beyond two-task frameworks and introduce for the first time a variational multi-task model to handle three imaging tasks.
To this end, we design a variational multi-task framework addressing reconstruction, super-resolution and registration for improving the quality of MRI reconstruction. We demonstrate that our model is theoretically well-motivated and that it outperforms sequential models whilst incurring lower computational cost. Furthermore, we show through experimental results the potential of this approach for clinical applications.

Apollo (Cambridge)