
24,134 research outputs found


Characterisation of FAD-family folds using a machine learning approach

Author: Gilbert D
Tan A C
Tuson A
Publication venue: INCOB
Publication date: 01/01/2002

Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in biological processes. They are major organic cofactors and electron carriers in both enzymatic activities and biochemical pathways. We have analysed the relationships between sequence and structure of FAD-containing proteins using a machine learning approach. Decision trees were generated using the C4.5 algorithm as a means of automatically generating rules from biological databases (TOPS, CATH and PDB). These rules were then used as background knowledge for an inductive logic programming (ILP) system to characterise the four classes of FAD-family folds defined by Dym and Eisenberg (2001): glutathione reductase (GR), ferredoxin reductase (FR), p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FAD family was characterised by a set of rules. The “knowledge patterns” generated by this approach are rules containing conserved sequence motifs, secondary structure sequence elements and folding information. The significance of every rule was then assessed statistically. We show that this machine learning approach can learn and discover interesting patterns from large biological databases, generating “knowledge patterns” that characterise FAD-containing proteins and, at the same time, classify these proteins into four different families.
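As a rough illustration of the C4.5 step mentioned above, the sketch below computes the gain ratio, the main criterion C4.5 uses to pick which attribute to test at each node. The attribute names (`gxgxxg_motif`, `first_sse`) and the toy records are hypothetical and are not taken from TOPS, CATH or PDB. This is not the authors' pipeline, and it omits C4.5's handling of continuous attributes and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting (rows, labels) on categorical attribute `attr`.

    rows: list of dicts mapping attribute name -> categorical value
    labels: class label for each row (e.g. an FAD-family name)
    """
    n = len(labels)
    # Group the class labels by the value `attr` takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Information gain = parent entropy - weighted entropy of the partitions.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes that fragment the data into many values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records: a motif-presence flag and the first secondary-structure element.
rows = [
    {"gxgxxg_motif": "yes", "first_sse": "strand"},
    {"gxgxxg_motif": "yes", "first_sse": "helix"},
    {"gxgxxg_motif": "no", "first_sse": "strand"},
    {"gxgxxg_motif": "no", "first_sse": "helix"},
]
labels = ["GR", "GR", "FR", "FR"]
print(gain_ratio(rows, labels, "gxgxxg_motif"))  # 1.0: separates GR from FR perfectly
print(gain_ratio(rows, labels, "first_sse"))     # 0.0: carries no class information
```

A tree builder would split on the attribute with the highest gain ratio and recurse. Each root-to-leaf path then reads as an if-then rule, which is the kind of rule the abstract describes feeding to the ILP system.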

Brunel University Research Archive

MEMOFinder: combining _de novo_ motif prediction methods with a database of known motifs

Author: Bartek Wilczynski
Jerzy Tiuryn
Milosz Darzynkiewicz
Publication date: 11/09/2008

*Background:* Methods for finding overrepresented sequence motifs are useful in several key areas of computational biology. They aim to detect very weak signals responsible for biological processes that require robust sequence identification, such as transcription-factor binding to DNA or docking sites in proteins. Currently, the general performance of model-based motif-finding methods is unsatisfactory; however, different methods succeed in different cases. This raises the practical problem of combining the results of different motif-finding tools while taking into account current knowledge collected in motif databases.
*Results:* We propose a new, complete service that allows researchers to submit their sequences for analysis by four different motif-finding methods, followed by clustering of the predicted motifs and comparison with a reference motif database. It is tailored for regulatory motif detection; however, it allows a substantial amount of configuration regarding sequence background, motif database and motif-finding parameters.
*Availability:* The method is available online as a webserver at: http://bioputer.mimuw.edu.pl/software/mmf/. In addition, the source code is released under the GNU General Public License.
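The MEMOFinder abstract above does not say how predicted motifs are scored against the reference database. One common choice in the motif literature is the average Pearson correlation of aligned position weight matrix (PWM) columns. The sketch below assumes that measure, ungapped alignments only, and a hypothetical minimum overlap of three columns. The motif names and matrices are toy data, not MEMOFinder's implementation.

```python
import math

def column_pcc(a, b):
    """Pearson correlation between two PWM columns (probabilities for A, C, G, T)."""
    ma, mb = sum(a) / 4, sum(b) / 4
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if va == 0 or vb == 0:
        return 0.0  # a uniform column has no variance to correlate
    return cov / (va * vb)

def motif_similarity(m1, m2, min_overlap=3):
    """Best average column PCC over all ungapped alignments of two PWMs."""
    best = -1.0
    # Column i of m1 is aligned with column i - offset of m2.
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [(m1[i], m2[i - offset]) for i in range(len(m1)) if 0 <= i - offset < len(m2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_pcc(a, b) for a, b in pairs) / len(pairs)
        best = max(best, score)
    return best

def best_database_match(query, database, min_overlap=3):
    """Return (name, score) of the database motif most similar to the query PWM."""
    return max(
        ((name, motif_similarity(query, pwm, min_overlap)) for name, pwm in database.items()),
        key=lambda item: item[1],
    )

# Hypothetical toy columns (order A, C, G, T) and a two-entry reference database.
A = [0.7, 0.1, 0.1, 0.1]
C = [0.1, 0.7, 0.1, 0.1]
G = [0.1, 0.1, 0.7, 0.1]
T = [0.1, 0.1, 0.1, 0.7]
database = {"motif1": [A, C, G, T], "motif2": [T, T, T, T]}
query = [A, C, G]
print(best_database_match(query, database))  # motif1 scores highest (about 1.0)
```

Sliding over offsets lets a short predicted motif match a sub-region of a longer database motif. The minimum overlap keeps one- or two-column coincidences from producing spuriously high scores.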

Nature Precedings

Motif Discovery through Predictive Modeling of Gene Regulation

Author: A. Battle
A.P. Gasch
C.E. Lawrence
E. Segal
E. Wingender
E.M. Conlon
G.Z. Hertz
H.J. Bussemaker
J.D. Hughes
N. Slonim
R.E. Schapire
T. Cover
T.I. Lee
T.L. Bailey
Y. Pilpel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature. Comment: RECOMB 200
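In the MEDUSA abstract above, a weak rule fires when a motif is present in a gene's promoter and a regulator is active in an experiment, and boosting combines such rules to predict differential expression. The sketch below shows plain discrete AdaBoost over such binary (motif, regulator) features. It is a simplified stand-in, not MEDUSA's actual boosting variant, and it omits MEDUSA's PSSM construction by agglomerative clustering. Feature names such as `motifA&reg1` and the toy data are hypothetical.

```python
import math

def adaboost(features, labels, rounds):
    """Discrete AdaBoost with one-feature weak rules.

    features: dict of feature name -> list of 0/1 values, one per example
              (hypothetically, 1 = motif present in the promoter AND regulator active)
    labels:   list of +1 (up-regulated) / -1 (down-regulated), one per example
    Returns a list of (feature name, polarity, alpha) weak rules.
    """
    n = len(labels)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for name, values in features.items():
            for polarity in (1, -1):
                # Weak rule: predict `polarity` when the feature is present, else the opposite.
                preds = [polarity if v else -polarity for v in values]
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, name, polarity, preds)
        err, name, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect rules
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((name, polarity, alpha))
        # Up-weight misclassified examples, down-weight correct ones, then renormalise.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, example):
    """Sign of the alpha-weighted vote; `example` maps feature name -> 0/1."""
    score = sum(alpha * (pol if example[name] else -pol) for name, pol, alpha in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: four (gene, experiment) examples, two candidate features.
features = {"motifA&reg1": [1, 1, 0, 0], "motifB&reg2": [1, 0, 1, 0]}
labels = [1, 1, -1, -1]
model = adaboost(features, labels, rounds=3)
print(predict(model, {"motifA&reg1": 1, "motifB&reg2": 0}))  # 1
```

Because each round reweights the examples the current rules get wrong, later rounds favour features that explain the remaining errors. This is the feature-selection behaviour the abstract attributes to boosting.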

arXiv.org e-Print Archive

CiteSeerX

Crossref

Automated linear motif discovery from protein interaction network

Author: Tan Soon Heng
Publication date: 08/03/2006

Master's thesis (Master of Science)

ScholarBank@NUS

The benefits of in silico modeling to identify possible small-molecule drugs and their off-target interactions

Author: Blomberg N
Choi SH
Hastings J
Hirschey J
Mire Zloh
Stewart B Kirton
Wang X
Publication venue: Future Science Ltd
Publication date: 30/01/2019

Accepted for publication in a future issue of Future Medicinal Chemistry. Research into the use of small molecules as drugs continues to be a key driver in the development of molecular databases, computer-aided drug design software and collaborative platforms. The evolution of computational approaches is driven by the essential criteria that a drug molecule has to fulfill, from affinity for its targets to minimal side effects, while having adequate absorption, distribution, metabolism and excretion (ADME) properties. A combination of ligand- and structure-based drug development approaches is already used to obtain consensus predictions of small-molecule activities and their off-target interactions. Further integration of these methods into easy-to-use workflows informed by systems biology could realize the full potential of available data in drug discovery and reduce the attrition of drug candidates. Peer reviewed

Crossref

University of Hertfordshire Research Archive

Typing tumors using pathways selected by somatic evolution.

Author: Huang Justin
Ideker Trey
Ma Jianzhu
Peng Jian
Shen John Paul
Wang Sheng
Zhang Wei
Publication venue: eScholarship, University of California
Publication date: 01/10/2018

Many recent efforts to analyze cancer genomes involve aggregating mutations within reference maps of molecular pathways and protein networks. Here, we find that these pathway studies are impeded by molecular interactions that are functionally irrelevant to cancer or to the patient's tumor type, as these interactions diminish the contrast of driver pathways relative to individual frequently mutated genes. This problem can be addressed by creating stringent tumor-specific networks of biophysical protein interactions, identified by signatures of epistatic selection during tumor evolution. Using such an evolutionarily selected pathway (ESP) map, we analyze the major cancer genome atlases to derive a hierarchical classification of tumor subtypes linked to characteristic mutated pathways. These pathways are clinically prognostic and predictive, including the TP53-AXIN-ARHGEF17 combination in liver cancer and CYLC2-STK11-STK11IP in lung cancer, which we validate in independent cohorts. This ESP framework substantially improves the definition of cancer pathways and subtypes from tumor genome data.
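The pathway-level aggregation described at the start of the abstract above can be sketched as counting, for each pathway, the fraction of tumors with at least one mutation in any member gene, then comparing that with single-gene frequencies. Tumor IDs, gene symbols (`G1`, `G2`, `G3`) and the pathway name `P_driver` are hypothetical toy data. The sketch does not implement the ESP method's epistatic-selection filtering.

```python
from collections import Counter

def gene_frequency(mutations):
    """Fraction of tumors in which each gene is mutated.

    mutations: dict of tumor ID -> set of mutated gene symbols
    """
    n = len(mutations)
    counts = Counter(g for genes in mutations.values() for g in genes)
    return {gene: c / n for gene, c in counts.items()}

def pathway_frequency(mutations, pathways):
    """Fraction of tumors with at least one mutated gene in each pathway.

    pathways: dict of pathway name -> set of member gene symbols
    """
    n = len(mutations)
    return {
        name: sum(1 for genes in mutations.values() if genes & members) / n
        for name, members in pathways.items()
    }

# Hypothetical toy cohort: G1 and G2 are mutated in different tumors,
# so the pathway containing both is hit more often than either gene alone.
mutations = {"t1": {"G1"}, "t2": {"G2"}, "t3": {"G3"}, "t4": set()}
pathways = {"P_driver": {"G1", "G2"}}
print(gene_frequency(mutations))               # G1, G2 and G3 each 0.25
print(pathway_frequency(mutations, pathways))  # {'P_driver': 0.5}
```

Pathway-level counting is what lets rarely mutated members add up to a recurrent signal. Conversely, padding a pathway with interactors that are irrelevant to the tumor type changes these counts without reflecting selection, which is the problem the abstract sets out to address.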

Directory of Open Access Journals

eScholarship - University of California