7,278 research outputs found

SLIDER: Mining correlated motifs in protein-protein interaction networks

Author: Boyen P.
Dijk A.D.J., van
Ham R.C.H.J., van
Neven F.
Publication date: 01/01/2009

Abstract: Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the method SLIDER, which uses local search with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks.
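As a rough illustration of what a Chi-square-based support measure for a motif pair could look like, the sketch below builds a 2x2 contingency table over all protein pairs (covered by the motif pair or not, interacting or not) and returns the Pearson chi-square statistic. This is a simplified assumption and not necessarily the exact definition used by SLIDER; the function name, the use of regular expressions as motifs, and the toy data in the usage note are all hypothetical.

```python
import re
from itertools import combinations


def chi_square_support(motif_a, motif_b, sequences, interactions):
    """Pearson chi-square of 'covered by motif pair' vs. 'interacting'.

    Assumption: a protein pair is 'covered' when one protein matches
    motif_a and the other matches motif_b (motifs given as regexes).
    """
    ra, rb = re.compile(motif_a), re.compile(motif_b)
    has_a = {p for p, s in sequences.items() if ra.search(s)}
    has_b = {p for p, s in sequences.items() if rb.search(s)}
    edges = {frozenset(e) for e in interactions}

    # Contingency counts: a = covered and interacting, b = covered only,
    # c = interacting only, d = neither.
    a = b = c = d = 0
    for p, q in combinations(sorted(sequences), 2):
        covered = (p in has_a and q in has_b) or (p in has_b and q in has_a)
        interacting = frozenset((p, q)) in edges
        if covered and interacting:
            a += 1
        elif covered:
            b += 1
        elif interacting:
            c += 1
        else:
            d += 1

    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A degenerate table (an empty row or column) carries no signal.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom
```

For example, with four toy proteins where every pair covered by ("KL", "RW") also interacts, the statistic is as large as the table allows, whereas a motif that matches no protein yields 0.0. A local search such as the one the abstract describes would then compare this score across neighbouring motif pairs.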

Wageningen University & Research Publications

Automated linear motif discovery from protein interaction network

Author: TAN SOON HENG
Publication date: 08/03/2006

Master's: MASTER OF SCIENCE

ScholarBank@NUS

The Mathematics of Phylogenomics

Author: Pachter Lior
Sturmfels Bernd
Publication date: 01/01/2004

The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics. Comment: 41 pages, 4 figures

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors

Digital Ecosystems: Ecosystem-Oriented Architectures

Author: Briscoe Gerard
De Wilde Philippe
Sadedin Suzanne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/08/2011

We view Digital Ecosystems as the digital counterparts of biological ecosystems. Here, we are concerned with the creation of these Digital Ecosystems, exploiting the self-organising properties of biological ecosystems to evolve high-level software applications. Therefore, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, migration of agents which are distributed in a decentralised peer-to-peer network, operating continuously in time; this process feeds a second optimisation based on evolutionary computing that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. The Digital Ecosystem was then measured experimentally through simulations, with measures originating from theoretical ecology, evaluating its likeness to biological ecosystems. This included its responsiveness to requests for applications from the user base, as a measure of the ecological succession (ecosystem maturity). Overall, we have advanced the understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures where the word ecosystem is more than just a metaphor. Comment: 39 pages, 26 figures, journa

arXiv.org e-Print Archive

Heriot Watt Pure

Kent Academic Repository

06201 Abstracts Collection -- Combinatorial and Algorithmic Foundations of Pattern and Association Discovery

Author: Ahlswede Rudolf
Apostolico Alberto
Levenshtein Vladimir I.
Publication venue: Dagstuhl Seminar Proceedings. 06201 - Combinatorial and Algorithmic Foundations of Pattern and Association Discovery
Publication date: 01/01/2006

From 15.05.06 to 20.05.06, the Dagstuhl Seminar 06201 "Combinatorial and Algorithmic Foundations of Pattern and Association Discovery" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Dagstuhl Research Online Publication Server

High resolution mapping of Twist to DNA in Drosophila embryos: Efficient functional analysis and evolutionary conservation

Author: Dunipace Leslie
Fisher-Aylor Katherine I.
McCue Kenneth
Ogawa Nobuo
Ozdemir Anil
Pepke Shirley
Samanta Manoj
Stathopoulos Angelike
Wold Barbara J.
Zeng Lucy
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/04/2011

Cis-regulatory modules (CRMs) function by binding sequence specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif located within 50 bp of the ChIP summit and, of these, CACATG was most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites having distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of ±150 bp surrounding Twist occupied summits, and enrichment of GA- and CA-repeat sequences near Twist occupied summits. Our results show that high resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic

Caltech Authors

Pattern discovery in sequence databases: algorithms and applications to DNA/protein classification

Author: Chirn Gung-Wei
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1996

Sequence databases comprise sequence data, which are linear structural descriptions of many natural entities. Approximate pattern discovery in a sequence database can lead to important conclusions or prediction of new phenomena. Traditional database technology is not suitable for accomplishing the task, and new techniques need to be developed. In this dissertation, we propose several new techniques for discovering patterns in sequence databases. Our techniques incorporate pattern matching algorithms and novel heuristics for discovery and optimization. Experimental results of applying the techniques to both generated data and DNA/proteins show the effectiveness of the proposed techniques. We then develop several classifiers using our pattern discovery algorithms and a previously published fingerprint technique. When we apply the classifiers to classify DNA and protein sequences, they give information that is complementary to the best classifiers available today

Digital Commons @ New Jersey Institute of Technology (NJIT)

Design of a combinatorial DNA microarray for protein-DNA interaction studies

Author: CE Lawrence
CL Warren
CT Harbison
EH Davidson
H Bolouri
JD Hughes
JK Wang
Julian Mintseris
LW Hillier
Michael B Eisen
ML Bulyk
RD Egeland
S Mukherjee
SS Skiena
TI Lee
TJ Albert
V Matys
X Liu
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Discovery of precise specificity of transcription factors is an important step on the way to understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays were developed as a potentially scalable approach to tackle transcription factor binding site identification. RESULTS: Here we present an algorithmic approach to experimental design of a microarray that allows for testing full specificity of a transcription factor binding to all possible DNA binding sites of a given length, with optimally efficient use of the array. This design is universal, works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays is easier to analyze and would result in more precise identification of binding sites. CONCLUSION: In this study, we present a design of a double stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems

Crossref

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

PubMed Central

UNT Digital Library

Mining non-contiguous mutation chain in biological sequences based on 3D-structure

Author: HUANG WEI
Publication date: 07/03/2011

Master's: MASTER OF SCIENCE

ScholarBank@NUS

A Combined Motif Discovery Method

Author: Lu Daming
Publication venue: ScholarWorks@UNO
Publication date: 06/08/2009

A central problem in bioinformatics is to find the binding sites for regulatory motifs. This challenging problem provides a platform for applying a variety of data mining methods. In the work described here, a combined motif discovery method that uses mutual information and Gibbs sampling was developed. A new scoring scheme was introduced that involves mutual information and joint information content. Simulated tempering was embedded into classic Gibbs sampling to avoid local optima. The method was applied to 18 DNA sequences containing CRP binding sites validated by Stormo, and the results were compared with BioProspector. Based on the results, the new scoring scheme can overcome the limitation that the basic PWM model captures only single-position information. Simulated tempering proved to be an adaptive adjustment of the search strategy and showed a much increased resistance to local optima.
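The point about a PWM carrying only single-position information can be illustrated with a plug-in estimate of the mutual information between two columns of aligned sites. This is a generic sketch under the assumption of equal-length, gap-free aligned sites; it is not the scoring scheme of the thesis, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(sites, i, j):
    """Mutual information (in bits) between columns i and j of aligned sites.

    Uses observed frequencies directly (no pseudocounts), so estimates
    from very few sites are biased upward.
    """
    n = len(sites)
    count_i = Counter(s[i] for s in sites)          # marginal counts, column i
    count_j = Counter(s[j] for s in sites)          # marginal counts, column j
    count_ij = Counter((s[i], s[j]) for s in sites)  # joint counts

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi
```

Two columns that always co-vary (for instance sites "AT", "AT", "CG", "CG") give 1 bit, while independent columns give 0 bits. A PWM built from either set has the same per-column frequencies regardless of this dependence, which is the gap a mutual-information term is meant to fill.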