1,538 research outputs found

Equi-energy sampler with applications in statistical inference and statistical mechanics

Author: Kou S. C.
Wong Wing Hung
Zhou Qing
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006

We introduce a new sampling algorithm, the equi-energy sampler, for efficient statistical sampling and estimation. Complementary to the widely used temperature-domain methods, the equi-energy sampler, utilizing the temperature-energy duality, targets the energy directly. The focus on the energy function not only facilitates efficient sampling, but also provides a powerful means for statistical estimation, for example, the calculation of the density of states and microcanonical averages in statistical mechanics. The equi-energy sampler is applied to a variety of problems, including exponential regression in statistics, motif sampling in computational biology and protein folding in biophysics. Comment: This paper is discussed in [math.ST/0611217], [math.ST/0611219], [math.ST/0611221], [math.ST/0611222]. Rejoinder in [math.ST/0611224]. Published at http://dx.doi.org/10.1214/009053606000000515 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Harvard University - DASH

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

arXiv.org e-Print Archive

CiteSeerX

Crossref

Getting started in probabilistic graphical models

Author: Airoldi Edoardo M
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2007

Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But what exactly are they and how do they work? How can we use PGMs to discover patterns that are biologically relevant? And to what extent can PGMs help us formulate new hypotheses that are testable at the bench? This note sketches out some answers and illustrates the main ideas behind the statistical approach to biological pattern discovery. Comment: 12 pages, 1 figure

arXiv.org e-Print Archive

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

A Combined Motif Discovery Method

Author: Lu Daming
Publication venue: ScholarWorks@UNO
Publication date: 06/08/2009

A central problem in bioinformatics is to find the binding sites for regulatory motifs. This challenging problem provides a platform for applying a variety of data mining methods. In the efforts described here, a combined motif discovery method that uses mutual information and Gibbs sampling was developed. A new scoring scheme involving mutual information and joint information content was introduced. Simulated tempering was embedded into classic Gibbs sampling to avoid local optima. The method was applied to 18 DNA sequences containing CRP binding sites validated by Stormo, and the results were compared with Bioprospector. Based on the results, the new scoring scheme overcomes the limitation that the basic PWM model contains only single-position information. Simulated tempering proved to be an adaptive adjustment of the search strategy and showed much greater resistance to local optima.

University of New Orleans

STRUCTURE COMPARISON AND ALIGNMENT

Author: Emidio Capriotti
Ilya N. Shindyalov
Marc A. Marti-renom
Philip E. Bourne
Publication date: 01/01/2009

Not available

CiteSeerX

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

arXiv.org e-Print Archive

eScholarship - University of California

Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Author: BFJ Manly
CI Castillo-Davis
David Johnson
DB Searls
DD Womble
E Badidi
F Antequera
J Krueger
J Theilhaber
JD Wren
JF Costello
JM Claverie
Jonathan D Wren
JR Quinlan
K Davies
K Nakai
L Stein
Le Gruenwald
LV Zhang
M Ashburner
M Gardiner-Garden
M Safran
P Clark
RS Michalski
S Foissac
S Muggleton
SP Shah
TV Venkatesh
V Bajic
W Frawley
WM Shui
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2005

There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and succeeded in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central