17,527 research outputs found

The EM Algorithm and the Rise of Computational Biology

Author:
Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human

Author: Alvarez-Garcia
Angela Re
Calin
Chan
Chen
Corà
Daniela Taverna
Davide Corá
Elnitski
Esquela-Kerscher
Filipowicz
Gershengom
Griffiths-Jones
Gudmundsson
He
Hornstein
Hubbard
Iorio
Joglekar
John
Konagurthu
Krek
Ladd
Lai
Landgraf
Lee
Lewis
Liu
Loots
Martinez
Matys
Mazurie
Michele Caselle
Milo
Nielsen
O’Donnell
Pan
Pesole
Phan
Saini
Shalgi
Shen-Orr
Tsang
Wagner
Wilkerson
Xie
Zeller
Zhang
Zhao
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2009

In this work, we describe a computational framework for the genome-wide identification and characterization of mixed transcriptional/post-transcriptional regulatory circuits in humans. We concentrated in particular on feed-forward loops (FFLs), in which a master transcription factor regulates a microRNA and, together with it, a set of joint target protein-coding genes. The circuits were assembled with a two-step procedure. We first constructed the transcriptional and post-transcriptional components of the human regulatory network separately, by looking for conserved over-represented motifs in human and mouse promoters and 3'-UTRs. Then, we combined the two subnetworks, looking for mixed feed-forward regulatory interactions, and found a total of 638 putative (merged) FFLs. To investigate their biological relevance, we filtered these circuits using three selection criteria: (I) Gene Ontology enrichment among the joint targets of the FFL, (II) independent computational evidence for the regulatory interactions of the FFL, extracted from external databases, and (III) relevance of the FFL in cancer. Most of the selected FFLs seem to be involved in various aspects of organism development and differentiation. We finally discuss a few of the most interesting cases in detail. Comment: 51 pages, 5 figures, 4 tables. Supporting information included. Accepted for publication in Molecular BioSystem
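The second step of the procedure described in this abstract (combining the two subnetworks into mixed feed-forward loops) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `find_ffls`, the dictionary-of-sets representation, and the toy identifiers (`TF_A`, `miR-1`, `g1`, ...) are all hypothetical, and the motif discovery and filtering steps are assumed to have been done elsewhere.

```python
def find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene):
    """Return {(tf, mirna): joint_targets} for mixed feed-forward loops.

    A loop is reported when a TF regulates a miRNA and the two share
    at least one target gene (hypothetical, simplified criterion).
    """
    ffls = {}
    for tf, mirnas in tf_to_mirna.items():
        tf_targets = tf_to_gene.get(tf, set())
        for mirna in mirnas:
            # Joint targets: genes regulated by both the TF and the miRNA.
            joint = tf_targets & mirna_to_gene.get(mirna, set())
            if joint:
                ffls[(tf, mirna)] = joint
    return ffls


# Toy networks (made-up identifiers, for illustration only).
tf_to_mirna = {"TF_A": {"miR-1"}, "TF_B": {"miR-2"}}
tf_to_gene = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g4"}}
mirna_to_gene = {"miR-1": {"g2", "g3", "g5"}, "miR-2": {"g6"}}

ffls = find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene)
for (tf, mirna), targets in sorted(ffls.items()):
    print(tf, mirna, sorted(targets))
```

In this toy input only `TF_A` and `miR-1` share targets, so a single loop is reported; `TF_B` regulates `miR-2` but the two have no common target gene.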

arXiv.org e-Print Archive

Crossref

A multiple-instance scoring method to predict tissue-specific cis-regulatory motifs and regions

Author: Jin Gu
Publication date: 01/12/2009

Transcription is the central process of gene regulation. In higher eukaryotes, the transcription of a gene is usually regulated by multiple cis-regulatory regions (CRRs). In different tissues, different transcription factors bind to their cis-regulatory motifs in these CRRs to drive tissue-specific expression patterns of their target genes. By combining genome-wide gene expression data with genomic sequence data, we propose a multiple-instance scoring (MIS) method to predict tissue-specific motifs and the corresponding CRRs. The method rests mainly on the assumption that only a subset of the CRRs of an expressed gene function in the studied tissue. In tests on simulated datasets and a fly muscle dataset, MIS identifies true motifs when noise is high and shows higher specificity for predicting the tissue-specific functions of CRRs.

Crossref

Nature Precedings

Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits.

Author: Benaglio Paola
D'Antonio Matteo
D'Antonio-Chronowska Agnieszka
DeBoever Christopher
Donovan Margaret KR
Drees Frauke
Frazer Kelly A
Gaulton Kyle J
Li He
Ma Wubin
Matsui Hiroko
Rosenfeld Michael G
Singhal Sanghamitra
Smith Erin N
Sotoodehnia Nona
van Setten Jessica
Yang Feng
Young Greenwald William W
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes

eScholarship - University of California

Blueprint for a high-performance biomaterial: full-length spider dragline silk genes.

Author: Ayoub Nadia A
Collin Matthew A
Garb Jessica E
Hayashi Cheryl Y
Tinghitella Robin M
Publication venue: eScholarship, University of California
Publication date: 01/06/2007

Spider dragline (major ampullate) silk outperforms virtually all other natural and manmade materials in terms of tensile strength and toughness. For this reason, the mass-production of artificial spider silks through transgenic technologies has been a major goal of biomimetics research. Although all known arthropod silk proteins are extremely large (>200 kiloDaltons), recombinant spider silks have been designed from short and incomplete cDNAs, the only available sequences. Here we describe the first full-length spider silk gene sequences and their flanking regions. These genes encode the MaSp1 and MaSp2 proteins that compose the black widow's high-performance dragline silk. Each gene includes a single enormous exon (>9000 base pairs) that translates into a highly repetitive polypeptide. Patterns of variation among sequence repeats at the amino acid and nucleotide levels indicate that the interaction of selection, intergenic recombination, and intragenic recombination governs the evolution of these highly unusual, modular proteins. Phylogenetic footprinting revealed putative regulatory elements in non-coding flanking sequences. Conservation of both upstream and downstream flanking sequences was especially striking between the two paralogous black widow major ampullate silk genes. Because these genes are co-expressed within the same silk gland, there may have been selection for similarity in regulatory regions. Our new data provide complete templates for synthesis of recombinant silk proteins that significantly improve the degree to which artificial silks mimic natural spider dragline fibers

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources

Author: A Ambesi-Impiombato
A Bernard
A Beyer
A Sandelin
A Siepel
AFA Smit
Alistair G. Rust
B Ren
CE Lawrence
CL Warren
CP Robert
CT Harbison
D GuhaThakurta
D Husmeier
David Jones
DB Gordon
DJ Reiss
DJ Wilkinson
DT Holloway
E Blanco
E Segal
E Wingender
EH Davidson
G Chen
G Thijs
GD Stormo
GE Crawford
H Huang
H Lähdesmäki
H Steck
Harri Lähdesmäki
Ilya Shmulevich
IV Bajić
J Taylor
JD Hughes
JM Claverie
K Quandt
K Thomas
KD MacIsaac
KP Murphy
L Hertzberg
L Narlikar
L Zhang
M Eisenstein
M Kellis
M Levine
M Tompa
MA Beer
MC Frith
MF Berger
MJL de Hoon
ML Bulyk
N Friedman
N Rajewsky
ND Heintzman
O Hallikas
OV Kel-Margoulis
Q Zhou
R Siddharthan
R Staden
S Cawley
S Mukherjee
S Sinha
SB Montgomery
SJ Maerkl
SP Brooks
ST Jensen
T Chen
T Fawcett
T Reguly
TD Wu
TI Lee
TL Bailey
VD Marinescu
W Pan
WJ Kent
WP Lehrach
WW Wasserman
X Liu
X Xie
XS Liu
Y Barash
Y Qi
Y Tamada
Publication venue: Public Library of Science
Publication date: 01/03/2008

An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. A test data set, the web tool, source code and supplementary data are available at: http://www.probtf.org

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Heart enhancers with deeply conserved regulatory activity are established early in zebrafish development.

Author: Bruneau Benoit G
Devine Patrick
Scott Ian C
Song Mengyi
Wilson Michael D
Yuan Xuefei
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

During the phylotypic period, embryos from different genera show similar gene expression patterns, implying common regulatory mechanisms. Here we set out to identify enhancers involved in the initial events of cardiogenesis, which occurs during the phylotypic period. We isolate early cardiac progenitor cells from zebrafish embryos and characterize 3838 open chromatin regions specific to this cell population. Of these regions, 162 overlap with conserved non-coding elements (CNEs) that also map to open chromatin regions in human. Most of the zebrafish conserved open chromatin elements tested drive gene expression in the developing heart. Despite modest sequence identity, human orthologous open chromatin regions recapitulate the spatiotemporal expression patterns of the zebrafish sequence, potentially providing a basis for phylotypic gene expression patterns. Genome-wide, we discover 5598 zebrafish-human conserved open chromatin regions, suggesting that a diverse repertoire of ancient enhancers is established prior to organogenesis and the phylotypic period.

Directory of Open Access Journals

eScholarship - University of California

Phylogeny based discovery of regulatory elements

Author: Cohen Barak A
Fay Justin C
Gertz Jason
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences. RESULTS: We developed a probabilistic framework that combines an HKY85 model, which assigns probabilities to different base substitutions between species, and weight matrix models of transcription factor binding sites, which describe the probabilities of observing particular nucleotides at specific positions in the binding site. The method incorporates the phylogenies of the species under consideration and takes into account the position specific variation of transcription factor binding sites. Using our framework we assessed the suitability of alignments of genomic sequences from commonly used species as substrates for comparative genomic approaches to regulatory motif finding. We then applied this technique to Saccharomyces cerevisiae and related species by examining all possible six base pair DNA sequences (hexamers) and identifying sequences that are conserved in a significant number of promoters. By combining similar conserved hexamers we reconstructed known cis-regulatory motifs and made predictions of previously unidentified motifs. We tested one prediction experimentally, finding it to be a regulatory element involved in the transcriptional response to glucose. CONCLUSION: The experimental validation of a regulatory element prediction missed by other large-scale motif finding studies demonstrates that our approach is a useful addition to the current suite of tools for finding regulatory motifs
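The weight matrix component mentioned in this abstract, which gives the probability of each nucleotide at each position of a binding site, can be illustrated with a small log-odds scan. This is only a sketch under stated assumptions: the matrix values, the uniform background, and the function names `log_odds_score` and `best_site` are hypothetical, and the phylogenetic (HKY85) part of the authors' framework is not modeled here.

```python
import math

BASES = "ACGT"


def log_odds_score(site, pwm, background=None):
    """Sum of log2(P(base | position) / P(base | background)) over a site."""
    background = background or {b: 0.25 for b in BASES}
    return sum(math.log2(pwm[i][b] / background[b]) for i, b in enumerate(site))


def best_site(sequence, pwm):
    """Scan every window of the matrix width; return (score, start index)."""
    width = len(pwm)
    scores = [
        (log_odds_score(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Hypothetical 3-column weight matrix whose consensus is "TAG".
pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

score, position = best_site("CCTAGCC", pwm)
print(position, round(score, 3))
```

Windows that match the consensus score above zero, while windows built from low-probability bases score below zero, which is what lets a scan separate candidate sites from background sequence.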

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker