AMD, an Automated Motif Discovery Tool Using Stepwise Refinement of Gapped Consensuses

Author: A Barski
A Chakravarty
A Marson
A Stark
AD Smith
Ashok Aiyar
C Bi
C Linhart
CT Harbison
D GuhaThakurta
DS Johnson
E Redhead
E Valen
E Wijaya
FP Roth
G Laux
G Pavesi
H Ji
HJ Bussemaker
JD Hughes
Ji Zhang
Jiantao Shi
JM Vaquerizas
JS Carroll
K Weigelt
Kankan Wang
KD MacIsaac
L Ettwiller
M Kellis
M Lupien
M Tompa
MC Frith
Mingjie Chen
O Elemento
P Tamayo
PA Pevzner
S Prabhakar
S Sinha
SA Vokes
TL Bailey
V Matys
Wentao Yang
X Xie
XS Liu
Y Zhang
Yanzhi Du
Publication venue: Public Library of Science
Publication date: 01/01/2011

Motif discovery is essential for deciphering regulatory codes from high-throughput genomic data, such as those from ChIP-chip/seq experiments. However, many of the relevant tools reported to date lack effective and efficient methods for identifying long and gapped motifs. We describe here an automated tool that allows for de novo discovery of transcription factor binding sites, regardless of whether the motifs are long or short, gapped or contiguous.

CiteSeerX

Directory of Open Access Journals

USF binding sequences from the HS4 insulator element impose early replication timing on a vertebrate replicator

Author: Boggetto N.
Cadoret J-C.
Chilaka S.
Hassan-Zadeh V.
Ma M.
Prioleau M-N.
West A. G.
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

The nuclear genomes of vertebrates show a highly organized program of DNA replication in which GC-rich isochores are replicated early in S-phase, while AT-rich isochores replicate late. GC-rich regions are gene dense and are enriched for active transcription, suggesting a connection between gene regulation and replication timing. Insulator elements can organize independent domains of gene transcription and are suitable candidates for being key regulators of replication timing. We have tested the impact of inserting a strong replication origin flanked by the β-globin HS4 insulator on the replication timing of naturally late-replicating regions in two different avian cell types, DT40 (lymphoid) and 6C2 (erythroid). We find that the HS4 insulator has the capacity to impose a shift to earlier replication. This shift requires the presence of HS4 on both sides of the replication origin and results in an advance of replication timing of the target locus from the second half of S-phase to the first half when a transcribed gene is positioned nearby. Moreover, we find that the USF transcription factor binding site is the key cis-element inside the HS4 insulator that controls replication timing. Taken together, our data identify a combination of cis-elements that might constitute the basic unit of multi-replicon megabase-sized early domains of DNA replication.

Directory of Open Access Journals

Enlighten

FigShare

Bayesian statistical analysis of bacterial diversity

Author: Tang Jing
Publication venue: University of Helsinki Libraries
Publication date: 15/05/2009

Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation, which means that zero values are observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and complex empirical data sets. We show in the thesis that a model that takes into account dependencies between loci in MLST data can produce better clustering results than methods that assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational demand. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, provides an initial step toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
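
To make the notion of zero-inflation in the abstract above concrete, here is a minimal sketch of a textbook zero-inflated Poisson distribution. It is not either of the two strategies developed in the thesis, which the abstract does not specify. The function names and the parameters `lam` (Poisson mean) and `pi` (probability of a structural zero) are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing count k under a plain Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: with probability pi the value is a
    structural zero; otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    # Only the zero count receives the extra point mass pi.
    return pi + base if k == 0 else base


if __name__ == "__main__":
    lam, pi = 2.0, 0.3  # illustrative values, not taken from the thesis
    print("P(0) under Poisson:", poisson_pmf(0, lam))
    print("P(0) under zero-inflated Poisson:", zip_pmf(0, lam, pi))
```

With any `pi` greater than zero, the mixture assigns more probability to a zero count than the plain Poisson with the same `lam`, which is the excess of zeros the abstract describes for T-RFLP and FAME data.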

Helsingin yliopiston digitaalinen arkisto