
Bayesian modeling of recombination events in bacterial populations

Author: A Baldwin
A Rambaut
A Skalka
Adam Baldwin
C Fraser
Chris Dowson
CP Robert
CX Chan
D Falush
D Husmeier
D Posada
DJ Hand
E Mahenthiralingam
EHL Aarts
Eshwar Mahenthiralingam
FM Cohan
J Corander
J Felsenstein
J Hein
J Maynard Smith
JG Lawrence
JS Sinsheimer
Jukka Corander
JV Braun
M Arenas
M Hasegawa
MA Suchard
MJ Schervish
NC Grassly
P Marttinen
Pekka Marttinen
R Jain
RA Elton
S Sawyer
SA Sisson
VN Minin
William P Hanage
WJ Wiersinga
WP Hanage
X Didelot
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection that can characterize uncertainty probabilistically have limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at http://web.abo.fi/fak/mnf//mate/jc/software/brat.html

Online Research @ Cardiff

Springer - Publisher Connector

Warwick Research Archives Portal Repository

Investigating dynamic and energetic determinants of protein nucleic acid recognition: analysis of the zinc finger zif268-DNA complexes

Author: Caselle Michele
Colombo Giorgio
Moroni Elisabetta
Morra Giulia
Torella Rubben
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein-DNA recognition underlies fundamental biological processes ranging from transcription to replication and modification. Herein, we present a computational study of the sequence modulation of internal dynamic properties and of intraprotein networks of amino acid interactions that determine the stability and specificity of protein-DNA complexes. Results: To this aim, we apply novel theoretical approaches to analyze the dynamics and energetics of biological systems starting from MD trajectories. As a model system, we chose different sequences of Zinc Fingers (ZF) of the Zif268 family bound to different sequences of DNA. The complexes differ in their experimental stability properties, but share the same overall 3D structure and do not undergo structural modifications during the simulations. The results of our analysis suggest that the energy landscape for DNA binding may be populated by dynamically different states, even in the absence of major conformational changes. Energetic couplings between residues change in response to protein and/or DNA sequence variations, thus modulating the selectivity of recognition and the relative importance of different regions for binding. Conclusions: The results show differences in the organization of the intra-protein energy networks responsible for the stabilization of the protein conformations recognizing and binding DNA. These, in turn, are reflected in different modulation of the ZF's internal dynamics. The results also show a correlation between energetic and dynamic properties of the different proteins and their specificity/selectivity for DNA sequences. Finally, a dynamic and energetic model for the recognition of DNA by Zinc Fingers is proposed.

Archivio Istituzionale della Ricerca - Università degli Studi di Pavia

Springer - Publisher Connector

Binding of Transcription Factor GabR to DNA Requires Recognition of DNA Shape at a Location Distinct from its Cognate Binding Site

Author: Al-Zyoud Walid A.
Baker Matthew AB.
Böcking Till
Coster Adelle CF.
Duff Anthony P.
Ganuelas Lorraine A.
Gaus Katharina
Giannoulatou Eleni
Ho Joshua WK.
Hynson Robert MG.
Lee Lawrence K.
Liu Dali
Stewart Alastair G.
Publication venue: Loyola eCommons
Publication date: 17/12/2015

Mechanisms for transcription factor recognition of specific DNA base sequences are well characterized, and recent studies demonstrate that the shape of these cognate binding sites is also important. Here, we uncover a new mechanism where the transcription factor GabR simultaneously recognizes two cognate binding sites and the shape of a 29 bp DNA sequence that bridges these sites. Small-angle X-ray scattering and multi-angle laser light scattering are consistent with a model where the DNA undergoes a conformational change to bend around GabR during binding. In silico predictions suggest that the bridging DNA sequence is likely to be bendable in one direction, and kinetic analysis of mutant DNA sequences with biolayer interferometry allowed the independent quantification of the relative contribution of DNA base and shape recognition in the GabR–DNA interaction. These results indicate that the two cognate binding sites, as well as the bendability of the DNA sequence between these sites, are required to form a stable complex. The mechanism of GabR–DNA interaction provides an example where the correct shape of DNA, at a clearly distinct location from the cognate binding site, is required for transcription factor binding, and has implications for bioinformatics searches for novel binding sites.

arXiv.org e-Print Archive

UNSWorks

Loyola eCommons

A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis.

Author: Dhar Amrit
Matsen Frederick A
Minin Vladimir N
Ralph Duncan K
Publication venue: eScholarship, University of California
Publication date: 27/06/2019

The human body generates a diverse set of high-affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis focus on separately modeling the above processes. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification, such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling.

eScholarship - University of California

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Into the unknown: expression profiling without genome sequence information in CHO by next generation sequencing

Author: Aggarwal
Altschul
Andreas Weith
Birch
Birzele
Choi
Christoph Clemens
De Leon
Fabian Birzele
Fu
Gentleman
Hendrick
Hillier
Hitto Kaufmann
Hubbard
Jochen Schaub
Kahvejian
Kantardjieff
Langmead
Li
Lieberman-Aiden
Lu
Marioni
McKernan
Metzker
Mortazavi
Nicolas
Park
Patrick Baum
Schaub
Seth
Shendure
Smyth
Sultan
Tang
Tirone
Tobias Hildebrandt
Tomasini
Torsten W. Schulz
Vencio
Wang
Werner Rust
Wlaschin
Yee
Zerbino
Publication venue: Oxford University Press
Publication date

The arrival of next-generation sequencing (NGS) technologies has led to novel opportunities for expression profiling and genome analysis by utilizing vast amounts of short read sequence data. Here, we demonstrate that expression profiling in organisms lacking any genome or transcriptome sequence information is feasible by combining Illumina’s mRNA-seq technology with a novel bioinformatics pipeline that integrates assembled and annotated Chinese hamster ovary (CHO) sequences with information derived from related organisms. We applied this pipeline to the analysis of CHO cells, which were chosen as a model system owing to their relevance in the production of therapeutic proteins. Specifically, we analysed CHO cells undergoing butyrate treatment, which is known to affect cell cycle regulation and to increase the specific productivity of recombinant proteins. By this means, we identified sequences for >13 000 CHO genes, which added sequence information of ∼5000 novel genes to the CHO model. More than 6000 transcript sequences are predicted to be complete, as they covered >95% of the corresponding mouse orthologs. Detailed analysis of selected biological functions, such as DNA replication and cell cycle control, demonstrated the potential of NGS expression profiling in organisms without extended genome sequence to improve both data quantity and quality.