Spatial clustering of array CGH features in combination with hierarchical multiple testing
We propose a new approach for clustering DNA features using array CGH data
from multiple tumor samples. We distinguish data-collapsing, which joins contiguous
DNA clones or probes with extremely similar data into regions, from clustering,
which joins contiguous, correlated regions based on a maximum likelihood principle.
The model-based clustering algorithm accounts for the apparent spatial patterns
in the data. We evaluate the randomness of the clustering result by a cluster
stability score in combination with cross-validation. Moreover, we argue that
the clustering really captures spatial genomic dependency by showing that
coincidental clustering of independent regions is very unlikely. Using the
region and cluster information, we combine tests of both for association
with a clinical variable in a hierarchical multiple testing approach. This
allows for interpreting the significance of both regions and clusters while
controlling the Family-Wise Error Rate simultaneously. We prove that, in the
context of permutation tests and permutation-invariant clusters, it is valid
to perform clustering and testing on the same data set. Our procedures are
illustrated on two cancer data sets.
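The abstract does not spell out the testing scheme. As a hedged illustration of how a two-level hierarchy can be tested top-down, the sketch below uses a size-weighted scheme, which is an assumption and not necessarily the authors' procedure. Each cluster is tested at a level proportional to the number of regions it contains, and a cluster's regions are tested only if the cluster itself is rejected. The p-values are assumed to come from permutation tests computed elsewhere. All function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def hierarchical_test(
    clusters: Dict[str, List[str]],
    p_cluster: Dict[str, float],
    p_region: Dict[str, float],
    alpha: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Two-level top-down testing: clusters first, then regions of rejected clusters.

    Hypothetical sketch: each node is tested at a level proportional to the
    number of regions it contains, so the cluster levels sum to alpha and a
    region's level never exceeds that of its parent cluster.
    """
    n_regions = sum(len(regions) for regions in clusters.values())
    rejected_clusters: List[str] = []
    rejected_regions: List[str] = []
    for name, regions in clusters.items():
        cluster_level = alpha * len(regions) / n_regions
        if p_cluster[name] > cluster_level:
            continue  # cluster not rejected, so its regions are never tested
        rejected_clusters.append(name)
        region_level = alpha / n_regions  # a single region has size 1
        for region in regions:
            if p_region[region] <= region_level:
                rejected_regions.append(region)
    return rejected_clusters, rejected_regions


# Hypothetical p-values for two clusters covering four regions.
clusters = {"c1": ["r1", "r2", "r3"], "c2": ["r4"]}
p_cluster = {"c1": 0.01, "c2": 0.2}
p_region = {"r1": 0.001, "r2": 0.03, "r3": 0.5, "r4": 0.001}
print(hierarchical_test(clusters, p_cluster, p_region))
```

In this toy input, region r4 has a small p-value but is never tested, because its cluster c2 is not rejected. This gatekeeping is what lets significance be read at both the cluster and the region level.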
Genomic Diversity among Beijing and non-Beijing Mycobacterium tuberculosis Isolates from Myanmar
Background: The Beijing family of Mycobacterium tuberculosis is dominant in countries in East Asia. Genomic polymorphisms are a source of diversity within the M. tuberculosis genome and may account for the variation in virulence among M. tuberculosis isolates. To date, no studies have examined the genomic composition of M. tuberculosis isolates from Myanmar, a high TB-burden country.
Methodology/Principal Findings: Twenty-two M. tuberculosis isolates from Myanmar were screened on whole-genome arrays containing genes from M. tuberculosis H37Rv, M. tuberculosis CDC1551 and M. bovis AF22197. Screening identified 198 deletions or extra regions in the clinical isolates compared with H37Rv. Twenty-two regions differentiated between Beijing and non-Beijing isolates and were verified by PCR on an additional 40 isolates. Six regions (Rv0071-0074 [RD105], Rv1572-1576c [RD149], Rv1585c-1587c [RD149], MT1798-Rv1755c [RD152], Rv1761c [RD152] and Rv0279c) were deleted in Beijing isolates, of which 4 (Rv1572-1576c, Rv1585c-1587c, MT1798-Rv1755c and Rv1761c) were variably deleted among ST42 isolates, indicating a closer relationship between the Beijing and ST42 lineages. The TbD1 region, Mb1582-Mb1583, was deleted in Beijing and ST42 isolates. One M. bovis gene of unknown function, Mb3184c, was present in all isolates except 11 of 13 ST42 isolates. The CDC1551 gene MT1360, coding for a putative adenylate cyclase, was present in all Beijing and ST42 isolates (except 1). The pks15/1 gene, coding for a putative virulence factor, was intact in all Beijing and non-Beijing isolates except the ST42 and ST53 isolates.
Conclusion: This study describes previously unreported deletions/extra regions in Beijing and non-Beijing M. tuberculosis isolates. The modern and highly frequent ST42 lineage showed a closer relationship to the hypervirulent Beijing lineage than to the ancient non-Beijing lineages. The pks15/1 gene was disrupted only in modern non-Beijing isolates. This is the first report of an in-depth analysis of the genomic diversity of M. tuberculosis isolates from Myanmar.
Generalized Species Sampling Priors with Latent Beta reinforcements
Many popular Bayesian nonparametric priors can be characterized in terms of
exchangeable species sampling sequences. However, in some applications,
exchangeability may not be appropriate. We introduce a novel and
probabilistically coherent family of non-exchangeable species sampling
sequences characterized by a tractable predictive probability function with
weights driven by a sequence of independent Beta random variables. We compare
their theoretical clustering properties with those of the Dirichlet Process and
the two-parameter Poisson-Dirichlet process. Unlike existing work, the proposed
construction provides a complete characterization of the joint process. We then
propose using this process as a prior distribution in
a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte
Carlo sampler for posterior inference. We evaluate the performance of the prior
and the robustness of the resulting inference in a simulation study, providing
a comparison with popular Dirichlet process mixtures and Hidden Markov
Models. Finally, we develop an application to the detection of chromosomal
aberrations in breast cancer by leveraging array CGH data.
Comment: Corresponding authors: Edoardo M. Airoldi, Federico Bassetti,
Michele Guindani and Fabrizio Leisen. To appear in the Journal of the
American Statistical Association.
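The abstract benchmarks the proposed Beta-reinforced sequences against the Dirichlet process and the two-parameter Poisson-Dirichlet process. The Beta-driven weights of the new construction are not given here, so the sketch below is only a point of reference. It computes the standard predictive probability function of the two baseline priors: given cluster counts from past observations, it returns the probability that the next observation joins each existing cluster or opens a new one. Setting `sigma = 0` recovers the Dirichlet process. The function name is hypothetical.

```python
from typing import List


def pitman_yor_predictive(counts: List[int], theta: float, sigma: float = 0.0) -> List[float]:
    """Predictive probabilities of the two-parameter Poisson-Dirichlet process.

    counts[j] is the number of past observations in cluster j. Returns the
    probabilities of joining each existing cluster, followed by the
    probability of opening a new cluster. sigma = 0 gives the Dirichlet process.
    """
    if not 0.0 <= sigma < 1.0 or theta <= -sigma:
        raise ValueError("require 0 <= sigma < 1 and theta > -sigma")
    n = sum(counts)
    k = len(counts)
    denom = theta + n
    # Existing cluster j is chosen with probability (n_j - sigma) / (theta + n).
    probs = [(n_j - sigma) / denom for n_j in counts]
    # A new cluster is opened with probability (theta + sigma * k) / (theta + n).
    probs.append((theta + sigma * k) / denom)
    return probs


print(pitman_yor_predictive([3, 1], theta=1.0))             # Dirichlet process
print(pitman_yor_predictive([3, 1], theta=1.0, sigma=0.5))  # two-parameter process
```

In both baselines the weights depend only on the cluster counts, not on the order of the observations. That is the exchangeability the abstract describes relaxing through a sequence of independent Beta variables. With `sigma > 0`, the probability of a new cluster also grows with the number of clusters already present.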
Genetic Variation in Spatio-Temporal Confined USA300 Community-Associated MRSA Isolates: A Shift from Clonal Dispersion to Genetic Evolution?
INTRODUCTION: Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) are increasingly isolated, with USA300-0114 being the predominant clone in the USA. Comparative whole genome sequencing of USA300 isolates collected in 2002, 2003 and 2005 showed a limited number of single nucleotide polymorphisms and regions of difference. This suggests that USA300 has undergone rapid clonal expansion without great genomic diversification. However, whole genome comparison of CA-MRSA has been limited to isolates belonging to USA300. The aim of this study was to compare the genetic repertoire of different CA-MRSA clones with that of HA-MRSA from the USA and Europe through comparative genomic hybridization (CGH) to identify genetic clues that may explain the successful and rapid emergence of CA-MRSA.
MATERIALS AND METHODS: Hierarchical clustering based on CGH of 48 MRSA isolates from the community and nosocomial infections from Europe and the USA revealed dispersed clustering of the 19 CA-MRSA isolates, indicating that these isolates do not share a unique genetic make-up. Only the PVL genes were present in all CA-MRSA isolates. However, 10 genes were variably present among 14 USA300 isolates. Most of these genes were present on mobile elements.
CONCLUSION: The genetic variation present among the 14 USA300 isolates is remarkable given that the isolates were recovered within one month and originated from a confined geographic area, suggesting continuous evolution of this clone.
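The contrast between genes present in all CA-MRSA isolates and genes "variably present" among the USA300 isolates comes down to the intersection versus the union of per-isolate presence calls. The sketch below assumes that CGH signals have already been thresholded into presence/absence sets. The isolate names and the placeholder genes `geneX` and `geneY` are hypothetical; `lukS-PV` and `lukF-PV` are the two PVL genes.

```python
from typing import Dict, Set, Tuple


def core_and_variable_genes(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into those present in every isolate and those present in only some."""
    profiles = list(presence.values())
    if not profiles:
        return set(), set()
    core = set.intersection(*profiles)  # present in all isolates
    variable = set.union(*profiles) - core  # present in at least one isolate, but not all
    return core, variable


# Hypothetical presence calls; only the PVL gene names are real.
calls = {
    "isolate_1": {"lukS-PV", "lukF-PV", "geneX"},
    "isolate_2": {"lukS-PV", "lukF-PV"},
    "isolate_3": {"lukS-PV", "lukF-PV", "geneY"},
}
core, variable = core_and_variable_genes(calls)
print(sorted(core), sorted(variable))
```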
A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNV) and
identifying their significant associations with mRNA transcript abundance. We
demonstrate performance on simulated data and illustrate an application to data
from a genomic study on human cancer cell lines.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
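The abstract relates latent copy number states to the observed CGH measurements through a hidden Markov model and infers them by MCMC stochastic search with selection priors. The sketch below illustrates only the HMM layer, in a much simpler form that is neither the authors' model nor their sampler. It uses Viterbi decoding to recover the most probable state sequence from CGH log-ratios, assuming three states (loss, neutral, gain) with Gaussian emissions. The state means, noise level and self-transition probability are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence

STATES = ("loss", "neutral", "gain")
MEANS = (-0.5, 0.0, 0.5)  # hypothetical mean log-ratio per state
SD = 0.2                  # hypothetical measurement noise
STAY = 0.9                # hypothetical probability of keeping the same state


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(log_ratios: Sequence[float]) -> List[str]:
    """Most probable copy number state path under the toy HMM above."""
    if not log_ratios:
        return []
    k = len(STATES)
    switch = (1 - STAY) / (k - 1)  # remaining mass spread evenly over other states
    log_trans = [[math.log(STAY if i == j else switch) for j in range(k)] for i in range(k)]
    # Uniform initial distribution plus emission of the first probe.
    scores = [math.log(1 / k) + log_normal_pdf(log_ratios[0], MEANS[s], SD) for s in range(k)]
    back: List[List[int]] = []
    for x in log_ratios[1:]:
        new_scores, ptr = [], []
        for j in range(k):
            # Best predecessor state for ending in state j at this probe.
            best_i = max(range(k), key=lambda i: scores[i] + log_trans[i][j])
            new_scores.append(scores[best_i] + log_trans[best_i][j] + log_normal_pdf(x, MEANS[j], SD))
            ptr.append(best_i)
        scores = new_scores
        back.append(ptr)
    # Trace the best path backwards from the highest-scoring final state.
    state = max(range(k), key=lambda s: scores[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


print(viterbi([0.02, -0.01, 0.55, 0.6, 0.48, -0.05]))
```

The self-transition probability plays the role the abstract assigns to dependence across adjacent copy number states. A higher `STAY` favors longer segments and suppresses isolated single-probe calls.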
Three-dimensional scanless holographic optogenetics with temporal focusing (3D-SHOT).
Optical methods capable of manipulating neural activity with cellular resolution and millisecond precision in three dimensions will accelerate the pace of neuroscience research. Existing approaches for targeting individual neurons, however, fall short of these requirements. Here we present a new multiphoton photo-excitation method, termed three-dimensional scanless holographic optogenetics with temporal focusing (3D-SHOT), which allows precise, simultaneous photo-activation of arbitrary sets of neurons anywhere within the addressable volume of a microscope. This technique uses point-cloud holography to place multiple copies of a temporally focused disc matching the dimensions of a neuron's cell body. Experiments in cultured cells, brain slices, and living mice demonstrate single-neuron spatial resolution even when optically targeting randomly distributed groups of neurons in 3D. This approach opens new avenues for mapping and manipulating neural circuits, allowing a real-time, cellular-resolution interface to the brain.