37,437 research outputs found
A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNV) and
identifying their significant associations with mRNA transcripts abundance. We
show performance on simulated data and illustrate an application to data from a
genomic study on human cancer cell lines.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS705 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution
Reconstructing DNA copy number by joint segmentation of multiple sequences
The variation in DNA copy number carries information on the modalities of
genome evolution and misregulation of DNA replication in cancer cells; its
study can be helpful to localize tumor suppressor genes, distinguish different
populations of cancerous cell, as well identify genomic variations responsible
for disease phenotypes. A number of different high throughput technologies can
be used to identify copy number variable sites, and the literature documents
multiple effective algorithms. We focus here on the specific problem of
detecting regions where variation in copy number is relatively common in the
sample at hand: this encompasses the cases of copy number polymorphisms,
related samples, technical replicates, and cancerous sub-populations from the
same individual. We present an algorithm based on regularization approaches
with significant computational advantages and competitive accuracy. We
illustrate its applicability with simulated and real data sets.Comment: 54 pages, 5 figure
Comparison of TCGA and GENIE genomic datasets for the detection of clinically actionable alterations in breast cancer.
Whole exome sequencing (WES), targeted gene panel sequencing and single nucleotide polymorphism (SNP) arrays are increasingly used for the identification of actionable alterations that are critical to cancer care. Here, we compared The Cancer Genome Atlas (TCGA) and the Genomics Evidence Neoplasia Information Exchange (GENIE) breast cancer genomic datasets (array and next generation sequencing (NGS) data) in detecting genomic alterations in clinically relevant genes. We performed an in silico analysis to determine the concordance in the frequencies of actionable mutations and copy number alterations/aberrations (CNAs) in the two most common breast cancer histologies, invasive lobular and invasive ductal carcinoma. We found that targeted sequencing identified a larger number of mutational hotspots and clinically significant amplifications that would have been missed by WES and SNP arrays in many actionable genes such as PIK3CA, EGFR, AKT3, FGFR1, ERBB2, ERBB3 and ESR1. The striking differences between the number of mutational hotspots and CNAs generated from these platforms highlight a number of factors that should be considered in the interpretation of array and NGS-based genomic data for precision medicine. Targeted panel sequencing was preferable to WES to define the full spectrum of somatic mutations present in a tumor
Recommended from our members
Impacts of florfenicol on the microbiota landscape and resistome as revealed by metagenomic analysis.
BACKGROUND:Drug-resistant fish pathogens can cause significant economic loss to fish farmers. Since 2012, florfenicol has become an approved drug for treating both septicemia and columnaris diseases in freshwater fish. Due to the limited drug options available for aquaculture, the impact of the therapeutical florfenicol treatment on the microbiota landscape as well as the resistome present in the aquaculture farm environment needs to be evaluated. RESULTS:Time-series metagenomic analyses were conducted to the aquatic microbiota present in the tank-based catfish production systems, in which catfish received standard therapeutic 10-day florfenicol treatment following the federal veterinary regulations. Results showed that the florfenicol treatment shifted the structure of the microbiota and reduced the biodiversity of it by acting as a strong stressor. Planctomycetes, Chloroflexi, and 13 other phyla were susceptible to the florfenicol treatment and their abundance was inhibited by the treatment. In contrast, the abundance of several bacteria belonging to the Proteobacteria, Bacteroidetes, Actinobacteria, and Verrucomicrobia phyla increased. These bacteria with increased abundance either harbor florfenicol-resistant genes (FRGs) or had beneficial mutations. The florfenicol treatment promoted the proliferation of florfenicol-resistant genes. The copy number of phenicol-specific resistance genes as well as multiple classes of antibiotic-resistant genes (ARGs) exhibited strong correlations across different genetic exchange communities (p < 0.05), indicating the horizontal transfer of florfenicol-resistant genes among these bacterial species or genera. Florfenicol treatment also induced mutation-driven resistance. Significant changes in single-nucleotide polymorphism (SNP) allele frequencies were observed in membrane transporters, genes involved in recombination, and in genes with primary functions of a resistance phenotype. CONCLUSIONS:The therapeutical level of florfenicol treatment significantly altered the microbiome and resistome present in catfish tanks. Both intra-population and inter-population horizontal ARG transfer was observed, with the intra-population transfer being more common. The oxazolidinone/phenicol-resistant gene optrA was the most prevalent transferred ARG. In addition to horizontal gene transfer, bacteria could also acquire florfenicol resistance by regulating the innate efflux systems via mutations. The observations made by this study are of great importance for guiding the strategic use of florfenicol, thus preventing the formation, persistence, and spreading of florfenicol-resistant bacteria and resistance genes in aquaculture
Gene expression in Leishmania is regulated predominantly by gene dosage
ABSTRACT Leishmania tropica, a unicellular eukaryotic parasite present in North and East Africa, the Middle East, and the Indian subcontinent, has been linked to large outbreaks of cutaneous leishmaniasis in displaced populations in Iraq, Jordan, and Syria. Here, we report the genome sequence of this pathogen and 7,863 identified protein-coding genes, and we show that the majority of clinical isolates possess high levels of allelic diversity, genetic admixture, heterozygosity, and extensive aneuploidy. By utilizing paired genome-wide high-throughput DNA sequencing (DNA-seq) with RNA-seq, we found that gene dosage, at the level of individual genes or chromosomal “somy” (a general term covering disomy, trisomy, tetrasomy, etc.), accounted for greater than 85% of total gene expression variation in genes with a 2-fold or greater change in expression. High gene copy number variation (CNV) among membrane-bound transporters, a class of proteins previously implicated in drug resistance, was found for the most highly differentially expressed genes. Our results suggest that gene dosage is an adaptive trait that confers phenotypic plasticity among natural Leishmania populations by rapid down- or upregulation of transporter proteins to limit the effects of environmental stresses, such as drug selection. IMPORTANCE Leishmania is a genus of unicellular eukaryotic parasites that is responsible for a spectrum of human diseases that range from cutaneous leishmaniasis (CL) and mucocutaneous leishmaniasis (MCL) to life-threatening visceral leishmaniasis (VL). Developmental and strain-specific gene expression is largely thought to be due to mRNA message stability or posttranscriptional regulatory networks for this species, whose genome is organized into polycistronic gene clusters in the absence of promoter-mediated regulation of transcription initiation of nuclear genes. Genetic hybridization has been demonstrated to yield dramatic structural genomic variation, but whether such changes in gene dosage impact gene expression has not been formally investigated. Here we show that the predominant mechanism determining transcript abundance differences (>85%) in Leishmania tropica is that of gene dosage at the level of individual genes or chromosomal somy
- …