
    Aspects of coverage in medical DNA sequencing

    Background: DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations.
    Results: We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. We derive the expected diploid multi-fold coverage and its generalization to aneuploidy; these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e., for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts that these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant.
    Conclusion: Given the assumptions of standard coverage theory, our model gives pragmatic estimates for the required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.
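As a rough illustration of the coverage quantities in this abstract, the sketch below assumes Lander-Waterman-style conditions: per-base depth at redundancy c is Poisson-distributed, so the haploid fraction covered at least once is $1 - e^{-c}$. For m copies of the genome (m = 2 for diploid), it further assumes that reads split evenly across copies, giving each copy an independent Poisson(c/m) depth, and asks that every copy be read at least k times. This is a simplified model chosen for illustration, not the paper's derivation, and the function names are hypothetical.

```python
import math


def poisson_tail(mean, k):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term           # accumulate P(X = i)
        term *= mean / (i + 1)  # advance to P(X = i + 1)
    return 1.0 - below


def prob_all_copies_covered(redundancy, ploidy=2, k=1):
    """P(every one of `ploidy` copies of a site is read at least k times).

    Assumption (hypothetical model): reads split evenly across copies, so each
    copy's depth is an independent Poisson(redundancy / ploidy) variable.
    """
    return poisson_tail(redundancy / ploidy, k) ** ploidy


tumor = prob_all_copies_covered(26, ploidy=2, k=2)    # both tumor alleles seen >= 2 times
normal = prob_all_copies_covered(21, ploidy=2, k=1)   # both normal alleles seen >= 1 time
haploid = prob_all_copies_covered(10, ploidy=1, k=1)  # 1 - e^-10, a haploid 10x reference
print(f"tumor 26x: {tumor:.6f}, normal 21x: {normal:.6f}, haploid 10x: {haploid:.6f}")
```

Under this simplified model, both the tumor criterion (26×, k = 2) and the normal criterion (21×, k = 1) land close to the haploid 10× value. It also suggests one intuition, not necessarily the paper's argument, for why the two redundancies are not a factor of 2 apart: raising k from 1 to 2 only adds the term $\lambda e^{-\lambda}$ to each copy's miss probability, and a modest increase in depth offsets it.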

    Occupancy Modeling, Maximum Contig Size Probabilities and Designing Metagenomics Experiments

    Mathematical aspects of coverage and gaps in genome assembly have received substantial attention from bioinformaticians. Typical problems assume that reads are experimentally obtained from a single genome and that the number of reads is chosen to cover a large percentage of that genome at a desired depth. In metagenomics experiments, genomes from multiple species are analyzed simultaneously, and obtaining large numbers of reads per genome is unlikely. As a metric for metagenomic experimental design, we propose the probability of obtaining at least one contig of a desired minimum size from each novel genome in the pool, without any restriction on depth of coverage. We derive an approximation to the distribution of maximum contig size for single-genome assemblies built from relatively few reads. This approximation is verified in simulation studies and applied to a number of metagenomic experimental design problems, ranging in difficulty from detecting a single novel genome in a pool of known species to detecting each of a random number of novel genomes, whose sizes and abundances follow given distributions, in a single pool.
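The abstract derives an analytical approximation to the distribution of maximum contig size, and a brute-force simulation of the same quantity is a natural way to check such an approximation. The sketch below assumes a linear genome, fixed-length reads with uniformly random start positions, and contigs formed whenever consecutive reads overlap by at least `min_overlap` bases. All names and defaults are hypothetical, and the code does not implement the paper's approximation.

```python
import random


def max_contig_length(starts, read_len, min_overlap=1):
    """Longest contig built from fixed-length reads at the given start positions.

    Consecutive reads (sorted by start) join the same contig when they overlap
    by at least `min_overlap` bases.
    """
    if not starts:
        return 0
    ordered = sorted(starts)
    contig_start, contig_end = ordered[0], ordered[0] + read_len
    best = 0
    for s in ordered[1:]:
        if contig_end - s >= min_overlap:
            contig_end = max(contig_end, s + read_len)  # extend the current contig
        else:
            best = max(best, contig_end - contig_start)  # close it, start a new one
            contig_start, contig_end = s, s + read_len
    return max(best, contig_end - contig_start)


def prob_max_contig_at_least(genome_len, n_reads, read_len, target, trials=2000, seed=0):
    """Monte Carlo estimate of P(longest contig >= target) for a single genome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        starts = [rng.randrange(genome_len - read_len + 1) for _ in range(n_reads)]
        if max_contig_length(starts, read_len) >= target:
            hits += 1
    return hits / trials


# Hypothetical design: a 2 Mb genome, 2,000 reads of 100 bp, target contig >= 1 kb.
estimate = prob_max_contig_at_least(2_000_000, 2_000, 100, 1_000, trials=200)
print(f"estimated P(max contig >= 1 kb): {estimate:.3f}")
```

In a pool, the number of reads landing on each genome is itself random (driven by abundance), so one hedged extension would draw `n_reads` per genome before calling the estimator.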

    Comprehensive characterization of the multiple myeloma immune microenvironment using integrated scRNA-seq, CyTOF, and CITE-seq analysis

    UNLABELLED: As part of the Multiple Myeloma Research Foundation (MMRF) immune atlas pilot project, we compared immune cells of multiple myeloma bone marrow samples from 18 patients assessed by single-cell RNA sequencing (scRNA-seq), mass cytometry (CyTOF), and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) to understand the concordance of measurements among single-cell techniques. Cell type abundances are relatively consistent across the three approaches, while variations are observed in T cells, macrophages, and monocytes. Concordance and correlation analysis of cell type marker gene expression across different modalities highlights the importance of choosing the cell type marker genes best suited to each modality. By integrating data from these three assays, we found that International Staging System stage 3 patients exhibited decreased CD4…
    SIGNIFICANCE: scRNA-seq, CyTOF, and CITE-seq are increasingly used for evaluating cellular heterogeneity, and understanding their concordance is of great interest. To date, this study is the most comprehensive examination of the immune microenvironment in multiple myeloma using the three techniques. Moreover, we identified markers predicted to be significantly associated with rapid progression of multiple myeloma.
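The abstract distinguishes concordance from correlation. A Pearson correlation measures only linear association, so two modalities that differ by a constant offset still correlate perfectly, whereas Lin's concordance correlation coefficient also penalizes such systematic shifts. The abstract does not state which statistics were used, so the sketch below is only an illustration with hypothetical cell-type proportions and hypothetical function names.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear association only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments).

    Equals 1 only when the points lie on the identity line y = x.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical proportions for five cell types in one sample, measured two ways.
scrna = [0.40, 0.10, 0.15, 0.25, 0.10]
cytof = [0.30, 0.12, 0.16, 0.30, 0.12]
print(f"Pearson r = {pearson(scrna, cytof):.3f}, Lin's CCC = {lin_ccc(scrna, cytof):.3f}")
```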

    Numerical and Experimental Investigation of Circulation in Short Cylinders

    In preparation for an experimental study of magnetorotational instability (MRI) in liquid metal, we explore Couette flows having height comparable to the gap between cylinders, centrifugally stable rotation, and high Reynolds number. Experiments in water are compared with numerical simulations. Simulations show that endcaps corotating with the outer cylinder drive a strong poloidal circulation that redistributes angular momentum. Predicted azimuthal flow profiles agree well with experimental measurements. Spin-down times scale with Reynolds number as expected for laminar Ekman circulation; extrapolation from two-dimensional simulations at $Re \le 3200$ agrees remarkably well with experiment at $Re \sim 10^6$. This suggests that turbulence does not dominate the effective viscosity. Further detailed numerical studies reveal a strong radially inward flow near both endcaps. After turning vertically along the inner cylinder, these flows converge at the midplane and depart the boundary in a radial jet. To minimize this circulation in the MRI experiment, endcaps consisting of multiple, differentially rotating rings are proposed. Simulations predict that an adequate approximation to the ideal Couette profile can be obtained with a few rings.
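The laminar Ekman scaling behind the spin-down comparison can be made explicit. For laminar Ekman layers the spin-down time is $\tau_E \sim E^{-1/2}/\Omega$ with Ekman number $E = \nu/(\Omega H^2)$; assuming a Reynolds number of the form $Re \propto \Omega L^2/\nu$ at fixed geometry, this gives $\Omega\tau_E \propto Re^{1/2}$. The hypothetical helper below applies that scaling. Going from $Re = 3200$ to $Re \sim 10^6$ multiplies $\Omega\tau_E$ by about $(10^6/3200)^{1/2} \approx 17.7$; a turbulent effective viscosity would alter this exponent.

```python
def extrapolate_spindown(omega_tau_ref, re_ref, re_target, exponent=0.5):
    """Scale a dimensionless spin-down time (Omega * tau) between Reynolds numbers.

    Assumes laminar Ekman spin-down, where Omega * tau ~ E**-0.5 ~ Re**0.5 at
    fixed geometry; a different exponent would model a different regime.
    """
    return omega_tau_ref * (re_target / re_ref) ** exponent


# Factor by which Omega * tau grows from the simulated to the experimental regime.
growth = extrapolate_spindown(1.0, 3200, 1e6)
print(f"Omega*tau grows by a factor of about {growth:.1f}")
```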

    Tumor Evolution in Two Patients with Basal-like Breast Cancer: A Retrospective Genomics Study of Multiple Metastases

    Metastasis is the main cause of cancer patient deaths and remains a poorly characterized process. It is still unclear when in tumor progression the ability to metastasize arises and whether this ability is inherent to the primary tumor or is acquired well after primary tumor formation. Next-generation sequencing and analytical methods to define clonal heterogeneity provide a means for identifying genetic events, and the temporal relationships between these events, in the primary and metastatic tumors within an individual.

    Personalized Pathway Enrichment Map of Putative Cancer Genes from Next Generation Sequencing Data

    BACKGROUND: Pathway analysis of a set of genes is an important area in large-scale omic data analysis. However, applying traditional pathway enrichment methods to next-generation sequencing (NGS) data is prone to several potential biases, including genomic/genetic factors (e.g., the particular disease and gene length) and environmental factors (e.g., personal lifestyle and the frequency and dosage of exposure to mutagens). Therefore, novel methods are urgently needed for these new data types, especially for individual-specific genome data. METHODOLOGY: In this study, we propose a novel method for the pathway analysis of NGS mutation data that explicitly takes into account the gene-wise mutation rate. We estimate the gene-wise mutation rate from the individual-specific background mutation rate together with the gene length. Taking the mutation rate as a weight for each gene, our weighted resampling strategy builds the null distribution for each pathway while matching the gene length patterns. The resulting empirical P value provides an adjusted statistical evaluation. PRINCIPAL FINDINGS/CONCLUSIONS: We applied our weighted resampling method to a lung adenocarcinoma dataset and a glioblastoma dataset and compared it with other widely applied methods. By explicitly adjusting for gene length, the weighted resampling method performs as well as the standard methods for significant pathways with strong evidence. Importantly, our method effectively rejects many marginally significant pathways detected by standard methods, including several long-gene-based, cancer-unrelated pathways. We further demonstrated that by reducing such biases, pathway crosstalk for each individual and pathway co-mutation maps across multiple individuals can be objectively explored and evaluated. This method performs pathway analysis in a sample-centered fashion and provides an alternative approach for the accurate analysis of personalized cancer genomes. It can be extended to other types of genomic data (genotyping and methylation) that have similar bias problems.
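The weighted resampling idea can be sketched as follows. The abstract does not give the exact weighting formula, so this sketch assumes a per-gene mutation probability of $1 - e^{-rL}$ for background rate r and gene length L. Null gene sets of the same size as the observed mutated set are then drawn without replacement, with probability tied to that weight, using the exponential-clock trick (equivalent to Efraimidis-Spirakis weighted sampling). Function names and the toy gene set are hypothetical.

```python
import math
import random


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k distinct items, each step favoring larger weights.

    Exponential-clock trick: item i "arrives" at a time drawn from
    Exp(rate = weights[i]); the k earliest arrivals form the sample.
    """
    clocks = sorted((rng.expovariate(w), i) for i, w in enumerate(weights))
    return [items[i] for _, i in clocks[:k]]


def pathway_empirical_p(mutated_genes, pathway, gene_lengths, background_rate,
                        n_resamples=10000, seed=0):
    """Empirical P value for the overlap of a mutated-gene set with a pathway.

    Assumed weight per gene: 1 - exp(-background_rate * length), i.e. the
    chance that a gene of that length carries at least one mutation.
    """
    rng = random.Random(seed)
    genes = list(gene_lengths)
    weights = [1.0 - math.exp(-background_rate * gene_lengths[g]) for g in genes]
    mutated = set(mutated_genes)
    observed = len(mutated & pathway)
    hits = 0
    for _ in range(n_resamples):
        null_set = weighted_sample_without_replacement(genes, weights, len(mutated), rng)
        if len(pathway.intersection(null_set)) >= observed:
            hits += 1
    # Add-one correction keeps the empirical P value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical individual: one long gene and nine short ones.
lengths = {"LONG1": 10_000, **{f"SHORT{i}": 100 for i in range(1, 10)}}
p = pathway_empirical_p(["LONG1"], {"LONG1"}, lengths, background_rate=1e-6, n_resamples=2000)
print(f"empirical P = {p:.3f}")
```

Because long genes are drawn more often under this null, a pathway hit only by a long gene receives a large P value, in line with the abstract's point about rejecting long-gene-based pathways. With this weight form the background rate does not cancel out, although for small rates the weights are nearly proportional to length.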

    Joint Analysis of Multiple Metagenomic Samples

    The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task that has been approached extensively with both supervised and unsupervised techniques, each with its own limitations. Practically all of these methods process single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose a combined analysis of a set of samples in order to obtain a better characterization of each sample, and we provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying the association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes. We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed “binning”) algorithm that operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that, given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples, as long as the per-sample coverage is high enough.
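The abstract does not specify its mixture model. To illustrate inferring components shared across samples, the sketch below fits a PLSA-style (probabilistic latent semantic analysis) mixture by expectation-maximization. Each sample is a vector of counts over sequence elements, the component distributions $p(e \mid z)$ are shared by all samples, and only the mixing proportions $p(z \mid s)$ differ between samples. This is a stand-in technique, not necessarily the paper's model. Names are hypothetical, and every sample must have at least one nonzero count.

```python
import math
import random


def fit_shared_components(counts, n_components, n_iter=100, seed=0):
    """EM for a PLSA-style mixture: p(e | s) = sum_z p(z | s) * p(e | z).

    counts[s][e] is how often sequence element e occurs in sample s.
    Returns (p_z_given_s, p_e_given_z, log_likelihood_history); the
    log-likelihood omits the constant multinomial coefficient.
    """
    rng = random.Random(seed)
    n_samples, n_elements = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        return [v / total for v in row]

    # Random strictly positive initialization of both parameter sets.
    p_e_z = [normalized([rng.random() + 0.1 for _ in range(n_elements)])
             for _ in range(n_components)]
    p_z_s = [normalized([rng.random() + 0.1 for _ in range(n_components)])
             for _ in range(n_samples)]
    history = []
    for _ in range(n_iter):
        exp_e_z = [[0.0] * n_elements for _ in range(n_components)]
        exp_z_s = [[0.0] * n_components for _ in range(n_samples)]
        log_lik = 0.0
        for s in range(n_samples):
            for e in range(n_elements):
                n = counts[s][e]
                if n == 0:
                    continue
                # E-step: joint weight of each component for element e in sample s.
                joint = [p_z_s[s][z] * p_e_z[z][e] for z in range(n_components)]
                total = sum(joint)  # p(e | s) under the current parameters
                log_lik += n * math.log(total)
                for z in range(n_components):
                    resp = n * joint[z] / total  # expected count assigned to z
                    exp_e_z[z][e] += resp
                    exp_z_s[s][z] += resp
        history.append(log_lik)
        # M-step: renormalize the expected counts.
        p_e_z = [normalized(row) for row in exp_e_z]
        p_z_s = [normalized(row) for row in exp_z_s]
    return p_z_s, p_e_z, history


# Hypothetical counts: 3 samples x 4 sequence elements.
counts = [[10, 10, 0, 0], [0, 0, 10, 10], [5, 5, 5, 5]]
p_z_s, p_e_z, history = fit_shared_components(counts, n_components=2, n_iter=50)
```

One possible use, assumed here rather than taken from the paper, is to include the fitted proportions $p(z \mid s)$ as covariates in a per-element association test, much as ancestry components are used to correct for stratification in human GWAS.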