236 research outputs found

    Occupancy Modeling, Maximum Contig Size Probabilities and Designing Metagenomics Experiments

    Mathematical aspects of coverage and gaps in genome assembly have received substantial attention from bioinformaticians. Typical problems under consideration suppose that reads can be experimentally obtained from a single genome and that the number of reads will be set to cover a large percentage of that genome at a desired depth. In metagenomics experiments, genomes from multiple species are simultaneously analyzed and obtaining large numbers of reads per genome is unlikely. We propose the probability of obtaining at least one contig of a desired minimum size from each novel genome in the pool, without restriction based on depth of coverage, as a metric for metagenomic experimental design. We derive an approximation to the distribution of maximum contig size for single genome assemblies using relatively few reads. This approximation is verified in simulation studies and applied to a number of different metagenomic experimental design problems, ranging in difficulty from detecting a single novel genome in a pool of known species to detecting each of a random number of novel genomes collectively sized and with abundances corresponding to given distributions in a single pool.

    MuSiC: Identifying mutational significance in cancer genomes

    Massively parallel sequencing technology and the associated rapidly decreasing sequencing costs have enabled systemic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and to ultimately separate the commonly abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. The integration of analytical operations in the MuSiC framework is widely applicable to a broad set of tumor types and offers the benefits of automation as well as standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from the TCGA ovarian cancer project. MuSiC correctly confirms many expected results, and identifies several potentially novel avenues for discovery.

    Numerical and Experimental Investigation of Circulation in Short Cylinders

    In preparation for an experimental study of magnetorotational instability (MRI) in liquid metal, we explore Couette flows having height comparable to the gap between cylinders, centrifugally stable rotation, and high Reynolds number. Experiments in water are compared with numerical simulations. Simulations show that endcaps corotating with the outer cylinder drive a strong poloidal circulation that redistributes angular momentum. Predicted azimuthal flow profiles agree well with experimental measurements. Spin-down times scale with Reynolds number as expected for laminar Ekman circulation; extrapolation from two-dimensional simulations at $Re \le 3200$ agrees remarkably well with experiment at $Re \sim 10^6$. This suggests that turbulence does not dominate the effective viscosity. Further detailed numerical studies reveal a strong radially inward flow near both endcaps. After turning vertically along the inner cylinder, these flows converge at the midplane and depart the boundary in a radial jet. To minimize this circulation in the MRI experiment, endcaps consisting of multiple, differentially rotating rings are proposed. Simulations predict that an adequate approximation to the ideal Couette profile can be obtained with a few rings.

    Early developmental specification of the thyroid gland depends on han-expressing surrounding tissue and on FGF signals

    The thyroid is an endocrine gland in all vertebrates that develops from the ventral floor of the anterior pharyngeal endoderm. Unravelling the molecular mechanisms of thyroid development helps to understand congenital hypothyroidism caused by the absence or reduction of this gland in newborn humans. Severely reduced or absent thyroid-specific developmental genes concomitant with the complete loss of the functional gland in the zebrafish hands off (han, hand2) mutant reveals the han gene as playing a novel, crucial role in thyroid development. han-expressing tissues surround the thyroid primordium throughout development. Fate mapping reveals that, even before the onset of thyroid-specific developmental gene expression, thyroid precursor cells are in close contact with han-expressing cardiac lateral plate mesoderm. Grafting experiments show that han is required in surrounding tissue, and not in a cell-autonomous manner, for thyroid development. Loss of han expression in the branchial arches and arch-associated cells after morpholino knock-down of upstream regulator genes does not impair thyroid development, indicating that other han-expressing structures, most probably cardiac mesoderm, are responsible for the thyroid defects in han mutants. The zebrafish ace (fgf8) mutant has thyroid defects similar to those of han mutants, and chemical suppression of fibroblast growth factor (FGF) signalling confirms that this pathway is required for thyroid development. FGF-soaked beads can restore thyroid development in han mutants, showing that FGFs act downstream of or in parallel to han. These data suggest that loss of FGF-expressing tissue in han mutants is responsible for the thyroid defects.

    Tumor Evolution in Two Patients with Basal-like Breast Cancer: A Retrospective Genomics Study of Multiple Metastases

    Metastasis is the main cause of cancer patient deaths and remains a poorly characterized process. It is still unclear when in tumor progression the ability to metastasize arises and whether this ability is inherent to the primary tumor or is acquired well after primary tumor formation. Next-generation sequencing and analytical methods to define clonal heterogeneity provide a means for identifying genetic events and the temporal relationships between these events in the primary and metastatic tumors within an individual.

    Design and implementation of a generalized laboratory data model

    Background: Investigators in the biological sciences continue to exploit laboratory automation methods and have dramatically increased the rates at which they can generate data. In many environments, the methods themselves also evolve in a rapid and fluid manner. These observations point to the importance of robust information management systems in the modern laboratory. Designing and implementing such systems is non-trivial and it appears that in many cases a database project ultimately proves unserviceable. Results: We describe a general modeling framework for laboratory data and its implementation as an information management system. The model utilizes several abstraction techniques, focusing especially on the concepts of inheritance and meta-data. Traditional approaches commingle event-oriented data with regular entity data in ad hoc ways. Instead, we define distinct regular entity and event schemas, but fully integrate these via a standardized interface. The design allows straightforward definition of a "processing pipeline" as a sequence of events, obviating the need for separate workflow management systems. A layer above the event-oriented schema integrates events into a workflow by defining "processing directives", which act as automated project managers of items in the system. Directives can be added or modified in an almost trivial fashion, i.e., without the need for schema modification or re-certification of applications. Association between regular entities and events is managed via simple "many-to-many" relationships. We describe the programming interface, as well as techniques for handling input/output, process control, and state transitions. Conclusion: The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments.

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and unobserved mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.