386 research outputs found

    Genome assembly forensics: finding the elusive mis-assembly

    A collection of software tools is combined for the first time in an automated pipeline for detecting large-scale genome assembly errors and for validating genome assemblies

    Hawkeye: An interactive visual analytics tool for genome assemblies

    Genome sequencing remains an inexact science, and genome sequences can contain significant errors if they are not carefully examined. Hawkeye is our new visual analytics tool for genome assemblies, designed to aid in identifying and correcting assembly errors. Users can analyze all levels of an assembly along with summary statistics and assembly metrics, and are guided by a ranking component towards likely mis-assemblies. Hawkeye is freely available and released as part of the open source AMOS project http://amos.sourceforge.net/hawkeye. © 2007 Schatz et al.; licensee BioMed Central Ltd

    High Performance Computing for DNA Sequence Alignment and Assembly

    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyses, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine, but the analyses are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation across large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, and how they have the potential to make otherwise infeasible computations practical
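As a rough illustration of what "read alignment" means in the abstract above, the following minimal sketch indexes the k-mers of a reference and maps a read by exact seed matching followed by a mismatch count. It is a toy, single-threaded seed-and-verify approach, not the GPU or MapReduce methods of the dissertation; the function names, the seed length `k`, and the `max_mismatches` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) pairs for placements of the read.

    Each k-mer of the read is used as an exact seed; every seed hit proposes
    a start position, which is then verified by counting mismatches over the
    full read length (substitutions only, no gaps).
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCT"  # hypothetical toy reference
    index = build_kmer_index(reference, k=4)
    print(align_read("TAGCCGAT", reference, index, k=4))
    print(align_read("TAGCCGTT", reference, index, k=4))
```

A read placed with zero mismatches corresponds to a conserved region, while a placement with mismatches marks a candidate polymorphism, which is the distinction the abstract draws.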

    Building and Improving Reference Genome Assemblies: This paper reviews the problems and algorithms of assembling a complete genome from millions of short DNA sequencing reads

    A genome sequence assembly provides the foundation for studies of genotypic and phenotypic variation, genome structure, and evolution of the target organism. In the past four decades, there has been a surge of new sequencing technologies, and with these developments, computational scientists have developed new algorithms to improve genome assembly. Here we discuss the relationship between sequencing technology improvements and assembly algorithm development and how these are applied to extend and improve human and nonhuman genome assemblies. © 1963-2012 IEEE

    Whole-genome sequence analysis for pathogen detection and diagnostics

    This dissertation focuses on computational methods for improving the accuracy of commonly used nucleic acid tests for pathogen detection and diagnostics. Three specific biomolecular techniques are addressed: polymerase chain reaction, microarray comparative genomic hybridization, and whole-genome sequencing. These methods are potentially the future of diagnostics, but each requires sophisticated computational design or analysis to operate effectively. This dissertation presents novel computational methods that unlock the potential of these diagnostics by efficiently analyzing whole-genome DNA sequences. Improvements in the accuracy and resolution of each of these diagnostic tests promise more effective diagnosis of illness and rapid detection of pathogens in the environment. For designing real-time detection assays, an efficient data structure and search algorithm are presented to identify the most distinguishing sequences of a pathogen that are absent from all other sequenced genomes. Results are presented that show these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic, phylogenetic near neighbors. For microarrays, novel pan-genomic design and analysis methods are presented for the characterization of unknown microbial isolates. To demonstrate the effectiveness of these methods, pan-genomic arrays are applied to the study of multiple strains of the foodborne pathogen Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies, which are capable of identifying assembly errors in even finished genomes. These validated assemblies provide the ultimate nucleic acid diagnostic, revealing the entire sequence of a genome
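The idea of a "signature" sequence, one present in the target pathogen and absent from all other sequenced genomes, can be sketched with plain set operations on k-mers. This is a naive stand-in for illustration only and is not the efficient data structure and search algorithm the dissertation describes; the function names and the toy sequences are hypothetical, and reverse complements are ignored for brevity.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def signature_kmers(target, background_genomes, k):
    """Return k-mers found in the target but in none of the background genomes."""
    candidates = kmers(target, k)
    for genome in background_genomes:
        candidates -= kmers(genome, k)
    return candidates


if __name__ == "__main__":
    target = "ACGTTGCA"                     # hypothetical pathogen sequence
    neighbors = ["ACGTAAAA", "TTTTGCAT"]    # hypothetical near-neighbor genomes
    print(sorted(signature_kmers(target, neighbors, k=4)))
```

Holding every k-mer of every genome in memory does not scale to all sequenced genomes, which is presumably why the abstract stresses an efficient data structure.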

    The Diploid Genome Sequence of an Individual Human

    Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, it involves 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information
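The 22% figure in the abstract above can be checked from the variant counts it lists. The short calculation below is only a consistency check on the published numbers: it uses the five categories that come with explicit counts, so segmental duplications and copy number variation regions (for which the abstract gives no count) are left out, and the function and dictionary names are chosen for this example.

```python
# Variant counts exactly as stated in the abstract.
VARIANT_COUNTS = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}


def non_snp_share(counts):
    """Return (total events, non-SNP events, non-SNP fraction of all events)."""
    total = sum(counts.values())
    non_snp = total - counts["SNPs"]
    return total, non_snp, non_snp / total


if __name__ == "__main__":
    total, non_snp, share = non_snp_share(VARIANT_COUNTS)
    print(f"total events:   {total:,}")
    print(f"non-SNP events: {non_snp:,}")
    print(f"non-SNP share:  {share:.1%}")
```

The total of the counted categories also agrees with the "more than 4.1 million DNA variants" stated in the abstract.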

    Civil Disturbances: Battles for Justice in New York City

    This Collection contains a number of essays that are a part of Civil Disturbances, a collaborative project between artists and lawyers that commemorates various public interest lawsuits and social justice efforts in New York City. The project itself consists of twenty signs, each representing one specific case, that were designed to be both provoking and informative. This specific Collection contains printings of eight of the signs, as well as separate writings on issues and cases including: disabled people's accessibility to the Empire State Building, child welfare, children's rights, women and the FDNY, rights of the homeless, and welfare benefits. Each essay presents a unique look at a specific social justice problem in New York, the case it was associated with, and sometimes a look at the accompanying sign as part of the project

    Feature-based design of solids with local composition control

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Ocean Engineering, 2004. Includes bibliographical references (leaves 126-134). This thesis presents a parametric and feature-based methodology for the design of solids with local composition control (LCC). A suite of composition design features are conceptualized and implemented. The designer can use them singly or in combination, to specify the composition of complex components. Each material composition design feature relates directly to the geometry of the design, often relying on user interaction to specify critical aspects of the geometry. This approach allows the designer to simultaneously edit geometry and composition by varying parameters until a satisfactory result is attained. The identified LCC features are those based on volume, transition, pattern, and (user-defined) surface features. The material composition functions include functions parametrized with respect to distance or distances to user-defined geometric features; and functions that use Laplace's equation to blend smoothly various boundary conditions including values and gradients of the material composition on the boundaries. The Euclidean digital distance transform and the boundary element method are adapted to the efficient computation of composition functions. Theoretical and experimental complexity, accuracy and convergence analyses are presented. The developed model is a multi-level and graph-based representation, thereby allowing for controls on the model validity and efficiency in model management. The representations underlying the composition design features are analytic in nature and therefore concise. Evaluation for visualization and fabrication is performed only at the resolutions required for these purposes, thereby reducing the computational burden. By Hongye Liu. Ph.D.

    NASA Tech Briefs, January 1989

    Topics include: Electronic Components and Circuits, Electronic Systems, Physical Sciences, Materials, Computer Programs, Mechanics, Machinery, Fabrication Technology, Mathematics and Information Sciences, and Life Sciences