29 research outputs found

    Assembling genomes using short-read sequencing technology

    Short-read sequencing technology can produce gigabase genome assemblies for under a million dollars.

    Genome sequences of six Phytophthora species threatening forest ecosystems

    The genus Phytophthora comprises some of the most destructive plant pathogens, which attack a wide range of hosts including economically valuable tree species, both angiosperm and gymnosperm. Many known species of Phytophthora are invasive and have been introduced through nursery and agricultural trade. As part of a larger project aimed at utilizing genomic data for forest disease diagnostics, pathogen detection and monitoring (the TAIGA project: Tree Aggressors Identification using Genomic Approaches; http://taigaforesthealth.com/), we sequenced the genomes of six Phytophthora species that are important invasive pathogens of trees and a serious threat to the international trade of forest products. These genomic data were used to develop highly sensitive and specific detection assays, to compare genomes and to make evolutionary inferences, and will be useful to the broader plant and tree health community. These WGS data have been deposited in the International Nucleotide Sequence Database Collaboration (DDBJ/ENA/GenBank) under the accession numbers AUPN01000000, AUVH01000000, AUWJ02000000, AUUF02000000, AWVV02000000 and AWVW02000000.

    Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma

    Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.

    UniqTag: Content-Derived Unique and Stable Identifiers for Gene Annotation

    When working on an ongoing genome sequencing and assembly project, it is inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag. The R package is also available from CRAN: install.packages("uniqtag"). Supplementary material and code to reproduce it are available at https://github.com/sjackman/uniqtag-paper.
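
    The idea of a content-derived identifier can be illustrated with a minimal Python sketch. The abstract does not state how the representative k-mer is chosen, so the rule used here (take the k-mer found in the fewest genes, breaking ties lexicographically) is an assumption, and the function names `kmers` and `uniqtag` are hypothetical rather than the API of the Ruby or R implementations.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def uniqtag(seqs, k=9):
    """Map each gene name to one representative k-mer of its sequence.

    Assumed selection rule (not taken from the abstract): choose the k-mer
    that occurs in the fewest genes, breaking ties lexicographically.
    """
    per_gene = {name: kmers(seq, k) for name, seq in seqs.items()}
    # Count in how many genes each k-mer appears.
    counts = Counter(km for ks in per_gene.values() for km in ks)
    tags = {}
    for name, ks in per_gene.items():
        if not ks:
            # Sequence shorter than k: fall back to the whole sequence.
            tags[name] = seqs[name]
        else:
            tags[name] = min(ks, key=lambda km: (counts[km], km))
    return tags


build1 = {"geneA": "ACGTAC", "geneB": "ACGTTT"}
# A later build that adds a new gene leaves the existing tags unchanged.
build2 = {**build1, "geneC": "CCCCCC"}
tags1 = uniqtag(build1, k=3)
tags2 = uniqtag(build2, k=3)
```

    Because each tag depends only on sequence content, the tags for geneA and geneB are the same in both toy builds. This sketch does not disambiguate genes that end up with the same tag.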

    The number of common UniqTag identifiers between build 75 of the Ensembl human genome and nine other builds, the number of common gene and protein identifiers between builds, and the number of genes with peptide sequences that are identical between builds.


    ABySS: A parallel assembler for short read sequence data

    Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
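
    For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows one common way to compute it: the largest length L such that contigs of length L or more contain at least half of the total assembled bases. The function name `n50`, the `min_length` parameter and the toy contig lengths are illustrative assumptions and are not taken from ABySS; definitions of N50 also vary slightly between tools.

```python
def n50(lengths, min_length=0):
    """Return the N50 of the contig lengths that are >= min_length.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    kept = [n for n in lengths if n >= min_length]
    total = sum(kept)
    running = 0
    for length in sorted(kept, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths to summarise")


# Toy example: total is 54 bases, and 10 + 9 + 8 = 27 reaches half of it.
toy = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    Passing `min_length=100` would mirror the abstract's restriction to contigs of at least 100 bp before the statistic is computed.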