
    A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.”
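
The two kinds of signal named above can be illustrated with raw counts. The sketch below computes, in sliding windows, (a) nucleotide composition skews on one strand, a crude proxy for the mutational strand asymmetry that Greens and CHOWDER quantify, and (b) counts of the AATAAA polyadenylation signal on each strand, the kind of strand-specific depletion that ROAST and PASTA exploit. It is not any of the published algorithms: the window and step sizes, the restriction to the single AATAAA hexamer, and all function names are assumptions made for illustration, and no statistical model is included.

```python
from collections import Counter

# Assumption: only the canonical hexamer is scored; real methods may use variants too.
POLYA_SIGNAL = "AATAAA"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in seq."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)


def nucleotide_skews(seq: str) -> tuple[float, float]:
    """Return ((A-T)/(A+T), (G-C)/(G+C)) measured on the given strand."""
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


def polya_strand_counts(seq: str) -> tuple[int, int]:
    """Count AATAAA on the forward strand and on the reverse strand."""
    fwd = count_overlapping(seq.upper(), POLYA_SIGNAL)
    rev = count_overlapping(reverse_complement(seq), POLYA_SIGNAL)
    return fwd, rev


def windowed_profile(seq: str, window: int = 1000, step: int = 500):
    """Yield (start, at_skew, gc_skew, fwd_polya, rev_polya) for sliding windows."""
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        at_skew, gc_skew = nucleotide_skews(chunk)
        fwd, rev = polya_strand_counts(chunk)
        yield start, at_skew, gc_skew, fwd, rev
```

Under the reasoning in the abstract, a long unit transcribed from the forward strand would be expected to show a sustained skew of consistent sign and fewer forward-strand than reverse-strand AATAAA hits. Turning such window profiles into calls of location and orientation requires a background model and a segmentation step, which this sketch deliberately omits.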

    A comparative genomics multitool for scientific discovery and conservation

    A whole-genome alignment of 240 phylogenetically diverse species of eutherian mammals (including 131 previously uncharacterized species) from the Zoonomia Project provides data that support biological discovery, medical research and conservation. The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity.

    Dissecting the Shared Genetic Architecture of Suicide Attempt, Psychiatric Disorders, and Known Risk Factors

    Background: Suicide is a leading cause of death worldwide, and nonfatal suicide attempts, which occur far more frequently, are a major source of disability and social and economic burden. Both have substantial genetic etiology, which is partially shared and partially distinct from that of related psychiatric disorders. Methods: We conducted a genome-wide association study (GWAS) of 29,782 suicide attempt (SA) cases and 519,961 controls in the International Suicide Genetics Consortium (ISGC). The GWAS of SA was conditioned on psychiatric disorders using GWAS summary statistics via multitrait-based conditional and joint analysis, to remove genetic effects on SA mediated by psychiatric disorders. We investigated the shared and divergent genetic architectures of SA, psychiatric disorders, and other known risk factors. Results: Two loci reached genome-wide significance for SA: the major histocompatibility complex and an intergenic locus on chromosome 7, the latter of which remained associated with SA after conditioning on psychiatric disorders and replicated in an independent cohort from the Million Veteran Program. This locus has been implicated in risk-taking behavior, smoking, and insomnia. SA showed strong genetic correlation with psychiatric disorders, particularly major depression, and also with smoking, pain, risk-taking behavior, sleep disturbances, lower educational attainment, reproductive traits, lower socioeconomic status, and poorer general health. After conditioning on psychiatric disorders, the genetic correlations between SA and psychiatric disorders decreased, whereas those with nonpsychiatric traits remained largely unchanged. Conclusions: Our results identify a risk locus that contributes more strongly to SA than other phenotypes and suggest a shared underlying biology between SA and known risk factors that is not mediated by psychiatric disorders.

    Erratum: Corrigendum: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution

    International Chicken Genome Sequencing Consortium. The original article was published on 09 December 2004: Nature 432, 695–716 (2004). In Table 5 of this Article, the last four values listed in the ‘Copy number’ column were incorrect. These should be: LTR elements, 30,000; DNA transposons, 20,000; simple repeats, 140,000; and satellites, 4,000. These errors do not affect any of the conclusions in our paper. The online version of the original article can be found at 10.1038/nature0315

    Cohort Profile: Burden of Obstructive Lung Disease (BOLD) study

    The Burden of Obstructive Lung Disease (BOLD) study was established to assess the prevalence of chronic airflow obstruction, a key characteristic of chronic obstructive pulmonary disease, and its risk factors in adults (≥40 years) from general populations across the world. The baseline study was conducted between 2003 and 2016, in 41 sites across Africa, Asia, Europe, North America, the Caribbean and Oceania, and collected high-quality pre- and post-bronchodilator spirometry from 28 828 participants. The follow-up study was conducted between 2019 and 2021, in 18 sites across Africa, Asia, Europe and the Caribbean. At baseline, these sites had 12 502 participants with high-quality spirometry. A total of 6452 were followed up, with 5936 completing the study core questionnaire. Of these, 4044 also provided high-quality pre- and post-bronchodilator spirometry. On both occasions, the core questionnaire covered information on respiratory symptoms, doctor diagnoses, health care use, medication use and health status, as well as potential risk factors. Information on occupation, environmental exposures and diet was also collected.
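
The follow-up counts above are nested, so the attrition at each stage is easiest to see as percentages. The snippet below is only arithmetic on the numbers stated in the abstract; the percentages are derived here, not reported by the study, and the constant names are hypothetical.

```python
# Participant counts as stated in the cohort profile (the 18 follow-up sites only).
BASELINE_HQ_SPIROMETRY = 12_502   # baseline participants with high-quality spirometry
FOLLOWED_UP = 6_452
CORE_QUESTIONNAIRE = 5_936        # followed up and completed the core questionnaire
FOLLOWUP_HQ_SPIROMETRY = 4_044    # of those, also provided high-quality spirometry


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


retention = {
    "followed up / baseline": pct(FOLLOWED_UP, BASELINE_HQ_SPIROMETRY),
    "questionnaire / followed up": pct(CORE_QUESTIONNAIRE, FOLLOWED_UP),
    "spirometry / questionnaire": pct(FOLLOWUP_HQ_SPIROMETRY, CORE_QUESTIONNAIRE),
    "spirometry / baseline": pct(FOLLOWUP_HQ_SPIROMETRY, BASELINE_HQ_SPIROMETRY),
}

for label, value in retention.items():
    print(f"{label}: {value}%")
```

By this arithmetic, about half (51.6%) of baseline participants with high-quality spirometry at these sites were followed up, and about a third (32.3%) contributed high-quality spirometry on both occasions.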

    MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation


    Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families.

    The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align because of their wide range of divergence, their fragmentation, and their predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low- to medium-divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning highly divergent and fragmented instances of a family.
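
A benchmark of this kind needs simulated copies whose true alignment is known, plus a score comparing a predicted MSA against it. The sketch below is a deliberately simple stand-in, not the authors' simulator: it assumes a random ancestral consensus, uniform per-site substitutions (`divergence` is a per-site substitution probability, not an evolutionary distance), deletions, and fragmentation, and it omits insertions so the true alignment stays column-for-column with the consensus. Accuracy is measured with the sum-of-pairs score, a common MSA benchmarking metric; whether the paper used exactly this metric is not stated here. All names and default parameters are hypothetical.

```python
import random

BASES = "ACGT"


def simulate_family(length: int, n_copies: int, divergence: float,
                    del_rate: float = 0.02, min_frag: float = 0.3, seed: int = 0):
    """Return (consensus, true_alignment); each row is a gapped copy of equal length."""
    rng = random.Random(seed)
    consensus = "".join(rng.choice(BASES) for _ in range(length))
    rows = []
    for _ in range(n_copies):
        # Fragmentation: each copy keeps one random contiguous slice of the consensus.
        frag_len = rng.randint(max(1, int(min_frag * length)), length)
        start = rng.randint(0, length - frag_len)
        row = []
        for i, base in enumerate(consensus):
            if (not start <= i < start + frag_len) or rng.random() < del_rate:
                row.append("-")  # outside the fragment, or a deletion
            elif rng.random() < divergence:
                row.append(rng.choice([b for b in BASES if b != base]))  # substitution
            else:
                row.append(base)
        rows.append("".join(row))
    return consensus, rows


def residue_pairs(alignment: list[str]) -> set:
    """All pairs ((seq_i, pos_i), (seq_j, pos_j)) placed in the same column.

    Positions are indices into each ungapped sequence, so pairs from two
    different alignments of the same sequences are directly comparable.
    """
    pairs = set()
    next_pos = [0] * len(alignment)
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, next_pos[s]))
                next_pos[s] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def sum_of_pairs_score(reference: list[str], predicted: list[str]) -> float:
    """Fraction of reference residue pairs that the predicted alignment recovers."""
    ref = residue_pairs(reference)
    if not ref:
        return 1.0
    return len(ref & residue_pairs(predicted)) / len(ref)
```

For example, `sum_of_pairs_score(["AC-", "A-G"], ["-AC", "AG-"])` is 0.0, because the only truly homologous pair (the two leading A residues) is misaligned in the prediction. Raising `divergence` or lowering `min_frag` makes the simulated family harder to align, which is the axis along which the abstract compares aligners.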

    Curation Guidelines for de novo Generated Transposable Element Families.

    Transposable elements (TEs) have the ability to alter individual genomic landscapes and shape the course of evolution for species in which they reside. Such profound changes can be understood by studying the biology of the organism and the interplay of the TEs it hosts. Characterizing and curating TEs across a wide range of species is a fundamental first step in this endeavor. This protocol employs techniques honed while developing TE libraries for a wide range of organisms and specifically addresses: (1) the extension of truncated de novo results into full-length TE families; (2) the iterative refinement of TE multiple sequence alignments; and (3) the use of alignment visualization to assess model completeness and subfamily structure. © 2021 Wiley Periodicals LLC. Basic Protocol: Extension and edge polishing of consensi and seed alignments derived from de novo repeat finders. Support Protocol: Generating seed alignments using a library of consensi and a genome assembly.
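
Extending a truncated consensus and refining its alignment both rest on simple per-column bookkeeping. The sketch below assumes an alignment of copies that has been padded with flanking sequence: it builds a majority-rule consensus and flags an edge as a candidate for extension when its outermost columns still agree strongly, since flanks outside a true family boundary should look roughly random across copies. This is a heuristic illustration, not the tools or criteria the protocol uses; the thresholds, the window of edge columns, and all function names are hypothetical.

```python
from collections import Counter

GAP_CHARS = "-."


def column_stats(alignment: list[str]) -> list[tuple[str, float, float]]:
    """Per column: (majority base, occupancy, agreement).

    occupancy = fraction of rows with a residue; agreement = fraction of rows
    carrying the majority base (gaps count as disagreement).
    """
    n = len(alignment)
    stats = []
    for column in zip(*alignment):
        residues = [b.upper() for b in column if b not in GAP_CHARS]
        if residues:
            # Counter.most_common breaks ties by first occurrence.
            majority, count = Counter(residues).most_common(1)[0]
        else:
            majority, count = "-", 0
        stats.append((majority, len(residues) / n, count / n))
    return stats


def majority_consensus(alignment: list[str], min_occupancy: float = 0.5) -> str:
    """Majority-rule consensus, skipping columns that are mostly gaps."""
    return "".join(base for base, occ, _ in column_stats(alignment) if occ >= min_occupancy)


def edge_agreement(alignment: list[str], edge_cols: int = 10) -> tuple[float, float]:
    """Mean agreement over the leftmost and rightmost edge_cols columns."""
    stats = column_stats(alignment)
    k = min(edge_cols, len(stats))
    left = sum(agr for _, _, agr in stats[:k]) / k
    right = sum(agr for _, _, agr in stats[-k:]) / k
    return left, right


def needs_extension(alignment: list[str], edge_cols: int = 10,
                    threshold: float = 0.6) -> tuple[bool, bool]:
    """Flag each edge whose columns still agree strongly (the family may continue past it)."""
    left, right = edge_agreement(alignment, edge_cols)
    return left >= threshold, right >= threshold
```

In an iterative workflow, an edge flagged here would be extended by pulling in more flanking sequence, realigning, and recomputing the consensus until agreement at both edges drops to background.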

    Methodologies for the De novo Discovery of Transposable Element Families

    The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
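
One family of de novo approaches starts by finding short words that occur far more often than expected and then extends them into family models; RepeatScout, for example, seeds its search with high-frequency fixed-length words. The sketch below shows only that seeding step: it counts canonical k-mers (a k-mer and its reverse complement are treated as one) and keeps those seen at least `min_count` times. The parameter defaults and function names are hypothetical, and counting k-mers across a real genome would need a memory-efficient counter rather than a Python dictionary.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def frequent_kmers(genome: str, k: int = 12, min_count: int = 3) -> list[tuple[str, int]]:
    """Canonical k-mers seen at least min_count times, most frequent first."""
    genome = genome.upper()
    counts = Counter()
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if set(kmer) <= _VALID:  # skip windows containing N or other ambiguity codes
            counts[canonical(kmer)] += 1
    seeds = [(kmer, n) for kmer, n in counts.items() if n >= min_count]
    return sorted(seeds, key=lambda item: (-item[1], item[0]))
```

Using canonical k-mers lets copies inserted on either strand contribute to the same seed; the extension step that grows seeds into full-length consensi is where de novo tools differ most and is not shown.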