    The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms

    In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNAnexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics

    dbVar structural variant cluster set for data analysis and variant comparison

    dbVar houses over 3 million submitted structural variants (SSVs) from 120 human studies, including copy number variations (CNVs), insertions, deletions, inversions, translocations, and complex chromosomal rearrangements. Users can submit multiple SSVs to dbVar that are presumably identical but were ascertained by different platforms and samples; this makes it possible to calculate whether a variant is rare or common in the population and allows for cross-validation. However, because SSV genomic location reporting can vary – including fuzzy locations where the start and/or end points are not precisely known – analysis, comparison, annotation, and reporting of SSVs across studies can be difficult. This project was initiated by the Structural Variant Comparison Group to generate a non-redundant set of genomic regions defined by counts of concordance for all human SSVs placed on the RefSeq assembly GRCh38 (RefSeq accession GCF_000001405.26). We intend that the availability of these regions, called structural variant clusters (SVCs), will facilitate the analysis, annotation, and exchange of SV data and allow for simplified display in genomic sequence viewers for improved variant interpretation. Sets of SVCs were generated by variant type for each of the 120 studies as well as for a combined set across all studies. Starting from 3.64 million SSVs, 2.5 million and 3.4 million non-redundant SVCs with count ≥1 were generated by variant type for each study and across all studies, respectively. In addition, we have developed utilities for annotating, searching, and filtering SVC data in GVF format, computing summary statistics, exporting data for genomic viewers, and annotating SVCs using external data sources
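
    A minimal Python sketch of what a non-redundant cluster with a count could look like. It assumes, for illustration only, that SSVs are concordant when chromosome, start, end, and variant type all match exactly; the abstract does not say how fuzzy breakpoints are reconciled, and the `SSV` record and the `build_svcs` / `build_svcs_by_study` helpers are hypothetical, not dbVar utilities.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple

# Hypothetical cluster key: (chromosome, start, end, variant type).
SvcKey = tuple[str, int, int, str]


class SSV(NamedTuple):
    """One submitted structural variant on GRCh38 (hypothetical record layout)."""
    study: str
    chrom: str
    start: int
    end: int
    var_type: str


def build_svcs(ssvs: Iterable[SSV]) -> dict[SvcKey, int]:
    """Collapse SSVs with identical placement and type into clusters with a count.

    Assumption: SSVs are "concordant" only when chromosome, start, end, and
    variant type all match exactly; fuzzy breakpoints are not reconciled here.
    Every returned count is >= 1 by construction.
    """
    counts = Counter((s.chrom, s.start, s.end, s.var_type) for s in ssvs)
    return dict(counts)


def build_svcs_by_study(ssvs: Iterable[SSV]) -> dict[str, dict[SvcKey, int]]:
    """Build a separate SVC set for each study."""
    by_study: dict[str, list[SSV]] = defaultdict(list)
    for s in ssvs:
        by_study[s.study].append(s)
    return {study: build_svcs(group) for study, group in by_study.items()}
```

    Under this assumption, two studies reporting the same deletion at the same coordinates yield one cluster with count 2 in the combined set, but count 1 in each per-study set.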

    PubRunner: a light-weight framework for updating text mining results

    Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily, and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. For biomedical text mining to become an indispensable tool for researchers, this problem must be addressed. To this end, we present PubRunner, a framework for regularly running text mining tools on the latest publications. PubRunner is lightweight, simple to use, and can be integrated with existing text mining tools. The workflow involves downloading the latest abstracts from PubMed, executing a user-defined tool, pushing the resulting data to a public FTP or Zenodo dataset, and publicizing the location of these results on the public PubRunner website. We illustrate the use of this tool by re-running the commonly used word2vec tool on the latest PubMed abstracts to generate up-to-date word vector representations for the biomedical domain. This serves as a proof of concept that we hope will encourage text mining developers to build tools that truly aid biologists in exploring the latest publications
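
    The four-stage workflow above can be sketched as a small driver. This is a hypothetical model, not PubRunner's real interface: the `UpdateCycle` class and its field names are invented for illustration, and each stage (PubMed download, the user's tool, FTP/Zenodo upload, website listing) is injected as a plain callable so any tool could be wrapped.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpdateCycle:
    """Hypothetical model of one PubRunner-style update; not PubRunner's real API.

    Each stage is injected as a callable so the same driver can wrap any tool.
    """
    fetch_latest_abstracts: Callable[[], list[str]]  # e.g. pull new PubMed abstracts
    run_tool: Callable[[list[str]], bytes]           # user-defined text mining tool
    publish: Callable[[bytes], str]                  # e.g. push to FTP or Zenodo; returns location
    announce: Callable[[str], None]                  # e.g. list the location on a public website

    def run_once(self) -> str:
        """Run the four stages in order and return where the results were published."""
        abstracts = self.fetch_latest_abstracts()
        results = self.run_tool(abstracts)
        location = self.publish(results)
        self.announce(location)
        return location
```

    Calling `run_once` on a schedule (for example from a cron job) is what would keep results current; how PubRunner itself schedules runs is not described here.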

    Extending TCGA queries to automatically identify analogous genomic data from dbGaP [version 1; referees: 2 approved, 1 approved with reservations]

    Data sharing is critical to advancing genomic research: reusing and combining existing data reduces the need to collect new data and promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data in dbGaP based on matching TCGA metadata. The result is an easy-to-use tool that connects these two data sources
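
    One way metadata matching could work is sketched below. The abstract does not describe the pipeline's matching method, so the token-overlap (Jaccard) scoring is a swapped-in technique, the `study_*` keys are placeholders rather than real dbGaP accessions, and the `min_score` threshold is arbitrary.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric tokens from a free-text metadata field."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_dbgap_studies(
    tcga_metadata: str, dbgap_metadata: dict[str, str], min_score: float = 0.1
) -> list[tuple[str, float]]:
    """Rank dbGaP studies by token overlap with a TCGA metadata description.

    Returns (study key, score) pairs at or above `min_score`, best match first.
    """
    query = tokens(tcga_metadata)
    scored = [(key, jaccard(query, tokens(desc))) for key, desc in dbgap_metadata.items()]
    hits = [(key, score) for key, score in scored if score >= min_score]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

    A real pipeline would more likely match on controlled vocabularies (disease terms, data types) than on free-text tokens; this sketch only shows the ranking idea.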

    NovoGraph: Human genome graph construction from multiple long-read de novo assemblies [version 2; referees: 2 approved]

    Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped
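
    The merge criterion (positions that are both homologous and sequence-identical) can be illustrated on a toy alignment. This is a simplified sketch, not NovoGraph's implementation: it assumes an already computed multiple sequence alignment with `-` as the gap character, treats each column as one homologous position, and neither operates at genome scale nor emits VCF.

```python
from collections import defaultdict

GAP = "-"


def column_nodes(msa: list[str]) -> list[dict[str, list[int]]]:
    """For each alignment column, group input sequences by the base they carry.

    Bases in one column are treated as homologous; sequences carrying the same
    base there share a single node. A gap contributes no node.
    """
    if not msa:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned rows must have the same length")
    nodes = []
    for col in range(width):
        groups: dict[str, list[int]] = defaultdict(list)
        for idx, row in enumerate(msa):
            if row[col] != GAP:
                groups[row[col]].append(idx)
        nodes.append(dict(groups))
    return nodes


def variant_columns(msa: list[str]) -> list[int]:
    """Columns where the graph branches: several distinct bases, or a gap in some row."""
    return [
        col
        for col, groups in enumerate(column_nodes(msa))
        if len(groups) > 1 or sum(len(ids) for ids in groups.values()) < len(msa)
    ]
```

    In the toy alignment `ACGT`, `ACTT`, `A-GT`, columns 0 and 3 collapse into single shared nodes, while column 1 (a deletion in the third sequence) and column 2 (G versus T) become branch points.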

    The role of recent admixture in forming the contemporary West Eurasian genomic landscape

    Over the past few years, studies of DNA isolated from human fossils and archaeological remains have generated considerable novel insight into the history of our species. Several landmark papers have described the genomes of ancient humans across West Eurasia, demonstrating the presence of large-scale, dynamic population movements over the last 10,000 years, such that ancestry across present-day populations is likely to be a mixture of several ancient groups [1-7]. While these efforts are bringing the details of West Eurasian prehistory into increasing focus, studies aimed at understanding the processes behind the generation of the current West Eurasian genetic landscape have been limited by the number of populations sampled or have been either too regional or global in their outlook [8-11]. Here, using recently described haplotype-based techniques [11], we present the results of a systematic survey of recent admixture history across Western Eurasia and show that admixture is a universal property across almost all groups. Admixture in all regions except North Western Europe involved the influx of genetic material from outside of West Eurasia, which we date to specific time periods. Within Northern, Western, and Central Europe, admixture tended to occur between local groups during the period 300 to 1200 CE. Comparisons of the genetic profiles of West Eurasians before and after admixture show that population movements within the last 1,500 years are likely to have maintained differentiation among groups. Our analysis provides a timeline of the gene flow events that have generated the contemporary genetic landscape of West Eurasia

    Pervasive Growth Reduction in Norway Spruce Forests following Wind Disturbance

    Background: In recent decades the frequency and severity of natural disturbances, e.g., by strong winds and insect outbreaks, have increased considerably in many forest ecosystems around the world. Future climate change is expected to further intensify disturbance regimes, which makes addressing disturbances in ecosystem management a top priority. As a prerequisite, a broader understanding of disturbance impacts and ecosystem responses is needed. With regard to the effects of strong winds – the most detrimental disturbance agent in Europe – monitoring and management have focused on structural damage, i.e., tree mortality from uprooting and stem breakage. Effects on the functioning of trees surviving the storm (e.g., their productivity and allocation) have rarely been accounted for to date. Methodology/Principal Findings: Here we show that growth reduction was significant and pervasive in a 6.79 million hectare forest landscape in southern Sweden following the storm Gudrun (January 2005). Wind-related growth reduction in Norway spruce (Picea abies (L.) Karst.) forests surviving the storm exceeded 10% in the worst-hit regions and was closely related to maximum gust wind speed (R² = 0.849) and structural wind damage (R² = 0.782). At the landscape scale, wind-related growth reduction amounted to 3.0 million m³ in the three years following Gudrun. It thus exceeds secondary damage from bark beetles after Gudrun as well as the long-term average storm damage from uprooting and stem breakage in Sweden
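
    The R² values above measure how much of the variation in growth reduction a predictor explains. The abstract does not state the model, so the sketch below assumes a simple least-squares line against a single predictor; it uses R² = 1 − SS_res / SS_tot and contains no data from the study.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 of an ordinary least-squares line y = a + b*x fitted to paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation left unexplained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot
```

    For a single-predictor linear fit this equals the squared Pearson correlation, so an R² of 0.849 corresponds to a correlation of about 0.92 in magnitude.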

    Designing the selenium and bladder cancer trial (SELEBLAT), a phase III randomized chemoprevention study with selenium on recurrence of bladder cancer in Belgium

    Background: In Belgium, bladder cancer is the fifth most common cancer in males (5.2%) and the sixth most frequent cause of cancer death in males (3.8%). Previous epidemiological studies have consistently reported that selenium concentrations are inversely associated with the risk of bladder cancer. This suggests that selenium may also be suitable for chemoprevention of recurrence. Method: The SELEBLAT study opened in September 2009 and is still recruiting all patients with non-invasive transitional cell carcinoma of the bladder who undergo transurethral resection of the bladder (TURB) in 15 Belgian hospitals. Recruitment progress can be monitored live at http://www.seleblat.org. Patients are randomly assigned to selenium yeast (200 μg/day) supplementation for 3 years or matching placebo, in addition to standard care. The objective is to determine the effect of selenium on the recurrence of bladder cancer. Randomization is stratified by treatment centre. A computerized algorithm randomly assigns the patients to a treatment arm. All study personnel and participants are blinded to treatment assignment for the duration of the study. Design: The SELEnium and BLAdder cancer Trial (SELEBLAT) is a phase III randomized, placebo-controlled, academic, double-blind superiority trial. Discussion: This is the first report on a randomized selenium trial in bladder cancer patients. Trial registration: ClinicalTrials.gov identifier NCT00729287 (http://www.clinicaltrials.gov/ct2/show/NCT00729287)
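
    Centre-stratified computerized randomization can be sketched as follows. The abstract says only that assignment is computerized and stratified by centre; the permuted-block scheme, the block size of 4, and the `StratifiedBlockRandomizer` class are assumptions for illustration, not the trial's actual algorithm.

```python
import random
from typing import Optional

ARMS = ("selenium", "placebo")


class StratifiedBlockRandomizer:
    """Permuted-block randomization kept separately for each centre (stratum).

    Hypothetical sketch: the block scheme and block size are assumptions.
    """

    def __init__(self, block_size: int = 4, seed: Optional[int] = None) -> None:
        if block_size <= 0 or block_size % len(ARMS) != 0:
            raise ValueError("block size must be a positive multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending: dict[str, list[str]] = {}  # unused slots of each centre's current block

    def assign(self, centre: str) -> str:
        """Return the arm for the next patient enrolled at `centre`."""
        queue = self.pending.setdefault(centre, [])
        if not queue:
            # Start a new balanced block for this centre, in random order.
            block = list(ARMS) * (self.block_size // len(ARMS))
            self.rng.shuffle(block)
            queue.extend(block)
        return queue.pop(0)
```

    Arms stay balanced within each centre after every completed block. Small fixed blocks can make the last assignments in a block predictable, which is one reason schemes like this often vary the block size.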