15 research outputs found

    Evaluating and Improving the Efficiency of Software and Algorithms for Sequence Data Analysis

    With the ever-growing size of sequence data sets, data processing and analysis are an increasingly large portion of the time and money spent on nucleic acid sequencing projects. Correspondingly, the performance of the software and algorithms used to perform that analysis has a direct effect on the time and expense involved. Although the analytical methods are widely varied, certain types of software and algorithms are applicable to a number of areas. Targeting improvements to these common elements has the potential for wide-reaching rewards. This dissertation research consisted of several projects to characterize and improve upon the efficiency of several common elements of sequence data analysis software and algorithms. The first project sought to improve the efficiency of the short read mapping process, as mapping is the most time-consuming step in many data analysis pipelines. The result was a new short read mapping algorithm and software, demonstrated to be more computationally efficient than existing software and enabling more of the raw data to be utilized. While developing this software, it was discovered that a widely used bioinformatics software library introduced a great deal of inefficiency into the application. Given the potential impact of similar libraries on other applications, and because little research had been done to evaluate library efficiency, the second project evaluated the efficiency of seven of the most popular bioinformatics software libraries, written in C++, Java, Python, and Perl. This evaluation showed that two of the libraries written in the most popular language, Java, were an order of magnitude slower and used more memory than expected based on the language in which they were implemented. The third and final project, therefore, was the development of a new general-purpose bioinformatics software library for Java. This library, known as BioMojo, incorporated a new design approach resulting in vastly improved efficiency. Assessing the performance of this new library using the benchmark methods developed for the second project showed that BioMojo outperformed all of the other libraries across all benchmark tasks, being up to 30 times more CPU efficient than existing Java libraries.

    Metagenomic analysis of planktonic microbial consortia from a non-tidal urban-impacted segment of James River

    Knowledge of the diversity and ecological function of the microbial consortia of James River in Virginia, USA, is essential to developing a more complete understanding of the ecology of this model river system. Metagenomic analysis of James River's planktonic microbial community was performed for the first time using an unamplified genomic library and a 16S rDNA amplicon library prepared and sequenced by Ion PGM and MiSeq, respectively. From the 0.46-Gb WGS library (GenBank:SRR1146621; MG-RAST:4532156.3), 4 × 10^6 reads revealed >3 × 10^6 genes, 240 families of prokaryotes, and 155 families of eukaryotes. From the 0.68-Gb 16S library (GenBank:SRR2124995; MG-RAST:4631271.3; EMB:2184), 4 × 10^6 reads revealed 259 families of eubacteria. Results of the WGS and 16S analyses were highly consistent and indicated that more than half of the bacterial sequences were Proteobacteria, predominantly Comamonadaceae. The most numerous genera in this group were Acidovorax (including iron oxidizers, nitrotoluene degraders, and plant pathogens), which accounted for 10% of assigned bacterial reads. Polaromonas were another 6% of all bacterial reads, with many assignments to groups capable of degrading polycyclic aromatic hydrocarbons. Albidiferax (iron reducers) and Variovorax (biodegraders of a variety of natural biogenic compounds as well as anthropogenic contaminants such as polycyclic aromatic hydrocarbons and endocrine disruptors) each accounted for an additional 3% of bacterial reads. Comparison of these data to other publicly available aquatic metagenomes revealed that this stretch of James River is highly similar to the upper Mississippi River, and that these river systems are more similar to aquaculture and sludge ecosystems than they are to lakes or to a pristine section of the upper Amazon River. Taken together, these analyses exposed previously unknown aspects of microbial biodiversity, documented the ecological responses of microbes to urban effects, and revealed the noteworthy presence of 22 human-pathogenic bacterial genera (e.g., Enterobacteriaceae, pathogenic Pseudomonadaceae, and ‘Vibrionales’) and 6 pathogenic eukaryotic genera (e.g., Trypanosomatidae and Vahlkampfiidae). This information about pathogen diversity may be used to promote human epidemiological studies, enhance existing water quality monitoring efforts, and increase awareness of the possible health risks associated with recreational use of James River.

    The stability of educational achievement across school years is largely explained by genetic factors.

    Little is known about the etiology of developmental change and continuity in educational achievement. Here, we study achievement from primary school to the end of compulsory education for 6000 twin pairs in the UK-representative Twins Early Development Study sample. Results showed that educational achievement is highly heritable across school years and across subjects studied at school (twin heritability ~60%; SNP heritability ~30%); achievement is highly stable (phenotypic correlations ~0.70 from ages 7 to 16). Twin analyses, applying simplex and common pathway models, showed that genetic factors accounted for most of this stability (70%), even after controlling for intelligence (60%). Shared environmental factors also contributed to the stability, while change was mostly accounted for by individual-specific environmental factors. Polygenic scores, derived from a genome-wide association analysis of adult years of education, also showed stable effects on school achievement. We conclude that the remarkable stability of achievement is largely driven genetically even after accounting for intelligence.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    The Changing Landscape for Stroke Prevention in AF: Findings From the GLORIA-AF Registry Phase 2

    Background: GLORIA-AF (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients with Atrial Fibrillation) is a prospective, global registry program describing antithrombotic treatment patterns in patients with newly diagnosed nonvalvular atrial fibrillation at risk of stroke. Phase 2 began when dabigatran, the first non–vitamin K antagonist oral anticoagulant (NOAC), became available. Objectives: This study sought to describe phase 2 baseline data and compare these with the pre-NOAC era collected during phase 1. Methods: During phase 2, 15,641 consenting patients were enrolled (November 2011 to December 2014); 15,092 were eligible. This pre-specified cross-sectional analysis describes eligible patients’ baseline characteristics. Atrial fibrillation disease characteristics, medical outcomes, and concomitant diseases and medications were collected. Data were analyzed using descriptive statistics. Results: Of the total patients, 45.5% were female; median age was 71 (interquartile range: 64, 78) years. Patients were from Europe (47.1%), North America (22.5%), Asia (20.3%), Latin America (6.0%), and the Middle East/Africa (4.0%). Most had high stroke risk (CHA2DS2-VASc [Congestive heart failure, Hypertension, Age ≥75 years, Diabetes mellitus, previous Stroke, Vascular disease, Age 65 to 74 years, Sex category] score ≥2; 86.1%); 13.9% had moderate risk (CHA2DS2-VASc = 1). Overall, 79.9% received oral anticoagulants, of whom 47.6% received NOAC and 32.3% vitamin K antagonists (VKA); 12.1% received antiplatelet agents; 7.8% received no antithrombotic treatment. For comparison, the proportion of phase 1 patients (of N = 1,063 all eligible) prescribed VKA was 32.8%, acetylsalicylic acid 41.7%, and no therapy 20.2%. In Europe in phase 2, treatment with NOAC was more common than VKA (52.3% and 37.8%, respectively); 6.0% of patients received antiplatelet treatment; and 3.8% received no antithrombotic treatment. In North America, 52.1%, 26.2%, and 14.0% of patients received NOAC, VKA, and antiplatelet drugs, respectively; 7.5% received no antithrombotic treatment. NOAC use was less common in Asia (27.7%), where 27.5% of patients received VKA, 25.0% antiplatelet drugs, and 19.8% no antithrombotic treatment. Conclusions: The baseline data from GLORIA-AF phase 2 demonstrate that in newly diagnosed nonvalvular atrial fibrillation patients, NOAC have been highly adopted into practice, becoming more frequently prescribed than VKA in Europe and North America. Worldwide, however, a large proportion of patients remain undertreated, particularly in Asia and North America. (Global Registry on Long-Term Oral Antithrombotic Treatment in Patients With Atrial Fibrillation [GLORIA-AF]; NCT01468701)