10 research outputs found

    Application-based fault tolerance techniques for sparse matrix solvers

    High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increased electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, may result in future exascale systems being more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Through the use of techniques such as hardware-based error-correcting codes and checkpoint-restart, many of these faults can be mitigated, at the cost of increased hardware overhead, run-time, and energy consumption that can be as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme-scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault-tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.

    The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces.

    The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources. OMA can be freely accessed at https://omabrowser.org

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

    Prioritising candidate genes causing QTL using hierarchical orthologous groups

    Motivation: A key goal in plant biotechnology applications is the identification of genes associated with particular phenotypic traits (for example, yield, fruit size, or root length). Quantitative Trait Loci (QTL) studies identify genomic regions associated with a trait of interest. However, to infer potential causal genes in these regions, each of which can contain hundreds of genes, these data are usually intersected with prior functional knowledge of the genes. This process is, however, laborious, particularly if the experiment is performed in a non-model species, and the statistical significance of the inferred candidates is typically unknown. Results: This paper introduces QTLSearch, a method and software tool to search for candidate causal genes in QTL studies by combining Gene Ontology annotations across many species, leveraging hierarchical orthologous groups. The usefulness of this approach is demonstrated by re-analysing two metabolic QTL studies: one in Arabidopsis thaliana, the other in Oryza sativa subsp. indica. Even after controlling for statistical significance, QTLSearch inferred potential causal genes for more QTL than BLAST-based functional propagation against UniProtKB/Swiss-Prot, and for more QTL than in the original studies. Availability and implementation: QTLSearch is distributed under the LGPLv3 license. It is available to install from the Python Package Index (as qtlsearch), with the source available from https://bitbucket.org/alex-warwickvesztrocy/qtlsearch. Supplementary information: Supplementary data are available at Bioinformatics online.

    Multifaceted quality assessment of gene repertoire annotation with OMArk

    Dataset associated with the OMArk paper. Contains eight archives:

    Supplementary_Tables: The Supplementary Table files referred to in the paper.

    OMAmerDB: The OMAmer database constructed using the whole dataset of the OMA database (November 2022 release) and used in the paper. An OMAmer database is necessary to run OMArk.

    Simulation: Proteomes with artificially introduced errors, contaminants, or depleted completeness, used to assess OMArk's performance. The archive contains the generated proteomes (Simulated_Data) and their OMArk quality assessments (omark). It also contains the OMAmer results (OMAmerResults) that were used to run OMArk and the BUSCO completeness assessments (BUSCO). Note that for storage efficiency, only the non-redundant part of the data (added errors, added contamination, random fraction of proteomes) is stored there. The full modified proteomes can be regenerated from these data and the source proteomes.

    Reference Proteomes: The UniProt Reference Proteomes (Proteomes) (2021_04) and their proteome quality assessment results according to OMArk. The archive contains the source proteome FASTA (Source folder), OMAmer results for these proteomes (omamer folder), OMArk results (omark folder), and BUSCO completeness assessments (BUSCO folder). It also contains a subfolder that contains part of the contamination detection experiment (Contamination folder).

    Ensembl_Metazoa_AssemblyChange: Contains Ensembl Metazoa proteomes with a version change between versions 52 and 54, as well as their quality assessment results for both versions. The archive contains the source proteome FASTA files (Source folder), a Splice file that groups together all proteins coded by the same gene (Splice folder), OMAmer results for the proteomes (omamer folder), and the OMArk results (omark folder).

    MissingGenesBLAST: Contains sequences of HOGs considered missing in the human proteome, which were used to look for sequences in the human genome.

    Ensembl_NCBI_Results: Contains OMArk and BUSCO results for Ensembl and NCBI proteomes. These results were then used to evaluate OMArk bias due to the source of proteomes in the OMA database.

    Notebooks: Jupyter notebooks that were used to perform the analysis described in the paper.

    Quality assessment of gene repertoire annotations with OMArk

    No full text
    In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures. ISSN: 1546-1696; ISSN: 1087-015