78 research outputs found

    Computational Assessment of Genetic Variation beyond Single Nucleotide Changes

    Advances in sequencing technology have greatly reduced the cost of collecting raw sequencing data, and researchers now have access to very large datasets of genomic alterations. Computational tools are necessary to interpret sequencing data and discover biologically relevant genetic variation. Current computational tools, however, have overwhelmingly focused on single nucleotide changes. Much less work has been devoted to computational tools that prioritize insertion and deletion variants and chromosomal abnormalities. Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a pressing need for bioinformatics classifiers. Chromosomal abnormalities can produce a wide range of genetic disorders, including miscarriages, developmental disorders, and carcinogenesis. While numerous tools have been developed to detect chromosomal abnormalities, these tools have limited utility at lower cell admixtures. In this dissertation, I focus on the development of computational approaches beyond single nucleotide variants. I introduce a novel computational approach to assess indel variants (Chapters 2-3). I compare this method to existing computational approaches and investigate potential ways to improve indel prediction. Next, I develop a bioinformatics approach entitled WALDO (Within-sample AneupLoidy DiscOvery) specifically designed to detect chromosomal abnormalities as well as microsatellite instability (Chapters 4-6).

    Correction: Membranes Linked by Trans-Snare Complexes Require Lipids Prone to Non-Bilayer Structure for Progression to Fusion

    Like other intracellular fusion events, the homotypic fusion of yeast vacuoles requires a Rab GTPase, a large Rab effector complex, SNARE proteins which can form a 4-helical bundle, and the SNARE disassembly chaperones Sec17p and Sec18p. In addition to these proteins, specific vacuole lipids are required for efficient fusion in vivo and with the purified organelle. Reconstitution of vacuole fusion with all purified components reveals that high SNARE levels can mask the requirement for a complex mixture of vacuole lipids. At lower, more physiological SNARE levels, neutral lipids with small headgroups that tend to form non-bilayer structures (phosphatidylethanolamine, diacylglycerol, and ergosterol) are essential. Membranes without these three lipids can dock and complete trans-SNARE pairing but cannot rearrange their lipids for fusion.

    Identifying Mendelian disease genes with the Variant Effect Scoring Tool

    Background: Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency > 1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman-Sheldon syndrome. Conclusions: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies.
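
The two scoring steps in this abstract (a per-variant p-value against a null distribution of neutral scores, then gene-level aggregation) can be illustrated with a minimal sketch. The abstract does not say how p-values are aggregated, so Fisher's method is used here purely as an assumed stand-in. The function names, the add-one smoothing, and the convention that a higher score means "more likely damaging" are likewise assumptions for illustration, not the published VEST implementation.

```python
import math
from bisect import bisect_left


def empirical_p_value(score, sorted_null_scores):
    """P(null score >= observed score), with add-one smoothing so p is never 0.

    Assumes higher scores are more extreme (an assumption of this sketch).
    """
    n = len(sorted_null_scores)
    n_at_least = n - bisect_left(sorted_null_scores, score)
    return (n_at_least + 1) / (n + 1)


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (assumed aggregator).

    The statistic -2 * sum(ln p) follows a chi-square distribution with 2k
    degrees of freedom; for even degrees of freedom the survival function
    has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def rank_genes(variant_scores_by_gene, null_scores):
    """Return (gene, combined p-value) pairs, most significant first."""
    sorted_null = sorted(null_scores)
    combined = {
        gene: fisher_combined_p(
            [empirical_p_value(s, sorted_null) for s in scores]
        )
        for gene, scores in variant_scores_by_gene.items()
    }
    return sorted(combined.items(), key=lambda item: item[1])
```

In use, `rank_genes` would take one variant score per disease exome for each candidate gene, plus a held-out set of neutral-variant scores; the gene with the smallest combined p-value is listed first. Fisher's method assumes the combined p-values are independent, which may not hold for related cases.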

    Assessing the pathogenicity of insertion and deletion variants with the Variant Effect Scoring Tool (VEST-Indel)

    Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new “PubMed” feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
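
The idea of building meta-predictors from Boolean combinations of classifier calls can be sketched as follows. This is a simplified illustration and not the authors' procedure: it enumerates only pure AND and pure OR combinations over subsets of classifiers, whereas the abstract describes all combinations of conjunctions and disjunctions. The classifier names and the input layout (one list of pathogenic/benign calls per classifier) are hypothetical.

```python
from itertools import combinations


def sensitivity_specificity(predictions, labels):
    """Return (sensitivity, specificity) for boolean calls vs. boolean truth."""
    true_pos = sum(p and y for p, y in zip(predictions, labels))
    true_neg = sum((not p) and (not y) for p, y in zip(predictions, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def enumerate_meta_predictors(calls):
    """Yield (name, combined calls) for AND/OR over every subset of size >= 2.

    `calls` maps a classifier name to its list of boolean pathogenic calls.
    Only pure conjunctions and pure disjunctions are generated here; mixed
    expressions such as (A AND B) OR C are outside this sketch.
    """
    names = sorted(calls)
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            columns = [calls[name] for name in subset]
            yield " AND ".join(subset), [all(row) for row in zip(*columns)]
            yield " OR ".join(subset), [any(row) for row in zip(*columns)]


def best_by_specificity(calls, labels):
    """Pick the meta-predictor with the highest specificity (ties: sensitivity)."""
    scored = [
        (name, *sensitivity_specificity(combined, labels))
        for name, combined in enumerate_meta_predictors(calls)
    ]
    return max(scored, key=lambda row: (row[2], row[1]))
```

A conjunction can only remove positive calls, so it trades sensitivity for specificity; a disjunction does the opposite. With four classifiers, the restricted enumeration above yields 22 candidates.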

    Assessment of the performance of CORDEX Regional Climate Models in Simulating Eastern Africa Rainfall

    This study evaluates the ability of 10 regional climate models (RCMs) from the Coordinated Regional Climate Downscaling Experiment (CORDEX) to simulate the characteristics of rainfall patterns over eastern Africa. The seasonal climatology, annual rainfall cycles, and interannual variability of RCM output have been assessed over three homogeneous subregions against a number of observational datasets. The ability of the RCMs to simulate large-scale global climate forcing signals is further assessed by compositing the El Niño–Southern Oscillation (ENSO) and Indian Ocean dipole (IOD) events. It is found that most RCMs reasonably simulate the main features of the rainfall climatology over the three subregions and also reproduce the majority of the documented regional responses to ENSO and IOD forcings. At the same time, the analysis shows significant biases in individual models depending on subregion and season; however, the ensemble mean agrees better with observations than individual models do. In general, the analysis herein demonstrates that the multimodel ensemble mean simulates eastern Africa rainfall adequately and can therefore be used for the assessment of future climate projections for the region.

    Trace elements at the intersection of marine biological and geochemical evolution

    Life requires a wide variety of bioessential trace elements to act as structural components and reactive centers in metalloenzymes. These requirements differ between organisms and have evolved over geological time, likely guided in some part by environmental conditions. Until recently, most of what was understood regarding trace element concentrations in the Precambrian oceans was inferred by extrapolation, geochemical modeling, and/or genomic studies. However, in the past decade, the increasing availability of trace element and isotopic data for sedimentary rocks of all ages has yielded new, and potentially more direct, insights into secular changes in seawater composition – and ultimately the evolution of the marine biosphere. Compiled records of many bioessential trace elements (including Ni, Mo, P, Zn, Co, Cr, Se, and I) provide new insight into how trace element abundance in Earth's ancient oceans may have been linked to biological evolution. Several of these trace elements display redox-sensitive behavior, while others are redox-sensitive but not bioessential (e.g., Cr, U). Their temporal trends in sedimentary archives provide useful constraints on changes in atmosphere-ocean redox conditions that are linked to biological evolution, for example, the activity of oxygen-producing, photosynthetic cyanobacteria. In this review, we summarize available Precambrian trace element proxy data, and discuss how temporal trends in the seawater concentrations of specific trace elements may be linked to the evolution of both simple and complex life. We also examine several biologically relevant and/or redox-sensitive trace elements that have yet to be fully examined in the sedimentary rock record (e.g., Cu, Cd, W) and suggest several directions for future studies.

    Proteins linked to autosomal dominant and autosomal recessive disorders harbor characteristic rare missense mutation distribution patterns

    The role of rare missense variants in disease causation remains difficult to interpret. We explore whether the clustering pattern of rare missense variants (MAF < 0.01) in a protein is associated with mode of inheritance. Mutations in genes associated with autosomal dominant (AD) conditions are known to result in either loss or gain of function, whereas mutations in genes associated with autosomal recessive (AR) conditions invariably result in loss of function. Loss-of-function mutations tend to be distributed uniformly along the protein sequence, whereas gain-of-function mutations tend to localize to key regions. It has not previously been ascertained whether these patterns hold in general for rare missense mutations. We consider the extent to which rare missense variants are located within annotated protein domains and whether they form clusters, using a new unbiased method called CLUstering by Mutation Position. These approaches quantified a significant difference in clustering between AD and AR diseases. Proteins linked to AD diseases exhibited more clustering of rare missense mutations than those linked to AR diseases (Wilcoxon P = 5.7 × 10⁻⁴, permutation P = 8.4 × 10⁻⁴). Rare missense mutations in proteins linked to either AD or AR diseases were more clustered than controls (1000G) (Wilcoxon P = 2.8 × 10⁻¹⁵ for AD and P = 4.5 × 10⁻⁴ for AR, permutation P = 3.1 × 10⁻¹² for AD and P = 0.03 for AR). The differences in clustering patterns persisted even after removal of the most prominent genes. Testing for such non-random patterns may reveal novel aspects of disease etiology in large sample studies.
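
The abstract reports permutation P values for group differences in clustering but does not describe the test statistic. As a rough sketch of how such a comparison might be set up, the code below runs a two-sided permutation test on the difference in mean per-protein clustering scores between two groups (for example AD-linked and AR-linked proteins). The choice of statistic, the function name, and the input layout are assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Group labels are shuffled n_permutations times; the p-value is the
    share of shuffles whose absolute mean difference is at least as large
    as the observed one, with add-one smoothing so it is never exactly 0.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            n_extreme += 1
    return (n_extreme + 1) / (n_permutations + 1)
```

With add-one smoothing, the smallest p-value this estimator can return is 1 / (n_permutations + 1), so resolving values as small as those quoted in the abstract would need far more permutations than the default used here, or a different estimation approach.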
