
    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. All of these processes must be investigated to determine which variant may be causal for a deleterious phenotype. Variant effect scores help greatly with this task. They are implemented as machine learning classifiers and integrate annotations from different resources to rank genomic variants by pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow for this process. It makes it simple to configure how information is converted into model features, enabling rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation, and, ultimately, deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the GRCh38 reference genome, establishing the first variant effect model for this assembly. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow is a flexible and scalable method for training variant effect scores.
All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
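The abstract names the workflow's stages but not their implementation. The sketch below is a minimal illustration of three of those stages: training, hyperparameter optimization with validation-based model selection, and deployment by scoring every variant. It assumes a logistic-regression classifier with an L2 penalty chosen from a small grid. The two numeric features, the synthetic labels, and all helper names (`train_logreg`, `select_model`, `make`) are hypothetical stand-ins, not the workflow's actual code or CADD's real training data.

```python
import math
import random
from typing import List, Sequence, Tuple

Dataset = Tuple[List[List[float]], List[int]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class; the bias is stored last in w."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logreg(X: List[List[float]], y: List[int], l2: float,
                 lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit logistic regression by batch gradient descent with an L2 penalty (bias unpenalized)."""
    n_feat, n = len(X[0]), len(X)
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j in range(n_feat):
                grad[j] += err * xi[j]
            grad[-1] += err
        for j in range(n_feat):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[-1] -= lr * grad[-1] / n
    return w


def accuracy(w: List[float], X: List[List[float]], y: List[int]) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    return sum((score(w, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(y)


def select_model(train: Dataset, valid: Dataset,
                 l2_grid: List[float]) -> Tuple[float, float, List[float]]:
    """Grid search: train one model per L2 strength, keep the best on validation data."""
    best = None
    for l2 in l2_grid:
        w = train_logreg(*train, l2=l2)
        acc = accuracy(w, *valid)
        if best is None or acc > best[0]:
            best = (acc, l2, w)
    return best


# Hypothetical toy data: feature 0 is informative, feature 1 is noise.
rng = random.Random(0)


def make(n: int) -> Dataset:
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        X.append(x)
        y.append(1 if x[0] + 0.3 * rng.gauss(0, 1) > 0 else 0)
    return X, y


train, valid = make(200), make(100)
acc, l2, w = select_model(train, valid, [0.0, 0.01, 0.1, 1.0])
# "Deployment": apply the selected model to every variant to be scored.
scores = [score(w, x) for x in valid[0]]
```

In a real genome-wide setting, the grid, features, and validation metric would be read from the workflow's configuration rather than hard-coded. A library implementation would also replace the hand-written gradient descent.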

    Protein function prediction using domain families

    Here we assessed the use of domain families for predicting the functions of whole proteins. These 'functional families' (FunFams) were derived using a protocol that combines sequence clustering with supervised cluster evaluation, relying on available high-quality Gene Ontology (GO) annotation data in the latter step. In essence, the protocol groups domain sequences belonging to the same superfamily into families based on the GO annotations of their parent proteins. An initial test based on enzyme sequences confirmed that the FunFams resemble enzyme (domain) families much better than do families produced by sequence clustering alone. For the CAFA 2011 experiment, we further associated the FunFams with GO terms probabilistically. All target proteins were first submitted to domain superfamily assignment, followed by FunFam assignment and, eventually, function assignment. The latter included an integration step for multi-domain target proteins. The CAFA results put our domain-based approach among the top ten of 31 competing groups and 56 prediction methods, confirming that it outperforms simple pairwise whole-protein sequence comparisons.
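The abstract mentions two steps without giving formulas: the probabilistic association of FunFams with GO terms, and the integration step for multi-domain targets. The sketch below illustrates both under two assumptions, neither taken from the published protocol. First, P(term | FunFam) is estimated as the fraction of GO-annotated member proteins that carry the term. Second, per-domain predictions for one protein are combined with a noisy-OR. The helper names and the example GO identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def family_term_probs(member_annotations: List[Set[str]]) -> Dict[str, float]:
    """Estimate P(term | family) as the fraction of annotated members carrying the term."""
    annotated = [terms for terms in member_annotations if terms]  # skip unannotated members
    if not annotated:
        return {}
    counts = Counter(term for terms in annotated for term in terms)
    return {term: c / len(annotated) for term, c in counts.items()}


def combine_domains(per_domain: List[Dict[str, float]]) -> Dict[str, float]:
    """Noisy-OR: a term is absent only if every domain independently fails to confer it."""
    miss: Dict[str, float] = {}
    for probs in per_domain:
        for term, p in probs.items():
            miss[term] = miss.get(term, 1.0) * (1.0 - p)
    return {term: 1.0 - m for term, m in miss.items()}


# Hypothetical two-domain target protein, one FunFam assigned per domain.
fam_a = family_term_probs([{"GO:0003824"}, {"GO:0003824", "GO:0005524"}, set()])
fam_b = family_term_probs([{"GO:0005524"}, {"GO:0005515"}])
combined = combine_domains([fam_a, fam_b])
# combined == {"GO:0003824": 1.0, "GO:0005524": 0.75, "GO:0005515": 0.5}
```

The noisy-OR lets a term supported by both domains score higher than either domain alone. Taking the per-term maximum across domains would be a more conservative alternative.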

    Assessment of Different Dimensions of Shame Proneness: Validation of the SHAME

    A large body of research has revealed that shame is associated with both adaptive and maladaptive correlates. The aim of this work was to validate a new dimensional instrument (SHAME), which was developed to disentangle adaptive and maladaptive dimensions of shame proneness. Confirmatory factor analyses with invariance testing supported the three-factor structure (bodily, cognitive, and existential shame) in American (n = 502) and German (n = 496) community samples. Bifactor model analyses revealed distinct associations of the adaptive (bodily and cognitive shame) and maladaptive (existential shame) dimensions of shame with psychopathology and social functioning. Network analyses highlighted the role of existential shame in psychopathology, especially in a clinical sample of patients with Borderline Personality Disorder (n = 92). By placing shame proneness into a network of similar and dissimilar constructs, the current findings serve as a foundation for drawing conclusions about the adaptive and maladaptive nature of shame.

    CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

    Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to predict splice variants better than other strategies. Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that including splicing DNN effect scores substantially improves predictions across multiple variant categories without compromising overall performance. Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although we show it here only for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
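The abstract does not describe how the two splicing scores are encoded as CADD features. The sketch below shows one plausible way to fold a DNN's per-variant output into an existing feature vector. It assumes a SpliceAI-style model that reports four delta scores per variant: acceptor gain, acceptor loss, donor gain, and donor loss. The key names, the max-of-absolute-values collapse, and the zero-imputation-plus-missing-flag scheme are illustrative assumptions, not CADD-Splice's documented encoding.

```python
from typing import Dict, List, Optional

# Hypothetical key names for a SpliceAI-style set of four delta scores.
DELTA_KEYS = ("acceptor_gain", "acceptor_loss", "donor_gain", "donor_loss")


def splice_features(deltas: Optional[Dict[str, float]]) -> List[float]:
    """Collapse per-variant DNN delta scores into [max_abs_delta, is_missing]."""
    if not deltas:
        # Variant was not scored (e.g. far from any transcript): impute 0 and flag it.
        return [0.0, 1.0]
    # abs() keeps the sketch usable for splice scores that are signed.
    return [max(abs(deltas.get(k, 0.0)) for k in DELTA_KEYS), 0.0]


def extend_features(base: List[float], deltas: Optional[Dict[str, float]]) -> List[float]:
    """Append splice-derived features to a variant's existing annotation features."""
    return base + splice_features(deltas)


with_scores = extend_features([0.2, 1.0], {"acceptor_gain": 0.05, "donor_loss": 0.81})
unscored = extend_features([0.2, 1.0], None)
# with_scores == [0.2, 1.0, 0.81, 0.0]; unscored == [0.2, 1.0, 0.0, 1.0]
```

The explicit missing-value flag lets a linear model distinguish "no splice effect predicted" from "not scored by the splice model". Without the flag, the imputed zero would conflate the two cases.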

    How young married couples shape their lives: brief report

    In 1973, the Central Institute for Youth Research (Zentralinstitut für Jugendforschung) in Leipzig conducted a study on how young married couples shape their lives, surveying 1,109 married men and women aged 18 to 28 from ten enterprises. This brief report presents the initial results of the study, broken down by sex. Five areas were examined: (1) ideological attitudes: attitudes toward the socialist state, toward socio-political issues, toward the FDJ, toward state support for young married couples, and toward the social activities of young spouses; (2) living conditions of young spouses: housing, working conditions, and young women's plans for future employment; (3) ideals of marriage and desire for children; (4) the lifestyle of young spouses: premarital relationships, current partnerships, further qualification, leisure, financial arrangements, and the external social relationships of young married people; (5) changes brought about by marriage. (pka)

    Protocols to capture the functional plasticity of protein domain superfamilies

    Most proteins comprise several domains, segments that are clearly discernible in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives rise to novel protein functions. Traditionally, protein domains are grouped into homologous domain superfamilies in resources such as SCOP and CATH. This is done primarily on the basis of similarities in their three-dimensional structures. A biologically sound subdivision of the domain superfamilies into families of sequences with conserved function has so far been missing. Such families form the ideal framework to study the evolutionary and functional plasticity of individual superfamilies. The few existing resources that aim to classify domain families involve a considerable amount of manual curation. Whilst immensely valuable, such curation is inherently slow and expensive, and can thus impede large-scale application. This work describes the development and application of a fully automatic pipeline for identifying functional families within superfamilies of protein domains. This pipeline is built around a method for clustering large-scale sequence datasets in distributed computing environments. In addition, it implements two different protocols for identifying families on the basis of the clustering results: a supervised and an unsupervised protocol. Which one is used depends on whether high-quality protein function annotation data are associated with a given superfamily. The results attained for more than 1,500 domain superfamilies are discussed both qualitatively and quantitatively. The use of domain sequence data in conjunction with Gene Ontology protein function annotations and a set of rules and concepts to derive families is a novel approach to large-scale domain sequence classification.
Importantly, the focus lies on domain function, not whole-protein function.

    Axis formation in cnidarians: the role of the Wnt and BMP/Chordin signal transduction pathways

    A decisive step in animal evolution was the development of body plans with two axes from body plans with only one axis. To understand this step, it is necessary to know the molecular processes that control axis formation in both bilaterally symmetric (two axes) and radially symmetric animals. Since these molecular mechanisms are highly conserved in bilaterians, we sought to investigate whether homologous molecular signaling pathways already exist in the radially symmetric freshwater polyp Hydra (phylum Cnidaria) and whether they are involved in axis formation there. We found that molecules of the conserved Wnt and BMP/Chordin signal transduction pathways exist in Hydra. We show that their expression patterns during normal and experimentally induced pattern formation suggest a role in axis formation in the polyp. Implications for the evolution of body plans are discussed.