Using machine learning to predict pathogenicity of genomic variants throughout the human genome
More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency.
In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
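The training workflow described above (annotate training data, derive features, optimize hyperparameters, validate, then deploy by scoring variants) can be sketched in miniature. This is a hedged illustration with synthetic data and scikit-learn, not the actual CADD pipeline; the feature names and model choice are assumptions for demonstration only.

```python
# Minimal sketch of the described workflow, assuming synthetic "annotation"
# features; the real CADD pipeline, annotations, and model differ.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# 1) Annotated training data: rows are variants, columns are hypothetical
#    annotation-derived features (e.g. conservation, regulatory overlap, ...).
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2) Hyperparameter optimization over a small grid, as the workflow describes.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)

# 3) Validation of the selected model on held-out variants.
auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

# 4) "Deployment": genome-wide scoring applies the same scoring call to every
#    annotated variant; here, a small batch of new variants stands in.
scores = search.predict_proba(rng.normal(size=(10, 5)))[:, 1]
```

The key design point the abstract emphasizes is that steps 1 and 2 are configurable, so swapping in a new annotation only changes the feature matrix, not the rest of the pipeline.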
Production networks in the cultural and creative sector: case studies from the publishing industry
The CICERONE project investigates cultural and creative industries through case study research, with a focus on production networks. This report, part of WP2, examines the publishing industry within this framework. It aims to understand the industry’s hidden aspects, address statistical issues in measurement, and explore the industry’s transformation and integration of cultural and economic values. The report provides an overview of the production network, explores statistical challenges, and presents qualitative analyses of two case studies. It concludes by highlighting the potential of the Global Production Network (GPN) approach for analyzing, researching, policymaking, and intervening in the European publishing network.
Rigorous Experimentation For Reinforcement Learning
Scientific fields make advancements by leveraging the knowledge created by others to push the boundary of understanding. The primary tool in many fields for generating knowledge is empirical experimentation. Although common, generating accurate knowledge from empirical experiments is often challenging due to inherent randomness in execution and confounding variables that can obscure the correct interpretation of the results. As such, researchers must hold themselves and others to a high degree of rigor when designing experiments. Unfortunately, most reinforcement learning (RL) experiments lack this rigor, making the knowledge generated from experiments dubious. This dissertation proposes methods to address central issues in RL experimentation.
Evaluating the performance of an RL algorithm is the most common type of experiment in RL literature. Most performance evaluations are incapable of answering a specific research question and produce misleading results. Thus, the first issue we address is how to create a performance evaluation procedure that holds up to scientific standards.
Despite the prevalence of performance evaluation, these types of experiments produce limited knowledge, e.g., they can only show how well an algorithm worked and not why, and they require significant amounts of time and computational resources. As an alternative, this dissertation proposes that scientific testing, the process of conducting carefully controlled experiments designed to further the knowledge and understanding of how an algorithm works, should be the primary form of experimentation.
Lastly, this dissertation provides a case study using policy gradient methods, showing how scientific testing can replace performance evaluation as the primary form of experimentation. As a result, this dissertation can motivate others in the field to adopt more rigorous experimental practices.
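One concrete practice behind the rigor the dissertation argues for is reporting performance over many independent seeds with uncertainty estimates, rather than a single cherry-picked run. The sketch below illustrates that idea with a stand-in "algorithm" producing synthetic returns; it is an assumed example, not code from the dissertation.

```python
# Hedged sketch: evaluate across many seeds and report a bootstrap confidence
# interval on mean return. run_algorithm is a synthetic stand-in for training
# an RL agent to completion with a given seed.
import numpy as np

rng = np.random.default_rng(42)

def run_algorithm(seed: int) -> float:
    """Stand-in for an RL training run; returns the final episodic return."""
    r = np.random.default_rng(seed)
    return 100.0 + r.normal(scale=10.0)

# Many independent seeds, not one lucky run.
returns = np.array([run_algorithm(s) for s in range(30)])

# Bootstrap 95% confidence interval on the mean return.
boot_means = np.array([
    rng.choice(returns, size=returns.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={returns.mean():.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

Reporting the interval rather than the point estimate is what lets a reader judge whether an observed difference between algorithms exceeds run-to-run randomness.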
Energy Supplies in the Countries from the Visegrad Group
The purpose of this Special Issue was to collect and present research results and experiences on energy supply in the Visegrad Group countries. This research considers both macroeconomic and microeconomic aspects. It was important to determine how the V4 countries deal with energy management, how they have undergone or are undergoing energy transformation, and in what direction they are heading. The articles concerned aspects of the energy balance in the V4 countries compared to the EU, including the production of renewable energy, as well as changes in its individual sectors (transport and food production). The energy efficiency of low-emission vehicles in public transport and goods deliveries is also discussed, as is the energy efficiency of farms and energy storage facilities and the impact of the energy sector on the quality of the environment.
Constitutions of Value
Gathering an interdisciplinary range of cutting-edge scholars, this book addresses legal constitutions of value.
Global value production and transnational value practices that rely on exploitation and extraction have left us with toxic commons and a damaged planet. Against this situation, the book examines law’s fundamental role in institutions of value production and valuation. Utilising pathbreaking theoretical approaches, it problematizes mainstream efforts to redeem institutions of value production by recoupling them with progressive values. Aiming beyond radical critique, the book opens up the possibility of imagining and enacting new and different value practices.
This wide-ranging and accessible book will appeal to international lawyers, socio-legal scholars, those working at the intersections of law and economy, and others in politics, economics, environmental studies and elsewhere who are concerned with rethinking our current ideas of what has value, what does not, and whether and how value may be revalued.
Science and Innovations for Food Systems Transformation
This Open Access book compiles the findings of the Scientific Group of the United Nations Food Systems Summit 2021 and its research partners. The Scientific Group was an independent group of 28 food systems scientists from all over the world with a mandate from the Deputy Secretary-General of the United Nations. The chapters provide science- and research-based, state-of-the-art, solution-oriented knowledge and evidence to inform the transformation of contemporary food systems in order to achieve more sustainable, equitable and resilient systems.
Knowledge Graph Building Blocks: An easy-to-use Framework for developing FAIREr Knowledge Graphs
Knowledge graphs and ontologies provide promising technical solutions for implementing the FAIR Principles for Findable, Accessible, Interoperable, and Reusable data and metadata. However, they also come with their own challenges. Nine such challenges are discussed and associated with the criterion of cognitive interoperability and the specific FAIREr principles (FAIR + Explorability raised) that they fail to meet. We introduce an easy-to-use, open-source knowledge graph framework that is based on knowledge graph building blocks (KGBBs). KGBBs are small information modules for knowledge processing, each based on a specific type of semantic unit. By interrelating several KGBBs, one can specify a KGBB-driven FAIREr knowledge graph. Besides implementing semantic units, the KGBB Framework clearly distinguishes and decouples an internal in-memory data model from data storage, data display, and data access/export models. We argue that this decoupling is essential for solving many problems of knowledge management systems. We discuss the architecture of the KGBB Framework as we envision it, comprising (i) an openly accessible KGBB-Repository for different types of KGBBs; (ii) a KGBB-Engine for managing and operating FAIREr knowledge graphs (including automatic provenance tracking, an editing changelog, and versioning of semantic units); (iii) a repository for KGBB-Functions; and (iv) a low-code KGBB-Editor with which domain experts can create new KGBBs and specify their own FAIREr knowledge graph without having to think about semantic modelling. We conclude by discussing the nine challenges and how the KGBB Framework provides solutions for the issues they raise. While most of what we discuss here is entirely conceptual, we can point to two prototypes that demonstrate the feasibility in principle of using semantic units and KGBBs to manage and structure knowledge graphs.
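The decoupling the abstract argues for, a single in-memory data model with storage, display, and export handled by separate swappable components, can be illustrated in a few lines. All class and method names below are illustrative assumptions, not the KGBB Framework's actual API.

```python
# Hedged sketch of decoupling an in-memory graph model from its export model.
# Swapping JsonExportModel for another exporter (RDF, HTML, ...) would not
# touch the in-memory model, which is the design point being illustrated.
import json
from dataclasses import dataclass, field

@dataclass
class SemanticUnit:
    """Minimal in-memory statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str

@dataclass
class KnowledgeGraph:
    """In-memory data model: just a list of semantic units."""
    units: list = field(default_factory=list)

    def add(self, s: str, p: str, o: str) -> None:
        self.units.append(SemanticUnit(s, p, o))

class JsonExportModel:
    """One of several possible export models, kept separate from the graph."""
    def export(self, graph: KnowledgeGraph) -> str:
        return json.dumps([u.__dict__ for u in graph.units])

g = KnowledgeGraph()
g.add("sample:42", "hasWeight", "25 g")
doc = JsonExportModel().export(g)
```

Because the exporter only reads the in-memory model, adding a new display or storage backend means writing one new adapter class rather than reworking the graph itself.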
A study of iron-oxidising bacteria, their habitats and their associated biogenic iron ochre, sampled from the Greater Glasgow area
Leptothrix ochracea is a species of iron-oxidising bacterium that is frequently found in ferrous-iron-rich waters throughout the world, where it contributes to the production of biogenic iron ochre. When viewed microscopically, biogenic iron ochre produced by L. ochracea is found to comprise hollow microtubular filaments. The ability to produce iron-oxide-containing microtubes under ambient conditions has made L. ochracea an attractive bacterium to study; however, there is currently no isolated axenic culture of L. ochracea, nor is there a thorough understanding of the exopolymeric secretions which act as a scaffold for microtubular formation. This research aims to address these areas by studying biogenic iron ochre mats collected from several sample sites surrounding the greater Glasgow area.
Chapter 3 provides a characterisation of the sampled biogenic iron ochre, primarily by SEM-EDX and XRD, to confirm the presence of filamentous material and investigate its composition and phase. Further to this, a protocol is developed to extract and characterise organic material associated with the samples, along with a study of the interaction of thiol-containing reducing agents with the material.
Chapter 4 provides a characterisation of the bacterial communities present at three of the sample sites via high-throughput Illumina sequencing of the V3–V4 region of the 16S rRNA gene. This is an important part of the study, as it is essential to understand which other bacteria are found within the biogenic iron ochre mats; this provides insight into the biogeochemistry occurring. By understanding the other bacteria present and the biogeochemistry, it may be possible to develop an artificial environment in which to grow and isolate L. ochracea.
Chapter 5 provides the development of protocols to isolate various types of bacteria from biogenic iron ochre mats. This chapter begins with the utilisation of solid agar media and solid gellan gum media supplemented with ferrous iron salts to isolate single colonies of bacteria. It then investigates other isolation techniques, including liquid enrichment growths and gradient tubes, and culminates with the development of a protocol to isolate single filaments of bacteria via a micromanipulator. Isolated bacteria had their genomic DNA extracted, which was then amplified by PCR and sequenced via high-throughput Illumina sequencing.
Chapter 6 provides a thorough characterisation of the sample sites used throughout this study. Characterisation includes multiple photographs, measurements, and descriptions; the physicochemical conditions present; the inorganic species present; and the concentrations of dissolved inorganic and organic carbon. Orbitrap mass spectrometry is then used to characterise the organic material present. The recent history and underlying geology of selected sample sites are also investigated to assess whether anthropogenic activity or natural geology has a greater effect on the chemistry occurring within the sample sites. As with Chapter 4, by fully understanding the sample sites and the chemistry occurring within them, it may be possible to develop artificial media and environments that could be used to grow and isolate L. ochracea.