6,364 research outputs found
Using machine learning to predict pathogenicity of genomic variants throughout the human genome
Geschätzt mehr als 6.000 Erkrankungen werden durch Veränderungen im Genom verursacht. Ursachen gibt es viele: Eine genomische Variante kann die Translation eines Proteins stoppen, die Genregulation stören oder das Spleißen der mRNA in eine andere Isoform begünstigen. All diese Prozesse müssen überprüft werden, um die zum beschriebenen Phänotyp passende Variante zu ermitteln. Eine Automatisierung dieses Prozesses sind Varianteneffektmodelle. Mittels maschinellem Lernen und Annotationen aus verschiedenen Quellen bewerten diese Modelle genomische Varianten hinsichtlich ihrer Pathogenität.
Die Entwicklung eines Varianteneffektmodells erfordert eine Reihe von Schritten: Annotation der Trainingsdaten, Auswahl von Features, Training verschiedener Modelle und Selektion eines Modells. Hier präsentiere ich ein allgemeines Workflow dieses Prozesses. Dieses ermöglicht es den Prozess zu konfigurieren, Modellmerkmale zu bearbeiten, und verschiedene Annotationen zu testen. Der Workflow umfasst außerdem die Optimierung von Hyperparametern, Validierung und letztlich die Anwendung des Modells durch genomweites Berechnen von Varianten-Scores.
Der Workflow wird in der Entwicklung von Combined Annotation Dependent Depletion (CADD), einem Varianteneffektmodell zur genomweiten Bewertung von SNVs und InDels, verwendet. Durch Etablierung des ersten Varianteneffektmodells für das humane Referenzgenome GRCh38 demonstriere ich die gewonnenen Möglichkeiten Annotationen aufzugreifen und neue Modelle zu trainieren. Außerdem zeige ich, wie Deep-Learning-Scores als Feature in einem CADD-Modell die Vorhersage von RNA-Spleißing verbessern. Außerdem werden Varianteneffektmodelle aufgrund eines neuen, auf Allelhäufigkeit basierten, Trainingsdatensatz entwickelt.
Diese Ergebnisse zeigen, dass der entwickelte Workflow eine skalierbare und flexible Möglichkeit ist, um Varianteneffektmodelle zu entwickeln. Alle entstandenen Scores sind unter cadd.gs.washington.edu und cadd.bihealth.org frei verfügbar.More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity.
Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants.
The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that is scoring SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep-neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data that is based on variants selected by allele frequency.
In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org
NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs
Spawning duplicate requests, called cloning, is a powerful technique to
reduce tail latency by masking service-time variability. However, traditional
client-based cloning is static and harmful to performance under high load,
while a recent coordinator-based approach is slow and not scalable. Both
approaches are insufficient to serve modern microsecond-scale Remote Procedure
Calls (RPCs). To this end, we present NetClone, a request cloning system that
performs cloning decisions dynamically within nanoseconds at scale. Rather than
the client or the coordinator, NetClone performs request cloning in the network
switch by leveraging the capability of programmable switch ASICs. Specifically,
NetClone replicates requests based on server states and blocks redundant
responses using request fingerprints in the switch data plane. To realize the
idea while satisfying the strict hardware constraints, we address several
technical challenges when designing a custom switch data plane. NetClone can be
integrated with emerging in-network request schedulers like RackSched. We
implement a NetClone prototype with an Intel Tofino switch and a cluster of
commodity servers. Our experimental results show that NetClone can improve the
tail latency of microsecond-scale RPCs for synthetic and real-world application
workloads and is robust to various system conditions.Comment: 13 pages, ACM SIGCOMM 202
Resilience and food security in a food systems context
This open access book compiles a series of chapters written by internationally recognized experts known for their in-depth but critical views on questions of resilience and food security. The book assesses rigorously and critically the contribution of the concept of resilience in advancing our understanding and ability to design and implement development interventions in relation to food security and humanitarian crises. For this, the book departs from the narrow beaten tracks of agriculture and trade, which have influenced the mainstream debate on food security for nearly 60 years, and adopts instead a wider, more holistic perspective, framed around food systems. The foundation for this new approach is the recognition that in the current post-globalization era, the food and nutritional security of the world’s population no longer depends just on the performance of agriculture and policies on trade, but rather on the capacity of the entire (food) system to produce, process, transport and distribute safe, affordable and nutritious food for all, in ways that remain environmentally sustainable. In that context, adopting a food system perspective provides a more appropriate frame as it incites to broaden the conventional thinking and to acknowledge the systemic nature of the different processes and actors involved. This book is written for a large audience, from academics to policymakers, students to practitioners
Integrating materials supply in strategic mine planning of underground coal mines
In July 2005 the Australian Coal Industry’s Research Program (ACARP) commissioned Gary Gibson to identify constraints that would prevent development production rates from achieving full capacity. A “TOP 5” constraint was “The logistics of supply transport distribution and handling of roof support consumables is an issue at older extensive mines immediately while the achievement of higher development rates will compound this issue at most mines.” Then in 2020, Walker, Harvey, Baafi, Kiridena, and Porter were commissioned by ACARP to investigate Australian best practice and progress made since Gibson’s 2005 report. This report was titled: - “Benchmarking study in underground coal mining logistics.” It found that even though logistics continue to be recognised as a critical constraint across many operations particularly at a tactical / day to day level, no strategic thought had been given to logistics in underground coal mines, rather it was always assumed that logistics could keep up with any future planned design and productivity. This subsequently meant that without estimating the impact of any logistical constraint in a life of mine plan, the risk of overvaluing a mining operation is high.
This thesis attempts to rectify this shortfall and has developed a system to strategically identify logistics bottlenecks and the impacts that mine planning parameters might have on these at any point in time throughout a life of mine plan. By identifying any logistics constraints as early as possible, the best opportunity to rectify the problem at the least expense is realised. At the very worst if a logistics constraint was unsolvable then it could be understood, planned for, and reflected in the mine’s ongoing financial valuations. The system developed in this thesis, using a suite of unique algorithms, is designed to “bolt onto” existing mine plans in the XPAC mine scheduling software package, and identify at a strategic level the number of material delivery loads required to maintain planned productivity for a mining operation. Once an event was identified the system then drills down using FlexSim discrete event simulation to a tactical level to confirm the predicted impact and understand if a solution can be transferred back as a long-term solution. Most importantly the system developed in this thesis was designed to communicate to multiple non-technical stakeholders through simple graphical outputs if there is a risk to planned production levels due to a logistics constraint
A Decision Support System for Economic Viability and Environmental Impact Assessment of Vertical Farms
Vertical farming (VF) is the practice of growing crops or animals using the vertical dimension via multi-tier racks or vertically inclined surfaces. In this thesis, I focus on the emerging industry of plant-specific VF. Vertical plant farming (VPF) is a promising and relatively novel practice that can be conducted in buildings with environmental control and artificial lighting. However, the nascent sector has experienced challenges in economic viability, standardisation, and environmental sustainability. Practitioners and academics call for a comprehensive financial analysis of VPF, but efforts are stifled by a lack of valid and available data.
A review of economic estimation and horticultural software identifies a need for a decision support system (DSS) that facilitates risk-empowered business planning for vertical farmers. This thesis proposes an open-source DSS framework to evaluate business sustainability through financial risk and environmental impact assessments. Data from the literature, alongside lessons learned from industry practitioners, would be centralised in the proposed DSS using imprecise data techniques. These techniques have been applied in engineering but are seldom used in financial forecasting. This could benefit complex sectors which only have scarce data to predict business viability.
To begin the execution of the DSS framework, VPF practitioners were interviewed using a mixed-methods approach. Learnings from over 19 shuttered and operational VPF projects provide insights into the barriers inhibiting scalability and identifying risks to form a risk taxonomy. Labour was the most commonly reported top challenge. Therefore, research was conducted to explore lean principles to improve productivity.
A probabilistic model representing a spectrum of variables and their associated uncertainty was built according to the DSS framework to evaluate the financial risk for VF projects. This enabled flexible computation without precise production or financial data to improve economic estimation accuracy. The model assessed two VPF cases (one in the UK and another in Japan), demonstrating the first risk and uncertainty quantification of VPF business models in the literature. The results highlighted measures to improve economic viability and the viability of the UK and Japan case.
The environmental impact assessment model was developed, allowing VPF operators to evaluate their carbon footprint compared to traditional agriculture using life-cycle assessment. I explore strategies for net-zero carbon production through sensitivity analysis. Renewable energies, especially solar, geothermal, and tidal power, show promise for reducing the carbon emissions of indoor VPF. Results show that renewably-powered VPF can reduce carbon emissions compared to field-based agriculture when considering the land-use change.
The drivers for DSS adoption have been researched, showing a pathway of compliance and design thinking to overcome the ‘problem of implementation’ and enable commercialisation. Further work is suggested to standardise VF equipment, collect benchmarking data, and characterise risks. This work will reduce risk and uncertainty and accelerate the sector’s emergence
Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond
[ES] Esta tesis se enmarca en la intersección entre las técnicas modernas de Machine Learning, como las Redes Neuronales Profundas, y el modelado probabilístico confiable. En muchas aplicaciones, no solo nos importa la predicción hecha por un modelo (por ejemplo esta imagen de pulmón presenta cáncer) sino también la confianza que tiene el modelo para hacer esta predicción (por ejemplo esta imagen de pulmón presenta cáncer con 67% probabilidad). En tales aplicaciones, el modelo ayuda al tomador de decisiones (en este caso un médico) a tomar la decisión final. Como consecuencia, es necesario que las probabilidades proporcionadas por un modelo reflejen las proporciones reales presentes en el conjunto al que se ha asignado dichas probabilidades; de lo contrario, el modelo es inútil en la práctica. Cuando esto sucede, decimos que un modelo está perfectamente calibrado.
En esta tesis se exploran tres vias para proveer modelos más calibrados. Primero se muestra como calibrar modelos de manera implicita, que son descalibrados por técnicas de aumentación de datos. Se introduce una función de coste que resuelve esta descalibración tomando como partida las ideas derivadas de la toma de decisiones con la regla de Bayes. Segundo, se muestra como calibrar modelos utilizando una etapa de post calibración implementada con una red neuronal Bayesiana. Finalmente, y en base a las limitaciones estudiadas en la red neuronal Bayesiana, que hipotetizamos que se basan en un prior mispecificado, se introduce un nuevo proceso estocástico que sirve como distribución a priori en un problema de inferencia Bayesiana.[CA] Aquesta tesi s'emmarca en la intersecció entre les tècniques modernes de Machine Learning, com ara les Xarxes Neuronals Profundes, i el modelatge probabilístic fiable. En moltes aplicacions, no només ens importa la predicció feta per un model (per ejemplem aquesta imatge de pulmó presenta càncer) sinó també la confiança que té el model per fer aquesta predicció (per exemple aquesta imatge de pulmó presenta càncer amb 67% probabilitat). En aquestes aplicacions, el model ajuda el prenedor de decisions (en aquest cas un metge) a prendre la decisió final. Com a conseqüència, cal que les probabilitats proporcionades per un model reflecteixin les proporcions reals presents en el conjunt a què s'han assignat aquestes probabilitats; altrament, el model és inútil a la pràctica. Quan això passa, diem que un model està perfectament calibrat.
En aquesta tesi s'exploren tres vies per proveir models més calibrats. Primer es mostra com calibrar models de manera implícita, que són descalibrats per tècniques d'augmentació de dades. S'introdueix una funció de cost que resol aquesta descalibració prenent com a partida les idees derivades de la presa de decisions amb la regla de Bayes. Segon, es mostra com calibrar models utilitzant una etapa de post calibratge implementada amb una xarxa neuronal Bayesiana. Finalment, i segons les limitacions estudiades a la xarxa neuronal Bayesiana, que es basen en un prior mispecificat, s'introdueix un nou procés estocàstic que serveix com a distribució a priori en un problema d'inferència Bayesiana.[EN] This thesis is framed at the intersection between modern Machine Learning techniques, such as Deep Neural Networks, and reliable probabilistic modeling. In many machine learning applications, we do not only care about the prediction made by a model (e.g. this lung image presents cancer) but also in how confident is the model in making this prediction (e.g. this lung image presents cancer with 67% probability). In such applications, the model assists the decision-maker (in this case a doctor) towards making the final decision. As a consequence, one needs that the probabilities provided by a model reflects the true underlying set of outcomes, otherwise the model is useless in practice. When this happens, we say that a model is perfectly calibrated.
In this thesis three ways are explored to provide more calibrated models. First, it is shown how to calibrate models implicitly, which are decalibrated by data augmentation techniques. A cost function is introduced that solves this decalibration taking as a starting point the ideas derived from decision making with Bayes' rule. Second, it shows how to calibrate models using a post-calibration stage implemented with a Bayesian neural network. Finally, and based on the limitations studied in the Bayesian neural network, which we hypothesize that came from a mispecified prior, a new stochastic process is introduced that serves as a priori distribution in a Bayesian inference problem.Maroñas Molano, J. (2022). Modeling Uncertainty for Reliable Probabilistic Modeling in Deep Learning and Beyond [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181582TESI
Recommended from our members
Into the Multiverse: Methods for Studying Developmental Neuroscience
One major challenge in developmental neuroscience research is the sheer number of choices researchers face when addressing even a single research question. Even once data collection is complete, the journey from raw data to interpretation of findings may depend on numerous decisions. To address this issue, this dissertation explores “multiverse” analysis techniques for following many analytical paths at once in the same dataset.
In chapter 1, multiverses are used to examine which analyses of age-related change in amygdala-medial prefrontal cortex circuitry are robust versus sensitive to researcher decisions. Chapter 2 uses multiverse analysis to identify optimal solutions for mitigating breathing-induced artifacts in resting-state functional magnetic resonance imaging data. Chapter 3 uses a variety of model specifications to characterize simultaneous reward learning strategies in youth contingent on both visual task cues and spatial-motor information.
Despite varied approaches and goals, each of the three studies highlight the benefits of conducting multiple parallel analyses for both addressing questions in developmental neuroscience and deepening understanding of the methods used to address them
Bibliographic Control in the Digital Ecosystem
With the contributions of international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. Among the main topics addressed: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; new book supply chain; “discoverability” in the IIIF digital ecosystem; role of thesauri and ontologies in the digital ecosystem; bibliographic control and search engines
- …