
    Modular lifelong machine learning

    Deep learning has drastically improved the state of the art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, training a deep neural network on a machine learning problem is expensive, and the overall training cost grows further when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021) and, as a result, neglect some knowledge-transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to long sequences of problems remains a challenge.

    Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. Storing knowledge in modules makes it easy to reuse only the subset of modules that is useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale and thus retain said properties on longer sequences of problems.

    First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired of an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures. Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations. Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improved anytime performance in the HPO setting and discuss how it can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, not all of which have been attained by past methods, and presents an LML algorithm which achieves all of them apart from backward transfer.
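    To make the module-reuse idea concrete, here is a minimal sketch in PyTorch, with entirely hypothetical module names and toy data, of maintaining a library of frozen, pre-trained modules and exhaustively searching over combinations of library and fresh modules for a new problem. It is an illustration of the general scheme, not the HOUDINI or PICLE code:

```python
import torch
import torch.nn as nn

# Library of frozen, pre-trained feature modules accumulated over earlier problems.
library = {
    "feats_a": nn.Sequential(nn.Linear(16, 32), nn.ReLU()),
    "feats_b": nn.Sequential(nn.Linear(16, 32), nn.Tanh()),
}
for module in library.values():
    for p in module.parameters():
        p.requires_grad_(False)  # frozen: reused modules cannot be overwritten

def fit_and_score(model, X, y, steps=100):
    """Train only the trainable parameters, return accuracy on the toy data."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return (model(X).argmax(dim=1) == y).float().mean().item()

def solve_new_problem(X, y, n_classes):
    """Try each library module, plus one fresh module, combined with a new head."""
    candidates = dict(library)
    candidates["fresh"] = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
    best = None
    for name, feats in candidates.items():
        model = nn.Sequential(feats, nn.Linear(32, n_classes))  # new head per problem
        score = fit_and_score(model, X, y)
        if best is None or score > best[1]:
            best = (name, score)
    return best

X, y = torch.randn(64, 16), torch.randint(0, 3, (64,))
print(solve_new_problem(X, y, n_classes=3))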

    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. All of these processes must be investigated to evaluate which variant may be causal for the deleterious phenotype. A great help in this regard are variant effect scores. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation and ultimately deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38, establishing the first variant effect model for that reference. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
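    As an illustration of the workflow steps named above (annotation, feature configuration, hyperparameter optimization, validation, and genome-wide scoring), here is a minimal sketch in Python with scikit-learn. The annotation columns and the toy data are assumptions for illustration, not CADD's actual features or code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1) Annotated training data: one row per variant, annotations as columns.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "conservation": rng.normal(size=500),
    "allele_freq":  rng.uniform(size=500),
    "splice_score": rng.normal(size=500),       # e.g. a deep-learning feature
    "pathogenic":   rng.integers(0, 2, size=500),  # proxy labels for training
})
features = ["conservation", "allele_freq", "splice_score"]

# 2) Model training plus hyperparameter optimization with cross-validation.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5, scoring="roc_auc")
search.fit(train[features], train["pathogenic"])
print("cross-validated AUC:", round(search.best_score_, 3))

# 3) Deployment: score every variant in a (here, tiny) genome-wide table.
genome_wide = pd.DataFrame(rng.normal(size=(5, 3)), columns=features)
genome_wide["score"] = search.predict_proba(genome_wide)[:, 1]
print(genome_wide)
```

    The real workflow additionally covers benchmarking of competing models and large-scale deployment, but the control flow follows the same annotate, tune, validate, score sequence.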

    Studying the Biliary Tree using Organoid-Technology


    19th SC@RUG 2022 proceedings 2021-2022


    Hunting Wildlife in the Tropics and Subtropics

    The hunting of wild animals for their meat has been a crucial activity in the evolution of humans. It continues to be an essential source of food and a generator of income for millions of Indigenous and rural communities worldwide. Conservationists rightly fear that excessive hunting of many animal species will cause their demise, as has already happened throughout the Anthropocene. Many species of large mammals and birds have been decimated or annihilated due to overhunting by humans. If such pressures continue, many other species will meet the same fate. Equally, if the use of wildlife resources by those who depend on them is to continue, sustainable practices must be implemented. These communities need to remain or become custodians of the wildlife resources within their lands, for their own well-being as well as for biodiversity in general. This title is also available via Open Access on Cambridge Core.

    False textual information detection, a deep learning approach

    Fact checking for fake news identification, the focus of this thesis, has been approached in many ways. Current approaches still perform poorly at scale due to a lack of authoritative sources, insufficient evidence, or, in certain cases, reliance on a single piece of evidence. To address the lack of evidence and the inability of models to generalise across domains, we propose a style-aware model for detecting false information and improving existing performance. We discovered that our model was effective at detecting false information when we evaluated its generalisation ability using news articles and Twitter corpora. We then propose to improve fact checking performance by incorporating warrants. We developed a highly efficient prediction model based on the results and demonstrated that incorporating warrants is beneficial for fact checking. Due to a lack of external warrant data, we develop a novel model for generating warrants that aid in determining the credibility of a claim. The results indicate that when a pre-trained language model is combined with a multi-agent model, high-quality, diverse warrants are generated that contribute to task performance improvement. To counter biased opinions and support rational judgments, we propose a model that can generate multiple perspectives on a claim. Experiments confirm that our Perspectives Generation model generates diverse perspectives with a higher degree of quality and diversity than any baseline model. Additionally, we propose to improve the model's detection capability by generating an explainable alternative factual claim, assisting the reader in identifying the subtle issues that cause factual errors. Our evaluation demonstrates that this does indeed increase the veracity of claims. Finally, whereas current research has treated stance detection and fact checking separately, we propose a unified model that integrates both tasks. Classification results demonstrate that our proposed model outperforms state-of-the-art methods.
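    As a sketch of what a unified model for the two tasks can look like, the following toy PyTorch module shares one text encoder between a stance head and a veracity head and trains them jointly. The architecture, names, and label sets are illustrative assumptions, not the thesis model:

```python
import torch
import torch.nn as nn

class UnifiedStanceFactModel(nn.Module):
    """One shared encoder, two task heads, trained with a joint loss."""
    def __init__(self, vocab_size=10000, dim=128, n_stances=4, n_verdicts=3):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # shared text encoder
        self.stance_head = nn.Linear(dim, n_stances)    # e.g. agree/disagree/discuss/unrelated
        self.verdict_head = nn.Linear(dim, n_verdicts)  # e.g. true/false/unverified

    def forward(self, token_ids):
        h = self.embed(token_ids)                       # (batch, dim)
        return self.stance_head(h), self.verdict_head(h)

model = UnifiedStanceFactModel()
tokens = torch.randint(0, 10000, (8, 20))               # a batch of 8 token sequences
stance_logits, verdict_logits = model(tokens)

# Joint training sums both task losses over the shared representation.
loss = (nn.functional.cross_entropy(stance_logits, torch.randint(0, 4, (8,)))
        + nn.functional.cross_entropy(verdict_logits, torch.randint(0, 3, (8,))))
loss.backward()
```

    The design choice the sketch illustrates is that stance signals (how sources align with a claim) and veracity signals share one representation, so evidence useful for one task can inform the other.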

    International Conference on Mathematical Analysis and Applications in Science and Engineering – Book of Extended Abstracts

    The present volume, Mathematical Analysis and Applications in Science and Engineering – Book of Extended Abstracts, collects the extended abstracts of the talks presented at the International Conference on Mathematical Analysis and Applications in Science and Engineering – ICMA2SC'22, which took place in the beautiful city of Porto, Portugal, on June 27-29, 2022. Its aim was to bring together researchers in every discipline of applied mathematics, science, engineering, industry, and technology to discuss the development of new mathematical models, theories, and applications that contribute to the advancement of scientific knowledge and practice. Authors proposed research in topics including partial and ordinary differential equations, integer and fractional order equations, linear algebra, numerical analysis, operations research, discrete mathematics, optimization, control, probability, and computational mathematics, amongst others. The conference was designed to maximize the involvement of all participants and to present state-of-the-art research and the latest achievements.

    Effects of spatial and temporal heterogeneity on the genetic diversity of the alpine butterfly Parnassius smintheus

    Genetic diversity represents a population's evolutionary potential, as well as its demographic and evolutionary history. Advances in DNA sequencing have allowed the development of new and potentially powerful methods to quantify this diversity. However, best practices for sampling populations and analyzing data with these methods are still being developed. Furthermore, while effects of the landscape on spatial patterns of genetic variation have received considerable attention, we have a poorer understanding of how genetic diversity changes as a result of temporal variation in environmental and demographic variables. Here, I take advantage of advances in DNA sequencing to investigate genetic diversity at single nucleotide polymorphisms (SNPs) across space and time in a model butterfly system, Parnassius smintheus. I used double digest restriction site associated DNA (ddRAD) sequencing to genotype SNPs in P. smintheus populations in Alberta, Canada. To develop recommendations for analyzing such data, I tested the effect of varying the maximum amount of missing data (and therefore the number of SNPs) on common population genetic analyses. Most analyses were robust to varying amounts of missing data, except for population assignment tests, where larger datasets (with more missing data) revealed higher-resolution population structure. I also examined the effect of sample size on the same set of analyses, finding that some (e.g., estimation of genetic differentiation) required as few as five individuals per population, while others (e.g., population assignment) required at least 15. I then used the SNP dataset to investigate factors shaping patterns of genetic diversity at different spatial scales and across time. At a larger spatial scale but a single time point, both weather (snow depth and mean minimum temperatures) and land cover (the distance between meadow patches) predicted genetic diversity and differentiation. At a smaller spatial but longer temporal scale, I used a smaller SNP dataset to show that genetic diversity is lost over repeated demographic bottlenecks driven by winter weather and subsequently recovered through gene flow. My work contributes to understanding how genetic diversity is shaped in natural populations, and points to the importance of both land cover and weather (and specifically, variability in weather) to this process.
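    The missing-data trade-off examined above can be made concrete with a short simulation: relaxing the per-SNP missing-data threshold retains more SNPs at the cost of more gaps in the genotype matrix. The matrix below is simulated; real input would be ddRAD-seq genotype calls (e.g., from a VCF):

```python
import numpy as np

rng = np.random.default_rng(1)
n_individuals, n_snps = 60, 2000
genotypes = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)

# Simulate per-SNP dropout: each SNP gets its own missing-data rate up to 40%.
dropout = rng.uniform(size=genotypes.shape) < rng.uniform(0, 0.4, size=n_snps)
genotypes[dropout] = np.nan                      # missing genotype calls

# Filtering at stricter thresholds keeps fewer SNPs but cleaner data.
for max_missing in (0.05, 0.10, 0.25, 0.50):
    keep = np.isnan(genotypes).mean(axis=0) <= max_missing
    print(f"max missing {max_missing:.0%}: {keep.sum():5d} SNPs retained")
```

    In the thesis, the larger datasets produced by more permissive thresholds were what resolved finer population structure in assignment tests, while most other analyses were robust to the choice.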