A DeepONet multi-fidelity approach for residual learning in reduced order modeling
In the present work, we introduce a novel approach to enhance the precision
of reduced order models by exploiting a multi-fidelity perspective and
DeepONets. Reduced models provide a real-time numerical approximation by
simplifying the original model. The error introduced by such an operation is
usually neglected, sacrificed for the sake of fast computation. We
propose to couple the model reduction with machine-learning-based residual learning,
such that the above-mentioned error can be learned by a neural network and
inferred for new predictions. We emphasize that the framework maximizes the
exploitation of high-fidelity information, using it for building the reduced
order model and for learning the residual. In this work, we explore the
integration of proper orthogonal decomposition (POD), and gappy POD for sensors
data, with the recent DeepONet architecture. Numerical investigations for a
parametric benchmark function and a nonlinear parametric Navier-Stokes problem
are presented.
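The residual idea in this abstract can be illustrated with a minimal sketch. It assumes a plain POD built from an SVD of a snapshot matrix, and a hypothetical benchmark family `sin(mu * x)` that does not come from the paper. The sketch only computes the residual that a network such as a DeepONet would then be trained to predict; the network itself, and the gappy POD variant, are not shown.

```python
import numpy as np

def pod_basis(snapshots, rank):
    """Return the first `rank` left singular vectors of the snapshot matrix."""
    u, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :rank]

def pod_reconstruct(basis, snapshots):
    """Project snapshots onto the POD basis and lift them back to full space."""
    return basis @ (basis.T @ snapshots)

# Hypothetical parametric benchmark: u(x; mu) = sin(mu * x), one column per mu.
x = np.linspace(0.0, 1.0, 200)
params = np.linspace(1.0, 5.0, 30)
snapshots = np.stack([np.sin(mu * x) for mu in params], axis=1)  # shape (200, 30)

basis = pod_basis(snapshots, rank=3)
reduced = pod_reconstruct(basis, snapshots)   # low-fidelity (reduced) prediction
residual = snapshots - reduced                # error a network would learn to predict
```

In a full pipeline the corrected prediction would be `reduced + predicted_residual`, where `predicted_residual` (a hypothetical name) comes from the trained network.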
DeepOnto: A Python Package for Ontology Engineering with Deep Learning
Applying deep learning techniques, particularly language models (LMs), in
ontology engineering has raised widespread attention. However, deep learning
frameworks like PyTorch and Tensorflow are predominantly developed for Python
programming, while widely-used ontology APIs, such as the OWL API and Jena, are
primarily Java-based. To facilitate seamless integration of these frameworks
and APIs, we present DeepOnto, a Python package designed for ontology
engineering. The package encompasses a core ontology processing module founded
on the widely-recognised and reliable OWL API, encapsulating its fundamental
features in a more "Pythonic" manner and extending its capabilities to include
other essential components including reasoning, verbalisation, normalisation,
projection, and more. Building on this module, DeepOnto offers a suite of
tools, resources, and algorithms that support various ontology engineering
tasks, such as ontology alignment and completion, by harnessing deep learning
methodologies, primarily pre-trained LMs. In this paper, we also demonstrate
the practical utility of DeepOnto through two use-cases: the Digital Health
Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment
Evaluation Initiative (OAEI). Comment: under review at Semantic Web Journal
Modular lifelong machine learning
Deep learning has drastically improved the state of the art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
Overall, this thesis identifies a number of important LML properties, which have not all been attained by past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
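As a rough illustration of the multi-fidelity idea mentioned for hyperparameter optimisation, the sketch below uses successive halving, a standard multi-fidelity technique. It is not the thesis's method, which additionally involves model-based optimisation. The function `toy_loss`, the candidate learning rates, and the budget schedule are all hypothetical and chosen only for the example.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations each round while the
    evaluation budget (the fidelity) grows by a factor of eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Cheap, low-fidelity evaluations rank all current survivors.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta  # spend more budget on the fewer remaining candidates
    return survivors[0]

def toy_loss(learning_rate, budget):
    """Hypothetical loss: best at learning_rate = 0.1, improving with budget."""
    return (learning_rate - 0.1) ** 2 + 1.0 / budget

candidates = [0.001, 0.01, 0.1, 0.5, 1.0, 0.3, 0.05, 0.2]
best = successive_halving(candidates, toy_loss)
```

In the modular LML setting described above, each candidate would presumably be a module combination instead of a learning rate, with the budget standing for training effort; that mapping is an assumption made for illustration.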
Machine Learning Approaches for the Prioritisation of Cardiovascular Disease Genes Following Genome-wide Association Study
Genome-wide association studies (GWAS) have revealed thousands of genetic loci, establishing themselves as a valuable method for unravelling the complex biology of many diseases. Even as GWAS have grown in size and improved in study design to detect effects, identifying real causal signals and disentangling them from other highly correlated markers associated through linkage disequilibrium (LD) remains challenging. This has severely limited GWAS findings and brought the method’s value into question. Although thousands of disease susceptibility loci have been reported, causal variants and genes at these loci remain elusive. Post-GWAS analysis aims to dissect the heterogeneity of variant and gene signals. In recent years, machine learning (ML) models have been developed for post-GWAS prioritisation. ML models have ranged from logistic regression to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models (i.e., neural networks). When combined with functional validation, these methods have yielded important translational insights, providing a strong evidence-based approach to direct post-GWAS research. However, ML approaches are in their infancy across biological applications, and as they continue to evolve, an evaluation of their robustness for GWAS prioritisation is needed. Here, I investigate the landscape of ML across selected models, input features, bias risk, and output model performance, with a focus on building a prioritisation framework that is applied to blood pressure GWAS results and tested on re-application to blood lipid traits.
Beam scanning by liquid-crystal biasing in a modified SIW structure
A fixed-frequency beam-scanning 1D antenna based on Liquid Crystals (LCs) is designed for application in 2D scanning with lateral alignment. The 2D array environment imposes full decoupling of adjacent 1D antennas, which often conflicts with the LC requirement of DC biasing: the proposed design accommodates both. The LC medium is placed inside a Substrate Integrated Waveguide (SIW) modified to work as a Groove Gap Waveguide, with radiating slots etched on the upper broad wall, that radiates as a Leaky-Wave Antenna (LWA). This allows effective application of the DC bias voltage needed for tuning the LCs. At the same time, the RF field remains laterally confined, making it possible to lay several antennas in parallel and achieve 2D beam scanning. The design is validated by simulation employing the actual properties of a commercial LC medium.
Studies on genetic and epigenetic regulation of gene expression dynamics
The information required to build an organism is contained in its genome and the first
biochemical process that activates the genetic information stored in DNA is transcription.
Cell-type-specific gene expression shapes cellular functional diversity, and dysregulation
of transcription is a central tenet of human disease. Therefore, understanding
transcriptional regulation is central to understanding biology in health and disease.
Transcription is a dynamic process, occurring in discrete bursts of activity that can be
characterized by two kinetic parameters: burst frequency, describing how often genes
burst, and burst size, describing how many transcripts are generated in each burst. Genes
are under strict regulatory control by distinct sequences in the genome as well as
epigenetic modifications. To properly study how genetic and epigenetic factors affect
transcription, it needs to be treated as the dynamic cellular process it is. In this thesis, I
present the development of methods that allow identification of newly induced gene
expression over short timescales, as well as inference of kinetic parameters describing
how frequently genes burst and how many transcripts each burst gives rise to. The work is
presented through four papers:
In paper I, I describe the development of a novel method for profiling newly transcribed
RNA molecules. We use this method to show that therapeutic compounds affecting
different epigenetic enzymes elicit distinct, compound-specific responses mediated by
different sets of transcription factors as early as one hour into treatment, responses that can only
be detected when measuring newly transcribed RNA.
The goal of paper II is to determine how genetic variation shapes transcriptional bursting.
To this end, we infer transcriptome-wide burst kinetics parameters from genetically
distinct donors and find variation that selectively affects burst sizes and frequencies.
Paper III describes a method for inferring transcriptional kinetics transcriptome-wide
using single-cell RNA-sequencing. We use this method to describe how the regulation of
transcriptional bursting is encoded in the genome. Our findings show that gene-specific
burst sizes are dependent on core promoter architecture and that enhancers affect burst
frequencies. Furthermore, cell-type-specific differential gene expression is regulated by
cell-type-specific burst frequencies.
Lastly, Paper IV shows how transcription shapes cell types. We collect data on cellular
morphologies and electrophysiological characteristics, and measure gene expression, in the
same neurons collected from the mouse motor cortex. Our findings show that cells
belonging to the same, distinct transcriptomic families have distinct and non-overlapping
morpho-electric characteristics. Within families, there is continuous and correlated
variation in all modalities, challenging the notion of cell types as discrete entities.
Assessing the species boundary and ecological niche in freshwater gastropods of the family Physidae (Gastropoda, Hygrophila)
The present thesis contributes to increasing knowledge about the diversity of Neotropical
freshwater mollusks. Using different methodologies for analyzing molecular and
geographical occurrence data, we address important taxonomic issues and show new paths for
future taxonomic research on the family Physidae. For a long time, classification proposals
for this family were based only on morphological characters of the shell and, later, on the anatomy of the
soft parts. The application of molecular delimitation methods based on coalescence showed the
inadequacy of morphological criteria in discriminating intraspecific variability (overestimating
family diversity) and in detecting the existence of cryptic species complexes (underestimating
family diversity). The occurrence data, together with georeferencing tools,
modeling, and ecological niche analyses applied to South American physid species, indicated
the possibility of errors in species identification and the need to reassess the distribution of these
physids using other operational criteria, such as molecular approaches, to assess the actual family
diversity and distribution for the continent.
Antimicrobial Peptides Aka Host Defense Peptides – From Basic Research to Therapy
This Special Issue reprint addresses the most current and innovative developments in host defense peptide (HDP) research across a range of topics, such as structure and function analysis, modes of action, antimicrobial effects, cell and animal model systems, the discovery of novel host defense peptides, and drug development.