Meta-learning algorithms and applications
Meta-learning in the broader context concerns how an agent learns about its own learning, allowing it to improve its learning process. Learning how to learn is not only beneficial for humans; it has also shown vast benefits for improving how machines learn. In the context of machine learning, meta-learning enables models to improve their learning process by selecting suitable meta-parameters that influence the learning. For deep learning specifically, the meta-parameters typically describe details of the training of the model but can also include a description of the model itself, such as its architecture. Meta-learning is usually done with specific goals in mind, for example improving the ability to generalize or to learn new concepts from only a few examples.
Meta-learning can be powerful, but it comes with a key downside: it is often computationally costly. If these costs were alleviated, meta-learning would be more accessible to developers of new artificial intelligence models, allowing them to achieve greater goals or save resources. As a result, one key focus of our research is on significantly improving the efficiency of meta-learning. We develop two approaches, EvoGrad and PASHA, each of which improves meta-learning efficiency in a common scenario. EvoGrad allows us to efficiently optimize the values of a large number of differentiable meta-parameters, while PASHA enables us to efficiently optimize meta-parameters of any type, but fewer in number.
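As an illustration of differentiable meta-parameter optimization, the sketch below tunes an L2 regularization strength on a toy regression problem using an antithetic evolution-strategies estimate of the hypergradient. This is a minimal sketch in the spirit of such methods, not the EvoGrad algorithm itself; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inner problem: linear regression; meta-parameter: L2 strength `lam`
X_tr, y_tr = rng.normal(size=(50, 5)), rng.normal(size=50)
X_val, y_val = rng.normal(size=(50, 5)), rng.normal(size=50)

def inner_step(lam, w, lr=0.1):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + lam * w
    return w - lr * grad

def val_loss(w):
    return float(np.mean((X_val @ w - y_val) ** 2))

lam, w = 0.5, np.zeros(5)
for _ in range(100):
    w = inner_step(lam, w)                  # ordinary training step
    # Antithetic evolution-strategies estimate of d(val loss)/d(lam)
    sigma, eps = 0.1, rng.normal()
    up = val_loss(inner_step(lam + sigma * eps, w))
    dn = val_loss(inner_step(lam - sigma * eps, w))
    meta_grad = (up - dn) / (2 * sigma) * eps
    lam = max(0.0, lam - 0.05 * meta_grad)  # meta-parameter update
```

The perturbation-based estimate avoids unrolling the inner training loop, which is the efficiency motivation described above.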
Meta-learning is a tool that can be applied to solve various problems. Most commonly it is applied to learning new concepts from only a small number of examples (few-shot learning), but other applications exist too. To showcase the practical impact that meta-learning can make in the context of neural networks, we use meta-learning as a novel solution to two selected problems: more accurate uncertainty quantification (calibration) and general-purpose few-shot learning. Both are practically important problems, and meta-learning approaches give us better solutions than existing approaches. Calibration is important for safety-critical applications of neural networks, while general-purpose few-shot learning tests a model's ability to generalize few-shot learning across diverse tasks such as recognition, segmentation and keypoint estimation.
More efficient algorithms as well as novel applications enable the field of meta-learning to make a more significant impact on the broader area of deep learning and potentially solve problems that were previously too challenging. Ultimately, both allow us to better utilize the opportunities that artificial intelligence presents.
Brittle-viscous deformation cycles at the base of the seismogenic zone in the continental crust
The main goal of this study was to determine the dynamic cycle of ductile-brittle deformation and to characterise the fluid pathways, at different scales, of a brittle-viscous fault zone active at the base of the seismogenic crust. The objects of analysis are samples from the sinistral strike-slip fault zone BFZ045 in Olkiluoto (SW Finland), located at the site of a deep geological repository for nuclear waste.
Combined microstructural analysis, electron backscatter diffraction (EBSD), and mineral chemistry were applied to reconstruct the variations in pressure, temperature, fluid pressure, and differential stress that mediated deformation and strain localization along BFZ045 across the brittle-ductile transition zone (BDTZ). Ductile deformation took place at 400-500 °C and 3-4 kbar, and recrystallized grain size piezometry for quartz documents a progressive increase in differential stress during mylonitization, from ca. 50 MPa to ca. 120 MPa. The increase in differential stress was localised towards the shear zone center, which was eventually overprinted by brittle deformation in a narrowing shear zone. Cataclastic deformation occurred under lower temperature conditions, down to T ≥ 320 °C, and was not further overprinted by mylonitic creep. Porosity estimates were obtained through the combination of x-ray micro-computed tomography (µCT), mercury intrusion porosimetry, He pycnometry, and microstructural analysis. Low porosity values (0.8-4.4%) across the different rock types, pore sizes of 2-20 µm representative of pore connectivity, and microstructural observations suggest a dynamic cycle of fracturing and sealing, mostly controlled by ductile deformation. Fracture orientation analysis likewise indicates that the mylonitic precursor of BFZ045 played an important role in the localization of the brittle deformation. This thesis highlights that the ductile-brittle deformation cycle in BFZ045 was controlled by transient oscillations in fluid pressure in a narrowing shear zone deforming at progressively higher differential stress during cooling.
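For context, a recrystallized grain size piezometer relates measured grain size to differential stress. Assuming the commonly used quartz piezometer of Stipp & Tullis (2003), d = 3631·σ^(-1.26) with d in µm and σ in MPa (an assumption; the thesis does not state which piezometer was used), the stress range above corresponds roughly to grain sizes of ca. 26 µm down to ca. 9 µm:

```python
# Quartz recrystallized grain size piezometer (assumed form, after
# Stipp & Tullis 2003): d = 3631 * sigma**(-1.26), d in um, sigma in MPa
def stress_from_grain_size(d_um):
    """Differential stress (MPa) from recrystallized grain size (um)."""
    return (3631.0 / d_um) ** (1.0 / 1.26)

# Coarser recrystallized grains imply lower differential stress
sigma_coarse = stress_from_grain_size(26.0)  # ~50 MPa
sigma_fine = stress_from_grain_size(9.0)     # ~117 MPa
```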
Self-supervised learning for transferable representations
Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thereafter is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models across many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalises to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation.
Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
Performance and Competitiveness of Tree-Based Pipeline Optimization Tool
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
Automated machine learning (AutoML) is the process of automating the entire machine learning workflow when applied to real-world problems. AutoML can increase data science productivity while keeping the same performance and accuracy, allowing non-experts to use complex machine learning methods. Tree-based Pipeline Optimization Tool (TPOT) was one of the first AutoML methods created by data scientists and is targeted at optimizing machine learning pipelines using genetic programming. While still under active development, TPOT is a very promising AutoML tool. This thesis aims to explore the algorithm and analyse its performance using real-world data. Results show that evolution-based optimization is at least as accurate as TPOT initialization. The effectiveness of genetic operators, however, depends on the nature of the test case.
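The evolutionary loop at the heart of such pipeline optimization can be sketched in a few lines; the two-slot pipeline space and fitness function below are hypothetical stand-ins for TPOT's operators and cross-validated pipeline scores:

```python
import random

random.seed(0)

# Hypothetical two-slot pipeline space standing in for TPOT's operators
SCALERS = ["none", "standard", "minmax"]
MODELS = ["tree", "knn", "logreg"]

def fitness(pipe):
    # Stand-in score; TPOT would cross-validate the actual pipeline
    scaler_bonus = {"none": 0.0, "standard": 0.10, "minmax": 0.05}
    model_score = {"tree": 0.70, "knn": 0.60, "logreg": 0.80}
    return scaler_bonus[pipe[0]] + model_score[pipe[1]]

pop = [(random.choice(SCALERS), random.choice(MODELS)) for _ in range(8)]
for _ in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                        # selection (elitism)
    children = [(random.choice(parents)[0],  # crossover: mix parent genes
                 random.choice(parents)[1]) for _ in range(4)]
    children = [(random.choice(SCALERS), random.choice(MODELS))  # mutation
                if random.random() < 0.2 else c for c in children]
    pop = parents + children

best = max(pop, key=fitness)
```

Elitism keeps the best pipelines found so far, while crossover and mutation explore the space, mirroring how genetic programming searches pipeline structures.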
Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews express user opinions that are useful for setting realistic expectations and uncovering important details about products. However, some products receive hundreds or even thousands of reviews, making them time-consuming to read. Moreover, many reviews contain uninformative content, such as irrelevant personal experiences. Automatic summarization offers an alternative – short text summaries capturing the essential information expressed in reviews. Automatically produced summaries can reflect overall or particular opinions and be tailored to user preferences. Besides being presented on major e-commerce platforms, summaries can also be vocalized by home assistants. This approach can improve user satisfaction by assisting in making faster and better decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of annotated samples for training. However, human-written summaries for products are expensive to produce because annotators need to read many reviews. This has led to annotated data scarcity, where only a few datasets are available. Data scarcity is the central theme of our work, and we propose a number of approaches to alleviate the problem. The thesis consists of two parts, in which we discuss low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews and few-shot methods for learning from small annotated datasets. Customer reviews without summaries are available in large quantities, contain a breadth of in-domain specifics, and provide a powerful training signal. We show that reviews can be used for learning summarizers via a self-supervised objective. Further, we address two main challenges associated with learning from small annotated datasets. First, large models rapidly overfit on small datasets, leading to poor generalization. Second, it is not possible to learn a wide range of in-domain specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We address the first challenge by explicitly modeling summary properties (e.g., content coverage and sentiment alignment). Furthermore, we leverage small modules – adapters – that are more robust to overfitting. As we show, despite their size, these modules can be used to store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and ‘resolution.’ This task is harder to learn, and we present a few-shot method for training a query-based summarizer on small annotated datasets.
In the second part, we focus on the high-resource setting and present a large dataset with summaries collected from various online resources. The dataset has more than 33,000 human-written summaries, each linked to up to thousands of reviews. This, however, makes it challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To address this problem, we propose selecting small subsets of informative reviews. Only these subsets are encoded by the deep encoder and subsequently summarized. We show that the selector and summarizer can be trained end-to-end via amortized inference and policy gradient methods.
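The policy-gradient part of such a selector can be sketched on entirely synthetic data; the features, reward, and linear policy below are hypothetical stand-ins for the review representations and summary-quality reward:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical review features; the reward favours "informative" reviews
feats = rng.normal(size=(100, 4))
informative = feats[:, 0] > 0.5

theta = np.zeros(4)  # linear selection policy

def select(k=10):
    logits = feats @ theta
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Without-replacement sampling; the log-probability is treated as a
    # sum of softmax log-probs, a common approximation
    return rng.choice(len(feats), size=k, replace=False, p=p), p

baseline = 0.0
for _ in range(300):
    idx, p = select()
    reward = informative[idx].mean()       # stand-in for summary quality
    grad = sum(feats[i] - p @ feats for i in idx)  # score-function gradient
    theta = theta + 0.1 * (reward - baseline) * grad  # REINFORCE update
    baseline = 0.9 * baseline + 0.1 * reward          # variance reduction
```

The running-mean baseline is a standard variance-reduction choice for REINFORCE; the amortized-inference half of the training scheme is omitted here.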
Data-efficient neural network training with dataset condensation
The state of the art in many data-driven fields, including computer vision and natural language processing, typically relies on training larger models on bigger data. OpenAI reports that the computational cost to achieve the state of the art doubles every 3.4 months in the deep learning era. In contrast, GPU computation power doubles every 21.4 months, which is significantly slower. Thus, advancing deep learning performance by consuming more hardware resources is not sustainable. How to reduce the training cost while preserving generalization performance is a long-standing goal in machine learning. This thesis investigates a largely under-explored yet promising solution - dataset condensation - which aims to condense a large training set into a small set of informative synthetic samples such that deep models trained on them achieve performance close to models trained on the original dataset. In this thesis, we investigate how to condense image datasets for classification tasks. We propose three methods for image dataset condensation. Our methods can also be applied to condense other kinds of datasets for different learning tasks, such as text data, graph data and medical images, as we discuss in Section 6.1.
First, we propose a principled method that formulates learning a small synthetic set as a gradient matching problem: the gradients of deep neural network weights computed on the original and synthetic data are matched. A new gradient/weight matching loss is designed for robust matching across different neural architectures. We evaluate its performance on several image classification benchmarks and explore the use of our method in continual learning and neural architecture search.
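A minimal sketch of the gradient-matching idea on a toy logistic-regression "network" (two learnable synthetic samples, numerical gradients for brevity; everything here is a hypothetical stand-in for the deep-network setting):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: two well-separated Gaussian classes (hypothetical stand-in)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 10)),
               rng.normal(1.0, 1.0, (100, 10))])
y = np.r_[np.zeros(100), np.ones(100)]

# Synthetic set: one learnable sample per class
Xs = rng.normal(size=(2, 10))
ys = np.array([0.0, 1.0])

def logistic_grad(w, data, labels):
    p = 1.0 / (1.0 + np.exp(-(data @ w)))
    return data.T @ (p - labels) / len(labels)

def match_loss(Xs_, w):
    # Squared distance between gradients on synthetic and real data
    return 0.5 * np.sum((logistic_grad(w, Xs_, ys)
                         - logistic_grad(w, X, y)) ** 2)

eps, lr = 1e-5, 0.5
for _ in range(100):
    w = rng.normal(size=10) * 0.1        # fresh random init each iteration
    g = np.zeros_like(Xs)
    for i in range(Xs.shape[0]):         # numerical gradient wrt Xs,
        for j in range(Xs.shape[1]):     # kept simple for the sketch
            Xp, Xm = Xs.copy(), Xs.copy()
            Xp[i, j] += eps
            Xm[i, j] -= eps
            g[i, j] = (match_loss(Xp, w) - match_loss(Xm, w)) / (2 * eps)
    Xs -= lr * g
# The synthetic samples drift toward the respective class means
```

Matching is done across fresh random initializations so the synthetic set does not overfit one particular model, echoing the robustness motivation of the matching loss.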
In the second work, we further improve the data-efficiency of training neural networks with synthetic data by enabling effective data augmentation. Specifically, we propose Differentiable Siamese Augmentation, which learns synthetic data that can be used more effectively with data augmentation, achieving better performance when networks are trained with augmentation. Experiments verify that the proposed method obtains substantial gains over the state of the art.
While training deep models on the small set of condensed images can be extremely fast, synthesizing the images remains computationally expensive due to the complex bi-level optimization. Finally, we propose a simple yet effective method that synthesizes condensed images by matching the feature distributions of the synthetic and original training images when embedded by randomly sampled deep networks. Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures and obtain a significant performance boost.
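The distribution-matching idea can be sketched in a few lines: per class, the mean embedding of the synthetic samples under a freshly sampled random network is pulled toward the mean embedding of the real samples. A one-layer random ReLU embedding stands in for a deep network here; the data and sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Real" data: two Gaussian classes (hypothetical stand-in for images)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 10)),
               rng.normal(1.0, 1.0, (100, 10))])
y = np.r_[np.zeros(100, int), np.ones(100, int)]

Xs = rng.normal(size=(4, 10))        # learnable synthetic samples
ys = np.array([0, 0, 1, 1])

def embed(data, W):
    return np.maximum(data @ W, 0.0)  # one-layer random ReLU embedding

for _ in range(300):
    W = rng.normal(size=(10, 32)) / np.sqrt(10)  # freshly sampled network
    for c in (0, 1):
        diff = embed(Xs[ys == c], W).mean(0) - embed(X[y == c], W).mean(0)
        mask = (Xs[ys == c] @ W) > 0             # ReLU sub-gradient
        # Analytic gradient of 0.5*||diff||^2 wrt the synthetic samples
        Xs[ys == c] -= 0.05 * (mask * diff) @ W.T / mask.shape[0]
```

No inner training loop is needed, which is why this avoids the bi-level optimization of the gradient-matching formulation.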
In summary, this manuscript presents several important contributions that improve the data efficiency of training deep neural networks by condensing large datasets into significantly smaller synthetic ones. The innovations focus on principled methods based on gradient matching, higher data-efficiency with differentiable Siamese augmentation, and extremely simple and fast distribution matching without bi-level optimization. The proposed methods are evaluated on popular image classification datasets, namely MNIST, FashionMNIST, SVHN, CIFAR10/100 and TinyImageNet. The code is available at https://github.com/VICO-UoE/DatasetCondensation
Automated identification and behaviour classification for modelling social dynamics in group-housed mice
Mice are often used in biology as exploratory models of human conditions, due to their similar genetics and physiology. Unfortunately, research on behaviour has traditionally been limited to studying individuals in isolated environments and over short periods of time. This can miss critical time-effects, and, since mice are social creatures, bias results.
This work addresses this gap in research by developing tools to analyse the individual behaviour of group-housed mice in the home-cage over several days and with minimal disruption. Using data provided by the Mary Lyon Centre at MRC Harwell we designed an end-to-end system that (a) tracks and identifies mice in a cage, (b) infers their behaviour, and subsequently (c) models the group dynamics as functions of individual activities. In support of the above, we also curated and made available a large dataset of mouse localisation and behaviour classifications (IMADGE), as well as two smaller annotated datasets for training/evaluating the identification (TIDe) and behaviour inference (ABODe) systems. This research constitutes the first of its kind in terms of the scale and challenges addressed. The data source (side-view single-channel video with clutter and no identification markers for mice) presents challenging conditions for analysis, but has the potential to give richer information while using industry standard housing.
A Tracking and Identification module was developed to automatically detect, track and identify the (visually similar) mice in the cluttered home-cage using only single-channel IR video and coarse position from RFID readings. Existing detectors and trackers were combined with a novel Integer Linear Programming formulation to assign anonymous tracks to mouse identities. This utilised a probabilistic weight model of affinity between detections and RFID pickups.
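The assignment step can be sketched as follows; for the handful of mice in a cage, brute force over permutations stands in for the Integer Linear Programming solver, and the affinity values are hypothetical:

```python
import itertools

import numpy as np

# Hypothetical affinity between 3 anonymous tracks (rows) and 3 RFID
# identities (columns), e.g. the probability that track t is mouse m
affinity = np.array([[0.90, 0.05, 0.05],
                     [0.10, 0.80, 0.10],
                     [0.20, 0.10, 0.70]])

# One-to-one assignment maximizing total affinity; brute force over
# permutations is exact for small cage sizes
best = max(itertools.permutations(range(3)),
           key=lambda p: sum(affinity[t, m] for t, m in enumerate(p)))
# track i is assigned identity best[i]
```

The one-to-one constraint is what makes this an assignment problem rather than independent per-track classification: a confident match for one track rules that identity out for the others.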
Next, we implemented an Activity Labelling module that classifies the behaviour of each mouse, handling occlusion to avoid unreliable classifications when a mouse cannot be observed. Two key aspects of this were (a) careful feature selection, and (b) judicious balancing of the system's errors in line with their repercussions for our setup.
Given these sequences of individual behaviours, we analysed the interaction dynamics between mice in the same cage by collapsing the group behaviour into a sequence of interpretable latent regimes, using both static and temporal (Markov) models. Using a permutation matrix, we were able to automatically assign mice to roles in the HMM, fit a global model to a group of cages, and analyse abnormalities in data from a different demographic.
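As a sketch of the temporal (Markov) modelling, the transition matrix of a sequence of latent group regimes can be estimated by counting and row-normalizing transitions; the regime sequence below is hypothetical:

```python
import numpy as np

# Hypothetical sequence of latent group regimes
# (e.g. 0 = rest, 1 = feed, 2 = active)
seq = [0, 0, 1, 1, 1, 2, 0, 0, 2, 2, 1, 0]

K = 3
T = np.zeros((K, K))
for a, b in zip(seq, seq[1:]):
    T[a, b] += 1                       # count observed transitions
T /= T.sum(axis=1, keepdims=True)      # row-normalize into probabilities
```

In the full HMM setting the regimes are latent rather than observed, but the fitted transition matrix plays the same interpretive role.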
Using personalised cardiovascular models to identify new diagnostic predictors for pre-eclampsia
Haemodynamic adaptations play a crucial role in uteroplacental perfusion during pregnancy. In particular, modifications of the utero-ovarian arterial network cause a significant increase in the blood volume distributed to the placenta and foetus. Failure to make these cardiovascular modifications results in complicated pregnancies caused by different disorders such as hypertension, pre-eclampsia, intrauterine growth restriction (IUGR), and placental insufficiency. In pre-eclampsia, the modifications of the utero-ovarian arterial network are unsuccessful and cause less blood volume to be distributed to the placenta and foetus. Pre-eclampsia is a hypertensive disorder that is still not fully understood, and clinicians still struggle to identify pre-eclamptic women during check-ups, especially to differentiate between hypertensive and pre-eclamptic women. One reason for this is that clinicians rely heavily on blood pressure when diagnosing pre-eclampsia, and this biomarker has similar readings in both pre-eclampsia and hypertension. Proteinuria is also used as part of the diagnosis of pre-eclampsia. To improve the diagnosis, other biomarkers are being researched. A dataset of 21 patients was used to find novel biomarkers that can classify pre-eclampsia. The dataset is divided into two groups: uncomplicated pregnancies with hypertensive women and complicated pregnancies with pre-eclampsia. A computational model of the cardiovascular system is used to simulate blood flow and pressure solutions based on patient-specific observations in order to develop a new biomarker. The model employs 1D modelling incorporating a wave intensity analysis that models forward and backward waves to provide more precise predictions of wave propagation across the arterial system, particularly in the utero-ovarian system.
The proposed biomarkers include dimensionless terms formed by global maternal parameters, such as systolic blood pressure, stroke volume, and pulse wave velocity, or by local uterine parameters, such as pressure and velocity in specific vessels of the uterine system. Their ability as classifiers of pre-eclampsia is then investigated. In addition, a case study of the prone position in pregnancy and its effects on cardiovascular changes is carried out: the computational model is used to study what happens when a pregnant woman is positioned in the prone position and how vital metrics such as blood pressure and cardiac output are altered. It was found that the biomarkers based on the radial and arcuate arteries have a better classification ability for pre-eclampsia, even higher than the Doppler-measured Resistance Index (RI) and Pulsatility Index (PI). The novelty of this work is the introduction of new biomarkers through the use of a computational model, as well as the demonstration of the dependability and use of 1D modelling in pregnancy. The model demonstrated how biomarkers that cannot be measured clinically may be easily calculated using 1D modelling and provide critical information about the utero-ovarian circulation. Future work should concentrate on replacing the existing solver with a much faster and simpler one, as well as validating the biomarkers on a larger dataset.
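The forward/backward wave separation underlying wave intensity analysis can be sketched with the standard water-hammer decomposition, which splits pressure and velocity increments into forward and backward components whose intensities sum exactly to the net wave intensity. The numbers below are illustrative only, not patient data:

```python
import numpy as np

# Standard wave-intensity separation (illustrative values)
rho, c = 1060.0, 6.0                  # blood density (kg/m^3), wave speed (m/s)
dP = np.array([120.0, -40.0, 10.0])   # pressure increments (Pa)
dU = np.array([0.02, 0.005, -0.001])  # velocity increments (m/s)

dI = dP * dU                                       # net wave intensity
dI_fwd = (dP + rho * c * dU) ** 2 / (4 * rho * c)  # forward component
dI_bwd = -((dP - rho * c * dU) ** 2) / (4 * rho * c)  # backward component
# By construction, dI_fwd + dI_bwd reconstructs dP * dU exactly
```

The separation is what lets the 1D model attribute features of the measured waveform to forward-travelling (cardiac) versus reflected (e.g. utero-ovarian) waves.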
Emotion Recognition for Human-Centered Conversational Agents
This thesis proposes a study on Emotion Recognition in Conversation (ERC) to address the challenges of the task, with a chatbot reference case study, to enhance conversational agents' ability to understand and respond appropriately to human emotion. The study consists of two phases. The first involves the use of several baselines and the implementation of EmoBERTa to explore aspects of the task, such as preprocessing, balancing techniques and context modelling, tested on an ERC benchmark dataset. The results reveal that punctuation provides key information for the task, that balancing techniques can provide marginal improvements if appropriately selected, and that context can provide additional information, suggesting that a non-static context construction could be beneficial.
In the second phase, the effectiveness of a few-shot learning method, SetFit, is explored in the context of ERC to address the scarcity of real labelled data. An incompatibility between the given context definition and the architecture employed by this method called for an adaptation, which proved to be ineffective. The performance of the SetFit method and fine-tuning are compared in a limited data regime. Finally, the study explores the capability of a model trained on a specific ERC dataset to adapt to limited data from a different domain using transfer learning and fine-tuning, with inconclusive results. The findings and insights from this work can lay the groundwork for future developments and studies in the growing field of emotion-aware conversational agents and the application of few-shot learning to this task.