21 research outputs found
Towards Understanding Epoch-wise Double descent in Two-layer Linear Neural Networks
Epoch-wise double descent is the phenomenon where generalisation performance improves beyond the point of overfitting, resulting in a generalisation curve that exhibits two descents over the course of learning. Understanding the mechanisms driving this behaviour is crucial, not only for understanding the generalisation behaviour of machine learning models in general, but also for employing conventional model selection methods, such as early stopping to mitigate overfitting. While we ultimately want to draw conclusions about more complex models, such as deep neural networks, the majority of theoretical results on the underlying cause of epoch-wise double descent are based on simple models, such as standard linear regression. In this paper, to take a step towards more complex models in theoretical analysis, we study epoch-wise double descent in two-layer linear neural networks. First, we derive a gradient flow for the linear two-layer model that bridges the learning dynamics of the standard linear regression model and the linear two-layer diagonal network with quadratic weights. Second, we identify additional factors of epoch-wise double descent emerging with the extra model layer, by deriving necessary conditions for the generalisation error to follow a double descent pattern. While epoch-wise double descent in linear regression has been attributed to differences in input variance, in the two-layer model the singular values of the input-output covariance matrix also play an important role. This opens up further questions regarding unidentified factors of epoch-wise double descent for truly deep models.
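The training setup the abstract describes can be sketched numerically: a two-layer linear network trained by gradient descent (a discretisation of the gradient flow) on data with unequal input variances, recording the test error per epoch so that a possible epoch-wise double descent curve can be inspected. All dimensions, scales and step sizes below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 10, 40, 1000
scales = np.linspace(0.3, 3.0, d)            # unequal input variances
beta = rng.normal(size=d)                    # ground-truth linear map
X = rng.normal(size=(n_train, d)) * scales
y = X @ beta + rng.normal(size=n_train)      # noisy training labels
Xt = rng.normal(size=(n_test, d)) * scales
yt = Xt @ beta                               # noise-free test targets

h = 20                                       # hidden width
W1 = 0.05 * rng.normal(size=(h, d))          # small initialisation
W2 = 0.05 * rng.normal(size=(1, h))
lr, epochs = 5e-3, 4000
test_err = []
for _ in range(epochs):
    H = X @ W1.T                             # hidden activations (linear)
    err = (H @ W2.T)[:, 0] - y
    gW2 = (err[None, :] @ H) / n_train       # dL/dW2 for squared error
    gW1 = (W2.T @ err[None, :] @ X) / n_train
    W2 -= lr * gW2
    W1 -= lr * gW1
    pred_t = ((Xt @ W1.T) @ W2.T)[:, 0]
    test_err.append(float(np.mean((pred_t - yt) ** 2)))
```

Plotting `test_err` against epoch shows whether this particular configuration yields one descent or two over training time.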
Active Learning with Weak Labels for Gaussian Processes
Annotating data for supervised learning can be costly. When the annotation
budget is limited, active learning can be used to select and annotate those
observations that are likely to give the most gain in model performance. We
propose an active learning algorithm that, in addition to selecting which
observation to annotate, selects the precision of the annotation that is
acquired. Assuming that annotations with low precision are cheaper to obtain,
this allows the model to explore a larger part of the input space, with the
same annotation costs. We build our acquisition function on the previously
proposed BALD objective for Gaussian Processes, and empirically demonstrate the
gains of being able to adjust the annotation precision in the active learning
loop.
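As an illustration of the idea (not the paper's actual acquisition function, which builds on BALD), the sketch below selects both an input and an annotation precision for a Gaussian process regression model, ranking candidates by posterior-variance reduction per unit cost. The kernel, noise levels and costs are made-up assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # squared-exponential kernel on scalar inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def posterior_var(x_obs, noise_var, x_star):
    # GP posterior variance at x_star, with per-observation noise variances
    K = rbf(x_obs, x_obs) + np.diag(noise_var)
    k = rbf(x_obs, x_star)
    return 1.0 - np.einsum("ij,ij->j", k, np.linalg.solve(K, k))

X_pool = np.linspace(0.0, 1.0, 50)           # candidate inputs
X_obs = np.array([0.1, 0.9])                 # already-annotated inputs
noise_obs = np.array([0.01, 0.01])           # their annotation noise variances

# two annotation precisions: (noise variance, cost) -- cheap/noisy vs precise
options = [(1.0, 1.0), (0.01, 5.0)]

base = posterior_var(X_obs, noise_obs, X_pool)
best = None                                  # (gain per cost, input, noise var)
for i, x in enumerate(X_pool):
    for s2, cost in options:
        # variance at x after hypothetically buying this annotation
        v = posterior_var(np.append(X_obs, x), np.append(noise_obs, s2),
                          np.array([x]))[0]
        gain = (base[i] - v) / cost          # variance reduction per unit cost
        if best is None or gain > best[0]:
            best = (gain, x, s2)
next_x, next_noise_var = best[1], best[2]
```

The trade-off appears directly in the acquisition score: a cheap, noisy annotation can win over a precise one when its smaller variance reduction comes at a sufficiently smaller cost.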
A general framework for ensemble distribution distillation
Ensembles of neural networks have been shown to give better performance than
single networks, both in terms of predictions and uncertainty estimation.
Additionally, ensembles allow the uncertainty to be decomposed into aleatoric
(data) and epistemic (model) components, giving a more complete picture of the
predictive uncertainty. Ensemble distillation is the process of compressing an
ensemble into a single model, often resulting in a leaner model that still
outperforms the individual ensemble members. Unfortunately, standard
distillation erases the natural uncertainty decomposition of the ensemble. We
present a general framework for distilling both regression and classification
ensembles in a way that preserves the decomposition. We demonstrate the desired
behaviour of our framework and show that its predictive performance is on par
with standard distillation.
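The decomposition the abstract refers to can, for a classification ensemble, be written as total uncertainty = aleatoric + epistemic, where the total is the entropy of the averaged prediction, the aleatoric part is the average entropy of the members, and the epistemic part is their difference (the mutual information between prediction and ensemble member). A minimal sketch with made-up member predictions:

```python
import numpy as np

def entropy(p, axis=-1):
    # Shannon entropy in nats, clipped for numerical safety
    return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=axis)

# each row: one ensemble member's predictive distribution for a single input
members = np.array([[0.9, 0.05, 0.05],
                    [0.1, 0.8, 0.1],
                    [0.3, 0.4, 0.3]])

total = entropy(members.mean(axis=0))          # H(E[p]): total uncertainty
aleatoric = entropy(members, axis=-1).mean()   # E[H(p)]: data uncertainty
epistemic = total - aleatoric                  # mutual information, >= 0
```

Standard distillation trains a single model on `members.mean(axis=0)` only, so `epistemic` is lost; distribution distillation instead fits a distribution over the member predictions, keeping all three quantities available.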
Development of an Alkane-induced Biosensor in Saccharomyces cerevisiae
The increasing level of greenhouse gases in the atmosphere calls for the development of new environmentally friendly fuels. The use of microbes in fuel production is a promising approach that has been of interest for several years. n-Alkanes closely resemble diesel, among other fuels, and can therefore directly replace current fossil fuels. Strains of Saccharomyces cerevisiae capable of producing n-alkanes have been developed. These strains are far from optimal and need further development before they can be used on an industrial scale. The aim of this project is to create a biosensor that can be used as a screening tool to optimise the yield of n-alkanes in S. cerevisiae. A transcription system from Yarrowia lipolytica, together with GFP, is used to create the biosensor. The system from Y. lipolytica consists of three genes, YAS1, YAS2 and YAS3, and the promoter sequence ARE1, and activates transcription in the presence of n-alkanes. PCR products were successfully generated and used to construct plasmids containing combinations of YAS1, YAS2, YAS3, ARE1 sequences and GFP. Verification of the constructed plasmids confirmed that the genes had been incorporated with few mutations. The biosensor was introduced into S. cerevisiae using a two-vector system. Flow cytometry was used to test the biosensor in the presence and absence of decane, and a difference in expression was confirmed. The results indicate that it is possible to implement the system from Y. lipolytica in S. cerevisiae with the aim of creating a biosensor for n-alkanes.
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, and also determines the effective linear range of the instrument. It can further be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
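The recommended calibration can be sketched as follows: a serial dilution of microspheres with known particle counts is measured, a conversion factor (particles per OD unit) is fitted over the linear range, and sample ODs are then converted to estimated cell counts. All numbers below are invented for illustration.

```python
import numpy as np

# two-fold serial dilution of silica microspheres, known particles per well
particles = 3.0e8 / 2 ** np.arange(8)
# pretend OD readings that happen to lie on the instrument's linear range
od = particles / 6.9e8

# least-squares slope through the origin: particles ~ slope * OD
slope = np.sum(od * particles) / np.sum(od * od)

sample_od = 0.25
estimated_count = slope * sample_od   # estimated cells in an equal volume
```

In practice the readings are noisy and the highest concentrations may fall outside the linear range, so points outside that range should be excluded before fitting, and the residual spread used for quality control.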
Validation of a Damage Accumulation Model of Replicative Ageing in S. cerevisiae
Age-related diseases and conditions give rise to societal challenges and pose a threat to healthy ageing. At the same time, the more recent evolutionary theories of ageing hypothesise that the process of ageing is a consequence of living rather than an evolutionary strategy. This implies that ageing is not as inevitable as many might believe, and it is therefore of interest to study this biological process and its underlying mechanisms. On a cellular level, accumulation of damage is often regarded as the main cause of ageing. Since the basic properties of ageing on this level are similar between unicellular and multicellular organisms, it is common to use the unicellular yeast Saccharomyces cerevisiae as a model organism in ageing research. The aim of this project is to validate a mathematical damage accumulation model of replicative ageing in yeast. The model represents a cell by intact protein and damage and describes how these quantities change as the cell grows. In addition to cell growth, the model takes asymmetric division, retention and cell death into account. To validate the model, structural and numerical identifiability methods are applied and continuous optimisation is performed using single-cell area data. The model is fit to experimental data obtained for wild-type yeast and the two deletion strains sir2 and fob1. Moreover, replicative lifespan data for 4,698 single-gene deletion strains is analysed and, in conjunction with this, it is investigated how the model parameters affect the replicative lifespan of the simulations. The results show that the parameters of the replicative ageing model that describe the rate of change of intact protein and damage in the cell are structurally identifiable.
In spite of this, they are not numerically identifiable based on the available experimental data; the parameter estimates obtained have high variances and are moderately or highly correlated with each other. Nevertheless, it is possible to generate parameter sets that make the mathematical model reproduce the replicative lifespans of the investigated strains, if a replicative lifespan constraint is imposed on the optimisation. For future work, it is suggested that new experimental data be generated so that the model can be fit to growth curves of cells at later life stages. Ultimately, the data should be sufficient for the optimisation to generate parameter sets that make the model adapt to the characteristics of the investigated strains, without additional constraints being added to the objective function.
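A minimal sketch of a model of this kind (the equations and parameter values here are assumptions for illustration, not the thesis model): intact protein P grows and is converted into damage D, damage can be repaired, and at division the mother cell keeps half the protein but retains most of the damage; the cell is taken to die when damage overtakes intact protein.

```python
# forward-Euler simulation of an assumed damage-accumulation model
k_growth, k_damage, k_repair = 1.0, 0.25, 0.05   # assumed rate constants
retention, size_at_division = 0.8, 2.0           # assumed division rules
P, D, dt = 1.0, 0.0, 0.01                        # intact protein, damage
divisions = 0

while divisions < 25 and D < P:                  # crude death criterion: D >= P
    dP = (k_growth * P - k_damage * P + k_repair * D) * dt
    dD = (k_damage * P - k_repair * D) * dt
    P, D = P + dP, D + dD
    if P + D >= size_at_division:                # asymmetric division
        P, D = 0.5 * P, retention * D            # mother keeps most damage
        divisions += 1

replicative_lifespan = divisions                 # divisions before death (or cap)
```

Fitting such a model to single-cell area data amounts to tuning the rate constants so that simulated growth and division trajectories match the measurements, which is exactly where the identifiability questions above arise.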
On Uncertainty Quantification in Neural Networks: Ensemble Distillation and Weak Supervision
Machine learning models are employed in several aspects of society, ranging from autonomous cars to justice systems. They affect your everyday life, for instance through recommendations on your streaming service and by informing decisions in healthcare, and are expected to have even more influence on society in the future. Among these machine learning models, we find neural networks, which have had a wave of success within a wide range of fields in recent years. The success of neural networks is partly attributed to their very flexible model structure and seemingly endless possibilities in terms of extensions. While neural networks come with great flexibility, they are so-called black-box models and therefore offer little in terms of interpretability. In other words, it is seldom possible to explain or even understand why a neural network makes a certain decision. On top of this, these models are known to be overconfident, which means that they attribute low uncertainty to their predictions, even when uncertainty is, in reality, high. Previous work has demonstrated how this issue can be alleviated with the help of ensembles, i.e. by weighing the opinions of multiple models in prediction. In Paper I, we investigate this possibility further by creating a general framework for ensemble distribution distillation, developed for the purpose of preserving the performance benefits of ensembles while reducing computational costs. Specifically, we extend ensemble distribution distillation to make it applicable to tasks beyond classification and demonstrate the usefulness of the framework in, for example, out-of-distribution detection. Another obstacle in the use of neural networks, especially deep neural networks, is that supervised training of these models can require a large amount of labelled data. The process of annotating a large amount of data is costly, time-consuming and also prone to errors. Specifically, there is a risk of incorporating label noise in the data.
In Paper II, we investigate the effect of label noise on model performance. In particular, under an input-dependent noise model, we analyse the properties of the asymptotic risk minimisers of strictly proper loss functions and of a set of previously proposed robust loss functions. The results demonstrate that reliability, in terms of a model's uncertainty estimates, is an important aspect to consider also in weak supervision and, particularly, when developing noise-robust training algorithms. Related to annotation costs in supervised learning is the use of active learning to optimise model performance under budget constraints. The goal of active learning, in this context, is to identify and annotate the observations that are most useful for the model's performance. In Paper III, we propose an approach for taking advantage of intentionally weak annotations in active learning. What is proposed, more specifically, is to incorporate the possibility of collecting cheaper, but noisy, annotations in the active learning algorithm. Thus, the same annotation budget is enough to annotate more data points for training. In turn, the model gets to explore a larger part of the input space. We demonstrate empirically how this can lead to gains in model performance.

Machine learning models are used in several parts of society, from autonomous vehicles to justice systems. They already affect your everyday life, for example via personal recommendations in your streaming service and by supporting decisions in healthcare, and are expected to have even more influence on society in the future. Among these machine learning models we find neural networks, which have been very successful in several fields over the past decade. The success is partly due to the flexible model structure of neural networks and their seemingly endless possibilities for extension. Neural networks offer great flexibility, but have the drawback of being so-called black-box models. This means that it is seldom possible to explain, or even understand, why a neural network makes a particular decision. Moreover, models of this kind tend to be overconfident, meaning that they report low uncertainty in their decisions even when the uncertainty is in fact high. For a self-driving vehicle, this could for example mean that the vehicle judges a left turn to be very safe, when the view of the oncoming lane is obstructed and an oncoming vehicle may well be just beyond the crest. Previous research has demonstrated how this kind of problem can be alleviated by using several neural networks that cooperate to predict or make a decision. This yields a model that is more accurate and also more reliable when it comes to estimating its own uncertainty. In this thesis, we further investigate how a single neural network can be taught to mimic several cooperating models, in order to reduce the costs of keeping several cooperating models in operation. Another limiting factor of neural networks is that they may need a large amount of collected data, with associated labels, to learn the task for which they are intended. Acquiring labels for a large number of data points is both costly and time-consuming, and there is a risk of errors in the annotation process. More specifically, incorrect labels, so-called label noise, can be included in the data. This can in turn harm the model's ability to make correct decisions. We investigate the form this effect takes and find that label noise can have a negative effect not only on the ability to make correct decisions, but also on the ability to estimate the model's own uncertainty. Finally, related to annotation costs, we propose an approach for exploiting noisy labels in active learning. The goal of active learning, in this context, is to identify and annotate the observations that will be most helpful in the model's learning process. The proposal is to allow, within active learning, the collection of cheaper but noisy labels. In this way, a limited annotation budget suffices to annotate more data points, which in turn can lead to a better model. The latter is demonstrated experimentally.

Funding agencies: This research was financially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
Perspectives on Predictive and Label Uncertainty in Probabilistic Machine Learning
Machine learning models are, just like us humans, exposed to the uncertainty of the world. Following the complexity of real-world events, these models are often employed for prediction tasks where there is no single ground-truth answer, meaning that it may be impossible to determine the precise outcome of the predicted event beforehand. This aleatoric uncertainty is potentially, but not necessarily, a result of the event in question being part of a larger system, where some information remains undisclosed. Moreover, machine learning models are data-driven and typically learn everything they know from data, called training data. The quality of the training data is vital in determining the extent of a machine learning model's knowledge and, consequently, how well the model performs on a given task. For instance, when training data is limited, this can result in uncertainty originating from a lack of knowledge, often referred to as epistemic uncertainty. Furthermore, because it is collected through observation, or measurements, of real-world events, the training data naturally incorporates the uncertainty inherent to these events. Sometimes, additional uncertainty is introduced by the processes used to acquire the data, following, for instance, measurement error or human error. One such type of uncertainty is in this thesis termed annotation uncertainty, and relates to the collection of annotations for training models through supervised learning. The focus of this thesis lies on probabilistic predictive machine learning models as an approach to representing different sources of so-called predictive uncertainty, including aleatoric, epistemic and annotation uncertainty. Special attention is given to annotation uncertainty, beginning with an exploration of the possible negative effects of this type of uncertainty on the performance of probabilistic predictive models.
We analyse how annotation uncertainty, or noise, affects the properties of asymptotic risk minimisers when training models with two different classes of loss functions: strictly proper losses and a group of previously proposed robust loss functions. The analysis emphasises the importance of considering a model's ability to accurately estimate predictive uncertainty, also referred to as the model's reliability, when developing training algorithms robust to annotation noise. However, under the umbrella of weak supervision, we also provide two examples of when annotation uncertainty can be allowed in order to benefit model performance. In the first example, we use ensemble models to generate annotations for the training data, with the aim of teaching individual probabilistic models to estimate both aleatoric and epistemic uncertainty in their predictions. This ability is beneficial in many applications, one of them being active learning and, notably, the active learning algorithm that constitutes the second example. This algorithm acquires data samples based on high epistemic uncertainty, believed to represent samples for which there is much to gain in terms of model performance. The contribution does not lie in the particular approach to acquiring data samples, but in introducing the possibility of trading off annotation costs against annotation quality as part of the active learning algorithm. Such a trade-off has the potential to improve model performance under a fixed annotation budget. The thesis also explores topics beyond annotation uncertainty. First, in the context of learning probabilistic machine learning models, we focus on unnormalised probabilistic models, with energy-based models among them. We establish a link between two groups of important methods used for estimating unnormalised models, namely noise-contrastive estimation and approximate maximum likelihood methods.
This link provides an improved understanding of noise-contrastive estimation and serves to create a more coherent framework for the estimation of unnormalised models. Second, for deeper insight into the generalisation behaviour of machine learning models trained using gradient-based learning, we study the epoch-wise double descent phenomenon in two-layer linear neural networks. With this, we identify additional factors contributing to epoch-wise double descent that have not been observed for the simpler linear regression model, which is commonly central to theoretical studies. Although not specific to probabilistic models, these insights could potentially be extended to such models in the future and used to further explore the interplay between annotation uncertainty and model performance.

Funding: This research was financially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, the Excellence Center at Linköping-Lund in Information Technology (ELLIIT), and the Swedish Research Council.
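As a reference point for the noise-contrastive estimation (NCE) mentioned above, the sketch below fits an unnormalised Gaussian model log phi(x) = -0.5*(x - mu)^2 + c, with c absorbing the unknown log-normaliser, by logistic discrimination of data samples against noise samples. The data, noise distribution and optimiser settings are illustrative choices, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
x_data = rng.normal(1.5, 1.0, size=2000)    # samples from the data distribution
x_noise = rng.normal(0.0, 2.0, size=2000)   # samples from a known noise dist.

def log_noise(x):
    # log-density of the noise distribution N(0, 2^2)
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

def grads(mu, c):
    # gradient of the NCE (logistic) objective w.r.t. mu and c
    def G(x):                                # log phi(x) - log p_noise(x)
        return -0.5 * (x - mu) ** 2 + c - log_noise(x)
    sd = 1.0 / (1.0 + np.exp(-G(x_data)))    # sigmoid on data samples
    sn = 1.0 / (1.0 + np.exp(-G(x_noise)))   # sigmoid on noise samples
    d_mu = np.mean((1.0 - sd) * (x_data - mu)) - np.mean(sn * (x_noise - mu))
    d_c = np.mean(1.0 - sd) - np.mean(sn)
    return d_mu, d_c

mu, c = 0.0, 0.0
for _ in range(3000):                        # plain gradient ascent
    d_mu, d_c = grads(mu, c)
    mu, c = mu + 0.1 * d_mu, c + 0.1 * d_c
# mu should end up near the data mean (1.5) and c near the log-normaliser
# of a unit-variance Gaussian, -0.5*log(2*pi)
```

The appeal of the method is visible here: the partition function never has to be computed, because c is simply treated as one more parameter of the classifier.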
Robustness and Reliability When Training With Noisy Labels
Labelling of data for supervised learning can be costly and time-consuming, and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will inevitably shift the solution towards the conditional distribution over noisy labels. Nevertheless, while deep neural networks have proven capable of fitting random labels, regularisation and the use of robust loss functions empirically mitigate the effects of label noise. However, such observations concern robustness in accuracy, which is insufficient if reliable uncertainty quantification is critical. We demonstrate this by analysing the properties of the conditional distribution over noisy labels for an input-dependent noise model. In addition, we evaluate the set of robust loss functions characterised by noise-insensitive, asymptotic risk minimisers. We find that strictly proper and robust loss functions both offer asymptotic robustness in accuracy, but neither guarantees that the final model is calibrated. Moreover, even with robust loss functions, overfitting is an issue in practice. With these results, we aim to explain the observed robustness of common training practices, such as early stopping, to label noise. In addition, we aim to encourage the development of new noise-robust algorithms that not only preserve accuracy but also ensure reliability.
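The calibration point can be illustrated in the simplest possible setting, binary labels with symmetric label noise: the pointwise risk minimiser of a strictly proper loss (cross-entropy) is the noisy posterior, while a robust loss such as MAE is minimised by a hard 0/1 prediction, so neither recovers the clean posterior. The numbers are arbitrary and the grid search is for illustration only.

```python
import numpy as np

p_clean = 0.8                  # clean P(y=1 | x)
eta = 0.2                      # symmetric label-flip probability
p_noisy = (1 - eta) * p_clean + eta * (1 - p_clean)   # noisy P(y=1 | x) = 0.68

q = np.linspace(1e-4, 1 - 1e-4, 9999)  # candidate predicted probabilities

# expected losses under the noisy label distribution, per candidate q
ce_risk = -(p_noisy * np.log(q) + (1 - p_noisy) * np.log(1 - q))
mae_risk = p_noisy * (1 - q) + (1 - p_noisy) * q

q_ce = q[np.argmin(ce_risk)]    # near p_noisy = 0.68, not p_clean = 0.8
q_mae = q[np.argmin(mae_risk)]  # pushed to 1: correct argmax, no calibration
```

Both minimisers keep the correct argmax, which is the accuracy robustness the abstract describes; neither matches `p_clean`, which is the calibration failure.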
