Automated identification and behaviour classification for modelling social dynamics in group-housed mice
Mice are often used in biology as exploratory models of human conditions, due to their similar genetics and physiology. Unfortunately, research on behaviour has traditionally been limited to studying individuals in isolated environments and over short periods of time. This can miss critical time-effects, and, since mice are social creatures, bias results.
This work addresses this gap in research by developing tools to analyse the individual behaviour of group-housed mice in the home-cage over several days and with minimal disruption. Using data provided by the Mary Lyon Centre at MRC Harwell we designed an end-to-end system that (a) tracks and identifies mice in a cage, (b) infers their behaviour, and subsequently (c) models the group dynamics as functions of individual activities. In support of the above, we also curated and made available a large dataset of mouse localisation and behaviour classifications (IMADGE), as well as two smaller annotated datasets for training/evaluating the identification (TIDe) and behaviour inference (ABODe) systems. This research constitutes the first of its kind in terms of the scale and challenges addressed. The data source (side-view single-channel video with clutter and no identification markers for mice) presents challenging conditions for analysis, but has the potential to give richer information while using industry standard housing.
A Tracking and Identification module was developed to automatically detect, track and identify the (visually similar) mice in the cluttered home-cage using only single-channel IR video and coarse position from RFID readings. Existing detectors and trackers were combined with a novel Integer Linear Programming formulation to assign anonymous tracks to mouse identities. This utilised a probabilistic weight model of affinity between detections and RFID pickups.
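The assignment step described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the abstract describes an Integer Linear Programming formulation, whereas the code below swaps in exhaustive search over permutations, which optimises the same one-to-one objective for a very small cage. The function name `assign_tracks` and the affinity values are hypothetical.

```python
from itertools import permutations


def assign_tracks(weights):
    """Return the one-to-one track-to-mouse assignment with maximal total weight.

    weights[t][m] is a hypothetical affinity score (for example a
    log-probability) between anonymous track t and mouse identity m.
    The result is a tuple p where p[t] is the mouse assigned to track t.
    """
    n = len(weights)
    return max(
        permutations(range(n)),
        key=lambda p: sum(weights[t][p[t]] for t in range(n)),
    )


# Made-up affinities for three tracks and three mice.
weights = [
    [-0.2, -2.3, -3.0],
    [-1.9, -0.4, -2.5],
    [-2.8, -0.3, -0.9],
]
assignment = assign_tracks(weights)
print(assignment)
```

Track 2 on its own would prefer mouse 1, but the one-to-one constraint makes the joint optimum assign it to mouse 2. Exhaustive search grows factorially with the number of mice, which is one reason a dedicated solver is used at realistic scale.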
The next task necessitated the implementation of the Activity Labelling module that classifies the behaviour of each mouse, handling occlusion to avoid giving unreliable classifications when the mice cannot be observed. Two key aspects of this were (a) careful feature-selection, and (b) judicious balancing of the errors of the system in line with the repercussions for our setup.
Given these sequences of individual behaviours, we analysed the interaction dynamics between mice in the same cage by collapsing the group behaviour into a sequence of interpretable latent regimes using both static and temporal (Markov) models. Using a permutation matrix, we were able to automatically assign mice to roles in the HMM, fit a global model to a group of cages and analyse abnormalities in data from a different demographic.
Will they take this offer? A machine learning price elasticity model for predicting upselling acceptance of premium airline seating
Employing customer information from one of the world's largest airline companies, we develop a price elasticity model (PREM) using machine learning to identify customers likely to purchase an upgrade offer from economy to premium class and predict a customer's acceptable price range. A simulation of 64.3 million flight bookings and 14.1 million email offers over three years mirroring actual data indicates that PREM implementation results in approximately 1.12 million (7.94%) fewer non-relevant customer email messages, a predicted increase of 72,200 (37.2%) offers accepted, and an estimated $72.2 million (37.2%) of increased revenue. Our results illustrate the potential of automated pricing information and targeted marketing messages for upselling acceptance. We also identified three customer segments: (1) Never Upgrades are those who never take the upgrade offer, (2) Upgrade Lovers are those who generally upgrade, and (3) Upgrade Lover Lookalikes have no historical record but fit the profile of those that tend to upgrade. We discuss the implications for airline companies and related travel and tourism industries. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Peer reviewed.
Subgroup discovery for structured target concepts
The main object of study in this thesis is subgroup discovery, a theoretical framework for finding subgroups in data, i.e., named sub-populations, whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset. This is a powerful tool that conveys crucial information to a human audience, but despite past advances it has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. We introduce the concept of representative subgroups, which we use not only to ensure the fairness of a sub-population with regard to a sensitive trait, such as race or gender, but also to go beyond known trends in the data. For entities with additional relational information that can be encoded as a graph, we introduce a novel measure of robust connectedness which improves on established alternative measures of density; we then provide a method that uses this measure to discover which named sub-populations are better connected. Our contributions within subgroup discovery culminate in the introduction of kernelised subgroup discovery: a novel framework that enables the discovery of subgroups on i.i.d. target concepts with virtually any kind of structure. Importantly, our framework additionally provides a concrete and efficient tool that works out-of-the-box without any modification, apart from specifying the Gramian of a positive definite kernel. For use within kernelised subgroup discovery, but also in any other kind of kernel method, we additionally introduce a novel random walk graph kernel. Our kernel allows fine-tuning of the alignment between the vertices of the two compared graphs during the count of the random walks, and we also propose meaningful structure-aware vertex labels to utilise this new capability.
With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel method.
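To make the notion of an "exceptional" subgroup concrete, the sketch below computes weighted relative accuracy (WRAcc), a classical quality measure for a simple binary target. It is offered as background only: it is not the representative, graph-based or kernelised measure proposed in the thesis, and the function name and toy data are hypothetical.

```python
def wracc(in_subgroup, target):
    """Weighted relative accuracy of a subgroup for a binary target.

    in_subgroup[i] is 1 if record i belongs to the subgroup, else 0.
    target[i] is 1 if record i has the target property, else 0.
    WRAcc = (subgroup size / dataset size) * (subgroup rate - overall rate).
    """
    n = len(target)
    n_sub = sum(in_subgroup)
    if n_sub == 0:
        return 0.0
    overall_rate = sum(target) / n
    subgroup_rate = sum(t for s, t in zip(in_subgroup, target) if s) / n_sub
    return (n_sub / n) * (subgroup_rate - overall_rate)


# Toy data: the subgroup covers the first four of eight records.
target = [1, 1, 1, 0, 0, 0, 0, 1]
in_subgroup = [1, 1, 1, 1, 0, 0, 0, 0]
quality = wracc(in_subgroup, target)
print(quality)
```

A positive value means the target is over-represented in the subgroup relative to the whole dataset; the size factor penalises very small subgroups.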
Measuring the impact of COVID-19 on hospital care pathways
Care pathways in hospitals around the world reported significant disruption during the recent COVID-19 pandemic, but measuring the actual impact is more problematic. Process mining can be useful for hospital management to measure the conformance of real-life care to what might be considered normal operations. In this study, we aim to demonstrate that process mining can be used to investigate process changes associated with complex disruptive events. We studied perturbations to accident and emergency (A&E) and maternity pathways in a UK public hospital during the COVID-19 pandemic. Coincidentally, the hospital had implemented a Command Centre approach for patient-flow management, affording an opportunity to study both the planned improvement and the disruption due to the pandemic. Our study proposes and demonstrates a method for measuring and investigating the impact of such planned and unplanned disruptions affecting hospital care pathways. We found that during the pandemic, both A&E and maternity pathways had measurable reductions in the mean length of stay and a measurable drop in the percentage of pathways conforming to normative models. There were no distinctive patterns of monthly mean values of length of stay or conformance throughout the phases of the installation of the hospital's new Command Centre approach. Due to a deficit in the available A&E data, the findings for A&E pathways could not be interpreted.
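The "percentage of pathways conforming to normative models" can be illustrated with a deliberately simple check. This sketch assumes a normative model expressed as a set of allowed directly-follows transitions; the study's actual models and conformance technique are not specified in the abstract, and all activity names below are hypothetical.

```python
# Hypothetical normative model: allowed directly-follows transitions.
ALLOWED = {
    ("arrival", "triage"),
    ("triage", "treatment"),
    ("triage", "discharge"),
    ("treatment", "discharge"),
    ("treatment", "admission"),
}
START = "arrival"
END = {"discharge", "admission"}


def conforms(trace):
    """True if the trace starts and ends correctly and uses only allowed steps."""
    if not trace or trace[0] != START or trace[-1] not in END:
        return False
    return all(step in ALLOWED for step in zip(trace, trace[1:]))


def conformance_rate(traces):
    """Fraction of traces that conform to the normative model."""
    return sum(conforms(t) for t in traces) / len(traces)


traces = [
    ["arrival", "triage", "treatment", "discharge"],
    ["arrival", "triage", "discharge"],
    ["arrival", "treatment", "discharge"],  # skips triage, so it does not conform
    ["arrival", "triage", "treatment", "admission"],
]
rate = conformance_rate(traces)
print(rate)
```

Computing this rate per month, alongside mean length of stay, would give the kind of time series the abstract compares before and during the pandemic.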
Complaint Driven Training Data Debugging for Machine Learning Workflows
As the need for machine learning (ML) has increased rapidly across all industry sectors, so has the interest in building ML platforms that manage and automate parts of the ML life-cycle. This has enabled companies to use ML inference as a part of their downstream analytics or their applications. Unfortunately, debugging unexpected outcomes in the results of these ML workflows remains a necessary but difficult task of the ML life-cycle. The challenge of debugging ML workflows is that it requires reasoning about the correctness of the workflow logic, the datasets used for inference and training, the models, and the interactions between them. Even if the workflow logic is correct, errors in the data used across the ML workflow can still lead to wrong outcomes. In short, developers are not just debugging the code, but also the data.
We advocate a complaint-driven approach towards specifying and debugging data errors in ML workflows. The approach takes as input user complaints specified as constraints over the final or intermediate outputs of workflows that use trained ML models. The approach outputs explanations in the form of specific operator(s) or data subsets, and how they may be changed to address the constraint violations.
In this thesis we make the first steps towards our complaint-driven approach to data debugging. As a stepping stone, we focus our attention on complaints specified on top of relational workflows that use ML model inference and whose errors are caused by errors in the ML model's training data. To the best of our knowledge, we contribute the first debugging system for this task, which we call Rain. In response to a user complaint, Rain ranks the ML model's training examples based on their ability to address the user's complaint if they were removed. Our experiments show that users can use Rain to debug training data errors by specifying complaints over aggregations of model predictions without having to specify the correct label for each individual prediction.
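The idea of ranking training examples by how much their removal would address a complaint can be sketched by brute force. The code below is an illustration under strong assumptions, not Rain's method: it retrains a toy 1-nearest-neighbour classifier after each single deletion, and the data, the complaint ("the number of positive predictions should be 1") and all names are hypothetical.

```python
def predict(train, x):
    """1-nearest-neighbour prediction on a single numeric feature."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]


def count_positive(train, queries):
    """Aggregate over predictions: how many queries are predicted as class 1."""
    return sum(predict(train, q) for q in queries)


def rank_by_removal(train, queries, expected_count):
    """Rank training examples by how much deleting each one reduces the
    violation of the complaint 'count_positive == expected_count'."""
    base = abs(count_positive(train, queries) - expected_count)
    scores = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        violation = abs(count_positive(reduced, queries) - expected_count)
        scores.append((base - violation, i))
    # Largest improvement first; ties broken by index.
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# (feature, label) pairs; the example at index 2 carries a corrupted label.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (8.0, 1), (9.0, 1)]
queries = [2.6, 3.2, 3.4, 8.4]
ranking = rank_by_removal(train, queries, expected_count=1)
print(ranking[0])
```

The user only states the expected aggregate, not the correct label of any individual prediction, yet the corrupted example is ranked first. Retraining once per training example does not scale, which is the kind of cost the latency discussion that follows is concerned with.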
Unfortunately, Rain's latency may be prohibitive for use in interactive applications like analytical dashboards or business intelligence tools where users are likely to observe errors and complain. To address Rain's latency problem when scaling to large ML models and training sets, we propose Rain++. Rain++ pushes the majority of Rain's computation offline ahead of user interaction, achieving orders of magnitude online latency improvements compared to Rain.
To go beyond Rain's and Rain++'s approach, which evaluates individual training example deletions independently, we propose MetaRain, a framework for training classifiers that detect training data corruptions in response to user complaints. Thanks to the generality of MetaRain, users can adapt the classifiers chosen to the training corruptions and the complaints they seek to resolve. Our experiments indicate that making use of this ability results in improved debugging outcomes.
Last but not least, we study the problem of updating relational workflow results in response to changes to the inference ML model used. This can be leveraged by current or future complaint-driven debugging systems that repeatedly change the model and reevaluate the relational workflow. We propose FaDE, a compiler that generates efficient code for the workflow update problem by casting it as view maintenance under input tuple deletions. Our experiments indicate that the code generated by FaDE has orders of magnitude lower latency than existing view maintenance systems.
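View maintenance under input tuple deletions can be illustrated on a single aggregate. The sketch below is a hand-written toy, not code generated by FaDE: it maintains a hypothetical view equivalent to counting positive predictions per group, and updates it when tuples are deleted instead of recomputing from scratch. The class name and data are assumptions.

```python
from collections import Counter


class PositiveCountView:
    """Materialised view: number of rows with label 1 per group."""

    def __init__(self, rows):
        # rows are (group, label) pairs; only positive rows contribute.
        self.counts = Counter(group for group, label in rows if label == 1)

    def delete(self, rows):
        """Apply deletions of input tuples incrementally."""
        for group, label in rows:
            if label == 1:
                self.counts[group] -= 1
                if self.counts[group] == 0:
                    del self.counts[group]

    def result(self):
        return dict(self.counts)


rows = [("north", 1), ("north", 1), ("south", 1), ("south", 0)]
deleted = [("north", 1), ("south", 1)]

view = PositiveCountView(rows)
view.delete(deleted)

# Reference answer: recompute the view over the remaining rows.
remaining = list(rows)
for row in deleted:
    remaining.remove(row)
recomputed = PositiveCountView(remaining).result()
print(view.result(), recomputed)
```

The incremental update touches only the deleted tuples, while recomputation scans every remaining row; the gap widens as the input grows relative to the number of deletions.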
Quantum Computing for Airline Planning and Operations
Classical algorithms and mathematical optimization techniques have been used extensively by airlines to optimize their profit and ensure that regulations are followed. In this thesis, we explore which role quantum algorithms can have for airlines. Specifically, we have considered two quantum optimization algorithms: the Quantum Approximate Optimization Algorithm (QAOA) and Quantum Annealing (QA). We present a heuristic that integrates these quantum algorithms into the existing classical algorithm, which is currently employed to solve airline planning problems in a state-of-the-art commercial solver. We perform numerical simulations of QAOA circuits and find that linear and quadratic algorithm depth in the input size can be required to obtain a one-shot success probability of 0.5. Unfortunately, we are unable to find performance guarantees. Finally, we perform experiments with D-Wave's newly released QA machine and find that it outperforms 2000Q for most instances.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Data analysis with merge trees
Today's data are increasingly complex, and classical statistical techniques need increasingly refined mathematical tools to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of transformation, and by all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called the merge tree. To analyze sets of merge trees we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.
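For readers unfamiliar with merge trees, the sketch below extracts the merge events of a function sampled along a line, using the sublevel-set filtration and a union-find structure. It is a standard textbook construction given under simplifying assumptions (one-dimensional domain, distinct values); it does not implement the metric or the algorithms proposed in the thesis, and the function name is hypothetical.

```python
def merge_events(values):
    """Merge events of the sublevel sets of a function sampled on a path.

    Returns (birth, merge_height) pairs: a component born at a local
    minimum of height `birth` is absorbed into an older component (one
    with a lower minimum) at height `merge_height`.
    """
    parent = {}
    birth = {}  # root index -> height of the minimum of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    events = []
    # Sweep the samples from the lowest value to the highest.
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if j not in parent:
                continue  # neighbour not yet in the sublevel set
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # The component with the lower minimum survives the merge.
            old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
            if birth[young] < values[i]:
                events.append((birth[young], values[i]))
            parent[young] = old
    return events


samples = [3, 1, 4, 0, 5, 2, 6]
events = merge_events(samples)
print(events)
```

Each pair corresponds to a branch of the merge tree joining the branch of the global minimum; comparing such trees across many functions is the setting in which a metric between merge trees becomes useful.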