A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed over the last few decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work.
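To make the stream-based setting concrete, the following is a minimal illustrative sketch, not an algorithm from the survey itself: the margin-based query rule, the fixed threshold, and the model interface (`predict_proba`, `update`) are all assumptions. The idea is simply that a label is requested from the annotator only when the model is uncertain about the incoming observation.

```python
def margin(probs):
    """Difference between the two highest class probabilities (small = uncertain)."""
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]

def online_active_learning(stream, model, oracle, threshold=0.2):
    """Process a data stream, querying the oracle only for uncertain observations."""
    labeled = []
    for x in stream:
        probs = model.predict_proba(x)
        if margin(probs) < threshold:   # model is uncertain -> worth a label
            y = oracle(x)               # costly human annotation happens here
            labeled.append((x, y))
            model.update(x, y)          # incremental (online) model update
    return labeled
```

The threshold trades labeling cost against model improvement; many of the surveyed strategies can be seen as more principled replacements for this fixed-threshold rule.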
Sensing, interpreting, and anticipating human social behaviour in the real world
Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. 
Furthermore, we are the first to study low rapport detection in group interactions, and the first to investigate a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.
High-throughput, single-worm tracking and analysis in Caenorhabditis elegans
Caenorhabditis elegans, a millimeter-sized, soil-dwelling nematode, is a model organism for biology research. Its whole genome has been sequenced. The lineage and fate of each cell in wild-type (N2) worms are known. The connectivity of all 302 neurons of the wild-type hermaphrodite has been mapped. Many of its genes have homologs in other organisms, including humans. C. elegans has a well-defined repertoire of observed behaviors. For these reasons, and due to a wealth of experimental data, C. elegans is a well-suited organism for mapping genetics to phenotype. This thesis details a system for relating genetics to phenotype. I present a methodology for semi-automated, high-throughput, high-resolution investigation of gene effects on behavior and morphology using C. elegans.
In the first section beyond the introduction, Chapter 2, I describe a new single-worm tracking system (hardware and software), titled Worm Tracker 2.0 (WT2), which was used to collect videos of worm behavior with high throughput. While multi-worm tracking systems exist, including ones that enable higher experimental throughput by recording multiple worms at once, their videos have insufficient resolution to resolve worm bodies well, and these systems have been limited to only simple measurements. While other single-worm tracking systems also exist, they present, among other limitations, significant costs precluding high experimental throughput. I designed and built the hardware and software for a less expensive unit, approximately 1/4 the cost of previous single-worm trackers. This enabled us to purchase eight such units for high-throughput experimentation. Other novel features of our system include the ability to track worms at all larval stages and to follow single worms while swimming.
In Chapter 3, I describe a novel automated analysis for the worm videos collected using the aforementioned single-worm tracker. While analyses exist for other single-worm tracking systems, several limitations precluded their adaptation. Our worm videos are recorded on food, and the worms are of variable size. Several previous algorithms attempted to deal with worms on food but, for our purposes, suffer from poor resolution at the head and tail, areas necessary to obtain significant phenotypic information. The analysis I built uses a novel algorithm driven by the need to obtain highly accurate and precise worm contours (and their consequent skeletons) in our difficult conditions (e.g., on-food and swimming environments), with invariance to worm size (bounded by a minimal limit of resolution). This accuracy was necessary due to the sheer size of the data set collected, roughly 1/3 of a billion frames, which precludes manual verification.
In the final section, Chapter 4, I describe the results from my analysis of our collected data. Using our trackers we collected more than 12,000 videos, each 15 minutes in length, at 640×480 resolution and 20–30 Hz, representing over 300 mutant strains matched to wild-type controls. This large set was filtered to obtain high-quality data and remove strains specific to private data sets (prepared for future publications). The filtered analysis covers 330 worm groups compiled from 300 mutant strains, two wild isolates, and three descendants of N2, along with our N2 controls divided into hourly, daily, and monthly groups. A subset of 79 strains, representing 76 genes with no previously characterized phenotype, shows significant measures in my analysis. Further sensitivity of the analysis is explored through measures of habituation, small morphological changes due to growth, and a phenotypic comparison of the three descendants of the ancestral, wild-type N2. With the sensitivity explored, I present an N2 phenotypic reference compiled from 1,218 worms, recorded over three years. Statistics of this set define a reference measure of the N2 phenotype (specific to the Schafer Lab wild type) with broad implications for performing and controlling C. elegans experiments. Three genes, implicated in mechanosensation by genetic sequence but lacking any observed phenotypic support, reveal locomotory phenotypes in our analysis. This prompts a large clustering of all 330 groups to assess the predictive capabilities of our system. The N2 groups cluster together in a large exclusive aggregate. Further support for the predictive capabilities of the clustering emerges among multiple published pathways that also form exclusive clusters.
I end by discussing a set of genes, predicted to be acetylcholine receptors through genetic sequence and functional heterologous expression, which now receive further support through strong aggregation within their own exclusive phenotypic cluster. This work was supported by the Gates Trust and the Medical Research Council.
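The phenotypic clustering of strain groups described above can be illustrated with a deliberately simplified sketch. This is not the thesis's actual pipeline; the feature vectors and the single-linkage rule are assumptions. Each strain group is summarized by a feature vector, and the two closest clusters are merged repeatedly until the desired number of clusters remains.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters,
    where cluster distance is the minimum pairwise distance (single linkage)."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)            # merge the closest pair
    return clusters
```

In this toy form, strains with similar locomotory measures fall into the same cluster, mirroring how the N2 groups and published pathways form exclusive aggregates in the full analysis.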
Concept of a Robust & Training-free Probabilistic System for Real-time Intention Analysis in Teams
This thesis addresses the analysis of team intentions in smart environments (SE). Its central claim is that the development and integration of explicit models of user tasks can make an important contribution to the development of mobile and ubiquitous software systems. The thesis collects descriptions of human behaviour in both group situations and problem-solving situations. It examines how SE projects model a user's activities, and provides a team intention model for inferring and selecting planned team activities from observations of multiple users through noisy and heterogeneous sensors. To this end, an approach based on hierarchical dynamic Bayesian networks is chosen.
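The core operation of such a probabilistic model can be illustrated at its simplest level with one discrete filtering step. This is a hedged sketch of the HMM special case, not the full hierarchical dynamic Bayesian network of the thesis, and the transition and emission matrices are assumptions: predict forward with the transition model, then reweight by the likelihood of the noisy sensor observation.

```python
def forward_step(belief, transition, emission, obs):
    """One filtering step of a discrete dynamic Bayesian network (HMM case):
    belief[i]        current probability of hidden state i
    transition[i][j] probability of moving from state i to state j
    emission[j][o]   probability of observing o in state j"""
    n = len(belief)
    # predict: push the belief through the transition model
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # update: weight each state by how well it explains the observation
    unnorm = [predicted[j] * emission[j][obs] for j in range(n)]
    z = sum(unnorm)
    return [p / z for p in unnorm]
```

Hierarchical models stack such steps across levels (e.g. team intention above individual actions), but each level still performs this predict-then-update cycle over noisy observations.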
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Object Tracking-by-Segmentation in Videos
This thesis focuses on the problem of object tracking. Given a video, the general objective of tracking is to track the location over time of one or more targets in the image sequence. This is a very challenging task, as algorithms need to deal with problems such as appearance variations, non-rigid deformations, cluttered backgrounds, occlusions, etc. While most existing methods use bounding boxes to represent the target, we use segmentations instead, which provide better access to target pixels and can better handle occlusions. Our first contribution is a new tracking algorithm that, given an over-segmentation of a video, tracks multiple targets through interactions and occlusions. We develop a provably convergent learning algorithm for this approach, which leverages training data to improve performance. Our second contribution targets the case when an over-segmentation is not available due to poor video quality or low resolution. For this case, we develop a new algorithm that tracks coherent regions and estimates the number of target objects in each region. This count representation of a video can be used to help inform more traditional tracking techniques. Finally, we develop the first tracking-by-segmentation approach based on deep learning. We propose a novel deep network architecture and training algorithms for learning to segment and track a target object throughout a video. All of our algorithms are rigorously evaluated on challenging benchmark video collections, which demonstrate improvements over the state of the art.
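One way to see why segmentations give better access to target pixels than bounding boxes is a frame-to-frame association step computed directly on masks. The following is an illustrative sketch, not one of the thesis's learned algorithms: greedy IoU matching and the representation of masks as sets of pixel coordinates are assumptions.

```python
def mask_iou(a, b):
    """Intersection-over-union of two binary masks given as sets of pixel coords."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def associate(prev_masks, curr_masks, threshold=0.3):
    """Greedily match each existing track's mask to the best-overlapping
    segmentation mask in the next frame."""
    matches, used = {}, set()
    for tid, pm in prev_masks.items():
        best, best_j = 0.0, None
        for j, cm in enumerate(curr_masks):
            if j in used:
                continue
            iou = mask_iou(pm, cm)
            if iou > best:
                best, best_j = iou, j
        if best_j is not None and best >= threshold:
            matches[tid] = best_j   # track tid continues as mask best_j
            used.add(best_j)
    return matches
```

Because the overlap is computed on actual target pixels rather than box areas, partially occluded or non-rigidly deforming targets overlap their true successors more reliably than box IoU would suggest.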
Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions
This dissertation addresses the problem of learning video representations, defined here as transforming a video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, or by a 3D graph with pixels as nodes. This dissertation contributes a set of models to localize, track, segment, recognize, and assess actions: (1) image-set models via aggregating subset features given by regularizing normalized CNNs; (2) image-set models via inter-frame principal recovery and sparse coding of residual actions; (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with an added motion model; (4) spatiotemporal models, a 3D graph and a 3D CNN, that model time as a spatial dimension; and (5) supervised hashing by jointly learning embedding and quantization. State-of-the-art performance is achieved for tasks such as quantifying facial pain and human diving.
Primary conclusions of this dissertation are categorized as follows: (i) image sets can capture facial actions, which are about collective representation; (ii) sparse and low-rank representations can untangle expression, identity, and pose cues, and can be learned via an image-set model as well as a linear model; (iii) norm is related to recognizability; similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the particle filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) segmenting an object locally makes it amenable to shape priors; it is feasible to learn knowledge such as shape priors online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time; 3D CNNs work effectively when given temporally meaningful clips; (vii) richly labeled images or videos help to learn better hash functions, after learning binary embedded codes, than random projections. In addition, models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not included in this dissertation.
Data Epistemologies / Surveillance and Uncertainty
Data Epistemologies studies the changing ways in which ‘knowledge’ is defined, promised, problematised, and legitimated vis-à-vis the advent of digital, ‘big’ data surveillance technologies in early twenty-first century America. As part of the period’s fascination with ‘new’ media and ‘big’ data, such technologies intersect ambitious claims to better knowledge with a problematisation of uncertainty. This entanglement, I argue, results in contextual reconfigurations of what ‘counts’ as knowledge and who (or what) is granted authority to produce it – from proving that indiscriminate domestic surveillance prevents terrorist attacks, to arguing that machinic sensors can know us better than we can ever know ourselves.
The present work focuses on two empirical cases. The first is the ‘Snowden Affair’ (2013-Present): the public controversy unleashed through the leakage of vast quantities of secret material on the electronic surveillance practices of the U.S. government. The second is the ‘Quantified Self’ (2007-Present), a name which describes both an international community of experimenters and the wider industry built up around the use of data-driven surveillance technology for self-tracking every possible aspect of the individual ‘self’. By triangulating media coverage, connoisseur communities, advertising discourse and leaked material, I examine how surveillance technologies were presented for public debate and speculation.
This dissertation is thus a critical diagnosis of the contemporary faith in ‘raw’ data, sensing machines and algorithmic decision-making, and of their public promotion as the next great leap towards objective knowledge. Surveillance is not only a means of totalitarian control or a technology for objective knowledge, but a collective fantasy that seeks to mobilise public support for new epistemic systems. Surveillance, as part of a broader enthusiasm for ‘data-driven’ societies, extends the old modern project whereby the human subject – its habits, its affects, its actions – becomes the ingredient, the raw material, the object, the target, for the production of truths and judgments about it by things other than itself.