9 research outputs found
Weakly Supervised Learning of Objects, Attributes and Their Associations
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-10605-2_31]”
Methods for Learning Structured Prediction in Semantic Segmentation of Natural Images
Automatic segmentation and recognition of semantic classes in natural images is an important open problem in computer vision. In this work, we investigate three different approaches to recognition: without supervision, with supervision on level of images, and with supervision on the level of pixels. The thesis comprises three parts. The first part introduces a clustering algorithm that optimizes a novel information-theoretic objective function. We show that the proposed algorithm has clear advantages over standard algorithms from the literature on a wide array of datasets. Clustering algorithms are an important building block for higher-level computer vision applications, in particular for semantic segmentation. The second part of this work proposes an algorithm for automatic segmentation and recognition of object classes in natural images, that learns a segmentation model solely from annotation in the form of presence and absence of object classes in images. The third and main part of this work investigates one of the most popular approaches to the task of object class segmentation and semantic segmentation, based on conditional random fields and structured prediction. We investigate several learning algorithms, in particular in combination with approximate inference procedures. We show how structured models for image segmentation can be learned exactly in practical settings, even in the presence of many loops in the underlying neighborhood graphs. The introduced methods provide results advancing the state-of-the-art on two complex benchmark datasets for semantic segmentation, the MSRC-21 Dataset of RGB images and the NYU V2 Dataset or RGB-D images of indoor scenes. Finally, we introduce a software library that al- lows us to perform extensive empirical comparisons of state-of-the-art structured learning approaches. This allows us to characterize their practical properties in a range of applications, in particular for semantic segmentation and object class segmentation.Methoden zum Lernen von Strukturierter Vorhersage in Semantischer Segmentierung von Natürlichen Bildern Automatische Segmentierung und Erkennung von semantischen Klassen in natür- lichen Bildern ist ein wichtiges offenes Problem des maschinellen Sehens. In dieser Arbeit untersuchen wir drei möglichen Ansätze der Erkennung: ohne Überwachung, mit Überwachung auf Ebene von Bildern und mit Überwachung auf Ebene von Pixeln. Diese Arbeit setzt sich aus drei Teilen zusammen. Im ersten Teil der Arbeit schlagen wir einen Clustering-Algorithmus vor, der eine neuartige, informationstheoretische Zielfunktion optimiert. Wir zeigen, dass der vorgestellte Algorithmus üblichen Standardverfahren aus der Literatur gegenüber klare Vorteile auf vielen verschiedenen Datensätzen hat. Clustering ist ein wichtiger Baustein in vielen Applikationen des machinellen Sehens, insbesondere in der automatischen Segmentierung. Der zweite Teil dieser Arbeit stellt ein Verfahren zur automatischen Segmentierung und Erkennung von Objektklassen in natürlichen Bildern vor, das mit Hilfe von Supervision in Form von Klassen-Vorkommen auf Bildern in der Lage ist ein Segmentierungsmodell zu lernen. Der dritte Teil der Arbeit untersucht einen der am weitesten verbreiteten Ansätze zur semantischen Segmentierung und Objektklassensegmentierung, Conditional Random Fields, verbunden mit Verfahren der strukturierten Vorhersage. Wir untersuchen verschiedene Lernalgorithmen des strukturierten Lernens, insbesondere im Zusammenhang mit approximativer Vorhersage. Wir zeigen, dass es möglich ist trotz des Vorhandenseins von Kreisen in den betrachteten Nachbarschaftsgraphen exakte strukturierte Modelle zur Bildsegmentierung zu lernen. Mit den vorgestellten Methoden bringen wir den Stand der Kunst auf zwei komplexen Datensätzen zur semantischen Segmentierung voran, dem MSRC-21 Datensatz von RGB-Bildern und dem NYU V2 Datensatz von RGB-D Bildern von Innenraum-Szenen. Wir stellen außerdem eine Software-Bibliothek vor, die es erlaubt einen weitreichenden Vergleich der besten Lernverfahren für strukturiertes Lernen durchzuführen. Unsere Studie erlaubt uns eine Charakterisierung der betrachteten Algorithmen in einer Reihe von Anwendungen, insbesondere der semantischen Segmentierung und Objektklassensegmentierung
Recommended from our members
Weakly Supervised Learning for Activity Recognition from Time Series Data
The thesis focuses on activity recognition from sensor data, which has spurred a great deal of interest due to its impact on health care and security. Previous work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort to label the training data with the start and end times of each activity. Multi-instance learning (MIL) and multi-instance multi-label learning (MIML) are weakly supervised learning alternatives to the standard supervised learning framework, in which ambiguity in the labeling can reduce the annotation effort needed to produce labeled training data. We introduce generative graphical models for MIL and MIML based on auto-regressive processes. Our first work, based on a mixture of auto-regressive processes, assumes that instances within a bag are independent. We then relax the i.i.d. assumption for instances and extend the model by considering the sequential structure within a bag. Finally we take a MIML approach to predict the presence of multiple activities within a time interval. For each approach, we evaluate against other state-of-the-art algorithms to demonstrate the effectiveness of the proposed approach on multiple real-world data sets
Weakly Supervised Learning of Objects and Attributes.
PhDThis thesis presents weakly supervised learning approaches to directly
exploit image-level tags (e.g. objects, attributes) for comprehensive
image understanding, including tasks such as object localisation, image
description, image retrieval, semantic segmentation, person re-identification
and person search, etc. Unlike the conventional approaches which tackle
weakly supervised problem by learning a discriminative model, a generative
Bayesian framework is proposed which provides better mechanisms
to resolve the ambiguity problem. The proposed model significantly differentiates
from the existing approaches in that: (1) All foreground object
classes are modelled jointly in a single generative model that encodes multiple
objects co-existence so that “explaining away” inference can resolve
ambiguity and lead to better learning. (2) Image backgrounds are shared
across classes to better learn varying surroundings and “push out” objects
of interest. (3) the Bayesian formulation enables the exploitation of various
types of prior knowledge to compensate for the limited supervision
offered by weakly labelled data, as well as Bayesian domain adaptation
for transfer learning.
Detecting objects is the first and critical component in image understanding
paradigm. Unlike conventional fully supervised object detection
approaches, the proposed model aims to train an object detector
from weakly labelled data. A novel framework based on Bayesian latent
topic model is proposed to address the problem of localisation of objects
as bounding boxes in images and videos with image level object labels.
The inferred object location can be then used as the annotation to train a
classic object detector with conventional approaches.
However, objects cannot tell the whole story in an image. Beyond detecting
objects, a general visual model should be able to describe objects
and segment them at a pixel level. Another limitation of the initial model is
that it still requires an additional object detector. To remedy the above two
drawbacks, a novel weakly supervised non-parametric Bayesian model is
presented to model objects, attributes and their associations automatically
from weakly labelled images. Once learned, given a new image, the proposed
model can describe the image with the combination of objects and
attributes, as well as their locations and segmentation.
Finally, this thesis further tackles the weakly supervised learning problem
from a transfer learning perspective, by considering the fact that there
are always some fully labelled or weakly labelled data available in a related
domain while only insufficient labelled data exist for training in the
target domain. A powerful semantic description is transferred from the existing
fashion photography datasets to surveillance data to solve the person
re-identification problem
Recommended from our members
Multi-instance multi-label learning : algorithms and applications to bird bioacoustics
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts, and improve our understanding of their interactions with the environment and other organisms. However, traditional observation methods are labor- intensive. Most prior work on machine learning for bird song is not applicable to real-world acoustic monitoring, because it assumes recordings contain only a single species of bird, while recordings typically contain multiple simultaneously vocalizing birds. We propose to use the multi-instance multi-label (MIML) framework in machine learning for the species classification problem, where the dataset is viewed as a collection of bags of instances paired with sets of labels. Furthermore, we formalize MIML instance annotation, where the goal is to predict instance labels while learning only from bag label sets. We develop the first MIML representation for audio, and several new algorithms for MIML instance annotation based on support vector machines or classifier chains. The proposed methods classify either the set of species present in a recording, or individual calls, while learning only from recordings paired with a set of species. This form of training data requires less human effort to obtain than individually labeled calls. These methods are successfully applied to audio collected in the field which included multiple simultaneously vocalizing species. The proposed algorithms for MIML classification are general, and are also applied to object recognition in images
Recommended from our members
Multiple Instance Multiple Label Learning with Limited Supervision
In weak supervision learning, label information can be provided at different levels of granularity. For example, in multi-instance multi-label learning, samples are organized into bags and labels for each class are provided at the bag level. For small datasets, this approach offers means of reducing the labeling efforts. However, in the big-data order, even this mid-level labeling granularity can become prohibitively costly. Under this limited supervision, several labeling approaches can be considered to further reduce labeling costs. In semi-supervised learning, only a limited number of bags can be labeled meanwhile a large number of bags remain unlabeled. In partial- or incomplete-label learning, due to labeling policy or labeling challenges, only a subset of the classes can be labeled for each bag. This subset may be small or even an empty set. In active learning, a small number of class labels per bag are available and a limited number of the most informative labels from the unlabeled portion of data are queried in turn during the training phase to update the model efficiently. The goal is to achieve the best classifier in terms of performance with as least as possible number of available labels. All of the aforementioned approaches provide a practical solution for reducing labeling efforts under the multi-instance multi-label learning framework but introduce machine learning challenges both in terms of the required methodology as well as performance limitations for these methods. This work focuses on probabilistic models with exact/approximate but efficient inferences to leverage information from limited supervision data. In many cases, the proposed frameworks outperform, even significantly, the recently state-of-the-art approaches