15 research outputs found
Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines
Tackling pattern recognition problems in areas such as computer vision,
bioinformatics, speech or text recognition is often done best by taking into
account task-specific statistical relations between output variables. In
structured prediction, this internal structure is used to predict multiple
outputs simultaneously, leading to more accurate and coherent predictions.
Structural support vector machines (SSVMs) are nonprobabilistic models that
optimize a joint input-output function through margin-based learning. Because
SSVMs generally disregard the interplay between unary and interaction factors
during the training phase, the final parameters are suboptimal. Moreover, their
factors are often restricted to linear combinations of input features, limiting
their generalization power. To improve prediction accuracy, this paper proposes:
(i) Joint inference and learning by integration of back-propagation and
loss-augmented inference in SSVM subgradient descent; (ii) Extending SSVM
factors to neural networks that form highly nonlinear functions of input
features. Image segmentation benchmark results demonstrate improvements over
conventional SSVM training methods in terms of accuracy, highlighting the
feasibility of end-to-end SSVM training with neural factors.
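The training loop the abstract describes can be sketched in miniature. The following is a hedged illustration, not the paper's implementation: it uses a tiny chain model with *linear* unary and pairwise factors and brute-force loss-augmented inference, whereas the paper's contribution is precisely to replace the linear unary factors with neural networks (with back-propagation folded into the same subgradient step); all names and toy data here are invented.

```python
import itertools

L, T, D = 2, 3, 2                          # labels, chain length, feature dim
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy per-position input features
y_true = [0, 1, 1]                         # toy ground-truth labeling

w = [[0.0] * D for _ in range(L)]          # unary (per-label) weights
V = [[0.0] * L for _ in range(L)]          # pairwise (interaction) weights

def score(y):
    """Joint input-output score of a labeling y."""
    s = sum(w[y[t]][d] * x[t][d] for t in range(T) for d in range(D))
    return s + sum(V[y[t]][y[t + 1]] for t in range(T - 1))

def loss_aug_argmax():
    """Loss-augmented inference with Hamming loss (brute force on this
    toy chain; Viterbi-style dynamic programming in a real model)."""
    return max((list(y) for y in itertools.product(range(L), repeat=T)),
               key=lambda y: score(y) + sum(a != b for a, b in zip(y, y_true)))

eta = 0.1                                  # subgradient step size
for _ in range(200):
    y_hat = loss_aug_argmax()
    if y_hat == y_true:                    # margin condition satisfied
        break
    for t in range(T):                     # subgradient step on unary factors
        for d in range(D):
            w[y_true[t]][d] += eta * x[t][d]
            w[y_hat[t]][d] -= eta * x[t][d]
    for t in range(T - 1):                 # and on interaction factors
        V[y_true[t]][y_true[t + 1]] += eta
        V[y_hat[t]][y_hat[t + 1]] -= eta

# Unaugmented MAP prediction after training
y_pred = max((list(y) for y in itertools.product(range(L), repeat=T)), key=score)
```

In the paper's setting the linear unary score would be the output of a neural network, with the same loss-augmented subgradient back-propagated through it.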
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
Deep Learning for Aerial Scene Understanding in High Resolution Remote Sensing Imagery from the Lab to the Wild
This thesis presents the application of deep learning to aerial scene understanding, e.g. aerial scene recognition, multi-label object classification, and semantic segmentation. Beyond training deep networks under laboratory conditions, this thesis also provides learning strategies for practical scenarios, e.g. where data are collected without constraints or annotations are scarce.
Human-in-the-Loop Learning From Crowdsourcing and Social Media
Computational social studies using public social media data have become increasingly popular because of the large amount of user-generated data available. The richness of social media data, coupled with its noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved. That is, humans solving the same problem might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly with crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity in opinions about the labels.
Label distribution learning associates with each data item a probability distribution over the labels for that item, and can thus preserve the diversity of opinions and beliefs that conventional learning hides or ignores. We propose a human-in-the-loop learning framework to model and study large volumes of unlabeled, subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. Our experimental results suggest that specific label aggregation methods can help provide reliable, representative semantics at the population level.
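The two aggregation ideas above can be sketched concretely. This is a minimal, hedged illustration, not the dissertation's actual pipeline: it normalizes raw annotator votes into a per-item label distribution, then pools labels across semantically related items; all function names and the toy labels are invented.

```python
from collections import Counter

def label_distribution(annotations):
    """Turn raw annotator votes into a probability distribution over labels."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def pooled_distribution(groups):
    """Aggregate labels across semantically related items, in the spirit of
    the five-to-ten-labels-per-item strategy described above."""
    return label_distribution([lab for annotations in groups for lab in annotations])

# Hypothetical crowd labels for three semantically related items
item_a = ["pos", "pos", "neg", "pos", "neutral"]
item_b = ["pos", "neg", "neg", "pos", "pos"]
item_c = ["neutral", "pos", "pos", "neg", "pos"]

print(label_distribution(item_a))                     # per-item view
print(pooled_distribution([item_a, item_b, item_c]))  # pooled view
```

The pooled view keeps disagreement visible as probability mass on minority labels, instead of collapsing each item to a single majority-vote "ground truth".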
Multimodal headpose estimation and applications
This thesis presents new research into human headpose estimation and its applications
in multi-modal data. We develop new methods for head pose estimation
spanning RGB-D Human-Computer Interaction (HCI) to far-away "in the wild"
surveillance-quality data. We present the state-of-the-art solution in both head
detection and head pose estimation through a new end-to-end Convolutional Neural
Network architecture that reuses all of the computation for detection and pose
estimation. In contrast to prior work, our method successfully spans close-up HCI
to low-resolution surveillance data and is cross-modality: operating on both RGB
and RGB-D data. We further address the problems of a limited amount of standard
data and varying annotation quality through semi-supervised learning and novel
data augmentation. (This latter contribution also finds application in the domain
of life sciences.)
We report the highest accuracy by a large margin (a 60% improvement) and
demonstrate leading performance on multiple standardized datasets. In HCI we
reduce the angular error by 40% relative to previously reported results. Furthermore,
by defining a probabilistic spatial gaze model from the head pose we show
application in human-human, human-scene interaction understanding. We present
state-of-the-art results on the standard interaction datasets. A new metric to
model "social mimicry" through the temporal correlation of the headpose signal
is contributed and shown to be valid qualitatively and intuitively. As an application
in surveillance, it is shown that with the robust headpose signal as a prior,
state-of-the-art results in tracking under occlusion using a Kalman filter can be
achieved. This model is named the Intentional Tracker and it improves visual
tracking metrics by up to 15%.
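A "social mimicry" score via temporal correlation of headpose signals could plausibly look like the sketch below. This is a hedged illustration only, not the thesis's actual metric definition: it averages windowed Pearson correlations between one person's yaw signal and another's lagged yaw signal; all names, parameters, and data are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    if sa == 0 or sb == 0:                 # constant window: undefined, treat as 0
        return 0.0
    return cov / (sa * sb)

def mimicry(yaw_a, yaw_b, window=4, lag=1):
    """Mean windowed correlation of two headpose (yaw) signals, with person B
    lagged behind person A: values near 1 suggest B mirrors A's head
    movements shortly after A makes them."""
    scores = [pearson(yaw_a[t:t + window], yaw_b[t + lag:t + lag + window])
              for t in range(len(yaw_a) - window - lag + 1)]
    return sum(scores) / len(scores)

yaw_a = [0, 10, 20, 10, 0, -10, -20, -10, 0, 10]   # A's yaw trajectory (degrees)
yaw_b = [0] + yaw_a[:-1]                           # B copies A one frame later
print(round(mimicry(yaw_a, yaw_b), 3))             # → 1.0 (perfect mimicry)
```

Anti-correlated signals drive the score negative, so the metric separates mirroring from avoidance behavior.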
We also apply the ALICE loss, developed for the end-to-end detection
and classification, to dense classification of underwater coral reef imagery. The
objective of this work is to solve the challenging task of recognizing and segmenting
underwater coral imagery in the wild with sparse point-based ground truth
labelling. To achieve this, we propose an integrated Fully Convolutional Neural
Network (FCNN) and Fully-Connected Conditional Random Field (CRF) based classification and segmentation algorithm. Our contributions lie in four major
areas. First, we show that multi-scale crop-based training is useful for learning
the initial weights in the canonical one-class classification problem. Second,
we propose a modified ALICE loss for training the FCNN on sparse labels with
class imbalance and establish its significance empirically. Third, we show that
by artificially enhancing the point labels to small regions based on a class distance
transform, we can further improve classification accuracy. Fourth, we improve
the segmentation results using fully connected CRFs by using a bilateral message
passing prior. We improve upon state-of-the-art results on all publicly available
datasets by a significant margin.
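The third contribution, growing sparse point labels into small regions, can be sketched as follows. This is a simplified, hedged stand-in, not the thesis's method: it grows each point into a fixed-radius disk rather than using a class distance transform (for which `scipy.ndimage.distance_transform_edt` would typically be used); the function name and toy data are invented.

```python
def expand_point_labels(shape, points, radius=2):
    """Grow each sparse point annotation into a small labeled disk.
    points maps (row, col) -> class id; 0 marks unlabeled pixels."""
    h, w = shape
    dense = [[0] * w for _ in range(h)]
    for (r0, c0), cls in points.items():
        # Label every pixel within `radius` of the annotated point
        for r in range(max(0, r0 - radius), min(h, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(w, c0 + radius + 1)):
                if (r - r0) ** 2 + (c - c0) ** 2 <= radius ** 2:
                    dense[r][c] = cls
    return dense

# Two hypothetical point annotations (classes 1 and 2) on a 5x5 image
dense = expand_point_labels((5, 5), {(2, 2): 1, (0, 4): 2}, radius=1)
```

The resulting denser label map gives the FCNN many more supervised pixels per annotation, which is the effect the abstract attributes to the distance-transform enhancement.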
Recent Advances in Indoor Localization Systems and Technologies
Despite the enormous technical progress seen in the past few years, the maturity of indoor localization technologies has not yet reached the level of GNSS solutions. The 23 selected papers in this book present the recent advances and new developments in indoor localization systems and technologies, propose novel or improved methods with increased performance, provide insight into various aspects of quality control, and also introduce some unorthodox positioning methods.
Deep Learning for Building Footprint Generation from Optical Imagery
Deep-learning-based methods have shown promising results for the task of building footprint generation, but they have two inherent limitations. First, the extracted buildings exhibit blurred boundaries and blob-like shapes. Second, massive pixel-level annotations are required for network training. This dissertation has developed a series of methods to address the problems above. Furthermore, the developed methods are transferred into practical applications.