164 research outputs found
Adversarial content manipulation for analyzing and improving model robustness
The recent rapid progress in machine learning systems has opened up many real-world applications, from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet, doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and testing the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, such as changing visual context, editing the appearance of an object in images, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and perform writing style manipulation to obfuscate private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations: model-agnostic and model-targeted.
The model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and to create variations in visual context. Alternatively, in the model-targeted approach, the input variations are adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems (specifically image classification, segmentation, object detection and visual question answering systems) to semantic input variations.
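The idea of a manipulation "guided by the gradients of the detector" can be illustrated with a minimal sketch. It is not the thesis's method, which edits object appearance through generative models; it is a generic gradient-sign step on a toy linear detector. The names `detector_score`, `score_gradient` and `gradient_sign_step`, and all weights and inputs, are hypothetical and chosen only for illustration.

```python
import math

def detector_score(weights, bias, x):
    """Toy linear 'detector' (an assumption): sigmoid(w.x + b) as object confidence."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def score_gradient(weights, bias, x):
    """Analytic gradient of the sigmoid score with respect to the input x."""
    s = detector_score(weights, bias, x)
    return [s * (1.0 - s) * w for w in weights]

def gradient_sign_step(weights, bias, x, epsilon):
    """Shift every feature by epsilon against the gradient sign to lower the score."""
    grad = score_gradient(weights, bias, x)
    return [xi - epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

# Hypothetical detector parameters and input features.
weights = [2.0, -1.0, 0.5]
bias = 0.1
x = [1.0, 0.5, 1.0]

x_adv = gradient_sign_step(weights, bias, x, epsilon=0.5)
clean = detector_score(weights, bias, x)
attacked = detector_score(weights, bias, x_adv)
print(f"clean score {clean:.3f}, score after manipulation {attacked:.3f}")
```

Each feature moves by at most `epsilon`, so the change stays bounded while the detector's confidence drops. A model-agnostic manipulation, by contrast, would pick the change without consulting `score_gradient` at all.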
Actively Semi-Supervised Deep Rule-based Classifier Applied to Adverse Driving Scenarios
This paper presents an actively semi-supervised multi-layer neuro-fuzzy modeling method, ASSDRB, to classify different lighting conditions in driving scenes. ASSDRB is composed of a massively parallel ensemble of AnYa-type 0-order fuzzy rules. It uses a recursive learning algorithm to update its structure when new data items are provided and is therefore able to cope with nonstationarities. Different lighting conditions for driving situations are considered in the analysis, which self-driving cars use as a safety mechanism. Unlike mainstream deep neural network approaches, ASSDRB is able to learn from unseen data. Experiments on different lighting conditions in driving scenes demonstrate that deep neuro-fuzzy modeling is an efficient framework for these challenging classification tasks. Its classification accuracy is higher than that of alternative machine learning methods. The method requires significantly fewer algebraic calculations and is therefore significantly faster than common deep neural network approaches. Moreover, DRB produces transparent AnYa fuzzy rules, which are human-interpretable.
A Functional Data Perspective and Baseline On Multi-Layer Out-of-Distribution Detection
A key feature of out-of-distribution (OOD) detection is to exploit a trained
neural network by extracting statistical patterns and relationships through the
multi-layer classifier to detect shifts in the expected input data
distribution. Despite achieving solid results, several state-of-the-art methods
rely on the penultimate or last layer outputs only, leaving behind valuable
information for OOD detection. Methods that explore the multiple layers either
require a special architecture or a supervised objective to do so. This work
adopts an original approach based on a functional view of the network that
exploits the sample's trajectories through the various layers and their
statistical dependencies. It goes beyond multivariate feature aggregation and
introduces a baseline rooted in functional anomaly detection. In this new
framework, OOD detection translates into detecting samples whose trajectories
differ from the typical behavior characterized by the training set. We validate
our method and empirically demonstrate its effectiveness in OOD detection
compared to strong state-of-the-art baselines on computer vision benchmarks.
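The trajectory view described in this abstract can be sketched as follows. This is a simplified illustration and not the paper's scoring function: it assumes each sample has already been reduced to one scalar per layer (its "trajectory"), takes the mean training trajectory as the typical behavior, and uses Euclidean distance as the anomaly score. The function names and all numbers are hypothetical.

```python
import math

def mean_trajectory(trajectories):
    """Average the per-layer scores over the training samples (the 'typical' trajectory)."""
    n = len(trajectories)
    depth = len(trajectories[0])
    return [sum(t[layer] for t in trajectories) / n for layer in range(depth)]

def ood_score(trajectory, reference):
    """Euclidean distance to the reference trajectory; larger means more atypical."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(trajectory, reference)))

# Hypothetical per-layer scores for three training samples over three layers.
train = [[0.9, 0.8, 0.7], [1.0, 0.9, 0.8], [1.1, 1.0, 0.9]]
reference = mean_trajectory(train)

in_dist = ood_score([1.0, 0.9, 0.8], reference)  # follows the typical trajectory
shifted = ood_score([0.2, 0.9, 1.6], reference)  # deviates at the first and last layer
print(f"in-distribution score {in_dist:.3f}, shifted score {shifted:.3f}")
```

A sample whose last-layer value looks normal can still receive a high score if earlier layers deviate, which is the information that methods relying only on the final layers leave unused.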
Learning visually grounded meaning representations
Humans possess a rich semantic knowledge of words and concepts which captures the
perceivable physical properties of their real-world referents and their relations. Encoding
this knowledge or some of its aspects is the goal of computational models of
semantic representation and has been the subject of considerable research in cognitive
science, natural language processing, and related areas. Existing models have
placed emphasis on different aspects of meaning, depending ultimately on the task at
hand. Typically, such models have been used in tasks addressing the simulation of behavioural
phenomena, e.g., lexical priming or categorisation, as well as in natural language
applications, such as information retrieval, document classification, or semantic
role labelling. A major strand of research popular across disciplines focuses on models
which induce semantic representations from text corpora. These models are based on
the hypothesis that the meaning of words is established by their distributional relation
to other words (Harris, 1954). Despite their widespread use, distributional models of
word meaning have been criticised as "disembodied" in that they are not grounded in
perception and action (Perfetti, 1998; Barsalou, 1999; Glenberg and Kaschak, 2002).
This lack of grounding contrasts with many experimental studies suggesting that meaning
is acquired not only from exposure to the linguistic environment but also from our
interaction with the physical world (Landau et al., 1998; Bornstein et al., 2004). This
criticism has led to the emergence of new models aiming at inducing perceptually
grounded semantic representations. Essentially, existing approaches learn meaning
representations from multiple views corresponding to different modalities, i.e. linguistic
and perceptual input. To approximate the perceptual modality, previous work has
relied largely on semantic attributes collected from humans (e.g., is round, is sour), or
on automatically extracted image features. Semantic attributes have a long-standing
tradition in cognitive science and are thought to represent salient psychological aspects
of word meaning including multisensory information. However, their elicitation
from human subjects limits the scope of computational models to a small number of
concepts for which attributes are available.
In this thesis, we present an approach which draws inspiration from the successful
application of attribute classifiers in image classification, and represents images and
the concepts depicted by them by automatically predicted visual attributes. To this
end, we create a dataset comprising nearly 700K images and a taxonomy of 636 visual
attributes and use it to train attribute classifiers. We show that their predictions
can act as a substitute for human-produced attributes without any critical information
loss. In line with the attribute-based approximation of the visual modality, we represent
the linguistic modality by textual attributes which we obtain with an off-the-shelf
distributional model. Having first established this core contribution of a novel modelling
framework for grounded meaning representations based on semantic attributes,
we show that these can be integrated into existing approaches to perceptually grounded
representations. We then introduce a model which is formulated as a stacked autoencoder
(a variant of multilayer neural networks), which learns higher-level meaning representations
by mapping words and images, represented by attributes, into a common
embedding space. In contrast to most previous approaches to multimodal learning using
different variants of deep networks and data sources, our model is defined at a finer
level of granularity: it computes representations for individual words and is unique in
its use of attributes as a means of representing the textual and visual modalities.
We evaluate the effectiveness of the representations learnt by our model by assessing
its ability to account for human behaviour on three semantic tasks, namely word
similarity, concept categorisation, and typicality of category members. With respect to
the word similarity task, we focus on the model's ability to capture similarity in both
the meaning and appearance of the words' referents. Since existing benchmark datasets
on word similarity do not distinguish between these two dimensions and often contain
abstract words, we create a new dataset in a large-scale experiment where participants
are asked to give two ratings per word pair expressing their semantic and visual
similarity, respectively. Experimental results show that our model learns meaningful
representations which are more accurate than models based on individual modalities or
different modality integration mechanisms. The presented model is furthermore able to
predict textual attributes for new concepts given their visual attribute predictions only,
which we demonstrate by comparing model output with human generated attributes.
Finally, we show the model's effectiveness in an image-based task on visual category
learning, in which images are used as a stand-in for real-world objects.
Pattern Recognition
Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing may run in real time or take hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.
Proceedings of the Post-Graduate Conference on Robotics and Development of Cognition, 10-12 September 2012, Lausanne, Switzerland
The aim of the Postgraduate Conference on Robotics and Development of Cognition (RobotDoC-PhD) is to bring together young scientists working on developmental cognitive robotics and its core disciplines. The conference aims to provide both feedback and greater visibility to their research through lively and stimulating discussion among participating PhD students and senior researchers. The conference is open to all PhD students and post-doctoral researchers in the field. The RobotDoC-PhD conference is an initiative of the Marie-Curie Actions ITN RobotDoC and will be organized as a satellite event of the 22nd International Conference on Artificial Neural Networks (ICANN 2012).