An Empirical Evaluation of Current Convolutional Architectures' Ability to Manage Nuisance Location and Scale Variability
We conduct an empirical study to test the ability of Convolutional Neural
Networks (CNNs) to reduce the effects of nuisance transformations of the input
data, such as location, scale and aspect ratio. We isolate factors by adopting
a common convolutional architecture either deployed globally on the image to
compute class posterior distributions, or restricted locally to compute class
conditional distributions given location, scale and aspect ratios of bounding
boxes determined by proposal heuristics. In theory, averaging the latter should
yield inferior performance compared to proper marginalization. Yet empirical
evidence suggests the converse, leading us to conclude that, at the current
level of complexity of convolutional architectures and scale of the data sets
used to train them, CNNs are not very effective at marginalizing nuisance
variability. We also quantify the effects of context on the overall
classification task and its impact on the performance of CNNs, and propose
improved sampling techniques for heuristic proposal schemes that improve
end-to-end performance to state-of-the-art levels. We test our hypothesis on a
classification task using the ImageNet Challenge benchmark and on a
wide-baseline matching task using the Oxford and Fischer's datasets.
Comment: 10 pages, 5 figures, 3 tables -- CVPR 2016, camera-ready version
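Averaging class-conditional distributions over proposal boxes can be sketched as a weighted mixture. This is a minimal illustration, not the paper's code. It assumes each proposal box g_i (location, scale, aspect ratio) already has a vector of class scores from a CNN. Uniform weights give a plain average; the optional weights are a hypothetical stand-in for p(g_i | x) in a proper marginalization p(c | x) = sum_i p(c | x, g_i) p(g_i | x).

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def average_over_proposals(proposal_logits, weights=None):
    """Approximate p(c | x) by mixing p(c | x, g_i) over N proposal boxes g_i.

    proposal_logits: (N, C) class scores, one row per proposal box
                     (hypothetical input; e.g. a CNN evaluated on each crop).
    weights: optional (N,) nonnegative weights standing in for p(g_i | x);
             None means uniform weights, i.e. a plain average.
    """
    probs = softmax(np.asarray(proposal_logits, dtype=float))
    n = probs.shape[0]
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    return w @ probs  # (C,) mixture of class-conditional distributions

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [2.0, 0.0, 0.0]])
p = average_over_proposals(logits)
print(p.argmax())  # prints 0: class 0 dominates two of the three proposals
```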
Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems
It is unknown what kinds of biases modern in-the-wild face datasets have
because of their lack of annotation. A direct consequence is that total
recognition rates alone provide only limited insight into the generalization
ability of Deep Convolutional Neural Networks (DCNNs). We propose to
empirically study the effect of different types of dataset biases on the
generalization ability of DCNNs. Using synthetically generated face images, we
study the face recognition rate as a function of interpretable parameters such
as face pose and light. The proposed method allows valuable details about the
generalization performance of different DCNN architectures to be observed and
compared. In our experiments, we find that: 1) Indeed, dataset bias has a
significant influence on the generalization performance of DCNNs. 2) DCNNs can
generalize surprisingly well to unseen illumination conditions and large
sampling gaps in the pose variation. 3) Using the presented methodology we
reveal that the VGG-16 architecture outperforms the AlexNet architecture at
face recognition tasks because it generalizes much better to unseen face
poses, although it has significantly more parameters. 4) We uncover a main
limitation of current DCNN architectures: the difficulty of generalizing
when different identities do not share the same pose variation. 5) We
demonstrate that our findings on synthetic data also apply when learning from
real-world data. Our face image generator is publicly available to enable the
community to benchmark other DCNN architectures.
Comment: Accepted to CVPR 2018 Workshop on Analysis and Modeling of Faces and
Gestures (AMFG)
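Measuring the recognition rate as a function of an interpretable parameter such as pose amounts to binning test results by that parameter. The sketch below is a minimal version of that bookkeeping, not the authors' evaluation code. It assumes each synthetic test image has a known head yaw angle and a flag for whether its identity was predicted correctly. The function name and the 15-degree bin width are hypothetical choices.

```python
from collections import defaultdict

def recognition_rate_by_bin(yaw_degrees, correct, bin_width=15):
    """Group test samples by yaw angle and return the recognition rate per bin.

    yaw_degrees: rendered head yaw of each synthetic test image (hypothetical input).
    correct: whether the predicted identity matched the ground truth.
    Returns {bin_lower_edge_in_degrees: recognition_rate}, sorted by bin.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for yaw, ok in zip(yaw_degrees, correct):
        b = int(yaw // bin_width) * bin_width  # floor to the bin's lower edge
        totals[b] += 1
        hits[b] += int(ok)
    return {b: hits[b] / totals[b] for b in sorted(totals)}

rates = recognition_rate_by_bin(
    yaw_degrees=[-40, -35, -5, 0, 10, 40, 44],
    correct=[False, True, True, True, True, False, True],
)
print(rates)  # {-45: 0.5, -15: 1.0, 0: 1.0, 30: 0.5}
```

Plotting such per-bin rates for poses held out of training is one way to expose the sampling-gap behaviour the abstract reports.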
Assessing Capsule Networks with Biased Data
Machine-learning-based methods achieve impressive results in object classification and detection. Training on data that is representative of the visual world is crucial to achieving good performance with such data-driven approaches. However, it is not always possible to access bias-free datasets; thus, robustness to biased data is a desirable property for a learning system. Capsule Networks have been introduced recently, and their tolerance to biased data has received little attention. This paper aims to fill this gap and proposes two experimental scenarios: one assesses tolerance to imbalanced training data, and the other determines the generalization performance of a model on images with unfamiliar affine transformations. This paper assesses Capsule Networks based on dynamic routing and on EM routing, and compares them with Convolutional Neural Networks in the two tested scenarios. The presented results provide new insights into the behaviour of capsule networks.
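For readers unfamiliar with the dynamic-routing variant, the routing-by-agreement loop can be sketched in a few lines of NumPy. This is a minimal illustration of the procedure as commonly described for dynamic-routing Capsule Networks, not the paper's implementation. It assumes the prediction vectors u_hat have already been produced by learned transformation matrices. EM routing is not shown.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps the direction of s and maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between two capsule layers (num_iters must be >= 1).

    u_hat: (num_in, num_out, dim) prediction vectors u_hat[j|i] that lower
           capsule i makes for higher capsule j (assumed precomputed).
    Returns v: (num_out, dim) output capsule vectors.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, initially uniform
    for _ in range(num_iters):
        # coupling coefficients: softmax of b over the higher-level capsules
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)  # coupling-weighted sum of predictions
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # raise logits where predictions agree with v
    return v

rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(6, 3, 4)))
print(v.shape)  # (3, 4)
```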
Unsupervised Automatic Detection Of Transient Phenomena In InSAR Time-Series using Machine Learning
The detection and measurement of transient episodes of crustal deformation from global InSAR datasets are crucial for a wide range of solid-earth and natural-hazard applications. However, the large volumes of unlabelled data captured by satellites preclude systematic manual analysis, and the low signal-to-noise ratio makes the task difficult. In this thesis, I present a state-of-the-art, unsupervised and event-agnostic deep-learning-based approach for the automatic identification of transient deformation events in noisy time series of unwrapped InSAR images. I adopt an anomaly detection framework that learns the "normal" spatio-temporal pattern of noise in the data, and therefore identifies any transient deformation phenomena that deviate from this pattern as "anomalies". The deep-learning model is built around a bespoke autoencoder that includes convolutional and LSTM layers, as well as a neural network that acts as a bridge between the encoder and decoder. I train my model on real InSAR data from northern Turkey and find that it has an overall accuracy and true positive rate of around 85% when detecting synthetic deformation signals with length scale > 350 m and magnitude > 4 cm. Furthermore, I show that the method can detect (1) a real Mw 5.7 earthquake in InSAR data from an entirely different region (SW Turkey), (2) volcanic deformation in Domuyo, Argentina, (3) a synthetic slow-slip event and (4) interseismic deformation around the North Anatolian Fault (NAF) in a descending frame in northern Turkey. Overall, I show that my method is suitable for automated analysis of large, global InSAR datasets, and for robust detection and separation of deformation signals from nuisance signals in InSAR data.
The Robustness of Deep Networks - A geometric perspective
Deep neural networks have recently shown impressive classification performance on a diverse set of visual tasks. When they are deployed in real-world (noise-prone) environments, it is equally important that these classifiers satisfy robustness guarantees: small perturbations applied to the samples should not significantly degrade the performance of the predictor. The goal of this paper is to discuss the robustness of deep networks to a diverse set of perturbations that may affect the samples in practice, including adversarial perturbations, random noise, and geometric transformations. The paper further discusses recent works that build on the robustness analysis to provide geometric insights into the classifier's decision surface, which help develop a better understanding of deep nets. Finally, the overview presents recent solutions that attempt to increase the robustness of deep networks. We hope that this review paper will contribute to shedding light on the open research challenges in the robustness of deep networks, and will stir interest in the analysis of their fundamental properties.
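A simple case shows the geometric view of robustness. For an affine binary classifier f(x) = w·x + b, the smallest L2 perturbation that reaches the decision boundary is the orthogonal projection onto the hyperplane, r* = -f(x) w / ||w||², with norm |f(x)| / ||w||. The sketch below is illustrative and not taken from the paper. Iterative methods such as DeepFool apply this closed-form step to local linearizations of a nonlinear classifier.

```python
import numpy as np

def minimal_l2_perturbation(x, w, b):
    """Smallest L2 perturbation moving x onto the boundary of f(x) = w.x + b.

    The boundary is the hyperplane w.x + b = 0, so the closest boundary point
    is the orthogonal projection of x: r* = -f(x) * w / ||w||^2,
    with ||r*|| = |f(x)| / ||w||.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    f = w @ x + b
    return -f * w / (w @ w)

x, w, b = np.array([1.0, 1.0]), np.array([3.0, 4.0]), -2.0
r = minimal_l2_perturbation(x, w, b)
print(round(float(np.linalg.norm(r)), 6))  # 1.0, i.e. |f(x)| / ||w|| = 5 / 5
```

In practice a tiny overshoot beyond r* is needed to actually flip the predicted label, since x + r* lies exactly on the boundary.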
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
Peer reviewed. Publisher PDF
Sample efficiency, transfer learning and interpretability for deep reinforcement learning
Deep learning has revolutionised artificial intelligence: applying increased compute to train neural networks on large datasets has improved real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors hindering further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms. This work resulted in applying online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy to improve the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting in the continual learning setting. While multiple heads worked well in a simple environment, they were of limited use in a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents with and without visual domain randomisation (DR), a popular technique for achieving simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment.
One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced by ablations on the recurrent state.
Open Access
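The maximum entropy mellowmax policy mentioned above can be sketched for a single state's vector of action values q. This is a minimal illustration following the published definition of the mellowmax operator (Asadi and Littman, 2017), not the thesis code. The episodic-control memory is omitted. The bisection solver, tolerances and function names are assumptions. The operator is mm_ω(q) = log(mean(exp(ω q))) / ω. The policy is the Boltzmann distribution π ∝ exp(β q), where β is chosen so that the policy's expected value equals mm_ω(q).

```python
import numpy as np

def mellowmax(q, omega):
    """mm_omega(q) = log(mean(exp(omega * q))) / omega, computed stably."""
    z = omega * np.asarray(q, dtype=float)
    m = z.max()
    return (m + np.log(np.mean(np.exp(z - m)))) / omega

def max_entropy_mellowmax_policy(q, omega, tol=1e-10):
    """Boltzmann policy pi ∝ exp(beta * q) whose expected value equals mm_omega(q).

    beta is the root of g(beta) = sum_i exp(beta * a_i) * a_i with a_i = q_i - mm.
    g is increasing and g(0) <= 0 (mm is at least the mean), so bisection on [0, hi] finds it.
    """
    q = np.asarray(q, dtype=float)
    if q.max() - q.min() < tol:  # all values equal: every beta works, return uniform
        return np.full(q.shape, 1.0 / q.size)
    a = q - mellowmax(q, omega)

    def g_rescaled(beta):
        # multiplying g by exp(-beta * a.max()) > 0 keeps its sign and avoids overflow
        return np.sum(np.exp(beta * (a - a.max())) * a)

    lo, hi = 0.0, 1.0
    for _ in range(1000):  # expand the bracket until g(hi) >= 0
        if g_rescaled(hi) >= 0:
            break
        lo, hi = hi, 2.0 * hi
    for _ in range(200):  # bisection on the sign of g
        mid = 0.5 * (lo + hi)
        if g_rescaled(mid) < 0:
            lo = mid
        else:
            hi = mid
    logits = 0.5 * (lo + hi) * q
    p = np.exp(logits - logits.max())
    return p / p.sum()

q = np.array([1.0, 2.0, 3.0])
pi = max_entropy_mellowmax_policy(q, omega=5.0)
print(np.isclose(pi @ q, mellowmax(q, 5.0)))  # True
```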
Natural image processing and synthesis using deep learning
In the present thesis, we study how deep neural networks can be applied to various tasks in computer vision. Computer vision is an interdisciplinary field that deals with the understanding of digital images and video. Traditionally, problems in this domain were tackled using heavily hand-engineered ad hoc methods. Until recently, a typical computer vision system consisted of a sequence of independent modules that barely communicated with each other. Such an approach is quite reasonable when data are limited, as it takes full advantage of the researcher's domain expertise. This strength turns into a weakness if some input scenarios are overlooked in the algorithm design process.
With rapidly increasing volumes and varieties of data and the advent of cheaper and faster computational resources, end-to-end deep neural networks have become an appealing alternative to traditional computer vision pipelines. We demonstrate this in a series of research articles, each of which considers a particular task of either image analysis or synthesis and presents a solution based on a "deep" backbone.
In the first article, we deal with the classic low-level vision problem of edge detection. Inspired by a top-performing non-neural approach, we take a step towards building an end-to-end system by combining feature extraction and description in a single convolutional network. The resulting fully data-driven method matches or surpasses the detection quality of existing conventional approaches in the settings for which they were designed, while being significantly more usable in out-of-domain situations.
In our second article, we introduce a custom architecture for image manipulation based on the idea that most of the pixels in the output image can be copied directly from the input. This technique has several significant advantages over the naive black-box neural approach: it retains the level of detail of the original images, does not introduce artifacts due to insufficient capacity of the underlying neural network, and simplifies the training process, to name a few. We demonstrate the effectiveness of the proposed architecture on the challenging gaze correction task, where our system achieves excellent results.
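One way to picture "copying pixels from the input" is warping: each output pixel samples the input at a location shifted by a per-pixel displacement. This is a minimal sketch under stated assumptions, not the thesis architecture. In a learned system a network would predict the displacement field, while here it is passed in directly. The grayscale input, the bilinear sampling with border clamping and the function name are all hypothetical choices.

```python
import numpy as np

def warp_bilinear(image, flow):
    """Build the output by sampling input pixels at displaced locations.

    image: (H, W) grayscale input.
    flow: (H, W, 2) per-pixel displacement (dy, dx), supplied directly here
          (hypothetical; a network would predict it in a learned system).
    output[y, x] = image sampled bilinearly at (y + dy, x + dx), clamped to the border.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sy = np.clip(ys + flow[..., 0], 0, h - 1)
    sx = np.clip(xs + flow[..., 1], 0, w - 1)
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = sy - y0, sx - x0  # fractional offsets used as interpolation weights
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

img = np.arange(12, dtype=float).reshape(3, 4)
flow = np.zeros((3, 4, 2))
flow[..., 1] = 1.0  # every pixel copies its right-hand neighbour
print(warp_bilinear(img, flow)[0])  # [1. 2. 3. 3.]
```

Because output values are drawn from the input rather than synthesized, integer displacements reproduce input pixels exactly. This is the intuition behind the detail-preservation advantage described above.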
In the third article, we diverge slightly from pure computer vision and study the more general problem of domain adaptation. There, we introduce a novel training-time algorithm (i.e., adaptation is attained by using an auxiliary objective in addition to the main one). We seek to extract features that maximally confuse a dedicated network, called the domain classifier, while remaining useful for the task at hand. The domain classifier is learned simultaneously with the features and attempts to tell whether those features come from the source or the target domain. The proposed technique is easy to implement, yet results in superior performance on all the standard benchmarks.
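The adversarial objective described above can be written as a saddle-point problem. The notation below is assumed for illustration and does not come from the abstract. G_f is the feature extractor (parameters θ_f), G_y the task head (θ_y) and G_d the domain classifier (θ_d). L_y and L_d are the task and domain losses, λ > 0 is a trade-off weight, the first sum runs over n_s labelled source samples, and the second runs over all n source and target samples with domain labels d_i ∈ {0, 1}.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \frac{1}{n_s} \sum_{i=1}^{n_s} \mathcal{L}_y\big(G_y(G_f(x_i)),\, y_i\big)
  - \lambda \, \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_d\big(G_d(G_f(x_i)),\, d_i\big),
\qquad
(\hat\theta_f, \hat\theta_y) = \arg\min_{\theta_f,\, \theta_y} E,
\quad
\hat\theta_d = \arg\max_{\theta_d} E .
```

Maximizing E over θ_d trains the domain classifier to tell the domains apart. Minimizing it over θ_f keeps the task loss low while pushing the features to increase the domain loss, which is the "maximally confuse" requirement. Both are optimized simultaneously, which is commonly implemented with a gradient reversal layer that flips the sign of the domain-loss gradient before it reaches the feature extractor.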
Finally, the fourth article presents a new kind of generative model for image data. Unlike conventional neural-network-based approaches, our system, dubbed SPIRAL, describes images in terms of concise low-level programs executed by off-the-shelf rendering software used by humans to create visual content. Among other things, this allows SPIRAL to avoid wasting its capacity on the minutiae of datasets and to focus more on global structure. The latent space of our model is interpretable by design and provides a means for predictable image manipulation. We test our approach on several popular datasets and demonstrate its power and flexibility.
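To make "images as programs executed by a renderer" concrete, here is a toy sketch. It is not SPIRAL's agent or its rendering software. A program is a list of drawing commands in a hypothetical ("line", y0, x0, y1, x1, intensity) format, and a tiny rasterizer stands in for real graphics software.

```python
import numpy as np

def render(program, size=8):
    """Execute a list of drawing commands on a blank square canvas.

    Hypothetical command format: ("line", y0, x0, y1, x1, intensity).
    A real system would hand such commands to external graphics software.
    """
    canvas = np.zeros((size, size))
    for op, y0, x0, y1, x1, value in program:
        if op != "line":
            raise ValueError(f"unknown command: {op}")
        steps = max(abs(y1 - y0), abs(x1 - x0)) + 1  # one sample per pixel along the longer axis
        ys = np.rint(np.linspace(y0, y1, steps)).astype(int)
        xs = np.rint(np.linspace(x0, x1, steps)).astype(int)
        canvas[ys, xs] = value  # later strokes overwrite earlier ones
    return canvas

canvas = render([("line", 0, 0, 0, 7, 1.0),   # top edge
                 ("line", 0, 0, 7, 7, 0.5)])  # main diagonal
print(canvas.sum())  # 11.0
```

Editing a single command, such as moving a stroke's endpoint, changes the image in a predictable way. This is the sense in which a program-based latent representation is interpretable.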