573 research outputs found
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Visio
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning
Learning visual representations with neural networks for video captioning and image generation
La recherche sur les reĢseaux de neurones a permis de reĢaliser de larges progreĢs durant la dernieĢre deĢcennie. Non seulement les reĢseaux de neurones ont eĢteĢ appliqueĢs avec succeĢs pour reĢsoudre des probleĢmes de plus en plus complexes; mais ils sont aussi devenus lāapproche dominante dans les domaines ouĢ ils ont eĢteĢ testeĢs tels que la compreĢhension du langage, les agents jouant aĢ des jeux de manieĢre automatique ou encore la vision par ordinateur, graĢce aĢ leurs capaciteĢs calculatoires et leurs efficaciteĢs statistiques.
La preĢsente theĢse eĢtudie les reĢseaux de neurones appliqueĢs aĢ des probleĢmes en vision par ordinateur, ouĢ les repreĢsentations seĢmantiques abstraites jouent un roĢle fondamental. Nous deĢmontrerons, aĢ la fois par la theĢorie et par lāexpeĢrimentation, la capaciteĢ des reĢseaux de neurones aĢ apprendre de telles repreĢsentations aĢ partir de donneĢes, avec ou sans supervision.
Le contenu de la theĢse est diviseĢ en deux parties. La premieĢre partie eĢtudie les reĢseaux de neurones appliqueĢs aĢ la description de videĢo en langage naturel, neĢcessitant lāapprentissage de repreĢsentation visuelle. Le premier modeĢle proposeĢ permet dāavoir une attention dynamique sur les diffeĢrentes trames de la videĢo lors de la geĢneĢration de la description textuelle pour de courtes videĢos. Ce modeĢle est ensuite ameĢlioreĢ par lāintroduction dāune opeĢration de convolution reĢcurrente. Par la suite, la dernieĢre section de cette partie identifie un probleĢme fondamental dans la description de videĢo en langage naturel et propose un nouveau type de meĢtrique dāeĢvaluation qui peut eĢtre utiliseĢ empiriquement comme un oracle afin dāanalyser les performances de modeĢles concernant cette taĢche.
La deuxieĢme partie se concentre sur lāapprentissage non-superviseĢ et eĢtudie une famille de modeĢles capables de geĢneĢrer des images. En particulier, lāaccent est mis sur les āNeural Autoregressive Density Estimators (NADEs), une famille de modeĢles probabilistes pour les images naturelles. Ce travail met tout dāabord en eĢvidence une connection entre les modeĢles NADEs et les reĢseaux stochastiques geĢneĢratifs (GSN). De plus, une ameĢlioration des modeĢles NADEs standards est proposeĢe. DeĢnommeĢs NADEs iteĢratifs, cette ameĢlioration introduit plusieurs iteĢrations lors de lāinfeĢrence du modeĢle NADEs tout en preĢservant son nombre de parameĢtres.
DeĢbutant par une revue chronologique, ce travail se termine par un reĢsumeĢ des reĢcents deĢveloppements en lien avec les contributions preĢsenteĢes dans les deux parties principales, concernant les probleĢmes dāapprentissage de repreĢsentation seĢmantiques pour les images et les videĢos. De prometteuses directions de recherche sont envisageĢes.The past decade has been marked as a golden era of neural network research. Not only have neural networks been successfully applied to solve more and more challenging real- world problems, but also they have become the dominant approach in many of the places where they have been tested. These places include, for instance, language understanding, game playing, and computer vision, thanks to neural networksā superiority in computational efficiency and statistical capacity. This thesis applies neural networks to problems in computer vision where high-level and semantically meaningful representations play a fundamental role. It demonstrates both in theory and in experiment the ability to learn such representations from data with and without supervision. The main content of the thesis is divided into two parts. The first part studies neural networks in the context of learning visual representations for the task of video captioning. Models are developed to dynamically focus on different frames while generating a natural language description of a short video. Such a model is further improved by recurrent convolutional operations. The end of this part identifies fundamental challenges in video captioning and proposes a new type of evaluation metric that may be used experimentally as an oracle to benchmark performance. The second part studies the family of models that generate images. While the first part is supervised, this part is unsupervised. The focus of it is the popular family of Neural Autoregressive Density Estimators (NADEs), a tractable probabilistic model for natural images. This work first makes a connection between NADEs and Generative Stochastic Networks (GSNs). The standard NADE is improved by introducing multiple iterations in its inference without increasing the number of parameters, which is dubbed iterative NADE. With a historical view at the beginning, this work ends with a summary of recent development for work discussed in the first two parts around the central topic of learning visual representations for images and videos. A bright future is envisioned at the end
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) is a novel class of deep generative
models which has recently gained significant attention. GANs learns complex and
high-dimensional distributions implicitly over images, audio, and data.
However, there exists major challenges in training of GANs, i.e., mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field.Comment: 42 pages, Figure 13, Table
Manifold Learning Approaches to Compressing Latent Spaces of Unsupervised Feature Hierarchies
Field robots encounter dynamic unstructured environments containing a vast array of unique objects. In order to make sense of the world in which they are placed, they collect large quantities of unlabelled data with a variety of sensors. Producing robust and reliable applications depends entirely on the ability of the robot to understand the unlabelled data it obtains. Deep Learning techniques have had a high level of success in learning powerful unsupervised representations for a variety of discriminative and generative models. Applying these techniques to problems encountered in field robotics remains a challenging endeavour. Modern Deep Learning methods are typically trained with a substantial labelled dataset, while datasets produced in a field robotics context contain limited labelled training data. The primary motivation for this thesis stems from the problem of applying large scale Deep Learning models to field robotics datasets that are label poor. While the lack of labelled ground truth data drives the desire for unsupervised methods, the need for improving the model scaling is driven by two factors, performance and computational requirements. When utilising unsupervised layer outputs as representations for classification, the classification performance increases with layer size. Scaling up models with multiple large layers of features is problematic, as the sizes of subsequent hidden layers scales with the size of the previous layer. This quadratic scaling, and the associated time required to train such networks has prevented adoption of large Deep Learning models beyond cluster computing. The contributions in this thesis are developed from the observation that parameters or filter el- ements learnt in Deep Learning systems are typically highly structured, and contain related ele- ments. Firstly, the structure of unsupervised filters is utilised to construct a mapping from the high dimensional filter space to a low dimensional manifold. This creates a significantly smaller repre- sentation for subsequent feature learning. This mapping, and its effect on the resulting encodings, highlights the need for the ability to learn highly overcomplete sets of convolutional features. Driven by this need, the unsupervised pretraining of Deep Convolutional Networks is developed to include a number of modern training and regularisation methods. These pretrained models are then used to provide initialisations for supervised convolutional models trained on low quantities of labelled data. By utilising pretraining, a significant increase in classification performance on a number of publicly available datasets is achieved. In order to apply these techniques to outdoor 3D Laser Illuminated Detection And Ranging data, we develop a set of resampling techniques to provide uniform input to Deep Learning models. The features learnt in these systems outperform the high effort hand engineered features developed specifically for 3D data. The representation of a given signal is then reinterpreted as a combination of modes that exist on the learnt low dimensional filter manifold. From this, we develop an encoding technique that allows the high dimensional layer output to be represented as a combination of low dimensional components. This allows the growth of subsequent layers to only be dependent on the intrinsic dimensionality of the filter manifold and not the number of elements contained in the previous layer. Finally, the resulting unsupervised convolutional model, the encoding frameworks and the em- bedding methodology are used to produce a new unsupervised learning stratergy that is able to encode images in terms of overcomplete filter spaces, without producing an explosion in the size of the intermediate parameter spaces. This model produces classification results on par with state of the art models, yet requires significantly less computational resources and is suitable for use in the constrained computation environment of a field robot
Adaptive Edge-guided Block-matching and 3D filtering (BM3D) Image Denoising Algorithm
Image denoising is a well studied field, yet reducing noise from images is still a valid challenge. Recently proposed Block-matching and 3D filtering (BM3D) is the current state of the art algorithm for denoising images corrupted by Additive White Gaussian noise (AWGN). Though BM3D outperforms all existing methods for AWGN denoising, still its performance decreases as the noise level increases in images, since it is harder to find proper match for reference blocks in the presence of highly corrupted pixel values. It also blurs sharp edges and textures. To overcome these problems we proposed an edge guided BM3D with selective pixel restoration. For higher noise levels it is possible to detect noisy pixels form its neighborhoods gray level statistics. We exploited this property to reduce noise as much as possible by applying a pre-filter. We also introduced an edge guided pixel restoration process in the hard-thresholding step of BM3D to restore the sharpness of edges and textures. Experimental results confirm that our proposed method is competitive and outperforms the state of the art BM3D in all considered subjective and objective quality measurements, particularly in preserving edges, textures and image contrast
- ā¦