Learning visual representations with neural networks for video captioning and image generation
The past decade has been a golden era of neural network research. Not only have neural networks been successfully applied to increasingly challenging real-world problems, but they have also become the dominant approach in many of the areas where they have been tested. These areas include language understanding, game playing, and computer vision, thanks to neural networks' superiority in computational efficiency and statistical capacity.
This thesis applies neural networks to problems in computer vision where high-level and semantically meaningful representations play a fundamental role. It demonstrates, both in theory and in experiment, the ability to learn such representations from data with and without supervision.
The main content of the thesis is divided into two parts. The first part studies neural networks in the context of learning visual representations for the task of video captioning. Models are developed to dynamically focus on different frames while generating a natural language description of a short video. Such a model is further improved by recurrent convolutional operations. The end of this part identifies fundamental challenges in video captioning and proposes a new type of evaluation metric that may be used experimentally as an oracle to benchmark performance.
The second part studies the family of models that generate images. While the first part is supervised, this part is unsupervised. Its focus is the popular family of Neural Autoregressive Density Estimators (NADEs), a tractable probabilistic model for natural images. This work first makes a connection between NADEs and Generative Stochastic Networks (GSNs). The standard NADE is then improved by introducing multiple iterations in its inference without increasing the number of parameters; the resulting model is dubbed iterative NADE.
With a historical view at the beginning, this work ends with a summary of recent developments related to the work discussed in the two main parts, around the central topic of learning visual representations for images and videos. A bright future is envisioned at the end.
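For readers unfamiliar with the NADE family mentioned in this abstract, the autoregressive factorization that such models rely on can be written as follows. The symbols (an image flattened into a vector x of D pixels, taken in some fixed ordering) are notation assumed here for illustration and do not appear in the abstract itself.

```latex
% Autoregressive factorization of the joint density over D pixels.
% \mathbf{x}_{<d} denotes the pixels that precede pixel d in the chosen ordering.
p(\mathbf{x}) \;=\; \prod_{d=1}^{D} p\left(x_d \mid \mathbf{x}_{<d}\right)
```

Each conditional is produced by a neural network, which is what makes the likelihood tractable: evaluating p(x) amounts to a product of D one-dimensional terms.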
Harnessing function from form: towards bio-inspired artificial intelligence in neuronal substrates
Despite the recent success of deep learning, the mammalian brain is still unrivaled when it comes
to interpreting complex, high-dimensional data streams like visual, auditory and somatosensory stimuli.
However, the underlying computational principles allowing the brain to deal with unreliable, high-dimensional
and often incomplete data while having a power consumption on the order of a few watts are still mostly
unknown.
In this work, we investigate how specific functionalities emerge from simple structures observed in the
mammalian cortex, and how these might be utilized in non-von Neumann devices like "neuromorphic
hardware". Firstly, we show that an ensemble of deterministic, spiking neural networks can be shaped by
a simple, local learning rule to perform sampling-based Bayesian inference. This suggests a coding scheme
where spikes (or "action potentials") represent samples of a posterior distribution, constrained by sensory
input, without the need for any source of stochasticity. Secondly, we introduce a top-down framework where
neuronal and synaptic dynamics are derived using a least action principle and gradient-based minimization.
Combined, neurosynaptic dynamics approximate real-time error backpropagation, mappable to mechanistic
components of cortical networks, whose dynamics can again be described within the proposed framework.
The presented models narrow the gap between well-defined, functional algorithms and their biophysical
implementation, improving our understanding of the computational principles the brain might employ.
Furthermore, such models are naturally translated to hardware mimicking the vastly parallel neural
structure of the brain, promising a strongly accelerated and energy-efficient implementation of powerful
learning and inference algorithms, which we demonstrate for the physical model system "BrainScaleS-1".
Tensor Regression
Regression analysis is a key area of interest in the field of data analysis
and machine learning which is devoted to exploring the dependencies between
variables, often using vectors. The emergence of high dimensional data in
technologies such as neuroimaging, computer vision, climatology and social
networks, has brought challenges to traditional data representation methods.
Tensors, as high dimensional extensions of vectors, are considered as natural
representations of high dimensional data. In this book, the authors provide a
systematic study and analysis of tensor-based regression models and their
applications in recent years. It groups and illustrates the existing
tensor-based regression methods and covers the basics, core ideas, and
theoretical characteristics of most tensor-based regression methods. In
addition, readers can learn how to use existing tensor-based regression methods
to solve specific regression tasks with multiway data, what datasets can be
selected, and what software packages are available to start related work as
soon as possible. Tensor Regression is the first thorough overview of the
fundamentals, motivations, popular algorithms, strategies for efficient
implementation, related applications, available datasets, and software
resources for tensor-based regression analysis. It is essential reading for all
students, researchers and practitioners working on high dimensional data. Comment: 187 pages, 32 figures, 10 tables
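As an illustration of what a tensor-based regression model can look like, the following is a minimal sketch, not a method taken from the book. It assumes matrix-valued (order-2 tensor) inputs, a rank-1 coefficient tensor W = a b^T, and a plain alternating least squares fit. The function names `fit_rank1_tensor_regression` and `predict` are hypothetical.

```python
import numpy as np

def fit_rank1_tensor_regression(X, y, n_iters=50, seed=0):
    """Fit y_i ~ <X_i, a b^T> by alternating least squares (illustrative sketch).

    X has shape (n_samples, p, q); returns factor vectors a (p,) and b (q,).
    """
    rng = np.random.default_rng(seed)
    n, p, q = X.shape
    a = rng.standard_normal(p)
    b = rng.standard_normal(q)
    for _ in range(n_iters):
        # With b fixed, the model is linear in a: y_i = (X_i b) . a
        Za = X @ b                                  # shape (n, p)
        a, *_ = np.linalg.lstsq(Za, y, rcond=None)
        # With a fixed, the model is linear in b: y_i = (X_i^T a) . b
        Zb = np.einsum("npq,p->nq", X, a)           # shape (n, q)
        b, *_ = np.linalg.lstsq(Zb, y, rcond=None)
    return a, b

def predict(X, a, b):
    """Evaluate the inner product <X_i, a b^T> for every sample."""
    return np.einsum("npq,p,q->n", X, a, b)
```

The low-rank constraint is the point of the design: a full coefficient tensor would need p*q parameters, whereas the rank-1 factorization needs only p + q. Note that a and b are only identified up to a reciprocal scaling, so comparisons should be made on the outer product a b^T.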
Temporal Segmentation of Human Motion for Rehabilitation
Current physiotherapy practice relies on visual observation of patient movement for assessment and diagnosis. Automation of motion monitoring has the potential to improve accuracy and reliability, and provide additional diagnostic insight to the clinician, improving treatment quality, and patient progress. To enable automated monitoring, assessment, and diagnosis, the movements of the patient must be temporally segmented from the continuous measurements. Temporal segmentation is the process of identifying the starting and ending locations of movement primitives in a time-series data sequence. Most segmentation algorithms require training data, but a priori knowledge of the patient's movement patterns may not be available, necessitating the use of healthy population data for training. However, healthy population movement data may not generalize well to rehabilitation patients due to large differences in motion characteristics between the two demographics. In this thesis, four key contributions will be elaborated to enable accurate segmentation of patient movement data during rehabilitation.
The first key contribution is the creation of a segmentation framework to categorize and compare different segmentation algorithms considering segment definitions, data sources, application specific requirements, algorithm mechanics, and validation techniques. This framework provides a structure for considering the factors that must be incorporated when constructing a segmentation and identification algorithm. The framework enables systematic comparison of different segmentation algorithms, provides the means to examine the impact of each algorithm component, and allows for a systematic approach to determine the best algorithm for a given situation.
The second key contribution is the development of an online and accurate motion segmentation algorithm based on a classification framework. The proposed algorithm transforms the segmentation task into a classification problem by modelling the segment edge point directly. Given this formulation, a variety of feature transformation, dimensionality reduction and classifier techniques were investigated on several healthy and patient datasets. With proper normalization, the segmentation algorithm can be trained using healthy participant data and obtain high quality segments on patient data. Inter-participant and inter-primitive variability were assessed on a dataset of 30 healthy participants and 44 rehabilitation participants, demonstrating the generalizability and utility of the proposed approach for rehabilitation settings. The proposed approach achieves a segmentation accuracy of 83-100%.
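The idea of recasting segmentation as classification of segment edge points can be sketched as follows. This is only an illustration under simple assumptions (a one-dimensional signal, a fixed symmetric window as the feature vector, and hand-annotated edge indices); it is not the thesis's actual pipeline, and the helper names `make_edge_dataset` and `edges_to_segments` are hypothetical.

```python
import numpy as np

def make_edge_dataset(signal, edge_indices, half_width=2):
    """Build (features, labels, times) for per-sample edge-point classification.

    Each time step t is described by the window
    signal[t - half_width : t + half_width + 1]; its label is 1 if t is an
    annotated segment edge and 0 otherwise. Time steps too close to either
    end of the signal to have a full window are skipped.
    """
    signal = np.asarray(signal, dtype=float)
    edges = set(edge_indices)
    feats, labels, times = [], [], []
    for t in range(half_width, len(signal) - half_width):
        feats.append(signal[t - half_width : t + half_width + 1])
        labels.append(1 if t in edges else 0)
        times.append(t)
    return np.array(feats), np.array(labels), np.array(times)

def edges_to_segments(edge_times):
    """Pair consecutive (sorted) edge points into (start, end) segments."""
    edge_times = sorted(edge_times)
    return list(zip(edge_times[:-1], edge_times[1:]))
```

Any binary classifier can then be trained on the resulting features and labels. At run time, the time steps it labels as edges are paired up by `edges_to_segments` to recover the movement segments.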
The third key contribution is the investigation of feature set generalizability of the proposed method. Nearly all segmentation techniques developed previously use a single sensor modality. The proposed method was applied to joint angles, electromyogram, motion capture, and force plate data to investigate how the choice of modality impacts segmentation performance. With proper normalization, the proposed method was shown to work with various input sensor types and achieved high accuracy on all sensor modalities examined. The proposed approach achieves a segmentation accuracy of 72-97%.
The fourth key contribution is the development of a new feature set based on hypotheses about the optimality of human motion trajectory generation. A common hypothesis in human motor control is that human movement is generated by optimizing with respect to a certain criterion and is task dependent. In this thesis, a method to segment human movement by detecting changes to the optimization criterion being used via inverse trajectory optimization is proposed. The control strategy employed by the motor system is hypothesized to be a weighted sum of basis cost functions, with the basis weights changing with changes to the motion objective(s). Continuous time series data of movement is processed using a sliding fixed width window, estimating the basis weights of each cost function for each window by minimizing the residual of the Karush-Kuhn-Tucker optimality conditions. The quality of the cost function recovery is verified by evaluating the residual. The successfully estimated basis weights are averaged together to create a set of time varying basis weights that describe the changing control strategy of the motion and can be used to segment the movement with simple thresholds. The proposed algorithm is first demonstrated on simulation data and then demonstrated on a dataset of human subjects performing a series of exercise tasks. The proposed approach achieves a segmentation accuracy of 74-88%.
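The weighted-sum hypothesis and the per-window weight estimation described above can be sketched in equations. The notation is assumed here for illustration: q is the trajectory within one window, \phi_k are the K basis cost functions, and w_k are their weights. For brevity the sketch covers only the unconstrained case, where the Karush-Kuhn-Tucker conditions reduce to stationarity. The normalization of the weights is likewise an assumption, added to exclude the trivial solution w = 0.

```latex
% Hypothesized control objective: a weighted sum of basis cost functions.
J(\mathbf{q}; \mathbf{w}) = \sum_{k=1}^{K} w_k \, \phi_k(\mathbf{q}), \qquad w_k \ge 0

% Per-window weight estimate: the weights that make the observed trajectory
% \mathbf{q}^{\text{obs}} as close to stationary as possible.
\hat{\mathbf{w}} = \arg\min_{\mathbf{w} \ge 0,\; \sum_k w_k = 1}
  \left\| \sum_{k=1}^{K} w_k \, \nabla_{\mathbf{q}} \phi_k\left(\mathbf{q}^{\text{obs}}\right) \right\|^2
```

The value of the minimized norm is the residual used to judge recovery quality: a small residual suggests that the observed window is well explained by some combination of the basis costs.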