Unsupervised Behaviour Analysis and Magnification (uBAM) using Deep Learning
Motor behaviour analysis is essential to biomedical research and clinical
diagnostics as it provides a non-invasive strategy for identifying motor
impairment and its change caused by interventions. State-of-the-art
instrumented movement analysis is time- and cost-intensive, since it requires
placing physical or virtual markers. Besides the effort required for marking
keypoints or annotations necessary for training or finetuning a detector, users
need to know the interesting behaviour beforehand to provide meaningful
keypoints. We introduce unsupervised behaviour analysis and magnification
(uBAM), an automatic deep learning algorithm for analysing behaviour by
discovering and magnifying deviations. A central aspect is unsupervised
learning of posture and behaviour representations to enable an objective
comparison of movement. Besides discovering and quantifying deviations in
behaviour, we also propose a generative model for visually magnifying subtle
behaviour differences directly in a video without requiring a detour via
keypoints or annotations. Essential for this magnification of deviations even
across different individuals is a disentangling of appearance and behaviour.
Evaluations on rodents and human patients with neurological diseases
demonstrate the wide applicability of our approach. Moreover, combining
optogenetic stimulation with our unsupervised behaviour analysis shows its
suitability as a non-invasive diagnostic tool correlating function to brain
plasticity.
Comment: Published in Nature Machine Intelligence (2021), https://rdcu.be/ch6p
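The magnification idea described above, namely amplifying the deviation between an impaired and a reference behaviour representation in a learned latent space before decoding back to video, can be sketched as follows. The encoder stand-in and the magnification factor here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frames):
    # Stand-in for a learned posture/behaviour encoder (assumption:
    # any network mapping a clip to a latent vector would serve here).
    return frames.mean(axis=0)

def magnify(z_impaired, z_reference, factor=2.0):
    # Amplify the deviation of the impaired encoding from the healthy
    # reference by extrapolating along their difference vector.
    return z_reference + factor * (z_impaired - z_reference)

z_ref = encode(rng.normal(size=(16, 8)))   # healthy reference clip
z_imp = encode(rng.normal(size=(16, 8)))   # impaired clip
z_mag = magnify(z_imp, z_ref, factor=2.0)

# The magnified latent deviates twice as far from the reference,
# which a decoder would render as an exaggerated movement difference.
assert np.allclose(z_mag - z_ref, 2.0 * (z_imp - z_ref))
```

In uBAM the disentangling of appearance and behaviour is what makes this extrapolation meaningful across individuals; the sketch only shows the latent-space arithmetic.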
A Methodology for Extracting Human Bodies from Still Images
Monitoring and surveillance of humans is one of the most prominent applications today, and it is expected to be part of many future aspects of our lives, for safety reasons, assisted living, and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject, and propose a maturity metric to evaluate them.
One of the most popular classes of image-processing algorithms found in the field is image segmentation, and we propose a blind metric to evaluate segmentation results with respect to the activity at local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
Visual Representation Learning with Minimal Supervision
Computer vision intends to provide computers with the human abilities of understanding and interpreting the visual surroundings. An essential element in comprehending the environment is to extract relevant information from complex visual data so that the desired task can be solved. For instance, to distinguish cats from dogs the feature 'body shape' is more relevant than 'eye color' or the 'number of legs'. In traditional computer vision it is conventional to develop handcrafted functions that extract specific low-level features, such as edges, from visual data. However, in order to solve a particular task satisfactorily we require a combination of several features. Thus, traditional computer vision has the disadvantage that whenever a new task is addressed, a developer needs to manually specify all the features the computer should look for. For that reason, recent works have primarily focused on developing new algorithms that teach the computer to autonomously detect relevant and task-specific features. Deep learning has been particularly successful in this respect. In deep learning, artificial neural networks automatically learn to extract informative features directly from visual data. The majority of deep learning strategies require a dataset with annotations that indicate the solution of the desired task. The main bottleneck is that creating such a dataset is very tedious and time-intensive, considering that every sample needs to be annotated manually. This thesis presents new techniques that attempt to keep the amount of human supervision to a minimum while still reaching satisfactory performance on various visual understanding tasks.
In particular, this thesis focuses on self-supervised learning algorithms that train a neural network on a surrogate task where no human supervision is required. We create an artificial supervisory signal by breaking the order of visual patterns and asking the network to recover the original structure. Besides demonstrating the abilities of our model on common computer vision tasks such as action recognition, we additionally apply our model to biomedical scenarios. Many research projects in medicine involve profuse manual processes that extend the duration of developing successful treatments. Taking the example of analyzing the motor function of neurologically impaired patients, we show that our self-supervised method can help to automate tedious, visually based processes in medical research. In order to perform a detailed analysis of motor behavior and thus provide a suitable treatment, it is important to discover and identify the negatively affected movements. Therefore, we propose a magnification tool that can detect and enhance subtle changes in motor function, including motor behavior differences across individuals. In this way, our automatic diagnostic system does not only analyze apparent behavior but also facilitates the perception and discovery of impaired movements.
Learning a feature representation without requiring annotations significantly reduces human supervision. However, using annotated datasets generally leads to better performance than self-supervised learning methods. Hence, we additionally examine semi-supervised approaches, which efficiently combine a few annotated samples with large unlabeled datasets. Consequently, semi-supervised learning represents a good trade-off between annotation time and accuracy.
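The surrogate task of breaking the order of visual patterns and recovering the original structure can be illustrated with a tiny patch-permutation sketch. The 3x3 grid and the fixed permutation set are illustrative choices, not the exact setup used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small fixed set of patch orderings; the network's job would be to
# predict which ordering was applied -- a free supervisory signal.
PERMUTATIONS = [tuple(rng.permutation(9)) for _ in range(10)]

def make_training_pair(image):
    """Split a square image into a 3x3 grid of patches, shuffle them
    with a randomly chosen permutation, and return (patches, label)."""
    h = image.shape[0] // 3
    patches = [image[i*h:(i+1)*h, j*h:(j+1)*h]
               for i in range(3) for j in range(3)]
    label = int(rng.integers(len(PERMUTATIONS)))
    shuffled = [patches[p] for p in PERMUTATIONS[label]]
    # Predicting `label` forces a model to reason about spatial
    # structure, without any human-provided annotation.
    return np.stack(shuffled), label

x, y = make_training_pair(np.arange(81.0).reshape(9, 9))
assert x.shape == (9, 3, 3) and 0 <= y < len(PERMUTATIONS)
```

A network trained on many such pairs learns features that transfer to downstream tasks like action recognition, which is the transfer effect the thesis exploits.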
Radar and RGB-depth sensors for fall detection: a review
This paper reviews recent works in the literature on the use of systems based on radar and RGB-Depth (RGB-D) sensors for fall detection, and discusses outstanding research challenges and trends related to this research field. Systems that reliably detect fall events and promptly alert carers and first responders have gained significant interest in the past few years, in order to address the societal issue of an increasing number of elderly people living alone, with the associated risk of them falling and the consequences in terms of health treatments, reduced well-being, and costs. The interest in radar and RGB-D sensors is related to their capability to enable contactless and non-intrusive monitoring, which is an advantage for practical deployment and users’ acceptance and compliance, compared with other sensor technologies, such as video-cameras or wearables. Furthermore, the possibility of combining and fusing information from these heterogeneous types of sensors is expected to improve the overall performance of practical fall detection systems. Researchers from different fields can benefit from the multidisciplinary knowledge and awareness of the latest developments in radar and RGB-D sensors that this paper discusses.
Biceps brachii synergy and its contribution to target reaching tasks within a virtual cube
In recent years, important work has been done in the development of prosthetic control to help upper limb amputees improve their quality of life on a daily basis. Some modern commercially available upper limb myoelectric prostheses have many degrees of freedom and require many control signals to perform several tasks commonly used in everyday life. To obtain several control signals, many muscles are required, but for people with upper limb amputation, the number of muscles available is more or less reduced, depending on the level of amputation. To increase the number of control signals, we were interested in the biceps brachii, since it is anatomically composed of two heads and the presence of compartments has been observed on its internal face. Physiologically, it has been found that the motor units of the biceps are activated at different places in the muscle during the production of various functional tasks. In addition, it appears that the central nervous system can use muscle synergy to easily produce multiple movements. In this research, muscle synergy was first identified to be present in the biceps of normal subjects, and it was shown that the characteristics of this synergy allowed the identification of the static posture of the hand when the biceps signals had been recorded. In a second investigation, we demonstrated that it was possible, in a virtual cube presented on a screen, to control the position of a sphere online in order to reach various targets by using the muscle synergy of the biceps. Classification techniques were used to improve the classification of muscle synergy features, and these techniques can be integrated with a control algorithm that produces dynamic movement of myoelectric prostheses to facilitate the training of prosthetic control.
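Muscle synergies of the kind discussed above are commonly extracted with non-negative matrix factorization (NMF), which factors an EMG envelope matrix into non-negative synergy vectors and activation profiles. NMF is an assumption here, a standard choice in the synergy literature rather than the method the thesis necessarily used:

```python
import numpy as np

rng = np.random.default_rng(1)

def extract_synergies(emg, n_synergies, n_iter=500):
    """Factor an EMG envelope matrix (channels x samples) into
    non-negative synergies W and activations H, so that EMG ~ W @ H."""
    n_ch, n_t = emg.shape
    W = rng.random((n_ch, n_synergies)) + 1e-6
    H = rng.random((n_synergies, n_t)) + 1e-6
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates keep W and H non-negative.
        H *= (W.T @ emg) / (W.T @ W @ H + 1e-9)
        W *= (emg @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Synthetic 4-channel envelope built from 2 ground-truth synergies.
true_W = rng.random((4, 2))
true_H = rng.random((2, 50))
emg = true_W @ true_H
W, H = extract_synergies(emg, n_synergies=2)
assert np.all(W >= 0) and np.all(H >= 0)
assert np.linalg.norm(emg - W @ H) / np.linalg.norm(emg) < 0.25
```

The activation profiles in H (or features derived from them) are what a classifier would then map to hand postures or cursor commands.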
Visual Similarity Using Limited Supervision
The visual world is a conglomeration of objects, scenes, motion, and much more. As humans, we look at the world through our eyes, but we understand it by using our brains. From a young age, humans learn to recognize objects by association, meaning that we link an object or action to the most similar one in our memory to make sense of it.
Within the field of Artificial Intelligence, Computer Vision gives machines the ability to see. While digital cameras provide eyes to the machine, Computer Vision develops its brain. To that purpose, Deep Learning has emerged as a very successful tool. This method allows machines to learn solutions to problems directly from the data. On the basis of Deep Learning, computers nowadays can also learn to interpret the visual world. However, the process of learning in machines is very different from ours. In Deep Learning, images and videos are grouped into predefined, artificial categories. However, describing a group of objects, or actions, with a single integer (category) disregards most of its characteristics and pair-wise relationships. To circumvent this, we propose to expand the categorical model by using visual similarity which better mirrors the human approach.
Deep Learning requires a large set of manually annotated samples, which form the training set. Retrieving raw training samples is easy given the endless amount of images and videos available on the internet. However, these samples also require manual annotations, which are very costly and laborious to obtain and thus a major bottleneck in modern computer vision.
In this thesis, we investigate visual similarity methods to solve image and video classification. In particular, we search for a solution where human supervision is marginal. We focus on Zero-Shot Learning (ZSL), where only a subset of categories is manually annotated. After studying existing methods in the field, we identify common limitations and propose methods to tackle them. In particular, ZSL image classification is trained using only discriminative supervision, i.e. predefined categories, while ignoring other descriptive characteristics. To tackle this, we propose a new approach to learn shared features, i.e. non-discriminative, thus descriptive, characteristics, which improves existing methods by a large margin. However, while ZSL has shown great potential for the task of image classification, for example in the case of face recognition, it has performed poorly for video classification. We identify the reasons for the lack of growth in the field and provide a new, powerful baseline.
Unfortunately, even if ZSL requires only partially labeled data, it still needs supervision during training. For that reason, we also investigate purely unsupervised methods. A successful paradigm is self-supervision: the model is trained using a surrogate task where supervision is automatically provided. The key to self-supervision is the ability of deep learning to transfer the knowledge learned from one task to a new task. The more similar the two tasks are, the more effective the transfer is. Similar to our work on ZSL, we also studied the common limitations of existing self-supervision approaches and proposed a method to overcome them. To improve self-supervised learning, we propose a policy network which controls the parameters of the surrogate task and is trained through reinforcement learning.
Finally, we present a real-life application where utilizing visual similarity with limited supervision provides a better solution compared to existing parametric approaches. We analyze the behavior of motor-impaired rodents during a single repeating action, for which our method provides an objective similarity of behavior, facilitating comparisons across animal subjects and across time during recovery.
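The similarity-based view of classification advocated above, labeling a sample by its nearest neighbors in an embedding space rather than by a categorical head, can be shown with a minimal cosine-similarity sketch. The toy embeddings and the choice of k are illustrative; the thesis's learned similarity models are far richer:

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine_similarity(a, b):
    # Pairwise cosine similarity between two sets of row embeddings.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def classify_by_similarity(query, gallery, labels, k=3):
    """Label a query by majority vote over its k most similar gallery
    embeddings, instead of through a learned categorical classifier."""
    sims = cosine_similarity(query[None, :], gallery)[0]
    top = np.argsort(sims)[-k:]
    return int(np.bincount(labels[top]).argmax())

# Toy embeddings: two well-separated clusters standing in for classes.
gallery = np.vstack([rng.normal(0, 0.1, (10, 5)) + 1.0,
                     rng.normal(0, 0.1, (10, 5)) - 1.0])
labels = np.array([0] * 10 + [1] * 10)
query = np.full(5, 1.0) + rng.normal(0, 0.1, 5)
assert classify_by_similarity(query, gallery, labels) == 0
```

Because the decision is made by comparison rather than by a fixed output layer, new categories can be added to the gallery without retraining, which is the property ZSL-style methods build on.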
Human extremity detection and its applications in action detection and recognition
It is proven that the locations of internal body joints are sufficient visual cues to characterize human motion. In this dissertation I propose that the locations of human extremities, including the head, hands and feet, provide a powerful approximation to internal body motion. I propose detection of precise extremities from contours obtained from image segmentation or contour tracking. Junctions of the medial axis of contours are selected as stars. Contour points with a local maximum distance to various stars are chosen as candidate extremities. All the candidates are filtered by cues including proximity to other candidates, visibility to stars and robustness to noise smoothing parameters. I present applications of precise extremities for fast human action detection and recognition. Environment-specific features are built from precise extremities and fed into a block-based Hidden Markov Model to decode the fence-climbing action from continuous videos. Precise extremities are grouped into stable contacts if the same extremity does not move for a certain duration. Such stable contacts are utilized to decompose a long continuous video into shorter pieces. Each piece is associated with certain motion features to form primitive motion units. In this way the sequence is abstracted into more meaningful segments, and a searching strategy is used to detect the fence-climbing action. Moreover, I propose the histogram of extremities as a general posture descriptor. It is tested in a Hidden Markov Model based framework for action recognition. I further propose detection of probable extremities from raw images without any segmentation. Modeling the extremity as an image patch instead of a single point on the contour helps overcome the segmentation difficulty and increases the detection robustness. I represent the extremity patches with Histograms of Oriented Gradients. The detection is achieved by window-based image scanning.
In order to reduce the computation load, I adopt the integral histograms technique without sacrificing accuracy. The result is a probability map where each pixel denotes the probability of the patch forming the specific class of extremities. With a probable extremity map, I propose the histogram of probable extremities as another general posture descriptor. It is tested on several data sets and the results are compared with those of precise extremities to show the superiority of probable extremities.
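The integral histograms technique mentioned above precomputes one cumulative image per histogram bin, so the histogram of any rectangular scanning window costs four lookups per bin regardless of window size. The following sketch uses a toy quantized-orientation map; the HOG-specific details (cells, block normalization) are omitted:

```python
import numpy as np

def integral_histogram(bin_map, n_bins):
    """Per-bin integral images: ih[b] is the 2-D cumulative sum of the
    indicator map for bin b, padded with a zero row and column."""
    h, w = bin_map.shape
    ih = np.zeros((n_bins, h + 1, w + 1))
    for b in range(n_bins):
        ih[b, 1:, 1:] = np.cumsum(np.cumsum(bin_map == b, axis=0), axis=1)
    return ih

def window_histogram(ih, top, left, bottom, right):
    # Four-corner lookup per bin, independent of the window area.
    return (ih[:, bottom, right] - ih[:, top, right]
            - ih[:, bottom, left] + ih[:, top, left])

# Toy 6x6 map of gradient orientations quantized into 4 bins.
rng = np.random.default_rng(3)
bins = rng.integers(0, 4, (6, 6))
ih = integral_histogram(bins, 4)
hist = window_histogram(ih, 1, 1, 5, 5)   # histogram of bins[1:5, 1:5]
brute = np.bincount(bins[1:5, 1:5].ravel(), minlength=4)
assert np.array_equal(hist.astype(int), brute)
```

With the integral images built once per frame, dense window-based scanning no longer pays a per-window cost proportional to the window area, which is the speed-up the dissertation relies on.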