
    Multi-Label/Multi-Class Deep Learning Classification of Spatiotemporal Data

    Human senses allow for the detection of simultaneous changes in our environment. An unobstructed field of view allows us to notice concurrent variations in different parts of what we are looking at. For example, when playing a video game, a player oftentimes needs to be aware of what is happening in the entire scene. Likewise, our hearing makes us aware of various simultaneous sounds occurring around us. Human perception can be affected by the cognitive ability of the brain and the acuity of the senses. This is not a factor with machines: as long as a system is given a signal and instructed how to analyze it and extract useful information, it will be able to complete this task repeatedly, given enough processing power. Automated and simultaneous detection of activity in machine learning requires the use of multi-labels. In order to detect concurrent occurrences spatially, the labels should represent the regions of interest for a particular application. For example, in this thesis, the regions of interest are either different quadrants of a parking lot as captured on surveillance videos, four auscultation sites on patients' lungs, or the two sides of the brain's motor cortex (left and right). Since the labels within the multi-labels represent not only certain spatial locations but also different levels or types of occurrences, a multi-class/multi-level schema is necessary. In the first study, each label is appointed one of three levels of activity within the specific quadrant. In the second study, each label is assigned one of four different types of respiratory sounds. In the third study, each label is designated one of three different finger tapping frequencies. This novel multi-label/multi-class schema is one part of being able to detect useful information in the data. The other part of the process lies in the machine learning algorithm, the network model. In order to capture the spatiotemporal characteristics of the data, selecting Convolutional Neural Network and Long Short-Term Memory Network-based algorithms as the basis of the network is fitting. The following classifications are described in this thesis: 1. In the first study, one of three different motion densities is identified simultaneously in each of four quadrants of two sets of surveillance videos. Publicly available video recordings are the spatiotemporal data. 2. In the second study, one of four types of breathing sounds is classified simultaneously at each of four auscultation sites. The spatiotemporal data are publicly available respiratory sound recordings. 3. In the third study, one of three finger tapping rates is detected simultaneously in two regions of interest, the right and left sides of the brain's motor cortex. The spatiotemporal data are fNIRS channel readings gathered during an index finger tapping experiment. Classification results are based on testing data that is not part of model training and validation. The success of the results is measured with Hamming Loss and Subset Accuracy as well as Accuracy, F-Score, Sensitivity, and Specificity metrics. In the last study, model explanation is performed using Shapley Additive Explanation (SHAP) values and plotting them on an image-like background, a representation of the fNIRS channel layout used as data input.
Overall, promising findings support the use of this approach in classifying spatiotemporal data with the aim of detecting different levels or types of occurrences simultaneously in several regions of interest.
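    Since Hamming Loss and Subset Accuracy are the two set-based measures named above, a minimal sketch may help contrast them; the arrays below are illustrative placeholders, not data from the thesis:

        import numpy as np

        # Three samples, three regions of interest; each region is assigned
        # an integer class (e.g., an activity level), giving a multi-label,
        # multi-class target per sample.
        y_true = np.array([[0, 2, 1],
                           [1, 1, 0],
                           [2, 0, 2]])
        y_pred = np.array([[0, 2, 2],
                           [1, 1, 0],
                           [2, 1, 2]])

        # Hamming Loss: fraction of individual labels predicted incorrectly.
        hamming_loss = np.mean(y_true != y_pred)

        # Subset Accuracy: fraction of samples whose entire label vector
        # matches exactly.
        subset_accuracy = np.mean(np.all(y_true == y_pred, axis=1))

        print(f"Hamming loss:    {hamming_loss:.3f}")     # 2 of 9 labels wrong -> 0.222
        print(f"Subset accuracy: {subset_accuracy:.3f}")  # 1 of 3 samples exact -> 0.333

    Subset Accuracy is the stricter measure: a single wrong label costs the whole sample, while Hamming Loss penalizes each wrong label individually.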

    Multi-scale techniques for multi-dimensional data analysis

    Large datasets of geometric data of various natures are becoming more and more available as sensors become cheaper and more widely used. Due to both their size and their noisy nature, special techniques must be employed to deal with them correctly. In order to efficiently handle this amount of data and to tackle the technical challenges they pose, we propose techniques that analyze a scalar signal by means of its critical points (i.e. maxima and minima), ranking them on a scale of importance, by which we can extract the important information of the input signal and separate it from noise, thus dramatically reducing the complexity of the problem. In order to obtain a ranking of critical points we employ multi-scale techniques. The standard scale-space approach, however, is not sufficient when trying to track critical points across various scales. We start from an implementation of the scale-space which computes a linear interpolation between scales in order to make tracking of critical points easier. The linear interpolation of a process which is not itself linear, though, does not fulfill some theoretical properties of scale-space, thus making the tracking of critical points much harder. We propose an extension of this piecewise-linear scale-space implementation, which recovers the theoretical properties (e.g., avoiding the generation of new critical points as the scale increases) and keeps the tracking consistent. Next we combine the scale-space with another technique that comes from topology: the classification of critical points based on their persistence value. While the scale-space applies a filtering in the frequency domain, by progressively smoothing the input signal with low-pass filters of increasing size, the computation of the persistence can be seen as a filtering applied in the amplitude domain, which progressively removes pairs of critical points based on their difference in amplitude. The two techniques, while both relevant to the concept of scale, express different qualities of the critical points of the input signal; depending on the application domain we can use either of them, or, since they both have non-zero values only at critical points, they can be used together via a linear combination. The thesis will be structured as follows: In Chapter 1 we will present an overview of the problem of analyzing huge geometric datasets, focusing on the problem of dealing with their size and noise, and of reducing the problem to a subset of relevant samples. Chapter 2 will contain a study of the state of the art in scale-space algorithms, followed by a more in-depth analysis of the virtually continuous framework used as the base technique. In its last part, we will propose methods to extend these techniques in order to satisfy the axioms present in the continuous version of the scale-space and to achieve stronger and more reliable tracking of critical points across scales; we will also present the extraction of the persistence of critical points of a signal as a variant of the standard scale-space approach, show the differences between the two, and discuss how to combine them. Chapter 3 will introduce an ever-growing source of data, motion capture systems; we will motivate their importance by discussing the many applications in which they have been used over the past two decades. We will briefly summarize the different existing systems and then focus on a particular one, discussing its peculiarities and its output data.
In Chapter 4, we will discuss the problem of studying intra-personal synchronization computed on data coming from such motion-capture systems. We will show how multi-scale approaches can be used to identify relevant instants in the motion and how these instants can be used to precisely study synchronization between the different parts of the body from which they are extracted. We will apply these techniques to the problem of generating a classifier to discriminate between martial artists of different skill levels who have been recorded performing karate movements. Chapter 5 will present a work on the automatic detection of relevant points of the human face from 3D data. We will show that the Gaussian curvature of the 3D surface is a good feature for distinguishing the so-called fiducial points, but also that multi-scale techniques must be used to extract only the relevant points and get rid of the noise. In closing, Chapter 6 will discuss ongoing work on motion segmentation; after an introduction to the meaning and different possibilities of motion segmentation, we will present the data we work with, the approach used to identify segments, and some preliminary tools and results.
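    As a toy illustration of the scale-space filtering described above (a sketch assuming a 1-D signal and plain Gaussian smoothing; the piecewise-linear implementation and the persistence computation in the thesis are more involved):

        import numpy as np
        from scipy.ndimage import gaussian_filter1d

        def local_extrema(signal):
            # Indices of strict local maxima and minima of a 1-D signal.
            left, mid, right = signal[:-2], signal[1:-1], signal[2:]
            maxima = np.where((mid > left) & (mid > right))[0] + 1
            minima = np.where((mid < left) & (mid < right))[0] + 1
            return maxima, minima

        # Noisy test signal: slow oscillations plus high-frequency noise.
        rng = np.random.default_rng(0)
        x = np.linspace(0, 4 * np.pi, 500)
        signal = np.sin(x) + 0.5 * np.sin(3 * x) + 0.2 * rng.standard_normal(x.size)

        # Progressive low-pass filtering: as sigma grows, critical points
        # caused by noise vanish, while those surviving to coarser scales
        # are ranked as more important.
        for sigma in [1.0, 4.0, 16.0]:
            smoothed = gaussian_filter1d(signal, sigma)
            maxima, minima = local_extrema(smoothed)
            print(f"sigma={sigma:5.1f}: {len(maxima)} maxima, {len(minima)} minima")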

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty of extracting behavioral cues such as target locations, speaking activity, and head/body pose amid crowdedness and extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations, and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth, and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

    Deep Multi Temporal Scale Networks for Human Motion Analysis

    The movement of human beings appears to respond to a complex motor system that contains signals at different hierarchical levels. For example, an action such as "grasping a glass on a table" represents a high-level action, but to perform this task, the body needs several motor inputs that include the activation of different joints of the body (shoulder, arm, hand, fingers, etc.). Each of these different joints/muscles has a different size, responsiveness, and precision, with a complex, non-linearly stratified temporal dimension where every muscle has its own temporal scale. Parts such as the fingers respond much faster to brain input than more voluminous body parts such as the shoulder. The cooperation among these parts when we perform an action produces smooth, effective, and expressive movement in a complex, multiple-temporal-scale cognitive task. Following this layered structure, the human body can be described as a kinematic tree consisting of connected joints. Although it is nowadays well known that human movement and its perception are characterised by multiple temporal scales, very few works in the literature are focused on studying this particular property. In this thesis, we will focus on the analysis of human movement using data-driven techniques. In particular, we will focus on the non-verbal aspects of human movement, with an emphasis on full-body movements. Data-driven methods can interpret the information in the data by searching for rules, associations or patterns that can represent the relationships between input (e.g. the human action acquired with sensors) and output (e.g. the type of action performed). Furthermore, these models may represent a new research frontier as they can analyse large masses of data and focus on aspects that even an expert user might miss. The literature on data-driven models proposes two families of methods that can process time series and human movement. The first family, called shallow models, extracts features from the time series that can help the learning algorithm find associations in the data. These features are identified and designed by domain experts who can identify the best ones for the problem faced. The second family, by contrast, avoids this expert-driven extraction phase, since the models themselves can identify the best set of features to optimise the learning of the model. In this thesis, we will provide a method that can apply the multi-temporal-scale property of the human motion domain to deep learning models, the only data-driven models that can be extended to handle this property. We will ask ourselves two questions: what happens if we apply knowledge about how human movements are performed to deep learning models? Can this knowledge improve current automatic recognition standards? In order to prove the validity of our study, we collected data and tested our hypothesis in specially designed experiments. Results support both the proposal and the need for the use of deep multi-scale models as a tool to better understand human movement and its multiple time-scale nature.
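    One simple way to give a network an explicit multiple-temporal-scale structure is to run parallel convolution branches with different dilations over the motion sequence. The sketch below only illustrates that idea (layer sizes and dilations are assumptions, not the architecture proposed in the thesis):

        import torch
        import torch.nn as nn

        class MultiTemporalScaleBlock(nn.Module):
            """Parallel 1-D convolutions with different dilations, so each
            branch observes the motion sequence at a different temporal scale."""

            def __init__(self, in_channels, branch_channels, dilations=(1, 2, 4, 8)):
                super().__init__()
                self.branches = nn.ModuleList(
                    nn.Conv1d(in_channels, branch_channels, kernel_size=3,
                              dilation=d, padding=d)
                    for d in dilations
                )

            def forward(self, x):  # x: (batch, channels, time)
                # Concatenate the per-scale features along the channel axis.
                return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

        # Example: 25 joints x 3 coordinates flattened into 75 input channels.
        block = MultiTemporalScaleBlock(in_channels=75, branch_channels=32)
        motion = torch.randn(8, 75, 120)  # batch of 8 clips, 120 frames each
        print(block(motion).shape)        # torch.Size([8, 128, 120])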

    Reconstructing Human Motion

    This thesis presents methods for reconstructing human motion in a variety of applications and begins with an introduction to the general motion capture hardware and processing pipeline. Then, a data-driven method for the completion of corrupted marker-based motion capture data is presented. The approach is especially suitable for challenging cases, e.g., if complete marker sets of multiple body parts are missing over a long period of time. Using a large motion capture database, and without the need for extensive preprocessing, the method is able to fix missing markers across different actors and motion styles. The approach can be used with incrementally growing prior databases, as the underlying search technique for similar motions scales well to huge databases. The resulting clean motion database could then be used in the next application: a generic data-driven method for recognizing human full-body actions from live motion capture data originating from various sources. The method queries an annotated motion capture database for similar motion segments and is able to handle temporal deviations from the original motion. The approach is online-capable, works in real time, requires virtually no preprocessing, and is shown to work with a variety of feature sets extracted from input data, including positional data, sparse accelerometer signals, skeletons extracted from depth sensors, and even video data. Evaluation is done by comparing against a frame-based Support Vector Machine approach on a freely available motion database as well as a database containing Judo referee signal motions. In the last part, a method to indirectly reconstruct the effects of the human heart's pumping motion from video data of the face is applied in the context of epileptic seizures. These episodes usually feature interesting heart rate patterns, like a significant increase at seizure start as well as seizure-type-dependent drop-offs near the end. The pulse detection method is evaluated for applicability regarding seizure detection in a multitude of scenarios, ranging from videos recorded in a controlled clinical environment to patient-supplied videos of seizures filmed with smartphones.
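    The database lookup behind the gap-filling step can be pictured as a nearest-neighbor search on the markers that are still observed. A minimal sketch under that assumption (the thesis method handles long gaps, multiple body parts, and large databases far more carefully):

        import numpy as np

        def fill_missing_markers(frame, database, observed):
            """Complete one corrupted frame by copying missing markers from
            the database frame most similar on the observed markers.

            frame:    (n_markers, 3), invalid where markers are missing
            database: (n_frames, n_markers, 3) clean prior motion database
            observed: boolean (n_markers,) mask of tracked markers
            """
            diffs = database[:, observed, :] - frame[observed, :]
            dists = np.linalg.norm(diffs.reshape(len(database), -1), axis=1)
            best = database[np.argmin(dists)]       # most similar clean frame
            completed = frame.copy()
            completed[~observed] = best[~observed]  # copy in the missing markers
            return completed

        # Toy usage: a 100-frame database of 5 markers; frame 42 loses marker 2.
        rng = np.random.default_rng(0)
        database = rng.standard_normal((100, 5, 3))
        frame = database[42].copy()
        observed = np.array([True, True, False, True, True])
        print(np.allclose(fill_missing_markers(frame, database, observed), database[42]))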

    Concept of a Robust & Training-free Probabilistic System for Real-time Intention Analysis in Teams

    This thesis addresses the analysis of team intentions in smart environments (SE). Its central claim is that the development and integration of explicit models of user tasks can make an important contribution to the development of mobile and ubiquitous software systems. The thesis collects descriptions of human behavior in both group situations and problem-solving situations. It examines how SE projects model the activities of a user and provides a team intention model for inferring and selecting planned team activities from the observation of multiple users through noisy and heterogeneous sensors. To this end, an approach based on hierarchical dynamic Bayesian networks is chosen.
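    To make the sensing-to-intention pipeline concrete, here is a minimal discrete Bayesian filter over team activities (an illustration only, with invented states, probabilities, and a single binary sensor; the thesis employs hierarchical dynamic Bayesian networks over multiple users and sensors):

        import numpy as np

        states = ["meeting", "presentation", "break"]
        T = np.array([[0.8, 0.15, 0.05],      # transition model P(s_t | s_{t-1})
                      [0.1, 0.8,  0.1 ],
                      [0.2, 0.2,  0.6 ]])
        P_speech = np.array([0.9, 0.7, 0.2])  # observation model P(speech | s)

        belief = np.ones(3) / 3               # uniform prior over activities
        for speech_detected in [True, True, False, True]:
            belief = T.T @ belief             # predict step
            belief *= P_speech if speech_detected else 1 - P_speech  # update
            belief /= belief.sum()            # normalize
            print(dict(zip(states, belief.round(3))))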

    Understanding Person Identification Through Gait

    Gait recognition is the process of identifying humans from their bipedal locomotion, such as walking or running. As such, gait data is privacy-sensitive information and should be anonymized where possible. With the rise of higher-quality gait recording techniques, such as depth cameras or motion capture suits, an increasing amount of detailed gait data is captured and processed. The introduction and rise of the Metaverse is but one popular application scenario in which the gait of users is transferred onto digital avatars. As a first step towards developing effective anonymization techniques for high-quality gait data, we study different aspects of movement data to quantify their contribution to gait recognition. We first extract categories of features from the literature on human gait perception and then design experiments for each category to assess how much the information they contain contributes to recognition success. Our results show that gait anonymization will be challenging, as the data is highly redundant and interdependent.
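    The per-category assessment can be read as an ablation study: train a recognizer with and without one feature category and compare accuracy. A minimal sketch with placeholder data, a k-NN classifier, and an invented column range for the ablated category (none of which comes from the paper):

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        # Placeholder data: 200 gait samples, 60 features, 10 walkers.
        rng = np.random.default_rng(1)
        X = rng.standard_normal((200, 60))
        y = rng.integers(0, 10, size=200)

        knn = KNeighborsClassifier(n_neighbors=3)
        full = cross_val_score(knn, X, y, cv=5).mean()

        # Ablate one hypothetical feature category (columns 20..39): the
        # accuracy drop quantifies that category's contribution.
        X_ablated = np.delete(X, np.s_[20:40], axis=1)
        ablated = cross_val_score(knn, X_ablated, y, cv=5).mean()
        print(f"accuracy with all features: {full:.2f}, without category: {ablated:.2f}")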

    Motion Synthesis and Control for Autonomous Agents using Generative Models and Reinforcement Learning

    Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches to motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision-making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics often involves laborious fine-tuning and repeated experiments, and may suffer from generalization issues. In this thesis, we explore data-driven approaches using generative models and reinforcement learning to study and simulate human motions. Specifically, we begin with motion synthesis and control of physically simulated agents imitating a wide range of human motor skills, and then focus on improving the local navigation decisions of autonomous agents in multi-agent interaction settings. For physics-based agent control, we introduce an imitation learning framework built upon generative adversarial networks and reinforcement learning that enables humanoid agents to learn motor skills from a few examples of human reference motion data. Our approach generates high-fidelity motions and robust controllers without the need to manually design and fine-tune a reward function, while at the same time allowing interactive switching between different controllers based on user input. Based on this framework, we further propose a multi-objective learning scheme for composite and task-driven control of humanoid agents. Our multi-objective learning scheme balances the simultaneous learning of disparate motions from multiple reference sources and multiple goal-directed control objectives in an adaptive way, enabling the training of efficient composite motion controllers. Additionally, we present a general framework for fast and robust learning of motor control skills. Our framework exploits particle filtering to dynamically explore and discretize the high-dimensional action space involved in continuous control tasks, and provides a multi-modal policy as a substitute for the commonly used Gaussian policies. For navigation learning, we leverage human crowd data to train a human-inspired collision avoidance policy by combining knowledge distillation and reinforcement learning. Our approach enables autonomous agents to take human-like actions during goal-directed steering in fully decentralized, multi-agent environments. To inform better control in such environments, we propose SocialVAE, a variational autoencoder-based architecture that uses timewise latent variables with socially-aware conditions and a backward posterior approximation to perform agent trajectory prediction. Our approach improves on current state-of-the-art performance on trajectory prediction tasks in daily human interaction scenarios and more complex scenes involving interactions between NBA players. We further extend SocialVAE by exploiting semantic maps as context conditions to generate map-compliant trajectory predictions. Our approach processes context conditions and social conditions occurring during agent-agent interactions in an integrated manner through the use of a dual-attention mechanism.
We demonstrate the real-time performance of our approach and its ability to provide high-fidelity, multi-modal predictions on various large-scale vehicle trajectory prediction tasks.
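    As a small illustration of the adversarial imitation idea behind the first framework (a generic GAIL-style sketch; the transition-based discriminator input and network sizes are assumptions, not the thesis implementation):

        import torch
        import torch.nn as nn

        class Discriminator(nn.Module):
            """Scores state transitions; trained to separate reference motion
            data from transitions produced by the learning policy."""

            def __init__(self, obs_dim):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(2 * obs_dim, 128), nn.ReLU(),
                    nn.Linear(128, 1)
                )

            def forward(self, s, s_next):
                return self.net(torch.cat([s, s_next], dim=-1))

        def imitation_reward(disc, s, s_next):
            # High reward when the discriminator mistakes the policy's
            # transition for reference motion data.
            with torch.no_grad():
                d = torch.sigmoid(disc(s, s_next))
                return -torch.log(1 - d + 1e-8)

        disc = Discriminator(obs_dim=32)
        s, s_next = torch.randn(4, 32), torch.randn(4, 32)
        print(imitation_reward(disc, s, s_next).shape)  # torch.Size([4, 1])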

    Recognizing talking faces from acoustic Doppler reflections
