    Pedestrian Trajectory Prediction with Structured Memory Hierarchies

    This paper presents a novel framework for human trajectory prediction based on multimodal data (video and radar). Motivated by recent neuroscience discoveries, we propose incorporating a structured memory component into the human trajectory prediction pipeline to capture historical information and thereby improve performance. We introduce structured LSTM cells that model the memory content hierarchically, preserving the spatiotemporal structure of the information and enabling us to capture both short-term and long-term context. We demonstrate how this architecture can be extended to integrate salient information from multiple modalities, automatically storing and retrieving information that is important for decision making without any supervision. We evaluate the proposed models on a new multimodal dataset, which we introduce, of 40,000 pedestrian trajectories acquired jointly from a radar system and a CCTV camera system installed in a public place. Performance is also evaluated on the publicly available New York Grand Central pedestrian database. In both settings, the proposed models anticipate future pedestrian motion better than existing state-of-the-art methods.
    Comment: To appear in ECML-PKDD 201

    Exploring multimodal data fusion through joint decompositions with flexible couplings

    A Bayesian framework is proposed to define flexible coupling models for joint tensor decompositions of multiple data sets. Under this framework, the data fusion problem is naturally cast in terms of a joint maximum a posteriori (MAP) estimator. Data-driven scenarios of joint posterior distributions are provided, including general Gaussian priors and non-Gaussian coupling priors. We present and discuss implementation issues of the algorithms used to obtain the joint MAP estimator. We also show how this framework can be adapted to joint decompositions of large data sets. In the case of a conditional Gaussian coupling with a linear transformation, we give theoretical bounds on data fusion performance using the Bayesian Cramér-Rao bound. Simulations are reported for hybrid coupling models ranging from simple additive Gaussian models to Gamma-type models with positive variables, and to the coupling of data sets that inherently differ in size because the measurement devices have different resolutions.
    Comment: 15 pages, 7 figures, revised version
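As a toy illustration of a joint MAP estimator under a conditional Gaussian coupling with a linear transformation, the sketch below replaces the tensor decompositions with the simplest possible case: each data set observes its latent variable directly with Gaussian noise, and the coupling prior is x2 | x1 ~ N(h·x1, sc). The function name `joint_map`, the scalar coupling `h`, the variances `s1`, `s2`, `sc`, and the flat prior on x1 are all assumptions made for this sketch, not the paper's model. Setting the gradient of the negative log-posterior to zero gives a 2×2 linear system per element, solved here by Cramer's rule.

```python
def joint_map(y1, y2, h, s1, s2, sc):
    """Elementwise joint MAP for a toy coupled model (assumption, not the paper's tensor model).

    y1 = x1 + n1, n1 ~ N(0, s1)
    y2 = x2 + n2, n2 ~ N(0, s2)
    coupling prior: x2 | x1 ~ N(h * x1, sc); flat prior on x1.
    """
    # Coefficients of the normal equations of the negative log-posterior:
    #   [a11 a12] [x1]   [y1 / s1]
    #   [a12 a22] [x2] = [y2 / s2]
    a11 = 1.0 / s1 + h * h / sc
    a12 = -h / sc
    a22 = 1.0 / s2 + 1.0 / sc
    det = a11 * a22 - a12 * a12  # always > 0 for positive variances

    x1_hat, x2_hat = [], []
    for b1, b2 in zip(y1, y2):
        r1, r2 = b1 / s1, b2 / s2
        # Cramer's rule for the symmetric 2x2 system
        x1_hat.append((r1 * a22 - a12 * r2) / det)
        x2_hat.append((a11 * r2 - a12 * r1) / det)
    return x1_hat, x2_hat


x1, x2 = joint_map([0.0], [3.0], h=1.0, s1=1.0, s2=1.0, sc=1.0)
print(x1, x2)
```

As `sc` grows the coupling weakens and each estimate falls back to its own observation; as `sc` shrinks the two estimates are pulled onto the line x2 = h·x1.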

    Speech-driven Animation with Meaningful Behaviors

    Conversational agents (CAs) play an important role in human-computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Previous studies have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors that convey the underlying message, but the gestures cannot easily be synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures; however, they generate behaviors without regard to the meaning of the message. This study proposes to bridge these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN) in which a discrete variable is added to condition the behaviors on an underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are temporally synchronized with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, (2) initializes the state configuration models, increasing the range of the generated behaviors, and (3) captures the differences in behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model.
    Comment: 13 pages, 12 figures, 5 tables
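A minimal sketch of what sparse transitions between shared and constraint-exclusive states can look like, assuming a hypothetical layout in which states 0 to n_shared - 1 are shared and each constraint owns a contiguous block of `n_excl` exclusive states. The function `constrained_transitions` and this layout are illustrative assumptions; the paper's DBN structure and training procedure are not reproduced here. Under constraint `c`, transitions into other constraints' exclusive states are zeroed and each remaining row is renormalized.

```python
def constrained_transitions(base, n_shared, n_excl, c):
    """Mask a transition matrix so constraint c only uses shared states and its own states.

    base: square list of lists of non-negative transition weights over all states.
    Hypothetical layout: states [0, n_shared) are shared; constraint k owns
    states [n_shared + k * n_excl, n_shared + (k + 1) * n_excl).
    Returns a matrix whose allowed rows are row-stochastic; disallowed rows are all zero.
    """
    n = len(base)
    start = n_shared + c * n_excl
    allowed = set(range(n_shared)) | set(range(start, start + n_excl))

    out = []
    for i in range(n):
        if i not in allowed:
            # State i is unreachable under constraint c, so it gets no outgoing mass
            out.append([0.0] * n)
            continue
        row = [base[i][j] if j in allowed else 0.0 for j in range(n)]
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else row)
    return out


# 2 shared states + 1 exclusive state for each of 2 constraints = 4 states
uniform = [[1.0] * 4 for _ in range(4)]
for row in constrained_transitions(uniform, n_shared=2, n_excl=1, c=0):
    print(row)
```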

    Migrants Selection and Replacement in Distributed Evolutionary Algorithms for Dynamic Optimization

    Many distributed systems (task scheduling, moving priorities, changing mobile environments, ...) can be modeled as Dynamic Optimization Problems (DOPs), since they require tracking an optimum that changes over time. Consequently, we focus on Distributed Genetic Algorithms (dGAs), an approach that remains under-investigated for DOPs. A dGA decentralizes the population into islands that cooperate by migrating individuals between them. In this article, we analyze the effect of migrant selection and replacement policies on the performance of the dGA for DOPs. Quality-based and distance-based criteria are tested on a comprehensive set of benchmarks. The results show the benefits and drawbacks of each setting in dynamic optimization.
    Universidad de Málaga. roadME project (TIN2011-28194). AUIP mobility program.
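The migration step of an island-model dGA can be sketched as follows. This is a minimal illustration assuming bit-string individuals, a unidirectional ring topology, and one migrant per island per migration. The two criteria shown (send the fittest or the most distant individual; replace the worst or the most similar individual) are plausible readings of quality-based and distance-based policies, not necessarily the exact ones evaluated in the article, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def select_migrant(island, fitness, criterion, target):
    """Pick the individual to send from `island` to `target`."""
    if criterion == "quality":
        # Quality-based: send the fittest individual
        return max(island, key=fitness)
    # Distance-based: send the individual farthest from the target island's best
    ref = max(target, key=fitness)
    return max(island, key=lambda ind: hamming(ind, ref))


def replace(island, migrant, fitness, criterion):
    """Insert a copy of `migrant` into `island`, overwriting one resident."""
    if criterion == "quality":
        # Quality-based: overwrite the worst resident
        idx = min(range(len(island)), key=lambda i: fitness(island[i]))
    else:
        # Distance-based: overwrite the resident most similar to the migrant
        idx = min(range(len(island)), key=lambda i: hamming(island[i], migrant))
    island[idx] = list(migrant)


def migrate_ring(islands, fitness, sel, rep):
    """One synchronous migration on a ring: island i sends to island (i + 1) % n."""
    n = len(islands)
    # Choose every migrant before any replacement happens
    migrants = [select_migrant(isl, fitness, sel, islands[(i + 1) % n])
                for i, isl in enumerate(islands)]
    for i, m in enumerate(migrants):
        replace(islands[(i + 1) % n], m, fitness, rep)


# OneMax fitness (number of ones) on two tiny islands
islands = [[[1, 1, 1, 0], [0, 0, 0, 0]], [[0, 0, 1, 0], [0, 1, 0, 0]]]
migrate_ring(islands, sum, "quality", "quality")
print(islands)
```

Computing all migrants before any replacement keeps the exchange synchronous, so no island forwards an individual it has just received.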

    Multimodal Multipart Learning for Action Recognition in Depth Videos

    The articulated and complex nature of human actions makes action recognition difficult. One way to handle this complexity is to decompose it into the kinetics of individual body parts and analyze actions based on these partial descriptors. We propose a joint sparse regression-based learning method that uses structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent the dynamics and appearance of parts, we employ a heterogeneous set of depth- and skeleton-based features. The structure of the multimodal multipart features is incorporated into the learning framework via the proposed hierarchical mixed norm, which regularizes the structured features of each part and enforces sparsity between parts, favoring group feature selection. Experimental results demonstrate the effectiveness of the proposed method: it outperforms other methods on all three tested datasets and saturates one of them by achieving perfect accuracy.
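The abstract does not spell out the hierarchical mixed norm, so the following is only an assumed ℓ1/ℓ2/ℓ1 form that shows how such a norm can favor a sparse set of body parts: an ℓ1 norm inside each (part, modality) feature block, an ℓ2 norm across the modality blocks of one part, and a plain sum (ℓ1) across parts. The nested layout `weights[p][m]` and the name `hierarchical_mixed_norm` are hypothetical.

```python
import math


def hierarchical_mixed_norm(weights):
    """Assumed l1/l2/l1 mixed norm (not necessarily the paper's exact definition).

    weights[p][m] is the weight vector for body part p and modality m.
    Inner:  l1 norm of each (part, modality) block.
    Middle: l2 norm across the modality blocks of one part.
    Outer:  sum (l1) across parts, which favors using only a few parts.
    """
    total = 0.0
    for part in weights:
        block_l1 = [sum(abs(v) for v in block) for block in part]
        total += math.sqrt(sum(b * b for b in block_l1))
    return total


concentrated = [[[3.0], [4.0]]]   # both modality blocks in one part
spread = [[[3.0]], [[4.0]]]       # same blocks split over two parts
print(hierarchical_mixed_norm(concentrated), hierarchical_mixed_norm(spread))
```

With the same total ℓ1 mass, weights concentrated in one part are penalized less than weights spread over two parts: blocks of magnitude 3 and 4 in one part cost 5, while the same blocks in two separate parts cost 7.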