Behavioral Priors for Detection and Tracking of Pedestrians in Video Sequences
In this paper we address the problem of detecting and tracking pedestrians in complex scenarios. Including prior knowledge is increasingly crucial in scene analysis, since it provides the flexibility and robustness needed for reliable results in complex scenes. We aim to combine image processing methods with behavioral models of pedestrian dynamics calibrated on real data. We introduce Discrete Choice Models (DCM) for pedestrian behavior and discuss their integration in a detection and tracking context. The results show that the two methodologies can be combined to improve the performance of such systems in complex sequences.
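As a rough illustration of how a discrete choice model can assign probabilities to a pedestrian's candidate next steps, the sketch below uses a multinomial logit over three hypothetical alternatives. The utility terms, the coefficient values (`beta_dir`, `beta_dest`), and the alternatives themselves are assumptions made for this example; they are not taken from the paper, whose actual model specification is not given in the abstract.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtract the maximum utility before exponentiating for numerical stability.
    v_max = max(utilities)
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def step_utility(direction_change_deg, dist_to_destination,
                 beta_dir=-0.05, beta_dest=-1.0):
    """Hypothetical linear utility: penalize turning and remaining distance.

    Both coefficients are illustrative placeholders, not calibrated values.
    """
    return beta_dir * abs(direction_change_deg) + beta_dest * dist_to_destination


# Each alternative: (change of direction in degrees, distance to destination after the step).
alternatives = [(-30.0, 1.2), (0.0, 1.0), (30.0, 1.4)]
utilities = [step_utility(angle, dist) for angle, dist in alternatives]
probs = choice_probabilities(utilities)
```

In a tracking setting, probabilities of this kind could act as a behavioral prior that re-weights candidate detections for the next frame; how the paper integrates the two is not specified here.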
Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
In this work, we explore the correlation between people trajectories and
their head orientations. We argue that people trajectory and head pose
forecasting can be modelled as a joint problem. Recent approaches on trajectory
forecasting leverage short-term trajectories (aka tracklets) of pedestrians to
predict their future paths. In addition, sociological cues, such as expected
destination or pedestrian interaction, are often combined with tracklets. In
this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between
positions and head orientations (vislets) thanks to a joint unconstrained
optimization of full covariance matrices during the LSTM backpropagation. We
additionally exploit the head orientations as a proxy for the visual attention,
when modeling social interactions. MX-LSTM predicts future pedestrian locations
and head poses, improving on the capabilities of current approaches
to long-term trajectory forecasting. Compared to the state of the art, our
approach shows better performance on an extensive set of public benchmarks.
MX-LSTM is particularly effective when people move slowly, i.e. the most
challenging scenario for all other models. The proposed approach also allows
for accurate predictions on a longer time horizon.
Comment: Accepted at IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2019. arXiv admin note: text overlap with arXiv:1805.0065
Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing
Thesis (PhD) - Indiana University, Computer Sciences, 2007
In this thesis we present a system for automatic human tracking and activity recognition from
video sequences. The problem of automatically analyzing visual information to derive descriptors
of high-level human activities has intrigued the computer vision community for decades and is
considered to be largely unsolved. Part of this interest derives from the vast range of applications
in which such a solution may be useful. We attempt to find efficient formulations of these tasks
as applied to extracting customer behavior information in a retail marketing context. Based on
these formulations, we present a system that visually tracks customers in a retail store and performs
a number of activity analysis tasks based on the output from the tracker.
In tracking we introduce new techniques for pedestrian detection, initialization of the body
model and a formulation of the temporal tracking as a global trans-dimensional optimization problem.
Initial human detection is addressed by a novel method for head detection, which incorporates
the knowledge of the camera projection model. The initialization of the human body model is addressed
by newly developed shape and appearance descriptors. Temporal tracking of customer
trajectories is performed by employing a human body tracking system designed as a Bayesian
jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality
ambiguities as people are leaving and entering the scene.
Following the tracking, we develop a two-stage group activity formulation based upon
ideas from swarming research. For modeling purposes, all moving actors in the scene are viewed here as simplistic agents in the swarm. This allows us to effectively define a set of inter-agent interactions,
which combine to derive a distance metric used in further swarm clustering. In the
first stage, shoppers that belong to the same group are identified by deterministically clustering
bodies to detect short-term events; in the second stage, events are post-processed to form clusters
of group activities with fuzzy memberships.
Quantitative analysis of the tracking subsystem shows an improvement over state-of-the-art
methods when used under similar conditions. Finally, based on the output from the tracker, the
activity recognition procedure achieves over 80% correct shopper group detection, as validated by
human-generated ground truth results.
Understanding Vehicular Traffic Behavior from Video: A Survey of Unsupervised Approaches
Recent emerging trends in automatic behavior analysis and understanding from infrastructure video are reviewed. Research has shifted away from high-resolution estimation of vehicle state and toward machine learning approaches that extract meaningful patterns from aggregates in an unsupervised fashion. These patterns represent priors on observable motion, which can be used to describe a scene, to answer behavior questions such as where a vehicle is going or how many vehicles are performing the same action, and to detect abnormal events. The review focuses on two main methods for scene description: trajectory clustering and topic modeling. Example applications that use these behavioral modeling techniques are also presented, along with the most popular public datasets for behavioral analysis. Discussion and comments on future directions in the field are also provided.
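To make the idea of trajectory clustering concrete, here is a minimal sketch that groups trajectories by a simple point-wise distance and a greedy threshold rule. The distance function, the greedy assignment, the threshold value, and the toy trajectories are all assumptions chosen for brevity; the survey covers a range of clustering methods and this is not any particular one of them.

```python
import math


def trajectory_distance(a, b):
    """Mean Euclidean distance between corresponding points.

    Assumes both trajectories were already resampled to the same length.
    """
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def cluster_trajectories(trajectories, threshold):
    """Greedy clustering: a trajectory joins the first cluster whose
    representative (its first member) lies within `threshold`;
    otherwise it starts a new cluster. Returns lists of indices."""
    clusters = []
    for i, traj in enumerate(trajectories):
        for cluster in clusters:
            representative = trajectories[cluster[0]]
            if trajectory_distance(representative, traj) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy data: two vehicles heading east in adjacent paths, one heading north.
tracks = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.2), (1.0, 0.2), (2.0, 0.2)],
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
]
groups = cluster_trajectories(tracks, threshold=0.5)
```

Each resulting cluster can be read as one motion pattern in the scene; a trajectory that stays far from every cluster representative would be a candidate abnormal event.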
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 pages
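The abstract mentions performance metrics without naming them. Two metrics widely used in trajectory prediction are the average displacement error (ADE) and the final displacement error (FDE); the sketch below computes both for a single predicted path. The function name and the toy coordinates are assumptions for illustration, and the survey may discuss other metrics as well.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory.

    ADE: mean Euclidean error over all predicted time steps.
    FDE: Euclidean error at the final predicted time step.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors), errors[-1]


# Toy example: the prediction goes straight while the person drifts sideways.
predicted_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed_path = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted_path, observed_path)
```

Here the per-step errors are 0, 1, and 2, so ADE is 1.0 and FDE is 2.0. FDE exceeds ADE whenever the error grows over the prediction horizon, which is why both are usually reported together.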
Predicting pedestrian crossing intentions using contextual information
The urban environment is one of the most complex scenarios for an autonomous vehicle,
as it is shared with other types of users known as vulnerable road users, with pedestrians
as their principal representative. These users are characterized by their great dynamicity.
Despite a large number of interactions between vehicles and pedestrians, the safety of
pedestrians has not increased at the same rate as that of vehicle occupants. For this
reason, it is necessary to address this problem. One possible strategy would be anticipating
pedestrian behavior to minimize risky situations, especially during the crossing.
The objective of this doctoral thesis is to achieve such anticipation through the development
of crosswalk action prediction techniques based on deep learning.
Before the design and implementation of the prediction systems, a classification system
has been developed to discern the pedestrians involved in the road scene. The system,
based on convolutional neural networks, has been trained and validated with a customized
dataset. This set has been built from several existing sets and augmented by including
images obtained from the Internet. This pre-anticipation step would reduce unnecessary
processing within the vehicle perception system.
After this step, two systems have been developed as a proposal to solve the prediction
problem.
The first system is composed of convolutional and recurrent encoder networks. It
obtains a short-term prediction of the crossing action performed one second in the future.
The input information to the model is mainly image-based, which provides additional
pedestrian context. In addition, the use of pedestrian-related variables and architectural
improvements allows better results on the JAAD dataset.
The second system is an end-to-end architecture based on the combination of three-dimensional
convolutional neural networks and/or the Transformer architecture encoder.
In this model, most of the proposed and investigated improvements are focused on transformations
of the input data. After an extensive set of individual tests, several models
have been trained, evaluated, and compared with other methods using both JAAD and
PIE datasets. The results obtained are among the best of the state-of-the-art models, validating the
proposed architecture.
- …