Realtime Multilevel Crowd Tracking using Reciprocal Velocity Obstacles
We present a novel, realtime algorithm to compute the trajectory of each
pedestrian in moderately dense crowd scenes. Our formulation is based on an
adaptive particle-filtering scheme that uses a multi-agent motion model derived
from velocity obstacles, and takes into account local interactions as well as
physical and personal constraints of each pedestrian. Our method dynamically
changes the number of particles allocated to each pedestrian based on different
confidence metrics. Additionally, we use a new high-definition crowd video
dataset, which is used to evaluate the performance of different pedestrian
tracking algorithms. This dataset consists of videos of indoor and outdoor
scenes, recorded at different locations with 30-80 pedestrians. We highlight
the performance benefits of our algorithm over prior techniques using this
dataset. In practice, our algorithm can compute trajectories of tens of
pedestrians on a multi-core desktop CPU at interactive rates (27-30 frames per
second). To the best of our knowledge, our approach is 4-5 times faster than
prior methods that provide similar accuracy.
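As a rough illustration of confidence-driven particle allocation, the sketch below divides a fixed particle budget so that low-confidence tracks receive more particles. The confidence values, budget, and allocation rule are hypothetical stand-ins, not the paper's actual metrics:

```python
import numpy as np

def allocate_particles(confidences, total_particles=1000, min_particles=10):
    """Split a fixed particle budget across pedestrians so that
    low-confidence tracks receive more particles (hypothetical scheme).
    Truncation means the counts sum to approximately total_particles."""
    conf = np.asarray(confidences, dtype=float)
    need = 1.0 - conf                      # less confident -> more particles
    weights = need / need.sum()
    counts = np.maximum(min_particles, (weights * total_particles).astype(int))
    return counts

# Three pedestrians: one confident track and two uncertain ones.
print(allocate_particles([0.9, 0.5, 0.2], total_particles=600))
```

The floor of `min_particles` keeps even well-tracked pedestrians from losing their filter entirely, mirroring the idea of dynamically re-allocating particles per pedestrian.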
RAIST: Learning Risk Aware Traffic Interactions via Spatio-Temporal Graph Convolutional Networks
A key aspect of driving a road vehicle is to interact with the other road
users, assess their intentions, and make risk-aware tactical decisions. An
intuitive approach to enabling an intelligent automated driving system is to
incorporate aspects of human driving behavior. To this end, we
propose a novel driving framework for egocentric views, which is based on
spatio-temporal traffic graphs. The traffic graphs not only model the spatial
interactions amongst the road users, but also their individual intentions
through temporally associated message passing. We leverage a spatio-temporal
graph convolutional network (ST-GCN) to train the graph edges. These edges are
formulated using parameterized functions of 3D positions and scene-aware
appearance features of road agents. Along with tactical behavior prediction, it
is crucial to evaluate the risk-assessing ability of the proposed framework. We
claim that our framework learns risk-aware representations by improving on the
task of risk object identification, especially in identifying objects with
vulnerable interactions, such as pedestrians and cyclists.
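To make the edge formulation concrete, here is a minimal sketch of one plausible parameterization that combines 3D proximity with appearance similarity. RAIST's actual edge functions are learned end-to-end with the ST-GCN; the Gaussian kernel, cosine similarity, and sigma value below are assumptions for illustration:

```python
import numpy as np

def edge_weights(positions, appearance, sigma=5.0):
    """Hypothetical traffic-graph edge weights: each weight combines
    3D proximity (Gaussian kernel on Euclidean distance) with appearance
    similarity (cosine of feature vectors). In RAIST these functions are
    parameterized and trained, not hand-set as here."""
    P = np.asarray(positions, dtype=float)
    A = np.asarray(appearance, dtype=float)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-normalize features
    n = len(P)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                               # no self-edges
            dist = np.linalg.norm(P[i] - P[j])
            W[i, j] = np.exp(-dist**2 / (2 * sigma**2)) * (A[i] @ A[j])
    return W

# Two agents 3 m apart with identical appearance features.
W = edge_weights([[0, 0, 0], [3, 0, 0]], [[1.0, 0.0], [1.0, 0.0]])
print(W)
```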
Interactive Tracking, Prediction, and Behavior Learning of Pedestrians in Dense Crowds
The ability to automatically recognize human motions and behaviors is a key skill autonomous machines must exhibit in order to interact intelligently with a human-inhabited environment. These capabilities include computing the motion trajectory of each pedestrian in a crowd, predicting his or her position in the near future, and analyzing the personality characteristics of the pedestrian. Such techniques are frequently used for collision-free robot navigation, data-driven crowd simulation, and crowd surveillance applications. However, prior methods for these problems have been restricted to low-density or sparse crowds, where pedestrian movement is modeled using simple motion models.
In this thesis, we present several interactive algorithms to extract pedestrian trajectories from videos of dense crowds. Our approach combines different pedestrian motion models with particle tracking and mixture models, and obtains an average improvement in accuracy in medium-density crowds over prior work. We compute the pedestrian dynamics from these trajectories using Bayesian learning techniques and combine them with global methods for long-term pedestrian prediction in densely crowded settings. Finally, we combine these techniques with Personality Trait Theory to automatically classify the dynamic behavior or personality of a pedestrian based on his or her movements in a crowded scene. The resulting algorithms are robust and can handle sparse and noisy motion trajectories. We demonstrate the benefits of our long-term prediction and behavior classification methods in dense crowds and highlight their advantages over prior techniques.
We highlight the performance of our novel algorithms in three different applications. The first is interactive data-driven crowd simulation, which includes crowd replication as well as combining pedestrian behaviors from different videos. Second, we combine the prediction scheme with proxemic characteristics from psychology and use them to perform socially-aware navigation. Finally, we present novel techniques for anomaly detection in low- to medium-density crowd videos using trajectory-level behavior learning.
Doctor of Philosophy
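A toy sketch of trajectory-level behavior classification in the spirit of Personality Trait Theory: motion features (a hypothetical preferred speed and personal-space radius) are mapped to the nearest trait centroid. The centroid values below are illustrative inventions, not learned from data as in the thesis:

```python
import numpy as np

# Hypothetical trait centroids in a 2-D motion-feature space
# (preferred speed in m/s, personal-space radius in m). The thesis
# learns such mappings from observed trajectories; these values are
# hard-coded purely for illustration.
TRAIT_CENTROIDS = {
    "aggressive": np.array([1.8, 0.3]),
    "shy":        np.array([1.0, 1.2]),
    "tense":      np.array([1.5, 0.6]),
}

def classify_behavior(speed, radius):
    """Nearest-centroid classification of a pedestrian's dominant trait."""
    x = np.array([speed, radius])
    return min(TRAIT_CENTROIDS, key=lambda t: np.linalg.norm(x - TRAIT_CENTROIDS[t]))

print(classify_behavior(1.7, 0.35))   # a fast walker keeping little distance
```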
HEROES: Unreal Engine-based Human and Emergency Robot Operation Education System
Training and preparing first responders and humanitarian robots for Mass
Casualty Incidents (MCIs) often poses a challenge owing to the lack of
realistic and easily accessible test facilities. While such facilities can
offer realistic scenarios post an MCI that can serve training and educational
purposes for first responders and humanitarian robots, they are often hard to
access owing to logistical constraints. To overcome this challenge, we present
HEROES, a versatile Unreal Engine simulator for designing novel training
simulations for humans and emergency robots in such urban search and rescue
operations. The proposed HEROES simulator is capable of generating synthetic
datasets for machine learning pipelines that are used for training robot
navigation. This work addresses the necessity for a comprehensive training
platform in the robotics community, ensuring pragmatic and efficient
preparation for real-world emergency scenarios. The strengths of our simulator
lie in its adaptability, scalability, and ability to facilitate collaboration
between robot developers and first responders, fostering synergy in developing
effective strategies for search and rescue operations in MCIs. We conducted a
preliminary user study with an 81% positive response supporting the ability of
HEROES to generate sufficiently varied environments, and a 78% positive
response affirming the usefulness of its simulation environment.
DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention
With the rise in manipulated media, deepfake detection has become an
imperative task for preserving the authenticity of digital content. In this
paper, we present a novel multi-modal audio-video framework designed to
concurrently process audio and video inputs for deepfake detection tasks. Our
model capitalizes on lip synchronization with input audio through a
cross-attention mechanism while extracting visual cues via a fine-tuned VGG-16
network. Subsequently, a transformer encoder network is employed to perform
facial self-attention. We conduct multiple ablation studies highlighting
different strengths of our approach. Our multi-modal methodology outperforms
state-of-the-art multi-modal deepfake detection techniques in terms of F-1 and
per-video AUC scores.
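The lip-audio cross-attention step can be sketched as single-head scaled dot-product attention, with lip-motion features as queries and audio features as keys and values. This is a simplification: DF-TransFusion's actual architecture adds learned projections, multiple heads, the fine-tuned VGG-16 visual stream, and the facial self-attention transformer encoder:

```python
import numpy as np

def cross_attention(lip_feats, audio_feats):
    """Single-head scaled dot-product cross-attention (simplified):
    lip features act as queries over audio features (keys/values).
    Learned Q/K/V projections and multi-head structure are omitted."""
    Q, K, V = lip_feats, audio_feats, audio_feats
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over audio frames
    return attn @ V                                 # audio-conditioned lip features

# Two lip-feature queries attending over four audio frames.
out = cross_attention(np.ones((2, 4)), np.eye(4))
print(out.shape)
```

With uniform queries over orthonormal audio frames, the attention distribution is uniform and each output row is the mean of the value rows.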
- …