57 research outputs found

    Realtime Multilevel Crowd Tracking using Reciprocal Velocity Obstacles

    Full text link
    We present a novel, realtime algorithm to compute the trajectory of each pedestrian in moderately dense crowd scenes. Our formulation is based on an adaptive particle filtering scheme that uses a multi-agent motion model based on velocity-obstacles, and takes into account local interactions as well as physical and personal constraints of each pedestrian. Our method dynamically changes the number of particles allocated to each pedestrian based on different confidence metrics. Additionally, we use a new high-definition crowd video dataset, which is used to evaluate the performance of different pedestrian tracking algorithms. This dataset consists of videos of indoor and outdoor scenes, recorded at different locations with 30-80 pedestrians. We highlight the performance benefits of our algorithm over prior techniques using this dataset. In practice, our algorithm can compute trajectories of tens of pedestrians on a multi-core desktop CPU at interactive rates (27-30 frames per second). To the best of our knowledge, our approach is 4-5 times faster than prior methods, which provide similar accuracy

    RAIST: Learning Risk Aware Traffic Interactions via Spatio-Temporal Graph Convolutional Networks

    Full text link
    A key aspect of driving a road vehicle is to interact with the other road users, assess their intentions and make risk-aware tactical decisions. An intuitive approach of enabling an intelligent automated driving system would be to incorporate some aspects of the human driving behavior. To this end, we propose a novel driving framework for egocentric views, which is based on spatio-temporal traffic graphs. The traffic graphs not only model the spatial interactions amongst the road users, but also their individual intentions through temporally associated message passing. We leverage spatio-temporal graph convolutional network (ST-GCN) to train the graph edges. These edges are formulated using parameterized functions of 3D positions and scene-aware appearance features of road agents. Along with tactical behavior prediction, it is crucial to evaluate the risk assessing ability of the proposed framework. We claim that our framework learns risk aware representations by improving on the task of risk object identification, especially in identifying objects with vulnerable interactions like pedestrians and cyclists

    Interactive Tracking, Prediction, and Behavior Learning of Pedestrians in Dense Crowds

    Get PDF
    The ability to automatically recognize human motions and behaviors is a key skill for autonomous machines to exhibit to interact intelligently with a human-inhabited environment. The capabilities autonomous machines should have include computing the motion trajectory of each pedestrian in a crowd, predicting his or her position in the near future, and analyzing the personality characteristics of the pedestrian. Such techniques are frequently used for collision-free robot navigation, data-driven crowd simulation, and crowd surveillance applications. However, prior methods for these problems have been restricted to low-density or sparse crowds where the pedestrian movement is modeled using simple motion models. In this thesis, we present several interactive algorithms to extract pedestrian trajectories from videos in dense crowds. Our approach combines different pedestrian motion models with particle tracking and mixture models and can obtain an average of 20%20\% improvement in accuracy in medium-density crowds over prior work. We compute the pedestrian dynamics from these trajectories using Bayesian learning techniques and combine them with global methods for long-term pedestrian prediction in densely crowded settings. Finally, we combine these techniques with Personality Trait Theory to automatically classify the dynamic behavior or the personality of a pedestrian based on his or her movements in a crowded scene. The resulting algorithms are robust and can handle sparse and noisy motion trajectories. We demonstrate the benefits of our long-term prediction and behavior classification methods in dense crowds and highlight the benefits over prior techniques. We highlight the performance of our novel algorithms on three different applications. The first application is interactive data-driven crowd simulation, which includes crowd replication as well as the combination of pedestrian behaviors from different videos. Secondly, we combine the prediction scheme with proxemic characteristics from psychology and use them to perform socially-aware navigation. Finally, we present novel techniques for anomaly detection in low-to medium-density crowd videos using trajectory-level behavior learning.Doctor of Philosoph

    HEROES: Unreal Engine-based Human and Emergency Robot Operation Education System

    Full text link
    Training and preparing first responders and humanitarian robots for Mass Casualty Incidents (MCIs) often poses a challenge owing to the lack of realistic and easily accessible test facilities. While such facilities can offer realistic scenarios post an MCI that can serve training and educational purposes for first responders and humanitarian robots, they are often hard to access owing to logistical constraints. To overcome this challenge, we present HEROES- a versatile Unreal Engine simulator for designing novel training simulations for humans and emergency robots for such urban search and rescue operations. The proposed HEROES simulator is capable of generating synthetic datasets for machine learning pipelines that are used for training robot navigation. This work addresses the necessity for a comprehensive training platform in the robotics community, ensuring pragmatic and efficient preparation for real-world emergency scenarios. The strengths of our simulator lie in its adaptability, scalability, and ability to facilitate collaboration between robot developers and first responders, fostering synergy in developing effective strategies for search and rescue operations in MCIs. We conducted a preliminary user study with an 81% positive response supporting the ability of HEROES to generate sufficiently varied environments, and a 78% positive response affirming the usefulness of the simulation environment of HEROES

    DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention

    Full text link
    With the rise in manipulated media, deepfake detection has become an imperative task for preserving the authenticity of digital content. In this paper, we present a novel multi-modal audio-video framework designed to concurrently process audio and video inputs for deepfake detection tasks. Our model capitalizes on lip synchronization with input audio through a cross-attention mechanism while extracting visual cues via a fine-tuned VGG-16 network. Subsequently, a transformer encoder network is employed to perform facial self-attention. We conduct multiple ablation studies highlighting different strengths of our approach. Our multi-modal methodology outperforms state-of-the-art multi-modal deepfake detection techniques in terms of F-1 and per-video AUC scores
    • …
    corecore