
    Real-World Anomaly Detection in Video Using Spatio-Temporal Features Analysis for Weakly Labelled Data with Auto Label Generation

    Detecting anomalies in videos is a complex task due to diverse content, noisy labeling, and a lack of frame-level labels. To address these challenges in weakly labeled datasets, we propose a novel custom loss function in conjunction with the multi-instance learning (MIL) algorithm. Our approach utilizes the UCF Crime and ShanghaiTech datasets for anomaly detection. The UCF Crime dataset includes labeled videos depicting a range of incidents such as explosions, assaults, and burglaries, while the ShanghaiTech dataset is one of the largest anomaly datasets, with over 400 video clips featuring three different scenes and 130 abnormal events. We generated pseudo labels for videos using the MIL technique to detect frame-level anomalies from video-level annotations, and to train the network to distinguish between the normal and abnormal classes. We conducted extensive experiments on the UCF Crime dataset using C3D and I3D features to test our model's performance. For the ShanghaiTech dataset, we used I3D features for training and testing. Our results show that with I3D features, we achieve an 84.6% frame-level AUC score on the UCF Crime dataset and a 92.27% frame-level AUC score on the ShanghaiTech dataset, comparable to other methods evaluated on these datasets.
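The abstract does not spell out the custom loss; as a rough illustration only, the widely used MIL ranking objective for weakly labelled video anomaly detection — a hinge loss over the maximum segment scores of an anomalous and a normal bag, plus temporal-smoothness and sparsity regularizers — can be sketched as follows. The function name, margin, and regularizer weights here are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def mil_ranking_loss(anom_scores, norm_scores, margin=1.0,
                     lam_smooth=8e-5, lam_sparse=8e-5):
    """Hinge ranking loss over two bags of segment scores, a common MIL
    formulation when only video-level labels are available."""
    anom = np.asarray(anom_scores, dtype=float)
    norm = np.asarray(norm_scores, dtype=float)

    # Rank the highest-scoring segment of the anomalous bag above the
    # highest-scoring segment of the normal bag by at least `margin`.
    hinge = max(0.0, margin - anom.max() + norm.max())

    # Temporal smoothness: adjacent segment scores should vary gradually.
    smooth = np.sum(np.diff(anom) ** 2)

    # Sparsity: only a few segments of an anomalous video are anomalous.
    sparse = np.sum(anom)

    return hinge + lam_smooth * smooth + lam_sparse * sparse

# A well-separated bag pair incurs almost no loss; a reversed pair does.
low = mil_ranking_loss([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
high = mil_ranking_loss([0.0, 0.0], [1.0, 1.0])
```

In training, the segment scores would come from a network over C3D or I3D features, and this loss would be minimized over pairs of anomalous and normal videos.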

    Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning

    The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, this task is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces Rank2Tell, a novel multi-modal ego-centric dataset for Ranking the importance level of objects and Telling the reason for their importance. Using a variety of closed- and open-ended visual question answering tasks, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers working on visual scene understanding and related fields. Further, we introduce a joint model for importance-level ranking and natural-language caption generation to benchmark our dataset and demonstrate performance with quantitative evaluations.

    Deep Learning: A Philosophical Introduction

    Deep learning is currently the most prominent and widely successful method in artificial intelligence. Despite having played an active role in earlier artificial intelligence and neural network research, philosophers have been largely silent on this technology so far. This is remarkable, given that deep learning neural networks have blown past predicted upper limits on artificial intelligence performance—recognizing complex objects in natural photographs, and defeating world champions in strategy games as complex as Go and chess—yet there remains no universally accepted explanation as to why they work so well. This article provides an introduction to these networks, as well as an opinionated guidebook on the philosophical significance of their structure and achievements. It argues that deep learning neural networks differ importantly in their structure and mathematical properties from the shallower neural networks that were the subject of so much philosophical reflection in the 1980s and 1990s. The article then explores several different explanations for their success, and ends by proposing ten areas of research that would benefit from future engagement by philosophers of mind, epistemology, science, perception, law, and ethics.

    Multi-Sensor Data Fusion for Robust Environment Reconstruction in Autonomous Vehicle Applications

    In autonomous vehicle systems, understanding the surrounding environment is essential for an intelligent vehicle to make every movement decision on the road. Knowledge of the neighboring environment enables the vehicle to detect moving objects, especially irregular events such as jaywalking or a sudden lane change by another vehicle, and to avoid collisions. This local situation awareness depends largely on the advanced sensors (e.g. camera, LIDAR, RADAR) added to the vehicle. The main focus of this work is to formulate the problem of reconstructing the vehicle environment using point cloud data from the LIDAR and RGB color images from the camera. Building on the widely used iterated closest point (ICP) registration technique, an expectation-maximization (EM)-ICP method is proposed to automatically mosaic multiple point cloud sets into a larger one. Motion trajectories of the moving objects are analyzed to address the issue of irregularity detection. Another contribution of this work is the fusion of color information (from RGB color images captured by the camera) with the three-dimensional point cloud data for a better representation of the environment. For a better understanding of the surroundings, histogram of oriented gradients (HOG)-based techniques are exploited to detect pedestrians and vehicles.
    Using both camera and LIDAR, an autonomous vehicle can gather information and reconstruct a map of the surrounding environment up to a certain distance. The ability to communicate and cooperate among vehicles can improve automated driving decisions by providing an extended and more precise view of the surroundings. In this work, a transmission power control algorithm is studied along with an adaptive content control algorithm to achieve a more accurate map of the vehicle environment. To exchange local sensor data among the vehicles, an adaptive communication scheme is proposed that controls the lengths and contents of the messages depending on the load of the communication channel. The exchange of this information can extend the tracking region of a vehicle beyond the area sensed by its own sensors. In this experiment, the combined effect of the power control and the message length and content control algorithms is exploited to improve the map's accuracy of the surroundings in a cooperative automated vehicle system.
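The EM-ICP variant itself is not detailed in this summary, but the classic ICP loop it extends — alternating nearest-neighbour correspondence with a closed-form rigid fit — can be sketched as a brute-force toy version (function names are illustrative, and a real system would use a k-d tree and outlier rejection):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping paired points
    src onto dst (the Kabsch / SVD closed-form solution)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Classic iterated closest point: alternate nearest-neighbour
    matching with the closed-form rigid fit above."""
    cur = src.copy()
    for _ in range(iters):
        # Correspondence step: match each source point to its nearest target.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        # Alignment step: closed-form rigid transform on the matches.
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur
```

Mosaicking multiple LIDAR scans amounts to registering each new scan against the growing map with such a loop; the EM variant softens the hard nearest-neighbour assignment into probabilistic correspondences.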

    Learning to Drive Anywhere

    Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left- vs. right-hand traffic. In contrast, existing models for autonomous driving have thus far only been deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or for model scalability. In this work, we propose AnyD, a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.
    Comment: Conference on Robot Learning (CoRL) 202
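The paper's attention mechanism is not detailed in this summary; a minimal stand-in for geo-location-based channel attention — gating feature channels with a sigmoid projection of a per-region embedding — might look like the following. All names, sizes, and the random initialization are illustrative assumptions, not AnyD's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: C feature channels, E-dim location embedding, R regions.
C, E, R = 16, 8, 4
region_embed = rng.normal(size=(R, E))   # would be learned per region
W = rng.normal(size=(E, C)) * 0.1        # projection from embedding to gates

def geo_channel_attention(features, region_id):
    """Scale each feature channel by a sigmoid gate computed from the
    region's embedding, so the same backbone adapts to local nuances."""
    gate = 1.0 / (1.0 + np.exp(-(region_embed[region_id] @ W)))  # shape (C,)
    return features * gate   # broadcast over the channel axis

feats = rng.normal(size=(C,))
out_region_a = geo_channel_attention(feats, region_id=0)
out_region_b = geo_channel_attention(feats, region_id=1)
```

Because regions share the backbone and differ only in their embeddings, similarities between regions can be captured in embedding space while each location still reweights the channels it finds salient.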

    A Real-time Proactive Intersection Safety Monitoring System Based on Video Data

    69A3551847102
    In recent years, identifying road users' behavior and conflicts at intersections has become an essential data source for evaluating traffic safety. According to the Federal Highway Administration (FHWA), in 2020 more than 50% of fatal and injury crashes occurred at or near intersections, necessitating further investigation. This study developed an innovative artificial intelligence (AI)-based video analytic tool to assess intersection safety using surrogate safety measures. Surrogate safety measures (e.g., Post-encroachment Time (PET) and Time-to-Collision (TTC)) are extensively used to identify future threats, such as rear-end and left-turning collisions due to interactions between vehicles and other road users. To extract the trajectory data, this project integrates a real-time AI detection model, YOLO-v5, with a tracking framework based on the DeepSORT algorithm. 54 hours of high-resolution video data were collected at six signalized intersections (three 3-leg intersections and three 4-leg intersections) in Glassboro, New Jersey. Non-compliance behaviors, such as red-light running and pedestrian jaywalking, are captured to better understand the risky behaviors at these locations. The proposed approach achieved an accuracy of 92% to 98% in detecting and tracking road users' trajectories. Additionally, a user-friendly web-based application was developed that provides directional traffic volumes, pedestrian volumes, red-light-running events, pedestrian jaywalking events, and PET and TTC for crossing conflicts between two road users. In addition, extreme value theory (EVT) was used to estimate the number of crashes at each intersection utilizing the frequency of PETs and TTCs. Finally, the intersections were ranked based on a calculated score considering the severity of crashes.
    Overall, the developed tool, as well as the crash estimation model and ranking method, can provide valuable information for engineers and policymakers to assess the safety of intersections and implement effective countermeasures to mitigate intersection-involved crashes.
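The surrogate measures themselves are standard; a minimal sketch of how TTC and PET could be computed from extracted trajectory data (function names and the example values are illustrative, not from the study):

```python
def time_to_collision(gap_m, v_follow_mps, v_lead_mps):
    """TTC for a following vehicle closing on a lead vehicle: the gap
    divided by the closing speed (infinite when the gap is not closing)."""
    closing = v_follow_mps - v_lead_mps
    return gap_m / closing if closing > 0 else float("inf")

def post_encroachment_time(t_first_exits, t_second_enters):
    """PET: time between the first road user leaving the conflict area
    and the second road user entering it."""
    return t_second_enters - t_first_exits

# Example: a 20 m gap with the follower at 15 m/s and the leader at
# 10 m/s closes at 5 m/s, giving a TTC of 4 s.
ttc = time_to_collision(20.0, 15.0, 10.0)
pet = post_encroachment_time(t_first_exits=12.4, t_second_enters=13.6)
```

In a pipeline like the one described, the gap, speeds, and conflict-area timestamps would come from the YOLO-v5 + DeepSORT trajectories, and low TTC or PET values would be flagged as conflicts.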