1,074 research outputs found

    Spatio-Temporal Dual Graph Neural Networks for Travel Time Estimation

    Travel time estimation is one of the core tasks in the development of intelligent transportation systems. Most previous works model road segments or intersections separately, learning their spatio-temporal characteristics to estimate travel time. However, because road segments and intersections continuously alternate along a path, their dynamic features are coupled and interactive; modeling only one of them therefore limits further improvement in travel time estimation accuracy. To address this problem, this paper proposes a novel graph-based deep learning framework for travel time estimation, namely Spatio-Temporal Dual Graph Neural Networks (STDGNN). Specifically, we first establish node-wise and edge-wise graphs to characterize the adjacency relations of intersections and of road segments, respectively. To extract the joint spatio-temporal correlations of intersections and road segments, we adopt a spatio-temporal dual graph learning approach that stacks multiple dual graph learning modules in a multi-scale architecture, capturing multi-level spatio-temporal information from the dual graph. Finally, we employ multi-task learning to estimate the travel time of the whole route and of each road segment and intersection simultaneously. We conduct extensive experiments on three real-world trajectory datasets, and the results show that STDGNN significantly outperforms several state-of-the-art baselines.
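    The node-wise/edge-wise construction in the abstract can be illustrated on a toy road network. The sketch below (segment ids and intersections are invented for illustration) builds both graphs: intersections are adjacent when a segment joins them, and segments are adjacent when they share an intersection, i.e. the edge-wise graph is the line graph of the node-wise graph.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy road network: segment id -> (intersection, intersection).
segments = {
    "s1": ("A", "B"), "s2": ("B", "C"), "s3": ("B", "D"), "s4": ("C", "D"),
}

# Node-wise graph: intersections adjacent when a segment joins them.
node_adj = defaultdict(set)
for u, v in segments.values():
    node_adj[u].add(v)
    node_adj[v].add(u)

# Edge-wise graph: segments adjacent when they share an intersection
# (the line graph of the node-wise graph).
touching = defaultdict(set)  # intersection -> segments incident to it
for sid, (u, v) in segments.items():
    touching[u].add(sid)
    touching[v].add(sid)
edge_adj = defaultdict(set)
for sids in touching.values():
    for a, b in combinations(sorted(sids), 2):
        edge_adj[a].add(b)
        edge_adj[b].add(a)

print(dict(node_adj))
print(dict(edge_adj))
```

    In STDGNN the dual graph learning modules would then propagate features over both adjacency structures jointly; this sketch only shows the graph construction step.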

    Spatiotemporal Event Graphs for Dynamic Scene Understanding

    Dynamic scene understanding is the ability of a computer system to interpret and make sense of the visual information present in a video of a real-world scene. In this thesis, we present a series of frameworks for dynamic scene understanding, starting from road event detection from an autonomous driving perspective and moving to complex video activity detection, followed by continual learning approaches for the life-long learning of the models. Firstly, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. Due to the lack of datasets equipped with formally specified logical requirements, we also introduce the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first publicly available dataset for autonomous driving with requirements expressed as logical constraints, as a tool for driving neurosymbolic research in the area. Next, we extend event detection to holistic scene understanding by proposing two complex activity detection methods. In the first method, we present a deformable, spatiotemporal scene graph approach, consisting of three main building blocks: action tube detection, a 3D deformable RoI pooling layer designed for learning the flexible, deformable geometry of the constituent action tubes, and a scene graph constructed by considering all parts as nodes and connecting them based on different semantics. In a second approach evolving from the first, we propose a hybrid graph neural network that combines attention applied to a graph encoding of the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Finally, the last part of the thesis presents a new continual semi-supervised learning (CSSL) paradigm. Comment: PhD thesis, Oxford Brookes University, Examiners: Prof. Dima Damen and Dr. Matthias Rolf, 183 pages

    Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models

    Machine learning techniques are now integral to the advancement of intelligent urban services, playing a crucial role in elevating the efficiency, sustainability, and livability of urban environments. The recent emergence of foundation models such as ChatGPT marks a revolutionary shift in the fields of machine learning and artificial intelligence. Their unparalleled capabilities in contextual understanding, problem solving, and adaptability across a wide range of tasks suggest that integrating these models into urban domains could have a transformative impact on the development of smart cities. Despite growing interest in Urban Foundation Models (UFMs), this burgeoning field faces challenges such as a lack of clear definitions, systematic reviews, and universalizable solutions. To this end, this paper first introduces the concept of UFM and discusses the unique challenges involved in building them. We then propose a data-centric taxonomy that categorizes current UFM-related works, based on urban data modalities and types. Furthermore, to foster advancement in this field, we present a promising framework aimed at the prospective realization of UFMs, designed to overcome the identified challenges. Additionally, we explore the application landscape of UFMs, detailing their potential impact in various urban contexts. Relevant papers and open-source resources have been collated and are continuously updated at https://github.com/usail-hkust/Awesome-Urban-Foundation-Models

    A Survey of Deep Learning Solutions for Anomaly Detection in Surveillance Videos

    Deep learning has proven to be a landmark computing approach in the computer vision domain. Hence, it has been widely applied to complex cognitive tasks such as the detection of anomalies in surveillance videos. Anomaly detection in this context is the identification of abnormal events in surveillance videos that can be deemed security incidents or threats. Deep learning solutions for anomaly detection have outperformed traditional machine learning solutions. This review attempts to provide a holistic benchmarking of the deep learning solutions for video anomaly detection published since 2016. For each paper, it identifies the learning technique, the datasets used, and the overall model accuracy. Reviewed papers are organised into five deep learning methods, namely: autoencoders, continual learning, transfer learning, reinforcement learning, and ensemble learning. Current and emerging trends are discussed as well.
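    The autoencoder family the survey covers generally follows one recipe: train a reconstructor on normal footage only, then flag frames whose reconstruction error exceeds a threshold calibrated on normal data. The sketch below illustrates just that scoring logic, using a deliberately trivial stand-in "autoencoder" (the element-wise mean of the normal frames) and made-up two-dimensional frames, so it runs without any deep learning library.

```python
import statistics

def reconstruction_error(frame, reconstruction):
    # Mean squared error between a frame and its reconstruction.
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / len(frame)

# Stand-in "autoencoder": reconstruct every frame as the element-wise mean
# of the normal training frames (a real system would learn this mapping).
normal_frames = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1]]
mean_frame = [statistics.mean(col) for col in zip(*normal_frames)]

def anomaly_score(frame):
    return reconstruction_error(frame, mean_frame)

# Threshold calibrated on normal data: max normal error plus a margin.
threshold = max(anomaly_score(f) for f in normal_frames) * 1.5

print(anomaly_score([0.1, 1.0]) > threshold)   # normal-looking frame -> False
print(anomaly_score([5.0, -3.0]) > threshold)  # anomalous frame -> True
```

    Replacing the mean-frame stand-in with a trained convolutional autoencoder, while keeping the same thresholded reconstruction-error score, gives the typical pipeline the surveyed papers build on.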

    Survey on video anomaly detection in dynamic scenes with moving cameras

    The increasing popularity of compact and inexpensive cameras, e.g. dash cameras, body cameras, and cameras equipped on robots, has sparked a growing interest in detecting anomalies within dynamic scenes recorded by moving cameras. However, existing reviews primarily concentrate on Video Anomaly Detection (VAD) methods assuming static cameras. The VAD literature with moving cameras remains fragmented, lacking comprehensive reviews to date. To address this gap, we endeavor to present the first comprehensive survey on Moving Camera Video Anomaly Detection (MC-VAD). We delve into the research papers related to MC-VAD, critically assessing their limitations and highlighting associated challenges. Our exploration encompasses three application domains: security, urban transportation, and marine environments, which in turn cover six specific tasks. We compile an extensive list of 25 publicly available datasets spanning four distinct environments: underwater, water surface, ground, and aerial. We summarize the types of anomalies these datasets correspond to or contain, and present five main categories of approaches for detecting such anomalies. Lastly, we identify future research directions and discuss novel contributions that could advance the field of MC-VAD. With this survey, we aim to offer a valuable reference for researchers and practitioners striving to develop and advance state-of-the-art MC-VAD methods. Comment: Under review

    Learning spatiotemporal patterns for monitoring smart cities and infrastructure

    Recent advances in the Internet of Things (IoT) have changed the way we interact with the world. The ability to monitor and manage objects in the physical world electronically makes it possible to bring data-driven decision making to new realms of city infrastructure and management. Large volumes of spatiotemporal data have been collected from pervasive sensors in both indoor and outdoor environments, and this data reveals dynamic patterns in cities, infrastructure, and public property. In light of the need for new approaches to analysing such data, in this thesis we present relevant data mining techniques and machine learning approaches to extract knowledge from spatiotemporal data and solve real-world problems. Many challenges remain under-addressed in smart city and infrastructure monitoring systems, such as indoor person identification, evaluation of city region segmentation with parking events, fine collection from cars in violation, parking occupancy prediction, and airport aircraft path map reconstruction. All of these problems involve both spatial and temporal information, and accurate pattern recognition in such spatiotemporal data is essential for determining problem solutions. Therefore, how to combine spatiotemporal data mining techniques, artificial intelligence approaches, and expert knowledge in each specific domain is a common challenge. In the indoor person identification area, identifying the person accessing a secured room without vision-based or device-based systems is very challenging. In particular, distinguishing the time-series patterns that different activities and people induce on high-dimensional wireless signal channels requires novel time-series data mining approaches. To solve this important problem, we established a device-free system and proposed a two-step solution to identify a person who has accessed a secure area such as an office.
Establishing smart parking systems is a key component of smart city and infrastructure construction. Many sub-problems, such as parking space arrangement, fine collection, and parking occupancy prediction, are urgent and important for city managers. Arranging parking spaces based on historical data can improve the utilisation rate of parking spaces, and doing so with collected spatiotemporal data requires reasonable region segmentation approaches. Moreover, evaluating parking space grouping results needs to account for the correlation between the spatial and temporal domains, since the two are heterogeneous. Therefore, we designed a spatiotemporal clustering evaluation approach that exploits the correlation between the spatial and temporal domains. It can evaluate the segmentation of parking spaces in cities using historical data, and likewise evaluate clustering results that group data spanning both domains. For the fine collection problem, using the sensors installed in parking spaces to detect cars in violation and issue infringement notices within the short window needed to catch these cars is significantly difficult: most cars in violation leave within a short period, and multiple cars are in violation at the same time. Parking officers need to choose the best route to collect fines from these drivers in the shortest time. We therefore proposed a new optimisation problem, the Travelling Officer Problem, together with a general probability-based model, successfully integrating temporal information into a traditional optimisation algorithm. The model suggests to parking officers an optimised path that maximises the probability of catching cars in violation in time. To solve this problem in real time, we combined the model with deep learning methods and proposed a theoretical approach to solving the traditional orienteering problem with deep learning networks.
This approach could improve the efficiency of similar urban computing problems as well. For parking occupancy prediction, a key problem in parking space management is providing a parking availability prediction service that informs drivers of vacant parking lots before they start their journeys. We proposed a deep learning-based model that solves this parking occupancy prediction problem using spatiotemporal data analysis techniques; the model can also be generalised to other spatiotemporal prediction problems. In the airport aircraft management area, grouping similar spatiotemporal data is widely needed in the real world, and determining key features and combining similar data are the two key problems. We presented a new framework that groups similar spatiotemporal data and constructs a road graph from GPS data. We evaluated our framework experimentally using a state-of-the-art test-bed technique and found that it could effectively and efficiently construct and update airport aircraft route maps. In conclusion, the studies in this thesis aimed to discover intrinsic, dynamic patterns in spatiotemporal data and proposed corresponding solutions to real-world smart city and infrastructure monitoring problems via spatiotemporal pattern analysis and machine learning approaches. We hope this research will inspire the research community to develop more robust and effective approaches to the remaining problems in this area.
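    The probability-based idea behind the Travelling Officer Problem can be sketched in a few lines. Everything below is invented for illustration: toy travel times, a made-up office and three violation spots, and the (hypothetical) assumption that overstay durations are exponentially distributed, so the chance of still catching a car decays with arrival time. A simple greedy heuristic stands in for the thesis's optimisation and deep learning machinery.

```python
import math

def catch_probability(arrival_time, rate=0.1):
    # Probability the car is still parked when the officer arrives,
    # assuming (hypothetically) exponentially distributed overstay durations.
    return math.exp(-rate * arrival_time)

def greedy_route(distances, start, violations):
    """Greedily visit violations, always moving next to the spot with the
    highest probability of still catching its car."""
    route, current, elapsed = [], start, 0.0
    remaining = set(violations)
    while remaining:
        best = max(remaining,
                   key=lambda v: catch_probability(elapsed + distances[current][v]))
        elapsed += distances[current][best]
        route.append((best, catch_probability(elapsed)))
        current = best
        remaining.remove(best)
    return route

# Toy symmetric travel times between an office and three violation spots.
distances = {
    "office": {"p1": 2, "p2": 5, "p3": 9},
    "p1": {"office": 2, "p2": 3, "p3": 8},
    "p2": {"office": 5, "p1": 3, "p3": 4},
    "p3": {"office": 9, "p1": 8, "p2": 4},
}
print(greedy_route(distances, "office", ["p1", "p2", "p3"]))
```

    The thesis's actual model jointly optimises over whole routes rather than one hop at a time; the sketch only shows how a probability of capture, decaying with travel time, can drive the route choice.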

    Spatiotemporal Event Graphs for Dynamic Scene Understanding

    Dynamic scene understanding is the ability of a computer system to interpret and make sense of the visual information present in a video of a real-world scene. In this thesis, we present a series of frameworks for dynamic scene understanding, starting from road event detection from an autonomous driving perspective and moving to complex video activity detection, followed by continual learning approaches for the life-long learning of the models. Firstly, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. ROAD is designed to test an autonomous vehicle’s ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs, and the corresponding scene locations. Due to the lack of datasets equipped with formally specified logical requirements, we also introduce the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first publicly available dataset for autonomous driving with requirements expressed as logical constraints, as a tool for driving neurosymbolic research in the area. Next, we extend event detection to holistic scene understanding by proposing two complex activity detection methods. In the first method, we present a deformable, spatiotemporal scene graph approach, consisting of three main building blocks: action tube detection, a 3D deformable RoI pooling layer designed for learning the flexible, deformable geometry of the constituent action tubes, and a scene graph constructed by considering all parts as nodes and connecting them based on different semantics. In a second approach evolving from the first, we propose a hybrid graph neural network that combines attention applied to a graph encoding of the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity.
Our contribution is threefold: i) a feature extraction technique; ii) a method for constructing a local scene graph followed by graph attention; and iii) a graph for temporally connecting all the local dynamic scene graphs. Finally, the last part of the thesis presents a new continual semi-supervised learning (CSSL) paradigm, brought to the attention of the machine learning community. We also propose to formulate the continual semi-supervised learning problem as a latent-variable
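    The "local scene graph followed by graph attention" step in contribution ii) can be sketched without any deep learning framework. The toy below is entirely illustrative: node names and feature vectors are invented, and a single weight-free attention head scores neighbours by dot-product similarity (a real GAT would learn these scores through trained projections).

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def graph_attention(features, adjacency):
    """Each node re-describes itself as an attention-weighted mix of itself
    and its neighbours, with weights from dot-product similarity scores."""
    out = {}
    for node, neigh in adjacency.items():
        keys = [node] + sorted(neigh)  # include a self-loop
        scores = [sum(a * b for a, b in zip(features[node], features[k]))
                  for k in keys]
        weights = softmax(scores)
        out[node] = [sum(w * features[k][d] for w, k in zip(weights, keys))
                     for d in range(len(features[node]))]
    return out

# Hypothetical local scene graph: an agent, its action tube, and a location.
features = {"agent": [1.0, 0.0], "action": [0.8, 0.2], "location": [0.0, 1.0]}
adjacency = {"agent": {"action"}, "action": {"agent", "location"},
             "location": {"action"}}
print(graph_attention(features, adjacency))
```

    The thesis then links such per-snippet graphs through a temporal graph (contribution iii); this sketch covers only the local attention step.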