1,074 research outputs found
Spatio-Temporal Dual Graph Neural Networks for Travel Time Estimation
Travel time estimation is one of the core tasks for the development of
intelligent transportation systems. Most previous works model the road segments
or intersections separately by learning their spatio-temporal characteristics
to estimate travel time. However, due to the continuous alternations of the
road segments and intersections in a path, the dynamic features are supposed to
be coupled and interactive. Therefore, modeling one of them limits further
improvement in accuracy of estimating travel time. To address the above
problems, a novel graph-based deep learning framework for travel time
estimation is proposed in this paper, namely Spatio-Temporal Dual Graph Neural
Networks (STDGNN). Specifically, we first establish the node-wise and edge-wise
graphs to characterize, respectively, the adjacency relations of intersections
and those of road segments. To extract the joint spatio-temporal
correlations of the intersections and road segments, we adopt a
spatio-temporal dual graph learning approach that stacks multiple dual graph
learning modules in a multi-scale network architecture, capturing multi-level
spatio-temporal information from the dual graph. Finally, we employ multi-task
learning to simultaneously estimate the travel time of the whole route and of
each road segment and intersection. We conduct extensive experiments to evaluate our proposed model
on three real-world trajectory datasets, and the experimental results show that
STDGNN significantly outperforms several state-of-the-art baselines.
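The node-wise and edge-wise graphs above form a classic primal/dual (line-graph) pair: intersections are nodes in one graph, while road segments become nodes in the other. A minimal sketch of that construction, using a hypothetical three-intersection network (not the paper's code):

```python
def line_graph(segments):
    """Build the edge-wise (dual) graph: each directed road segment
    (u, v) becomes a node, and segment (u1, v1) points to (u2, v2)
    when v1 == u2, i.e. the second segment starts where the first ends."""
    adj = {seg: set() for seg in segments}
    for (u1, v1) in segments:
        for (u2, v2) in segments:
            if v1 == u2 and (u1, v1) != (u2, v2):
                adj[(u1, v1)].add((u2, v2))
    return adj

# Node-wise graph: intersections A, B, C linked by three directed segments.
segments = [("A", "B"), ("B", "C"), ("A", "C")]
dual = line_graph(segments)
print(dual[("A", "B")])  # {('B', 'C')}: segment A->B feeds into B->C
```

A model like STDGNN would then learn features over both graphs jointly rather than over either one alone.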
Spatiotemporal Event Graphs for Dynamic Scene Understanding
Dynamic scene understanding is the ability of a computer system to interpret
and make sense of the visual information present in a video of a real-world
scene. In this thesis, we present a series of frameworks for dynamic scene
understanding starting from road event detection from an autonomous driving
perspective to complex video activity detection, followed by continual learning
approaches for the life-long learning of the models. Firstly, we introduce the
ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge
the first of its kind. Due to the lack of datasets equipped with formally
specified logical requirements, we also introduce the ROad event Awareness
Dataset with logical Requirements (ROAD-R), the first publicly available
dataset for autonomous driving with requirements expressed as logical
constraints, as a tool for driving neurosymbolic research in the area. Next, we
extend event detection to holistic scene understanding by proposing two complex
activity detection methods. In the first method, we present a deformable,
spatiotemporal scene graph approach, consisting of three main building blocks:
action tube detection, a 3D deformable RoI pooling layer designed for learning
the flexible, deformable geometry of the constituent action tubes, and a scene
graph constructed by considering all parts as nodes and connecting them based
on different semantics. In a second approach evolving from the first, we
propose a hybrid graph neural network that combines attention applied to a
graph encoding of the local (short-term) dynamic scene with a temporal graph
modelling the overall long-duration activity. Finally, the last part of the
thesis presents a new continual semi-supervised learning (CSSL) paradigm.
Comment: PhD thesis, Oxford Brookes University. Examiners: Prof. Dima Damen
and Dr. Matthias Rolf. 183 pages.
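The deformable scene-graph approach above treats all detected parts as nodes and connects them based on different semantics. A minimal sketch of one such semantic, spatial overlap, with hypothetical boxes and labels (the thesis connects nodes using richer semantics than this simple rule):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def scene_graph(parts, thresh=0.1):
    """parts: {name: box}. Returns undirected edges between related parts."""
    names = list(parts)
    return [(a, b) for i, a in enumerate(names)
            for b in names[i + 1:] if iou(parts[a], parts[b]) > thresh]

parts = {"car": (0, 0, 4, 4), "pedestrian": (2, 2, 6, 6), "sign": (10, 10, 12, 12)}
edges = scene_graph(parts)
print(edges)  # [('car', 'pedestrian')]
```

A temporal graph, as in the second approach, would then link such local scene graphs across time.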
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Machine learning techniques are now integral to the advancement of
intelligent urban services, playing a crucial role in elevating the efficiency,
sustainability, and livability of urban environments. The recent emergence of
foundation models such as ChatGPT marks a revolutionary shift in the fields of
machine learning and artificial intelligence. Their unparalleled capabilities
in contextual understanding, problem solving, and adaptability across a wide
range of tasks suggest that integrating these models into urban domains could
have a transformative impact on the development of smart cities. Despite
growing interest in Urban Foundation Models (UFMs), this burgeoning field faces
challenges such as a lack of clear definitions, systematic reviews, and
universalizable solutions. To this end, this paper first introduces the concept
of UFM and discusses the unique challenges involved in building them. We then
propose a data-centric taxonomy that categorizes current UFM-related works,
based on urban data modalities and types. Furthermore, to foster advancement in
this field, we present a promising framework aimed at the prospective
realization of UFMs, designed to overcome the identified challenges.
Additionally, we explore the application landscape of UFMs, detailing their
potential impact in various urban contexts. Relevant papers and open-source
resources have been collated and are continuously updated at
https://github.com/usail-hkust/Awesome-Urban-Foundation-Models
A Survey of Deep Learning Solutions for Anomaly Detection in Surveillance Videos
Deep learning has proven to be a landmark computing approach in the computer vision domain. Hence, it has been widely applied to complex cognitive tasks such as the detection of anomalies in surveillance videos, i.e., the identification of abnormal events that can be deemed security incidents or threats. Deep learning solutions for anomaly detection have outperformed traditional machine learning solutions. This review attempts to provide a holistic benchmarking of the deep learning solutions for video anomaly detection published since 2016. The paper identifies the learning technique, the datasets used, and the overall model accuracy of each. Reviewed papers are organised into five deep learning methods, namely: autoencoders, continual learning, transfer learning, reinforcement learning, and ensemble learning. Current and emerging trends are also discussed.
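The autoencoder-based detectors in this family share one scoring step: a model trained on normal footage reconstructs normal frames well, so frames with a large reconstruction error are flagged as anomalous. A minimal sketch of that thresholding step, with illustrative values standing in for per-frame reconstruction losses:

```python
import statistics

def flag_anomalies(errors, k=2.0):
    """Flag frames whose reconstruction error exceeds mean + k * std."""
    mu = statistics.mean(errors)
    sigma = statistics.pstdev(errors)
    thresh = mu + k * sigma
    return [i for i, e in enumerate(errors) if e > thresh]

# Hypothetical per-frame reconstruction losses; frame 3 is abnormal.
errors = [0.10, 0.12, 0.11, 0.95, 0.10, 0.13]
print(flag_anomalies(errors))  # [3]
```

The surveyed methods differ mainly in how the reconstruction model is trained, not in this final scoring rule.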
Survey on video anomaly detection in dynamic scenes with moving cameras
The increasing popularity of compact and inexpensive cameras, e.g.~dash
cameras, body cameras, and cameras equipped on robots, has sparked a growing
interest in detecting anomalies within dynamic scenes recorded by moving
cameras. However, existing reviews primarily concentrate on Video Anomaly
Detection (VAD) methods assuming static cameras. The VAD literature with moving
cameras remains fragmented, lacking comprehensive reviews to date. To address
this gap, we endeavor to present the first comprehensive survey on Moving
Camera Video Anomaly Detection (MC-VAD). We delve into the research papers
related to MC-VAD, critically assessing their limitations and highlighting
associated challenges. Our exploration encompasses three application domains:
security, urban transportation, and marine environments, which in turn cover
six specific tasks. We compile an extensive list of 25 publicly available
datasets spanning four distinct environments: underwater, water surface,
ground, and aerial. We summarize the types of anomalies these datasets
correspond to or contain, and present five main categories of approaches for
detecting such anomalies. Lastly, we identify future research directions and
discuss novel contributions that could advance the field of MC-VAD. With this
survey, we aim to offer a valuable reference for researchers and practitioners
striving to develop and advance state-of-the-art MC-VAD methods.
Comment: Under review.
Learning spatiotemporal patterns for monitoring smart cities and infrastructure
Recent advances in the Internet of Things (IoT) have changed the way we interact with the world. The ability to monitor and manage objects in the physical world electronically makes it possible to bring data-driven decision making to new realms of city infrastructure and management. Large volumes of spatiotemporal data have been collected from pervasive sensors in both indoor and outdoor environments, and this data reveals dynamic patterns in cities, infrastructure, and public property. In light of the need for new approaches to analysing such data, in this thesis we present relevant data mining techniques and machine learning approaches to extract knowledge from spatiotemporal data and solve real-world problems. Many challenges remain under-addressed in smart city and infrastructure monitoring systems, such as indoor person identification, evaluation of city region segmentation with parking events, fine collection from cars in violation, parking occupancy prediction, and airport aircraft path map reconstruction. All of these problems involve both spatial and temporal information, and accurate pattern recognition of this spatiotemporal data is essential for solving them. How to incorporate spatiotemporal data mining techniques, artificial intelligence approaches, and expert knowledge in each specific domain is therefore a common challenge. In the indoor person identification area, identifying the person accessing a secured room without vision-based or device-based systems is very challenging. In particular, distinguishing time-series patterns on high-dimensional wireless signal channels caused by different activities and people requires novel time-series data mining approaches. To solve this important problem, we established a device-free system and proposed a two-step solution to identify a person who has accessed a secure area such as an office.
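The device-free identification task above ultimately reduces to matching a new signal trace against per-person time-series templates. A bare-bones nearest-template sketch (the names, traces, and plain Euclidean distance are all illustrative stand-ins for the high-dimensional wireless channel data and the thesis's actual two-step approach):

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length signal traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(trace, templates):
    """templates: {person: reference trace}. Returns the closest match."""
    return min(templates, key=lambda p: distance(trace, templates[p]))

# Hypothetical one-channel traces; real data would be high-dimensional.
templates = {"alice": [0.1, 0.9, 0.4], "bob": [0.8, 0.2, 0.7]}
print(identify([0.2, 0.8, 0.5], templates))  # alice
```

In practice an elastic measure such as dynamic time warping, rather than pointwise Euclidean distance, is commonly used for such time-series matching.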
Establishing smart parking systems in cities is a key component of smart city and infrastructure construction. Many sub-problems, such as parking space arrangement, fine collection, and parking occupancy prediction, are urgent and important for city managers. Arranging parking spaces based on historical data can improve their utilisation rate, but doing so from collected spatiotemporal data requires reasonable region segmentation approaches. Moreover, evaluating parking space grouping results needs to consider the correlation between the spatial and temporal domains, since these are heterogeneous. Therefore, we designed a spatiotemporal data clustering evaluation approach that exploits the correlation between the spatial domain and the temporal domain. It can evaluate the segmentation results of parking spaces in cities using historical data, and similarly evaluate clustering results that group data spanning both domains. For the fine collection problem, using the sensor instrumentation installed in parking spaces to detect cars in violation and issue infringement notices within a short time window is significantly difficult: most cars in violation leave within a short period, and multiple cars are in violation at the same time, so parking officers need to choose the best route to collect fines in the shortest time. Therefore, we proposed a new optimisation problem called the Travelling Officer Problem, together with a general probability-based model that integrates temporal information into traditional optimisation algorithms. This model can suggest to parking officers an optimised path that maximises the probability of catching the cars in violation in time. To solve this problem in real time, we incorporated the model with deep learning methods, and proposed a theoretical approach to solving the traditional orienteering problem with deep learning networks.
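The probability-based idea behind the Travelling Officer Problem can be illustrated with a toy greedy heuristic: assume each car's probability of still being present decays exponentially with the officer's arrival time, and repeatedly visit the car with the highest catch probability. The travel times and decay rates below are hypothetical, and greedy selection is a simplification of the thesis's actual optimisation:

```python
import math

def greedy_route(travel, decay, start="office"):
    """travel: {(a, b): minutes}; decay: {car: leave rate per minute}.
    Greedily visit the car most likely to still be there on arrival,
    assuming its presence probability decays as exp(-rate * time)."""
    here, t, route = start, 0.0, []
    remaining = set(decay)
    while remaining:
        best = max(remaining,
                   key=lambda c: math.exp(-decay[c] * (t + travel[(here, c)])))
        t += travel[(here, best)]
        route.append(best)
        remaining.remove(best)
        here = best
    return route

# Two cars in violation: c1 decays fast (likely to leave soon), c2 slowly.
travel = {("office", "c1"): 5, ("office", "c2"): 2,
          ("c1", "c2"): 3, ("c2", "c1"): 3}
decay = {"c1": 0.3, "c2": 0.05}
print(greedy_route(travel, decay))  # ['c2', 'c1']
```

A full solution would optimise the joint probability over whole routes, which is where the deep learning formulation mentioned above comes in.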
This approach could improve the efficiency of similar urban computing problems as well. For parking occupancy prediction, a key problem in parking space management is providing a car parking availability prediction service that can inform drivers of vacant parking lots before they start their journeys. We proposed a deep learning-based model to solve this parking occupancy prediction problem using spatiotemporal data analysis techniques; the model can be generalised to other spatiotemporal data prediction problems as well. In the airport aircraft management area, grouping similar spatiotemporal data is widely used in the real world, where determining key features and combining similar data are two key problems. We presented a new framework to group similar spatiotemporal data and construct a road graph from GPS data. We evaluated our framework experimentally using a state-of-the-art test-bed technique and found that it could effectively and efficiently construct and update airport aircraft route maps. In conclusion, the studies in this thesis aimed to discover intrinsic and dynamic patterns in spatiotemporal data and proposed corresponding solutions for real-world smart city and infrastructure monitoring problems via spatiotemporal pattern analysis and machine learning approaches. We hope this research will inspire the research community to develop more robust and effective approaches to existing problems in this area in the future.
Spatiotemporal Event Graphs for Dynamic Scene Understanding
Dynamic scene understanding is the ability of a computer system to interpret and make sense of the visual information present in a video of a real-world scene. In this thesis, we present a series of frameworks for dynamic scene understanding starting from road event detection from an autonomous driving perspective to complex video activity detection, followed by continual learning approaches for the life-long learning of the models. Firstly, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. ROAD is designed to test an autonomous vehicle’s ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs, and the corresponding scene locations. Due to the lack of datasets equipped with formally specified logical requirements, we also introduce the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first publicly available dataset for autonomous driving with requirements expressed as logical constraints, as a tool for driving neurosymbolic research in the area.
Next, we extend event detection to holistic scene understanding by proposing two complex activity detection methods. In the first method, we present a deformable, spatiotemporal scene graph approach, consisting of three main building blocks: action tube detection, a 3D deformable RoI pooling layer designed for learning the flexible, deformable geometry of the constituent action tubes, and a scene graph constructed by considering all parts as nodes and connecting them based on different semantics. In a second approach evolving from the first, we propose a hybrid graph neural network that combines attention applied to a graph encoding of the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Our contribution is threefold: i) a feature extraction technique; ii) a method for constructing a local scene graph followed by graph attention, and iii) a graph for temporally connecting all the local dynamic scene graphs.
Finally, the last part of the thesis presents a new continual semi-supervised learning (CSSL) paradigm, proposed to the attention of the machine learning community. We also propose to formulate the continual semi-supervised learning problem as a latent-variable
- …