190 research outputs found
GEO-REFERENCED VIDEO RETRIEVAL: TEXT ANNOTATION AND SIMILARITY SEARCH
Doctor of Philosophy (Ph.D.)
Aircraft Landing Time Prediction with Deep Learning on Trajectory Images
Aircraft landing time (ALT) prediction is crucial for air traffic management,
especially for sequencing arrival aircraft on the runway. In this study, a
trajectory-image-based deep learning method is proposed to predict ALTs for
aircraft entering the research airspace that covers the Terminal Maneuvering
Area (TMA). Specifically, the trajectories of all airborne arrival aircraft
within the temporal capture window are used to generate an image, with the
target aircraft's trajectory drawn in red and all background aircraft
trajectories drawn in blue. The trajectory images encode various kinds of
information, including aircraft position, speed, heading, relative distances,
and arrival traffic flows, which enables the use of state-of-the-art deep
convolutional neural networks for ALT modeling. Real-time runway usage derived
from the trajectory data, together with external information such as aircraft
types and weather conditions, is used as additional input. Moreover, a
convolutional neural network (CNN)-based module is designed for automatic
holding-related featurization; it takes as input the trajectory images, the
leading aircraft's holding status, and their time and speed gaps at the
research airspace boundary. Its output is further fed into the final
end-to-end ALT prediction. The proposed ALT prediction approach is applied to
Singapore Changi Airport (ICAO code: WSSS) using one month of Automatic
Dependent Surveillance-Broadcast (ADS-B) data from November 1 to November 30,
2022. Experimental results show that by integrating the holding featurization,
the mean absolute error (MAE) is reduced from 82.23 seconds to 43.96 seconds,
achieving an average accuracy of 96.1%, with 79.4% of the prediction errors
being less than 60 seconds.
Comment: In 2023 13th SESAR Innovation Days (SIDS2023)
SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection
In the field of autonomous driving, accurate and comprehensive perception of
the 3D environment is crucial. Bird's Eye View (BEV) based methods have emerged
as a promising solution for 3D object detection from multi-view images.
However, existing 3D object detection methods often ignore the physical
context of the environment, such as sidewalks and vegetation, resulting in
sub-optimal performance. In this paper, we propose a novel approach called
SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection), which
leverages a 3D semantic-occupancy branch to improve the accuracy of 3D object
detection. In particular, the physical context modeled by semantic occupancy
helps the detector perceive scenes in a more holistic view. SOGDet is flexible
and can be seamlessly integrated with most existing BEV-based methods. To
evaluate its effectiveness, we apply this approach to several state-of-the-art
baselines and conduct extensive experiments on the nuScenes dataset. Our
results show that SOGDet consistently enhances the performance of three
baseline methods in terms of nuScenes Detection Score (NDS) and mean Average
Precision (mAP). This indicates that combining 3D object detection with 3D
semantic occupancy leads to a more comprehensive perception of the 3D
environment, thereby helping to build more robust autonomous driving systems.
The code is available at: https://github.com/zhouqiu/SOGDet.
Comment: Accepted by AAAI2024
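
The dual-branch design described above can be pictured with a short PyTorch
sketch: a shared BEV feature map feeds both a semantic-occupancy head and a
detection head, so occupancy supervision regularizes the detector. This is a
simplified illustration under assumed shapes and class counts, not the
released implementation (see the linked repository for that).

import torch
import torch.nn as nn

class DualBranchBEV(nn.Module):
    # Toy stand-in for a BEV detector with an auxiliary occupancy branch.
    def __init__(self, bev_ch=256, det_classes=10, occ_classes=17):
        super().__init__()
        self.det_head = nn.Conv2d(bev_ch, det_classes, kernel_size=1)
        self.occ_head = nn.Conv2d(bev_ch, occ_classes, kernel_size=1)

    def forward(self, bev_feat):
        # bev_feat: (B, C, H, W) features from a multi-view image encoder.
        return self.det_head(bev_feat), self.occ_head(bev_feat)

model = DualBranchBEV()
bev_feat = torch.randn(2, 256, 128, 128)
det_logits, occ_logits = model(bev_feat)
# Training would minimize det_loss + lambda_occ * occ_loss, so the
# occupancy branch injects physical context into the shared features.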
A Multimodal Approach to Predict Social Media Popularity
Multiple modalities represent the different aspects by which information is
conveyed by a data source. Modern social media platforms are one of the
primary sources of multimodal data, where users employ different modes of
expression by posting textual as well as multimedia content such as images and
videos to share information. Multimodal information embedded in such posts can
be useful for predicting their popularity. To the best of our knowledge, no
such multimodal dataset exists for popularity prediction of social media
photos. In this work, we propose a multimodal dataset consisting of content,
context, and social information for popularity prediction. Specifically, we
augment the SMP-T1 dataset for social media prediction from the ACM Multimedia
Grand Challenge 2017 with image content, titles, descriptions, and tags. We
then propose a multimodal approach that exploits visual features (i.e.,
content information), textual features (i.e., contextual information), and
social features (e.g., average views and group counts) to predict the
popularity of social media photos in terms of view counts. Experimental
results confirm that although our multimodal approach uses only half of the
SMP-T1 training dataset, it achieves performance comparable to the state of
the art.
Comment: Preprint version of paper accepted in Proceedings of 1st IEEE
International Conference on Multimedia Information Processing and Retrieval
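
To make the late-fusion idea concrete, here is a hedged Python sketch that
concatenates visual, textual, and social feature vectors and regresses a
popularity score; the feature dimensions, the random stand-in data, and the
choice of regressor are all assumptions for illustration.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1000
visual = rng.normal(size=(n, 512))    # e.g., CNN image embeddings
textual = rng.normal(size=(n, 300))   # e.g., title/description/tag embeddings
social = rng.normal(size=(n, 8))      # e.g., average views, group counts
X = np.hstack([visual, textual, social])
y = rng.normal(size=n)                # stand-in for log(view count)

model = GradientBoostingRegressor().fit(X[:800], y[:800])
mse = np.mean((model.predict(X[800:]) - y[800:]) ** 2)
print(f"validation MSE: {mse:.3f}")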
Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery
Street-view imagery provides us with novel experiences to explore places
remotely. Carefully calibrated street-view images (e.g., Google Street View)
can be used for various downstream tasks, e.g., navigation and map feature
extraction. As personal high-quality cameras have become much more affordable
and portable, an enormous number of crowdsourced street-view images are
uploaded to the internet, but commonly with missing or noisy sensor
information. To bring this hidden treasure to "ready-to-use" status,
determining the missing location information and camera orientation angles are
two equally important tasks. Recent methods have achieved high performance on
geo-localization of street-view images by cross-view matching against a pool
of geo-referenced satellite imagery. However, most existing works focus more
on geo-localization than on estimating the image orientation. In this work, we
re-state the importance of finding fine-grained orientation for street-view
images, formally define the problem, and provide a set of evaluation metrics
to assess the quality of orientation estimation. We propose two methods to
improve the granularity of orientation estimation, achieving 82.4% and 72.3%
accuracy for images with estimated angle errors below 2 degrees on the CVUSA
and CVACT datasets respectively, corresponding to absolute improvements of
34.9% and 28.2% over previous works. Integrating fine-grained orientation
estimation into training also improves geo-localization performance, giving
top-1 recall of 95.5%/85.5% and 86.8%/80.4% for orientation known/unknown
tests on the two datasets.
Comment: This paper has been accepted by ACM Multimedia 2022. This version
contains additional supplementary material
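
One common way to realize orientation estimation in cross-view matching,
sketched below in Python, is to circularly correlate a ground-view feature
strip with a polar-transformed satellite feature strip over the azimuth axis;
the shift with maximal correlation maps to a heading angle. The shapes and the
FFT-based correlation are assumptions for illustration, not the paper's exact
method.

import numpy as np

def estimate_orientation(ground_feat, sat_feat):
    # ground_feat, sat_feat: (C, W) feature strips over W azimuth bins.
    corr = np.fft.ifft(
        np.fft.fft(sat_feat, axis=1) * np.conj(np.fft.fft(ground_feat, axis=1)),
        axis=1,
    ).real.sum(axis=0)              # circular cross-correlation per channel, summed
    shift = int(np.argmax(corr))
    return shift / ground_feat.shape[1] * 360.0   # columns -> degrees

g = np.random.rand(64, 128)
s = np.roll(g, 32, axis=1)          # satellite strip "rotated" by 90 degrees
print(estimate_orientation(g, s))   # prints 90.0

Granularity then comes down to the azimuth resolution W: with 128 bins each
column spans about 2.8 degrees, so finer estimates require denser strips or
sub-bin interpolation of the correlation peak.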
- …