79 research outputs found

Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment

Authors: Khan Muhammad Usman Ghani; Mukhtar Hamza
Publication date: 19/12/2023

Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics. Despite considerable advancements, existing MOT methodologies tend to falter when faced with non-uniform movements, occlusions, and appearance-reappearance scenarios of the objects. Recognizing this inadequacy, we put forward an integrated MOT method that not only marries object detection and identity linkage within a singular, end-to-end trainable framework but also equips the model with the ability to maintain object identity links over long periods of time. Our proposed model, named STMMOT, is built around four key modules: 1) candidate proposal generation, which generates object proposals via a vision-transformer encoder-decoder architecture that detects the object from each frame in the video; 2) scale variant pyramid, a progressive pyramid structure to learn the self-scale and cross-scale similarities in multi-scale feature maps; 3) spatio-temporal memory encoder, extracting the essential information from the memory associated with each object under tracking; and 4) spatio-temporal memory decoder, simultaneously resolving the tasks of object detection and identity association for MOT. Our system leverages a robust spatio-temporal memory module that retains extensive historical observations and effectively encodes them using an attention-based aggregator. The uniqueness of STMMOT lies in representing objects as dynamic query embeddings that are updated continuously, which enables the prediction of object states with attention mechanisms and eradicates the need for post-processing
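
The attention-based aggregation over an object's memory mentioned in this abstract can be illustrated with a minimal sketch. This is not the STMMOT implementation: the function name `attention_aggregate`, the plain scaled dot-product scoring, and the list-of-floats representation are all assumptions made for illustration.

```python
import math


def attention_aggregate(query, memory):
    """Summarise past embeddings of one object with softmax attention.

    query  -- current embedding of the tracked object (list of floats)
    memory -- past embeddings of the same object (list of lists, same length)
    Hypothetical helper; scaled dot-product scoring is an assumption.
    """
    dim = len(query)
    # Similarity of the query to each stored observation.
    scores = [
        sum(q * k for q, k in zip(query, past)) / math.sqrt(dim)
        for past in memory
    ]
    # Numerically stable softmax over the memory slots.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the stored embeddings.
    aggregated = [
        sum(w * past[i] for w, past in zip(weights, memory))
        for i in range(dim)
    ]
    return aggregated, weights


summary, weights = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy call the first memory slot matches the query more closely, so it receives the larger weight and dominates the aggregated embedding.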

arXiv.org e-Print Archive

Dynamic Switching State Systems for Visual Tracking

Author: Becker Stefan
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2020

This work addresses the problem of how to capture the dynamics of maneuvering objects for visual tracking. Towards this end, the perspective of recursive Bayesian filters and the perspective of deep learning approaches for state estimation are considered and their functional viewpoints are brought together

KITopen

Directory of Open Access Books (DOAB)

Dynamic Switching State Systems for Visual Tracking

Author: Becker Stefan
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020

KITopen

Directory of Open Access Books (DOAB)

Object Tracking Based on Satellite Videos: A Literature Review

Authors: Song J.; Wang C.; Xu Y.; Zhang Z.
Publication venue: MDPI AG
Publication date: 01/07/2022

Video satellites have recently become an attractive method of Earth observation, providing consecutive images of the Earth’s surface for continuous monitoring of specific events. The development of on-board optical and communication systems has enabled the various applications of satellite image sequences. However, satellite video-based target tracking is a challenging research topic in remote sensing due to its relatively low spatial and temporal resolution. Thus, this survey systematically investigates current satellite video-based tracking approaches and benchmark datasets, focusing on five typical tracking applications: traffic target tracking, ship tracking, typhoon tracking, fire tracking, and ice motion tracking. The essential aspects of each tracking target are summarized, such as the tracking architecture, the fundamental characteristics, primary motivations, and contributions. Furthermore, popular visual tracking benchmarks and their respective properties are discussed. Finally, a revised multi-level dataset based on WPAFB videos is generated and quantitatively evaluated for future development in the satellite video-based tracking area. In addition, 54.3% of the tracklets with lower Difficulty Score (DS) are selected and renamed as the Easy group, while 27.2% and 18.5% of the tracklets are grouped into the Medium-DS group and the Hard-DS group, respectively

Multidisciplinary Digital Publishing Institute

City Research Online

Directory of Open Access Journals

Multi-Modal Recognition of Manipulation Activities through Visual Accelerometer Tracking, Relational Histograms, and User-Adaptation

Author: Stein Sebastian
Publication date: 01/01/2014

University of Dundee Online Publications

Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification

Authors: G. Shivakumar; H. Y. Swathi
Publication venue: American Institute of Mathematical Sciences (AIMS)
Publication date: 01/05/2023

The rapid emergence of advanced software systems, low-cost hardware and decentralized cloud computing technologies has broadened the horizon for vision-based surveillance, monitoring and control. However, complex and inferior feature learning over visual artefacts or video streams, especially under extreme conditions, limits the majority of existing vision-based crowd analysis and classification systems. Retrieving event-sensitive or crowd-type-sensitive spatio-temporal features for the different crowd types under extreme conditions is a highly complex task. Consequently, accuracy and hence reliability are low, which limits existing methods for real-time crowd analysis. Despite numerous efforts in vision-based approaches, the lack of acoustic cues often creates ambiguity in crowd classification. On the other hand, the strategic amalgamation of audio-visual features can enable accurate and reliable crowd analysis and classification. With this motivation, this research develops a novel audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification. In this work, a hybrid feature extraction model was applied to extract deep spatio-temporal features by using the Gray-Level Co-occurrence Matrix (GLCM) and an AlexNet transfer learning model. After extracting the different GLCM features and AlexNet deep features, horizontal concatenation was done to fuse the different feature sets. Similarly, for acoustic feature extraction, the audio samples (from the input video) were processed for static (fixed size) sampling, pre-emphasis, block framing and Hann windowing, followed by extraction of acoustic features such as GTCC, GTCC-Delta, GTCC-Delta-Delta, MFCC, Spectral Entropy, Spectral Flux, Spectral Slope and Harmonics to Noise Ratio (HNR).
Finally, the extracted audio-visual features were fused to yield a composite multi-modal feature set, which is processed for classification using the random forest ensemble classifier. The multi-class classification yields a crowd-classification accuracy of 98.26%, precision of 98.89%, sensitivity of 94.82%, specificity of 95.57%, and F-Measure of 98.84%. The robustness of the proposed multi-modality-based crowd analysis model confirms its suitability for real-world crowd detection and classification tasks
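
The audio front end named in this abstract (pre-emphasis, block framing and Hann windowing) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function names, the pre-emphasis coefficient `alpha=0.97`, and the frame length and hop in the example are assumptions, since the abstract gives no parameter values.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hann_window(length):
    """Symmetric Hann window of the given length."""
    if length == 1:
        return [1.0]
    return [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and taper each with a Hann window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(
            [signal[start + i] * window[i] for i in range(frame_len)]
        )
    return frames


# Example with hypothetical sizes: 8 samples, frames of 4 with a hop of 2.
emphasised = pre_emphasis([1.0] * 8, alpha=0.5)
frames = frame_and_window(emphasised, frame_len=4, hop=2)
```

Each windowed frame would then be passed to a feature extractor (for example MFCC or spectral flux) before fusion with the visual features.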

Directory of Open Access Journals

Enhancing Sensor Performance with Statistical Data Analytics

Author: Wright James

This thesis examines the use of Automatic Identification System (AIS) information to generate a picture of maritime activity. It derives suitable methods to produce tracks of vessel movements, both in littoral and open-ocean scenarios, removing ambiguities and highlighting doppelgänger. The thesis then goes on to describe techniques to improve our understanding of maritime activities through the extraction of individual vessel behaviours and the generation of models describing normal behaviours to highlight abnormalities

University of Liverpool Repository

Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis

Authors: Cao Ziang; Fu Changhong; Li Bowen; Lu Geng; Lu Kunhan; Ye Junjie; Zheng Guangze
Publication date: 03/08/2022

Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide range of applications and attracted increasing attention in the field of intelligent transportation systems because of its versatility and effectiveness. As an emerging force in the revolutionary trend of deep learning, Siamese networks shine in UAV-based object tracking with their promising balance of accuracy, robustness, and speed. Thanks to the development of embedded processors and the gradual optimization of deep neural networks, Siamese trackers receive extensive research and realize preliminary combinations with UAVs. However, due to the UAV's limited onboard computational resources and the complex real-world circumstances, aerial tracking with Siamese networks still faces severe obstacles in many aspects. To further explore the deployment of Siamese networks in UAV-based tracking, this work presents a comprehensive review of leading-edge Siamese trackers, along with an exhaustive UAV-specific analysis based on the evaluation using a typical UAV onboard processor. Then, the onboard tests are conducted to validate the feasibility and efficacy of representative Siamese trackers in real-world UAV deployment. Furthermore, to better promote the development of the tracking community, this work analyzes the limitations of existing Siamese trackers and conducts additional experiments represented by low-illumination evaluations. In the end, prospects for the development of Siamese tracking for UAV-based intelligent transportation systems are deeply discussed. The unified framework of leading-edge Siamese trackers, i.e., code library, and the results of their experimental evaluations are available at https://github.com/vision4robotics/SiameseTracking4UAV

arXiv.org e-Print Archive

Robuste Detektion, Verfolgung und Wiedererkennung von Personen in Videodaten mit niedriger Auflösung [Robust Detection, Tracking, and Re-Identification of Persons in Low-Resolution Video Data]

Author: Metzler Jürgen
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2020

With the growing volume of image data in the video surveillance sector, the chance of solving crimes more effectively increases. However, this requires an immense effort to evaluate the footage, which often can no longer be fully reviewed by people without computer support. This work comprises methods and improvements based on novel person representations for the detection, tracking, and appearance-based re-identification of persons

KITopen