Faculty of Engineering, School of Electrical and Information Engineering
Abstract
This thesis advances visual object tracking through four key contributions that address current limitations in training data, network architecture, and tracking methodology. First, it proposes a refined sampling strategy for Siamese networks that enriches the training data. Second, it uses neural architecture search to develop a more efficient Partially Siamese Network that achieves superior performance on tracking benchmarks. Third, it streamlines tracking with a new transformer-based pipeline. Fourth, it introduces a speech-guided tracking framework that improves human-machine collaboration. These contributions are thoroughly validated and mark significant progress in the visual object tracking domain.