This thesis studies the on-line multiple object tracking (MOT) problem, which arises in numerous real-world applications, such as self-driving cars or estimating a target's trajectory over time to identify its movement pattern. An on-line MOT tracker faces four main challenges: (1) tracking the same target consistently and smoothly over time in the presence of occlusions, (2) recovering from fragmented tracks, (3) handling identity switches of the same target, and (4) operating in real time. This work aims to provide an efficient detect-and-track framework that addresses these challenges. To narrow down the classes of objects studied without losing the tracker's extensibility to generic objects, we pick \textit{pedestrians} as the primary objects of interest.
The proposed framework consists of four building blocks: object detection, object tracking, data association, and object re-identification. While most MOT frameworks assume that the detector is available in every frame, the proposed MOT tracker triggers the detector only periodically, e.g. every three frames, which improves efficiency. Detection is performed by the Single Shot Detector (SSD), which has proven efficient and effective on generic object classes. When the detector is triggered and active tracks exist, the data association module identifies the correspondence between the objects detected by the detector and those tracked by the tracker. When newly detected objects cannot be matched to any of the current tracks, the re-identification module attempts to find their correspondence in the track history.
The experiments show that the proposed framework is outperformed by recently published on-line MOT trackers that are based on different object detectors. However, the results suggest that the proposed framework's performance does not degrade when the detector is partially unavailable, and that it improves in certain conditions due to better temporal consistency. Based on these experiments, we identify the major shortcomings of the current framework and suggest possible improvements and directions for future work.