Search CORE

6 research outputs found

TVPR: Text-to-Video Person Retrieval and a New Benchmark

Authors: Dong Guan-Nan, Liu Hui, Ni Fan, Wu Jianhui, Zhang Xu, Zhang Yue, Zhu Aichun
Publication date: 14/07/2023

Most existing methods for text-based person retrieval focus on text-to-image person retrieval. However, because isolated frames provide no dynamic information, performance suffers when the person is occluded in those frames or when the textual description contains variable motion details. In this paper, we propose a new task called Text-to-Video Person Retrieval (TVPR), which aims to overcome the limitations of isolated frames. Since no existing dataset or benchmark describes person videos in natural language, we construct a large-scale cross-modal person video dataset with detailed natural-language annotations of, for example, a person's appearance, actions, and interactions with the environment. We term it the Text-to-Video Person Re-identification (TVPReid) dataset. We further propose a Text-to-Video Person Retrieval Network (TVPRN). Specifically, TVPRN obtains video representations by fusing the visual and motion representations of person videos, which lets it handle temporal occlusion and the absence of variable motion details in isolated frames. Meanwhile, we employ a pre-trained BERT model to obtain caption representations and use the relationship between caption and video representations to identify the most relevant person videos. To evaluate the effectiveness of TVPRN, we conduct extensive experiments on the TVPReid dataset. To the best of our knowledge, TVPRN is the first successful attempt to use video for the text-based person retrieval task, and it achieves state-of-the-art performance on the TVPReid dataset. The TVPReid dataset will be made publicly available to benefit future research.
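The retrieval step described above can be sketched as a cosine-similarity ranking. This is a minimal illustration, not TVPRN itself: it assumes per-frame visual and motion features already projected to a common dimension, fuses them by a weighted sum (`alpha` is a hypothetical parameter), and assumes a caption embedding (for example from BERT) that lives in the same space. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame features over time (temporal mean pooling)."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def fuse_video(visual_frames, motion_frames, alpha: float = 0.5) -> Vector:
    """Hypothetical fusion: weighted sum of pooled, normalized visual and motion features.

    Assumes both streams were already projected to the same dimension.
    """
    visual = l2_normalize(mean_pool(visual_frames))
    motion = l2_normalize(mean_pool(motion_frames))
    return l2_normalize([alpha * v + (1 - alpha) * m for v, m in zip(visual, motion)])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_videos(caption_emb: Sequence[float], video_embs: Sequence[Vector]) -> List[int]:
    """Return video indices ordered from most to least similar to the caption."""
    scores = [cosine(caption_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda i: scores[i], reverse=True)


# Toy usage with 2-D features; a real caption embedding would come from a text encoder.
videos = [
    fuse_video([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]),
    fuse_video([[0.0, 1.0]], [[0.0, 1.0]]),
]
print(rank_videos([1.0, 0.0], videos))  # → [0, 1]
```

Fusing before ranking means each video is reduced to one vector, so retrieval over a gallery is a single similarity pass per caption.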

arXiv.org e-Print Archive

Learning Person Re-identification Models from Videos with Weak Supervision

Authors: Liu Min, Paul Sujoy, Raychaudhuri Dripta S., Roy-Chowdhury Amit K., Wang Xueping, Wang Yaonan
Publication date: 21/07/2020

Most person re-identification methods, being supervised techniques, suffer from the burden of massive annotation requirements. Unsupervised methods remove the need for labeled data but perform poorly compared with supervised alternatives. To address this issue, we introduce the problem of learning person re-identification models from videos with weak supervision. The supervision is weak because it requires only video-level labels, i.e., the identities of the persons who appear in a video, rather than more precise frame-level annotations. Towards this goal, we propose a multiple instance attention learning framework for person re-identification that uses such video-level labels. Specifically, we first cast video-based person re-identification as a multiple instance learning problem in which the person images in a video are collected into a bag. The relations between videos with similar labels can be used to identify persons; on top of that, we introduce a co-person attention mechanism that mines the similarity correlations between videos that have person identities in common. The attention weights are computed from all person images in a video rather than from person tracklets, which makes the learned model less affected by noisy annotations. Extensive experiments on two weakly labeled person re-identification datasets demonstrate the superiority of the proposed method over related methods.
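A simplified sketch of the multiple instance setting with co-person attention follows, assuming each video is a bag of L2-normalized person-image features. The attention rule here (a softmax over each image's best match in a second video that shares an identity label, with a hypothetical `temperature`) is an illustrative stand-in, not the paper's exact formulation; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def co_person_attention(bag_a: Sequence[Vector], bag_b: Sequence[Vector],
                        temperature: float = 0.1) -> Tuple[List[float], Vector]:
    """Attend to the images in bag_a that best match some image in bag_b.

    Each bag holds (assumed L2-normalized) features of all person images detected
    in one video. Because the two videos share an identity label, images of that
    shared person should match strongly and receive most of the attention.
    """
    # Affinity of each instance in bag_a: its best match anywhere in bag_b.
    affinity = [max(dot(x, y) for y in bag_b) for x in bag_a]
    weights = softmax(affinity, temperature)
    # Attention-weighted pooling gives a bag-level feature for the shared person.
    pooled = [sum(w * x[d] for w, x in zip(weights, bag_a)) for d in range(len(bag_a[0]))]
    return weights, pooled


# Video A contains the shared person ([1, 0]) and a distractor ([0, 1]);
# video B contains only the shared person.
weights, pooled = co_person_attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print([round(w, 4) for w in weights])  # → [1.0, 0.0]
```

Because the affinity is computed over every image in a bag rather than over tracklets, a single mislabeled or badly tracked image only lowers its own weight instead of corrupting a whole track.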

arXiv.org e-Print Archive

eScholarship - University of California

Hierarchical Trajectory Matching Scheme for Wide-Area Multi-Pedestrian Tracking (광역 다중 보행자 추적을 위한 계층적 궤적 매칭 기법)

Author: 김기경
Publication venue: Graduate School, Seoul National University (서울대학교 대학원)
Publication date: 01/08/2020

Doctoral dissertation, Graduate School of Seoul National University, Department of Electrical and Computer Engineering, College of Engineering, August 2020. Advisor: 최진영.

The purpose of the wide-area tracking problem is to track pedestrians who appear on overlapping or non-overlapping cameras, regardless of the time interval or person density. In single-camera tracking, data association based on the overlap of detection boxes is used to solve the tracking problem, but it still suffers from appearance ambiguity. Wide-area tracking, however, requires a tracking scheme that focuses on the appearance similarity of people without relying on the overlap of detection boxes. In this dissertation, we propose a tracking scheme for Wide-area Multi-Pedestrian Tracking (WaMuPeT). To achieve WaMuPeT, we propose trajectory matching in overlapping camera settings (Ch. 3) and in non-overlapping camera settings (Ch. 4), and robust trajectory matching in dense scene settings (Ch. 5).

In trajectory matching in overlapping camera settings (Ch. 3), we propose a novel deep-learning architecture for accurate 3-D localization and tracking of pedestrians using multiple cameras. The architecture consists of two networks: a detection network and a localization network. The detection network yields pedestrian detections, and the localization network estimates the ground position of a pedestrian within its detection box. In addition, an attentional pass filter is introduced to connect the two networks effectively. Using the detection proposals and their 2-D ground positions obtained from the two networks, a multi-camera multi-target 3-D localization and tracking algorithm is developed with a min-cost network flow approach. Experiments show that the proposed method improves 3-D localization and tracking performance.

In trajectory matching in non-overlapping camera settings (Ch. 4), we propose a novel re-ranking method that uses a ranking-reflected metric to measure the similarity between two ordered sets of K-nearest neighbors (OKNN). The proposed ranking-reflected similarity (RSS) metric reflects the rankings of the elements shared by the two OKNNs. Using RSS, we propose a re-ranking procedure that prioritizes galleries whose neighbors are similar to the probe's neighbors in terms of ranking order. Experiments show that the proposed method improves Re-ID accuracy when added on top of state-of-the-art methods.

In robust trajectory matching in dense scene settings (Ch. 5), we propose a novel multi-pedestrian tracking framework that generates robust trajectories in dense scenes. The proposed tracking method is based on trajectory matching with a divide-and-conquer strategy, in which short-term, mid-term, and long-term trajectories are generated by the respective trajectory-merging stages. We also propose a novel deep-feature matching method called stable boundary selection (SBS). In SBS matching, detections are clustered by the group similarity of their deep features so that robust trajectories can be generated. Combined with smoothing algorithms and a detection restoration algorithm, the proposed tracking method achieves state-of-the-art tracking accuracy on three public tracking datasets.

Table of contents:
Chapter 1 Introduction: 1.1 Background; 1.2 Related Works (1.2.1 Localization of Pedestrian Detection; 1.2.2 Pedestrian Feature from Person Re-identification; 1.2.3 Multi-Pedestrian Tracking); 1.3 Contributions; 1.4 Thesis Organization
Chapter 2 Problem Statements: 2.1 Trajectory Matching in Overlapping Camera Settings (2.1.1 Challenges; 2.1.2 Approach for the challenges); 2.2 Trajectory Matching in Non-Overlapping Camera Settings (2.2.1 Challenges; 2.2.2 Approach for the challenges); 2.3 Robust Trajectory Matching in Dense Scene Settings (2.3.1 Challenges; 2.3.2 Approach for the challenges)
Chapter 3 Trajectory Matching in Overlapping Camera Settings: 3.1 Overall Scheme; 3.2 Network Design; 3.3 MCMTT with Proposed Network
Chapter 4 Trajectory Matching in Non-overlapping Camera Settings: 4.1 Overall Scheme; 4.2 Proposed Method (4.2.1 Proposed Similarity Metric; 4.2.2 Selection of A; 4.2.3 Re-ranking Procedure)
Chapter 5 Robust Trajectory Matching in Dense Scene Settings: 5.1 Overall Scheme; 5.2 Similarity Matrix Generation; 5.3 Stable Boundary Selection; 5.4 Trajectory Smoothing; 5.5 Detection Restoration; 5.6 Trajectory Merging Process
Chapter 6 Experiments: 6.1 Dataset and Evaluation Metric (6.1.1 Trajectory Matching in Overlapping Camera Settings; 6.1.2 Trajectory Matching in Non-overlapping Camera Settings; 6.1.3 Robust Trajectory Matching in Dense Scene Settings); 6.2 Results and Discussion (6.2.1 Trajectory Matching in Overlapping Camera Settings; 6.2.2 Trajectory Matching in Non-overlapping Camera Settings; 6.2.3 Robust Trajectory Matching in Dense Scene Settings)
Chapter 7 Conclusions and Future Works: 7.1 Concluding Remarks; 7.2 Future Works
Abstract
Docto
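The Ch. 4 re-ranking idea summarized above can be illustrated with ordered K-nearest-neighbor lists. The abstract does not give the RSS formula, so `ranking_similarity` below is a hypothetical rank-weighted overlap, not the dissertation's metric; the blend weight `lam` and all function names are likewise assumptions.

```python
from typing import List, Sequence


def ordered_knn(sim_row: Sequence[float], k: int, exclude: int) -> List[int]:
    """Indices of the k most similar items, best first, skipping `exclude` (the item itself)."""
    candidates = [j for j in range(len(sim_row)) if j != exclude]
    return sorted(candidates, key=lambda j: sim_row[j], reverse=True)[:k]


def ranking_similarity(knn_a: Sequence[int], knn_b: Sequence[int]) -> float:
    """Hypothetical ranking-aware overlap of two ordered neighbor lists, in [0, 1].

    A shared neighbor contributes 1 / (rank_in_a * rank_in_b), so neighbors ranked
    high in both lists count most. The score is normalized by the value obtained
    when knn_a is compared with itself.
    """
    rank_b = {item: r for r, item in enumerate(knn_b, start=1)}
    score = sum(1.0 / (r_a * rank_b[item])
                for r_a, item in enumerate(knn_a, start=1) if item in rank_b)
    best = sum(1.0 / (r * r) for r in range(1, len(knn_a) + 1))
    return score / best if best > 0 else 0.0


def rerank(sim: Sequence[Sequence[float]], probe: int, galleries: Sequence[int],
           k: int = 3, lam: float = 0.5) -> List[int]:
    """Order galleries by a blend of direct similarity and neighbor-ranking similarity.

    `sim` is a square similarity matrix over probe and gallery items.
    """
    probe_knn = ordered_knn(sim[probe], k, exclude=probe)

    def score(g: int) -> float:
        g_knn = ordered_knn(sim[g], k, exclude=g)
        return (1 - lam) * sim[probe][g] + lam * ranking_similarity(probe_knn, g_knn)

    return sorted(galleries, key=score, reverse=True)


print(ranking_similarity([1, 2], [2, 1]))  # → 0.8
```

With `lam=0` the ordering reduces to sorting by the original similarity; increasing `lam` lets galleries whose neighbor lists agree with the probe's, in the same order, move up.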

SNU Open Repository and Archive
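The divide-and-conquer merging in the Ch. 5 summary of the entry above can be sketched as repeated tracklet linking with progressively larger temporal gaps. This is a greedy illustration under stated assumptions (each tracklet carries a mean appearance feature; the gap and threshold values per stage are invented), and it does not implement stable boundary selection, smoothing, or detection restoration. `Tracklet`, `merge_stage`, and `hierarchical_merge` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Tracklet:
    start: int             # first frame index
    end: int               # last frame index
    feature: List[float]   # mean appearance feature (assumed available per tracklet)
    count: int = 1         # number of pieces averaged into `feature`


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def merge_stage(tracks: List[Tracklet], max_gap: int, min_sim: float) -> List[Tracklet]:
    """Repeatedly link the most similar pair (a, b) where b starts within max_gap frames after a ends."""
    tracks = list(tracks)
    while True:
        best: Tuple[float, int, int] = (min_sim, -1, -1)
        for i, a in enumerate(tracks):
            for j, b in enumerate(tracks):
                gap = b.start - a.end
                if i != j and 0 < gap <= max_gap:
                    s = cosine(a.feature, b.feature)
                    if s >= best[0]:
                        best = (s, i, j)
        _, i, j = best
        if i < 0:  # no pair passes the gap and similarity tests
            return sorted(tracks, key=lambda t: t.start)
        a, b = tracks[i], tracks[j]
        n = a.count + b.count
        feat = [(x * a.count + y * b.count) / n for x, y in zip(a.feature, b.feature)]
        rest = [t for k, t in enumerate(tracks) if k not in (i, j)]
        tracks = rest + [Tracklet(a.start, b.end, feat, n)]


def hierarchical_merge(tracks: List[Tracklet],
                       stages=((1, 0.9), (10, 0.8), (50, 0.7))) -> List[Tracklet]:
    """Short-, mid-, then long-term stages: larger gaps, looser thresholds (hypothetical values)."""
    for max_gap, min_sim in stages:
        tracks = merge_stage(tracks, max_gap, min_sim)
    return tracks


pieces = [
    Tracklet(0, 9, [1.0, 0.0]),
    Tracklet(10, 19, [1.0, 0.0]),    # same person, 1-frame gap: short-term merge
    Tracklet(12, 30, [0.0, 1.0]),    # a different person
    Tracklet(40, 49, [0.95, 0.05]),  # same person after a 21-frame gap: long-term merge
]
for t in hierarchical_merge(pieces):
    print(t.start, t.end)  # prints "0 49", then "12 30"
```

Growing the allowed gap stage by stage lets confident short links form first, so later long-term matches compare averaged features of longer trajectories rather than single noisy pieces.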

Spatial-Temporal Attention-Aware Learning for Video-Based Person Re-Identification

Authors: Guangyi Chen, Jie Zhou, Jiwen Lu, Ming Yang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref