Incorporating Language-Driven Appearance Knowledge Units with Visual Cues in Pedestrian Detection
Large language models (LLMs) have shown their capability in understanding
contextual and semantic information regarding appearance knowledge of
instances. In this paper, we introduce a novel approach to utilize the strength
of an LLM in understanding contextual appearance variations and to transfer its
knowledge into a vision model (here, pedestrian detection). While pedestrian
detection is considered one of the crucial tasks directly related to our safety
(e.g., intelligent driving systems), it is challenging because of varying
appearances and poses in diverse scenes. Therefore, we propose to formulate
language-driven appearance knowledge units and incorporate them with visual
cues in pedestrian detection. To this end, we establish a description corpus
which includes numerous narratives describing various appearances of
pedestrians and others. By feeding them through an LLM, we extract appearance
knowledge sets that contain the representations of appearance variations. After
that, we perform a task-prompting process to obtain appearance knowledge units
which are representative appearance knowledge guided to be relevant to a
downstream pedestrian detection task. Finally, we provide plentiful appearance
information by integrating the language-driven knowledge units with visual
cues. Through comprehensive experiments with various pedestrian detectors, we
verify the effectiveness of our method showing noticeable performance gains and
achieving state-of-the-art detection performance.
Comment: 11 pages, 4 figures, 9 tables
Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis
The advent of data-driven technology solutions is accompanied by an
increasing concern with data privacy. This is of particular importance for
human-centered image recognition tasks, such as pedestrian detection,
re-identification, and tracking. To highlight the importance of privacy issues
and encourage future research, we motivate and introduce the Pedestrian Dataset
De-Identification (PDI) task. PDI evaluates the degree of de-identification and
downstream task training performance for a given de-identification method. As a
first baseline, we propose IncogniMOT, a two-stage full-body de-identification
pipeline based on image synthesis via generative adversarial networks. The
first stage replaces target pedestrians with synthetic identities. To improve
downstream task performance, we then apply stage two, which blends and adapts
the synthetic image parts into the data. To demonstrate the effectiveness of
IncogniMOT, we generate a fully de-identified version of the MOT17 pedestrian
tracking dataset and analyze its application as training data for pedestrian
re-identification, detection, and tracking models. Furthermore, we show how our
data is able to narrow the synthetic-to-real performance gap in a
privacy-conscious manner.