HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
Model pre-training is essential in human-centric perception. In this paper,
we first introduce masked image modeling (MIM) as a pre-training approach for
this task. Upon revisiting the MIM training strategy, we reveal that human
structure priors offer significant potential. Motivated by this insight, we
further incorporate an intuitive human structure prior, human parts, into
pre-training. Specifically, we employ this prior to guide the mask sampling
process: image patches corresponding to human part regions are given high
priority to be masked out. This encourages the model to concentrate more on
body structure information during pre-training, yielding substantial benefits
across a range of human-centric perception tasks. To further capture human
characteristics, we propose a structure-invariant alignment loss that enforces
different masked views of the same image, guided by the human part prior, to
be closely aligned. We term the entire method HAP. HAP uses a plain ViT as the
encoder yet establishes new state-of-the-art performance on 11 human-centric
benchmarks and on-par results on one other dataset. For example, HAP achieves
78.1% mAP on MSMT17 for person re-identification, 86.54% mA on PA-100K for
pedestrian attribute recognition, 78.2% AP on MS COCO for 2D pose estimation,
and 56.0 PA-MPJPE on 3DPW for 3D pose and shape estimation.
Comment: Accepted by NeurIPS 202
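The part-guided mask sampling described above can be illustrated with a minimal sketch. It assumes the indices of patches overlapping human-part regions are already known, and it simply masks those patches first, filling any remaining budget with random background patches. The abstract does not give HAP's exact sampling rule, so `part_guided_mask` and its parameters are hypothetical.

```python
import random


def part_guided_mask(num_patches, part_patches, mask_ratio=0.75, seed=None):
    """Return sorted indices of patches to mask, prioritizing human-part patches.

    part_patches: indices of patches overlapping human-part regions
    (hypothetical input; how they are derived is outside this sketch).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    num_mask = int(round(num_patches * mask_ratio))

    part_set = {i for i in part_patches if 0 <= i < num_patches}
    parts = sorted(part_set)
    background = [i for i in range(num_patches) if i not in part_set]

    # Shuffle within each group so ties among equal-priority patches are random.
    rng.shuffle(parts)
    rng.shuffle(background)

    # Part patches come first, so they are masked before any background patch.
    chosen = (parts + background)[:num_mask]
    return sorted(chosen)


# 196 patches = a 14 x 14 grid; patches 50..79 are assumed to cover body parts.
mask = part_guided_mask(196, part_patches=range(50, 80), mask_ratio=0.5, seed=0)
print(len(mask))
```

Here all 30 part patches are masked and the other 68 masked patches are drawn from the background. When part patches exceed the budget, a random subset of them is masked instead.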
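The structure-invariant alignment loss pulls the embeddings of two part-guided masked views of the same image together. The abstract does not state its exact form (the paper may, for example, also contrast against other images), so this sketch assumes the simplest version: the mean of one minus the cosine similarity over a batch of paired embeddings. The function names are hypothetical, and embeddings are plain lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def alignment_loss(view_a, view_b):
    """Mean (1 - cosine similarity) over paired embeddings.

    view_a[i] and view_b[i] are assumed to embed two differently masked
    views of the same image i; the loss is 0 when every pair points the
    same way and 2 when every pair points in opposite directions.
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("need equally sized, non-empty batches")
    total = sum(1.0 - cosine_similarity(a, b) for a, b in zip(view_a, view_b))
    return total / len(view_a)


print(alignment_loss([[1.0, 0.0]], [[0.0, 1.0]]))  # orthogonal views
```

Because only the direction of each embedding matters, this loss ignores scale differences between the two views.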
Deepfakes for Medical Video De-Identification: Privacy Protection and Diagnostic Information Preservation
Data sharing for medical research has been difficult because open-sourcing
clinical data may violate patient privacy. Traditional methods for face
de-identification wipe out facial information entirely, making it impossible to
analyze facial behavior. Recent advances in whole-body keypoint detection also
rely on facial input to estimate body keypoints. Both facial and body keypoints
are critical in some medical diagnoses, so keeping keypoints invariant after
de-identification is of great importance. Here, we propose a solution based on
deepfake technology, namely face swapping. Although face swapping has been
criticized for infringing on privacy and portrait rights, it can conversely
protect privacy in medical video: patients' faces can be swapped to a suitable
target face and become unrecognizable. However, it remained an open question to
what extent swapping-based de-identification affects the automatic detection of
body keypoints. In this study, we apply deepfake technology to Parkinson's
disease examination videos to de-identify subjects, and quantitatively show
that face swapping is a reliable de-identification approach that keeps
keypoints almost invariant, performing significantly better than traditional
methods. This study proposes a pipeline for video de-identification and
keypoint preservation, clearing up some ethical restrictions on medical data
sharing. This work could make open-source, high-quality medical video datasets
more feasible and promote future medical research that benefits our society.
Comment: Accepted for publication at the AAAI/ACM Conference on Artificial
Intelligence, Ethics, and Society (AIES) 202
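One way to quantify how well keypoints survive de-identification is to compare the keypoints detected on the original and face-swapped frames. The abstract does not name its metric, so this sketch assumes two simple ones: the mean Euclidean displacement per keypoint, and the fraction of keypoints that move by at most a pixel threshold. The input format (frames as lists of `(x, y)` keypoints in a fixed order) and all function names are hypothetical.

```python
import math


def _displacements(original, deidentified):
    """Yield the Euclidean distance moved by each corresponding keypoint."""
    if len(original) != len(deidentified):
        raise ValueError("frame counts differ")
    for frame_o, frame_d in zip(original, deidentified):
        if len(frame_o) != len(frame_d):
            raise ValueError("keypoint counts differ within a frame")
        for (xo, yo), (xd, yd) in zip(frame_o, frame_d):
            yield math.hypot(xo - xd, yo - yd)


def mean_keypoint_displacement(original, deidentified):
    """Average distance (in pixels) a keypoint moves after de-identification."""
    dists = list(_displacements(original, deidentified))
    if not dists:
        raise ValueError("no keypoints to compare")
    return sum(dists) / len(dists)


def fraction_within(original, deidentified, threshold):
    """Fraction of keypoints that move by at most `threshold` pixels."""
    dists = list(_displacements(original, deidentified))
    if not dists:
        raise ValueError("no keypoints to compare")
    return sum(d <= threshold for d in dists) / len(dists)


# One frame with two keypoints; the first shifts by (3, 4), i.e. 5 pixels.
orig = [[(0.0, 0.0), (10.0, 10.0)]]
deid = [[(3.0, 4.0), (10.0, 10.0)]]
print(mean_keypoint_displacement(orig, deid), fraction_within(orig, deid, 1.0))
```

A lower mean displacement and a higher within-threshold fraction both indicate that de-identification left the keypoints closer to invariant.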
Using Prior Knowledge for Verification and Elimination of Stationary and Variable Objects in Real-time Images
With evolving technologies in the autonomous vehicle industry, it has become possible for passengers to sit back and relax instead of driving the car. Technologies such as object detection, object identification, and image segmentation enable an autonomous car to detect and identify objects on the road so that it can drive safely. While an autonomous car drives itself, the objects surrounding it can be dynamic (e.g., cars and pedestrians), stationary (e.g., buildings and benches), or variable (e.g., trees), depending on whether an object's location or shape changes. Unlike existing image-based approaches that detect and recognize objects in the scene, this research employs a 3D virtual world to verify and eliminate stationary and variable objects, allowing the autonomous car to focus on dynamic objects that may endanger its driving. The methodology takes advantage of prior knowledge of the stationary and variable objects present in a virtual city and verifies their existence in a real-time scene by matching keypoints between the virtual and real objects. When a stationary or variable object does not exist in the virtual world because the pre-existing information is incomplete, the method uses machine learning for object detection. Verified objects are then removed from the real-time image by a combined algorithm using contour detection and a class activation map (CAM), which helps to improve efficiency and accuracy when recognizing moving objects.
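The abstract says an object's existence is verified by matching keypoints between the virtual and real objects, but not how a match is accepted. This sketch substitutes nearest-neighbour descriptor matching with Lowe's ratio test, a common choice that may differ from the authors' method. Descriptors are plain float tuples compared by Euclidean distance, and `good_matches`, `verify_object`, `ratio` and `min_matches` are hypothetical names and thresholds.

```python
import math


def _dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def good_matches(virtual_desc, real_desc, ratio=0.75):
    """Return (i, j) pairs matching virtual descriptor i to real descriptor j.

    A match is kept only if the nearest real descriptor is clearly closer
    than the second nearest (Lowe's ratio test), rejecting ambiguous matches.
    """
    if len(real_desc) < 2:
        return []
    matches = []
    for i, v in enumerate(virtual_desc):
        ranked = sorted((_dist(v, r), j) for j, r in enumerate(real_desc))
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


def verify_object(virtual_desc, real_desc, ratio=0.75, min_matches=10):
    """Treat the virtual object as present if enough keypoints match."""
    return len(good_matches(virtual_desc, real_desc, ratio)) >= min_matches


virtual = [(0.0, 0.0), (10.0, 10.0)]
real = [(0.1, 0.0), (10.0, 10.2), (50.0, 50.0)]
print(good_matches(virtual, real), verify_object(virtual, real, min_matches=2))
```

With real images the descriptors would come from a float-valued feature extractor such as SIFT; that step, and the brute-force search used here, would normally be replaced by a library matcher.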
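For the removal step, the abstract names contour detection and a class activation map (CAM) but not how the two are combined. As an assumption, this sketch removes a pixel only where the filled contour mask of a verified object and a thresholded CAM agree. Images, masks and CAMs are equal-sized nested lists, and `remove_verified_object`, `cam_threshold` and `fill` are hypothetical.

```python
def remove_verified_object(image, contour_mask, cam, cam_threshold=0.5, fill=0):
    """Return a copy of `image` with the verified object's pixels set to `fill`.

    A pixel is removed only if it lies inside the object's filled contour
    (truthy in `contour_mask`) AND its CAM activation is >= `cam_threshold`.
    """
    if not (len(image) == len(contour_mask) == len(cam)):
        raise ValueError("image, mask and CAM must have the same height")
    result = []
    for row_img, row_mask, row_cam in zip(image, contour_mask, cam):
        if not (len(row_img) == len(row_mask) == len(row_cam)):
            raise ValueError("image, mask and CAM must have the same width")
        result.append([
            fill if inside and act >= cam_threshold else pixel
            for pixel, inside, act in zip(row_img, row_mask, row_cam)
        ])
    return result


image = [[1, 2], [3, 4]]
contour_mask = [[1, 1], [0, 1]]
cam = [[0.9, 0.2], [0.9, 0.6]]
print(remove_verified_object(image, contour_mask, cam))
```

Requiring both signals keeps pixels that the contour encloses but the CAM does not attribute to the object, and pixels the CAM highlights outside the contour.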
- …