V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose
estimation from a single depth map are based on a common framework that takes a
2D depth map and directly regresses the 3D coordinates of keypoints, such as
hand or human body joints, via 2D convolutional neural networks (CNNs). The
first weakness of this approach is the presence of perspective distortion in
the 2D depth map. While the depth map is intrinsically 3D data, many previous
methods treat depth maps as 2D images that can distort the shape of the actual
object through projection from 3D to 2D space. This compels the network to
perform perspective distortion-invariant estimation. The second weakness of the
conventional approach is that directly regressing 3D coordinates from a 2D
image is a highly non-linear mapping, which causes difficulty in the learning
procedure. To overcome these weaknesses, we first cast the 3D hand and human
pose estimation problem from a single depth map into a voxel-to-voxel
prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood
for each keypoint. We design our model as a 3D CNN that provides accurate
estimates while running in real-time. Our system outperforms previous methods
in almost all publicly available 3D hand and human pose estimation datasets and
placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
The code is available at https://github.com/mks0601/V2V-PoseNet_RELEASE.
Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV 2017), Published at CVPR 2018
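As a rough illustration of the voxel-to-voxel idea described above (a minimal sketch, not the authors' implementation; the real V2V-PoseNet is a much deeper hourglass-style 3D CNN), the snippet below voxelizes a depth-derived point cloud into a binary occupancy grid and predicts one per-voxel likelihood volume per keypoint, assuming PyTorch and made-up grid/cube-size parameters:

    import numpy as np
    import torch
    import torch.nn as nn

    def voxelize(points, grid=32, cube=250.0):
        # Map 3D points (N, 3) in mm to a binary occupancy grid of shape (grid, grid, grid).
        center = points.mean(axis=0)
        idx = np.floor((points - center + cube / 2.0) / cube * grid).astype(int)
        idx = idx[((idx >= 0) & (idx < grid)).all(axis=1)]
        vox = np.zeros((grid, grid, grid), dtype=np.float32)
        vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
        return vox

    class TinyV2V(nn.Module):
        # Toy stand-in for the paper's network: maps an occupancy grid to
        # per-voxel likelihoods, one output channel per keypoint.
        def __init__(self, num_joints=21):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
                nn.Conv3d(32, num_joints, 1),  # one likelihood volume per keypoint
            )

        def forward(self, x):      # x: (B, 1, D, H, W) occupancy grid
            return self.net(x)     # (B, J, D, H, W) per-voxel likelihoods

    points = np.random.rand(2000, 3) * 200.0 - 100.0        # fake depth points in mm
    vox = torch.from_numpy(voxelize(points))[None, None]    # (1, 1, 32, 32, 32)
    heatmaps = TinyV2V()(vox)                                # (1, 21, 32, 32, 32)
    flat = heatmaps.flatten(2).argmax(-1)                    # best voxel index per joint
    G = vox.shape[-1]
    voxel_xyz = torch.stack([flat // (G * G), (flat // G) % G, flat % G], dim=-1)
    print(voxel_xyz.shape)                                   # (1, 21, 3)

Taking the argmax of each likelihood volume and mapping the voxel index back to metric space is what replaces the highly non-linear direct regression of 3D coordinates criticized in the abstract.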
Masked Discrimination for Self-Supervised Learning on Point Clouds
Masked autoencoding has achieved great success for self-supervised learning
in the image and language domains. However, mask-based pretraining has yet to
show benefits for point cloud understanding, likely due to standard backbones
like PointNet being unable to properly handle the training versus testing
distribution mismatch introduced by masking during training. In this paper, we
bridge this gap by proposing a discriminative mask pretraining Transformer
framework, MaskPoint, for point clouds. Our key idea is to represent the point
cloud as discrete occupancy values (1 if part of the point cloud; 0 if not),
and perform simple binary classification between masked object points and
sampled noise points as the proxy task. In this way, our approach is robust to
the point sampling variance in point clouds, and facilitates learning rich
representations. We evaluate our pretrained models across several downstream
tasks, including 3D shape classification, segmentation, and real-world object
detection, and demonstrate state-of-the-art results while achieving a
significant pretraining speedup (e.g., 4.1x on ScanNet) compared to the prior
state-of-the-art Transformer baseline. Code is available at
https://github.com/haotian-liu/MaskPoint.
Comment: ECCV 2022; Code: https://github.com/haotian-liu/MaskPoint
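A minimal sketch of the occupancy-classification proxy task described above, assuming PyTorch and with plain MLPs standing in for the paper's Transformer encoder and decoder: query points drawn from the masked portion are labeled 1 (occupied), points sampled uniformly in the bounding box are labeled 0, and a classifier conditioned on features of the visible points is trained with binary cross-entropy.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def split_mask(points, ratio=0.9):
        # Randomly mask a fraction of the points; return (visible, masked).
        perm = torch.randperm(len(points))
        k = int(len(points) * ratio)
        return points[perm[k:]], points[perm[:k]]

    def make_queries(masked, num=512):
        # Real queries come from the masked surface (label 1); fake queries are
        # noise points sampled uniformly in the bounding box (label 0).
        lo, hi = masked.min(0).values, masked.max(0).values
        real = masked[torch.randint(len(masked), (num,))]
        noise = torch.rand(num, 3) * (hi - lo) + lo
        return torch.cat([real, noise]), torch.cat([torch.ones(num), torch.zeros(num)])

    encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
    decoder = nn.Sequential(nn.Linear(64 + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    points = torch.rand(1024, 3)                      # stand-in point cloud
    visible, masked = split_mask(points)
    context = encoder(visible).max(0).values          # global feature of visible points
    queries, labels = make_queries(masked)
    logits = decoder(torch.cat([context.expand(len(queries), -1), queries], 1)).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()                                   # pretraining signal for the encoder
    print(float(loss))

Because the labels depend only on whether a query point lies on the object, the task is insensitive to which particular points happen to be sampled, which is the robustness to point-sampling variance the abstract highlights.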
Superconducting transition of a two-dimensional Josephson junction array in weak magnetic fields
The superconducting transition of a two-dimensional (2D) Josephson junction
array exposed to weak magnetic fields has been studied experimentally.
Resistance measurements reveal a superconducting-resistive phase boundary in
serious disagreement with the theoretical and numerical expectations. Critical
scaling analyses of the characteristics indicate, contrary to these
expectations, that the superconducting-to-resistive transition in weak magnetic
fields is associated with a melting transition of magnetic-field-induced
vortices directly from a pinned-solid phase to a liquid phase. The expected
depinning transition of vortices from a pinned-solid phase to an intermediate
floating-solid phase was not observed. We discuss effects of the
disorder-induced random pinning potential on phase transitions of vortices in a
2D Josephson junction array.
Comment: 9 pages, 7 figures (EPS+JPG format), RevTeX
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Recently, large language models (LLMs) have made significant advancements in
natural language understanding and generation. However, their potential in
computer vision remains largely unexplored. In this paper, we introduce a new,
exploratory approach that enables LLMs to process images using the Scalable
Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions
of SVG representations instead of raster images, we aim to bridge the gap
between the visual and textual modalities, allowing LLMs to directly understand
and manipulate images without the need for parameterized visual components. Our
method facilitates simple image classification, generation, and in-context
learning using only LLM capabilities. We demonstrate the promise of our
approach across discriminative and generative tasks, highlighting its (i)
robustness against distribution shift, (ii) substantial improvements achieved
by tapping into the in-context learning abilities of LLMs, and (iii) image
understanding and generation capabilities with human guidance. Our code, data,
and models are available at https://github.com/mu-cai/svg-llm.
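A minimal sketch of the prompting idea, not the authors' pipeline: because the image is plain SVG/XML text, a few-shot classification prompt can be assembled directly from the markup and sent to any text-only chat LLM. The SVG strings and the query_llm hook below are illustrative placeholders, not part of the released code.

    EXAMPLES = [
        ('<svg xmlns="http://www.w3.org/2000/svg"><circle cx="16" cy="16" r="12"/></svg>', "circle"),
        ('<svg xmlns="http://www.w3.org/2000/svg"><rect x="4" y="4" width="24" height="24"/></svg>', "square"),
    ]

    def build_prompt(query_svg, examples=EXAMPLES):
        # Assemble a few-shot classification prompt directly from SVG source text.
        parts = ["Classify the shape described by each SVG document."]
        for svg, label in examples:
            parts.append(f"SVG: {svg}\nLabel: {label}")
        parts.append(f"SVG: {query_svg}\nLabel:")
        return "\n\n".join(parts)

    query = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="8" cy="8" r="6"/></svg>'
    print(build_prompt(query))
    # answer = query_llm(build_prompt(query))  # hypothetical hook: plug in any chat-completion API

Keeping the image in a textual format is what lets the same in-context learning machinery used for language tasks carry over to simple image classification and generation.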