V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose
estimation from a single depth map are based on a common framework that takes a
2D depth map and directly regresses the 3D coordinates of keypoints, such as
hand or human body joints, via 2D convolutional neural networks (CNNs). The
first weakness of this approach is the presence of perspective distortion in
the 2D depth map. While the depth map is intrinsically 3D data, many previous
methods treat depth maps as 2D images that can distort the shape of the actual
object through projection from 3D to 2D space. This compels the network to
perform perspective distortion-invariant estimation. The second weakness of the
conventional approach is that directly regressing 3D coordinates from a 2D
image is a highly non-linear mapping, which causes difficulty in the learning
procedure. To overcome these weaknesses, we first cast the 3D hand and human
pose estimation problem from a single depth map into a voxel-to-voxel
prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood
for each keypoint. We design our model as a 3D CNN that provides accurate
estimates while running in real-time. Our system outperforms previous methods
in almost all publicly available 3D hand and human pose estimation datasets and
placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
The code is available at https://github.com/mks0601/V2V-PoseNet_RELEASE.
Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV 2017), Published at CVPR 2018
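As a rough illustration of the voxel-to-voxel idea described above (a minimal sketch, not the authors' implementation; the real V2V-PoseNet is a much deeper hourglass-style 3D CNN), the snippet below voxelizes a depth-derived point cloud into a binary occupancy grid and predicts one per-voxel likelihood volume per keypoint, assuming PyTorch and made-up grid/cube-size parameters:

    import numpy as np
    import torch
    import torch.nn as nn

    def voxelize(points, grid=32, cube=250.0):
        # Map 3D points (N, 3) in mm to a binary occupancy grid of shape (grid, grid, grid).
        center = points.mean(axis=0)
        idx = np.floor((points - center + cube / 2.0) / cube * grid).astype(int)
        idx = idx[((idx >= 0) & (idx < grid)).all(axis=1)]
        vox = np.zeros((grid, grid, grid), dtype=np.float32)
        vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
        return vox

    class TinyV2V(nn.Module):
        # Toy stand-in for the paper's network: maps an occupancy grid to
        # per-voxel likelihoods, one output channel per keypoint.
        def __init__(self, num_joints=21):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
                nn.Conv3d(32, num_joints, 1),  # one likelihood volume per keypoint
            )

        def forward(self, x):      # x: (B, 1, D, H, W) occupancy grid
            return self.net(x)     # (B, J, D, H, W) per-voxel likelihoods

    points = np.random.rand(2000, 3) * 200.0 - 100.0        # fake depth points in mm
    vox = torch.from_numpy(voxelize(points))[None, None]    # (1, 1, 32, 32, 32)
    heatmaps = TinyV2V()(vox)                                # (1, 21, 32, 32, 32)
    flat = heatmaps.flatten(2).argmax(-1)                    # best voxel index per joint
    G = vox.shape[-1]
    voxel_xyz = torch.stack([flat // (G * G), (flat // G) % G, flat % G], dim=-1)
    print(voxel_xyz.shape)                                   # (1, 21, 3)

Taking the argmax of each likelihood volume and mapping the voxel index back to metric space is what replaces the highly non-linear direct regression of 3D coordinates criticized in the abstract.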
Masked Discrimination for Self-Supervised Learning on Point Clouds
Masked autoencoding has achieved great success for self-supervised learning
in the image and language domains. However, mask-based pretraining has yet to
show benefits for point cloud understanding, likely due to standard backbones
like PointNet being unable to properly handle the training versus testing
distribution mismatch introduced by masking during training. In this paper, we
bridge this gap by proposing a discriminative mask pretraining Transformer
framework, MaskPoint, for point clouds. Our key idea is to represent the point
cloud as discrete occupancy values (1 if part of the point cloud; 0 if not),
and perform simple binary classification between masked object points and
sampled noise points as the proxy task. In this way, our approach is robust to
the point sampling variance in point clouds, and facilitates learning rich
representations. We evaluate our pretrained models across several downstream
tasks, including 3D shape classification, segmentation, and real-world object
detection, and demonstrate state-of-the-art results while achieving a
significant pretraining speedup (e.g., 4.1x on ScanNet) compared to the prior
state-of-the-art Transformer baseline. Code is available at
https://github.com/haotian-liu/MaskPoint.
Comment: ECCV 2022; Code: https://github.com/haotian-liu/MaskPoint
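A minimal sketch of the occupancy-classification proxy task described above, assuming PyTorch and with plain MLPs standing in for the paper's Transformer encoder and decoder: query points drawn from the masked portion are labeled 1 (occupied), points sampled uniformly in the bounding box are labeled 0, and a classifier conditioned on features of the visible points is trained with binary cross-entropy.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def split_mask(points, ratio=0.9):
        # Randomly mask a fraction of the points; return (visible, masked).
        perm = torch.randperm(len(points))
        k = int(len(points) * ratio)
        return points[perm[k:]], points[perm[:k]]

    def make_queries(masked, num=512):
        # Real queries come from the masked surface (label 1); fake queries are
        # noise points sampled uniformly in the bounding box (label 0).
        lo, hi = masked.min(0).values, masked.max(0).values
        real = masked[torch.randint(len(masked), (num,))]
        noise = torch.rand(num, 3) * (hi - lo) + lo
        return torch.cat([real, noise]), torch.cat([torch.ones(num), torch.zeros(num)])

    encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
    decoder = nn.Sequential(nn.Linear(64 + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    points = torch.rand(1024, 3)                      # stand-in point cloud
    visible, masked = split_mask(points)
    context = encoder(visible).max(0).values          # global feature of visible points
    queries, labels = make_queries(masked)
    logits = decoder(torch.cat([context.expand(len(queries), -1), queries], 1)).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()                                   # pretraining signal for the encoder
    print(float(loss))

Because the labels depend only on whether a query point lies on the object, the task is insensitive to which particular points happen to be sampled, which is the robustness to point-sampling variance the abstract highlights.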
Superconducting transition of a two-dimensional Josephson junction array in weak magnetic fields
The superconducting transition of a two-dimensional (2D) Josephson junction
array exposed to weak magnetic fields has been studied experimentally.
Resistance measurements reveal a superconducting-resistive phase boundary in
serious disagreement with the theoretical and numerical expectations. Critical
scaling analyses of the characteristics indicate, contrary to these
expectations, that the superconducting-to-resistive transition in weak magnetic
fields is associated with a melting transition of magnetic-field-induced
vortices directly from a pinned-solid phase to a liquid phase. The expected
depinning transition of vortices from a pinned-solid phase to an intermediate
floating-solid phase was not observed. We discuss effects of the
disorder-induced random pinning potential on phase transitions of vortices in a
2D Josephson junction array.
Comment: 9 pages, 7 figures (EPS+JPG format), RevTeX
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Recently, large language models (LLMs) have made significant advancements in
natural language understanding and generation. However, their potential in
computer vision remains largely unexplored. In this paper, we introduce a new,
exploratory approach that enables LLMs to process images using the Scalable
Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions
of SVG representations instead of raster images, we aim to bridge the gap
between the visual and textual modalities, allowing LLMs to directly understand
and manipulate images without the need for parameterized visual components. Our
method facilitates simple image classification, generation, and in-context
learning using only LLM capabilities. We demonstrate the promise of our
approach across discriminative and generative tasks, highlighting its (i)
robustness against distribution shift, (ii) substantial improvements achieved
by tapping into the in-context learning abilities of LLMs, and (iii) image
understanding and generation capabilities with human guidance. Our code, data,
and models are available at https://github.com/mu-cai/svg-llm.
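A minimal sketch of the prompting idea, not the authors' pipeline: because the image is plain SVG/XML text, a few-shot classification prompt can be assembled directly from the markup and sent to any text-only chat LLM. The SVG strings and the query_llm hook below are illustrative placeholders, not part of the released code.

    EXAMPLES = [
        ('<svg xmlns="http://www.w3.org/2000/svg"><circle cx="16" cy="16" r="12"/></svg>', "circle"),
        ('<svg xmlns="http://www.w3.org/2000/svg"><rect x="4" y="4" width="24" height="24"/></svg>', "square"),
    ]

    def build_prompt(query_svg, examples=EXAMPLES):
        # Assemble a few-shot classification prompt directly from SVG source text.
        parts = ["Classify the shape described by each SVG document."]
        for svg, label in examples:
            parts.append(f"SVG: {svg}\nLabel: {label}")
        parts.append(f"SVG: {query_svg}\nLabel:")
        return "\n\n".join(parts)

    query = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="8" cy="8" r="6"/></svg>'
    print(build_prompt(query))
    # answer = query_llm(build_prompt(query))  # hypothetical hook: plug in any chat-completion API

Keeping the image in a textual format is what lets the same in-context learning machinery used for language tasks carry over to simple image classification and generation.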