Point-to-Pose Voting based Hand Pose Estimation using Residual Permutation Equivariant Layer
Recently, hand pose estimation methods based on 3D input data have shown
state-of-the-art performance, because 3D data capture more spatial information
than a depth image does. However, 3D voxel-based methods need a large amount of
memory, while PointNet-based methods require tedious preprocessing steps such as
a K-nearest-neighbour search for each point. In this paper, we present a novel
deep learning hand pose estimation method for an unordered point cloud. Our
method takes 1024 3D points as input and does not require additional
information. We use the Permutation Equivariant Layer (PEL) as the basic element
and propose a residual-network version of PEL for the hand pose
estimation task. Furthermore, we propose a voting-based scheme to merge
information from individual points to the final pose output. In addition to the
pose estimation task, the voting-based scheme can also provide a point-cloud
segmentation result without requiring segmentation ground truth. We evaluate our
method on both the NYU dataset and the Hands2017Challenge dataset. Our method
outperforms recent state-of-the-art methods; our pose accuracy is
currently the best on the Hands2017Challenge dataset.
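The key property of a permutation-equivariant layer can be sketched in a few lines of NumPy. This is a minimal illustration (a per-point linear map plus a max-pooled global context, as in standard PEL formulations), not the authors' implementation; all names and dimensions are illustrative:

```python
import numpy as np

def permutation_equivariant_layer(X, W1, W2, b):
    """One PEL: each point is transformed together with a global max-pooled
    summary, so permuting the input points permutes the outputs identically."""
    pooled = X.max(axis=0, keepdims=True)              # (1, d_in) global context
    return np.maximum(X @ W1 + pooled @ W2 + b, 0.0)   # ReLU, shape (n, d_out)

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 3))                         # 1024 unordered 3D points
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))
b = rng.normal(size=(8,))

perm = rng.permutation(1024)
Y = permutation_equivariant_layer(X, W1, W2, b)
Y_perm = permutation_equivariant_layer(X[perm], W1, W2, b)
assert np.allclose(Y[perm], Y_perm)                    # equivariance: f(PX) = P f(X)
```

Because the pooled summary is invariant to point order, the layer output reorders exactly as the input does, which is what lets a network process an unordered point cloud without a canonical ordering step.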
Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data
and computation. For this reason, leveraging known or unknown structure of the
data is paramount. Convolutional neural networks (CNNs) are successful examples
of this principle; their defining characteristic is shift equivariance.
Because a filter slides over the input, when the input shifts, the response shifts
by the same amount, exploiting the structure of natural images, where semantic
content is independent of absolute pixel positions. This property is essential
to the success of CNNs in audio, image and video recognition tasks. In this
thesis, we extend equivariance to other kinds of transformations, such as
rotation and scaling. We propose equivariant models for different
transformations defined by groups of symmetries. The main contributions are (i)
polar transformer networks, achieving equivariance to the group of similarities
on the plane, (ii) equivariant multi-view networks, achieving equivariance to
the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving
equivariance to the continuous 3D rotation group, (iv) cross-domain image
embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v)
spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving
equivariance to 3D rotations for spherical vector fields. Applications include
image classification, 3D shape classification and retrieval, panoramic image
classification and segmentation, shape alignment and pose estimation. What
these models have in common is that they leverage symmetries in the data to
reduce sample and model complexity and improve generalization performance. The
advantages are more significant on (but not limited to) challenging tasks where
data is limited or input perturbations such as arbitrary rotations are present.
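The shift equivariance the thesis builds on can be verified numerically. The sketch below uses a circular 1D convolution with illustrative sizes (not any model from the thesis) to check that convolving a shifted signal gives the shifted output:

```python
import numpy as np

def circular_conv(signal, kernel):
    """Circular 1D convolution: slide the kernel over the signal with wraparound."""
    n = len(signal)
    return np.array([sum(signal[(i - j) % n] * kernel[j] for j in range(len(kernel)))
                     for i in range(n)])

rng = np.random.default_rng(1)
x = rng.normal(size=16)
k = rng.normal(size=5)

shift = 3
conv_then_shift = np.roll(circular_conv(x, k), shift)  # convolve first, shift after
shift_then_conv = circular_conv(np.roll(x, shift), k)  # shift first, convolve after
assert np.allclose(conv_then_shift, shift_then_conv)   # shift equivariance
```

The equivariant models for rotation and scaling described in the thesis generalize exactly this commutation property from translations to other symmetry groups.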
Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data
We present a novel method for monocular hand shape and pose estimation at an unprecedented runtime of 100 fps and at state-of-the-art accuracy. This is enabled by a new learning-based architecture designed so that it can make use of all available sources of hand training data: image data with either 2D or 3D annotations, as well as stand-alone 3D animations without corresponding image data. It features a 3D hand joint detection module and an inverse kinematics module which not only regresses 3D joint positions but also maps them to joint rotations in a single feed-forward pass. This output makes the method more directly usable for applications in computer vision and graphics than regressing 3D joint positions alone. We demonstrate that our architectural design leads to a significant quantitative and qualitative improvement over the state of the art on several challenging benchmarks. Our model is publicly available for future research.
Hand Pose Estimation in RGBD Images (Odhad polohy ruky v RGBD obrazech)
It is surprising that, even with the increasing ubiquity of Augmented Reality applications on mobile devices, users still interact with these applications via on-screen controls rather than controlling the presented environment directly with their hands in the real world. To enable this degree of interactivity, a stable and robust hand pose estimation pipeline is needed. This thesis therefore serves as a study of a possible approach to hand pose estimation that consists of two parts: segmentation and estimation of hand model parameters.
Hand Pose-based Task Learning from Visual Observations with Semantic Skill Extraction
Learning from Demonstrations is a promising technique for transferring task knowledge from a user to a robot. We propose a framework for task programming by observing the human hand pose and object locations solely with a depth camera. By extracting skills from the demonstrations, we are able to represent what the robot has learned, generalize to unseen object locations, and optimize the robotic execution instead of replaying non-optimal behavior. A two-staged segmentation algorithm that employs skill template matching via Hidden Markov Models has been developed to extract motion primitives from the demonstration and give them semantic meaning. In this way, the transfer of task knowledge is improved from a simple replay of the demonstration towards a semantically annotated, optimized, and generalized execution. We evaluate the extraction of a set of skills in simulation and show that the task execution can be optimized by such means.
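The HMM-based segmentation idea can be illustrated with a standard Viterbi decoder. This is a generic sketch of decoding a per-frame skill sequence from template log-likelihoods, not the paper's two-staged algorithm; the toy skills and probabilities are invented for illustration:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely hidden state (skill) sequence given per-frame log-likelihoods.
    log_emit: (T, S) frame-wise log-likelihood of each skill template."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (S, S): score of prev -> cur
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack the best sequence
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy demonstration: frames 0-2 match skill 0 ("reach"), frames 3-5 skill 1 ("grasp").
log_emit = np.log(np.array([[.9, .1]] * 3 + [[.1, .9]] * 3))
log_trans = np.log(np.array([[.8, .2], [.2, .8]]))   # sticky transitions
log_init = np.log(np.array([.5, .5]))
segments = viterbi(log_emit, log_trans, log_init)
assert segments == [0, 0, 0, 1, 1, 1]
```

The sticky transition matrix plays the role of a segmentation prior: it discourages spurious skill switches, so the decoded boundary falls only where the emission evidence clearly changes.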
Robustness, scalability and interpretability of equivariant neural networks across different low-dimensional geometries
In this thesis we develop neural networks that exploit the symmetries of four different low-dimensional geometries, namely 1D grids, 2D grids, 3D continuous spaces, and graphs, through the consideration of translational, rotational, cylindrical, and permutation symmetries. We apply these models to applications across a range of scientific disciplines, demonstrating their predictive ability, robustness, scalability, and interpretability.
We develop a neural network that exploits the translational symmetries on 1D grids to predict age and species of mosquitoes from high-dimensional mid-infrared spectra. We show that the model can learn to predict mosquito age and species with a higher accuracy than models that do not utilise any inductive bias. We also demonstrate that the model is sensitive to regions within the input spectra that are in agreement with regions identified by a domain expert. We present a transfer learning approach to overcome the challenge of working with small, real-world, wild collected data sets and demonstrate the benefit of the approach on a real-world application.
We demonstrate the benefit of rotation equivariant neural networks on the task of segmenting deforestation regions from satellite images through exploiting the rotational symmetry present on 2D grids. We develop a novel physics-informed architecture, exploiting the cylindrical symmetries of the group SO+(2,1), which can invert the transmission effects of multi-mode optical fibres (MMFs). We develop a new connection between a physics understanding of MMFs and group equivariant neural networks. We show that this novel architecture requires fewer training samples to learn, better generalises to out-of-distribution data sets, scales to higher-resolution images, is more interpretable, and reduces the parameter count of the model. We demonstrate the capability of the model on real-world data and provide an adaption to the model to handle real-world deviations from theory. We also show that the model can scale to higher resolution images than was previously possible.
We develop a novel architecture which provides a symmetry-preserving mapping between two different low-dimensional geometries and demonstrate its practical benefit for the application of 3D hand mesh generation from 2D images. This model exploits both the 2D rotational symmetries present in a 2D image and in a 3D hand mesh, and provides a mapping between the two data domains. We demonstrate that the model performs competitively on a range of benchmark data sets and justify the choice of inductive bias in the model.
We develop an architecture which is equivariant to a novel choice of automorphism group through the use of a sub-graph selection policy. We demonstrate the benefit of the architecture theoretically, by proving improved expressivity and scalability, and experimentally, on a range of widely studied benchmark graph classification tasks. We present a method of comparison between models that had not previously been considered in this area of research, demonstrating that recent state-of-the-art methods are statistically indistinguishable.
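The permutation symmetry on graphs underlying this line of work can be demonstrated with a minimal message-passing layer. This NumPy sketch uses generic sum aggregation (not the thesis's sub-graph selection architecture) to check that a graph-level readout is invariant to node relabeling:

```python
import numpy as np

def gnn_layer_readout(A, X, W):
    """One message-passing layer (sum aggregation over self + neighbors)
    followed by a sum readout. Relabeling nodes by a permutation P maps
    A -> P A P^T and X -> P X, leaving the graph-level readout unchanged."""
    H = np.maximum((A + np.eye(len(A))) @ X @ W, 0.0)  # per-node features
    return H.sum(axis=0)                               # permutation-invariant readout

rng = np.random.default_rng(2)
A = rng.integers(0, 2, size=(6, 6))
A = np.triu(A, 1); A = A + A.T                         # random undirected graph
X, W = rng.normal(size=(6, 4)), rng.normal(size=(4, 5))

P = np.eye(6)[rng.permutation(6)]                      # random relabeling
out = gnn_layer_readout(A, X, W)
out_relabel = gnn_layer_readout(P @ A @ P.T, P @ X, W)
assert np.allclose(out, out_relabel)
```

Equivariance to a chosen automorphism group, as in the thesis, refines this baseline: instead of being insensitive to all relabelings only at the readout, the intermediate representations themselves transform consistently under the group action.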
Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments
A real-world application or setting involves interaction between different
modalities (e.g., video, speech, text). In order to process the multimodal
information automatically and use it for an end application, Multimodal
Representation Learning (MRL) has emerged as an active area of research in
recent times. MRL involves learning reliable and robust representations of
information from heterogeneous sources and fusing them. However, in practice,
the data acquired from different sources are typically noisy. In some extreme
cases, a noise of large magnitude can completely alter the semantics of the
data leading to inconsistencies in the parallel multimodal data. In this paper,
we propose a novel method for multimodal representation learning in a noisy
environment via the generalized product of experts technique. In the proposed
method, we train a separate network for each modality to assess the credibility
of information coming from that modality, and subsequently, the contribution
from each modality is dynamically varied while estimating the joint
distribution. We evaluate our method on two challenging benchmarks from two
diverse domains: multimodal 3D hand-pose estimation and multimodal surgical
video segmentation. We attain state-of-the-art performance on both benchmarks.
Our extensive quantitative and qualitative evaluations show the advantages of
our method compared to previous approaches. Comment: 11 pages, accepted at ICMI 2022 (Oral).
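For Gaussian experts, a weighted product of experts has a closed form: precisions add, scaled by each expert's weight. The sketch below fuses per-modality means and variances with fixed credibility weights; in the paper these weights come from a learned per-modality network, and the numbers here are purely illustrative:

```python
import numpy as np

def weighted_product_of_experts(mus, sigmas, betas):
    """Fuse per-modality Gaussian experts N(mu_i, sigma_i^2), each raised to a
    credibility weight beta_i. The weighted product is again Gaussian, with
    precision sum(beta_i / sigma_i^2) and a precision-weighted mean."""
    precisions = np.asarray(betas) / np.asarray(sigmas) ** 2
    joint_var = 1.0 / precisions.sum()
    joint_mu = joint_var * (precisions * np.asarray(mus)).sum()
    return joint_mu, joint_var

# A noisy, low-credibility modality barely moves the fused estimate.
mu, var = weighted_product_of_experts(mus=[0.0, 10.0],
                                      sigmas=[1.0, 5.0],
                                      betas=[1.0, 0.1])
assert mu < 0.05   # fused mean stays close to the reliable expert at 0.0
```

Setting beta_i near zero for a corrupted modality effectively removes its expert from the product, which is the mechanism that lets the joint distribution degrade gracefully under noise.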