A simple technique for improving multi-class classification with neural networks
We present a novel method to perform multi-class pattern classification with
neural networks and test it on a challenging 3D hand gesture recognition
problem. Our method consists of a standard one-against-all (OAA)
classification, followed by another network layer classifying the resulting
class scores, possibly augmented by the original raw input vector. This allows
the network to disambiguate hard-to-separate classes as the distribution of
class scores carries considerable information as well, and is in fact often
used for assessing the confidence of a decision. We show that by this approach
we are able to significantly boost our results, overall as well as for
particularly difficult cases, on the hard 10-class gesture classification task.

Comment: European Symposium on Artificial Neural Networks (ESANN), Jun 2015,
Bruges, Belgium
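The two-stage idea above can be sketched compactly: run a one-against-all (OAA) scorer, then feed its class-score vector, optionally concatenated with the raw input, into a second classification layer. A minimal sketch follows; the random weights stand in for the authors' trained networks and are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 10, 64

# Stage 1: one-against-all (OAA) scorer. A random linear map stands in
# for a trained per-class network (hypothetical weights).
W_oaa = rng.normal(size=(n_features, n_classes))

def oaa_scores(x):
    """One score per class, as in standard OAA classification."""
    return x @ W_oaa

# Stage 2: a further layer that classifies the OAA score vector,
# augmented by the original raw input vector, as the abstract describes.
W2 = rng.normal(size=(n_classes + n_features, n_classes))

def two_stage_predict(x):
    s = oaa_scores(x)
    augmented = np.concatenate([s, x])  # class scores + raw input
    return int(np.argmax(augmented @ W2))

x = rng.normal(size=n_features)
pred = two_stage_predict(x)
```

The point of the second stage is that the full score distribution carries information (it is commonly used for confidence estimation), so a classifier over it can disambiguate classes the OAA stage confuses.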
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
Learning-based methods to solve dense 3D vision problems typically train on
3D sensor data. The respectively used principle of measuring distances provides
advantages and drawbacks. These are typically not compared nor discussed in the
literature due to a lack of multi-modal datasets. Texture-less regions are
problematic for structure from motion and stereo, reflective material poses
issues for active sensing, and distances for translucent objects are intricate
to measure with existing hardware. Training on inaccurate or corrupt data
induces model bias and hampers generalisation capabilities. These effects
remain unnoticed if the sensor measurement is considered as ground truth during
the evaluation. This paper investigates the effect of sensor errors for the
dense 3D vision tasks of depth estimation and reconstruction. We rigorously
show the significant impact of sensor characteristics on the learned
predictions and notice generalisation issues arising from various technologies
in everyday household environments. For evaluation, we introduce a carefully
designed dataset\footnote{dataset available at
https://github.com/Junggy/HAMMER-dataset} comprising measurements from
commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular
RGB+P. Our study quantifies the considerable sensor noise impact and paves the
way to improved dense vision estimates and targeted data fusion.

Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note:
substantial text overlap with arXiv:2205.0456
Review of constraints on vision-based gesture recognition for human–computer interaction
The ability of computers to recognise hand gestures visually is essential for progress in human-computer interaction. Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, gesture recognition is extremely challenging not only because of its diverse contexts, multiple interpretations, and spatio-temporal variations but also because of the complex non-rigid properties of the hand. This study surveys the major constraints on vision-based gesture recognition arising in detection and pre-processing, representation and feature extraction, and recognition. Current challenges are explored in detail.
Optimization of a Simultaneous Localization and Mapping (SLAM) System for an Autonomous Vehicle Using a 2-Dimensional Light Detection and Ranging Sensor (LiDAR) by Sensor Fusion
Fully autonomous vehicles must accurately estimate both the extent of their environment and their relative location within it. A popular approach to organizing such information is creating a map of a given physical environment and defining a point in this map that represents the vehicle’s location. Simultaneous Localization and Mapping (SLAM) is a computing algorithm that takes inputs from a Light Detection and Ranging (LiDAR) sensor to simultaneously construct a map of the vehicle’s physical environment and determine its location in this map based on feature recognition. Two fundamental requirements enable an accurate SLAM method: accurate distance measurements and an accurate assessment of location. Methods are investigated in which a 2D LiDAR sensor system with laser range finders, ultrasonic sensors, and stereo camera vision is optimized for distance measurement accuracy, particularly a method using recurrent neural networks. Sensor fusion techniques with infrared, camera, and ultrasonic sensors are implemented to investigate their effects on distance measurement accuracy. It was found that using a recurrent neural network to fuse data from a 2D LiDAR with laser range finders and ultrasonic sensors outperforms raw sensor data in accuracy (46.6% error reduced to 3.0% error) and precision (0.62 m standard deviation reduced to 0.0015 m standard deviation). These results demonstrate the effectiveness of machine-learning-based fusion algorithms for noise reduction, measurement accuracy improvement, and outlier removal, which would give SLAM vehicles more robust performance.
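The recurrent fusion described above can be sketched as a simple Elman-style cell that consumes one reading per sensor at each timestep and emits a fused distance estimate. Everything here is illustrative: the weights are random and untrained, the sensor count and hidden size are assumptions, and this is not the paper's actual network.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sensors, n_hidden = 3, 8  # LiDAR, laser range finder, ultrasonic (assumed)

# Hypothetical (untrained) weights of a simple Elman-style recurrent cell.
W_in = rng.normal(scale=0.1, size=(n_hidden, n_sensors))
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
w_out = rng.normal(scale=0.1, size=n_hidden)

def fuse(readings):
    """Fuse a time series of per-sensor distance readings into one
    estimate per step; shape (T, n_sensors) -> (T,)."""
    h = np.zeros(n_hidden)
    out = []
    for r in readings:
        h = np.tanh(W_in @ r + W_rec @ h)  # recurrent state update
        out.append(w_out @ h)              # fused distance estimate
    return np.array(out)

# Noisy readings of a true 2.0 m distance; per-sensor noise levels assumed.
readings = 2.0 + rng.normal(scale=[0.05, 0.02, 0.3], size=(20, n_sensors))
est = fuse(readings)
```

In practice such a network would be trained against ground-truth distances so the hidden state learns to down-weight the noisier sensors and suppress outlier measurements, which is the effect the reported error and deviation reductions quantify.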