
    Real-time hand gesture recognition exploiting multiple 2D and 3D cues

    The recent introduction of several 3D applications and stereoscopic display technologies has created the need for novel human-machine interfaces. Traditional input devices, such as the keyboard and mouse, cannot fully exploit the potential of these interfaces and do not offer a natural interaction. Hand gestures, instead, provide a more natural and sometimes safer way of interacting with computers and other machines without touching them. The use cases for gesture-based interfaces range from gaming to automatic sign language interpretation, health care, robotics, and vehicle automation. Automatic gesture recognition is a challenging problem that has attracted growing research interest for several years due to its applications in natural interfaces. The first approaches, based on recognition from 2D color pictures or video only, suffered from the typical problems characterizing such data: inter-occlusions, different skin colors among users even of the same ethnic group, and unstable illumination conditions, in fact, often made the problem intractable. Other approaches sidestepped these problems by making the user wear sensorized gloves or hold tools designed to help localize the hand in the scene. The recent introduction to the mass market of novel low-cost range cameras, like the Microsoft Kinect, Asus XTION, Creative Senz3D, and the Leap Motion, has opened the way to innovative gesture recognition approaches exploiting the geometry of the framed scene. Most methods share a common recognition pipeline: first identify the hand in the framed scene, then extract relevant features from the hand samples, and finally apply suitable machine learning techniques to recognize the performed gesture from a predefined "gesture dictionary". Building on this rationale, this thesis proposes a novel gesture recognition framework exploiting both color and geometric cues from low-cost color and range cameras. The dissertation starts by introducing the automatic hand gesture recognition problem, giving an overview of state-of-the-art algorithms and the recognition pipeline employed in this work. It then briefly describes the major low-cost range cameras and setups used in the literature for acquiring color and depth data for hand gesture recognition, highlighting their capabilities and limitations. The methods employed for detecting the hand in the framed scene and segmenting it into its relevant parts are then analyzed in greater detail. The algorithm first exploits skin color information and geometric considerations to discard background samples; it then reliably detects the palm and finger regions and removes the forearm. For palm detection, the method fits the largest circle inscribed in the palm region or, in a more advanced version, an ellipse. A set of robust color and geometric features that can be extracted from the previously segmented finger and palm regions is then described in detail. Geometric features describe properties of the hand contour from its curvature variations, measure the distances in 3D space or in the image plane between its points and the hand center or the palm, or extract relevant information from the palm morphology and from the empty space in the hand's convex hull.
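
    As an aside on the palm-detection step above: the largest circle inscribed in a binary hand mask can be found from its distance transform, whose global maximum lies at the circle's centre with a value equal to the radius. The following is a minimal sketch of that idea, not the thesis's actual code, assuming OpenCV and a precomputed hand mask; the file name is a placeholder.

```python
import cv2

# Hypothetical binary hand mask (255 = hand, 0 = background), e.g. produced
# by the skin-colour and geometry based segmentation step described above.
mask = cv2.imread("hand_mask.png", cv2.IMREAD_GRAYSCALE)

# Distance transform: each hand pixel stores its distance to the nearest
# background pixel, so the global maximum marks the centre of the largest
# inscribed circle and its value is the radius.
dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
_, radius, _, center = cv2.minMaxLoc(dist)
print(f"palm centre: {center}, palm radius: {radius:.1f} px")
```
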
    Color features, in contrast, exploit the histogram of oriented gradients (HOG), local phase quantization (LPQ), and local ternary patterns (LTP) algorithms to provide further helpful cues from the hand texture and from the depth map treated as a grayscale image. Additional features extracted from the Leap Motion data complete the gesture characterization for more reliable recognition. The thesis also reports a novel approach that jointly exploits the geometric data provided by the Leap Motion and the depth data from a range camera to extract the same depth features with significantly lower computational effort. The work then addresses the delicate problem of constructing a robust gesture recognition model from the features described above, using multi-class Support Vector Machines, Random Forests, or more powerful ensembles of classifiers. Feature selection techniques, designed to detect the smallest subset of features that allows training a leaner classification model without a significant accuracy loss, are also considered. The proposed recognition method, tested on subsets of the American Sign Language and experimentally validated, achieved very high accuracy. The results also showed that higher accuracy is obtainable by combining proper sets of complementary features and using ensembles of classifiers. Moreover, it is worth noting that the proposed approach is not sensor dependent: the recognition algorithm is not bound to a specific sensor or technology for depth data acquisition. Finally, the gesture recognition algorithm runs in real time even without thorough optimization, and may easily be extended in the near future with novel descriptors and support for dynamic gestures.
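
    To illustrate the classification stage described above, the sketch below combines a multi-class SVM and a Random Forest in a soft-voting ensemble using scikit-learn. The feature dimensionality, gesture count, kernel, and ensemble composition are assumptions on placeholder data, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data: one row of concatenated geometric + colour features per
# gesture sample, with labels drawn from a 10-gesture dictionary.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))
y = rng.integers(0, 10, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Soft-voting ensemble of a multi-class SVM and a Random Forest;
# probability=True lets the SVM contribute class probabilities to the vote.
clf = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="rbf", probability=True)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```
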

    HD: Efficient Hand Detection and Tracking


    Automated Stabilization, Enhancement and Capillaries Segmentation in Videocapillaroscopy

    Oral capillaroscopy is a valuable and non-invasive technique for evaluating microcirculation. Its ability to observe small vessels in vivo has generated significant interest in the field. Capillaroscopy serves as an essential tool for diagnosing and prognosing various pathologies in which anatomic-pathological lesions play a crucial role in progression. Despite its importance, the use of videocapillaroscopy in the oral cavity is limited by the acquisition setup: the spatial and temporal resolution of the video camera, the objective magnification, and the physical dimensions of the probe. Moreover, the operator's influence during acquisition, particularly how the probe is maneuvered, further affects its effectiveness. This study aims to address these challenges and improve data reliability by developing a computerized support system for microcirculation analysis. The system performs stabilization, enhancement, and automatic segmentation of capillaries in oral mucosal video sequences. The stabilization phase uses a method based on coupling seed points in a classification process. The enhancement process is based on temporal analysis of the capillaroscopic frames. Finally, an automatic capillary segmentation phase was implemented, with the additional objective of quantitatively assessing the signal improvement achieved by the developed techniques; for this purpose, transfer learning of the well-known U-net deep network was employed. The proposed method was tested on a database with ground truth obtained from expert manual segmentation. The results show a Jaccard index of 90.1% and an accuracy of 96.2%, demonstrating the effectiveness of the developed techniques in oral capillaroscopy. In conclusion, these promising outcomes encourage the use of this method to assist in diagnosing and monitoring conditions that affect microcirculation, such as rheumatologic or cardiovascular disorders.
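
    The two figures reported above can be reproduced from binary masks as follows; this is a minimal sketch assuming NumPy arrays for the U-net prediction and the expert ground truth, not the study's actual evaluation code.

```python
import numpy as np

def jaccard_index(pred, gt):
    """Intersection over union between predicted and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def pixel_accuracy(pred, gt):
    """Fraction of pixels on which prediction and ground truth agree."""
    return (pred.astype(bool) == gt.astype(bool)).mean()

# Toy 2x2 masks; real inputs would be the U-net output and the expert
# manual segmentation.
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
print(jaccard_index(pred, gt), pixel_accuracy(pred, gt))
```
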

    Machine learning methods for sign language recognition: a critical review and analysis.

    Sign language is an essential tool for bridging the communication gap between hearing and hearing-impaired people. However, the diversity of over 7000 present-day sign languages, with variability in motion, hand shape, and the position of body parts, makes automatic sign language recognition (ASLR) a complex problem. To overcome this complexity, researchers are investigating better ways of developing ASLR systems based on intelligent solutions, and have demonstrated remarkable success. This paper analyses the research published on intelligent systems for sign language recognition over the past two decades. A total of 649 publications related to decision support and intelligent systems for sign language recognition (SLR) were extracted from the Scopus database and analysed with the bibliometric VOSViewer software to (1) obtain the temporal and regional distributions of the publications and (2) map the cooperation networks between affiliations and authors and identify productive institutions in this context. Reviews of techniques for vision-based sign language recognition are also presented, and the various feature extraction and classification techniques used in SLR to achieve good results are discussed. The literature review shows the importance of incorporating intelligent solutions into sign language recognition systems and reveals that a fully capable intelligent system for sign language recognition remains an open problem. Overall, this study is expected to facilitate knowledge accumulation and the creation of intelligent SLR systems, and to provide readers, researchers, and practitioners with a roadmap to guide future work.

    Banknote Authentication and Medical Image Diagnosis Using Feature Descriptors and Deep Learning Methods

    Banknote recognition and medical image analysis have long been foci of image processing and pattern recognition research. Counterfeiters have taken advantage of innovations in print media technology to reproduce fake money, hence the need for systems that can assure citizens of the authenticity of banknotes in circulation. Similarly, many physicians must interpret medical images, yet human image analysis is susceptible to error due to wide variation across interpreters, fatigue, and subjectivity. Computer-aided diagnosis is therefore vital to improving medical analysis, as it facilitates the identification of findings that need treatment and assists the expert's workflow. Thus, this thesis is organized around three such problems in banknote authentication and medical image diagnosis. In the first research problem, we proposed a new banknote recognition approach that classifies the principal components of extracted HOG features. We further experimented with computing HOG descriptors from cells created from the image patch vertices of SURF points, and designed a feature reduction approach based on a high-correlation and low-variance filter. In the second research problem, we developed a mobile app for banknote identification and counterfeit detection using the Unity 3D software and evaluated its performance with a cascaded ensemble approach. The algorithm was then extended to a client-server architecture using SIFT and SURF features reduced by Bag of Words, together with high-correlation-based HOG vectors. In the third research problem, experiments were conducted on a pre-trained mobile app for medical image diagnosis using three convolutional layers with an ensemble classifier comprising PCA and bagging of five base learners. We also implemented a Bidirectional Generative Adversarial Network, with a Deep Convolutional Generative Adversarial Network as the generator and encoder and a Capsule Network as the discriminator, to mitigate the effect of the Binary Cross Entropy loss while experimenting on images with random composition and translation inferences. Lastly, we proposed a variant of Single Image Super-Resolution for medical analysis, redesigning the Super Resolution Generative Adversarial Network to increase the Peak Signal-to-Noise Ratio during image reconstruction by incorporating a loss function based on the pixel-space mean square error and Super Resolution Convolutional Neural Network layers.
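
    As a sketch of the first research problem, classifying the principal components of HOG features, the pipeline below uses scikit-image and scikit-learn on placeholder data; the image size, HOG parameters, PCA dimension, and the SVM classifier are assumptions, since the abstract does not specify them.

```python
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder stand-ins for registered, resized grayscale banknote scans
# and their denomination labels.
rng = np.random.default_rng(0)
images = rng.random((40, 128, 256))
labels = rng.integers(0, 4, size=40)

def hog_descriptor(gray):
    # Dense HOG over the whole note image.
    return hog(gray, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2))

# Classify the principal components of the HOG vectors with an SVM.
X = np.stack([hog_descriptor(img) for img in images])
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
model.fit(X, labels)
print("training accuracy:", model.score(X, labels))
```
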

    Using computer vision to categorize tyres and estimate the number of visible tyres in tyre stockpile images

    Pressures from environmental agencies add to the challenges associated with the disposal of waste tyres, particularly in South Africa, where waste tyre recycling is in its infancy; as a result, waste tyre stockpiles exist across the country, historically undocumented and uncontrolled. The remote locations of such stockpiles complicate the logistics of collecting, transporting, and storing waste tyres before they enter the recycling process. To optimize the collection logistics, useful information about a stockpile includes estimates of the types of tyres it contains and the quantity of each type. This research proposes the use of computer vision for categorizing individual tyres and estimating the number of visible tyres in tyre stockpile images, in support of the logistics of tyre recycling efforts. The study begins with a broad review of image processing and computer vision algorithms for categorizing and counting objects in images. The bag of visual words (BoVW) model for categorization is tested on two small data sets of tyre tread images using a random sub-sampling holdout method, and the results are evaluated with performance metrics for multiclass classifiers, namely average accuracy, precision, and recall. The results indicate that corner-based local feature detectors combined with speeded up robust features (SURF) descriptors in a BoVW model provide moderately accurate categorization of tyres from tread images. Two feature extraction methods are then proposed for describing images as feature vectors on which neural networks (NNs) can be trained to estimate tyre counts in stockpile images. The first uses the BoVW model with histograms of oriented gradients (HOG) features collected from overlapping sub-images to create a visual vocabulary and describe each image by its visual word occurrence histogram. The second uses the image gradient magnitude, gradient orientation, and the orientations of edges detected with the Canny edge detector, concatenating the individual histograms of gradient orientation and gradient magnitude into a single descriptor, as sketched below. These histograms are used to train NNs with backpropagation to approximate functions from the feature vectors to scalar count estimations. The accuracy of the visible object count predictions is evaluated using NN evaluation techniques to assess both prediction accuracy and the generalization ability of the fitted model. The count estimation experiments using the two feature extraction methods show that fairly accurate count estimations can be obtained and that the fitted model generalizes fairly well to unseen images.
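
    The second feature extraction method lends itself to a compact sketch, given below under the assumption of OpenCV; the histogram bin count and Canny thresholds are illustrative choices, not the thesis's actual settings.

```python
import cv2
import numpy as np

def gradient_edge_histogram(gray, bins=32):
    """Concatenated histograms of gradient magnitude, gradient orientation,
    and the orientations of Canny edge pixels, yielding a fixed-length
    vector usable as input to a count-regression neural network."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    edges = cv2.Canny(gray, 100, 200)  # thresholds are illustrative
    h_mag, _ = np.histogram(mag, bins=bins)
    h_ang, _ = np.histogram(ang, bins=bins, range=(0, 360))
    h_edge, _ = np.histogram(ang[edges > 0], bins=bins, range=(0, 360))
    return np.concatenate([h_mag, h_ang, h_edge]).astype(np.float32)

# Synthetic stand-in for a grayscale stockpile image.
gray = (np.random.default_rng(0).random((240, 320)) * 255).astype(np.uint8)
print(gradient_edge_histogram(gray).shape)  # (96,) for bins=32
```
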

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Scaled Autonomy for Networked Humanoids

    Humanoid robots have been developed with the intention of aiding in environments designed for humans. As such, the control of humanoid morphology and the effectiveness of human-robot interaction form the two principal research issues for deploying these robots in the real world. In this thesis, the issue of humanoid control is coupled with human-robot interaction under the framework of scaled autonomy, where the human and robot exchange levels of control depending on the environment and the task at hand. This scaled autonomy is approached with control algorithms for reactive stabilization of human commands and with planned trajectories that encode semantically meaningful motion preferences in a sequential convex optimization framework. The control and planning algorithms have been extensively field-tested for robustness and system verification. The RoboCup competition provides a benchmark for autonomous agents trained with a human supervisor: the kid-sized and adult-sized humanoid robots coordinate over a noisy network in a known environment with adversarial opponents, and the software and routines in this work enabled five consecutive championships. Furthermore, the motion planning and user interfaces developed in this work were tested over the noisy network of the DARPA Robotics Challenge (DRC) Trials and Finals in an unknown environment. Overall, the ability to extend simplified locomotion models to aid semi-autonomous manipulation allows untrained humans to operate complex, high-dimensional robots. This represents another step on the path to deploying humanoids in the real world, based on low-dimensional motion abstractions and proven performance in real-world tasks like RoboCup and the DRC.

    An integrated sign language recognition system

    Research has shown that five parameters are required to recognize any sign language gesture: hand shape, location, orientation, and motion, as well as facial expressions. The South African Sign Language (SASL) research group at the University of the Western Cape has created systems that recognize sign language gestures using single parameters. Using a single parameter can cause ambiguities between similarly signed gestures, restricting the possible vocabulary size. This research pioneers the group's work on combining multiple parameters to achieve a larger recognition vocabulary. The proposed methodology combines hand location and hand shape recognition into one recognition system, which is shown to recognize a very large vocabulary of 50 signs at a high average accuracy of 74.1%. This vocabulary is much larger than that of existing SASL recognition systems, and the system achieves higher accuracy than those systems in spite of the larger vocabulary. The system is also shown to be highly robust to variations in test subjects such as skin colour, gender, and body dimensions. Furthermore, the group pioneers research towards continuously recognizing signs from a video stream, whereas existing systems recognized a single sign at a time. To this end, a highly accurate continuous gesture segmentation strategy is proposed and shown to accurately recognize sentences consisting of five isolated SASL gestures.
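
    As a purely illustrative sketch of combining the two single-parameter recognizers: the abstract does not state how their outputs are fused, so the product rule below is an assumption, shown on toy scores over a 50-sign vocabulary.

```python
import numpy as np

def fuse_scores(p_shape, p_location):
    """Product-rule fusion of per-class probabilities from the hand-shape
    and hand-location recognizers (the actual combination rule used by the
    thesis is not specified, so this rule is an assumption)."""
    p = np.asarray(p_shape, dtype=float) * np.asarray(p_location, dtype=float)
    return p / p.sum()

# Toy scores: each recognizer is mildly confident about sign 7, and fusing
# the two sharpens the joint decision.
p_shape = np.full(50, 0.7 / 49); p_shape[7] = 0.3
p_location = np.full(50, 0.8 / 49); p_location[7] = 0.2
print("recognized sign:", int(np.argmax(fuse_scores(p_shape, p_location))))
```
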