
    The design and implementation of vision-based behavioural modules for a robotic assembly system

    The work described in this thesis is about how to program robots to work reliably in the presence of uncertainty. Some architectural principles are proposed which address the problem of decomposing robotic assembly tasks into modular units such that a robot program can be implemented efficiently, tested easily, and maintained or modified without undue complexity. This architecture also provides a framework to integrate sensors into a robotic assembly system. These modular units are called behavioural modules. They perform their tasks reliably. The problem of uncertainty is dealt with by encapsulating sensing and variation-reducing strategies inside these modules. Experiments are performed with a working robotic assembly system using vision-based behavioural modules. Analysis of this system validates the principles presented in this thesis.
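
As an illustration of the architecture described above, a minimal Python sketch follows: each behavioural module hides its own sensing and variation-reducing strategy behind a uniform interface, so an assembly task can be composed, tested, and modified module by module. All class and method names here are hypothetical, not the thesis's actual implementation.

```python
from abc import ABC, abstractmethod

class BehaviouralModule(ABC):
    """One unit of an assembly task; sensing and error recovery stay inside."""

    @abstractmethod
    def execute(self) -> bool:
        """Run the behaviour; return True on reliable completion."""

class VisionGuidedGrasp(BehaviouralModule):
    """Hypothetical module: camera and arm objects are injected dependencies."""

    def __init__(self, camera, arm, target):
        self.camera, self.arm, self.target = camera, arm, target

    def execute(self) -> bool:
        # Sense: locate the part, absorbing pose uncertainty internally.
        pose = self.camera.locate(self.target)
        if pose is None:
            return False
        # Act: the module, not the caller, decides how to reduce variation.
        self.arm.move_to(pose)
        return self.arm.close_gripper()

def run_task(modules):
    """An assembly task is a sequence of modules, each testable in isolation."""
    return all(m.execute() for m in modules)
```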

    Automated Tracking of Hand Hygiene Stages

    The European Centre for Disease Prevention and Control (ECDC) estimates that 2.5 million cases of Hospital Acquired Infections (HAIs) occur each year in the European Union. Hand hygiene is regarded as one of the most important preventive measures for HAIs. If it is implemented properly, hand hygiene can reduce the risk of cross-transmission of an infection in the healthcare environment. Good hand hygiene is not only important for healthcare settings. The recent coronavirus pandemic has highlighted the importance of hand hygiene practices in our daily lives, with governments and health authorities around the world promoting good hand hygiene practices. The WHO has published guidelines on hand hygiene stages to promote good hand washing practices. A significant amount of existing research has focused on the problem of tracking hands to enable hand gesture recognition. In this work, gesture tracking devices and image processing are explored in the context of the hand washing environment. Hand washing videos of professional healthcare workers were carefully observed and analyzed in order to recognize hand features associated with hand hygiene stages that could be extracted automatically. Selected hand features, such as palm shape (flat or curved), palm orientation (palms facing or not), and hand trajectory (linear or circular movement), were then extracted and tracked with the help of a 3D gesture tracking device, the Leap Motion Controller. These features were further coupled together to detect the execution of a required WHO hand hygiene stage, "Rub hands palm to palm", with the help of the Leap sensor in real time. In certain conditions, the Leap Motion Controller enables a clear distinction to be made between the left and right hands. However, whenever the two hands came into contact with each other, sensor data from the Leap, such as palm position and palm orientation, was lost for one of the two hands. Hand occlusion was found to be a major drawback with the application of the device to this use case. Therefore, RGB digital cameras were selected for further processing and tracking of the hands. An image processing technique, using a skin detection algorithm, was applied to extract instantaneous hand positions for further processing, to enable various hand hygiene poses to be detected. Contour and centroid detection algorithms were further applied to track the hand trajectory in hand hygiene video recordings. In addition, feature detection algorithms were applied to a hand hygiene pose to extract the useful hand features. The video recordings did not suffer from occlusion as is the case for the Leap sensor, but the segmentation of one hand from the other was identified as a major challenge with images, because the contour detection resulted in a continuous mass when the two hands were in contact. For future work, the data from gesture trackers, such as the Leap Motion Controller, and cameras (with image processing) could be combined to make a robust hand hygiene gesture classification system.
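
A minimal OpenCV sketch of the camera-based pipeline described above: skin detection by HSV thresholding, contour extraction, and centroid computation from image moments for trajectory tracking. The HSV bounds and morphology settings are illustrative assumptions; the thesis does not state its exact parameters.

```python
import cv2
import numpy as np

def hand_centroid(frame_bgr):
    """Return (cx, cy) of the largest skin-coloured region, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Illustrative skin-tone range; real systems tune or learn these bounds.
    mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 255, 255]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # When both hands touch, they merge into one contour, as noted above.
    hand = max(contours, key=cv2.contourArea)
    m = cv2.moments(hand)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```

Tracking the returned centroid frame by frame gives the linear-versus-circular trajectory cue mentioned above.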

    Robust and real-time hand detection and tracking in monocular video

    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation is video-based gesture recognition, hand detection and hand tracking. Gesture-based interaction allows the user to directly interact with the computer in a natural manner, exploring a virtual reality using nothing but their own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to track the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected with joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore cannot be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template-based approach, we probabilistically model the spatial relations between the different hand parts and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while reducing computational complexity by a factor of 1,000.
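
A small sketch of the part-based idea described above, under assumed details: each detected hand part votes for the hand centroid through a mean offset with an isotropic uncertainty, and votes are fused by precision weighting. The offsets, sigmas, and part names are placeholders, not the dissertation's learned model.

```python
import numpy as np

# Illustrative spatial model: each part type votes for the hand centroid
# with a mean offset (pixels) and an isotropic uncertainty sigma (pixels).
PART_MODEL = {"fingertip": (np.array([0.0, 60.0]), 25.0),
              "palm":      (np.array([0.0,  0.0]), 10.0)}

def estimate_centroid(detections):
    """detections: list of (part_type, (x, y)) from a part detector.
    Returns a precision-weighted estimate of the hand centroid."""
    votes, weights = [], []
    for part, (x, y) in detections:
        offset, sigma = PART_MODEL[part]
        votes.append(np.array([x, y]) + offset)
        weights.append(1.0 / sigma**2)   # tighter parts count more
    w = np.array(weights)
    return (np.array(votes) * w[:, None]).sum(axis=0) / w.sum()

print(estimate_centroid([("fingertip", (100, 40)), ("palm", (98, 102))]))
```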
Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in domains other than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in the literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination-independent. Furthermore, we introduce a regularization technique that ensures a smooth flow field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. To demonstrate the possibilities of gesture-based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased at several national and international events and trade shows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator.
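
For illustration, a dense optical flow field of the kind defined above can be computed with OpenCV's built-in Farneback estimator; this is a generic stand-in, not the illumination-independent, large-displacement algorithm developed in the dissertation.

```python
import cv2

def dense_flow(prev_bgr, next_bgr):
    """Dense optical flow between two frames: an HxWx2 array of (dx, dy)
    vectors, i.e. the apparent per-pixel motion projected onto the image."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    # Farneback's polynomial-expansion method; parameters are common defaults.
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
```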

    Template Matching Based Sign Language Recognition System for Android Devices

    An Android-based sign language recognition system for selected English vocabularies was developed with the explicit objective of examining the specific characteristics that are responsible for gesture recognition. A recognition model for the process was also designed, implemented, and evaluated on 230 samples of hand gestures. The collected samples were pre-processed and rescaled from 3024 × 4032 pixels to 245 × 350 pixels. The samples were examined for the specific characteristics using Oriented FAST and Rotated BRIEF (ORB), with Principal Component Analysis used for feature extraction. The model was implemented in Android Studio using the template matching algorithm as its classifier. The performance of the system was evaluated using precision, recall, and accuracy as metrics. The system obtained an average classification rate of 87%, an average precision of 88%, and an average recall of 91% on the test data of hand gestures. The study has therefore successfully classified hand gestures for selected English vocabularies. The developed system will enhance communication between hearing and hearing-impaired people, and also aid their teaching and learning processes. Future work includes exploring state-of-the-art machine learning techniques such as Generative Adversarial Networks (GANs) on larger datasets to improve the accuracy of results. Keywords — Feature Extraction; Gesture Recognition; Sign Language; Vocabulary; Android Device
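
A sketch of the ORB-plus-template-matching pipeline described above, using OpenCV: binary descriptors are matched under Hamming distance and the gesture label whose template collects the most good matches wins. The distance cutoff and feature count are illustrative assumptions.

```python
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_score(query_img, template_img):
    """Number of good ORB matches between a gesture image and a template."""
    _, q_desc = orb.detectAndCompute(query_img, None)
    _, t_desc = orb.detectAndCompute(template_img, None)
    if q_desc is None or t_desc is None:
        return 0
    matches = matcher.match(q_desc, t_desc)
    return sum(1 for m in matches if m.distance < 50)   # illustrative cutoff

def classify(query_img, templates):
    """templates: dict mapping vocabulary label -> grayscale template image."""
    return max(templates, key=lambda lbl: match_score(query_img, templates[lbl]))
```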

    Hand gesture recognition system based in computer vision and machine learning: Applications on human-machine interaction

    Doctoral thesis in Electronic and Computer Engineering. Hand gesture recognition is a natural way of human-computer interaction and an area of very active research in computer vision and machine learning. This is an area with many different possible applications, giving users a simpler and more natural way to communicate with robots/systems interfaces, without the need for extra devices. So, the primary goal of gesture recognition research applied to Human-Computer Interaction (HCI) is to create systems which can identify specific human gestures and use them to convey information or to control devices. For that, vision-based hand gesture interfaces require fast and extremely robust hand detection, and gesture recognition in real time. Nowadays, vision-based gesture recognition systems are able to work with specific solutions, built to solve one particular problem and configured to work in a particular manner. This research project studied and implemented solutions, generic enough, with the help of machine learning algorithms, to allow their application in a wide range of human-computer interfaces for real-time gesture recognition. The proposed solution, Gesture Learning Module Architecture (GeLMA), allows a set of commands to be defined in a simple way, based on static and dynamic gestures, and to be easily integrated and configured for use in a number of applications. It is easy to train and use, and since it is mainly built with open-source libraries it is also an inexpensive solution. Experiments carried out showed that the system achieved an accuracy of 99.2% in terms of hand posture recognition and an average accuracy of 93.7% in terms of dynamic gesture recognition. To validate the proposed framework, two systems were implemented. The first one is an online system able to help a robotic soccer game referee judge a game in real time. The proposed solution combines a vision-based hand gesture recognition system with a formal language definition, the Referee CommLang, into what is called the Referee Command Language Interface System (ReCLIS). The system builds a command based on system-interpreted static and dynamic referee gestures, and is able to send it to a computer interface which can then transmit the proper commands to the robots. The second one is an online system able to interpret a subset of Portuguese Sign Language. The experiments showed that the system was able to reliably recognize the vowels in real time. Although the implemented solution was only trained to recognize the five vowels, it is easily extended to recognize the rest of the alphabet. These experiments also showed that the core of vision-based interaction systems can be the same for all applications, thus facilitating implementation. The proposed framework has the advantage of being generic enough, and a solid foundation, for the development of hand gesture recognition systems that can be integrated in any human-computer interface application. The interface language can be redefined, and the system can be easily configured and trained with a different set of gestures to be integrated into the final solution.
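
As a sketch of the static-gesture half of such a system, hand-feature vectors can be classified with a standard SVM pipeline from scikit-learn; the abstract does not name GeLMA's actual learner, so this classifier choice, the feature dimensionality, and the random placeholder data are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder training set: one feature vector per frame (e.g. normalized
# fingertip positions) and one posture label per frame.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = rng.integers(0, 5, size=200)

# Scale features, then fit an RBF-kernel SVM for static posture recognition.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
model.fit(X, y)

# Classify one new frame's feature vector.
posture = model.predict(rng.random((1, 10)))
print(posture)
```

Dynamic gestures would then be recognized on top of this, as sequences of postures and motions; that sequential layer is beyond this sketch.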

    Hand pose recognition using a consumer depth camera

    [no abstract]

    Intelligent Approaches to interact with Machines using Hand Gesture Recognition in Natural way: A Survey


    High-precision grasping and placing for mobile robots

    This work presents a manipulation system for multiple labware items in life science laboratories using the H20 mobile robots. The H20 robot is equipped with a Kinect V2 sensor to identify the required labware on the workbench and estimate its position. Local feature recognition based on the SURF algorithm is used. The recognition process is performed both for the labware to be grasped and for the workbench holder. Different grippers and labware containers are designed to manipulate labware of different weights and to realize safe transportation.
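
A minimal sketch of SURF-based recognition as described above, using OpenCV with FLANN matching and Lowe's ratio test. Note that SURF is patent-encumbered and lives in the opencv-contrib nonfree modules; the Hessian threshold, ratio, and match-count criterion here are illustrative assumptions.

```python
import cv2

# SURF requires an opencv-contrib build with nonfree modules enabled.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
matcher = cv2.FlannBasedMatcher()

def find_labware(scene_gray, template_gray, min_matches=10):
    """True if the labware template is recognized in the scene image."""
    _, t_desc = surf.detectAndCompute(template_gray, None)
    _, s_desc = surf.detectAndCompute(scene_gray, None)
    if t_desc is None or s_desc is None:
        return False
    # Lowe's ratio test over the two nearest neighbours per descriptor.
    pairs = matcher.knnMatch(t_desc, s_desc, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
    return len(good) >= min_matches
```

In a full system, the good matches would also feed a homography estimate to recover the labware pose for grasping, not just a yes/no recognition.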

    Sensing Highly Non-Rigid Objects with RGBD Sensors for Robotic Systems

    The goal of this research is to enable a robotic system to manipulate clothing and other highly non-rigid objects using an RGBD sensor. The focus of this thesis is to define and test various algorithms/models that are used to solve parts of the laundry process (i.e. handling, classifying, sorting, unfolding, and folding). First, a system is presented for automatically extracting and classifying items in a pile of laundry. Using only visual sensors, the robot identifies and extracts items sequentially from the pile. When an item is removed and isolated, a model is captured of the shape and appearance of the object, which is then compared against a dataset of known items. The contributions of this part of the laundry process are a novel method for extracting articles of clothing from a pile of laundry, a novel method of classifying clothing using interactive perception, and a multi-layer approach termed L-M-H (more specifically L-C-S-H) for clothing classification. This thesis describes two different approaches to classify clothing into categories. The first approach relies upon silhouettes, edges, and other low-level image measurements of the articles of clothing. Experiments from the first approach demonstrate the ability of the system to efficiently classify and label items into one of six categories (pants, shorts, short-sleeve shirt, long-sleeve shirt, socks, or underwear). These results show that, on average, classification rates using robot interaction are 59% higher than those that do not use interaction. The second approach relies upon color, texture, shape, and edge information from 2D and 3D data within a local and global perspective. The multi-layer approach compartmentalizes the problem into a high (H) layer, multiple mid-level (characteristics (C), selection masks (S)) layers, and a low (L) layer. This approach produces 'local' solutions to solve the global classification problem. Experiments demonstrate the ability of the system to efficiently classify each article of clothing into one of seven categories (pants, shorts, shirts, socks, dresses, cloths, or jackets). The results presented in this paper show that, on average, the classification rates improve by +27.47% for three categories, +17.90% for four categories, and +10.35% for seven categories over the baseline system, using support vector machines. Second, an algorithm is presented for automatically unfolding a piece of clothing. A piece of cloth is pulled in different directions at various points of the cloth in order to flatten it. The features of the cloth are extracted and calculated to determine a valid location and orientation in which to interact with it. The features include the peak region, corner locations, and continuity/discontinuity of the cloth. In this thesis, a two-stage algorithm is presented, introducing a novel solution to the unfolding/flattening problem using interactive perception. Simulations using 3D simulation software, and experiments with robot hardware, demonstrate the ability of the algorithm to flatten pieces of laundry from different starting configurations. These results show that, at most, the algorithm flattens out a piece of cloth from 11.1% to 95.6% of the canonical configuration. Third, an energy minimization algorithm is presented that is designed to estimate the configuration of a deformable object. This approach utilizes an RGBD image to calculate feature correspondence (using SURF features), depth values, and boundary locations.
Input from a Kinect sensor is used to segment the deformable surface from the background using an alpha-beta swap algorithm. Using this segmentation, the system creates an initial mesh model without prior information about the surface geometry, and it reinitializes the configuration of the mesh model after a loss of input data. This approach is able to handle in-plane rotation, out-of-plane rotation, and varying changes in translation and scale. Results are presented for the proposed algorithm over a dataset consisting of seven shirts, two pairs of shorts, two posters, and a pair of pants. The approach is evaluated against a simulated shirt model by calculating the mean square error of the distance from the vertices of the mesh model to the ground truth provided by the simulation model.
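
As a sketch of the multi-layer L-C-S-H idea with the support vector machines the abstract mentions: mid-level classifiers predict garment characteristics from low-level features, and a high-level classifier consumes both the features and the predicted characteristic mask. The feature dimensionality, trait definitions, and random placeholder data are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data: one row per isolated garment; low-level (L) features
# such as silhouette and edge statistics; 7 category labels (H layer).
rng = np.random.default_rng(0)
X = rng.random((300, 24))
y = rng.integers(0, 7, size=300)

# Mid-level characteristics (C) layer: each SVM predicts one binary trait,
# e.g. "has sleeves"; the trait labels here are random placeholders.
traits = [SVC().fit(X, rng.integers(0, 2, size=300)) for _ in range(4)]

def characteristics(features):
    """Selection-mask (S) layer: the vector of predicted characteristics."""
    f = np.asarray(features).reshape(1, -1)
    return np.array([t.predict(f)[0] for t in traits])

# High-level (H) classifier consumes low-level features plus the mask.
X_aug = np.hstack([X, np.vstack([characteristics(x) for x in X])])
top = SVC().fit(X_aug, y)

def classify(features):
    """Return one of the 7 garment categories for a feature vector."""
    f = np.hstack([features, characteristics(features)]).reshape(1, -1)
    return top.predict(f)[0]

print(classify(rng.random(24)))
```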