
    Human behavior understanding for worker-centered intelligent manufacturing

    “In a worker-centered intelligent manufacturing system, sensing and understanding of the worker’s behavior are the primary tasks, which are essential for automatic performance evaluation & optimization, intelligent training & assistance, and human-robot collaboration. In this study, a worker-centered training & assistant system featuring self-awareness and active guidance is proposed for intelligent manufacturing. To understand hand behavior, a method is proposed for complex hand gesture recognition using Convolutional Neural Networks (CNN) with multi-view augmentation and inference fusion, from depth images captured by a Microsoft Kinect. To sense and understand the worker more comprehensively, a multi-modal approach is proposed for worker activity recognition using Inertial Measurement Unit (IMU) signals obtained from a Myo armband and videos from a visual camera. To automatically learn the importance of different sensors, a novel attention-based approach is proposed for human activity recognition using multiple IMU sensors worn at different body locations. To deploy the developed algorithms to the factory floor, a real-time assembly operation recognition system is proposed with fog computing and transfer learning. The proposed worker-centered training & assistant system has been validated and has demonstrated its feasibility and great potential for application in the manufacturing industry for frontline workers. Our developed approaches have been evaluated: 1) the multi-view approach outperforms state-of-the-art methods on two public benchmark datasets, 2) the multi-modal approach achieves an accuracy of 97% on a worker activity dataset of 6 activities and achieves the best performance on a public dataset, 3) the attention-based method outperforms state-of-the-art methods on five publicly available datasets, and 4) the developed transfer learning model achieves a real-time recognition accuracy of 95% on a dataset of 10 worker operations”--Abstract, page iv
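    The abstract does not include code; as a rough illustration of the attention idea it describes (learning the importance of different body-worn IMU sensors), the following is a minimal PyTorch sketch. The module name, feature dimensions, and class count are illustrative assumptions, not the thesis' actual architecture.

```python
# Minimal sketch of attention-weighted fusion of per-sensor IMU features.
# SensorAttentionFusion and all dimensions are illustrative assumptions;
# only the idea of learning a weight per sensor follows the abstract.
import torch
import torch.nn as nn


class SensorAttentionFusion(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)        # one scalar score per sensor
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, sensor_feats: torch.Tensor) -> torch.Tensor:
        # sensor_feats: (batch, num_sensors, feat_dim), e.g. one feature
        # vector per body-worn IMU after a shared encoder.
        weights = torch.softmax(self.score(sensor_feats), dim=1)   # (B, S, 1)
        fused = (weights * sensor_feats).sum(dim=1)                # (B, feat_dim)
        return self.classifier(fused)


if __name__ == "__main__":
    model = SensorAttentionFusion(feat_dim=64, num_classes=6)
    x = torch.randn(8, 5, 64)    # 8 windows, 5 IMU sensors, 64-d features each
    print(model(x).shape)        # torch.Size([8, 6])
```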

    Gesture Recognition of RGB and RGB-D Static Images Using Convolutional Neural Networks

    Human-computer interaction has long been a fascinating field. With the rapid development of computer vision, gesture-based recognition systems have become an interesting and diverse topic. Recognizing human gestures in the form of sign language, however, is a complex and challenging task. Various traditional methods have been used for sign language recognition, but achieving high accuracy remains difficult. This paper proposes an RGB and RGB-D static gesture recognition method using a fine-tuned VGG19 model. The fine-tuned VGG19 model uses a layer that concatenates features from RGB and RGB-D images to increase the accuracy of the neural network. The authors implemented the proposed model on an American Sign Language (ASL) recognition dataset, achieved a 94.8% recognition rate, and compared the model with other CNN-based and traditional algorithms on the same dataset.
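    As an illustration of the feature-concatenation idea in this abstract, here is a minimal two-stream sketch in PyTorch. The paper fine-tunes a VGG19 in a possibly different framework; the layer sizes, the 24-class head, and the pretrained-weights handling below are assumptions.

```python
# Two-stream VGG19 sketch: extract features from RGB and RGB-D inputs
# separately, concatenate them, and classify. Layer sizes and the
# 24-class head are assumptions, not the authors' configuration.
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamVGG19(nn.Module):
    def __init__(self, num_classes: int = 24):
        super().__init__()
        self.rgb_stream = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features
        self.depth_stream = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Linear(512 * 2, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, rgb: torch.Tensor, rgbd: torch.Tensor) -> torch.Tensor:
        f_rgb = self.pool(self.rgb_stream(rgb)).flatten(1)       # (B, 512)
        f_depth = self.pool(self.depth_stream(rgbd)).flatten(1)  # (B, 512)
        return self.head(torch.cat([f_rgb, f_depth], dim=1))     # concatenated features
```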

    Japanese sign language classification based on gathered images and neural networks

    This paper proposes a method to classify words in Japanese Sign Language (JSL). The approach combines a gathered-image generation technique with a neural network with convolutional and pooling layers (CNN). The gathered-image generation produces images based on mean images: the maximum difference value is computed between blocks of the mean image and of the JSL motion images, and the gathered images comprise the blocks having the calculated maximum difference value. The CNN extracts features from the gathered images, while a support vector machine for multi-class classification and a multilayer perceptron are employed to classify 20 JSL words. The experimental results show a mean recognition accuracy of 94.1% for the proposed method. These results suggest that the proposed method can obtain the information needed to classify the sample words.
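    A rough NumPy sketch of the "gathered image" idea as it reads from this abstract is given below: split each frame into blocks, measure how much each block deviates from the mean image, and assemble the most-changed blocks into one image. The block size and selection rule are assumptions rather than the authors' exact procedure.

```python
# Sketch of gathered-image generation: for each block position, keep the
# block from the frame that differs most from the mean image. Block size
# and the per-block selection rule are illustrative assumptions.
import numpy as np


def gathered_image(frames: np.ndarray, block: int = 16) -> np.ndarray:
    # frames: (T, H, W) grayscale video of one JSL word
    mean_img = frames.mean(axis=0)
    out = np.zeros_like(mean_img)
    h, w = mean_img.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            diffs = np.abs(frames[:, y:y + block, x:x + block]
                           - mean_img[y:y + block, x:x + block]).sum(axis=(1, 2))
            t = int(diffs.argmax())      # frame whose block changed the most
            out[y:y + block, x:x + block] = frames[t, y:y + block, x:x + block]
    return out
```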

    Continual Learning of Hand Gestures for Human-Robot Interaction

    Human communication is multimodal. For years, natural language processing has been studied as a form of human-machine or human-robot interaction. In recent years, computer vision techniques have been applied to the recognition of static and dynamic gestures, and progress is being made in sign language recognition as well. The typical way to train a machine learning algorithm to perform a classification task is to provide training examples for all the classes that the model needs to identify. In a real-world scenario, however, such as the use of assistive robots, it is useful to learn new concepts from interaction. Unlike biological brains, artificial neural networks suffer from catastrophic forgetting and, as a result, are not good at incrementally learning new classes. In this thesis, the HAnd Gesture Incremental Learning (HAGIL) framework is proposed as a method to incrementally learn to classify static hand gestures. We show that HAGIL is able to incrementally learn up to 36 new symbols using only 5 samples for each old symbol, achieving a final average accuracy of over 90%. In addition, the incremental training time is reduced to 10% of the time required when using all available data.
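    The abstract does not spell out HAGIL's internals; the sketch below only illustrates the general exemplar-rehearsal strategy it alludes to, keeping 5 samples per old symbol and mixing them with new-class data at each increment. Class and function names are hypothetical.

```python
# Exemplar rehearsal sketch for class-incremental learning: store a few
# samples per old class and train each increment on new data + exemplars.
# ExemplarMemory / incremental_step are hypothetical names; only the
# 5-samples-per-old-symbol budget follows the abstract.
import random
from typing import Dict, List, Tuple

Sample = Tuple[list, int]   # (feature vector, class label)


class ExemplarMemory:
    def __init__(self, per_class: int = 5):
        self.per_class = per_class
        self.store: Dict[int, List[Sample]] = {}

    def add_class(self, label: int, samples: List[Sample]) -> None:
        # Keep only a small random subset of each old class for rehearsal.
        self.store[label] = random.sample(samples, min(self.per_class, len(samples)))

    def rehearsal_set(self) -> List[Sample]:
        return [s for kept in self.store.values() for s in kept]


def incremental_step(memory: ExemplarMemory,
                     new_data: Dict[int, List[Sample]]) -> List[Sample]:
    # Training set for this increment = stored exemplars + all new-class data.
    train_set = memory.rehearsal_set()
    for label, samples in new_data.items():
        train_set.extend(samples)
        memory.add_class(label, samples)
    return train_set
```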

    Metal Additive Manufacturing Parts Inspection using Convolutional Neural Network

    Metal additive manufacturing (AM) is gaining increasing attention from academia and industry due to its unique advantages over traditional manufacturing processes. Part quality inspection plays a crucial role in the AM industry and can be adopted for product improvement. However, the traditional inspection process relies on manual recognition, which suffers from low efficiency and potential bias. This study presents a convolutional neural network (CNN) approach toward robust AM quality inspection, distinguishing good quality, crack, gas porosity, and lack of fusion. To obtain an appropriate model, experiments were performed on a series of architectures. Moreover, data augmentation was adopted to deal with data scarcity, and L2 regularization (weight decay) and dropout were applied to avoid overfitting. The impact of each strategy was evaluated. The final CNN model achieved an accuracy of 92.1% and took 8.01 milliseconds to recognize one image. The CNN model presented here can help automate defect recognition in the AM industry.
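    The regularization choices reported above translate into a few standard PyTorch settings; the sketch below shows where data augmentation, dropout, and L2 weight decay typically enter, with a placeholder architecture rather than the paper's final model.

```python
# Placeholder CNN illustrating the reported strategies: augmentation for
# data scarcity, dropout against overfitting, and L2 regularization via
# the optimizer's weight_decay. Architecture and hyperparameters are
# assumptions, not the study's final 92.1%-accuracy model.
import torch
import torch.nn as nn
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),                 # dropout
    nn.LazyLinear(4),                # good / crack / gas porosity / lack of fusion
)

# L2 regularization ("weight decay") is applied through the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```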

    Smart augmented reality instructional system for mechanical assembly

    Quality and efficiency are pivotal indicators of a manufacturing company. Many companies suffer from a shortage of experienced workers across the production line who can perform complex assembly tasks, such as the assembly of an aircraft engine, and this can lead to significant financial loss. To further reduce assembly time and errors, a smart system is introduced that combines multi-modal Augmented Reality (AR) instructions with a deep learning network for tool detection. The multi-modal smart AR system is designed to provide on-site information, including various visual renderings, with a fine-tuned Region-based Convolutional Neural Network trained on a synthetic tool dataset. The dataset is generated using CAD models of tools augmented onto 2D scenes, without the need to manually prepare real tool images. Applying the system to the mechanical assembly of a CNC carving machine shows that it not only correctly classifies and localizes the physical tools but also enables workers to successfully complete the given assembly tasks. With the proposed approaches, an efficiently customizable smart AR instructional system capable of sensing, characterizing requirements, and effectively enhancing workers' performance has been built and demonstrated --Abstract, page iii
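    Fine-tuning a region-based detector for a custom set of tool classes is commonly done by swapping the box-classification head; the snippet below shows that step with torchvision's Faster R-CNN as a stand-in. The class count and the use of this particular detector are assumptions, not details from the paper.

```python
# Replace the box predictor of a pretrained Faster R-CNN so it classifies
# a custom set of tool classes (e.g. learned from synthetic CAD renders).
# NUM_TOOL_CLASSES and the choice of this torchvision model are assumptions.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_TOOL_CLASSES = 5 + 1   # hypothetical tool count + background

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_TOOL_CLASSES)
```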

    SFINGE 3D: A novel benchmark for online detection and recognition of heterogeneous hand gestures from 3D fingers' trajectories

    In recent years, gesture recognition has become an increasingly interesting topic for both research and industry. While interacting with a device through a gestural interface is a promising idea in several applications, especially in the industrial field, some of the issues related to the task are still considered a challenge. In the scientific literature, a considerable amount of work has recently been presented on the problem of detecting and classifying gestures from 3D hand-joint trajectories, which can be captured by cheap devices installed on head-mounted displays and desktop computers. The methods proposed so far can achieve very good results on benchmarks requiring the offline supervised classification of segmented gestures of a particular kind, but they are not usually tested on the more realistic task of finding gesture executions within a continuous hand-tracking session. In this paper, we present a novel benchmark, SFINGE 3D, aimed at evaluating online gesture detection and recognition. The dataset is composed of a dictionary of 13 segmented gestures used as a training set, and 72 trajectories, each containing 3-5 of the 13 gestures performed in continuous tracking and padded with random hand movements acting as noise. The presented dataset, captured with a head-mounted Leap Motion device, is particularly suitable for evaluating gesture detection methods in a realistic use-case scenario, as it allows the analysis of online detection performance on heterogeneous gestures characterized by static hand pose, global hand motion, and finger articulation. We exploited SFINGE 3D to compare two different approaches to online detection and classification, one based on visual rendering and Convolutional Neural Networks and the other based on geometry-based handcrafted features and dissimilarity-based classifiers. We discuss the results, analyzing the strengths and weaknesses of the methods and deriving useful hints for their improvement. © 2020 Elsevier Ltd. All rights reserved.
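    The online setting the benchmark targets amounts to sliding a classifier over a continuous tracking stream and suppressing low-confidence windows as noise; the loop below sketches that evaluation protocol. The `classify` callable, window length, stride, and threshold are placeholders, not part of SFINGE 3D or the compared methods.

```python
# Schematic sliding-window loop for online gesture detection on a
# continuous hand-tracking stream. `classify`, window, stride, and
# threshold are hypothetical placeholders.
from typing import Callable, List, Sequence, Tuple


def online_detect(stream: Sequence,                # per-frame 3D finger joints
                  classify: Callable[[Sequence], Tuple[str, float]],
                  window: int = 60,
                  stride: int = 10,
                  threshold: float = 0.8) -> List[Tuple[int, str]]:
    detections = []
    for start in range(0, max(len(stream) - window + 1, 0), stride):
        label, confidence = classify(stream[start:start + window])
        # Low-confidence windows are treated as random hand movement (noise).
        if confidence >= threshold and label != "no_gesture":
            detections.append((start, label))
    return detections
```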

    Extracting structured information from 2D images

    Convolutional neural networks can handle an impressive array of supervised learning tasks while relying on a single backbone architecture, suggesting that one solution fits all vision problems. But for many tasks, we can directly exploit the problem structure within neural networks to deliver more accurate predictions. In this thesis, we propose novel deep learning components that exploit the structured output space of an increasingly complex set of problems. We start from Optical Character Recognition (OCR) in natural scenes and leverage the constraints imposed by the spatial outline of letters and by language requirements. Conventional OCR systems do not work well in natural scenes due to distortions, blur, and letter variability. We introduce a new attention-based model, equipped with extra information about neuron positions, to guide its focus across characters sequentially. It beats the previous state of the art by a significant margin. We then turn to dense labeling tasks employing encoder-decoder architectures. We start with an experimental study that documents the drastic impact that decoder design can have on task performance. Rather than optimizing one decoder per task separately, we propose new robust layers for the upsampling of high-dimensional encodings. We show that these better suit the structured per-pixel output across all tasks. Finally, we turn to the problem of urban scene understanding. There is an elaborate structure in both the input space (multi-view recordings, aerial and street-view scenes) and the output space (multiple fine-grained attributes for holistic building understanding). We design new models that benefit from the relatively simple, cuboid-like geometry of buildings to create a single unified representation from multiple views. To benchmark our model, we build a new large-scale multi-view dataset of building images and fine-grained attributes and show systematic improvements when compared to a broad range of strong CNN-based baselines.
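    As a small illustration of the decoder design space the thesis studies for dense labeling, here is a generic upsampling block in PyTorch (bilinear upsampling followed by convolution); it is not the thesis' proposed layer, only a common baseline pattern.

```python
# Generic decoder upsampling block for dense per-pixel prediction:
# bilinear upsampling followed by a 3x3 convolution. A common baseline
# pattern, not the thesis' proposed layer.
import torch.nn as nn


class UpsampleBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(self.up(x))
```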