958 research outputs found

    Coping with Data Scarcity in Deep Learning and Applications for Social Good

    Recent years have seen an extremely fast evolution of the Computer Vision and Machine Learning fields: several application domains benefit from the newly developed technologies, and industries are investing a growing amount of money in Artificial Intelligence. Convolutional Neural Networks and Deep Learning have substantially contributed to the rise and diffusion of AI-based solutions, creating the potential for many disruptive new businesses. The effectiveness of Deep Learning models is grounded in the availability of a huge amount of training data. Unfortunately, data collection and labeling are extremely expensive in terms of both time and cost; moreover, they frequently require the collaboration of domain experts. In the first part of the thesis, I will investigate methods for reducing the cost of data acquisition for Deep Learning applications in the relatively constrained industrial scenarios related to visual inspection. I will primarily assess the effectiveness of Deep Neural Networks in comparison with several classical Machine Learning algorithms that require a smaller amount of training data. I will then introduce a hardware-based data augmentation approach, which yields a considerable performance boost by taking advantage of a novel illumination setup designed for this purpose. Finally, I will investigate the situation in which acquiring a sufficient number of training samples is not possible, and in particular the most extreme case: zero-shot learning (ZSL), the problem of multi-class classification when no training data is available for some of the classes. Visual features designed for image classification and trained offline have been shown to be useful for ZSL, generalizing towards classes not seen during training. Nevertheless, I will show that recognition performance on unseen classes can be sharply improved by learning ad hoc semantic embeddings (the pre-defined lists of present and absent attributes that represent a class) and visual features, to increase the correlation between the two geometrical spaces and ease the metric learning process for ZSL.
    In the second part of the thesis, I will present some successful applications of state-of-the-art Computer Vision, Data Analysis and Artificial Intelligence methods. I will illustrate some solutions developed during the 2020 Coronavirus Pandemic for controlling the evolution of the disease and for reducing the spread of the virus. I will describe the first publicly available dataset for the analysis of face-touching behavior, which we annotated and distributed, and I will illustrate an extensive evaluation of several computer vision methods applied to the produced dataset. Moreover, I will describe the privacy-preserving solution we developed for estimating the “Social Distance” and its violations, given a single uncalibrated image in unconstrained scenarios. I will conclude the thesis with a Computer Vision solution developed in collaboration with the Egyptian Museum of Turin for digitally unwrapping mummies by analyzing their CT scans, to support archaeologists during mummy analysis and avoid the devastating and irreversible process of physically unwrapping the bandages to remove amulets and jewels from the body.
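
    As a rough illustration of the ZSL setting described above, the sketch below learns a linear map from visual features to the attribute (semantic embedding) space on the seen classes and then labels unseen samples by the nearest unseen-class attribute vector. This is a generic projection baseline under placeholder dimensions, class counts and random data, not the thesis's actual model.

    # Minimal zero-shot classification sketch (illustrative, not the thesis' method):
    # fit a linear map W from visual features to class-attribute space on seen
    # classes, then label unseen samples by the nearest unseen-class attribute vector.
    import numpy as np

    rng = np.random.default_rng(0)
    d_vis, d_attr = 512, 85            # visual feature dim, attribute dim (placeholders)
    n_seen, n_unseen = 40, 10          # number of seen / unseen classes (placeholders)

    # Hypothetical per-class binary attribute signatures and visual features
    S_seen = rng.integers(0, 2, size=(n_seen, d_attr)).astype(float)
    S_unseen = rng.integers(0, 2, size=(n_unseen, d_attr)).astype(float)
    X_train = rng.normal(size=(4000, d_vis))          # features of seen-class images
    y_train = rng.integers(0, n_seen, size=4000)      # seen-class labels
    A_train = S_seen[y_train]                         # attribute targets per image

    # Ridge regression to attribute space: W = (X^T X + lam*I)^-1 X^T A
    lam = 1.0
    W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d_vis), X_train.T @ A_train)

    def predict_unseen(x_vis):
        """Project a visual feature and return the index of the closest unseen class."""
        a_hat = x_vis @ W                                  # predicted attribute vector
        dists = np.linalg.norm(S_unseen - a_hat, axis=1)   # L2 distance in attribute space
        return int(np.argmin(dists))

    print(predict_unseen(rng.normal(size=d_vis)))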

    Body gestures recognition for human robot interaction

    In this project, a solution for human gesture classification is proposed. The solution uses a Deep Learning model and is meant to be useful for non-verbal communication between humans and robots. The state of the art is reviewed in an effort to achieve a model ready to work with natural gestures without restrictions. The research will focus on the creation of a temPoral bOdy geSTUre REcognition model (POSTURE) that can recognise continuous gestures performed in real-life situations. The proposed model takes into account spatial and temporal components so as to achieve the recognition of more natural and intuitive gestures. In a first step, a framework extracts from all the images the landmarks corresponding to each of the body joints. Next, data filtering techniques are applied with the aim of avoiding problems related to the data. Afterwards, the filtered data is fed into a state-of-the-art neural network, and finally different neural network configurations and approaches are tested to find the optimal performance. The results obtained show that the research is on the right track and that, despite the problems found with the dataset, even better results can be achieved.
    Sustainable Development Goals::9 - Industry, Innovation and Infrastructure
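
    A minimal sketch of the temporal pipeline outlined above: per-frame body-joint landmarks, assumed already extracted and filtered, are fed to a recurrent classifier. The LSTM architecture, joint count, clip length and number of gesture classes are illustrative assumptions, not the actual POSTURE model.

    # Sketch of a temporal gesture classifier over per-frame body-joint landmarks
    # (an LSTM stand-in; layer sizes, joint count and class count are assumptions).
    import torch
    import torch.nn as nn

    N_JOINTS, N_COORDS, N_CLASSES = 33, 2, 8    # e.g. 33 joints with (x, y) coordinates

    class GestureLSTM(nn.Module):
        def __init__(self, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(N_JOINTS * N_COORDS, hidden, batch_first=True)
            self.head = nn.Linear(hidden, N_CLASSES)

        def forward(self, x):                   # x: (batch, frames, joints*coords)
            _, (h_n, _) = self.lstm(x)          # final hidden state summarises the clip
            return self.head(h_n[-1])           # gesture logits

    # Usage on a dummy batch of 4 clips of 60 frames of filtered landmarks
    model = GestureLSTM()
    clips = torch.randn(4, 60, N_JOINTS * N_COORDS)
    print(model(clips).argmax(dim=1))           # predicted gesture per clip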

    Detecting, locating and recognising human touches in social robots with contact microphones

    There are many situations in our daily life where touch gestures during natural human–human interaction take place: meeting people (shaking hands), personal relationships (caresses), moments of celebration or sadness (hugs), etc. Considering that robots are expected to form part of our daily life in the future, they should be endowed with the capacity to recognise these touch gestures and the part of their body that has been touched, since the gesture’s meaning may differ. Therefore, this work presents a learning system for both purposes: detecting and recognising the type of touch gesture (stroke, tickle, tap and slap) and its localisation. The interpretation of the meaning of the gesture is outside the scope of this paper. Different technologies have been applied to perceive touch by a social robot, commonly using a large number of sensors. Instead, our approach uses three contact microphones installed inside some parts of the robot. The audio signals generated when the user touches the robot are sensed by the contact microphones and processed using Machine Learning techniques. We acquired data from sensors installed in two social robots, Maggie and Mini (both developed by the RoboticsLab at the Carlos III University of Madrid), and a real-time version of the whole system has been deployed in the robot Mini. The system allows the robot to sense whether it has been touched or not, to recognise the kind of touch gesture, and to estimate its approximate location. The main advantage of using contact microphones as touch sensors is that, by using just one, it is possible to “cover” a whole solid part of the robot. Besides, the sensors are unaffected by ambient noise such as human voices, TV or music. Nevertheless, using several contact microphones makes it possible for a touch gesture to be detected by all of them at once, with each potentially recognising a different gesture. The results show that this system is robust against this phenomenon. Moreover, the accuracy obtained for both robots is about 86%.
    The research leading to these results has received funding from the projects "Robots Sociales para Estimulación Física, Cognitiva y Afectiva de Mayores (ROSES)", funded by the Spanish Ministerio de Ciencia, Innovación y Universidades, and RoboCity2030-DIH-CM, Madrid Robotics Digital Innovation Hub, S2018/NMT-4331, funded by "Programas de Actividades I+D en la Comunidad de Madrid" and co-funded by Structural Funds of the EU.
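
    The sketch below shows one plausible form of such a pipeline: spectral features (MFCCs) extracted from a contact-microphone clip feed a standard classifier over the four gesture classes. The feature choice, classifier and synthetic training data are assumptions for illustration, not the authors' exact configuration.

    # Illustrative touch-gesture classifier from contact-microphone audio:
    # per-clip MFCC statistics + a random forest (assumed components, not the paper's).
    import numpy as np
    import librosa
    from sklearn.ensemble import RandomForestClassifier

    GESTURES = ["stroke", "tickle", "tap", "slap"]

    def clip_features(audio, sr=44100):
        """Mean and std of MFCCs over a short contact-microphone clip."""
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Hypothetical training set: one 0.5 s clip per labelled touch event
    rng = np.random.default_rng(0)
    clips = [rng.normal(size=22050).astype(np.float32) for _ in range(200)]
    labels = rng.integers(0, len(GESTURES), size=200)

    X = np.stack([clip_features(c) for c in clips])
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

    new_clip = rng.normal(size=22050).astype(np.float32)
    print(GESTURES[clf.predict(clip_features(new_clip)[None, :])[0]])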

    Few-Shot User-Definable Radar-Based Hand Gesture Recognition at the Edge

    This work was supported in part by ITEA3 Unleash Potentials in Simulation (UPSIM) by the German Federal Ministry of Education and Research (BMBF) under Project 19006, in part by the Austrian Research Promotion Agency (FFG), in part by the Rijksdienst voor Ondernemend Nederland (Rvo), and in part by the Innovation Fund Denmark (IFD).
    Technological advances and scalability are leading Human-Computer Interaction (HCI) to evolve towards intuitive forms, such as gesture recognition. Among the various interaction strategies, radar-based recognition is emerging as a touchless, privacy-secure, and versatile solution in different environmental conditions. Classical radar-based gesture HCI solutions involve deep learning but require training on large and varied datasets to achieve robust prediction. Innovative self-learning algorithms can help tackle this problem by recognizing patterns and adapting from similar contexts. Yet, such approaches are often computationally expensive and hard to integrate into hardware-constrained solutions. In this paper, we present a gesture recognition algorithm which is easily adaptable to new users and contexts. We exploit an optimization-based meta-learning approach to enable gesture recognition in learning sequences. This method aims to learn the best possible initialization of the model parameters, simplifying training on new contexts when small amounts of data are available. The reduction in computational cost is achieved by processing the radar-sensed gesture data in the form of time maps, to minimize the input data size. This approach enables the adaptation of a simple convolutional neural network (CNN) to new hand poses, thus easing the integration of the model into a hardware-constrained platform. Moreover, the use of a Variational Autoencoder (VAE) to reduce the gestures' dimensionality decreases the model size by an order of magnitude and halves the required adaptation time. The proposed framework, deployed on the Intel(R) Neural Compute Stick 2 (NCS 2), achieves an average accuracy of around 84% for unseen gestures when only one example per class is available at training time. The accuracy increases up to 92.6% and 94.2% when three and five samples per class are used, respectively.
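
    A compact sketch of the optimization-based meta-learning idea on radar time maps: a small CNN whose initial weights are meta-trained so that a handful of labelled gestures suffices for adaptation. For brevity it uses a Reptile-style meta-update with random placeholder data and omits the VAE compression step; the map size, class count and hyperparameters are assumptions, not the paper's configuration.

    # Few-shot adaptation of a tiny CNN on radar time maps via a Reptile-style
    # meta-update (illustrative; data, sizes and hyperparameters are placeholders).
    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_CLASSES = 5

    class TinyRadarCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.fc = nn.Linear(32 * 8 * 8, N_CLASSES)

        def forward(self, x):                    # x: (batch, 1, 32, 32) time maps
            return self.fc(self.conv(x).flatten(1))

    def sample_task(n_shot=1):
        """Placeholder task sampler: a few labelled time maps per gesture class."""
        x = torch.randn(N_CLASSES * n_shot, 1, 32, 32)
        y = torch.arange(N_CLASSES).repeat_interleave(n_shot)
        return x, y

    meta_model = TinyRadarCNN()
    meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

    for _ in range(100):                         # meta-training iterations
        task_model = copy.deepcopy(meta_model)
        opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
        x, y = sample_task()
        for _ in range(inner_steps):             # few-shot adaptation to this task
            opt.zero_grad()
            F.cross_entropy(task_model(x), y).backward()
            opt.step()
        with torch.no_grad():                    # move the initialization toward
            for p_meta, p_task in zip(meta_model.parameters(),
                                      task_model.parameters()):
                p_meta += meta_lr * (p_task - p_meta)   # the adapted weights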

    Detecting Surface Interactions via a Wearable Microphone to Improve Augmented Reality Text Entry

    This thesis investigates whether we can detect and distinguish between surface interaction events, such as tapping or swiping on a surface, using a wearable microphone. It also examines the advantages of new text entry methods, such as tapping with two fingers simultaneously to enter capital letters and punctuation. For this purpose, we conducted a remote study to collect audio and video of three different ways people might interact with a surface. We also built a CNN classifier to detect taps. Our results show that we can detect and distinguish between surface interaction events such as taps and swipes via a wearable microphone on the user's head.
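
    A toy sketch of a CNN tap detector of the kind mentioned above: short audio windows from a head-worn microphone are converted to log-mel spectrograms and classified as tap, swipe or background. The window length, feature resolution and label set are illustrative assumptions rather than the thesis's actual setup.

    # Toy tap/swipe detector: log-mel spectrogram of a short window + a small CNN
    # (architecture, window length and classes are assumptions for illustration).
    import numpy as np
    import librosa
    import torch
    import torch.nn as nn

    CLASSES = ["tap", "swipe", "background"]

    def window_to_logmel(audio, sr=16000):
        """Log-mel spectrogram of a ~0.5 s audio window as a (1, 64, frames) tensor."""
        mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
        return torch.from_numpy(librosa.power_to_db(mel)).float().unsqueeze(0)

    class TapCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            )
            self.fc = nn.Linear(8 * 4 * 4, len(CLASSES))

        def forward(self, x):                    # x: (batch, 1, n_mels, frames)
            return self.fc(self.conv(x).flatten(1))

    model = TapCNN()
    window = np.random.randn(8000).astype(np.float32)      # fake 0.5 s window @ 16 kHz
    logits = model(window_to_logmel(window).unsqueeze(0))   # add batch dimension
    print(CLASSES[logits.argmax(dim=1).item()])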