
    Real-time New Zealand sign language translator using convolution neural network

    Over the past quarter of a century, machine learning has played an essential role in the information technology revolution. From predictive web browsing to autonomous vehicles, machine learning has become the heart of most intelligent applications in service today. Image classification through gesture recognition is a subfield that has benefited immensely from these methods. In particular, a subset of machine learning known as deep learning has exhibited impressive performance in this regard, outperforming conventional approaches such as classical image processing. Advanced deep learning architectures are built on artificial neural networks, particularly convolutional neural networks (CNNs). Deep learning has dominated the field of computer vision since 2012; however, a general criticism of deep learning is its dependence on large datasets. To address this criticism, research on data-efficient deep learning methods has been carried out. The foremost outcome of this line of work is transfer learning, which is carried out with pre-trained networks. In this research, the pre-trained InceptionV3 model has been used for transfer learning in a convolutional neural network to implement a real-time New Zealand sign language translator. The focus of this research is to introduce a vision-based application that translates New Zealand sign language into text by recognizing sign gestures, in order to overcome communication barriers between the deaf community and the hearing community in New Zealand. As a byproduct of this research, a new dataset for the New Zealand sign language alphabet has been created. After training the pre-trained InceptionV3 network on this captured dataset, a prototype of the New Zealand sign language translation system was built.
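    As a rough illustration of the transfer-learning setup described above, the following Keras sketch loads an ImageNet pre-trained InceptionV3 backbone, freezes it, and attaches a new classification head for a sign-alphabet dataset. The head layout, input size, and the 26-letter class count are assumptions for illustration, not details confirmed by the abstract.

        # Minimal transfer-learning sketch (assumed setup, not the exact model from this work).
        from tensorflow.keras import layers, models
        from tensorflow.keras.applications import InceptionV3

        NUM_CLASSES = 26  # hypothetical: one class per sign-alphabet letter

        # Load InceptionV3 pre-trained on ImageNet, without its original classification head.
        base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
        base.trainable = False  # freeze the pre-trained convolutional layers

        # Add a small head that is trained on the new sign-language dataset.
        model = models.Sequential([
            base,
            layers.GlobalAveragePooling2D(),
            layers.Dense(256, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(NUM_CLASSES, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])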

    RGB-D SLAM

    This project was developed as an implementation of a SLAM technique called GraphSLAM. This technique applies graph theory to build an online optimization system that allows a robot to map its surroundings and localize itself using a time-of-flight camera as the input source. To do so, an RGB-D system was calibrated and used to create colored 3D point clouds. From this information, a feature-detection module produces a first estimate of the camera pose, which is then refined by ICP to complete the graph structure. Finally, the HogMAN graph optimizer closes the loop on each iteration using hierarchical manifold optimization. As a result, colored 3D maps are created that also contain the exact position of the robot within the map.
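    To make the ICP refinement step concrete, the sketch below runs point-to-point ICP between two point clouds using Open3D; this is a generic illustration under assumed file names and parameters, not the project's actual implementation.

        # Hedged sketch: refining a feature-based pose estimate with point-to-point ICP.
        import numpy as np
        import open3d as o3d

        source = o3d.io.read_point_cloud("frame_a.pcd")  # hypothetical point-cloud files
        target = o3d.io.read_point_cloud("frame_b.pcd")

        init_pose = np.eye(4)  # first approximation, e.g. from the feature detector

        # ICP refines the relative camera pose; the result can serve as a graph edge.
        result = o3d.pipelines.registration.registration_icp(
            source, target,
            max_correspondence_distance=0.05,
            init=init_pose,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
        print(result.transformation)  # refined 4x4 transformation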

    Vision-based context-aware assistance for minimally invasive surgery.

    A context-aware surgical system collects surgical data and analyzes the operating environment to guide the surgeon's response at any given time, improving the efficiency, augmenting the performance, and lowering the risk of minimally invasive surgery (MIS). It enables various applications across the whole patient care pathway, such as medical resource scheduling and report generation. Automatic understanding of surgical activities is an essential component of a context-aware surgical system. However, analyzing surgical activities is challenging because the operating environment is highly complex. Previous methods either require additional devices or have limited ability to capture discriminative features from surgical data. This thesis aims to solve the challenges of surgical activity analysis and provide context-aware assistance for MIS. In our study, we consider surgical visual data as the only input, because videos and images carry high-dimensional, representative features and are much easier to access than other data formats such as kinematic information or motion trajectories. Following the granularity of surgical activity in a top-down manner, we first propose an attention-based multi-task framework that assesses expertise level and evaluates six standards for surgeons of different skill levels in three fundamental surgical robotic tasks, namely suturing, knot tying, and needle passing. Second, we present a symmetric dilated convolution structure embedded with a self-attention kernel to jointly detect and segment fine-grained surgical gestures in surgical videos. In addition, we use a transformer encoder-decoder architecture with reinforcement learning to generate surgical instructions from images. Overall, this thesis develops a series of novel deep learning frameworks that extract high-level semantic information from surgical video and image content to assist MIS, pushing the boundaries towards an integrated context-aware system throughout the patient care pathway.
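    As a loose, simplified illustration of temporal modelling with dilated convolutions (not the thesis's symmetric dilated architecture), the PyTorch sketch below stacks 1D convolutions with growing dilation over per-frame features to produce per-frame gesture logits; all layer sizes and the gesture count are assumptions.

        # Illustrative dilated temporal model for per-frame gesture labelling (assumed sizes).
        import torch
        import torch.nn as nn

        class DilatedTemporalSegmenter(nn.Module):
            def __init__(self, in_dim=512, hidden=64, num_gestures=10):
                super().__init__()
                layers, dims = [], in_dim
                for d in (1, 2, 4, 8):  # growing dilation widens the temporal receptive field
                    layers += [nn.Conv1d(dims, hidden, kernel_size=3, padding=d, dilation=d),
                               nn.ReLU()]
                    dims = hidden
                self.backbone = nn.Sequential(*layers)
                self.head = nn.Conv1d(hidden, num_gestures, kernel_size=1)

            def forward(self, feats):                    # feats: (batch, in_dim, frames)
                return self.head(self.backbone(feats))   # per-frame gesture logits

        logits = DilatedTemporalSegmenter()(torch.randn(1, 512, 300))  # one 300-frame video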

    Visual place recognition for improved open and uncertain navigation

    Visual place recognition localises a query place image by comparing it against a reference database of known place images, a fundamental element of robotic navigation. Recent work focuses on using deep learning to learn image descriptors for this task that are invariant to appearance changes caused by dynamic lighting, weather, and seasonal conditions. However, these descriptors require greater computational resources than are available on robotic hardware; few SLAM frameworks are designed to utilise them; they return a relative comparison between image descriptors that is difficult to interpret; they cannot provide appearance invariance for other navigation tasks such as scene classification; and they are unable to identify query images from an open environment that have no true match in the reference database. This thesis addresses these challenges with three contributions. The first is a lightweight visual place recognition descriptor combined with a probabilistic filter to address a subset of the visual SLAM problem in real time. The second combines visual place recognition and scene classification for appearance-invariant scene classification, extended to recognise unknown scene classes when navigating an open environment. The final contribution uses comparisons between query and reference image descriptors to classify whether they result in a true or false positive localisation, and whether a true match for the query image exists in the reference database. Funded by the Edinburgh Centre for Robotics and the Engineering and Physical Sciences Research Council (EPSRC).
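    A minimal sketch of the descriptor-matching step, under assumptions: compact image descriptors are compared by cosine similarity, and a threshold rejects queries with no true match in the reference database (the open-environment case). This is illustrative, not the thesis's actual descriptor or probabilistic filter.

        # Hedged sketch: match a query descriptor against a reference database.
        import numpy as np

        def best_match(query_desc, ref_descs, threshold=0.8):
            """Return the index of the best-matching reference image, or None."""
            q = query_desc / np.linalg.norm(query_desc)
            refs = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
            sims = refs @ q                      # cosine similarity to every reference
            best = int(np.argmax(sims))
            if sims[best] < threshold:           # likely no true match in the database
                return None
            return best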

    The 2nd International Electronic Conference on Applied Sciences

    This book collects the works presented at the 2nd International Electronic Conference on Applied Sciences, organized by the journal Applied Sciences from 15 to 31 October 2021 on the MDPI Sciforum platform. Two decades have passed since the start of the 21st century, and science and technology are developing ever faster than in the previous century. The field of science is expanding, and its structure is becoming ever richer. Because of this expansion and increasingly fine structure, researchers may lose themselves in the deep forest of ever-increasing frontiers and sub-fields. This international conference on the applied sciences was started to help scientists follow the growth of these frontiers by breaking down barriers and connecting the many sub-fields, cutting through this vast forest. It allows researchers to see these frontiers and their surrounding (or quite distant) fields and sub-fields, and gives them the opportunity to incubate and develop their knowledge even further with the aid of this multi-dimensional network.

    Smart Sensor Technologies for IoT

    The recent development of wireless networks and devices has led to novel services that utilize wireless communication on a new level. Much effort and many resources have been dedicated to establishing new communication networks that support machine-to-machine communication and the Internet of Things (IoT). In these systems, various smart and sensory devices are deployed and connected, enabling large amounts of data to be streamed. Smart services represent a new trend in mobile services, i.e., a completely new spectrum of context-aware, personalized, and intelligent services and applications. A variety of existing services utilize information about the position of the user or mobile device. The position of mobile devices is often obtained using Global Navigation Satellite System (GNSS) chips that are integrated into all modern mobile devices (smartphones). However, GNSS is not always a reliable source of position estimates due to multipath propagation and signal blockage. Moreover, integrating GNSS chips into all devices might have a negative impact on the battery life of future IoT applications. Therefore, alternative solutions for position estimation should be investigated and implemented in IoT applications. This Special Issue, “Smart Sensor Technologies for IoT”, aims to report on some of the recent research efforts on this increasingly important topic. The twelve accepted papers in this issue cover various aspects of smart sensor technologies for IoT.

    Image-Based Scene Analysis for Computer-Assisted Laparoscopic Surgery

    This thesis is concerned with image-based scene analysis for computer-assisted laparoscopic surgery. The focus lies on how to extract different types of information from laparoscopic video data. Methods for semantic analysis can be used to determine which instruments and organs are currently visible and where they are located. Quantitative analysis provides numerical information on the size of and distances between structures. Workflow analysis uses information from previously seen images to estimate the progression of the surgery. To demonstrate that the proposed methods function in real-world scenarios, multiple evaluations were performed on actual laparoscopic image data recorded during surgeries. The proposed methods for semantic and quantitative analysis were successfully evaluated in live phantom and animal studies and were also used during a live gastric bypass on a human patient.

    Neural learning of spatio-temporal features for automatic video sequence classification

    This thesis focuses on the automatic classification of video sequences. Through this work, we aim to stand out from the dominant methodology, which relies on so-called hand-crafted features, by proposing generic, problem-independent models. This is done by automating the feature extraction process, which in our case is performed through learning from training examples, without any prior knowledge. To do so, we build on existing neural-based methods dedicated to object recognition in still images and investigate their extension to the video case. More concretely, we introduce two learning-based models to extract spatio-temporal features for video classification: (i) a deep model trained in a supervised way, which can be considered an extension of the popular ConvNets model to the video case, and (ii) an unsupervised model that relies on an auto-encoder scheme and a sparse over-complete representation. An additional contribution of this work is a comparative study of several sequence classification models among the most popular in the state of the art. This study was performed using hand-crafted features specifically designed for the soccer action recognition problem, and allowed us to select the best classifier (a bidirectional long short-term memory recurrent neural network, BLSTM), which was then used for all subsequent experiments. Finally, to validate the genericity of the two proposed models, experiments were carried out on two different problems, namely human action recognition (on the KTH dataset) and facial expression recognition (on the GEMEP-FERA dataset). The results show that our approaches achieve performances among the best of the related works, with a recognition rate of 95.83% on KTH and 87.57% on GEMEP-FERA.
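    As a simplified illustration of extending a ConvNet to video (not the exact architecture proposed in the thesis), the PyTorch sketch below applies 3D convolutions over the time, height, and width dimensions of a clip and classifies it into six action classes, matching the number of actions in KTH; all other sizes are assumptions.

        # Illustrative spatio-temporal ConvNet: 3D convolutions over (frames, H, W).
        import torch
        import torch.nn as nn

        class Simple3DConvNet(nn.Module):
            def __init__(self, num_classes=6):       # KTH has six action classes
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv3d(3, 32, kernel_size=3, padding=1),   # spatio-temporal filters
                    nn.ReLU(),
                    nn.MaxPool3d(2),
                    nn.Conv3d(32, 64, kernel_size=3, padding=1),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool3d(1),
                )
                self.classifier = nn.Linear(64, num_classes)

            def forward(self, clips):                 # clips: (batch, 3, frames, H, W)
                return self.classifier(self.features(clips).flatten(1))

        # Example: classify a batch of two 16-frame RGB clips at 112x112 resolution.
        logits = Simple3DConvNet()(torch.randn(2, 3, 16, 112, 112))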

    Spatio-Temporal Modeling for Action Recognition in Videos

    Technological innovation in the field of video action recognition drives the development of video-based real-world applications. This PhD thesis provides a new set of machine learning algorithms for processing videos efficiently, leading to outstanding results in human action recognition in videos. First, two video representation extraction methods, Temporal Squeezed Pooling (TSP) and Pixel-Wise Temporal Projection (PWTP), are proposed in order to enhance the discriminative video feature learning abilities of Deep Neural Networks (DNNs). TSP enables spatio-temporal modeling by temporally aggregating the information from long video frame sequences. PWTP is an improved version of TSP, which filters out static appearance while performing information aggregation. Second, we discuss how to address the long-term dependency modeling problem of video DNNs. To this end, we develop two spatio-temporal attention mechanisms, Region-based Non-local (RNL) and Convolution Pyramid Attention (CPA). We devise an attention chain by connecting the RNL or CPA module to the Squeeze-Excitation (SE) operation, and demonstrate how these attention mechanisms can be embedded into deep networks to alleviate optimization difficulty. Finally, we tackle the heavy computational cost of video models. To this end, we introduce the concept of busy-quiet video disentangling for exceedingly fast video modeling. We propose the Motion Band-Pass Module (MBPM), embedded into the Busy-Quiet Net (BQN) architecture, to reduce the information redundancy of videos in the spatial and temporal dimensions. The BQN architecture is extremely lightweight while still outperforming heavier models. Extensive experiments for all the proposed methods are provided on multiple video benchmarks, including UCF101, HMDB51, Kinetics400, and Something-Something V1.
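    To give a flavour of temporal aggregation in the spirit of TSP (a heavily simplified sketch, not the actual pooling operator from the thesis), per-frame CNN features can be squeezed along the temporal axis into a single clip-level descriptor:

        # Hedged illustration: pool per-frame features over time (shapes are assumptions).
        import torch

        def temporal_pool(frame_features: torch.Tensor) -> torch.Tensor:
            """frame_features: (batch, frames, channels) -> (batch, channels)."""
            return frame_features.mean(dim=1)   # squeeze the temporal dimension

        clip_descriptor = temporal_pool(torch.randn(4, 64, 512))  # 4 clips of 64 frames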