20 research outputs found

    Software Controller using Hand Gestures

    New technologies emerge over time, and robotic hand gesture control is one of them. Gesture-based collaboration frameworks are becoming increasingly popular in business and at home. The approach we propose can greatly reduce reliance on hardware components such as a keyboard and mouse. The goal of this research is to create a system that recognizes hand gestures and uses them as input commands for interacting with a computer or laptop. According to a recent study, CNNs are still underused in hand gesture recognition. Our research aims to leverage CNNs to recognize gestures in both static and dynamic modes, and then deploy the trained model in real-time applications.

    Egocentric Action Understanding by Learning Embodied Attention

    Videos captured from wearable cameras, known as egocentric videos, create a continuous record of human daily visual experience, and thereby offer a new perspective for human activity understanding. Importantly, egocentric video aligns gaze, embodied movement, and action in the same “first-person” coordinate system. The rich egocentric cues reflect the attended scene context of an action, and thereby provide novel means for reasoning about human daily routines. In my thesis work, I describe my efforts to develop novel computational models that learn embodied egocentric attention for the automatic analysis of egocentric actions. First, I introduce a probabilistic model for learning gaze and actions in egocentric video and further demonstrate that attention can serve as a robust tool for learning motion-aware video representations. Second, I develop a novel deep model to address the challenging problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos. Third, I present a novel deep latent variable model that makes use of human intentional body movement (motor attention) as a key representation for forecasting human-object interaction in egocentric video. Finally, I propose a novel task of future hand segmentation from egocentric videos, and show how explicitly modeling the future head motion can facilitate future hand movement forecasting.
    Ph.D.

    Deep Learning in Mobile and Wireless Networking: A Survey

    The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, agile management of network resources to maximize user experience, and extraction of fine-grained real-time analytics. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space. In this paper we bridge the gap between deep learning and mobile and wireless networking research by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and the state of the art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.

    Design Framework of UAV-Based Environment Sensing, Localization, and Imaging System

    In this dissertation research, we develop a framework for designing an Unmanned Aerial Vehicle (UAV)-based environment sensing, localization, and imaging system for challenging environments with no GPS signals and low visibility. The UAV system relies on the various sensors that it carries to conduct accurate sensing and localization of the objects in an environment, and further to reconstruct the 3D shapes of those objects. The system can be very useful when exploring an unknown or dangerous environment, e.g., a disaster site, that is inconvenient or inaccessible for humans. In addition, the system can be used for monitoring and object tracking in a large-scale environment, e.g., a smart manufacturing factory, for the purposes of workplace management and safety, and for maintaining optimal system performance and productivity. In our framework, the UAV system comprises two subsystems: a sensing and localization subsystem, and a mmWave radar-based 3D object reconstruction subsystem. The first subsystem is referred to as LIDAUS (Localization of IoT Device via Anchor UAV SLAM), which is an infrastructure-free, multi-stage SLAM (Simultaneous Localization and Mapping) system that utilizes a UAV to accurately localize and track IoT devices in a space with weak or no GPS signals. The rapidly increasing deployment of the Internet of Things (IoT) around the world is changing many aspects of our society. IoT devices can be deployed in various places for different purposes, e.g., in a manufacturing site or a large warehouse, and they can be displaced over time due to human activities or manufacturing processes. In an indoor environment, the lack of GPS signals and infrastructure support usually makes most existing indoor localization systems impractical for localizing a large number of wireless IoT devices. 
In addition, safety concerns, access restrictions, and the sheer number of IoT devices make it impractical for humans to manually localize and track IoT devices. Our LIDAUS is developed to address these problems. The UAV in our LIDAUS system conducts multi-stage 3D SLAM trips to localize devices based only on the Received Signal Strength Indicator (RSSI), the most widely available measurement of the signals of almost all commodity IoT devices. Our simulations and experiments with Bluetooth IoT devices demonstrate that LIDAUS can achieve high localization accuracy based only on the RSSIs of commodity IoT devices. Building on the first subsystem, we further develop the second subsystem for environment reconstruction and imaging via mmWave radar and deep learning. This subsystem is referred to as 3DRIMR/R2P (3D Reconstruction and Imaging via mmWave Radar/Radar to Point Cloud). It enables an exploring UAV to fly within an environment and collect mmWave radar data by scanning various objects in the environment. Taking advantage of the accurate locations given by the first subsystem, the UAV can scan an object from different viewpoints. Then, based on radar data only, the UAV can reconstruct the 3D shapes of the objects in the space. mmWave radar has been shown to be an effective sensing technique in low-visibility, smoky, dusty, and densely foggy environments. However, tapping the potential of radar sensing to reconstruct 3D object shapes remains a great challenge, due to the characteristics of radar data such as sparsity, low resolution, specularity, large noise, and multi-path induced shadow reflections and artifacts. To address these challenges, our second subsystem utilizes deep learning models to extract features from sparse raw mmWave radar intensity data, and reconstructs the 3D shapes of objects as dense and detailed point clouds. 
We first develop a deep learning model to reconstruct a single object’s 3D shape. The model first converts mmWave radar data to depth images, and then reconstructs an object’s 3D shape in point cloud format. Our experiments demonstrate a significant performance improvement of our system over popular existing methods such as PointNet, PointNet++ and PCN. We then explore the feasibility of utilizing a mmWave radar sensor installed on a UAV to reconstruct the 3D shapes of multiple objects in a space. We evaluate two different models. Model 1 is the 3DRIMR/R2P model, and Model 2 is formed by adding a segmentation stage to the processing pipeline of Model 1. Our experiments demonstrate that both models are promising in solving the multiple object reconstruction problem. We also show that Model 2, despite producing denser and smoother point clouds, can lead to higher reconstruction loss or even missing objects. In addition, we find that both models are robust to the highly noisy radar data obtained by unstable Synthetic Aperture Radar (SAR) operation due to the instability or vibration of a small UAV hovering at its intended scanning point. Our research shows a promising direction for applying mmWave radar sensing in 3D object reconstruction.
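
The abstract above compares models by reconstruction loss but does not say which loss is used. As a rough illustration only, the sketch below computes the symmetric Chamfer distance, a common point-cloud reconstruction loss that is assumed here and is not confirmed by the source. The function name and the toy point sets are hypothetical.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumption: this is a generic point-cloud loss, not necessarily the
    loss used in the dissertation summarized above.
    """
    def sq_dist(p, q):
        # Squared Euclidean distance between two points.
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest
        # neighbor in the destination set.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical toy data: a ground-truth shape, a close reconstruction,
# and a reconstruction that misses most of the shape.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
close = [(0.1, 0.0, 0.0), (1.0, 0.1, 0.0), (0.0, 1.0, 0.1)]
missing = [(0.0, 0.0, 0.0)]

loss_close = chamfer_distance(truth, close)
loss_missing = chamfer_distance(truth, missing)
```

Under this loss, a reconstruction that drops parts of the scene is penalized through the ground-truth-to-prediction term, which is one way a "missing object" could show up as higher reconstruction loss.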

    Methods for three-dimensional Registration of Multimodal Abdominal Image Data

    Multimodal image registration benefits the diagnosis, treatment planning and the performance of image-guided procedures in the liver, since it enables the fusion of complementary information provided by pre- and intrainterventional data about tumor localization and access. Although various registration methods exist, approaches that are specifically optimized for the registration of multimodal abdominal scans are scarce. The work presented in this thesis aims to tackle this problem by focusing on the development, optimization and evaluation of registration methods specifically for multimodal liver scans. The contributions to the research field of medical image registration include the development of a registration evaluation methodology that enables the comparison and optimization of linear and non-linear registration algorithms using a point-based accuracy measure. This methodology has been used to benchmark standard registration methods as well as novel approaches that were developed within the frame of this thesis. The results showed that the similarity measure used during registration has a major impact on registration accuracy. Due to this influence, two alternative similarity metrics with the potential to be used on multimodal image data are proposed and evaluated. The first metric relies on gradient information in the form of Histograms of Oriented Gradients (HOG), whereas the second metric employs a Siamese neural network to learn a similarity measure directly on the image data. The evaluation showed that both metrics could compete with state-of-the-art similarity measures in terms of registration accuracy. The HOG metric offers the advantage that it does not require ground truth data to learn a similarity estimation; instead, it is applicable to various data sets with the sole requirement of distinct gradients. 
However, the Siamese metric is more robust to large rotations than the HOG metric. To train such a network, registered ground truth data is required, which may be problematic for multimodal image data. Yet, the results show that it is possible to apply models trained on registered synthetic data to real patient data. The last part of this thesis focuses on methods to learn an entire registration process using neural networks, thereby offering the advantage of replacing the traditional, time-consuming iterative registration procedure. Within the frame of this thesis, the so-called VoxelMorph network, which was originally proposed for monomodal, non-linear registration learning, is extended for affine and multimodal registration learning tasks. This extension includes the consideration of an image mask during metric evaluation as well as loss functions for multimodal data, such as the pretrained Siamese metric and a loss relying on the comparison of deformation fields. Based on the developed registration evaluation methodology, the performance of the original network as well as the extended variants is evaluated for monomodal and multimodal registration tasks using multiple data sets. With the extended network variants, it is possible to learn an entire multimodal registration process for the correction of large image displacements. As for the Siamese metric, the results imply a general transferability of models trained with synthetic data to registration tasks including real patient data. Due to the lack of multimodal ground truth data, this transfer represents an important step towards making deep-learning-based registration procedures clinically usable.

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
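
The abstract above describes integrating observations over time with recursive Bayesian filtering and reports a drop in categorization entropy. The sketch below shows a generic discrete Bayes update over class labels, assuming each frame yields a per-class likelihood vector. It is not the paper's implementation; the class count, scores, and function names are hypothetical.

```python
import math


def bayes_update(prior, likelihood):
    """One recursive Bayesian step: posterior is proportional to likelihood * prior."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        # Observation rules out every class; keep the previous belief.
        return list(prior)
    return [u / total for u in unnormalized]


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


# Hypothetical example: three object classes and three frames in which
# the detector weakly favors class 0 for the same tracked object.
belief = [1.0 / 3.0] * 3
observations = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]

entropy_history = [entropy(belief)]
for obs in observations:
    belief = bayes_update(belief, obs)
    entropy_history.append(entropy(belief))
```

In this toy run, each consistent observation sharpens the belief, so the entropy of the class distribution falls from frame to frame, which is the qualitative effect the abstract reports.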