13 research outputs found

    Distributionally Robust Semi-Supervised Learning for People-Centric Sensing

    Semi-supervised learning is crucial for alleviating labelling burdens in people-centric sensing. However, human-generated data inherently suffer from distribution shift in semi-supervised learning due to the diverse biological conditions and behavior patterns of humans. To address this problem, we propose a generic distributionally robust model for semi-supervised learning on distributionally shifted data. Considering both the discrepancy and the consistency between the labeled data and the unlabeled data, we learn latent features that reduce person-specific discrepancy and preserve task-specific consistency. We evaluate our model on a variety of people-centric recognition tasks on real-world datasets, including intention recognition, activity recognition, muscular movement recognition and gesture recognition. The experimental results demonstrate that the proposed model outperforms the state-of-the-art methods. Comment: 8 pages, accepted by AAAI 2019
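    To make the two terms of this objective concrete, here is a minimal sketch of one common way to realize them: a supervised cross-entropy loss on the labeled batch (task-specific consistency) plus a kernel MMD penalty that aligns labeled and unlabeled latent features (reducing person-specific discrepancy). This is a generic illustration rather than the authors' exact model; the RBF bandwidth sigma and the weight lam are assumed hyperparameters.

    import torch
    import torch.nn.functional as F

    def rbf_mmd(x, y, sigma=1.0):
        # Squared maximum mean discrepancy between two feature batches
        # under an RBF kernel; small when the two distributions match.
        def k(a, b):
            return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    def ssl_step(encoder, classifier, xl, yl, xu, lam=0.1):
        # Task loss on labeled data plus distribution alignment on latents.
        zl, zu = encoder(xl), encoder(xu)
        task = F.cross_entropy(classifier(zl), yl)   # task-specific consistency
        align = rbf_mmd(zl, zu)                      # person-specific discrepancy
        return task + lam * align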

    SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints

    We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets, including the Indian Driving Dataset and the Bengaluru Driving Dataset. Our semi-supervised training pipeline allows SOccDPT to learn from datasets with limited labels by substituting manual labelling with pseudo-ground-truth labels, which we use to produce our Bengaluru Semantic Occupancy Dataset. This broader training enhances our model's ability to handle unstructured traffic scenarios effectively. To overcome memory limitations during training, we introduce patch-wise training, in which we select a subset of parameters to train each epoch, reducing memory usage during auto-grad graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches, as shown by an RMSE score of 9.1473, achieves a semantic segmentation IoU score of 46.02%, and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public. Comment: This work has been submitted to IEEE ICRA 2024 for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
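    The patch-wise training trick described here (training only a subset of the parameters in each epoch so that the autograd graph, and hence gradient storage, stays small) can be sketched as follows. This is a generic illustration of the idea, not SOccDPT's exact implementation; the fraction train_frac is an assumed hyperparameter.

    import random
    import torch

    def select_trainable_subset(model, train_frac=0.25):
        # Freeze everything, then unfreeze a random subset of parameter
        # tensors for this epoch; frozen tensors receive no gradients,
        # which shrinks the memory used by the autograd graph.
        params = list(model.parameters())
        for p in params:
            p.requires_grad_(False)
        for p in random.sample(params, max(1, int(train_frac * len(params)))):
            p.requires_grad_(True)

    # Per-epoch usage; the optimizer can be built over all parameters,
    # since frozen ones simply receive no gradient:
    #   for epoch in range(num_epochs):
    #       select_trainable_subset(model)
    #       train_one_epoch(model, loader, optimizer)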

    A Semisupervised Recurrent Convolutional Attention Model for Human Activity Recognition.

    Recent years have witnessed the success of deep learning methods in human activity recognition (HAR). The longstanding shortage of labeled activity data inherently calls for a plethora of semisupervised learning methods, and one of the most challenging and common issues in semisupervised learning is the imbalanced distribution of labeled data over classes. Although the problem has long existed in broad real-world HAR applications, it is rarely explored in the literature. In this paper, we propose a semisupervised deep model for imbalanced activity recognition from multimodal wearable sensory data. We aim to address not only the challenges of multimodal sensor data (e.g., interperson variability and interclass similarity) but also the limited-labeled-data and class-imbalance issues simultaneously. In particular, we propose a pattern-balanced semisupervised framework to extract and preserve diverse latent patterns of activities. Furthermore, we exploit the independence of the multiple modalities of sensory data and attentively identify the salient regions of the inputs that are indicative of human activities using our recurrent convolutional attention networks. Our experimental results demonstrate that the proposed model achieves performance competitive with a multitude of state-of-the-art methods, both semisupervised and supervised, using only 10% labeled training data. The results also show the robustness of our method on imbalanced, small training data sets.
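    As a concrete reading of the architecture class named in the abstract, the skeleton below stacks convolutional feature extraction, a recurrent layer, and softmax attention over time steps for windows of multimodal sensor data. It is a minimal generic sketch, not the authors' published model; all layer sizes are assumptions.

    import torch
    import torch.nn as nn

    class RecurrentConvAttention(nn.Module):
        # Conv features over time -> GRU -> attention pooling -> classifier.
        def __init__(self, n_channels, n_classes, hidden=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            )
            self.gru = nn.GRU(64, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)   # scores salient time regions
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):                  # x: (batch, channels, time)
            h = self.conv(x).transpose(1, 2)   # (batch, time, 64)
            h, _ = self.gru(h)                 # (batch, time, hidden)
            w = torch.softmax(self.attn(h), dim=1)  # attention over time
            return self.head((w * h).sum(dim=1))    # weighted pooling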

    A Novel Weakly-supervised approach for RGB-D-based Nuclear Waste Object Detection and Categorization

    This paper addresses the problem of RGB-D-based detection and categorization of waste objects for nuclear decommissioning. To enable autonomous robotic manipulation for nuclear decommissioning, nuclear waste objects must be detected and categorized. However, as a novel industrial application, large amounts of annotated waste object data are currently unavailable. To overcome this problem, we propose a weakly-supervised learning approach which is able to learn a deep convolutional neural network (DCNN) from unlabelled RGB-D videos while requiring very few annotations. The proposed method also has the potential to be applied to other household or industrial applications. We evaluate our approach on the Washington RGB-D object recognition benchmark, achieving state-of-the-art performance among semi-supervised methods. More importantly, we introduce a novel dataset, the Birmingham nuclear waste simulants dataset, and evaluate our proposed approach on this novel industrial object recognition challenge. We further propose a complete real-time pipeline for RGB-D-based detection and categorization of nuclear waste simulants. Our weakly-supervised approach has been demonstrated to be highly effective in solving a novel RGB-D object detection and recognition application with limited human annotations.
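    The abstract does not spell out the training procedure, but learning a DCNN from unlabelled video with few annotations is commonly realized as self-training: the model trained on the few labels pseudo-labels the unlabelled frames it is confident about, and those are folded back into training. The sketch below shows that generic pattern only, not the paper's specific pipeline; the confidence threshold tau is an assumption.

    import torch

    @torch.no_grad()
    def pseudo_label(model, unlabeled_loader, tau=0.95):
        # Collect (input, predicted label) pairs the model is confident about.
        model.eval()
        selected = []
        for x in unlabeled_loader:
            probs = torch.softmax(model(x), dim=1)
            conf, pred = probs.max(dim=1)
            keep = conf >= tau                 # keep only confident frames
            if keep.any():
                selected.append((x[keep], pred[keep]))
        return selected

    # Typical loop: train on the few annotations, pseudo-label the
    # unlabelled video frames, retrain on the union, and repeat.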

    RGB-D Multicamera Object Detection and Tracking Implemented through Deep Learning

    In this thesis we present the development of a multi-object detection and tracking system for low-light environments, implemented using an RGB-D multicamera system and a deep learning framework. To aid understanding of how the system works, the relevant hardware and software components are presented, such as RGB-D sensor cameras and multi-object detection and tracking techniques. In addition, a brief introduction to the main concepts of neural networks is given.

    A 3D Object Recognition System Based on Stereoscopic Images for a Mobile Robot

    This Bachelor's thesis covers the study and development of an object recognition system for robotics based on a stereo camera and neural networks. The problem of recognition in robotics is reviewed; in particular, the main recognition technologies and technical means in Computer Vision are discussed, and the hardware is chosen on that basis. Neural networks and their use for object recognition are explored, with an emphasis on analyzing 3D data and on the process of selecting and adapting a neural network architecture for recognizing 3D objects from RGB and depth data. Following this architecture, a neural network was developed, trained, and tested on the task at hand. An experiment was conducted to measure the influence of depth data on the recognition accuracy of the neural network. Finally, the thesis describes the development of an object recognition system that integrates the trained neural model with the chosen stereo camera in a single application. As a result of the project, an effective object recognition system for robotics was created, combining modern computer vision techniques and neural networks.
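    A standard way to let an image-classification backbone consume RGB plus depth, as the thesis describes, is to widen its first convolution to four input channels. The sketch below does this for a torchvision ResNet-18; initializing the extra depth channel from the mean of the pretrained RGB filters is an assumed (but common) choice.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    def rgbd_resnet18(n_classes):
        # ResNet-18 whose first conv accepts a 4-channel (RGB + depth) input.
        model = resnet18(weights="IMAGENET1K_V1")
        old = model.conv1                      # Conv2d(3, 64, 7, stride=2, ...)
        new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                        stride=old.stride, padding=old.padding, bias=False)
        with torch.no_grad():
            new.weight[:, :3] = old.weight                            # reuse RGB filters
            new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # depth channel
        model.conv1 = new
        model.fc = nn.Linear(model.fc.in_features, n_classes)
        return model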

    An Overview of Deep Semi-Supervised Learning

    Deep neural networks have demonstrated their ability to deliver remarkable performance on a wide range of supervised learning tasks (e.g., image classification) when trained on extensive collections of labeled data (e.g., ImageNet). However, creating such large datasets requires a considerable amount of resources, time, and effort. Such resources may not be available in many practical cases, limiting the adoption and application of many deep learning methods. In a search for more data-efficient deep learning methods that overcome the need for large annotated datasets, there is rising research interest in semi-supervised learning and its application to deep neural networks as a way to reduce the amount of labeled data required, either by developing novel methods or by adapting existing semi-supervised learning frameworks to the deep learning setting. In this paper, we provide a comprehensive overview of deep semi-supervised learning, starting with an introduction to the field, followed by a summarization of the dominant semi-supervised approaches in deep learning. Comment: Preprint
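    One of the dominant approach families such a survey covers is consistency regularization: an unlabeled input should receive the same prediction under different augmentations. The sketch below shows a minimal generic form of it (in the spirit of Pi-model-style methods), not any single method from the survey; augment is an assumed stochastic augmentation function.

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, xu, augment):
        # Penalize prediction disagreement between two augmented views
        # of the same unlabeled batch.
        p1 = torch.softmax(model(augment(xu)), dim=1)
        p2 = torch.softmax(model(augment(xu)), dim=1)
        return F.mse_loss(p1, p2)

    def semi_supervised_loss(model, xl, yl, xu, augment, lam=1.0):
        # Supervised cross-entropy plus the unlabeled consistency term.
        return (F.cross_entropy(model(xl), yl)
                + lam * consistency_loss(model, xu, augment))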

    Learning Multimodal Structures in Computer Vision

    A phenomenon or event can be observed by various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Because the modalities of a multimodal phenomenon are related, a single modality cannot fully describe the event of interest, and the fact that several modalities report on the same event introduces new challenges compared to exploiting each modality separately. We are interested in designing new algorithmic tools for sensor fusion within the signal representation of sparse coding, a popular methodology in signal processing, machine learning, and statistics for representing data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities, such as natural images. We consider situations where we want not only the support of the model to be sparse, but also to reflect a priori knowledge about the application at hand. Our goal is to extract a discriminative representation of the multimodal data from which its essential characteristics are easily found in the subsequent analysis step, e.g., regression and classification. More precisely, sparse coding represents signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes the intrinsic properties of the multimodal data in a decomposition coefficient vector, in a way that favors maximal discriminatory power. We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting both the modality-shared information, which is the information shared by the various modalities, and the modality-specific information, which is the information content of each modality individually. In addition, the framework automatically learns the weights of the various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities while leveraging the modality-specific character of each modality, changing their corresponding weights for different parts of the feature in recognition.
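    The definition of sparse coding given here (a signal as a linear combination of a few dictionary atoms) corresponds to the lasso problem min_x 0.5*||Dx - y||^2 + lam*||x||_1, which the sketch below solves with plain ISTA. The random dictionary, step size, and lam are illustrative assumptions.

    import numpy as np

    def ista(D, y, lam=0.1, n_iter=200):
        # Iterative shrinkage-thresholding for the lasso:
        # gradient step on the quadratic term, then soft-thresholding.
        L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
        x = np.zeros(D.shape[1])
        for _ in range(n_iter):
            g = x - D.T @ (D @ x - y) / L
            x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
        return x

    # Example: a signal built from two atoms of a random unit-norm
    # dictionary is recovered with only a few nonzero coefficients.
    rng = np.random.default_rng(0)
    D = rng.normal(size=(20, 50))
    D /= np.linalg.norm(D, axis=0)
    y = D[:, [3, 17]] @ np.array([1.0, -0.5])
    print(np.nonzero(np.abs(ista(D, y)) > 1e-3)[0])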