50 research outputs found

    Incremental Learning of Object Models From Natural Human-Robot Interactions

    In order to perform complex tasks in realistic human environments, robots need to be able to learn new concepts in the wild, incrementally, and through their interactions with humans. This article presents an end-to-end pipeline to learn object models incrementally during human-robot interaction (HRI). The proposed pipeline consists of three parts: 1) recognizing the interaction type; 2) detecting the object that the interaction is targeting; and 3) incrementally learning the object models from data recorded by the robot's sensors. Our main contributions lie in the target object detection, guided by the recognized interaction, and in the incremental object learning. The novelty of our approach is its focus on natural, heterogeneous, and multimodal HRI to incrementally learn new object models. Throughout the article, we highlight the main challenges associated with this problem, such as a high degree of occlusion and clutter, domain change, low-resolution data, and interaction ambiguity. The article shows the benefits of using multiview approaches and of combining visual and language features, and our experimental results show that the approach outperforms standard baselines.
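
    A minimal sketch of how such a three-stage pipeline can fit together is shown below, assuming feature vectors support arithmetic (e.g., NumPy arrays). All names here (process_interaction, IncrementalModelStore, the callables passed in) are hypothetical illustrations of the article's description, not the authors' actual API.

```python
class IncrementalModelStore:
    """Keeps one running feature prototype per object label."""

    def __init__(self):
        self.prototypes = {}  # label -> (mean_feature, sample_count)

    def update(self, label, feature):
        if label not in self.prototypes:
            self.prototypes[label] = (feature, 1)
        else:
            mean, n = self.prototypes[label]
            # Incremental mean update: no raw samples need to be stored.
            self.prototypes[label] = ((mean * n + feature) / (n + 1), n + 1)

def process_interaction(frames, utterance, recognize_interaction,
                        detect_target, extract_features, store):
    # 1) Recognize the interaction type (e.g., pointing, showing, speech-only).
    interaction = recognize_interaction(frames, utterance)
    # 2) Detect the object the interaction targets, guided by that type.
    target_crop, label = detect_target(frames, interaction, utterance)
    # 3) Incrementally update the model for that object from the new view.
    store.update(label, extract_features(target_crop))
    return label
```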

    Learning from human-robot interaction

    In recent years it has become increasingly common to see robots in homes. Robotics is ever more present in many aspects of our daily lives, in domestic assistance devices, autonomous cars, and personal assistants. The interaction between these assistant robots and their users is one of the key aspects of service robotics. For this interaction to be effective, it needs to be comfortable and intuitive. These interactions with users are also what allows the robot to learn and update, in a natural way, both its model of the world and its capabilities. A service robotic system has many components that are necessary for it to work well; this thesis focuses on its visual perception system. For humans, visual perception is one of the most essential components, enabling tasks such as recognizing objects or other people, or estimating 3D information. The great advances achieved in recent years in automatic recognition tasks rely on machine learning approaches, in particular deep learning techniques. Most current work focuses on models trained a priori on very large datasets. However, these models, despite being trained on large amounts of data, cannot in general cope with the challenges that arise when dealing with real data in domestic environments. For example, it is common to encounter new objects that did not exist when the models were trained. Another challenge comes from the sparsity of objects: some objects appear very rarely, so there were few, or no, examples of them in the training data available when the model was created. This thesis was developed within the context of the IGLU (Interactive Grounded Language Understanding) project. Within the project and its objectives, the main goal of this doctoral thesis is to investigate novel methods for a robot to learn incrementally through multimodal interaction with the user. In pursuit of that goal, the main lines of work developed during this thesis were:
    - Creating a benchmark better suited to tasks of learning through natural user-robot interaction. For example, most datasets for object recognition focus on photos of different scenes with multiple classes per photo; a dataset that combines user-robot interaction with object learning is needed.
    - Improving existing object learning systems and adapting them to learning from multimodal human interaction. Object detection work focuses on detecting all learned objects in an image; our aim is to use the interaction to find the referenced object and learn it incrementally.
    - Developing incremental learning methods that can be used in incremental scenarios, e.g., the appearance of a new object class or changes within an object class over time. Our aim is to design a system that can learn classes from scratch and update them as new data arrive.
    - Building a complete prototype for incremental and multimodal learning through human-robot interaction, integrating the different methods developed as part of the other objectives and evaluating the result.
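
    The open-ended setting described above, where new classes can appear at any time, sometimes with a single example, can be made concrete with a toy sketch. The snippet below is an illustrative nearest-prototype scheme, not the thesis's actual method; the threshold value, feature source, and all names are assumptions.

```python
import numpy as np

NOVELTY_THRESHOLD = 0.8  # assumed; would be tuned on held-out interactions

def classify_or_enroll(feature, prototypes, next_label_id):
    """Return a label for `feature`, enrolling a new class if nothing is close."""
    feature = feature / np.linalg.norm(feature)
    best_label, best_dist = None, np.inf
    for label, proto in prototypes.items():
        dist = np.linalg.norm(feature - proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_label is None or best_dist > NOVELTY_THRESHOLD:
        # Unknown object: start a new class from this single example.
        best_label = f"object_{next_label_id}"
        prototypes[best_label] = feature
    return best_label

# Usage: the prototype set starts empty and grows as new objects are seen.
prototypes = {}
for i, f in enumerate(np.random.randn(5, 128)):
    print(classify_or_enroll(f, prototypes, next_label_id=i))
```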

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools, and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial, and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.
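
    Several of the challenges listed above, notably (i) inadequate data sets and (vi) transfer learning, are commonly addressed by fine-tuning a network pretrained on a large natural-image corpus. The sketch below shows that generic pattern; it is not a method from the survey, and NUM_CLASSES is a placeholder for whichever RS scene-classification dataset is at hand.

```python
import torch
import torchvision

NUM_CLASSES = 10
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
for param in model.parameters():
    param.requires_grad = False  # freeze the pretrained feature extractor
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)  # new head

# Only the new classification head is trained; with more labeled data one
# would typically unfreeze and fine-tune the deeper layers as well.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
```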

    Deep Learning for Aerial Scene Understanding in High Resolution Remote Sensing Imagery from the Lab to the Wild

    This work presents the application of deep learning to aerial scene understanding, e.g., aerial scene recognition, multi-label object classification, and semantic segmentation. Beyond training deep networks under laboratory conditions, the work also provides learning strategies for practical scenarios, e.g., where data are collected without constraints or annotations are scarce.
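
    Multi-label aerial classification assigns several labels per image, so the usual softmax over classes is replaced by independent per-label sigmoids with a binary cross-entropy loss. The snippet below is a generic sketch of that formulation, not the thesis's model; the batch size and label count are arbitrary.

```python
import torch

logits = torch.randn(4, 17)                      # 4 images, 17 candidate labels
targets = torch.randint(0, 2, (4, 17)).float()   # multi-hot ground truth
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
preds = torch.sigmoid(logits) > 0.5              # label present if prob > 0.5
```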

    MCCFNet: multi-channel color fusion network for cognitive classification of traditional Chinese paintings.

    The computational modeling and analysis of traditional Chinese painting (TCP) rely heavily on cognitive classification based on visual perception. This approach is crucial for understanding and identifying artworks created by different artists. However, the effective integration of visual perception into artificial intelligence (AI) models remains largely unexplored. Additionally, classification research on Chinese painting faces certain challenges, such as insufficient investigation into the specific characteristics of painting images for author classification and recognition. To address these issues, we propose a novel framework called multi-channel color fusion network (MCCFNet), which aims to extract visual features from diverse color perspectives. By considering multiple color channels, MCCFNet enhances the ability of AI models to capture the intricate details and nuances present in Chinese painting. To improve the performance of the DenseNet model, we introduce a regional weighted pooling (RWP) strategy specifically designed for the DenseNet169 architecture. This strategy enhances the extraction of highly discriminative features. In our experimental evaluation, we comprehensively compared the performance of our proposed MCCFNet model against six state-of-the-art models on a dataset consisting of 2436 TCP samples derived from the works of 10 renowned Chinese artists. The evaluation metrics employed for performance assessment were Top-1 Accuracy and the area under the curve (AUC). The experimental results show that our proposed MCCFNet model significantly outperforms all other benchmarked methods, with the highest classification accuracy of 98.68%. Moreover, the classification accuracy of other deep learning models on TCP can be much improved when our proposed framework is adopted.
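
    A hedged reading of the two ideas named in this abstract is sketched below: (a) expose several color-space views of the same painting to the backbone, and (b) pool feature maps with learned per-region weights instead of uniform averaging. The exact MCCFNet and RWP formulations are not given here, so the shapes and module design are assumptions.

```python
import torch
import torch.nn as nn

class RegionalWeightedPooling(nn.Module):
    """Global pooling with a learned spatial weight map."""

    def __init__(self, h=7, w=7):  # 7x7 matches a DenseNet169 final feature map
        super().__init__()
        self.weights = nn.Parameter(torch.ones(1, 1, h, w))

    def forward(self, x):  # x: (B, C, H, W)
        w = torch.softmax(self.weights.flatten(2), dim=-1).view_as(self.weights)
        return (x * w).sum(dim=(2, 3))  # (B, C), regions weighted unevenly

def fuse_color_views(rgb, hsv, ycbcr):
    # Stack three color-space renderings into one 9-channel input tensor --
    # one simple way to offer "diverse color perspectives" to a network.
    return torch.cat([rgb, hsv, ycbcr], dim=1)  # (B, 9, H, W)
```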

    Smart Road Danger Detection and Warning

    Road dangers have caused numerous accidents, so detecting them and warning road users is critical to improving traffic safety. However, recognizing road dangers among large volumes of normal data and warning road users is challenging due to cluttered real-world backgrounds, ever-changing road danger appearances, high intra-class differences, limited data held by any one party, and the high risk of leaking sensitive private information. To address these challenges, this thesis proposes three novel road danger detection and warning frameworks that improve real-time road danger prediction and notification in challenging real-world environments along four main axes: accuracy, latency, communication efficiency, and privacy.

    Firstly, many existing road danger detection systems mainly process data in the cloud and cannot warn users about road dangers in a timely manner because of the long transmission distances involved. Meanwhile, the supervised machine learning algorithms these systems usually rely on require large and precisely labeled datasets to perform well. EcRD, an Edge-cloud-based Road Damage detection and warning framework, is proposed to improve latency and reduce labeling cost by combining the fast response of edges with the large storage and computation resources of the cloud. In EcRD, a simple yet efficient road segmentation algorithm is introduced for fast and accurate road area detection by filtering out noisy backgrounds. Additionally, a lightweight road damage detector based on Gray Level Co-occurrence Matrix (GLCM) features is developed on the edges for rapid detection of, and warning about, hazardous road damage. Further, a multi-type road damage detection model is proposed for long-term road management in the cloud, embedded with a novel image-label generator based on Cycle-Consistent Adversarial Networks, which automatically generates images with corresponding labels to further improve road damage detection accuracy. EcRD achieves 91.96% accuracy with only 0.0043s latency, around 579 times faster than cloud-based approaches, without affecting users' experience, while requiring very low storage and labeling cost.

    Secondly, although EcRD relieves the latency problem through edge computing, road users can only receive warnings of hazardous road damage within a small area because of the limited communication range of edges. Besides, untrusted edges might misuse users' personal information. A novel federated learning-based road damage detection and warning framework named FedRD is developed to widen the coverage of warning information and protect data privacy. In FedRD, a new hazardous road damage detection model is proposed that leverages feature fusion. A novel adaptive federated learning strategy is designed for high-performance model learning across different edges. A new individualized differential privacy approach with pixelization is proposed to protect users' privacy before data are shared. Simulation results show that FedRD achieves similarly high detection performance (90.32% accuracy) with more than 1000 times wider coverage than the state-of-the-art, works well when some edges have only limited samples, and largely preserves users' privacy.

    Finally, despite the success of EcRD and FedRD in improving latency and protecting privacy, they are based on a single modality (image/video), while data in multiple modalities are now ubiquitous. Also, the communication costs of EcRD and FedRD are very high, due to undifferentiated data transmission (both normal and dangerous data) and frequent model exchanges in the federated learning setting, respectively. A novel edge-cloud-based privacy-preserving Federated Multimodal learning framework for Road Danger detection and warning named FedMRD is introduced to leverage real-world multimodal data and reduce communication costs. In FedMRD, a novel multimodal road danger detection model considering both inter- and intra-class relations is developed. A communication-efficient federated learning strategy is proposed for collaborative model learning from edges with non-iid and imbalanced data. Further, a new multimodal differential privacy technique for high-dimensional multimodal data with multiple attributes is introduced to protect data privacy directly on users' devices before uploading to edges. Experimental results demonstrate that FedMRD achieves high accuracy (around 96.42%) with only 0.0351s latency and up to 250 times less communication cost compared with the state-of-the-art, and enables collaborative learning from multiple edges with non-iid and imbalanced data in different modalities while preserving users' privacy.
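
    FedRD and FedMRD build adaptive, communication-efficient strategies on top of federated learning; the sketch below shows only the vanilla FedAvg baseline for context, with each edge weighted by its local sample count. It is a generic pattern, not the thesis's algorithm.

```python
import copy
import torch

def fedavg_round(global_model, edge_loaders, lr=0.01):
    states, weights = [], []
    for loader in edge_loaders:  # one DataLoader per edge node
        local = copy.deepcopy(global_model)  # edge starts from the global model
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for x, y in loader:  # one local epoch on the edge's private data
            opt.zero_grad()
            torch.nn.functional.cross_entropy(local(x), y).backward()
            opt.step()
        states.append(local.state_dict())
        weights.append(len(loader.dataset))
    total = sum(weights)
    averaged = {k: sum((w / total) * s[k].float() for s, w in zip(states, weights))
                for k in states[0]}
    global_model.load_state_dict(averaged)  # only models, never raw data, move
    return global_model
```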

    Leveraging Supervoxels for Medical Image Volume Segmentation With Limited Supervision

    The majority of existing methods for machine learning-based medical image segmentation are supervised models that require large amounts of fully annotated images. Such datasets are typically not available in the medical domain and are difficult and expensive to generate. Widespread use of machine learning-based models for medical image segmentation therefore requires the development of data-efficient algorithms that need only limited supervision. To address these challenges, this thesis presents new machine learning methodology for unsupervised lung tumor segmentation and few-shot learning-based organ segmentation. When working in the limited-supervision paradigm, exploiting the available information in the data is key. The methodology developed in this thesis leverages automatically generated supervoxels in various ways to exploit the structural information in the images. The work on unsupervised tumor segmentation explores the opportunity of performing clustering at a population level in order to provide the algorithm with as much information as possible. To facilitate this population-level, across-patient clustering, supervoxel representations are exploited to reduce the number of samples, and thereby the computational cost. In the work on few-shot learning-based organ segmentation, supervoxels are used to generate pseudo-labels for self-supervised training. Further, to obtain a model that is robust to the typically large and inhomogeneous background class, a novel anomaly detection-inspired classifier is proposed to ease the modelling of the background. To encourage the resulting segmentation maps to respect edges defined in the input space, a supervoxel-informed feature refinement module is proposed to refine the embedded feature vectors during inference. Finally, to improve trustworthiness, an architecture-agnostic mechanism for estimating model uncertainty in few-shot segmentation is developed. Results demonstrate that supervoxels are versatile tools for leveraging structural information in medical data when training segmentation models with limited supervision.
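
    The supervoxel pseudo-labeling idea can be illustrated with off-the-shelf tools. The snippet below over-segments a volume with 3-D SLIC from scikit-image and treats the resulting region ids as pseudo-labels; the random volume and the segment-count and compactness values are stand-ins, not the thesis's data or settings.

```python
import numpy as np
from skimage.segmentation import slic

volume = np.random.rand(64, 128, 128)  # placeholder for a CT/MR volume
supervoxels = slic(volume, n_segments=500, compactness=0.1,
                   channel_axis=None)  # channel_axis=None -> grayscale 3-D
print(supervoxels.shape, supervoxels.max())  # one region id per voxel
```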