187 research outputs found

    Contributions to improve the technologies supporting unmanned aircraft operations

    Get PDF
    Mención Internacional en el título de doctorUnmanned Aerial Vehicles (UAVs), in their smaller versions known as drones, are becoming increasingly important in today's societies. The systems that make them up present a multitude of challenges, of which error can be considered the common denominator. The perception of the environment is measured by sensors that have errors, the models that interpret the information and/or define behaviors are approximations of the world and therefore also have errors. Explaining error allows extending the limits of deterministic models to address real-world problems. The performance of the technologies embedded in drones depends on our ability to understand, model, and control the error of the systems that integrate them, as well as new technologies that may emerge. Flight controllers integrate various subsystems that are generally dependent on other systems. One example is the guidance systems. These systems provide the engine's propulsion controller with the necessary information to accomplish a desired mission. For this purpose, the flight controller is made up of a control law for the guidance system that reacts to the information perceived by the perception and navigation systems. The error of any of the subsystems propagates through the ecosystem of the controller, so the study of each of them is essential. On the other hand, among the strategies for error control are state-space estimators, where the Kalman filter has been a great ally of engineers since its appearance in the 1960s. Kalman filters are at the heart of information fusion systems, minimizing the error covariance of the system and allowing the measured states to be filtered and estimated in the absence of observations. State Space Models (SSM) are developed based on a set of hypotheses for modeling the world. Among the assumptions are that the models of the world must be linear, Markovian, and that the error of their models must be Gaussian. In general, systems are not linear, so linearization are performed on models that are already approximations of the world. In other cases, the noise to be controlled is not Gaussian, but it is approximated to that distribution in order to be able to deal with it. On the other hand, many systems are not Markovian, i.e., their states do not depend only on the previous state, but there are other dependencies that state space models cannot handle. This thesis deals a collection of studies in which error is formulated and reduced. First, the error in a computer vision-based precision landing system is studied, then estimation and filtering problems from the deep learning approach are addressed. Finally, classification concepts with deep learning over trajectories are studied. The first case of the collection xviiistudies the consequences of error propagation in a machine vision-based precision landing system. This paper proposes a set of strategies to reduce the impact on the guidance system, and ultimately reduce the error. The next two studies approach the estimation and filtering problem from the deep learning approach, where error is a function to be minimized by learning. The last case of the collection deals with a trajectory classification problem with real data. This work completes the two main fields in deep learning, regression and classification, where the error is considered as a probability function of class membership.Los vehículos aéreos no tripulados (UAV) en sus versiones de pequeño tamaño conocidos como drones, van tomando protagonismo en las sociedades actuales. Los sistemas que los componen presentan multitud de retos entre los cuales el error se puede considerar como el denominador común. La percepción del entorno se mide mediante sensores que tienen error, los modelos que interpretan la información y/o definen comportamientos son aproximaciones del mundo y por consiguiente también presentan error. Explicar el error permite extender los límites de los modelos deterministas para abordar problemas del mundo real. El rendimiento de las tecnologías embarcadas en los drones, dependen de nuestra capacidad de comprender, modelar y controlar el error de los sistemas que los integran, así como de las nuevas tecnologías que puedan surgir. Los controladores de vuelo integran diferentes subsistemas los cuales generalmente son dependientes de otros sistemas. Un caso de esta situación son los sistemas de guiado. Estos sistemas son los encargados de proporcionar al controlador de los motores información necesaria para cumplir con una misión deseada. Para ello se componen de una ley de control de guiado que reacciona a la información percibida por los sistemas de percepción y navegación. El error de cualquiera de estos sistemas se propaga por el ecosistema del controlador siendo vital su estudio. Por otro lado, entre las estrategias para abordar el control del error se encuentran los estimadores en espacios de estados, donde el filtro de Kalman desde su aparición en los años 60, ha sido y continúa siendo un gran aliado para los ingenieros. Los filtros de Kalman son el corazón de los sistemas de fusión de información, los cuales minimizan la covarianza del error del sistema, permitiendo filtrar los estados medidos y estimarlos cuando no se tienen observaciones. Los modelos de espacios de estados se desarrollan en base a un conjunto de hipótesis para modelar el mundo. Entre las hipótesis se encuentra que los modelos del mundo han de ser lineales, markovianos y que el error de sus modelos ha de ser gaussiano. Generalmente los sistemas no son lineales por lo que se realizan linealizaciones sobre modelos que a su vez ya son aproximaciones del mundo. En otros casos el ruido que se desea controlar no es gaussiano, pero se aproxima a esta distribución para poder abordarlo. Por otro lado, multitud de sistemas no son markovianos, es decir, sus estados no solo dependen del estado anterior, sino que existen otras dependencias que los modelos de espacio de estados no son capaces de abordar. Esta tesis aborda un compendio de estudios sobre los que se formula y reduce el error. En primer lugar, se estudia el error en un sistema de aterrizaje de precisión basado en visión por computador. Después se plantean problemas de estimación y filtrado desde la aproximación del aprendizaje profundo. Por último, se estudian los conceptos de clasificación con aprendizaje profundo sobre trayectorias. El primer caso del compendio estudia las consecuencias de la propagación del error de un sistema de aterrizaje de precisión basado en visión artificial. En este trabajo se propone un conjunto de estrategias para reducir el impacto sobre el sistema de guiado, y en última instancia reducir el error. Los siguientes dos estudios abordan el problema de estimación y filtrado desde la perspectiva del aprendizaje profundo, donde el error es una función que minimizar mediante aprendizaje. El último caso del compendio aborda un problema de clasificación de trayectorias con datos reales. Con este trabajo se completan los dos campos principales en aprendizaje profundo, regresión y clasificación, donde se plantea el error como una función de probabilidad de pertenencia a una clase.I would like to thank the Ministry of Science and Innovation for granting me the funding with reference PRE2018-086793, associated to the project TEC2017-88048-C2-2-R, which provide me the opportunity to carry out all my PhD. activities, including completing an international research internship.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Antonio Berlanga de Jesús.- Secretario: Daniel Arias Medina.- Vocal: Alejandro Martínez Cav

    Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction

    Get PDF
    Roboter finden aufgrund ihrer außergewöhnlichen Arbeitsleistung, Präzision, Effizienz und Skalierbarkeit immer mehr Verwendung in den verschiedensten Anwendungsbereichen. Diese Entwicklung wurde zusätzlich begünstigt durch Fortschritte in der Künstlichen Intelligenz (KI), insbesondere im Maschinellem Lernen (ML). Mit Hilfe moderner neuronaler Netze sind Roboter in der Lage, Objekte in ihrer Umgebung zu erkennen und mit ihnen zu interagieren. Ein erhebliches Manko besteht jedoch darin, dass das Training dieser Objekterkennungsmodelle, in aller Regel mit einer zugrundeliegenden Abhängig von umfangreichen Datensätzen und der Verfügbarkeit großer Datenmengen einhergeht. Dies ist insbesondere dann problematisch, wenn der konkrete Einsatzort des Roboters und die Umgebung, einschließlich der darin befindlichen Objekte, nicht im Voraus bekannt sind. Die breite und ständig wachsende Palette von Objekten macht es dabei praktisch unmöglich, das gesamte Spektrum an existierenden Objekten allein mit bereits zuvor erstellten Datensätzen vollständig abzudecken. Das Ziel dieser Dissertation war es, einem Roboter unbekannte Objekte mit Hilfe von Human-Robot Interaction (HRI) beizubringen, um ihn von seiner Abhängigkeit von Daten sowie den Einschränkungen durch vordefinierte Szenarien zu befreien. Die Synergie von Eye Tracking und Augmented Reality (AR) ermöglichte es dem als Lehrer fungierenden Menschen, mit dem Roboter zu kommunizieren und ihn mittels des menschlichen Blickes auf Objekte hinzuweisen. Dieser holistische Ansatz ermöglichte die Konzeption eines multimodalen HRI-Systems, durch das der Roboter Objekte identifizieren und dreidimensional segmentieren konnte, obwohl sie ihm zu diesem Zeitpunkt noch unbekannt waren, um sie anschließend aus unterschiedlichen Blickwinkeln eigenständig zu inspizieren. Anhand der Klasseninformationen, die ihm der Mensch mitteilte, war der Roboter daraufhin in der Lage, die entsprechenden Objekte zu erlernen und später wiederzuerkennen. Mit dem Wissen, das dem Roboter durch diesen auf HRI basierenden Lehrvorgang beigebracht worden war, war dessen Fähigkeit Objekte zu erkennen vergleichbar mit den Fähigkeiten modernster Objektdetektoren, die auf umfangreichen Datensätzen trainiert worden waren. Dabei war der Roboter jedoch nicht auf vordefinierte Klassen beschränkt, was seine Vielseitigkeit und Anpassungsfähigkeit unter Beweis stellte. Die im Rahmen dieser Dissertation durchgeführte Forschung leistete bedeutende Beiträge an der Schnittstelle von Machine Learning (ML), AR, Eye Tracking und Robotik. Diese Erkenntnisse tragen nicht nur zum besseren Verständnis der genannten Felder bei, sondern ebnen auch den Weg für weitere interdisziplinäre Forschung. Die in dieser Dissertation enthalten wissenschaftlichen Artikel wurden auf hochrangigen Konferenzen in den Bereichen Robotik, Eye Tracking und HRI veröffentlicht.Robots are becoming increasingly popular in a wide range of environments due to their exceptional work capacity, precision, efficiency, and scalability. This development has been further encouraged by advances in Artificial Intelligence (AI), particularly Machine Learning (ML). By employing sophisticated neural networks, robots are given the ability to detect and interact with objects in their vicinity. However, a significant drawback arises from the underlying dependency on extensive datasets and the availability of substantial amounts of training data for these object detection models. This issue becomes particularly problematic when the specific deployment location of the robot and the surroundings, including the objects within it, are not known in advance. The vast and ever-expanding array of objects makes it virtually impossible to comprehensively cover the entire spectrum of existing objects using preexisting datasets alone. The goal of this dissertation was to teach a robot unknown objects in the context of Human-Robot Interaction (HRI) in order to liberate it from its data dependency, unleashing it from predefined scenarios. In this context, the combination of eye tracking and Augmented Reality (AR) created a powerful synergy that empowered the human teacher to seamlessly communicate with the robot and effortlessly point out objects by means of human gaze. This holistic approach led to the development of a multimodal HRI system that enabled the robot to identify and visually segment the Objects of Interest (OOIs) in three-dimensional space, even though they were initially unknown to it, and then examine them autonomously from different angles. Through the class information provided by the human, the robot was able to learn the objects and redetect them at a later stage. Due to the knowledge gained from this HRI based teaching process, the robot’s object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets, without being restricted to predefined classes, showcasing its versatility and adaptability. The research conducted within the scope of this dissertation made significant contributions at the intersection of ML, AR, eye tracking, and robotics. These findings not only enhance the understanding of these fields, but also pave the way for further interdisciplinary research. The scientific articles included in this dissertation have been published at high-impact conferences in the fields of robotics, eye tracking, and HRI

    Deep Representation Learning with Limited Data for Biomedical Image Synthesis, Segmentation, and Detection

    Get PDF
    Biomedical imaging requires accurate expert annotation and interpretation that can aid medical staff and clinicians in automating differential diagnosis and solving underlying health conditions. With the advent of Deep learning, it has become a standard for reaching expert-level performance in non-invasive biomedical imaging tasks by training with large image datasets. However, with the need for large publicly available datasets, training a deep learning model to learn intrinsic representations becomes harder. Representation learning with limited data has introduced new learning techniques, such as Generative Adversarial Networks, Semi-supervised Learning, and Self-supervised Learning, that can be applied to various biomedical applications. For example, ophthalmologists use color funduscopy (CF) and fluorescein angiography (FA) to diagnose retinal degenerative diseases. However, fluorescein angiography requires injecting a dye, which can create adverse reactions in the patients. So, to alleviate this, a non-invasive technique needs to be developed that can translate fluorescein angiography from fundus images. Similarly, color funduscopy and optical coherence tomography (OCT) are also utilized to semantically segment the vasculature and fluid build-up in spatial and volumetric retinal imaging, which can help with the future prognosis of diseases. Although many automated techniques have been proposed for medical image segmentation, the main drawback is the model's precision in pixel-wise predictions. Another critical challenge in the biomedical imaging field is accurately segmenting and quantifying dynamic behaviors of calcium signals in cells. Calcium imaging is a widely utilized approach to studying subcellular calcium activity and cell function; however, large datasets have yielded a profound need for fast, accurate, and standardized analyses of calcium signals. For example, image sequences from calcium signals in colonic pacemaker cells ICC (Interstitial cells of Cajal) suffer from motion artifacts and high periodic and sensor noise, making it difficult to accurately segment and quantify calcium signal events. Moreover, it is time-consuming and tedious to annotate such a large volume of calcium image stacks or videos and extract their associated spatiotemporal maps. To address these problems, we propose various deep representation learning architectures that utilize limited labels and annotations to address the critical challenges in these biomedical applications. To this end, we detail our proposed semi-supervised, generative adversarial networks and transformer-based architectures for individual learning tasks such as retinal image-to-image translation, vessel and fluid segmentation from fundus and OCT images, breast micro-mass segmentation, and sub-cellular calcium events tracking from videos and spatiotemporal map quantification. We also illustrate two multi-modal multi-task learning frameworks with applications that can be extended to other domains of biomedical applications. The main idea is to incorporate each of these as individual modules to our proposed multi-modal frameworks to solve the existing challenges with 1) Fluorescein angiography synthesis, 2) Retinal vessel and fluid segmentation, 3) Breast micro-mass segmentation, and 4) Dynamic quantification of calcium imaging datasets

    Long-term liver lesion tracking in contrast-enhanced ultrasound videos via a siamese network with temporal motion attention

    Get PDF
    Propose: Contrast-enhanced ultrasound has shown great promises for diagnosis and monitoring in a wide range of clinical conditions. Meanwhile, to obtain accurate and effective location of lesion in contrast-enhanced ultrasound videos is the basis for subsequent diagnosis and qualitative treatment, which is a challenging task nowadays.Methods: We propose to upgrade a siamese architecture-based neural network for robust and accurate landmark tracking in contrast-enhanced ultrasound videos. Due to few researches on it, the general inherent assumptions of the constant position model and the missing motion model remain unaddressed limitations. In our proposed model, we overcome these limitations by introducing two modules into the original architecture. We use a temporal motion attention based on Lucas Kanade optic flow and Karman filter to model the regular movement and better instruct location prediction. Moreover, we design a pipeline of template update to ensure timely adaptation to feature changes.Results: Eventually, the whole framework was performed on our collected datasets. It has achieved the average mean IoU values of 86.43% on 33 labeled videos with a total of 37,549 frames. In terms of tracking stability, our model has smaller TE of 19.2 pixels and RMSE of 27.6 with the FPS of 8.36 ± 3.23 compared to other classical tracking models.Conclusion: We designed and implemented a pipeline for tracking focal areas in contrast-enhanced ultrasound videos, which takes the siamese network as the backbone and uses optical flow and Kalman filter algorithm to provide position prior information. It turns out that these two additional modules are helpful for the analysis of CEUS videos. We hope that our work can provide an idea for the analysis of CEUS videos

    Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking

    Full text link
    Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to bridge the gap between single-branch network and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate location. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters the tracking robustness in the presence of distractors via bidirectional cycle tracking verification. Lastly, we propose a dual-frame update inference strategy that adeptively handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results unequivocally establish that AViTMP attains state-of-the-art performance, especially on long-time tracking and robustness.Comment: 13 pages, 8 figures, under revie

    Deep Neural Networks for Visual Object Tracking: An Investigation of Performance Optimization

    Get PDF
    This thesis advances visual object tracking by introducing four key enhancements that address current limitations in training data, network architecture, and tracking methodologies. It proposes a refined sampling strategy for Siamese Networks to enrich training data and develops a more efficient Partially Siamese Network through neural architecture search, achieving superior performance on benchmarks. The work further streamlines tracking with a new transformer-based pipeline and breaks ground with a speech-guided tracking framework, improving human-machine collaboration. These advancements are thoroughly validated, marking significant progress in the visual object tracking domain

    Applications

    Get PDF
    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases in electronics, steel production and milling for quality control during manufacturing processes in traffic, logistics for smart cities and for mobile communications

    Deep Learning for Video Object Segmentation:A Review

    Get PDF
    As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout the given video sequence. Recently, with the advancements of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, where their architectures, contributions and the differences among each other are elaborated. At last, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussions on future research directions. This article is expected to serve as a tutorial and source of reference for learners intended to quickly grasp the current progress in this research area and practitioners interested in applying the video object segmentation methods to their problems. A public website is built to collect and track the related works in this field: https://github.com/gaomingqi/VOS-Review

    Towards Real-World Visual Tracking with Temporal Contexts

    Full text link
    Visual tracking has made significant improvements in the past few decades. Most existing state-of-the-art trackers 1) merely aim for performance in ideal conditions while overlooking the real-world conditions; 2) adopt the tracking-by-detection paradigm, neglecting rich temporal contexts; 3) only integrate the temporal information into the template, where temporal contexts among consecutive frames are far from being fully utilized. To handle those problems, we propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently. Based on it, we propose a stronger version for real-world visual tracking, i.e., TCTrack++. It boils down to two levels: features and similarity maps. Specifically, for feature extraction, we propose an attention-based temporally adaptive convolution to enhance the spatial features using temporal information, which is achieved by dynamically calibrating the convolution weights. For similarity map refinement, we introduce an adaptive temporal transformer to encode the temporal knowledge efficiently and decode it for the accurate refinement of the similarity map. To further improve the performance, we additionally introduce a curriculum learning strategy. Also, we adopt online evaluation to measure performance in real-world conditions. Exhaustive experiments on 8 wellknown benchmarks demonstrate the superiority of TCTrack++. Real-world tests directly verify that TCTrack++ can be readily used in real-world applications.Comment: Accepted by IEEE TPAMI, Code: https://github.com/vision4robotics/TCTrac

    Vision and Depth Based Computerized Anthropometry and Object Tracking

    Get PDF
    The thesis has two interconnected parts: Computerized Anthropometry and RGBD (RGB plus Depth) object tracking. In the first part of this thesis, we start from the mathematical representation of the human body shape model. It briefly introduces prior arts from the classic human body models to the latest deep neural network based approaches. We describe the performance metrics and popular datasets for evaluating computerized anthropometry estimation algorithms in a unified setting. The first part of this thesis is about describing our contribution over two aspects of human body anthropometry research: 1) a statistical method for estimating anthropometric measurements from scans, and 2) a deep neural network based solution for learning anthropometric measurements from binary silhouettes. We also release two body shape datasets for accommodating data driven learning methods. In the second part of this thesis, we explore RGBD object tracking. We start from the current state of RGBD tracking compared to RGB tracking and briefly introduce prior arts from engineered features based methods to deep neural network based methods. We present three deep learning based methods that integrate deep depth features into RGBD object tracking. We also release a unified RGBD tracking benchmark for data driven RGBD tracking algorithms. Finally, we explore RGBD tracking with deep depth features and demonstrate that depth cues significantly benefit the target model learning
    corecore