72 research outputs found

    DeepSUM: Deep Neural Network for Super-Resolution of Unregistered Multitemporal Images

    Recently, convolutional neural networks (CNNs) have been successfully applied to many remote sensing problems. However, deep learning techniques for multi-image super-resolution (SR) from multitemporal unregistered imagery have received little attention so far. This article proposes a novel CNN-based technique that exploits both spatial and temporal correlations to combine multiple images. This novel framework integrates the spatial registration task directly inside the CNN, and allows one to exploit the representation learning capabilities of the network to enhance registration accuracy. The entire SR process relies on a single CNN with three main stages: shared 2-D convolutions to extract high-dimensional features from the input images; a subnetwork proposing registration filters derived from the high-dimensional feature representations; 3-D convolutions for slow fusion of the features from multiple images. The whole network can be trained end-to-end to recover a single high-resolution image from multiple unregistered low-resolution images. The method presented in this article is the winner of the PROBA-V SR challenge issued by the European Space Agency (ESA)
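
The register-then-fuse idea in the abstract above can be illustrated with a much simpler classical baseline. The sketch below is not the DeepSUM network (which learns registration filters and fuses features with 3-D convolutions); it only aligns 1-D signals by a brute-force integer shift search and averages them. All function names (`shift`, `best_shift`, `register_and_fuse`) and the sample data are hypothetical.

```python
def shift(signal, k):
    """Circularly shift a list by k positions (positive k moves samples right)."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def best_shift(reference, signal, max_shift=3):
    """Return the integer shift that minimizes squared error to the reference."""
    def error(k):
        moved = shift(signal, k)
        return sum((a - b) ** 2 for a, b in zip(reference, moved))
    return min(range(-max_shift, max_shift + 1), key=error)


def register_and_fuse(signals, max_shift=3):
    """Align every signal to the first one, then average them sample-wise."""
    reference = signals[0]
    aligned = [shift(s, best_shift(reference, s, max_shift)) for s in signals]
    n = len(aligned)
    return [sum(col) / n for col in zip(*aligned)]


# Illustrative data: three misaligned copies of the same signal.
base = [0, 0, 1, 4, 1, 0, 0, 0]
observations = [base, shift(base, 2), shift(base, -1)]
fused = register_and_fuse(observations)
```

With noise-free copies, the fused result equals the reference signal. In a learned pipeline the hand-written shift search would be replaced by filters predicted from image features.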

    Deep learning for inverse problems in remote sensing: super-resolution and SAR despeckling

    The abstract is in the attachment.

    Towards Developing Computer Vision Algorithms and Architectures for Real-world Applications

    Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on developing algorithms for real-life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for object and action recognition in video data, sparse feature selection algorithms for medical image analysis, and automated feature extraction using a convolutional neural network for blood cancer grading. To detect and classify objects in video, the objects have to be separated from the background, and then the discriminant features are extracted from the region of interest (ROI) before being fed to a classifier. Effective object segmentation and feature extraction are often application specific and pose major challenges for object detection and classification tasks. In this dissertation, we present an effective object-flow-based ROI generation algorithm for segmenting moving objects in video data, which can be applied in surveillance and self-driving vehicles. Optical flow can also be used as a feature in human action recognition, and we use optical flow features in a pre-trained convolutional neural network to improve the performance of human action recognition algorithms. Both algorithms outperformed the state of the art at the time. Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is often difficult; thus an automated feature selection method is desired. Sparse learning is a technique to extract the most discriminant and representative features from raw visual data.
However, sparse learning with L1 regularization only takes sparsity in the feature dimension into consideration; we improve the algorithm so that it selects the type of features as well, and less important or noisy feature types are entirely removed from the feature set. We demonstrate this algorithm by analyzing endoscopy images to detect abnormalities in the esophagus and stomach, such as ulcers and cancer. Besides the sparsity constraint, other application-specific constraints and prior knowledge may also need to be incorporated in the loss function in sparse learning to obtain the desired results. We demonstrate how to incorporate a similar-inhibition constraint and gaze and attention priors in sparse dictionary selection for gastroscopic video summarization, which enables intelligent key frame extraction from gastroscopic video data. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. A convolutional neural network mimics the mammalian visual cortex and can extract the most discriminant features automatically from training samples. We use a convolutional neural network with a hierarchical classifier to grade the severity of Follicular Lymphoma, a type of blood cancer, and it reaches 91% accuracy, on par with analysis by expert pathologists. Developing real-world computer vision applications is more than just developing core vision algorithms to extract and understand information from visual data; it is also subject to many practical requirements and constraints, such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, ease of use and deployment, etc. The general processing pipelines and system architectures of computer vision based applications share many design principles.
We developed common processing components and a generic framework for computer vision applications, and a versatile scale-adaptive template matching algorithm for object detection. We demonstrate the design principles and best practices by developing and deploying a complete real-life computer vision application, a multi-channel water level monitoring system, whose techniques and design methodology can be generalized to other real-life applications. General software engineering principles, such as modularity, abstraction, robustness to requirement changes, and generality, are all demonstrated in this research.
    Dissertation/Thesis, Doctoral Dissertation, Computer Science 201
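
To make the contrast between element-wise L1 sparsity and feature-type selection concrete, here is a minimal sketch using the standard soft-thresholding operator and its group counterpart (the proximal operator of a group-lasso penalty). That the dissertation uses this particular penalty is an assumption; the abstract only says that whole feature types are removed. The feature-type names and weights are invented for illustration.

```python
import math


def soft_threshold(w, lam):
    """Proximal operator of lam * |w|: shrinks a single weight toward zero."""
    return math.copysign(max(abs(w) - lam, 0.0), w)


def group_soft_threshold(group, lam):
    """Proximal operator of lam * ||group||_2: shrinks or removes a whole group."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= lam:
        return [0.0] * len(group)  # the entire feature type is dropped
    scale = 1.0 - lam / norm
    return [scale * w for w in group]


# Hypothetical weights grouped by feature type (names are illustrative only).
weights = {
    "color": [3.0, 4.0],
    "texture": [0.3, -0.4],
}

# Element-wise L1 shrinkage can leave stray weights inside a weak feature type.
elementwise = {k: [soft_threshold(w, 0.35) for w in v] for k, v in weights.items()}

# Group shrinkage removes the weak feature type as a unit.
groupwise = {k: group_soft_threshold(v, 1.0) for k, v in weights.items()}
```

In this toy case the element-wise operator zeroes only one of the two "texture" weights, while the group operator removes the whole "texture" type and merely shrinks the strong "color" type.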

    Tracking and analysis of movement at different scales: from endosomes to humans

    Movement is apparent across all spatio-temporal scales in biology and can have a significant effect on the survival of the individual. For this reason, it has been the object of study in a wide range of research fields, including molecular biology, pharmaceutics and medical research, but also behavioural biology and ecology. The aim of the thesis was to provide methodologies and insight on the movement patterns seen at different spatio-temporal scales in biology: the intra-cellular, the cellular and the organism level. At the intra-cellular level, this thesis studied compartmental inheritance in Human Osteosarcoma (U2-OS) cells. The inheritance pattern of the endosomal quantum dot fluorescence across two consecutive generations was empirically revealed for the first time. In addition, an in silico model was developed to predict the inheritance across multiple generations. At the cellular level, a semi-automated routine was developed that can perform long-term nuclei tracking in U2-OS cell populations labeled with a cell cycle marker in their cytoplasm. A method to extract cell cycle information without the need to explicitly segment the cells was proposed. The movement behaviour of the cellular population and possible inter-individual differences were also studied. Lastly, at the organism level, the focus of the thesis was to study the emergence of coordination in unfamiliar free-swimming stickleback fish shoals. It was demonstrated that there exist two different phases, the uncoordinated and the coordinated. In addition, the significance of the uncoordinated phase to the establishment of the group’s social network was shown for the first time. The adaptation of the stickleback collectives was also studied over time, i.e. the effect of the group’s repeated interactions on the emergence of coordination. Findings at the intra-cellular and cellular level can have significant implications for medical and pharmaceutical research.
Findings at the organism level can also contribute to the understanding of how social interactions are formed and maintained in animal collectives.

    Deep learning for image restoration and enhancement

    Image restoration is the process of recovering an original clean image from its degraded version, and image enhancement takes the goal of improving the image quality either objectively or subjectively. Both of them play a key part in computer vision and image processing and have broad applications in industry. The past few years have witnessed the revival of deep learning in computer vision, and substantial progress has been made due to the use of deep neural networks. In this dissertation, we use deep learning to address the problems of image restoration and enhancement, with the focus on the following topics: image and video super-resolution (SR), as well as image denoising. For these problems, deep neural networks are generally used as a regression model to predict the original clean image content from the input. However, designing a network structure that can effectively exploit the intrinsic image properties to achieve remarkable performances is not a trivial task. For image SR, several models based on deep neural networks have been recently proposed and attained superior performance that overshadows all previous handcrafted models. The question then arises whether large-capacity and data-driven models have become the dominant solution to this ill-posed problem. We argue that domain expertise represented by the conventional sparse coding model is still valuable, and it can be combined with the key ingredients of deep learning to achieve further improved results. We experimentally show that a sparse coding model particularly designed for SR can be incarnated as a neural network, which can be trained from end to end. The interpretation of the network based on sparse coding leads to much more efficient and effective training, as well as a reduced model size. In addition, we design a unified framework to learn a mixture of sub-networks for image SR so as to further boost SR accuracy. 
Video SR aims to generate a high-resolution (HR) frame from multiple low-resolution (LR) frames in a local temporal window. The inter-frame temporal relation is as crucial as the intra-frame spatial relation for tackling this problem. We design deep networks for utilizing the temporal relation from two aspects. First, we propose a temporal adaptive neural network that can adaptively determine the optimal scale of temporal dependency. Filters on various temporal scales are applied to the input LR sequence before their responses are adaptively aggregated. Second, we reduce the complexity of motion between neighboring frames using a spatial alignment network which is much more robust and efficient than competing alignment methods and can be jointly trained with the temporal adaptive network. Image denoising, as another important task of image restoration, is dedicated to recovering the underlying image signal from its noisy measurement. First we customize a convolutional neural network for image denoising. Second we investigate the mutual relation between image denoising and high-level vision tasks in a deep learning fashion when image denoising serves as a preprocessing step for high-level vision tasks. We design a network that cascades two modules for image denoising and various high-level tasks, and use the joint loss for updating only the denoising network via back-propagation. Self-similarity in natural images is widely used for image restoration by many classic approaches. We propose a non-local recurrent network as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration. Unlike existing methods that measure self-similarity in an isolated manner, the proposed non-local module can be flexibly integrated into existing deep networks for end-to-end training to capture deep feature correlation between each location and its neighborhood. 
We fully use an RNN for its parameter efficiency and allow deep feature correlation to be propagated along adjacent recurrent states. This design boosts robustness against inaccurate correlation estimation due to severely degraded images. Finally, we show that it is essential to choose a proper neighborhood size for computing deep feature correlation given degraded images, in order to obtain the best restoration performance
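
The non-local operation mentioned in the abstract above, where each location is updated from similar locations within a bounded neighborhood, can be sketched in one dimension. This is a simplified stand-in and not the proposed recurrent network: raw sample values play the role of deep features, and the function name `non_local_filter` and its parameters `radius` and `h` are assumptions made for this example.

```python
import math


def non_local_filter(signal, radius=2, h=1.0):
    """Replace each sample by a similarity-weighted mean of its neighborhood."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # Weight each neighbor by how similar its value is to the center sample.
        weights = [
            math.exp(-((signal[i] - signal[j]) ** 2) / (h * h))
            for j in range(lo, hi)
        ]
        total = sum(weights)
        weighted = sum(w * signal[j] for w, j in zip(weights, range(lo, hi)))
        out.append(weighted / total)
    return out


# Illustrative noisy step signal: a flat region near 0 followed by one near 5.
noisy_step = [0.1, -0.1, 0.0, 0.1, 5.0, 5.1, 4.9, 5.0]
restored = non_local_filter(noisy_step, radius=2, h=0.5)
```

Samples on either side of the step are averaged almost exclusively with similar samples, so the edge is preserved while small fluctuations are smoothed. The `radius` parameter corresponds loosely to the neighborhood size that the abstract says must be chosen with care.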

    Blind restoration of images with penalty-based decision making : a consensus approach

    In this thesis we show a relationship between fuzzy decision making and image processing. Various applications for image noise reduction with consensus methodology are introduced. A new approach is introduced to deal with non-stationary Gaussian noise and spatially non-stationary noise in MRI.

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling

    SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving

    To mitigate the challenges arising from partial occlusion in pedestrian detection methods based on human pose keypoints, we present a novel pedestrian pose keypoint completion method called separation and dimensionality reduction-based generative adversarial imputation networks (SDR-GAIN). First, we use OpenPose to estimate pedestrian poses in images. Then, we isolate the head and torso keypoints of pedestrians whose keypoints are incomplete due to occlusion or other factors, and perform dimensionality reduction to enhance features and further unify the feature distribution. Finally, we introduce two generative models based on the generative adversarial network (GAN) framework, which incorporate Huber loss, residual structure, and L1 regularization to generate the missing parts of the incomplete head and torso pose keypoints of partially occluded pedestrians, resulting in pose completion. Our experiments on the MS COCO and JAAD datasets demonstrate that SDR-GAIN outperforms the basic GAIN framework, the interpolation methods PCHIP and MAkima, and the machine learning methods k-NN and MissForest on the pose completion task. In addition, the runtime of SDR-GAIN is approximately 0.4 ms, showing high real-time performance and significant application value in the field of autonomous driving.
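
The Huber loss named in the abstract above has a standard definition: quadratic for small residuals and linear for large ones. The sketch below shows that definition only; how SDR-GAIN combines it with the adversarial objective and L1 regularization is not described in the abstract, and the keypoint values and the `delta` setting here are invented for illustration.

```python
def huber_loss(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def mean_huber(predicted, target, delta=1.0):
    """Average Huber loss over paired coordinates."""
    losses = [huber_loss(p - t, delta) for p, t in zip(predicted, target)]
    return sum(losses) / len(losses)


# Hypothetical flattened (x, y) keypoint coordinates; the last one is an outlier.
target = [0.0, 0.0, 1.0, 1.0]
predicted = [0.5, 0.0, 1.0, 4.0]
loss = mean_huber(predicted, target)
```

Because the outlier contributes linearly rather than quadratically, a single badly generated keypoint dominates the loss less than it would under mean squared error.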

    Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services

    Artificial Intelligence-Generated Content (AIGC) is an automated method for generating, manipulating, and modifying valuable and diverse data using AI algorithms creatively. This survey paper focuses on the deployment of AIGC applications, e.g., ChatGPT and Dall-E, at mobile edge networks, namely mobile AIGC networks, that provide personalized and customized AIGC services in real time while maintaining user privacy. We begin by introducing the background and fundamentals of generative models and the lifecycle of AIGC services at mobile AIGC networks, which includes data collection, training, finetuning, inference, and product management. We then discuss the collaborative cloud-edge-mobile infrastructure and technologies required to support AIGC services and enable users to access AIGC at mobile edge networks. Furthermore, we explore AIGC-driven creative applications and use cases for mobile AIGC networks. Additionally, we discuss the implementation, security, and privacy challenges of deploying mobile AIGC networks. Finally, we highlight some future research directions and open issues for the full realization of mobile AIGC networks.

    Distributed cloud-edge analytics and machine learning for transportation emissions estimation

    (English) In recent years, IoT and Smart Cities have become a popular computing paradigm based on connected, network-enabled devices that provide different functionalities, from sensor measurements to home-automation actions. With this paradigm, it is possible to provide stakeholders with near-real-time information from the field, e.g. the current pollution of the city. Along with these paradigms, Fog Computing enables computation near the sensors where the data is produced, i.e. at the Edge nodes. This paradigm provides low latency and fault tolerance given the possible independence of the sensor devices. Moreover, pushing this computation to the Edge makes derived results available in near real time. This ability to push the computation to where the data is produced can be beneficial in many situations; however, it also requires including in the Edge the data preparation processes that ensure the fitness for use of the data, as the incoming data can be erroneous. Given this situation, Machine Learning can be useful to correct data and also to produce predictions of future values. Even though there have been studies regarding the uses of data at the Edge, to our knowledge there is no evaluation of the different modeling situations and the viability of the approach. Therefore, this thesis aims to evaluate the possibility of building a distributed system that ensures the fitness for use of the incoming data through Machine Learning enabled Data Preparation, estimates the emissions and predicts the future status of the city in near real time. We evaluate the viability through three contributions. The first contribution focuses on forecasting in a distributed scenario, with a road traffic dataset for evaluation. It provides a robust solution to build a central model. This approach is based on Federated Learning, which allows training models at the Edge nodes and then merging them centrally.
This way the models in the Edge can be independent but can also be synchronized. The results show the trade-off between accuracy and training time, and a comparison between low-powered devices and server-class machines. These analyses show that it is viable to use Machine Learning with this paradigm. The second contribution focuses on a particular use case of ship emission estimation. To estimate exhaust emissions, the data must be correct, which is not always the case. This contribution explores the different techniques available to correct ship registry data and proposes the use of simple Machine Learning techniques to impute missing or erroneous values. This contribution analyzes the different variables and their relationships to provide practitioners with guidelines for correction and data treatment. The results show that with classical Machine Learning it is possible to improve on the state-of-the-art results. Moreover, as these algorithms are simple enough, they can be used in an Edge device if required. The third contribution focuses on generating new variables from the ones available in a ship trace dataset obtained from the Automatic Identification System (AIS). We use a pipeline of two different methods, a neural network and a clustering algorithm, to group movements into movement patterns or behaviors. We test the predictive power of these behaviors to predict ship type, main engine power, and navigational status. The prediction of the main engine power is compared against the standard technique used in ship emission estimation when the ship registry is missing. Our approach was able to detect 45% of the emissions that would go undetected if the baseline method were used. As ship navigational status is prone to error, the behaviors found are proposed as an alternative variable based on robust data.
These contributions build a framework that can distribute the learning processes and that resists network failures in low-powered devices.
    (Spanish) In recent years, IoT and Smart Cities have become a popular computing paradigm based on network-connected devices that provide different functionalities, from sensor measurements to home-automation actions. With this paradigm, it is possible to have information in near real time, such as the current pollution of the city. Along with the paradigms mentioned, Fog Computing allows computing close to where the data is produced, that is, at the Edge nodes. This paradigm provides low latency and fault tolerance given the possible independence of the sensor devices. This possibility can be beneficial in many situations; however, it requires including in the Edge the data preparation processes that ensure fitness for use, since the incoming data may be erroneous. In this situation, Machine Learning is useful for correcting data and also for producing predictions of future values. Although studies have been carried out on the uses of data at the Edge, as far as we know there is no evaluation of the different modeling situations and the viability of the approach. Therefore, this thesis aims to evaluate the possibility of building a distributed system that guarantees that the data is correct through its preparation with Machine Learning. The system must also estimate emissions and predict the future state of the city in near real time. The viability is evaluated through three contributions. The first contribution focuses on a distributed scenario with a road traffic dataset and provides a robust solution for building a central model.
This approach is based on Federated Learning, which allows models to be trained at the Edge nodes and then merged centrally. In this way, the models at the Edge can be independent, but can also be synchronized. The results show a comparison of accuracy between a central model and a distributed one, and a comparison of low-power devices against servers. These analyses show that it is viable to use Machine Learning in this paradigm. The second contribution focuses on a particular use case: estimating ship emissions. To estimate emissions, the data must be correct, which is not always the case. This contribution explores the different techniques available for correcting ship registry data and proposes the use of simple Machine Learning techniques to impute missing or erroneous values. This contribution analyzes the different variables and their relationships to provide practitioners with guidelines for data correction and treatment. The results show that with classical Machine Learning it is possible to improve on state-of-the-art methods. Moreover, these algorithms are simple enough to be used on Edge devices. The third contribution focuses on generating new variables from the available ones with a ship trace dataset obtained from the AIS system. This is done by jointly using a neural network and a clustering algorithm to group movements into movement patterns or behaviors. Their performance is evaluated for predicting ship type, main engine power, and navigational status. With this prediction, our system is able to detect 45% of the emissions that are not detected with standard methods.
As the ship's navigational status is prone to error, the behaviors found are proposed as an alternative variable based on robust data. These contributions constitute a framework for distributing the learning processes that resists network errors with low-power devices.
    Arquitectura de computador
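
The first contribution in the abstract above trains models at Edge nodes and merges them centrally. A common way to do such a merge is sample-weighted parameter averaging (often called federated averaging); that the thesis uses exactly this rule is an assumption, since the abstract does not state it. The function name `federated_average` and all numbers below are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Merge per-client weight vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical models trained on three Edge nodes (values are illustrative).
edge_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edge_samples = [100, 100, 200]
central_model = federated_average(edge_models, edge_samples)
```

Each node can keep using its local model between rounds, and the merged vector can be sent back to the nodes to synchronize them, which matches the independent-but-synchronizable behavior the abstract describes.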