9 research outputs found

    Machine Learning Algorithms for Robotic Navigation and Perception and Embedded Implementation Techniques

    The abstract is in the attachment.

    Tracking interacting targets in multi-modal sensors

    Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and the limited field of view of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities, such as audio and video, for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics: pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within particle filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target using Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to low signal-to-noise-ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect particle filter (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking on the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets, and through participation in evaluation campaigns.
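    To make the particle-filtering step concrete, below is a minimal single-target bootstrap particle filter sketch in NumPy. It is an illustration of the general PF predict-update-resample cycle only, not the thesis's multi-modal MT-TBD-PF: the function name, the Gaussian random-walk motion model and the Gaussian measurement likelihood are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement,
                         motion_std=1.0, meas_std=2.0):
    """One predict-update-resample cycle of a bootstrap particle filter.

    particles   : (N, 2) array of hypothesised target positions
    weights     : (N,) normalised importance weights
    measurement : (2,) observed position (e.g. a fused audio-visual detection)
    """
    n = len(particles)
    # Predict: propagate each particle with a random-walk motion model.
    particles = particles + rng.normal(scale=motion_std, size=particles.shape)
    # Update: weight particles by the Gaussian likelihood of the measurement.
    d2 = np.sum((particles - measurement) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses below n/2.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Toy run: track a target drifting along the x-axis under noisy measurements.
particles = rng.uniform(-10, 10, size=(500, 2))
weights = np.full(500, 1.0 / 500)
for t in range(20):
    z = np.array([0.5 * t, 0.0]) + rng.normal(scale=2.0, size=2)
    particles, weights = particle_filter_step(particles, weights, z)
print("estimate:", particles.T @ weights)   # weighted-mean position estimate
```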

    Exploring the Internal Statistics: Single Image Super-Resolution, Completion and Captioning

    Image enhancement has drawn increasing attention for improving image quality and interpretability. It aims to modify images to achieve better perception for the human visual system, or a more suitable representation for further analysis, in a variety of applications such as medical imaging, remote sensing, and video surveillance. Depending on the attributes of the given input images, enhancement tasks vary, e.g., noise removal, deblurring, resolution enhancement, and prediction of missing pixels. The latter two are usually referred to as image super-resolution and image inpainting (or completion). Image super-resolution and completion are numerically ill-posed problems. Multi-frame approaches make use of the presence of aliasing in multiple frames of the same scene. For cases where only one input image is available, it is extremely challenging to estimate the unknown pixel values. In this dissertation, we target single image super-resolution and completion by exploring the internal statistics within the input image and across scales. An internal gradient similarity-based single image super-resolution algorithm is first presented. We then demonstrate that the proposed framework can be naturally extended to accomplish super-resolution and completion simultaneously. Afterwards, a hybrid learning-based single image super-resolution approach is proposed to benefit from both external and internal statistics. This framework hinges on image-level hallucination from externally learned regression models as well as gradient-level pyramid self-awareness for edge and texture refinement. The framework is then employed to break the resolution limitation of passive microwave imagery and to boost the tracking accuracy of sea ice movements. To extend our research to the quality enhancement of depth maps, a novel system is presented to handle circumstances where only one pair of registered low-resolution intensity and depth images is available. High-quality RGB and depth images are generated by the system. Extensive experimental results have demonstrated the effectiveness of all the proposed frameworks, both quantitatively and qualitatively. Unlike image super-resolution and completion, which belong to low-level vision research, image captioning is a high-level vision task related to the semantic understanding of an input image. It is a natural task for human beings. However, image captioning remains challenging from a computer vision point of view, especially because the task itself is ambiguous. In principle, descriptions of an image can address any visual aspect of it, varying from object attributes to scene features, or even refer to objects that are not depicted and to hidden interactions or connections that require common-sense knowledge to analyze. Therefore, learning-based image captioning is in general a data-driven task that relies on the training dataset. Descriptions in the majority of existing image-sentence datasets are generated by humans under specific instructions. Real-world sentence data is rarely used directly for training, since it is often noisy and unbalanced, which makes it ‘imperfect’ for training the image captioning task. In this dissertation, we present a novel image captioning framework that deals with uncontrolled image-sentence datasets, where descriptions may be strongly or weakly correlated with the image content and of arbitrary lengths. A self-guiding learning process is proposed to fully reveal the internal statistics of the training dataset, to examine the learning process in a global way, and to generate descriptions that are syntactically correct and semantically sound.
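    The cross-scale internal-statistics idea can be sketched as follows: patches in a natural image tend to recur in a downscaled version of the same image, so pairs of (blurry low-resolution patch, sharp original patch) mined from the input alone can supply the missing high-frequency detail. The snippet below is a hypothetical nearest-neighbour illustration of that principle only, not the dissertation's gradient-similarity or hybrid learning framework; the function `internal_sr`, its parameters and the bilinear `scipy.ndimage.zoom` resampling are assumptions for the example, and image dimensions are assumed divisible by the scale factor.

```python
import numpy as np
from scipy.ndimage import zoom

def internal_sr(img, scale=2, patch=5):
    """Single-image SR from internal statistics: mine (blurry, sharp) patch
    pairs across scales of the *same* image and reuse them at test time."""
    small = zoom(img, 1 / scale, order=1)               # downscaled copy of input
    up_small = zoom(small, (img.shape[0] / small.shape[0],
                            img.shape[1] / small.shape[1]), order=1)
    r = patch // 2
    lo_patches, hi_patches = [], []
    # Dictionary of internal patch pairs: blurry re-upscaled -> sharp original.
    for i in range(r, img.shape[0] - r):
        for j in range(r, img.shape[1] - r):
            lo_patches.append(up_small[i-r:i+r+1, j-r:j+r+1].ravel())
            hi_patches.append(img[i-r:i+r+1, j-r:j+r+1].ravel())
    lo_patches = np.stack(lo_patches)
    hi_patches = np.stack(hi_patches)

    out = zoom(img, scale, order=1)                     # initial smooth guess
    result = out.copy()
    # Replace each patch of the guess with the sharp match from the dictionary.
    for i in range(r, out.shape[0] - r, patch):
        for j in range(r, out.shape[1] - r, patch):
            q = out[i-r:i+r+1, j-r:j+r+1].ravel()
            k = np.argmin(np.sum((lo_patches - q) ** 2, axis=1))
            result[i-r:i+r+1, j-r:j+r+1] = hi_patches[k].reshape(patch, patch)
    return result

# Toy usage on a small synthetic image (shapes divisible by the scale).
img = np.kron(np.eye(8), np.ones((4, 4)))               # 32x32 test pattern
print(internal_sr(img).shape)                           # (64, 64)
```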

    Indirect 3D Reconstruction Through Appearance Prediction

    As humans, we easily perceive shape and depth, which helps us navigate our environment and interact with the objects around us. Automating these abilities for computers is critical for many applications such as self-driving cars, augmented reality and architectural surveying. While active 3D reconstruction methods, such as laser scanning or structured light, can produce very accurate results, they are typically expensive and their use cases can be limited. In contrast, passive methods that make use of only easily captured photographs are typically less accurate, as mapping from 2D images to 3D is an under-constrained problem. In this thesis we focus on passive reconstruction techniques. We explore ways to recover 3D shape from images in two challenging situations: 1) where a collection of images features a highly specular surface whose appearance changes drastically between images, and 2) where only one input image is available. For both cases, we pose the reconstruction task as an indirect problem. In the first situation, the rapid change in appearance of highly specular objects makes it infeasible to directly establish correspondences between images. Instead, we develop an indirect approach that uses a panoramic image of the environment to simulate reflections, and recover the surface which best predicts the appearance of the object. In the second situation, the ambiguity inherent in single-view reconstruction is typically resolved with machine learning, but acquiring depth data for training is both difficult and expensive. We present an indirect approach, where we train a neural network to regress depth by performing the proxy task of predicting the appearance of the image when the viewpoint changes. We show that highly specular objects can be accurately reconstructed in uncontrolled environments, producing results that are 30% more accurate than the initialisation surface. For single-frame depth estimation, our approach improves object boundaries in the reconstructions and significantly outperforms all previously published methods. In both situations, the proposed methods shrink the accuracy gap between camera-based reconstruction and what is achievable with active sensors.
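    The appearance-prediction proxy task for depth admits a small NumPy sketch: if a per-pixel disparity estimate is correct, warping one rectified stereo view into the other should reproduce its appearance, so the photometric error alone can train a depth network without any ground-truth depth. The function below is an assumed toy formulation (rectified horizontal stereo, bilinear sampling, L1 error), not the network or loss used in the thesis.

```python
import numpy as np

def photometric_loss(left, right, disparity):
    """Warp the right image into the left view using predicted per-pixel
    disparity, then compare appearance. A low loss means the predicted
    disparity (hence depth) explains how the image looks from the other view.
    """
    h, w = left.shape
    cols = np.arange(w)[None, :].repeat(h, axis=0)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    # Rectified-stereo warp: the left pixel at x samples the right image at x - d.
    src = np.clip(cols - disparity, 0, w - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, w - 1)
    frac = src - lo
    # Linear interpolation along the horizontal scanline.
    warped = (1 - frac) * right[rows, lo] + frac * right[rows, hi]
    return np.mean(np.abs(left - warped))            # L1 photometric error

# Toy check: a constant 3-pixel shift is explained by disparity 3,
# giving a low loss (up to wrap-around border effects of np.roll).
rng = np.random.default_rng(0)
left = rng.random((8, 16))
right = np.roll(left, -3, axis=1)
print(photometric_loss(left, right, np.full((8, 16), 3.0)))
```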

    Pedestrian localization, tracking and behavior analysis from multiple cameras

    Video surveillance is currently undergoing rapid growth. However, while thousands of cameras are being installed in public places all over the world, computer programs that could reliably detect and track people in order to analyze their behavior are not yet operational. The challenges are numerous, ranging from low image quality, suboptimal scene lighting and the changing appearance of pedestrians, to occlusions with the environment and between people, and complex interacting trajectories in crowds. In this thesis, we propose a complete approach for detecting and tracking an unknown number of interacting people from multiple cameras located at eye level. Our system works reliably in spite of significant occlusions and delivers metrically accurate trajectories for each tracked individual. Furthermore, we develop a method for representing the most common types of motion in a specific environment and learning them automatically from image data. We demonstrate that a generative model for detection can effectively handle occlusions in each time frame independently, even when the only available data comes from the output of a simple background subtraction algorithm and the number of individuals is unknown a priori. We then advocate that multi-person tracking can be achieved by detecting people in individual frames and then linking the detections across frames. We formulate the linking step as the problem of finding the most probable state of a hidden Markov process given the set of images and frame-independent detections. We first propose to solve this problem by optimizing trajectories independently with Dynamic Programming. In a second step, we reformulate the problem as a constrained flow optimization, resulting in a convex problem that can be solved using standard Linear Programming techniques and is far simpler, both formally and algorithmically, than existing techniques. We show that the particular structure of this framework lets us solve it equivalently using the k-shortest paths algorithm, which leads to a much faster optimization. Finally, we introduce a novel behavioral model to describe pedestrian motions, which is able to capture sophisticated motion patterns resulting from the mixture of different categories of random trajectories. Due to its simplicity, this model can be learned from video sequences in a totally unsupervised manner through an Expectation-Maximization procedure. We show that this behavior model can be used to make tracking systems more robust in ambiguous situations. Moreover, we demonstrate its ability to characterize and detect atypical individual motions.
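    The tracking-by-linking idea admits a compact sketch: with detections already available in every frame, a single trajectory can be recovered by dynamic programming over frame-to-frame transition costs. The code below is a hypothetical single-track illustration with an assumed squared-displacement cost and `max_jump` gating; the thesis's actual contribution reformulates the multi-target case as a constrained flow problem solved with k-shortest paths, which this sketch does not implement.

```python
import numpy as np

def link_detections(frames, max_jump=30.0):
    """Link per-frame detections into one trajectory by dynamic programming.

    frames : list over time of (N_t, 2) arrays of detected positions.
    Returns the index of the chosen detection in each frame, minimising the
    total squared frame-to-frame displacement along the track.
    """
    costs = [np.zeros(len(frames[0]))]
    back = []
    for t in range(1, len(frames)):
        # Pairwise squared distances between current and previous detections.
        d2 = np.sum((frames[t][:, None, :] - frames[t-1][None, :, :]) ** 2,
                    axis=2)
        d2[d2 > max_jump ** 2] = np.inf             # forbid implausible jumps
        total = costs[-1][None, :] + d2             # cost of each transition
        back.append(np.argmin(total, axis=1))       # best predecessor per detection
        costs.append(np.min(total, axis=1))
    # Backtrack from the cheapest final detection.
    track = [int(np.argmin(costs[-1]))]
    for b in reversed(back):
        track.append(int(b[track[-1]]))
    return track[::-1]

# Toy run: two well-separated targets; the detections near the origin link up.
frames = [np.array([[0.0, 0.0], [50.0, 50.0]]),
          np.array([[1.0, 0.5], [49.0, 51.0]]),
          np.array([[2.2, 0.9], [48.5, 52.0]])]
print(link_detections(frames))                      # [0, 0, 0]
```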

    Recent Advances in Signal Processing

    Signal processing is a critical task in the majority of new technological inventions and challenges, across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five areas address, in order, image processing, speech processing, communication systems, time-series analysis, and educational packages. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Image Mosaicing and Super-resolution


    Deep Learning-Based Approaches for Image Restoration

    Image restoration is the operation of taking a corrupted or degraded low-quality image and estimating a high-quality clean image that is free of degradations. The most common degradations that affect image quality are blur, atmospheric turbulence, adverse weather conditions (such as rain, haze, and snow), and noise. Images captured under the influence of these corruptions can significantly affect the performance of subsequent computer vision algorithms such as segmentation, recognition, object detection, and tracking. With such algorithms becoming vital components in several applications, such as autonomous navigation and video surveillance, it is increasingly important to develop sophisticated algorithms that remove these degradations and recover high-quality clean images. These reasons have motivated a plethora of research on single image restoration methods. Recently, following the success of deep learning-based convolutional neural networks, many approaches have been proposed to remove degradations from a corrupted image. We study the following single image restoration problems: (i) atmospheric turbulence removal, (ii) deblurring, (iii) removing distortions introduced by adverse weather conditions such as rain, haze, and snow, and (iv) removing noise. However, existing single image restoration techniques suffer from the following major limitations: (i) they construct global priors without taking into account that these degradations can have a different effect on different local regions of the image; (ii) they use synthetic datasets for training, which often results in sub-optimal performance on real-world images, typically because of the distributional shift between synthetic and real-world degraded images; (iii) existing semi-supervised approaches do not account for the effect of unlabeled or real-world degraded images on semi-supervised performance. To address the first limitation, we propose supervised image restoration techniques that use uncertainty to improve restoration performance. To overcome the second limitation, we propose a Gaussian process-based pseudo-labeling approach to leverage real-world rain information and train the deraining network in a semi-supervised fashion. Furthermore, to address the third limitation, we theoretically study the effect of unlabeled images on semi-supervised performance and propose an adaptive rejection technique to boost it. Finally, we recognize that existing supervised and semi-supervised methods need some kind of paired labeled data to train the network, and training on any kind of synthetic paired clean-degraded images may not completely close the domain gap between synthetic and real-world degraded image distributions. We therefore propose a self-supervised transformer-based approach for image denoising: given a noisy image, we generate multiple down-sampled images and learn the joint relation between these down-sampled images using a Gaussian process to denoise the image.
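    The down-sampling idea behind the self-supervised denoiser can be sketched briefly: spatially interleaved down-sampled copies of a noisy image see nearly the same clean signal but independent noise realisations, so one copy can supervise a denoiser applied to the other. The snippet below only constructs such a pair; the Gaussian-process modelling and the transformer denoiser from the dissertation are omitted, and the function `neighbour_subimages` and its 2x2-cell sampling scheme are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def neighbour_subimages(noisy):
    """Split a noisy image into two interleaved half-resolution copies.

    From each 2x2 cell, one pixel goes to sub-image `a` and a diagonally
    opposite pixel to sub-image `b`, so the two copies share (almost) the
    same underlying clean content but carry independent noise.
    """
    h, w = noisy.shape[0] // 2 * 2, noisy.shape[1] // 2 * 2
    cells = noisy[:h, :w].reshape(h // 2, 2, w // 2, 2)
    # Randomly choose which diagonal of each 2x2 cell feeds which sub-image.
    pick = rng.integers(0, 2, size=(h // 2, w // 2))
    a = np.where(pick == 0, cells[:, 0, :, 0], cells[:, 0, :, 1])
    b = np.where(pick == 0, cells[:, 1, :, 1], cells[:, 1, :, 0])
    return a, b

# The pair (a, b) would then train a denoiser f with a loss like ||f(a) - b||^2.
noisy = rng.normal(size=(8, 8))
a, b = neighbour_subimages(noisy)
print(a.shape, b.shape)   # (4, 4) (4, 4)
```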