134 research outputs found

    RGB-T salient object detection via fusing multi-level CNN features

    Get PDF
    RGB-induced salient object detection has recently witnessed substantial progress, which is attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detections suffer from challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB based saliency detection, this paper takes advantage of the complementary benefits of RGB and thermal infrared images. Specifically, we propose a novel end-to-end network for multi-modal salient object detection, which turns the challenge of RGB-T saliency detection to a CNN feature fusion problem. To this end, a backbone network (e.g., VGG-16) is first adopted to extract the coarse features from each RGB or thermal infrared image individually, and then several adjacent-depth feature combination (ADFC) modules are designed to extract multi-level refined features for each single-modal input image, considering that features captured at different depths differ in semantic information and visual details. Subsequently, a multi-branch group fusion (MGF) module is employed to capture the cross-modal features by fusing those features from ADFC modules for a RGB-T image pair at each level. Finally, a joint attention guided bi-directional message passing (JABMP) module undertakes the task of saliency prediction via integrating the multi-level fused features from MGF modules. Experimental results on several public RGB-T salient object detection datasets demonstrate the superiorities of our proposed algorithm over the state-of-the-art approaches, especially under challenging conditions, such as poor illumination, complex background and low contrast

    RGBT Salient Object Detection: A Large-scale Dataset and Benchmark

    Full text link
    Salient object detection in complex scenes and environments is a challenging research topic. Most works focus on RGB-based salient object detection, which limits its performance of real-life applications when confronted with adverse conditions such as dark environments and complex backgrounds. Taking advantage of RGB and thermal infrared images becomes a new research direction for detecting salient object in complex scenes recently, as thermal infrared spectrum imaging provides the complementary information and has been applied to many computer vision tasks. However, current research for RGBT salient object detection is limited by the lack of a large-scale dataset and comprehensive benchmark. This work contributes such a RGBT image dataset named VT5000, including 5000 spatially aligned RGBT image pairs with ground truth annotations. VT5000 has 11 challenges collected in different scenes and environments for exploring the robustness of algorithms. With this dataset, we propose a powerful baseline approach, which extracts multi-level features within each modality and aggregates these features of all modalities with the attention mechanism, for accurate RGBT salient object detection. Extensive experiments show that the proposed baseline approach outperforms the state-of-the-art methods on VT5000 dataset and other two public datasets. In addition, we carry out a comprehensive analysis of different algorithms of RGBT salient object detection on VT5000 dataset, and then make several valuable conclusions and provide some potential research directions for RGBT salient object detection.Comment: 12 pages, 10 figures https://github.com/lz118/RGBT-Salient-Object-Detectio

    Méthodes de vision à la motion et leurs applications

    Get PDF
    La détection de mouvement est une opération de base souvent utilisée en vision par ordinateur, que ce soit pour la détection de piétons, la détection d’anomalies, l’analyse de scènes vidéo ou le suivi d’objets en temps réel. Bien qu’un très grand nombre d’articles ait été publiés sur le sujet, plusieurs questions restent en suspens. Par exemple, il n’est toujours pas clair comment détecter des objets en mouvement dans des vidéos contenant des situations difficiles à gérer comme d'importants mouvements de fonds et des changements d’illumination. De plus, il n’y a pas de consensus sur comment quantifier les performances des méthodes de détection de mouvement. Aussi, il est souvent difficile d’incorporer de l’information de mouvement à des opérations de haut niveau comme par exemple la détection de piétons. Dans cette thèse, j’aborde quatre problèmes en lien avec la détection de mouvement: 1. Comment évaluer efficacement des méthodes de détection de mouvement? Pour répondre à cette question, nous avons mis sur pied une procédure d’évaluation de telles méthodes. Cela a mené à la création de la plus grosse base de données 100\% annotée au monde dédiée à la détection de mouvement et organisé une compétition internationale (CVPR 2014). J’ai également exploré différentes métriques d’évaluation ainsi que des stratégies de combinaison de méthodes de détection de mouvement. 2. L’annotation manuelle de chaque objet en mouvement dans un grand nombre de vidéos est un immense défi lors de la création d’une base de données d’analyse vidéo. Bien qu’il existe des méthodes de segmentation automatiques et semi-automatiques, ces dernières ne sont jamais assez précises pour produire des résultats de type “vérité terrain”. Pour résoudre ce problème, nous avons proposé une méthode interactive de segmentation d’objets en mouvement basée sur l’apprentissage profond. Les résultats obtenus sont aussi précis que ceux obtenus par un être humain tout en étant 40 fois plus rapide. 3. Les méthodes de détection de piétons sont très souvent utilisées en analyse de la vidéo. Malheureusement, elles souffrent parfois d’un grand nombre de faux positifs ou de faux négatifs tout dépendant de l’ajustement des paramètres de la méthode. Dans le but d’augmenter les performances des méthodes de détection de piétons, nous avons proposé un filtre non linéaire basée sur la détection de mouvement permettant de grandement réduire le nombre de faux positifs. 4. L’initialisation de fond ({\em background initialization}) est le processus par lequel on cherche à retrouver l’image de fond d’une vidéo sans les objets en mouvement. Bien qu’un grand nombre de méthodes ait été proposé, tout comme la détection de mouvement, il n’existe aucune base de donnée ni procédure d’évaluation pour de telles méthodes. Nous avons donc mis sur pied la plus grosse base de données au monde pour ce type d’applications et avons organisé une compétition internationale (ICPR 2016).Abstract : Motion detection is a basic video analytic operation on which many high-level computer vision tasks are built upon, e.g., pedestrian detection, anomaly detection, scene understanding and object tracking strategies. Even though a large number of motion detection methods have been proposed in the last decades, some important questions are still unanswered, including: (1) how to separate the foreground from the background accurately even under extremely challenging circumstances? (2) how to evaluate different motion detection methods? And (3) how to use motion information extracted by motion detection to help improving high-level computer vision tasks? In this thesis, we address four problems related to motion detection: 1. How can we benchmark (and on which videos) motion detection method? Current datasets are either too small with a limited number of scenarios, or only provide bounding box ground truth that indicates the rough location of foreground objects. As a solution, we built the largest and most objective motion detection dataset in the world with pixel accurate ground truth to evaluate and compare motion detection methods. We also explore various evaluation metrics as well as different combination strategies. 2. Providing pixel accurate ground truth is a huge challenge when building a motion detection dataset. While automatic labeling methods suffer from a too large false detection rate to be used as ground truth, manual labeling of hundreds of thousands of frames is extremely time consuming. To solve this problem, we proposed an interactive deep learning method for segmenting moving objects from videos. The proposed method can reach human-level accuracies while lowering the labeling time by a factor of 40. 3. Pedestrian detectors always suffer from either false positive detections or false negative detections all depending on the parameter tuning. Unfortunately, manual adjustment of parameters for a large number of videos is not feasible in practice. In order to make pedestrian detectors more robust on a large variety of videos, we combined motion detection with various state-of-the-art pedestrian detectors. This is done by a novel motion-based nonlinear filtering process which improves detectors by a significant margin. 4. Scene background initialization is the process by which a method tries to recover the RGB background image of a video without foreground objects in it. However, one of the reasons that background modeling is challenging is that there is no good dataset and benchmarking framework to estimate the performance of background modeling methods. To fix this problem, we proposed an extensive survey as well as a novel benchmarking framework for scene background initialization

    Data-Driven Image Restoration

    Get PDF
    Every day many images are taken by digital cameras, and people are demanding visually accurate and pleasing result. Noise and blur degrade images captured by modern cameras, and high-level vision tasks (such as segmentation, recognition, and tracking) require high-quality images. Therefore, image restoration specifically, image deblurring and image denoising is a critical preprocessing step. A fundamental problem in image deblurring is to recover reliably distinct spatial frequencies that have been suppressed by the blur kernel. Existing image deblurring techniques often rely on generic image priors that only help recover part of the frequency spectrum, such as the frequencies near the high-end. To this end, we pose the following specific questions: (i) Does class-specific information offer an advantage over existing generic priors for image quality restoration? (ii) If a class-specific prior exists, how should it be encoded into a deblurring framework to recover attenuated image frequencies? Throughout this work, we devise a class-specific prior based on the band-pass filter responses and incorporate it into a deblurring strategy. Specifically, we show that the subspace of band-pass filtered images and their intensity distributions serve as useful priors for recovering image frequencies. Next, we present a novel image denoising algorithm that uses external, category specific image database. In contrast to existing noisy image restoration algorithms, our method selects clean image “support patches” similar to the noisy patch from an external database. We employ a content adaptive distribution model for each patch where we derive the parameters of the distribution from the support patches. Our objective function composed of a Gaussian fidelity term that imposes category specific information, and a low-rank term that encourages the similarity between the noisy and the support patches in a robust manner. Finally, we propose to learn a fully-convolutional network model that consists of a Chain of Identity Mapping Modules (CIMM) for image denoising. The CIMM structure possesses two distinctive features that are important for the noise removal task. Firstly, each residual unit employs identity mappings as the skip connections and receives pre-activated input to preserve the gradient magnitude propagated in both the forward and backward directions. Secondly, by utilizing dilated kernels for the convolution layers in the residual branch, each neuron in the last convolution layer of each module can observe the full receptive field of the first layer

    Image Analysis Algorithms for Single-Cell Study in Systems Biology

    Get PDF
    With the contiguous shift of biology from a qualitative toward a quantitative field of research, digital microscopy and image-based measurements are drawing increased interest. Several methods have been developed for acquiring images of cells and intracellular organelles. Traditionally, acquired images are analyzed manually through visual inspection. The increasing volume of data is challenging the scope of manual analysis, and there is a need to develop methods for automated analysis. This thesis examines the development and application of computational methods for acquisition and analysis of images from single-cell assays. The thesis proceeds with three different aspects.First, a study evaluates several methods for focusing microscopes and proposes a novel strategy to perform focusing in time-lapse imaging. The method relies on the nature of the focus-drift and its predictability. The study shows that focus-drift is a dynamical system with a small randomness. Therefore, a prediction-based method is employed to track the focus-drift overtime. A prototype implementation of the proposed method is created by extending the Nikon EZ-C1 Version 3.30 (Tokyo, Japan) imaging platform for acquiring images with a Nikon Eclipse (TE2000-U, Nikon, Japan) microscope.Second, a novel method is formulated to segment individual cells from a dense cluster. The method incorporates multi-resolution analysis with maximum-likelihood estimation (MAMLE) for cell detection. The MAMLE performs cell segmentation in two phases. The initial phase relies on a cutting-edge filter, edge detection in multi-resolution with a morphological operator, and threshold decomposition for adaptive thresholding. It estimates morphological features from the initial results. In the next phase, the final segmentation is constructed by boosting the initial results with the estimated parameters. The MAMLE method is evaluated with de novo data sets as well as with benchmark data from public databases. An empirical evaluation of the MAMLE method confirms its accuracy.Third, a comparative study is carried out on performance evaluation of state-ofthe-art methods for the detection of subcellular organelles. This study includes eleven algorithms developed in different fields for segmentation. The evaluation procedure encompasses a broad set of samples, ranging from benchmark data to synthetic images. The result from this study suggests that there is no particular method which performs superior to others in the test samples. Next, the effect of tetracycline on transcription dynamics of tetA promoter in Escherichia coli (E. coli ) cells is studied. This study measures expressions of RNA by tagging the MS2d-GFP vector with a target gene. The RNAs are observed as intracellular spots in confocal images. The kernel density estimation (KDE) method for detecting the intracellular spots is employed to quantify the individual RNA molecules.The thesis summarizes the results from five publications. Most of the publications are associated with different methods for imaging and analysis of microscopy. Confocal images with E. coli cells are targeted as the primary area of application. However, potential applications beyond the primary target are also made evident. The findings of the research are confirmed empirically

    Color Image Processing based on Graph Theory

    Full text link
    [ES] La visión artificial es uno de los campos en mayor crecimiento en la actualidad que, junto con otras tecnologías como la Biometría o el Big Data, se ha convertido en el foco de interés de numerosas investigaciones y es considerada como una de las tecnologías del futuro. Este amplio campo abarca diversos métodos entre los que se encuentra el procesamiento y análisis de imágenes digitales. El éxito del análisis de imágenes y otras tareas de procesamiento de alto nivel, como pueden ser el reconocimiento de patrones o la visión 3D, dependerá en gran medida de la buena calidad de las imágenes de partida. Hoy en día existen multitud de factores que dañan las imágenes dificultando la obtención de imágenes de calidad óptima, esto ha convertido el (pre-) procesamiento digital de imágenes en un paso fundamental previo a la aplicación de cualquier otra tarea de procesado. Los factores más comunes son el ruido y las malas condiciones de adquisición: los artefactos provocados por el ruido dificultan la interpretación adecuada de la imagen y la adquisición en condiciones de iluminación o exposición deficientes, como escenas dinámicas, causan pérdida de información de la imagen que puede ser clave para ciertas tareas de procesamiento. Los pasos de (pre-)procesamiento de imágenes conocidos como suavizado y realce se aplican comúnmente para solventar estos problemas: El suavizado tiene por objeto reducir el ruido mientras que el realce se centra en mejorar o recuperar la información imprecisa o dañada. Con estos métodos conseguimos reparar información de los detalles y bordes de la imagen con una nitidez insuficiente o un contenido borroso que impide el (post-)procesamiento óptimo de la imagen. Existen numerosos métodos que suavizan el ruido de una imagen, sin embargo, en muchos casos el proceso de filtrado provoca emborronamiento en los bordes y detalles de la imagen. De igual manera podemos encontrar una enorme cantidad de técnicas de realce que intentan combatir las pérdidas de información, sin embargo, estas técnicas no contemplan la existencia de ruido en la imagen que procesan: ante una imagen ruidosa, cualquier técnica de realce provocará también un aumento del ruido. Aunque la idea intuitiva para solucionar este último caso será el previo filtrado y posterior realce, este enfoque ha demostrado no ser óptimo: el filtrado podrá eliminar información que, a su vez, podría no ser recuperable en el siguiente paso de realce. En la presente tesis doctoral se propone un modelo basado en teoría de grafos para el procesamiento de imágenes en color. En este modelo, se construye un grafo para cada píxel de tal manera que sus propiedades permiten caracterizar y clasificar dicho pixel. Como veremos, el modelo propuesto es robusto y capaz de adaptarse a una gran variedad de aplicaciones. En particular, aplicamos el modelo para crear nuevas soluciones a los dos problemas fundamentales del procesamiento de imágenes: suavizado y realce. Se ha estudiado el modelo en profundidad en función del umbral, parámetro clave que asegura la correcta clasificación de los píxeles de la imagen. Además, también se han estudiado las posibles características y posibilidades del modelo que nos han permitido sacarle el máximo partido en cada una de las posibles aplicaciones. Basado en este modelo se ha diseñado un filtro adaptativo capaz de eliminar ruido gaussiano de una imagen sin difuminar los bordes ni perder información de los detalles. Además, también ha permitido desarrollar un método capaz de realzar los bordes y detalles de una imagen al mismo tiempo que se suaviza el ruido presente en la misma. Esta aplicación simultánea consigue combinar dos operaciones opuestas por definición y superar así los inconvenientes presentados por el enfoque en dos etapas.[CA] La visió artificial és un dels camps en major creixement en l'actualitat que, junt amb altres tecnlogies com la Biometria o el Big Data, s'ha convertit en el focus d'interés de nombroses investigacions i és considerada com una de les tecnologies del futur. Aquest ampli camp comprén diversos m`etodes entre els quals es troba el processament digital d'imatges i anàlisis d'imatges digitals. L'èxit de l'anàlisis d'imatges i altres tasques de processament d'alt nivell, com poden ser el reconeixement de patrons o la visió 3D, dependrà en gran manera de la bona qualitat de les imatges de partida. Avui dia existeixen multitud de factors que danyen les imatges dificultant l'obtenció d'imatges de qualitat òptima, açò ha convertit el (pre-) processament digital d'imatges en un pas fonamental previa la l'aplicació de qualsevol altra tasca de processament. Els factors més comuns són el soroll i les males condicions d'adquisició: els artefactes provocats pel soroll dificulten la inter- pretació adequada de la imatge i l'adquisició en condicions d'il·luminació o exposició deficients, com a escenes dinàmiques, causen pèrdua d'informació de la imatge que pot ser clau per a certes tasques de processament. Els passos de (pre-) processament d'imatges coneguts com suavitzat i realç s'apliquen comunament per a resoldre aquests problemes: El suavitzat té com a objecte reduir el soroll mentres que el real se centra a millorar o recuperar la informació imprecisa o danyada. Amb aquests mètodes aconseguim reparar informació dels detalls i bords de la imatge amb una nitidesa insuficient o un contingut borrós que impedeix el (post-)processament òptim de la imatge. Existeixen nombrosos mètodes que suavitzen el soroll d'una imatge, no obstant això, en molts casos el procés de filtrat provoca emborronamiento en els bords i detalls de la imatge. De la mateixa manera podem trobar una enorme quantitat de tècniques de realç que intenten combatre les pèrdues d'informació, no obstant això, aquestes tècniques no contemplen l'existència de soroll en la imatge que processen: davant d'una image sorollosa, qualsevol tècnica de realç provocarà també un augment del soroll. Encara que la idea intuïtiva per a solucionar aquest últim cas seria el previ filtrat i posterior realç, aquest enfocament ha demostrat no ser òptim: el filtrat podria eliminar informació que, al seu torn, podria no ser recuperable en el seguënt pas de realç. En la present Tesi doctoral es proposa un model basat en teoria de grafs per al processament d'imatges en color. En aquest model, es construïx un graf per a cada píxel de tal manera que les seues propietats permeten caracteritzar i classificar el píxel en quëstió. Com veurem, el model proposat és robust i capaç d'adaptar-se a una gran varietat d'aplicacions. En particular, apliquem el model per a crear noves solucions als dos problemes fonamentals del processament d'imatges: suavitzat i realç. S'ha estudiat el model en profunditat en funció del llindar, paràmetre clau que assegura la correcta classificació dels píxels de la imatge. A més, també s'han estudiat les possibles característiques i possibilitats del model que ens han permés traure-li el màxim partit en cadascuna de les possibles aplicacions. Basat en aquest model s'ha dissenyat un filtre adaptatiu capaç d'eliminar soroll gaussià d'una imatge sense difuminar els bords ni perdre informació dels detalls. A més, també ha permés desenvolupar un mètode capaç de realçar els bords i detalls d'una imatge al mateix temps que se suavitza el soroll present en la mateixa. Aquesta aplicació simultània aconseguix combinar dues operacions oposades per definició i superar així els inconvenients presentats per l'enfocament en dues etapes.[EN] Computer vision is one of the fastest growing fields at present which, along with other technologies such as Biometrics or Big Data, has become the focus of interest of many research projects and it is considered one of the technologies of the future. This broad field includes a plethora of digital image processing and analysis tasks. To guarantee the success of image analysis and other high-level processing tasks as 3D imaging or pattern recognition, it is critical to improve the quality of the raw images acquired. Nowadays all images are affected by different factors that hinder the achievement of optimal image quality, making digital image processing a fundamental step prior to the application of any other practical application. The most common of these factors are noise and poor acquisition conditions: noise artefacts hamper proper image interpretation of the image; and acquisition in poor lighting or exposure conditions, such as dynamic scenes, causes loss of image information that can be key for certain processing tasks. Image (pre-) processing steps known as smoothing and sharpening are commonly applied to overcome these inconveniences: Smoothing is aimed at reducing noise and sharpening at improving or recovering imprecise or damaged information of image details and edges with insufficient sharpness or blurred content that prevents optimal image (post-)processing. There are many methods for smoothing the noise in an image, however in many cases the filtering process causes blurring at the edges and details of the image. Besides, there are also many sharpening techniques, which try to combat the loss of information due to blurring of image texture and need to contemplate the existence of noise in the image they process. When dealing with a noisy image, any sharpening technique may amplify the noise. Although the intuitive idea to solve this last case would be the previous filtering and later sharpening, this approach has proved not to be optimal: the filtering could remove information that, in turn, may not be recoverable in the later sharpening step. In the present PhD dissertation we propose a model based on graph theory for color image processing from a vector approach. In this model, a graph is built for each pixel in such a way that its features allow to characterize and classify the pixel. As we will show, the model we proposed is robust and versatile: potentially able to adapt to a variety of applications. In particular, we apply the model to create new solutions for the two fundamentals problems in image processing: smoothing and sharpening. To approach high performance image smoothing we use the proposed model to determine if a pixel belongs to a at region or not, taking into account the need to achieve a high-precision classification even in the presence of noise. Thus, we build an adaptive soft-switching filter by employing the pixel classification to combine the outputs from a filter with high smoothing capability and a softer one to smooth edge/detail regions. Further, another application of our model allows to use pixels characterization to successfully perform a simultaneous smoothing and sharpening of color images. In this way, we address one of the classical challenges within the image processing field. We compare all the image processing techniques proposed with other state-of-the-art methods to show that they are competitive both from an objective (numerical) and visual evaluation point of view.Pérez Benito, C. (2019). Color Image Processing based on Graph Theory [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/123955TESI
    corecore