
    A brief survey of visual saliency detection


    Slow and steady feature analysis: higher order temporal coherence in video

    How can unlabeled video augment visual learning? Existing methods perform "slow" feature analysis, encouraging the representations of temporally close frames to exhibit only small differences. While this standard approach captures the fact that high-level visual signals change slowly over time, it fails to capture *how* the visual content changes. We propose to generalize slow feature analysis to "steady" feature analysis. The key idea is to impose a prior that higher order derivatives in the learned feature space must be small. To this end, we train a convolutional neural network with a regularizer on tuples of sequential frames from unlabeled video. It encourages feature changes over time to be smooth, i.e., similar to the most recent changes. Using five diverse datasets, including unlabeled YouTube and KITTI videos, we demonstrate our method's impact on object, scene, and action recognition tasks. We further show that our features learned from unlabeled video can even surpass a standard heavily supervised pretraining approach. Comment: in Computer Vision and Pattern Recognition (CVPR) 2016, Las Vegas, NV, June 201
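
The difference between a "slow" and a "steady" prior can be illustrated with a minimal sketch. It assumes plain squared differences on feature vectors given as Python lists. The function names `slow_penalty` and `steady_penalty` are hypothetical, and the abstract does not state the exact form of the regularizer, so this is only an illustration of first-order versus second-order temporal differences.

```python
def slow_penalty(z_a, z_b):
    """Squared distance between the features of two temporally close frames."""
    return sum((a - b) ** 2 for a, b in zip(z_a, z_b))


def steady_penalty(z_a, z_b, z_c):
    """Squared norm of the second-order difference (z_c - z_b) - (z_b - z_a).

    Assumption: a plain squared penalty on a frame triplet; the regularizer
    used in the paper may take a different form.
    """
    return sum(((c - b) - (b - a)) ** 2 for a, b, c in zip(z_a, z_b, z_c))


# Features that move at constant velocity across three frames:
# the first-order change is large, the second-order change is zero.
z1, z2, z3 = [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]
print(slow_penalty(z1, z2))        # first-order change between frames 1 and 2
print(steady_penalty(z1, z2, z3))  # second-order change over the triplet
```

In this toy case the slow penalty is nonzero while the steady penalty vanishes, which is the sense in which a steady prior tolerates change as long as the change itself is smooth.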

    A machine learning approach to the unsupervised segmentation of mitochondria in subcellular electron microscopy data

    Recent advances in cellular and subcellular microscopy have demonstrated its potential for unravelling the mechanisms of various diseases at the molecular level. The biggest challenge in both human- and computer-based visual analysis of micrographs is the variety of nanostructures and mitochondrial morphologies. The state of the art is, however, dominated by supervised manual data annotation, and early attempts to automate the segmentation process were based on supervised machine learning techniques, which require large datasets for training. Given a minimal number of training sequences or none at all, unsupervised machine learning formulations, such as spectral dimensionality reduction, are known to be superior in detecting salient image structures. This thesis presents three major contributions developed around the spectral clustering framework, which is proven to capture perceptual organization features. Firstly, we approach the problem of mitochondria localization. We propose a novel grouping method for the extracted line segments which describes the normal mitochondrial morphology. Experimental findings show that the clusters obtained successfully model the inner mitochondrial membrane folding and can therefore be used as markers for the subsequent segmentation approaches. Secondly, we developed an unsupervised mitochondria segmentation framework. This method follows the evolved ability of human vision to extrapolate salient membrane structures in a micrograph. Furthermore, we designed robust non-parametric similarity models according to the Gestalt laws of visual segregation. Experiments demonstrate that such models automatically adapt to the statistical structure of the biological domain and return optimal performance in pixel classification tasks under a wide variety of distributional assumptions. The last major contribution addresses the computational complexity of spectral clustering. Here, we introduced a new anticorrelation-based spectral clustering formulation with the objective of improving both the speed and the quality of segmentation. The experimental findings showed the applicability of our dimensionality reduction algorithm to very large scale problems as well as to asymmetric, dense and non-Euclidean datasets.

    OSC-CO2: coattention and cosegmentation framework for plant state change with multiple features

    Cosegmentation and coattention are extensions of traditional segmentation methods aimed at detecting a common object (or objects) in a group of images. Current cosegmentation and coattention methods are ineffective for objects, such as plants, that change their morphological state while being captured in different modalities and views. The Object State Change using Coattention-Cosegmentation (OSC-CO2) is an end-to-end unsupervised deep-learning framework that enhances traditional segmentation techniques, processing, analyzing, selecting, and combining suitable segmentation results that may contain most of our target object’s pixels, and then displaying a final segmented image. The framework leverages coattention-based convolutional neural networks (CNNs) and cosegmentation-based dense Conditional Random Fields (CRFs) to address segmentation accuracy in high-dimensional plant imagery with evolving plant objects. The efficacy of OSC-CO2 is demonstrated using plant growth sequences imaged with infrared, visible, and fluorescence cameras in multiple views using a remote sensing, high-throughput phenotyping platform, and is evaluated using the Jaccard index and precision measures. We also introduce CosegPP+, a dataset that is structured and can provide quantitative information on the efficacy of our framework. Results show that OSC-CO2 outperformed state-of-the-art segmentation and cosegmentation methods, improving segmentation accuracy by 3% to 45%.
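
The two evaluation measures named in this abstract can be sketched for binary masks as follows. This is a minimal illustration, assuming masks flattened into lists of 0/1 values. The function names and the toy masks are hypothetical and are not taken from the OSC-CO2 evaluation code.

```python
def jaccard_index(pred, truth):
    """Intersection over union of two binary masks (flattened to lists)."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def precision(pred, truth):
    """Fraction of predicted foreground pixels that are truly foreground."""
    true_pos = sum(1 for p, t in zip(pred, truth) if p and t)
    predicted_pos = sum(1 for p in pred if p)
    return true_pos / predicted_pos if predicted_pos else 0.0


# Toy masks: 1 marks a plant pixel, 0 marks background.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
print(jaccard_index(pred, truth))
print(precision(pred, truth))
```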

    A deep-learning approach to aid in diagnosing Barrett’s oesophagus related dysplasia

    Barrett's oesophagus is the only known precursor to oesophageal carcinoma. Histologically, it is defined as a condition of columnar cells replacing the standard squamous lining. Those altered cells are prone to cytological and architectural abnormalities, known as dysplasia. The dysplastic degree varies from low to high grade and can evolve into invasive carcinoma or adenocarcinoma. Thus, detecting high-grade dysplasia and intramucosal carcinoma during the surveillance of Barrett's oesophagus patients is vital so they can be treated by surgical resection. Unfortunately, the achieved interobserver agreement for grading dysplasia among pathologists is only fair to moderate. Nowadays, grading Barrett's dysplasia is limited to visual examination of glass or virtual slides by pathologists. This work aims to diagnose different grades of dysplasia in Barrett’s oesophagus, particularly high-grade dysplasia, from virtual histopathological slides of oesophagus tissue. In the first approach, virtual slides were analysed at a low magnification (10X) to detect regions of interest and predict the grade of dysplasia. Transfer learning was employed to partially fine-tune two deep-learning networks using healthy and Barrett’s oesophagus tissue. Then, the two networks were connected. The proposed model achieved 0.57 sensitivity, 0.79 specificity and moderate agreement with a pathologist. In contrast, the second approach processed the slides at a higher magnification (40X). It adapted novelty detection and local outlier factor alongside transfer learning to solve the multiple-instance learning problem. It increased the performance of the diagnosis to 0.84 sensitivity and 0.92 specificity, and the interobserver agreement reached a substantial level. Finally, the last approach mimics the pathologists’ procedure to diagnose dysplasia, relying on both magnifications. Thus, their behaviours during the assessment were analysed. This analysis led to a multi-scale approach that detects dysplastic tissue at a low magnification level (10X) and grades dysplasia at a higher level (40X). The proposed computer-aided diagnosis system was built using networks from the first two approaches. It scored 0.90 sensitivity and 0.94 specificity, with substantial agreement with the pathologist and moderate agreement with the other expert.
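
The sensitivity and specificity figures quoted above follow the usual confusion-matrix definitions. The sketch below shows how such values are computed from binary labels. The function name and the toy labels are hypothetical and are unrelated to the thesis data.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for binary labels (1 = dysplastic)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sensitivity, specificity


# Toy example: 4 dysplastic and 6 non-dysplastic cases.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(predicted, actual)
print(sens, spec)
```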

    Content-based Information Retrieval via Nearest Neighbor Search

    Content-based information retrieval (CBIR) has attracted significant interest in the past few years. When given a search query, the search engine compares the query with all the stored information in the database through nearest neighbor search and returns the most similar items. We contribute the following to CBIR research: firstly, Distance Metric Learning (DML) is studied to improve the retrieval accuracy of nearest neighbor search. Additionally, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On one hand, a new local metric learning framework is proposed: Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information such as the data's similarity and location. A regularization term to suppress noise and avoid over-fitting is also incorporated into the formulation. Based on the different methods to infer the weights for the local metric, we considered two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, and Efficient Reduced-Rank Local Metric Learning (E-R2LML), which employs a simpler and faster approximate method. In addition, we study the convergence property of the proposed block coordinate descent algorithms for both our frameworks. The extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which can be used in supervised, semi-supervised and unsupervised learning scenarios, was proposed in the dissertation. By considering several codewords which can be learned from the data, the proposed method naturally leads to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods. Additionally, in order to cope with large data sets, we also conducted experiments on big data using a parallel computing software package, namely LIBSKYLARK.
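
A conical combination of Mahalanobis metrics, as mentioned in this abstract, is a weighted sum of squared Mahalanobis distances with nonnegative weights. The sketch below illustrates that idea only. The function names, the fixed weights and the two example matrices are assumptions made for illustration, whereas R2LML itself learns both the metrics and the weights from data.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def conical_metric_sq(x, y, metrics, weights):
    """Weighted sum of squared Mahalanobis distances with weights >= 0."""
    if any(w < 0 for w in weights):
        raise ValueError("a conical combination requires nonnegative weights")
    return sum(w * mahalanobis_sq(x, y, M) for w, M in zip(weights, metrics))


# Two hypothetical local metrics: the identity and a rank-1 matrix.
M1 = [[1.0, 0.0], [0.0, 1.0]]
M2 = [[2.0, 0.0], [0.0, 0.0]]
x, y = [1.0, 2.0], [0.0, 0.0]
dist_sq = conical_metric_sq(x, y, [M1, M2], [0.5, 0.25])
print(dist_sq)
```

Because the weights are nonnegative and each matrix is positive semidefinite, the combined matrix is again positive semidefinite, so the result is a valid squared (pseudo-)metric.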

    Tensor-cut: A tensor-based graph-cut blood vessel segmentation method and its application to renal artery segmentation

    Blood vessel segmentation plays a fundamental role in many computer-aided diagnosis (CAD) systems, such as coronary artery stenosis quantification, cerebral aneurysm quantification, and retinal vascular tree analysis. Fine blood vessel segmentation can help build a more accurate computer-aided diagnosis system and help physicians gain a better understanding of vascular structures. The purpose of this article is to develop a blood vessel segmentation method that can improve segmentation accuracy in tiny blood vessels. In this work, we propose a tensor-based graph-cut method for blood vessel segmentation. With our method, each voxel can be modeled by a second-order tensor, allowing the capture of both intensity information and geometric information to build a more accurate model for blood vessel segmentation. We compared our proposed method’s accuracy to several state-of-the-art blood vessel segmentation algorithms and performed experiments on both simulated and clinical CT datasets. Both experiments showed that our method achieved better results than the competing state-of-the-art techniques. The mean centerline overlap ratio of our proposed method is 84% on clinical CT data. Our proposed blood vessel segmentation method outperformed other state-of-the-art methods by 10% on clinical CT data. Tiny blood vessels in clinical CT data with a 1-mm radius can be extracted using the proposed technique. The experiments on a clinical dataset showed that the proposed method significantly improved the segmentation accuracy in tiny blood vessels.

    A practical vision system for the detection of moving objects

    The main goal of this thesis is to review and offer robust and efficient algorithms for the detection (or the segmentation) of foreground objects in indoor and outdoor scenes using colour image sequences captured by a stationary camera. For this purpose, the block diagram of a simple vision system is offered in Chapter 2. First, this block diagram gives the idea of a precise order of blocks and their tasks, which should be performed to detect moving foreground objects. Second, a check mark in the top right corner of a block indicates that this thesis contains a review of the most recent algorithms and/or some relevant research about it. In many computer vision applications, segmenting and extracting moving objects in video sequences is an essential task. Background subtraction has been widely used for this purpose as the first step. In this work, a review of the efficiency of a number of important background subtraction and modelling algorithms, along with their major features, is presented. In addition, two background approaches are offered. The first approach is a pixel-based technique whereas the second one works at object level. For each approach, three algorithms are presented. They are called Selective Update Using Non-Foreground Pixels of the Input Image, Selective Update Using Temporal Averaging and Selective Update Using Temporal Median, respectively, in this thesis. The first approach has some deficiencies, which make it incapable of producing a correct dynamic background. The three methods of the second approach use an invariant colour filter and a suitable motion tracking technique, which selectively exclude foreground objects (or blobs) from the background frames. The difference between the three algorithms of the second approach is in the updating process of the background pixels. It is shown that the Selective Update Using Temporal Median method produces the correct background image for each input frame. Representing foreground regions using their boundaries is also an important task. Thus, an appropriate RLE contour tracing algorithm has been implemented for this purpose. However, after the thresholding process, the boundaries of foreground regions often have jagged appearances. Thus, foreground regions may not be recognised reliably due to their corrupted boundaries. A very efficient boundary smoothing method based on the RLE data is proposed in Chapter 7. It smooths only the external and internal boundaries of foreground objects and does not distort the silhouettes of foreground objects. As a result, it is very fast and does not blur the image. Finally, the goal of this thesis has been to present simple, practical and efficient algorithms with few constraints which can run in real time.
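
The idea behind a selective update with a temporal median can be sketched as a per-pixel median over recent frames that skips samples flagged as foreground. This is a simplified, hypothetical illustration on flattened grayscale frames. The thesis works with colour images and obtains the foreground blobs from a colour filter and motion tracking, neither of which is modelled here.

```python
from statistics import median


def temporal_median_background(frames, foreground_masks):
    """Per-pixel median over frames, ignoring samples marked as foreground.

    frames: list of flattened grayscale frames (lists of intensities).
    foreground_masks: same shape, 1 where a pixel belongs to a foreground blob.
    """
    background = []
    for i in range(len(frames[0])):
        samples = [f[i] for f, m in zip(frames, foreground_masks) if not m[i]]
        if not samples:
            # Assumption: fall back to all samples if a pixel is always covered.
            samples = [f[i] for f in frames]
        background.append(median(samples))
    return background


# Three frames of three pixels; a bright object covers pixel 1 in frame 2.
frames = [[10, 20, 30], [12, 200, 30], [11, 22, 31]]
masks = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(temporal_median_background(frames, masks))
```

Excluding the masked sample keeps the bright object from leaking into the background estimate for pixel 1.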

    Image Partitioning based on Semidefinite Programming

    Many tasks in computer vision lead to combinatorial optimization problems. Automatic image partitioning is one of the most important examples in this context: whether based on some prior knowledge or completely unsupervised, we wish to find coherent parts of the image. However, the inherent combinatorial complexity of such problems often prevents finding the global optimum in polynomial time. For this reason, various approaches have been proposed to find good approximate solutions for image partitioning problems. As an important example, we will first consider different spectral relaxation techniques: based on straightforward eigenvector calculations, these methods compute suboptimal solutions in a short time. However, the main contribution of this thesis is to introduce a novel optimization technique for discrete image partitioning problems which is based on a semidefinite programming relaxation. In contrast to approximation methods employing annealing algorithms, this approach involves solving a convex optimization problem, which does not suffer from possible local minima. Using interior point techniques, the solution of the relaxation can be found in polynomial time, and without elaborate parameter tuning. High quality solutions to the original combinatorial problem are then obtained with a randomized rounding technique. The only potential drawback of the semidefinite relaxation approach is that the number of variables of the optimization problem is squared. Nevertheless, it can still be applied to problems with up to a few thousand variables, as is demonstrated for various computer vision tasks including unsupervised segmentation, perceptual grouping and image restoration. Concerning problems of higher dimensionality, we study two different approaches to effectively reduce the number of variables. The first one is based on probabilistic sampling: by considering only a small random fraction of the pixels in the image, our semidefinite relaxation method can be applied in an efficient way while maintaining a reliable quality of the resulting segmentations. The second approach reduces the problem size by computing an over-segmentation of the image in a preprocessing step. After that, the image is partitioned based on the resulting "superpixels" instead of the original pixels. Since the real world does not consist of pixels, it can even be argued that this is the more natural image representation. Initially, our semidefinite relaxation method is defined only for binary partitioning problems. To derive image segmentations into multiple parts, one possibility is to apply the binary approach in a hierarchical way. Besides this natural extension, we also discuss how multiclass partitioning problems can be solved in a direct way based on semidefinite relaxation techniques.
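
The randomized rounding step mentioned above can be sketched with random hyperplanes. The sketch assumes the relaxed solution has already been factorized into one vector per variable (for example by a Cholesky factorization). It draws random Gaussian directions, assigns each variable the sign of its projection, and keeps the best labelling under a quadratic cost. The function name, the cost matrix and the sample count are hypothetical, and the thesis may use a different rounding variant.

```python
import random


def randomized_rounding(vectors, cost, n_samples=100, seed=0):
    """Round relaxed vectors to labels in {-1, +1} via random hyperplanes.

    vectors: one vector per variable, assumed to come from factorizing the
             semidefinite solution X = V V^T.
    cost: symmetric matrix Q; the labelling x minimizing x^T Q x is kept.
    """
    rng = random.Random(seed)
    n, dim = len(vectors), len(vectors[0])
    best_x, best_val = None, float("inf")
    for _ in range(n_samples):
        r = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random direction
        x = [1 if sum(vk * rk for vk, rk in zip(v, r)) >= 0 else -1
             for v in vectors]
        val = sum(cost[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Toy problem: variables 0 and 1 should agree, variable 2 should differ.
vectors = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
cost = [[0, -1, 1], [-1, 0, 1], [1, 1, 0]]
labels, value = randomized_rounding(vectors, cost)
print(labels, value)
```

Variables whose relaxed vectors point in similar directions are likely to fall on the same side of a random hyperplane, which is why the rounded labelling tends to preserve the structure of the relaxed solution.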