10 research outputs found

    Saliency-guided Adaptive Seeding for Supervoxel Segmentation

    We propose a new saliency-guided method for generating supervoxels in 3D space. Rather than using an evenly distributed spatial seeding procedure, our method uses visual saliency to guide the process of supervoxel generation. This results in densely distributed, small, and precise supervoxels in salient regions, which often contain objects, and larger supervoxels in less salient regions, which often correspond to background. Our approach substantially improves the quality of the resulting supervoxel segmentation in terms of boundary recall and under-segmentation error on publicly available benchmarks. Comment: 6 pages, accepted to IROS201
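
    As a rough illustration of the seeding idea (a sketch only, not the authors' exact procedure), the Python snippet below places more seeds in blocks whose mean saliency is high, so that salient regions end up covered by many small supervoxels; the block size, seed counts and random placement are illustrative assumptions.

        import numpy as np

        def adaptive_seeds(saliency, block=16, min_seeds=1, max_seeds=8):
            """saliency: 3D array normalized to [0, 1]; returns (z, y, x) seed coordinates."""
            seeds = []
            rng = np.random.default_rng(0)
            zs, ys, xs = saliency.shape
            for z0 in range(0, zs, block):
                for y0 in range(0, ys, block):
                    for x0 in range(0, xs, block):
                        patch = saliency[z0:z0 + block, y0:y0 + block, x0:x0 + block]
                        # More seeds (hence smaller supervoxels) where saliency is high.
                        n = int(round(min_seeds + patch.mean() * (max_seeds - min_seeds)))
                        for _ in range(n):
                            dz, dy, dx = (rng.integers(0, s) for s in patch.shape)
                            seeds.append((z0 + dz, y0 + dy, x0 + dx))
            return np.array(seeds)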

    Unsupervised brain anomaly detection in MR images

    Brain disorders are characterized by morphological deformations in the shape and size of (sub)cortical structures in one or both hemispheres. These deformations cause deviations from the normal pattern of brain asymmetries, resulting in asymmetric lesions that directly affect the patient’s condition. Unsupervised methods aim to learn a model from unlabeled healthy images, so that an unseen image that violates the priors of this model, i.e., an outlier, is considered an anomaly. Consequently, they are generic in detecting any lesions, e.g., arising from multiple diseases, as long as these differ notably from the healthy training images. This thesis addresses the development of solutions that leverage unsupervised machine learning for the detection and analysis of abnormal brain asymmetries related to anomalies in magnetic resonance (MR) images. First, we propose an automatic probabilistic-atlas-based approach for anomalous brain image segmentation. Second, we explore an automatic method for the detection of abnormal hippocampi from abnormal asymmetries based on deep generative networks and a one-class classifier. Third, we present a more generic framework to detect abnormal asymmetries in the entire brain hemispheres. Our approach extracts pairs of symmetric regions, called supervoxels, in both hemispheres of a test image under study. One-class classifiers then analyze the asymmetries present in each pair. Experimental results on 3D MR-T1 images from healthy subjects and patients with a variety of lesions show the effectiveness and robustness of the proposed unsupervised approaches for brain anomaly detection.
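
    A minimal sketch of the pair-wise asymmetry screening idea, assuming intensities normalized to [0, 1], histogram-difference features and a one-class SVM; the feature and classifier choices here are illustrative, not necessarily those of the thesis.

        import numpy as np
        from sklearn.svm import OneClassSVM

        def asymmetry_feature(left_patch, right_patch, bins=32):
            """Absolute difference of the normalized intensity histograms of a symmetric pair."""
            hl, _ = np.histogram(left_patch, bins=bins, range=(0, 1), density=True)
            hr, _ = np.histogram(right_patch, bins=bins, range=(0, 1), density=True)
            return np.abs(hl - hr)

        def fit_detector(healthy_pairs):
            """healthy_pairs: list of (left_patch, right_patch) taken from healthy training images."""
            X = np.stack([asymmetry_feature(l, r) for l, r in healthy_pairs])
            return OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(X)

        def is_abnormal(detector, left_patch, right_patch):
            # -1 means the asymmetry of this pair is an outlier w.r.t. the healthy data.
            return detector.predict(asymmetry_feature(left_patch, right_patch)[None])[0] == -1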

    Learning to segment in images and videos with different forms of supervision

    Much progress has been made in image and video segmentation over recent years. To a large extent, the success can be attributed to strong appearance models learned entirely from data, in particular using deep learning methods. However, to perform at their best, these methods require large, representative datasets for training with expensive pixel-level annotations, which in the case of videos are prohibitively expensive to obtain. Therefore, there is a need to relax this constraint and to consider alternative forms of supervision, which are easier and cheaper to collect. In this thesis, we aim to develop algorithms for learning to segment in images and videos with different levels of supervision. First, we develop approaches for training convolutional networks with weaker forms of supervision, such as bounding boxes or image labels, for object boundary estimation and semantic/instance labelling tasks. We propose to generate pixel-level approximate groundtruth from these weaker forms of annotation to train a network, which makes it possible to achieve high-quality results comparable to full supervision without any modifications of the network architecture or the training procedure. Second, we address the problem of the excessive computational and memory costs inherent in solving video segmentation via graphs. We propose approaches to improve the runtime and memory efficiency as well as the output segmentation quality by learning the best representation of the graph from the available training data. In particular, we contribute methods for learning must-link constraints, the topology and edge weights of the graph, as well as for enhancing the graph nodes, the superpixels, themselves. Third, we tackle the task of pixel-level object tracking and address the problem of the limited amount of densely annotated video data for training convolutional networks. We introduce an architecture which allows training with static images only and propose an elaborate data synthesis scheme which creates a large number of training examples close to the target domain from the given first-frame mask. With the proposed techniques we show that densely annotated consecutive video data is not necessary to achieve high-quality, temporally coherent video segmentation results. In summary, this thesis advances the state of the art in weakly supervised image segmentation, graph-based video segmentation and pixel-level object tracking, and contributes new ways of training convolutional networks with a limited amount of pixel-level annotated training data.
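
    One common way to turn bounding-box annotations into approximate pixel-level groundtruth is to run GrabCut inside each box; the OpenCV sketch below illustrates that idea under this assumption, without claiming it is the exact recipe used in the thesis.

        import cv2
        import numpy as np

        def boxes_to_masks(image_bgr, boxes, iters=5):
            """image_bgr: 8-bit colour image; boxes: list of (x, y, w, h); returns one binary mask per box."""
            masks = []
            for (x, y, w, h) in boxes:
                mask = np.zeros(image_bgr.shape[:2], np.uint8)
                bgd = np.zeros((1, 65), np.float64)
                fgd = np.zeros((1, 65), np.float64)
                cv2.grabCut(image_bgr, mask, (x, y, w, h), bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
                # Pixels marked as (probable) foreground become the approximate label.
                masks.append(np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8))
            return masks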

    Interactive Segmentation, Uncertainty and Learning

    Interactive segmentation is an important paradigm in image processing. To minimize the number of user interactions (“seeds”) required until the result is correct, the computer should actively query the human for input at the most critical locations, in analogy to active learning. These locations are found by means of suitable uncertainty measures. I propose various such measures for the watershed cut algorithm, along with a theoretical analysis of some of their properties, in Chapter 2. Furthermore, real-world images often admit many different segmentations that have nearly the same quality according to the underlying energy function. The diversity of these solutions may be a powerful uncertainty indicator. In Chapter 3, the crucial prerequisite in the context of seeded segmentation with minimum spanning trees (i.e., edge-weighted watersheds) is provided. Specifically, it is shown how to efficiently enumerate the k smallest spanning trees that result in different segmentations. Furthermore, I propose a scheme that makes it possible to partition an image into a previously unknown number of segments, using only minimal supervision in terms of a few must-link and cannot-link annotations. The algorithm presented in Chapter 4 makes no use of regional data terms, learning instead what constitutes a likely boundary between segments. Since boundaries are only implicitly specified through cannot-link constraints, this is a hard and nonconvex latent variable problem. It is addressed in a greedy fashion using a randomized decision tree on features associated with interpixel edges. I propose to use a structured purity criterion during tree construction and also show how a backtracking strategy can prevent the greedy search from ending up in poor local optima. The problem of learning a boundary classifier from sparse user annotations is also considered in Chapter 5. Here the problem is mapped to a multiple instance learning task where positive bags consist of paths on a graph that cross a segmentation boundary and negative bags consist of paths inside a user scribble. Multiple instance learning is also the topic of Chapter 6, where I propose a multiple instance learning algorithm based on randomized decision trees. Experiments on typical benchmark data sets show that this model’s prediction performance is clearly better than that of earlier tree-based methods and only slightly below that of more expensive methods. Finally, a flow-graph-based computation library is discussed in Chapter 7. The presented library is used as the backend of an interactive learning and segmentation toolkit and supports a rich set of notification mechanisms for interaction with a graphical user interface.
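
    The seeded segmentation with minimum spanning trees mentioned above can be sketched as a Kruskal-style minimum spanning forest that never merges regions containing different seeds; the snippet below is a minimal illustration of that principle, not the thesis implementation.

        def seeded_msf(n_nodes, edges, seeds):
            """edges: iterable of (weight, u, v); seeds: dict node -> label; returns a label per node."""
            parent = list(range(n_nodes))

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x

            label = dict(seeds)                        # root -> seed label
            for w, u, v in sorted(edges):              # process edges by increasing weight
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                lu, lv = label.get(ru), label.get(rv)
                if lu is not None and lv is not None and lu != lv:
                    continue                           # keep differently seeded regions apart
                parent[ru] = rv                        # merge; rv becomes the root
                if lv is None:
                    label[rv] = lu                     # propagate the seed label (possibly None)
            return [label.get(find(i)) for i in range(n_nodes)]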

    Machine Learning for Instance Segmentation

    Volumetric Electron Microscopy images can be used for connectomics, the study of brain connectivity at the cellular level. A prerequisite for this inquiry is the automatic identification of neural cells, which requires machine learning algorithms and in particular efficient image segmentation algorithms. In this thesis, we develop new algorithms for this task. In the first part we provide, for the first time in this field, a method for training a neural network to predict optimal input data for a watershed algorithm. We demonstrate its superior performance compared to other segmentation methods of its category. In the second part, we develop an efficient watershed-based algorithm for weighted graph partitioning, the Mutex Watershed, which for the first time makes use of negative edge weights. We show that it is intimately related to the multicut problem and achieves cutting-edge performance on a connectomics challenge. Our algorithm is currently used by the leaders of two connectomics challenges. Finally, motivated by inpainting neural networks, we create a method to learn the graph weights without any supervision.
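
    A simplified sketch of the Mutex Watershed clustering rule (the published algorithm uses more efficient mutex bookkeeping than this naive version): edges are visited by decreasing weight, attractive edges merge clusters unless a mutex forbids it, and repulsive edges install a mutex between two clusters unless they are already merged.

        def mutex_watershed(n_nodes, edges):
            """edges: iterable of (weight, u, v, is_attractive) with nonnegative weights."""
            parent = list(range(n_nodes))
            mutex = {i: set() for i in range(n_nodes)}   # root -> roots it must never join

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x

            for w, u, v, attractive in sorted(edges, key=lambda e: -e[0]):
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                if attractive:
                    if rv in mutex[ru]:
                        continue                         # a stronger repulsive edge already separates them
                    parent[ru] = rv                      # merge; rv becomes the root
                    mutex[rv] |= mutex[ru]
                    for m in mutex[ru]:                  # redirect back-references to the new root
                        mutex[m].discard(ru)
                        mutex[m].add(rv)
                else:
                    mutex[ru].add(rv)
                    mutex[rv].add(ru)
            return [find(i) for i in range(n_nodes)]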

    Learning Instance Segmentation from Sparse Supervision

    Instance segmentation is an important task in many domains of automatic image processing, such as self-driving cars, robotics and microscopy data analysis. Recently, deep learning-based algorithms have brought image segmentation close to human performance. However, most existing models rely on dense groundtruth labels for training, which are expensive, time-consuming and often require experienced annotators to perform the labeling. Besides the annotation burden, training complex high-capacity neural networks depends upon non-trivial expertise in the choice and tuning of hyperparameters, making the adoption of these models challenging for researchers in other fields. The aim of this work is twofold. The first is to make deep learning segmentation methods accessible to non-specialists. The second is to address the dense annotation problem by developing instance segmentation methods trainable with limited groundtruth data. In the first part of this thesis, I bring state-of-the-art instance segmentation methods closer to non-experts by developing PlantSeg: a pipeline for volumetric segmentation of light microscopy images of biological tissues into cells. PlantSeg comes with a large repository of pre-trained models and delivers highly accurate results on a variety of samples and image modalities. We exemplify its usefulness for answering biological questions in several collaborative research projects. In the second part, I tackle the dense annotation bottleneck by introducing SPOCO, an instance segmentation method that can be trained from just a few annotated objects. It demonstrates strong segmentation performance on challenging natural and biological benchmark datasets at a greatly reduced manual annotation cost and delivers state-of-the-art results on the CVPPP benchmark. In summary, my contributions enable training of instance segmentation models with limited amounts of labeled data and make these methods more accessible to non-experts, speeding up the process of quantitative data analysis.
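
    A minimal sketch of the boundary-prediction-plus-watershed idea behind PlantSeg-style pipelines; the library calls and parameters below are illustrative assumptions, not the actual PlantSeg code.

        import numpy as np
        from scipy import ndimage as ndi
        from skimage.feature import peak_local_max
        from skimage.segmentation import watershed

        def cells_from_boundaries(boundary_prob, threshold=0.5, min_distance=5):
            """boundary_prob: 3D boundary probability map in [0, 1], e.g. from a 3D U-Net."""
            interior = boundary_prob < threshold                  # likely cell interior
            distance = ndi.distance_transform_edt(interior)
            # One seed per presumed cell: local maxima of the distance transform.
            coords = peak_local_max(distance, min_distance=min_distance, threshold_abs=1.0)
            markers = np.zeros(boundary_prob.shape, dtype=int)
            markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
            # Grow the seeds uphill in the boundary map, confined to the interior mask.
            return watershed(boundary_prob, markers, mask=interior)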

    Semantic models of scenes and objects for service and industrial robotics

    What may seem straightforward for the human perception system is still challenging for robots. Automatically segmenting the elements with the highest relevance or salience, i.e. the semantics, is non-trivial given the high level of variability in the world and the limits of vision sensors. This is especially challenging when multiple ambiguous sources of information are available, which is the case when dealing with moving robots. This thesis leverages the availability of contextual cues and multiple points of view to make the segmentation task easier. Four robotic applications will be presented, two designed for service robotics and two for an industrial context. Semantic models of indoor environments will be built, enriching geometric reconstructions with semantic information about objects, structural elements and humans. Our approach leverages the importance of context and the availability of multiple sources of information as well as multiple viewpoints, showing with extensive experiments on several datasets that these are all crucial elements to boost state-of-the-art performance. Furthermore, moving to applications in which robots analyze object surfaces instead of their surroundings, semantic models of Carbon Fiber Reinforced Polymers will be built, augmenting geometric models with accurate measurements of surface fiber orientations and inner defects invisible to the human eye. We succeeded in reaching industrial-grade accuracy, making these models useful for autonomous quality inspection and process optimization. In all applications, special attention will be paid to fast methods suitable for real robots, like the two prototypes presented in this thesis.