
    Automatic Graph Cut Segmentation of Lesions in CT Using Mean Shift Superpixels

    This paper presents a new, automatic method for accurately extracting lesions from CT data. It first computes, at each voxel, a five-dimensional (5D) feature vector containing intensity, shape index, and 3D spatial location. Nonparametric mean shift clustering then forms superpixels from these 5D features, yielding an oversegmentation of the image. Finally, a graph cut algorithm groups the superpixels using a novel energy formulation that incorporates shape, intensity, and spatial features. The mean shift superpixels increase the robustness of the result while reducing computation time. We assume that the lesion is partly spherical, which produces high shape index values in part of the lesion; from these spherical subregions, foreground and background seeds for the graph cut segmentation are obtained automatically. The proposed method has been evaluated on a clinical CT dataset. Visual inspection on different types of lesions (lung nodules and colonic polyps), as well as a quantitative evaluation on 101 solid and 80 GGO nodules, demonstrates the potential of the proposed method. The joint spatial-intensity-shape features provide a powerful cue for successfully segmenting lesions adjacent to structures of similar intensity but different shape, as well as lesions exhibiting the partial volume effect.
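
    As a rough illustration of the superpixel stage, the sketch below clusters per-pixel features with off-the-shelf mean shift on a single 2D slice. It is a simplification of the paper's 5D voxel features (the spatial term here is 2D rather than 3D), and the bandwidth and sigma values are illustrative, not the authors' settings.

```python
import numpy as np
from skimage.feature import shape_index
from sklearn.cluster import MeanShift

def mean_shift_superpixels(slice_hu, bandwidth=0.3):
    """Oversegment one CT slice by clustering per-pixel feature vectors."""
    h, w = slice_hu.shape
    # Curvature-based shape cue; NaNs (where curvature is undefined) become 0.
    si = np.nan_to_num(shape_index(slice_hu, sigma=2.0))
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([slice_hu.ravel(), si.ravel(),
                      xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Whiten each channel so intensity, shape, and position are comparable.
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
    # Mean shift over every pixel is slow; real use would subsample or tile.
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(feats)
    return labels.reshape(h, w)   # one integer label per superpixel
```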

    Surface analysis and fingerprint recognition from multi-light imaging collections

    Multi-light imaging captures a scene from a fixed viewpoint through multiple photographs, each of which is illuminated from a different direction. Every image reveals information about the surface, with the intensity reflected from each point being measured for all lighting directions. The captured images are known as multi-light image collections (MLICs), and a variety of techniques have been developed over recent decades to extract information from them, including shape from shading, photometric stereo, and reflectance transformation imaging (RTI). Because the camera does not move, a pixel coordinate in one image of a MLIC corresponds to exactly the same position on the surface across all images in the collection. In chapter 1 we review the literature relevant to the methods presented in this thesis, describe different types of reflections and surface types, and explain the multi-light imaging process. In chapter 2 we present a novel automated RTI method that requires no calibration equipment (such as the shiny reference spheres or 3D-printed structures other methods require) and that automatically computes the lighting direction and compensates for non-uniform illumination. In chapter 3 we describe our novel MLIC method, termed Remote Extraction of Latent Fingerprints (RELF), which segments each multi-light image into superpixels (small groups of pixels) and uses a neural network classifier to determine whether or not each superpixel contains part of a fingerprint. The RELF algorithm then mosaics the superpixels classified as fingerprint to obtain a complete latent print image, entirely contactlessly. In chapter 4 we detail our work with the Metropolitan Police Service (MPS) UK, whose needs and requirements helped us to create a prototype RELF imaging device; the device is now being tested by MPS officers, who are validating the quality of the latent prints extracted using our technique. In chapter 5 we further develop the technique to extract latent prints from curved surfaces and to automatically correct for surface curvature distortions. We have a patent pending for this method.
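
    A minimal sketch of the RELF classification stage might look like the following; the SLIC superpixels, the two-number feature vector, and the generic sklearn-style classifier are stand-ins for the thesis's segmentation and trained neural network, so treat all names and parameters as illustrative.

```python
import numpy as np
from skimage.segmentation import slic

def fingerprint_mask(mlic_image, classifier):
    """Keep only the superpixels the classifier flags as fingerprint."""
    labels = slic(mlic_image, n_segments=500, compactness=10)
    gray = mlic_image.mean(axis=-1)
    ids = np.unique(labels)
    feats = np.array([[gray[labels == sp].mean(),
                       gray[labels == sp].std()] for sp in ids])
    keep = classifier.predict(feats)          # 1 = contains fingerprint
    # Mosaic: a boolean mask covering all positively classified superpixels.
    return np.isin(labels, ids[keep == 1])
```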

    Depth map compression via 3D region-based representation

    In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometry inconsistencies in the synthesized views. In this paper, a new 3D plane representation of the scene is presented which improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed for generating a color and a depth segmentation. Using both partitions, depth maps are segmented into regions without sharp discontinuities, without having to explicitly signal all depth edges. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows efficient encoding while preserving the 3D characteristics of the scene. The 3D planes also open up the possibility of coding multiview images with a single, unified representation.
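
    The per-region planar model can be illustrated with a least-squares fit: each segmented region's depth is approximated by z = ax + by + c, so only three coefficients per region need be signalled. The sketch below is a simplification (no rate-distortion loop, no codec integration), with illustrative names.

```python
import numpy as np

def fit_region_planes(depth, labels):
    """Fit z = a*x + b*y + c to each region's depth by least squares."""
    planes = {}
    for region in np.unique(labels):
        ys, xs = np.nonzero(labels == region)
        A = np.column_stack([xs, ys, np.ones_like(xs)]).astype(float)
        coeffs, *_ = np.linalg.lstsq(A, depth[ys, xs].astype(float), rcond=None)
        planes[region] = coeffs               # three numbers encode the region
    return planes

def reconstruct_depth(planes, labels):
    """Decode: evaluate each region's plane back onto the pixel grid."""
    out = np.zeros(labels.shape, dtype=float)
    for region, (a, b, c) in planes.items():
        ys, xs = np.nonzero(labels == region)
        out[ys, xs] = a * xs + b * ys + c
    return out
```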

    SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels

    We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant of deep neural networks for irregularly structured and geometric input, e.g., graphs or meshes. Our main contribution is a novel convolution operator based on B-splines that makes the computation time independent of the kernel size, thanks to the local support property of the B-spline basis functions. As a result, we obtain a generalization of the traditional CNN convolution operator that uses continuous kernel functions parametrized by a fixed number of trainable weights. In contrast to related approaches that filter in the spectral domain, the proposed method aggregates features purely in the spatial domain. In addition, SplineCNN allows complete end-to-end training of deep architectures, using only the geometric structure as input instead of handcrafted feature descriptors. For validation, we apply our method to tasks from the fields of image graph classification, shape correspondence, and graph node classification, and show that it outperforms or is on par with state-of-the-art approaches while being significantly faster and having favorable properties such as domain independence. (Presented at CVPR 2018.)
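
    The operator is available in PyTorch Geometric as SplineConv; a toy usage sketch follows, where the four-node graph and the 2D pseudo-coordinates in [0, 1]^2 are made-up example data rather than anything from the paper.

```python
import torch
from torch_geometric.nn import SplineConv

# B-spline kernel convolution: 2D pseudo-coordinates, 5x5 kernel control points.
conv = SplineConv(in_channels=3, out_channels=16, dim=2, kernel_size=5)

x = torch.randn(4, 3)                       # 4 nodes with 3 features each
edge_index = torch.tensor([[0, 1, 2, 3],    # directed edges src -> dst
                           [1, 2, 3, 0]])
edge_attr = torch.rand(4, 2)                # pseudo-coordinates in [0, 1]^2
out = conv(x, edge_index, edge_attr)
print(out.shape)                            # torch.Size([4, 16])
```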

    SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data

    Convolutional Neural Networks have revolutionized vision applications. There are image domains and representations, however, that cannot be handled by standard CNNs (e.g., spherical images, superpixels). Such data are usually processed using networks and algorithms specialized for each type. In this work, we show that it may not always be necessary to use specialized neural networks to operate on such spaces. Instead, we introduce a new structured graph convolution operator that can copy 2D convolution weights, transferring the capabilities of already-trained traditional CNNs to our new graph network. This network can then operate on any data that can be represented as a positional graph. By converting non-rectilinear data to a graph, we can apply these convolutions to irregular image domains without requiring training on large domain-specific datasets. Results of transferring pre-trained image networks for segmentation, stylization, and depth prediction are demonstrated for a variety of such data forms. (Presented at ECCV 2022.)
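
    The weight-copying idea can be sketched as follows: each graph edge carries a "selection" naming which of the nine 3x3 kernel positions it stands for, and the matching slice of a trained Conv2d weight is applied along it. This is a minimal reading of the abstract, not the released SelectionConv code; the class and argument names are invented for illustration.

```python
import torch
import torch.nn as nn

class SelectionGraphConv(nn.Module):
    """Graph conv whose nine weight matrices are copied from a 3x3 Conv2d."""
    def __init__(self, conv2d: nn.Conv2d):
        super().__init__()
        w = conv2d.weight                     # shape (out, in, 3, 3)
        self.weights = nn.ParameterList(
            [nn.Parameter(w[:, :, dy, dx].detach().clone())
             for dy in range(3) for dx in range(3)])
        self.bias = nn.Parameter(conv2d.bias.detach().clone())  # assumes bias=True

    def forward(self, x, edge_index, selection):
        # x: (N, in) node features; edge_index: (2, E) src -> dst edges;
        # selection: (E,) integer in [0, 9), the 3x3 offset each edge encodes.
        src, dst = edge_index
        out = x.new_zeros(x.size(0), self.weights[0].size(0))
        for s, w in enumerate(self.weights):
            m = selection == s
            out.index_add_(0, dst[m], x[src[m]] @ w.t())
        return out + self.bias
```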

    Limbs detection and tracking of head-fixed mice for behavioral phenotyping using motion tubes and deep learning

    The broad accessibility of affordable and reliable recording equipment, and its relative ease of use, has enabled neuroscientists to record large amounts of neurophysiological and behavioral data. Given that most of this raw data is unlabeled, great effort is required to adapt it for behavioral phenotyping or signal extraction (for behavioral and neurophysiological data, respectively). Traditional methods for labeling datasets rely on human annotators, a resource- and time-intensive process that often produces data prone to reproducibility errors. Here, we propose a deep learning-based image segmentation framework to automatically extract and label limb movements from movies capturing frontal and lateral views of head-fixed mice. The method decomposes the image into elemental regions (superpixels) with similar appearance and concordant dynamics and stacks them following their partial temporal trajectory. These 3D descriptors (referred to as motion cues) are used to train a deep convolutional neural network (CNN). We then use the features extracted at the last fully connected layer of the network to train a Long Short-Term Memory (LSTM) network that introduces spatio-temporal coherence to the limb segmentation. We tested the pipeline in two video acquisition settings: in the first, the camera is installed on the right side of the mouse (lateral setting); in the second, the camera faces the mouse directly (frontal setting). We also investigated the effect of the noise present in the videos and the amount of training data needed, and found that reducing the number of training samples does not lower detection accuracy by more than 5%, even when as little as 10% of the available data is used for training.
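
    The CNN-to-LSTM stage can be sketched in PyTorch roughly as below; the layer sizes are placeholders, and the motion-tube extraction is assumed to have already produced a (batch, time, channels, height, width) tensor per tube, so this is an illustrative skeleton rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class MotionTubeClassifier(nn.Module):
    def __init__(self, n_classes=2, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # stand-in for the deep CNN
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim), nn.ReLU())
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)  # e.g., limb vs. background

    def forward(self, tubes):                     # tubes: (B, T, 3, H, W)
        b, t = tubes.shape[:2]
        # Per-frame CNN features, then an LSTM for spatio-temporal coherence.
        feats = self.cnn(tubes.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])              # classify from last time step
```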