
    FreMAE: Fourier Transform Meets Masked Autoencoders for Medical Image Segmentation

    The research community has witnessed the powerful potential of self-supervised Masked Image Modeling (MIM), which enables models to learn visual representations from unlabeled data. In this paper, to incorporate both the crucial global structural information and local details for dense prediction tasks, we shift the perspective to the frequency domain and present a new MIM-based framework named FreMAE for self-supervised pre-training for medical image segmentation. Based on the observations that detailed structural information mainly lies in the high-frequency components while high-level semantics are abundant in the low-frequency counterparts, we further incorporate multi-stage supervision to guide representation learning during the pre-training phase. Extensive experiments on three benchmark datasets show the advantage of our proposed FreMAE over previous state-of-the-art MIM methods. Compared with various baselines trained from scratch, FreMAE consistently brings considerable improvements to model performance. To the best of our knowledge, this is the first attempt at MIM with the Fourier Transform in medical image segmentation.
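
    To make the frequency-domain idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of a reconstruction loss for masked image modeling that weights the low- and high-frequency bands of the 2D Fourier spectrum separately; the band radius and the extra weight on high frequencies are illustrative assumptions.

```python
# Sketch only: frequency-band-weighted reconstruction loss for MIM.
# The cutoff radius and high-frequency weight are assumed values.
import torch
import torch.fft

def frequency_band_loss(pred, target, low_freq_radius=0.1, high_weight=2.0):
    """pred, target: (B, C, H, W) image tensors."""
    # 2D FFT of prediction and target, shifted so the DC term is centered
    pred_f = torch.fft.fftshift(torch.fft.fft2(pred, norm="ortho"), dim=(-2, -1))
    tgt_f = torch.fft.fftshift(torch.fft.fft2(target, norm="ortho"), dim=(-2, -1))

    _, _, H, W = pred.shape
    # Radial distance of each frequency bin from the spectrum center
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    radius = torch.sqrt(xx ** 2 + yy ** 2).to(pred.device)
    low_mask = (radius <= low_freq_radius).float()   # global structure / semantics
    high_mask = 1.0 - low_mask                       # fine details and edges

    err = (pred_f - tgt_f).abs()
    low_loss = (err * low_mask).mean()
    high_loss = (err * high_mask).mean()
    return low_loss + high_weight * high_loss
```

    A loss of this form lets a pre-training objective emphasize fine structural detail carried by high frequencies without discarding the global layout encoded in the low frequencies.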

    SC-VAE: Sparse Coding-based Variational Autoencoder

    Learning rich data representations from unlabeled data is a key challenge for applying deep learning algorithms in downstream supervised tasks. Several variants of variational autoencoders (VAEs) have been proposed to learn compact data representations by encoding high-dimensional data in a lower-dimensional space. Two main classes of VAE methods may be distinguished depending on the characteristics of the meta-priors enforced in the representation learning step. The first class of methods derives a continuous encoding by assuming a static prior distribution in the latent space. The second class of methods instead learns a discrete latent representation using vector quantization (VQ) along with a codebook. However, both classes of methods suffer from certain challenges, which may lead to suboptimal image reconstruction results: the first class suffers from posterior collapse, whereas the second suffers from codebook collapse. To address these challenges, we introduce a new VAE variant, termed SC-VAE (sparse coding-based VAE), which integrates sparse coding within the variational autoencoder framework. Instead of learning a continuous or discrete latent representation, the proposed method learns a sparse data representation that consists of a linear combination of a small number of learned atoms. The sparse coding problem is solved using a learnable version of the iterative shrinkage thresholding algorithm (ISTA). Experiments on two image datasets demonstrate that our model achieves improved image reconstruction results compared to state-of-the-art methods. Moreover, the use of learned sparse code vectors allows us to perform downstream tasks such as coarse image segmentation by clustering image patches. Comment: 15 pages, 11 figures, and 3 tables
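
    The learnable-ISTA component can be pictured with a short LISTA-style encoder sketch; the layer sizes, iteration count, and threshold initialization below are assumptions for illustration, not the paper's settings.

```python
# Hypothetical LISTA-style encoder: the latent code is a sparse
# combination of dictionary atoms, produced by unrolled soft-thresholding.
import torch
import torch.nn as nn

class LISTAEncoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=256, n_iters=5):
        super().__init__()
        self.We = nn.Linear(input_dim, code_dim, bias=False)      # data -> code space
        self.S = nn.Linear(code_dim, code_dim, bias=False)        # recurrent mixing
        self.theta = nn.Parameter(torch.full((code_dim,), 0.1))   # learned thresholds
        self.n_iters = n_iters

    def soft_threshold(self, z):
        # Shrinkage operator: zeroes small coefficients, promoting sparsity
        return torch.sign(z) * torch.relu(z.abs() - self.theta)

    def forward(self, x):
        b = self.We(x)
        z = self.soft_threshold(b)
        for _ in range(self.n_iters):
            z = self.soft_threshold(b + self.S(z))
        return z  # sparse code: only a few active atoms per sample
```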

    Accurate video object tracking using a region-based particle filter

    Usually, in particle filters applied to video tracking, a simple geometrical shape, typically an ellipse, is used to bound the object being tracked. Although this yields a good tracker, it tends to give a poor object representation, as most real-world objects are not simple geometrical shapes. A better way to represent the object is a region-based approach, such as the Region-Based Particle Filter (RBPF). This method exploits a hierarchical region-based representation of images to tackle two problems at the same time: tracking and video object segmentation. By means of the RBPF the object segmentation is resolved with high accuracy, but new problems arise. The object representation is now based on image partitions instead of pixels. This means that the number of possible combinations has decreased, which is computationally advantageous, but an error in the regions chosen for the object representation leads to a higher estimation error than in methods working at the pixel level. On the other hand, if the level of region detail in the partition is high, the estimation of the object becomes very noisy, making it hard to accurately propagate the object segmentation. In this thesis we present new tools for the existing RBPF. These tools focus on increasing the RBPF performance by guiding the particles towards a good solution while maintaining a particle filter approach. The concept of hierarchical flow is presented and exploited, a Bayesian estimation is used to assign to each region a probability of being object or background, and the solution space is reduced in an intelligent way to increase the RBPF robustness while reducing computational effort. Changes to the previously proposed co-clustering in the RBPF approach are also proposed. Finally, we present results on the recently released DAVIS database, which comprises 50 high-definition video sequences representing several challenging situations. Using this dataset, we compare the RBPF with other state-of-the-art methods.
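
    For readers unfamiliar with the underlying machinery, the following is a generic propagate-weight-resample particle-filter step (a sketch only; in the RBPF the state would be a set of regions from a hierarchical image partition rather than a simple parametric shape, and the motion and likelihood models here are placeholders).

```python
# Generic particle-filter iteration: propagate, re-weight, resample.
import numpy as np

def particle_filter_step(particles, weights, motion_model, likelihood, observation):
    """particles: (N, D) array of state hypotheses; weights: (N,) normalized weights."""
    N = len(particles)
    # 1. Propagate every particle through a (noisy) motion model
    particles = np.array([motion_model(p) for p in particles])
    # 2. Re-weight particles by how well they explain the new observation
    weights = weights * np.array([likelihood(p, observation) for p in particles])
    weights = weights / weights.sum()
    # 3. Resample when the effective sample size collapses
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < N / 2:
        idx = np.random.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```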

    Implicit deformable models for biomedical image segmentation.

    In this thesis, new methods for the efficient segmentation of images are presented. The proposed methods are based on the deformable model approach and can be used efficiently for the segmentation of complex geometries from various imaging modalities. A novel deformable model is presented that is based on a geometrically induced external force field which can be conveniently generalized to arbitrary dimensions. This external force field is based on hypothesized interactions between the relative geometries of the deformable model and the object boundary characterized by the image gradient. The evolution of the deformable model is solved using the level set method so that topological changes are handled automatically. The relative geometrical configurations between the deformable model and the object boundaries contribute to a dynamic vector force field that changes as the deformable model evolves. The geometrically induced dynamic interaction force has been shown to greatly improve the deformable model's performance in acquiring complex geometries and highly concave boundaries, and to give the deformable model high invariance to initialization configurations. The voxel interactions across the whole image domain provide a global view of the object boundary representation, giving the external force a long attraction range. The bidirectionality of the external force field allows the new deformable model to deal with arbitrary cross-boundary initializations and facilitates the handling of weak edges and broken boundaries. In addition, it is shown that by enhancing the geometrical interaction field with a nonlocal edge-preserving algorithm, the new deformable model can effectively overcome image noise. A comparative study on the segmentation of various geometries with different topologies from both synthetic and real images is provided, and the proposed method is shown to achieve significant improvements over several existing techniques. A robust framework for the segmentation of vascular geometries is described. In particular, the framework consists of image denoising, optimal object edge representation, and segmentation using an implicit deformable model. The image denoising is based on vessel-enhancing diffusion, which can be used to smooth out image noise and enhance the vessel structures. The image object boundaries are derived using an edge detection technique which produces object edges of single-pixel width. The image edge information is then used to derive the geometric interaction field for optimal object edge representation. The vascular geometries are segmented using an implicit deformable model. A region constraint is added to the deformable model which allows it to easily get around calcified regions and propagate across the vessels to segment the structures efficiently. The presented framework is applied to the accurate segmentation of carotid geometries from medical images. A new segmentation model with a statistical shape prior using a variational approach is also presented in this thesis. The proposed model consists of an image attraction force that propagates contours towards image object boundaries, and a global shape force that attracts the model towards similar shapes in the statistical shape distribution. The image attraction force is derived from gradient vector interactions across the whole image domain, which makes the model more robust to image noise, weak edges and initializations. The statistical shape information is incorporated using kernel density estimation, which allows the shape prior model to handle arbitrary shape variations. It is shown that the proposed model with shape prior can be used to segment object shapes from images efficiently.
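
    The level-set mechanics referred to throughout can be sketched in a few lines (an assumed, simplified update, not the thesis implementation): the contour is the zero level set of a function phi, advanced by an external speed field plus a curvature smoothing term, which is why topological changes are handled automatically.

```python
# Simplified level-set evolution step: dphi/dt = (force + w * curvature) * |grad phi|
import numpy as np

def level_set_step(phi, force, dt=0.1, smooth_weight=0.2):
    """phi: (H, W) level-set function; force: (H, W) external speed field."""
    gy, gx = np.gradient(phi)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    # Curvature of the level sets: div( grad(phi) / |grad(phi)| )
    nyy, _ = np.gradient(gy / grad_mag)
    _, nxx = np.gradient(gx / grad_mag)
    curvature = nxx + nyy
    # Advance phi; the external force attracts the zero set to object edges,
    # the curvature term keeps the contour smooth
    return phi + dt * (force + smooth_weight * curvature) * grad_mag
```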

    Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection

    We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. The method, inspired by the latest state-of-the-art 3D-LaneNet, is a unified framework solving image encoding, spatial transform of features and 3D lane prediction in a single network. However, we propose unique designs for Gen-LaneNet in two respects. First, we introduce a new geometry-guided lane anchor representation in a new coordinate frame and apply a specific geometric transformation to calculate real 3D lane points directly from the network output. We demonstrate that aligning the lane points with the underlying top-view features in the new coordinate frame is critical for a method that generalizes to unfamiliar scenes. Second, we present a scalable two-stage framework that decouples the learning of the image segmentation subnetwork from that of the geometry encoding subnetwork. Compared to 3D-LaneNet, the proposed Gen-LaneNet drastically reduces the amount of 3D lane labels required to achieve a robust solution in real-world applications. Moreover, we release a new synthetic dataset and its construction strategy to encourage the development and evaluation of 3D lane detection methods. In experiments, we conduct an extensive ablation study to substantiate that the proposed Gen-LaneNet significantly outperforms 3D-LaneNet in average precision (AP) and F-score.
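
    The geometric step from a flat virtual top-view representation to metric 3D lane points can be illustrated with an assumed similar-triangles rescaling; the exact equations are those of the paper, while the function below, its name, and the camera-height value are illustrative only.

```python
# Illustrative back-projection: lane points predicted on an assumed flat
# ground plane are rescaled according to the predicted lane height and the
# camera height to recover metric 3D coordinates.
import numpy as np

def virtual_topview_to_3d(x_virtual, y_virtual, z_pred, cam_height=1.55):
    """x_virtual, y_virtual: lane points on the flat virtual ground plane (arrays);
    z_pred: predicted lane height above the ground; cam_height: meters (assumed)."""
    scale = (cam_height - z_pred) / cam_height   # similar-triangles rescaling
    return x_virtual * scale, y_virtual * scale, z_pred
```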

    Medical Image Segmentation Combining Level Set Method and Deep Belief Networks

    Medical image segmentation is an important step in medical image analysis, where the main goal is the precise delineation of organs and tumours from medical images. For instance, there is evidence in the field showing a positive correlation between the precision of these segmentations and the accuracy observed in classification systems that use them as inputs. Over the last decades, a vast number of medical image segmentation models have been introduced; these models can be divided into five main groups: 1) image-based approaches, 2) active contour methods, 3) machine learning techniques, 4) atlas-guided segmentation and registration, and 5) hybrid models. Image-based approaches use only intensity values or texture for segmentation (e.g., thresholding techniques) and usually do not produce precise segmentations. Active contour methods can use an explicit representation (i.e., snakes) with the goal of minimizing an energy function that forces the contour to move towards strong edges while maintaining contour smoothness. The use of an implicit representation in active contour methods (i.e., the level set method) embeds the contour as the zero level set of a higher-dimensional surface (i.e., the curve representing the contour does not need to be parameterized as in the Snakes model). Although successful, the main issue with active contour methods is that the energy function must contain terms describing all possible shape and appearance variations, which is a complicated task given that it is hard to design all these terms by hand. Also, this type of active contour method may get stuck at image regions that do not belong to the object of interest. Machine learning techniques address this issue by automatically learning shape and appearance models from annotated training images. Nevertheless, in order to meet the high accuracy requirements of medical image analysis applications, machine learning methods usually need large and rich training sets and also face the complexity of the inference process. Atlas-guided segmentation and registration use an atlas image, constructed from manually segmented images; a new image is segmented by registering it with the atlas. These techniques have been applied successfully in many applications, but they still face some issues, such as the limited ability to represent the variability of anatomical structure and scale in medical images, and the complexity of the registration algorithms. In this work, we propose a new hybrid segmentation approach that combines a level set method with a machine learning approach (a deep belief network). Our main objective with this approach is to achieve segmentation accuracy results that are comparable to or better than those produced with machine learning methods, but using relatively smaller training sets. These weaker requirements on the size of the training sets are compensated by the hand-designed segmentation terms present in typical level set methods, which are used as prior information on the anatomy to be segmented (e.g., smooth contours, strong edges, etc.). In addition, we choose a machine learning methodology that typically requires smaller annotated training sets than other methods proposed in this field. Specifically, we use deep belief networks, with training sets consisting to a large extent of un-annotated images. In general, our hybrid segmentation approach uses the result produced by the deep belief network as a prior in the level set evolution.
    We validate this method on the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2009 left ventricle segmentation challenge database and on the Japanese Society of Radiological Technology (JSRT) lung segmentation dataset. The experiments show that our approach produces competitive results in the field in terms of segmentation accuracy. More specifically, we show that the use of our proposed methodology in a semi-automated segmentation system (i.e., using a manual initialization) produces the best result in the field on both databases above, and in the case of a fully automated system, our method shows results competitive with the current state of the art. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
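
    A hypothetical sketch of the hybrid idea (not the thesis code): a per-pixel object-probability map produced by a learned model, a deep belief network in this work but any classifier in the sketch below, enters the level-set update as a region force, pushing the contour outward where the prior indicates object and inward where it indicates background; the weighting and update form are assumptions.

```python
# Level-set step driven by a learned object-probability prior.
import numpy as np

def hybrid_level_set_step(phi, prob_map, dt=0.1, prior_weight=1.0):
    """phi: (H, W) level-set function; prob_map: (H, W) learned P(object) in [0, 1]."""
    gy, gx = np.gradient(phi)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    # Region force: positive where the prior says object, negative where background
    region_force = prior_weight * (2.0 * prob_map - 1.0)
    return phi + dt * region_force * grad_mag
```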