191 research outputs found

    Weakly Supervised Intracranial Hemorrhage Segmentation using Head-Wise Gradient-Infused Self-Attention Maps from a Swin Transformer in Categorical Learning

    Intracranial hemorrhage (ICH) is a life-threatening medical emergency caused by various factors. Timely and precise diagnosis of ICH is crucial for administering effective treatment and improving patient survival rates. While deep learning techniques have emerged as the leading approach for medical image analysis and processing, the most commonly employed supervised learning often requires large, high-quality annotated datasets that can be costly to obtain, particularly for pixel/voxel-wise image segmentation. To address this challenge and facilitate ICH treatment decisions, we propose a novel weakly supervised ICH segmentation method that leverages a hierarchical combination of head-wise gradient-infused self-attention maps obtained from a Swin transformer. The transformer is trained using an ICH classification task with categorical labels. To build and validate the proposed technique, we used two publicly available clinical CT datasets, namely RSNA 2019 Brain CT hemorrhage and PhysioNet. Additionally, we conducted an exploratory study comparing two learning strategies, binary classification and full ICH subtyping, to assess their impact on self-attention and our weakly supervised ICH segmentation framework. The proposed algorithm was compared against the popular U-Net with full supervision, as well as a similar weakly supervised approach using Grad-CAM for ICH segmentation. With a mean Dice score of 0.47, our technique achieved ICH segmentation performance comparable to that of the U-Net and outperformed the Grad-CAM based approach, demonstrating the excellent potential of the proposed framework in challenging medical image segmentation tasks.
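
    For readers unfamiliar with the metric quoted in this abstract, the Dice score measures the overlap between a predicted mask A and a reference mask B as 2|A∩B| / (|A| + |B|). The following is a minimal sketch in plain Python; the function name `dice_score`, the toy masks, and the empty-mask convention are illustrative assumptions, not the authors' evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 3 predicted pixels, 3 reference pixels, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)
```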

    Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks

    Semantic labeling (or pixel-level land-cover classification) in ultra-high resolution imagery (< 10cm) requires statistical models able to learn high level concepts from spatial data, with large appearance variations. Convolutional Neural Networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This results in many advantages, including i) state-of-the-art numerical accuracy, ii) improved geometric accuracy of predictions and iii) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam sub-decimeter resolution datasets, involving semantic labeling of aerial images of 9cm and 5cm resolution, respectively. These datasets are composed of many large and fully annotated tiles, allowing an unbiased evaluation of models making use of spatial information. We do so by comparing two standard CNN architectures to the proposed one: standard patch classification, prediction of local label patches by employing only convolutions and full patch labeling by employing deconvolutions. All the systems compare favorably with or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, also showing a very appealing inference time. Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
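
    The downsample-then-upsample idea can be illustrated by tracking spatial sizes only. The sketch below uses the standard output-size formulas for a strided convolution and a transposed convolution ("deconvolution"); the kernel size, stride, padding, number of layers, and the 64-pixel patch are illustrative assumptions and not the configuration used in the paper.

```python
def conv_out(n, k, s, p):
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k, s, p):
    """Output size of a transposed convolution (no output padding): (n - 1) * s - 2p + k."""
    return (n - 1) * s - 2 * p + k


size = 64          # hypothetical input patch side, in pixels
trace = [size]

# Downsampling path: three stride-2 convolutions halve the resolution each time.
for _ in range(3):
    size = conv_out(size, k=4, s=2, p=1)
    trace.append(size)

# Upsampling path: three stride-2 transposed convolutions restore it.
for _ in range(3):
    size = deconv_out(size, k=4, s=2, p=1)
    trace.append(size)

print(trace)
```

    With these assumed settings the final size equals the input size, which is what allows a label to be predicted for every pixel of the patch.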

    A top-down methodology to depth map estimation controlled by morphological segmentation

    Given a pair of stereo images and the transformation existing between the corresponding camera coordinate systems, the depth of a scene point can be computed from its projections on both images. Despite the difficulties related to the matching of such projections across homogeneous regions and the occlusion phenomenon, state-of-the-art methods have already produced accurate results on classical stereo datasets. This article proposes a new way of approaching depth estimation. Instead of searching for dense pixel correspondences, a gross estimation of the disparities is initially performed at the region level, resulting in a regional disparity map which highlights the principal depth layers of the image. The disparity map is then systematically refined by considering finer partitions of the image. To this end, the watershed of the image colour gradient is selected in order to compute the image partitions alongside a meaningful hierarchy. We show that the ability to be driven by labelled markers enables the watershed algorithm to generate a co-segmentation of both stereo images given the regional disparities, which constitutes the main contribution of this paper. This co-segmentation allows one to reliably compute the disparities of pixels along the region contours. Finally, the contour disparities are transferred to the concerned regions after a careful analysis of their occlusion state with respect to each adjacent region. We show that the proposed method, though approximate, yields regional disparity maps which are close enough to the ground truth to support the desired refinements. We also discuss the prospects of this methodology for challenging stereo imagery, i.e. imagery that is affected by noise or that contains a considerable number of homogeneous regions.
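
    As a reminder of how disparity relates to depth, the sketch below assumes the simplest case of a rectified stereo pair, where depth is Z = f · B / d for focal length f (pixels), baseline B (metres), and disparity d (pixels). The region labels, disparities, and camera parameters are hypothetical values chosen for illustration; the article's own region-level estimation and watershed refinement are not reproduced here.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical regional disparity map: one disparity value per region label.
regional_disparity = {"foreground": 70.0, "midground": 35.0, "background": 7.0}

focal_px = 700.0    # assumed focal length in pixels
baseline_m = 0.5    # assumed camera baseline in metres

regional_depth = {
    region: depth_from_disparity(d, focal_px, baseline_m)
    for region, d in regional_disparity.items()
}
```

    Larger disparities map to smaller depths, which is why a coarse regional disparity map already separates the principal depth layers.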

    Salient object detection via reciprocal function filter

    Learning to Generate and Refine Object Proposals

    Visual object recognition is a fundamental and challenging problem in computer vision. To build a practical recognition system, one is first confronted with high computational complexity due to the enormous search space of an image, which is caused by large variations in object appearance, pose and mutual occlusion, as well as other environmental factors. To reduce the search complexity, a moderate set of image regions that are likely to contain an object, regardless of its category, is usually first generated in modern object recognition subsystems. These possible object regions are called object proposals, object hypotheses or object candidates, and can be used for downstream classification or global reasoning in many different vision tasks such as object detection, segmentation and tracking. This thesis addresses the problem of object proposal generation, including bounding box and segment proposal generation, in real-world scenarios. In particular, we investigate representation learning in object proposal generation with 3D cues and contextual information, aiming to propose higher-quality object candidates which have higher object recall and better boundary coverage and are fewer in number. We focus on three main issues: 1) how can we incorporate additional geometric and high-level semantic context information into proposal generation for stereo images? 2) how do we generate object segment proposals for stereo images with learned representations and a learned grouping process? and 3) how can we learn a context-driven representation to refine segment proposals efficiently? In this thesis, we propose a series of solutions to address each of the raised problems. We first propose a semantic context and depth-aware object proposal generation method. We design a set of new cues to encode the objectness, and then train an efficient random forest classifier to re-rank the initial proposals and linear regressors to fine-tune their locations.
    Next, we extend the task to segment proposal generation in the same setting and develop a learning-based segment proposal generation method for stereo images. Our method makes use of learned deep features and designed geometric features to represent a region and learns a similarity network to guide the superpixel grouping process. We also learn a ranking network to predict the objectness score for each segment proposal. To address the third problem, we take a transformation-based approach to improve the quality of a given segment candidate pool based on context information. We propose an efficient deep network that learns affine transformations to warp an initial object mask towards a nearby object region, based on a novel feature pooling strategy. Finally, we extend our affine warping approach to address the object-mask alignment problem and particularly the problem of refining a set of segment proposals. We design an end-to-end deep spatial transformer network that learns free-form deformations (FFDs) to non-rigidly warp the shape mask towards the ground truth, based on a multi-level dual mask feature pooling strategy. We evaluate all our approaches on several publicly available object recognition datasets and show superior performance.
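
    To make the affine mask-warping step concrete, the sketch below applies a given affine transformation to a binary mask using inverse mapping with nearest-neighbour sampling. In the thesis the transformation parameters are predicted by a deep network; here they are supplied by hand, and the function name `warp_mask_affine` and the toy mask are hypothetical.

```python
def warp_mask_affine(mask, a, b, c, d, tx, ty):
    """Warp a binary mask with the forward map x' = a*x + b*y + tx, y' = c*x + d*y + ty.

    Each output pixel is filled by mapping it back through the inverse
    transformation and sampling the nearest source pixel.
    """
    h, w = len(mask), len(mask[0])
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix must be invertible")
    out = [[0] * w for _ in range(h)]
    for y_out in range(h):
        for x_out in range(w):
            dx, dy = x_out - tx, y_out - ty
            # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
            x_src = (d * dx - b * dy) / det
            y_src = (-c * dx + a * dy) / det
            xi, yi = round(x_src), round(y_src)
            if 0 <= xi < w and 0 <= yi < h:
                out[y_out][x_out] = mask[yi][xi]
    return out


# Toy example: shift a single foreground pixel one column to the right.
mask = [[1, 0, 0],
        [0, 0, 0]]
shifted = warp_mask_affine(mask, a=1, b=0, c=0, d=1, tx=1, ty=0)
```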

    Digitalisation de partitions et de tessellations

    This study concerns the partitioning of a set in such a way that the separations between classes are materialized. The problem is solved, in both the continuous and the discrete cases, by means of hierarchies of tessellations whose classes are regular open sets. In the discrete case, the passage from partition to tessellation is expressed by Alexandrov topologies and leads to double resolutions. The ambiguities of diagonal configurations are resolved only by the triangular grid in two dimensions and the body-centred cubic grid in three dimensions. Only these grids preserve the connectivity of the classes in the hierarchies, and saliency functions can then be introduced. Finally, it is shown that the only Euclidean partitions that are experimentally accessible are tessellations.