
    ANALYZING PULMONARY ABNORMALITY WITH SUPERPIXEL BASED GRAPH NEURAL NETWORKS IN CHEST X-RAY

    In recent years, graph-based deep learning has gained prominence, yet its potential for medical diagnosis remains relatively unexplored. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in computer vision, particularly on grid-like data such as images. However, they require huge datasets to reach top performance, and challenges arise when learning from the inherently irregular, unordered nature of physiological data. This thesis focuses on abnormality screening, namely classifying Chest X-Rays (CXRs) as tuberculosis positive or negative, using Graph Neural Networks (GNNs) that operate on Region Adjacency Graphs (RAGs) in which each superpixel serves as a dedicated graph node. For graph classification, GNNs can often classify graphs using the graph structure alone, provided the classes are sufficiently distinct. This study investigates whether incorporating node features, such as coordinate points and pixel intensity, alongside the structured data representing the graph can enhance learning. By integrating residual and concatenation structures, the method captures essential features and relationships among superpixels, contributing to advances in tuberculosis identification. We achieved our best performance, an accuracy of 0.80 and an AUC of 0.79, by combining state-of-the-art neural network architectures with graph-based representations. This work introduces a new perspective on medical image analysis.
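The superpixel-to-graph step described above can be sketched in plain Python. This is a minimal sketch under assumptions the abstract does not state: superpixels arrive as an integer label map (from any oversegmentation method), two superpixels are adjacent when they share a 4-connected pixel boundary, and each node's features are its mean row, mean column and mean grayscale intensity, one simple reading of "coordinate points and pixel intensity". The name `build_rag` is hypothetical, and the GNN that would consume the graph is not shown.

```python
from collections import defaultdict


def build_rag(labels, image):
    """Build a region adjacency graph from a superpixel label map.

    labels: 2D list of ints; labels[r][c] is the superpixel id of pixel (r, c).
    image:  2D list of grayscale intensities with the same shape.
    Returns (features, edges): features maps node id -> (mean_row, mean_col,
    mean_intensity); edges is a set of (a, b) pairs with a < b, one per pair
    of 4-connected neighbouring superpixels.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # row, col, intensity, count
    edges = set()
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            node = labels[r][c]
            acc = sums[node]
            acc[0] += r
            acc[1] += c
            acc[2] += image[r][c]
            acc[3] += 1
            # Checking only the right and bottom neighbours visits every
            # 4-connected pixel pair exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[rr][cc] != node:
                    a, b = sorted((node, labels[rr][cc]))
                    edges.add((a, b))
    features = {n: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for n, s in sums.items()}
    return features, edges
```

For the 3x3 label map `[[0, 0, 1], [0, 2, 1], [2, 2, 1]]` the three superpixels touch pairwise, so the edge set is `{(0, 1), (0, 2), (1, 2)}`.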

    OPEN-SET SEMANTIC SEGMENTATION APPLIED TO REMOTE SENSING IMAGES


    Automatic Image Segmentation by Dynamic Region Merging

    This paper addresses the automatic image segmentation problem in a region merging style. Given an initially over-segmented image, in which many regions (or superpixels) with homogeneous color are detected, image segmentation is performed by iteratively merging the regions according to a statistical test. There are two essential issues in a region merging algorithm: the order of merging and the stopping criterion. In the proposed algorithm, both issues are solved by a novel predicate, which is defined by the sequential probability ratio test (SPRT) and the maximum likelihood criterion. Starting from an over-segmented image, neighboring regions are progressively merged if there is evidence for merging according to this predicate. We show that the merging order follows the principle of dynamic programming. This formulates image segmentation as an inference problem, where the final segmentation is established based on the observed image. We also prove that the produced segmentation satisfies certain global properties. In addition, a faster algorithm is developed to accelerate the region merging process, which maintains a nearest neighbor graph in each iteration. Experiments on real natural images demonstrate the performance of the proposed dynamic region merging algorithm. Comment: 28 pages. This paper is under review in IEEE TI
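The merging loop can be sketched on a region adjacency graph. This is a simplified stand-in, not the paper's algorithm: the SPRT-based predicate is replaced by a hypothetical fixed threshold on the difference of region mean intensities, the most similar adjacent pair is merged first, and all edges are rescanned each iteration instead of maintaining a nearest neighbor graph. `merge_regions` and its parameters are illustrative names.

```python
def merge_regions(means, sizes, edges, threshold):
    """Greedy region merging on a region adjacency graph (simplified sketch).

    means, sizes: dicts region id -> mean intensity / pixel count.
    edges: iterable of (a, b) adjacent region pairs.
    threshold: hypothetical stopping criterion; merging stops once the most
    similar adjacent pair differs by more than this (the paper uses an
    SPRT-based predicate instead).
    Returns a dict mapping each original region id to its final region id.
    """
    means, sizes = dict(means), dict(sizes)
    edges = {frozenset(e) for e in edges}
    parent = {r: r for r in means}
    while edges:
        # Merge order: the adjacent pair with the smallest mean difference.
        best = min(edges, key=lambda e: abs(means[min(e)] - means[max(e)]))
        a, b = sorted(best)
        if abs(means[a] - means[b]) > threshold:
            break  # stopping criterion reached
        # Fold region b into region a using a size-weighted mean.
        total = sizes[a] + sizes[b]
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        del means[b], sizes[b]
        for r, p in parent.items():
            if p == b:
                parent[r] = a
        # Rewire b's adjacencies to a; the merged edge collapses and is dropped.
        new_edges = set()
        for e in edges:
            e2 = frozenset(a if x == b else x for x in e)
            if len(e2) == 2:
                new_edges.add(e2)
        edges = new_edges
    return parent
```

With four regions in a chain (means 10, 12, 50, 53) and a threshold of 5, the two dark regions merge, the two bright regions merge, and the remaining pair differs too much to merge.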

    Discrete Optimization Methods for Segmentation and Matching

    This dissertation studies discrete optimization methods for several computer vision problems. In the first part, a new objective function for superpixel segmentation is proposed. This objective function consists of two components: the entropy rate of a random walk on a graph and a balancing term. The entropy rate favors the formation of compact and homogeneous clusters, while the balancing term encourages clusters of similar sizes. I present a new graph construction for images and show that this construction induces a matroid. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting the submodular and monotonic properties of the objective function, I develop an efficient algorithm with a worst-case performance bound of $\frac{1}{2}$ for the superpixel segmentation problem. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all standard evaluation metrics. Next, I propose a video segmentation algorithm that maximizes a submodular objective function subject to a matroid constraint. This function is similar to the standard energy function in computer vision, with unary terms, pairwise terms from the Potts model, and a novel higher-order term based on appearance histograms. I show that the standard Potts model prior, which becomes non-submodular for multi-label problems, still induces a submodular function in a maximization framework. A new higher-order prior further enforces consistency in the appearance histograms both spatially and temporally across the video. The matroid constraint leads to a simple algorithm with a performance bound of $\frac{1}{2}$. A branch-and-bound procedure is also presented to improve the solution computed by the algorithm. The last part of the dissertation studies the object localization problem in images given a single hand-drawn example or a gallery of shapes as the object model. 
Although many shape matching algorithms have been proposed for the problem, chamfer matching remains the preferred method when speed and robustness are considered. In this dissertation, I significantly improve the accuracy of chamfer matching while reducing the computational time from linear to sublinear (shown empirically). This is achieved by incorporating edge orientation information into the matching algorithm, so that the resulting cost function is piecewise smooth and the cost variation is tightly bounded. Moreover, I present a sublinear-time algorithm for the exact computation of the directional chamfer matching score using techniques from 3D distance transforms and directional integral images. In addition, the smooth cost function allows one to bound the cost distribution of large neighborhoods and skip bad hypotheses. Experiments show that the proposed approach speeds up the original chamfer matching by up to 45 times and is much faster than many state-of-the-art techniques while achieving comparable accuracy. I further demonstrate the application of the proposed algorithm in providing seamless operation for a robotic bin-picking system.
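Two quantities from this dissertation can be sketched under simplifying assumptions. `entropy_rate` computes the entropy rate of a stationary random walk on an undirected weighted graph, H = -Σ_i μ_i Σ_j p_ij log p_ij with μ_i = w_i / Σ_k w_k and p_ij = w_ij / w_i (standard definitions); the dissertation's specific graph construction, balancing term and matroid-constrained greedy search are not reproduced. `chamfer_score` is the plain, non-directional chamfer score that the last part improves on: the mean distance from each translated template edge point to its nearest image edge point, with brute-force search standing in for a distance transform. The edge-orientation term and sublinear search are not shown, and all function names are hypothetical.

```python
import math


def entropy_rate(weights):
    """Entropy rate of a stationary random walk on an undirected weighted graph.

    weights: dict node -> dict neighbour -> positive edge weight (symmetric).
    Uses mu_i = w_i / sum_k w_k and p_ij = w_ij / w_i, where w_i is the total
    weight incident to node i.
    """
    degree = {i: sum(nbrs.values()) for i, nbrs in weights.items()}
    total = sum(degree.values())
    h = 0.0
    for i, nbrs in weights.items():
        mu_i = degree[i] / total  # stationary probability of node i
        for w_ij in nbrs.values():
            p_ij = w_ij / degree[i]  # transition probability i -> j
            h -= mu_i * p_ij * math.log(p_ij)
    return h


def chamfer_score(template, edges, offset):
    """Mean distance from each translated template point to its nearest edge.

    template, edges: lists of (x, y) edge-pixel coordinates.
    offset: (dx, dy) translation hypothesis for the template.
    """
    dx, dy = offset
    total = 0.0
    for x, y in template:
        # Brute-force nearest edge; a distance transform would precompute this.
        total += min(math.hypot(x + dx - ex, y + dy - ey) for ex, ey in edges)
    return total / len(template)


def best_offset(template, edges, hypotheses):
    """Exhaustively score every translation hypothesis and keep the lowest."""
    return min(hypotheses, key=lambda off: chamfer_score(template, edges, off))
```

On a triangle with unit weights every node has two equally likely moves, so `entropy_rate` returns log 2 (about 0.693); a two-node graph has only one possible move per node and an entropy rate of 0.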

    An adaptive training-less framework for anomaly detection in crowd scenes

    Anomaly detection in crowd videos has become a popular area of research in the computer vision community. Several existing methods define an anomaly as a deviation from scene normalcy learned via separate training, with or without labeled information. However, owing to the rare and sparse nature of anomalous events, any such learning can be misleading, as there is no clear-cut separation between anomalous and non-anomalous events. To address this challenge, we propose an adaptive training-less system capable of detecting anomalies on the fly. Our solution pipeline consists of three major components: an adaptive 3D-DCT model for multi-object detection-based association, local motion descriptor generation through an improved saliency-guided optical flow, and anomaly detection based on the Earth mover's distance (EMD). Despite being training-free, the proposed model achieves performance comparable to several state-of-the-art methods on the publicly available UCSD, UMN, CUHK-Avenue and ShanghaiTech datasets.
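The EMD component can be illustrated in its simplest case: two one-dimensional histograms with unit bin spacing, where the EMD reduces to the L1 distance between the cumulative distributions. The abstract does not specify the descriptor bins, the reference against which current motion is compared, or the decision rule, so `is_anomalous` and its fixed `threshold` are hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1D histograms with unit bin spacing.

    Both histograms are normalised to unit mass first. In 1D the EMD equals
    the L1 distance between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    sa, sb = sum(hist_a), sum(hist_b)
    cum_a = cum_b = 0.0
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        cum_a += a / sa
        cum_b += b / sb
        distance += abs(cum_a - cum_b)
    return distance


def is_anomalous(current_hist, reference_hist, threshold):
    # Hypothetical decision rule: flag when the EMD exceeds a fixed threshold.
    return emd_1d(current_hist, reference_hist) > threshold
```

Moving all mass from the first to the third bin costs two bin steps, so `emd_1d([1, 0, 0], [0, 0, 1])` is 2.0, while identical histograms give 0.0.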

    Depth Synthesis and Local Warps for Plausible Image-based Navigation

    Modern camera calibration and multiview stereo techniques enable users to smoothly navigate between different views of a scene captured using standard cameras. The underlying automatic 3D reconstruction methods work well for buildings and regular structures but often fail on vegetation, vehicles and other complex geometry present in everyday urban scenes. Consequently, missing depth information makes image-based rendering (IBR) for such scenes very challenging. Our goal is to provide plausible free-viewpoint navigation for such datasets. To do this, we introduce a new IBR algorithm that is robust to missing or unreliable geometry, providing plausible novel views even in regions quite far from the input camera positions. We first oversegment the input images, creating superpixels of homogeneous color content, which tend to preserve depth discontinuities. We then introduce a depth-synthesis approach for poorly reconstructed regions based on a graph structure on the oversegmentation and an appropriate traversal of the graph. The superpixels augmented with synthesized depth allow us to define a local shape-preserving warp that compensates for inaccurate depth. Our rendering algorithm blends the warped images and generates plausible image-based novel views for our challenging target scenes. Our results demonstrate real-time novel view synthesis for multiple challenging scenes with significant depth complexity, providing a convincing immersive navigation experience.
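One possible reading of "traversal of the graph" is shortest-path propagation: each poorly reconstructed superpixel inherits the depth of the reconstructed superpixel it reaches most cheaply. The sketch below is an assumption, not the paper's method: it runs a multi-source Dijkstra search over the superpixel adjacency graph, using the absolute difference of scalar mean colors as the edge cost. `synthesize_depth` and this cost choice are hypothetical.

```python
import heapq


def synthesize_depth(colors, adjacency, depth):
    """Fill missing superpixel depths from the cheapest-to-reach reliable one.

    colors: dict superpixel id -> mean colour (a scalar here for simplicity).
    adjacency: dict superpixel id -> iterable of neighbouring ids.
    depth: dict superpixel id -> depth, for reliably reconstructed superpixels.
    Edge cost is the absolute colour difference (hypothetical), so depth
    spreads preferentially across similar-looking neighbours.
    """
    result = dict(depth)
    dist = {s: 0.0 for s in depth}
    heap = [(0.0, s, s) for s in depth]  # (path cost, node, source superpixel)
    heapq.heapify(heap)
    while heap:
        cost, node, source = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale entry: a cheaper path was already found
        result[node] = depth[source]
        for nbr in adjacency[node]:
            new_cost = cost + abs(colors[node] - colors[nbr])
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr, source))
    return result
```

In a chain of four superpixels where only the two ends have depth, each interior superpixel takes the depth of the end whose color it more closely resembles along the path.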

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about the temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories. Detections and motion estimates provide contradictory information in the case of false-alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, which corrects motion leakage between correctly detected objects while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits the long-range motion of point trajectories and the large spatial support of image regions. We show that the resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection to deal with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve the tracking of body joints under wide deformations. 
We use the motion segmentability of body parts to re-rank a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such a multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.

    Spatio-Temporal Object Detection Proposals

    Spatio-temporal detection of actions and events in video is a challenging problem. Besides the difficulties related to recognition, a major challenge for detection in video is the size of the search space defined by spatio-temporal tubes formed by sequences of bounding boxes along the frames. Recently, methods that generate unsupervised detection proposals have proven very effective for object detection in still images. These methods open the possibility of using strong but computationally expensive features, since only a relatively small number of detection hypotheses need to be assessed. In this paper we make two contributions towards exploiting detection proposals for spatio-temporal detection problems. First, we extend a recent 2D object proposal method to produce spatio-temporal proposals by a randomized supervoxel merging process. We introduce spatial, temporal, and spatio-temporal pairwise supervoxel features that are used to guide the merging process. Second, we propose a new efficient supervoxel method. We experimentally evaluate our detection proposals in combination with our new supervoxel method as well as existing ones. This evaluation shows that our supervoxels lead to more accurate proposals than existing state-of-the-art supervoxel methods.
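A randomized merging process of this kind can be sketched as repeated region growing: start from a random supervoxel and repeatedly absorb an adjacent supervoxel chosen with probability proportional to its pairwise similarity. Here a single hypothetical scalar similarity per adjacent pair stands in for the spatial, temporal and spatio-temporal features, and the proposal size is drawn uniformly at random; neither detail comes from the paper, and the function names are illustrative.

```python
import random


def random_proposal(weights, rng, max_size):
    """Grow one proposal by randomized merging of adjacent supervoxels.

    weights: dict frozenset({a, b}) -> positive similarity between adjacent
    supervoxels a and b (hypothetical stand-in for pairwise features).
    rng: a random.Random instance. Returns the set of merged supervoxel ids.
    """
    nodes = sorted({n for edge in weights for n in edge})
    region = {rng.choice(nodes)}
    while len(region) < max_size:
        # Candidate edges link the region to exactly one outside supervoxel.
        frontier = [(e, w) for e, w in weights.items() if len(e & region) == 1]
        if not frontier:
            break
        edges, ws = zip(*frontier)
        # Sample the next merge with probability proportional to similarity.
        region |= rng.choices(edges, weights=ws, k=1)[0]
    return region


def proposals(weights, n, max_size, seed=0):
    """Sample n proposals; repeated random merging yields diverse candidates."""
    rng = random.Random(seed)
    return [random_proposal(weights, rng, rng.randint(1, max_size)) for _ in range(n)]
```

Because each step absorbs a neighbor of the current region, every proposal is a connected set of supervoxels; repeated sampling trades exhaustive enumeration for a manageable number of hypotheses.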