7,093 research outputs found

    Grounding semantics in robots for Visual Question Answering

    Get PDF
    In this thesis I describe an operational implementation of an object detection and description system that incorporates in an end-to-end Visual Question Answering system and evaluated it on two visual question answering datasets for compositional language and elementary visual reasoning

    Stratified decision forests for accurate anatomical landmark localization in cardiac images

    Get PDF
    Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initialization of automatic image analysis tools (e.g. segmentation and registration) and detection of scan planes for automated image acquisition. Landmark localization has been commonly performed using learning based approaches, such as classifier and/or regressor models. However, trained models may not generalize well in heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient specific models, we propose a novel stratification based training model, and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D highresolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-theart landmark localization accuracy and outperforms standard regression and classification based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy

    Colorization and Automated Segmentation of Human T2 MR Brain Images for Characterization of Soft Tissues

    Get PDF
    Characterization of tissues like brain by using magnetic resonance (MR) images and colorization of the gray scale image has been reported in the literature, along with the advantages and drawbacks. Here, we present two independent methods; (i) a novel colorization method to underscore the variability in brain MR images, indicative of the underlying physical density of bio tissue, (ii) a segmentation method (both hard and soft segmentation) to characterize gray brain MR images. The segmented images are then transformed into color using the above-mentioned colorization method, yielding promising results for manual tracing. Our color transformation incorporates the voxel classification by matching the luminance of voxels of the source MR image and provided color image by measuring the distance between them. The segmentation method is based on single-phase clustering for 2D and 3D image segmentation with a new auto centroid selection method, which divides the image into three distinct regions (gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) using prior anatomical knowledge). Results have been successfully validated on human T2-weighted (T2) brain MR images. The proposed method can be potentially applied to gray-scale images from other imaging modalities, in bringing out additional diagnostic tissue information contained in the colorized image processing approach as described

    Action Recognition in Videos: from Motion Capture Labs to the Web

    Full text link
    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework which puts in evidence the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypothesis assumed and thus, the constraints imposed on the type of video that each technique is able to address. Expliciting the hypothesis and constraints makes the framework particularly useful to select a method, given an application. Another advantage of the proposed organization is that it allows categorizing newest approaches seamlessly with traditional ones, while providing an insightful perspective of the evolution of the action recognition task up to now. That perspective is the basis for the discussion in the end of the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 table
    corecore