224 research outputs found

    Embodied Visual Perception Models For Human Behavior Understanding

    Many modern applications require extracting the core attributes of human behavior, such as a person's attention, intent, or skill level, from visual data. There are two main challenges related to this problem. First, we need models that can represent visual data in terms of object-level cues. Second, we need models that can infer the core behavioral attributes from the visual data. We refer to these two challenges as "learning to see" and "seeing to learn", respectively. In this PhD thesis, we make progress towards addressing both challenges. We tackle the problem of "learning to see" by developing methods that extract object-level information directly from raw visual data. This includes two top-down contour detectors, DeepEdge and HfL, which can be used to aid high-level vision tasks such as object detection. Furthermore, we present two semantic object segmentation methods, Boundary Neural Fields (BNFs) and Convolutional Random Walk Networks (RWNs), which integrate low-level affinity cues into an object segmentation process. We then shift our focus to video-level understanding and present a Spatiotemporal Sampling Network (STSN), which can be used for video object detection and discriminative motion feature learning. Afterwards, we transition to the second subproblem, "seeing to learn", for which we leverage first-person GoPro cameras that record what people see during a particular activity. We aim to infer core behavioral attributes such as a person's attention, intention, and skill level from such first-person data. To do so, we first propose the concept of action-objects: the objects that capture a person's conscious visual (watching a TV) or tactile (taking a cup) interactions. We then introduce two models, EgoNet and Visual-Spatial Network (VSN), which detect action-objects in supervised and unsupervised settings, respectively. Afterwards, we focus on a behavior understanding task in a complex basketball activity. We present a method for evaluating players' skill level from their first-person basketball videos, and a model that predicts a player's future motion trajectory from a single first-person image.

    Brain MR Image Segmentation: From Multi-Atlas Method To Deep Learning Models

    Quantitative analysis of brain structures on magnetic resonance (MR) images plays a crucial role in examining brain development and abnormality, as well as in aiding treatment planning. Although manual delineation is commonly considered the gold standard, it suffers from low efficiency and inter-rater variability. Therefore, developing automatic anatomical segmentation of the human brain is important for providing a tool for quantitative analysis (e.g., volume measurement, shape analysis, cortical surface mapping). Despite a large number of existing techniques, the automatic segmentation of brain MR images remains a challenging task due to the complexity of the brain's anatomical structures and the great inter- and intra-individual variability among these structures. To address these challenges, four methods are proposed in this thesis. The first work proposes a novel label fusion scheme for multi-atlas segmentation. A two-stage majority voting scheme is developed to address the over-segmentation problem in hippocampus segmentation of brain MR images. The second work develops a supervoxel graphical model for whole-brain segmentation, in order to reduce the dependence of multi-atlas segmentation methods on complicated pairwise registration. Based on the assumption that voxels within a supervoxel share the same label, the proposed method converts the voxel labeling problem into a supervoxel labeling problem, which is solved by maximum-a-posteriori (MAP) inference in a Markov random field (MRF) defined on supervoxels. The third work incorporates an attention mechanism into convolutional neural networks (CNNs), aiming to learn the spatial dependencies between the shallow and deep layers of the CNN and to aggregate the attended local features with high-level features to obtain more precise segmentation results. The fourth method takes advantage of the success of CNNs in computer vision, combines the strengths of the graphical model with the CNN, and integrates them into an end-to-end trainable network. The proposed methods are evaluated on public MR image datasets such as MICCAI2012, LPBA40, and IBSR. Extensive experiments demonstrate the effectiveness and superior performance of the proposed methods compared with other state-of-the-art methods.
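    For readers unfamiliar with label fusion, the following is a minimal sketch of plain (single-stage) majority voting across atlases. It is not the thesis's two-stage scheme, whose details the abstract does not give. The function name `majority_vote` and the flat-list representation of label maps are assumptions made for illustration, and the atlases are assumed to be already registered to the target image.

```python
from collections import Counter


def majority_vote(atlas_labels):
    """Fuse per-voxel labels from several atlases by plain majority voting.

    atlas_labels: list of label maps, one per atlas, each a flat sequence
    of integer labels of identical length (hypothetical representation).
    """
    if not atlas_labels:
        raise ValueError("need at least one atlas")
    n_voxels = len(atlas_labels[0])
    if any(len(labels) != n_voxels for labels in atlas_labels):
        raise ValueError("all label maps must have the same length")

    fused = []
    # zip(*...) yields, for each voxel, the tuple of votes from all atlases.
    for votes in zip(*atlas_labels):
        # most_common(1) returns the label with the highest vote count;
        # ties go to the label encountered first among the votes.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Three atlases voting on three voxels (0 = background, 1 = hippocampus).
atlases = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
]
fused_labels = majority_vote(atlases)
```

    Voting per voxel keeps the sketch simple; practical pipelines typically weight atlases by local similarity to the target rather than counting every vote equally.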

    Deep learning for image-based liver analysis — A comprehensive review focusing on malignant lesions

    Deep learning-based methods, in particular convolutional neural networks and fully convolutional networks, are now widely used in the medical image analysis domain. This review focuses on deep learning-based analysis of focal liver lesions, with a special interest in hepatocellular carcinoma and metastatic cancer, and of structures such as the parenchyma and the vascular system. We address several neural network architectures used for analyzing the anatomical structures and lesions in the liver from various imaging modalities such as computed tomography, magnetic resonance imaging, and ultrasound. Image analysis tasks such as segmentation, object detection, and classification for the liver, liver vessels, and liver lesions are discussed. Based on the qualitative search, 91 papers were selected for the survey, including journal publications and conference proceedings. The papers reviewed in this work are grouped into eight categories based on the methodologies used. Comparing the evaluation metrics, hybrid models performed better for both the liver and the lesion segmentation tasks, ensemble classifiers performed better for the vessel segmentation tasks, and a combined approach performed better for both the lesion classification and detection tasks. Performance was measured with the Dice score for segmentation and with accuracy for the classification and detection tasks, which are the most commonly used metrics.
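    The two metrics named in this abstract can be sketched as follows. This is a minimal illustration using the standard definitions, Dice = 2|A ∩ B| / (|A| + |B|) for binary masks and accuracy as the fraction of correct predictions. The function names and the flat 0/1 list representation of masks are assumptions, not taken from the review.

```python
def dice_score(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of the foreground sizes of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def accuracy(pred_labels, true_labels):
    """Fraction of predictions that equal the reference labels."""
    if len(pred_labels) != len(true_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, t in zip(pred_labels, true_labels) if p == t)
    return correct / len(true_labels)


# One overlapping voxel; mask sizes 2 and 1 give Dice = 2*1 / (2+1).
example_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
# Two of three class predictions match the reference.
example_acc = accuracy(["hcc", "cyst", "hcc"], ["hcc", "hcc", "hcc"])
```

    Dice is insensitive to the number of true-negative background voxels, which is one reason it is preferred over plain accuracy for segmenting small structures such as lesions.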

    Medical Image Segmentation Review: The success of U-Net

    Automatic medical image segmentation is a crucial topic in the medical domain and, in turn, a critical component of the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success in all medical image modalities. Over the years, the U-Net model has received tremendous attention from academic and industrial researchers. Several extensions of this network have been proposed to address the scale and complexity created by medical tasks. Addressing the deficiencies of the naive U-Net model is the first step for vendors in choosing the proper U-Net variant for their business. Having a compendium of different variants in one place makes it easier for builders to identify the relevant research. It also helps ML researchers understand the challenges that biological tasks pose to the model. To address this, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementations. All information is gathered in the https://github.com/NITR098/Awesome-U-Net repository. Comment: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence journal.