3,603 research outputs found

    DISC: Deep Image Saliency Computing via Progressive Representation Learning

    Salient object detection increasingly receives attention as an important component or step in several pattern recognition and image processing tasks. Although a variety of powerful saliency models have been proposed, they usually involve heavy feature (or model) engineering based on priors (or assumptions) about the properties of objects and backgrounds. Inspired by the effectiveness of recently developed feature learning, we provide a novel Deep Image Saliency Computing (DISC) framework for fine-grained image saliency computing. In particular, we model the image saliency from both the coarse- and fine-level observations, and utilize the deep convolutional neural network (CNN) to learn the saliency representation in a progressive manner. Specifically, our saliency model is built upon two stacked CNNs. The first CNN generates a coarse-level saliency map by taking the overall image as the input, roughly identifying saliency regions in the global context. Furthermore, we integrate superpixel-based local context information in the first CNN to refine the coarse-level saliency map. Guided by the coarse saliency map, the second CNN focuses on the local context to produce a fine-grained and accurate saliency map while preserving object details. For a testing image, the two CNNs collaboratively conduct the saliency computing in one shot. Our DISC framework is capable of uniformly highlighting the objects-of-interest from complex backgrounds while preserving object details well. Extensive experiments on several standard benchmarks suggest that DISC outperforms other state-of-the-art methods and that it also generalizes well across datasets without additional training. The executable version of DISC is available online: http://vision.sysu.edu.cn/projects/DISC.
    Comment: This manuscript is the accepted version for IEEE Transactions on Neural Networks and Learning Systems (T-NNLS), 201

    An Iterative Co-Saliency Framework for RGBD Images

    As a newly emerging and significant topic in the computer vision community, co-saliency detection aims at discovering the common salient objects in multiple related images. The existing methods often generate the co-saliency map through a direct forward pipeline based on designed cues or initialization, but lack a refinement-cycle scheme. Moreover, they mainly focus on RGB images and ignore the depth information available in RGBD images. In this paper, we propose an iterative RGBD co-saliency framework, which utilizes existing single-image saliency maps as the initialization and generates the final RGBD co-saliency map by using a refinement-cycle model. Three schemes are employed in the proposed RGBD co-saliency framework: the addition scheme, the deletion scheme, and the iteration scheme. The addition scheme is used to highlight the salient regions based on intra-image depth propagation and saliency propagation, while the deletion scheme filters the saliency regions and removes the non-common salient regions based on an inter-image constraint. The iteration scheme is proposed to obtain a more homogeneous and consistent co-saliency map. Furthermore, a novel descriptor, named depth shape prior, is proposed in the addition scheme to introduce the depth information and enhance the identification of co-salient objects. The proposed method can effectively exploit any existing 2D saliency model to work well in RGBD co-saliency scenarios. The experiments on two RGBD co-saliency datasets demonstrate the effectiveness of our proposed framework.
    Comment: 13 pages, 13 figures, Accepted by IEEE Transactions on Cybernetics 2017. Project URL: https://rmcong.github.io/proj_RGBD_cosal_tcyb.htm

    Fully automated segmentation and tracking of the intima media thickness in ultrasound video sequences of the common carotid artery

    The robust identification and measurement of the intima media thickness (IMT) has a high clinical relevance because it represents one of the most precise predictors used in the assessment of potential future cardiovascular events. To facilitate the analysis of arterial wall thickening in serial clinical investigations, in this paper we have developed a novel fully automatic algorithm for the segmentation, measurement, and tracking of the intima media complex (IMC) in B-mode ultrasound video sequences. The proposed algorithm entails a two-stage image analysis process that initially addresses the segmentation of the IMC in the first frame of the ultrasound video sequence using a model-based approach; in the second step, a novel customized tracking procedure is applied to robustly detect the IMC in the subsequent frames. For the video tracking procedure, we introduce a spatially coherent algorithm called adaptive normalized correlation that prevents the tracking process from converging to wrong arterial interfaces. This represents the main contribution of this paper and was developed to deal with inconsistencies in the appearance of the IMC over the cardiac cycle. The quantitative evaluation has been carried out on 40 ultrasound video sequences of the common carotid artery (CCA) by comparing the results returned by the developed algorithm with ground truth data that has been manually annotated by clinical experts. The measured IMT mean ± standard deviation recorded by the proposed algorithm is 0.60 mm ± 0.10, with a mean coefficient of variation (CV) of 2.05%, whereas the corresponding result obtained for the manually annotated ground truth data is 0.60 mm ± 0.11 with a mean CV equal to 5.60%. The numerical results reported in this paper indicate that the proposed algorithm is able to correctly segment and track the IMC in ultrasound CCA video sequences, and we were encouraged by the stability of our technique when applied to data captured under different imaging conditions. Future clinical studies will focus on the evaluation of patients that are affected by advanced cardiovascular conditions such as focal thickening and arterial plaques.
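For readers unfamiliar with correlation-based tracking, the sketch below shows plain zero-mean normalized correlation between a template and candidate windows of a 1-D intensity profile. It is not the adaptive normalized correlation described in the abstract, whose details are not given here. The function names `normalized_correlation` and `best_offset`, the 1-D profile simplification, and the restricted search window around the previous position are all illustrative assumptions.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length intensity profiles.

    Returns a value in [-1, 1]; 1 means the profiles match up to gain and offset.
    """
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A flat profile has no defined correlation; treat it as "no match".
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_offset(template, signal, centre, radius):
    """Find the window start in `signal` that best matches `template`.

    Only starts within +/- `radius` of `centre` are tried (hypothetical
    constraint: the previous frame's position limits the search range).
    """
    n = len(template)
    lo = max(0, centre - radius)
    hi = min(len(signal) - n, centre + radius)
    best_pos, best_score = None, -2.0
    for pos in range(lo, hi + 1):
        score = normalized_correlation(template, signal[pos:pos + n])
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


if __name__ == "__main__":
    template = [1, 5, 1]               # bright interface from the previous frame
    profile = [0, 0, 2, 10, 2, 0, 0]   # same shape, different gain, in the next frame
    print(best_offset(template, profile, centre=2, radius=2))
```

Because the score is invariant to gain and offset, a template taken from one frame can still match an interface whose brightness changes in the next frame; limiting the search range is one simple way to reduce the chance of locking onto a neighbouring interface.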

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories. Detections and motion estimates provide contradictory information in the case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, which corrects motion leakage between correctly detected objects while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits long-range motion of point trajectories and large spatial support of image regions. We show that the resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such a multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.

    Parsing Objects at a Finer Granularity: A Survey

    Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks following different paradigms, while the inherent relations between these tasks are neglected. Moreover, given that most of the research remains fragmented, we conduct an in-depth study of the advanced work from a new perspective of learning the part relationship. In this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions by part relationship learning for these important challenges. Furthermore, we conclude with several promising lines of research in fine-grained visual parsing for future research.
    Comment: Survey for fine-grained part segmentation and object recognition; Accepted by Machine Intelligence Research (MIR