105 research outputs found

    Contextual hypergraph modeling for salient object detection

    Get PDF
    Salient object detection aims to locate objects that capture human attention within images. Previous approaches often pose this as a problem of image contrast analysis. In this work, we model an image as a hypergraph that utilizes a set of hyperedges to capture the contextual properties of image pixels or regions. As a result, the problem of salient object detection becomes one of finding salient vertices and hyperedges in the hypergraph. The main advantage of hypergraph modeling is that it takes into account each pixel’s (or region’s) affinity with its neighborhood as well as its separation from image background. Furthermore, we propose an alternative approach based on centerversus- surround contextual contrast analysis, which performs salient object detection by optimizing a cost-sensitive support vector machine (SVM) objective function. Experimental results on four challenging datasets demonstrate the effectiveness of the proposed approaches against the stateof- the-art approaches to salient object detection.Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton van den Henge

    Salient Object Detection Techniques in Computer Vision-A Survey.

    Full text link
    Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the field of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly categorized into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both conventional as well as deep learning-based categories have been reviewed in detail. Relevant saliency modeling trends with key issues, core techniques, and the scope for future research work have been discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases for some large-scale public datasets. Different metrics considered for assessment of the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards end

    Efficient salient region detection with soft image abstraction

    Get PDF
    Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale per-ceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial dis-tribution of image pixels, the proposed representation ab-stracts out unnecessary image details, allowing the assign-ment of comparable saliency values across similar regions, and producing perceptually accurate salient region detec-tion. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the pro-posed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2 % compared to the previous best result, while being computationally more efficient. 1

    Automatic Landmarking for Non-cooperative 3D Face Recognition

    Get PDF
    This thesis describes a new framework for 3D surface landmarking and evaluates its performance for feature localisation on human faces. This framework has two main parts that can be designed and optimised independently. The first one is a keypoint detection system that returns positions of interest for a given mesh surface by using a learnt dictionary of local shapes. The second one is a labelling system, using model fitting approaches that establish a one-to-one correspondence between the set of unlabelled input points and a learnt representation of the class of object to detect. Our keypoint detection system returns local maxima over score maps that are generated from an arbitrarily large set of local shape descriptors. The distributions of these descriptors (scalars or histograms) are learnt for known landmark positions on a training dataset in order to generate a model. The similarity between the input descriptor value for a given vertex and a model shape is used as a descriptor-related score. Our labelling system can make use of both hypergraph matching techniques and rigid registration techniques to reduce the ambiguity attached to unlabelled input keypoints for which a list of model landmark candidates have been seeded. The soft matching techniques use multi-attributed hyperedges to reduce ambiguity, while the registration techniques use scale-adapted rigid transformation computed from 3 or more points in order to obtain one-to-one correspondences. Our final system achieves better or comparable (depending on the metric) results than the state-of-the-art while being more generic. It does not require pre-processing such as cropping, spike removal and hole filling and is more robust to occlusion of salient local regions, such as those near the nose tip and inner eye corners. It is also fully pose invariant and can be used with kinds of objects other than faces, provided that labelled training data is available

    Semantic multimedia modelling & interpretation for annotation

    Get PDF
    The emergence of multimedia enabled devices, particularly the incorporation of cameras in mobile phones, and the accelerated revolutions in the low cost storage devices, boosts the multimedia data production rate drastically. Witnessing such an iniquitousness of digital images and videos, the research community has been projecting the issue of its significant utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community is progressively veering its emphasis to the personalization of these media. The main impediment in the image and video analysis is the semantic gap, which is the discrepancy among a user’s high-level interpretation of an image and the video and the low level computational interpretation of it. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video contents. However, the fact is that the visual similarity is not semantic similarity, so there is a demand to break through this dilemma through an alternative way. The semantic gap can be narrowed by counting high-level and user-generated information in the annotation. High-level descriptions of images and or videos are more proficient of capturing the semantic meaning of multimedia content, but it is not always applicable to collect this information. It is commonly agreed that the problem of high level semantic annotation of multimedia is still far from being answered. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high level annotation. This dissertation intends to bridge the gap between the visual features and semantics. It proposes a framework for annotation enhancement and refinement for the object/concept annotated images and videos datasets. The entire theme is to first purify the datasets from noisy keyword and then expand the concepts lexically and commonsensical to fill the vocabulary and lexical gap to achieve high level semantics for the corpus. This dissertation also explored a novel approach for high level semantic (HLS) propagation through the images corpora. The HLS propagation takes the advantages of the semantic intensity (SI), which is the concept dominancy factor in the image and annotation based semantic similarity of the images. As we are aware of the fact that the image is the combination of various concepts and among the list of concepts some of them are more dominant then the other, while semantic similarity of the images are based on the SI and concept semantic similarity among the pair of images. Moreover, the HLS exploits the clustering techniques to group similar images, where a single effort of the human experts to assign high level semantic to a randomly selected image and propagate to other images through clustering. The investigation has been made on the LabelMe image and LabelMe video dataset. Experiments exhibit that the proposed approaches perform a noticeable improvement towards bridging the semantic gap and reveal that our proposed system outperforms the traditional systems
    • …
    corecore