2,908 research outputs found

    'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems

    Full text link
    An examination of object recognition challenge leaderboards (ILSVRC, PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small differences amongst themselves in terms of error rate/mAP. To better differentiate the top performers, additional criteria are required. Moreover, the (test) images, on which the performance scores are based, predominantly contain fully visible objects. Therefore, `harder' test images, mimicking the challenging conditions (e.g. occlusion) in which humans routinely recognize objects, need to be utilized for benchmarking. To address the concerns mentioned above, we make two contributions. First, we systematically vary the level of local object-part content, global detail and spatial context in images from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12. Second, we propose an object-part based benchmarking procedure which quantifies classifiers' robustness to a range of visibility and contextual settings. The benchmarking procedure relies on a semantic similarity measure that naturally addresses potential semantic granularity differences between the category labels in training and test datasets, thus eliminating manual mapping. We use our procedure on the PPSS-12 dataset to benchmark top-performing classifiers trained on the ILSVRC-2012 dataset. Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail. Given this capability for additional differentiation, our approach can potentially supplement existing benchmarking procedures used in object recognition challenge leaderboards.Comment: Extended version of our ACCV-2016 paper. Author formatting modifie

    Shape Similarity Measurement for Known-Object Localization: A New Normalized Assessment

    Get PDF
    International audienceThis paper presents a new, normalized measure for assessing a contour-based object pose. Regarding binary images, the algorithm enables supervised assessment of known-object recognition and localization. A performance measure is computed to quantify differences between a reference edge map and a candidate image. Normalization is appropriate for interpreting the result of the pose assessment. Furthermore, the new measure is well motivated by highlighting the limitations of existing metrics to the main shape variations (translation, rotation, and scaling), by showing how the proposed measure is more robust to them. Indeed, this measure can determine to what extent an object shape differs from a desired position. In comparison with 6 other approaches, experiments performed on real images at different sizes/scales demonstrate the suitability of the new method for object-pose or shape-matching estimation

    On the role of distance transformations in Baddeley’s Delta Metric

    Get PDF
    Comparison and similarity measurement have been a key topic in computer vision for a long time. There is, indeed, an extensive list of algorithms and measures for image or subimage comparison. The superiority or inferiority of different measures is hard to scrutinize, especially considering the dimensionality of their parameter space and their many different configurations. In this work, we focus on the comparison of binary images, and study different variations of Baddeley's Delta Metric, a popular metric for such images. We study the possible parameterizations of the metric, stressing the numerical and behavioural impact of different settings. Specifically, we consider the parameter settings proposed by the original author, as well as the substitution of distance transformations by regularized distance transformations, as recently presented by Brunet and Sills. We take a qualitative perspective on the effects of the settings, and also perform quantitative experiments on separability of datasets for boundary evaluation.The authors gratefully acknowledge the financial support by the Spanish Ministry of Science (project PID2019-108392GB-I00 AEI/FEDER, UE), as well as that by Navarra Servicios y Tecnologías S.A. (NASERTIC)

    Shape/image registration for medical imaging : novel algorithms and applications.

    Get PDF
    This dissertation looks at two different categories of the registration approaches: Shape registration, and Image registration. It also considers the applications of these approaches into the medical imaging field. Shape registration is an important problem in computer vision, computer graphics and medical imaging. It has been handled in different manners in many applications like shapebased segmentation, shape recognition, and tracking. Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors. Many image processing applications like remote sensing, fusion of medical images, and computer-aided surgery need image registration. This study deals with two different applications in the field of medical image analysis. The first one is related to shape-based segmentation of the human vertebral bodies (VBs). The vertebra consists of the VB, spinous, and other anatomical regions. Spinous pedicles, and ribs should not be included in the bone mineral density (BMD) measurements. The VB segmentation is not an easy task since the ribs have similar gray level information. This dissertation investigates two different segmentation approaches. Both of them are obeying the variational shape-based segmentation frameworks. The first approach deals with two dimensional (2D) case. This segmentation approach starts with obtaining the initial segmentation using the intensity/spatial interaction models. Then, shape model is registered to the image domain. Finally, the optimal segmentation is obtained using the optimization of an energy functional which integrating the shape model with the intensity information. The second one is a 3D simultaneous segmentation and registration approach. The information of the intensity is handled by embedding a Willmore flow into the level set segmentation framework. Then the shape variations are estimated using a new distance probabilistic model. The experimental results show that the segmentation accuracy of the framework are much higher than other alternatives. Applications on BMD measurements of vertebral body are given to illustrate the accuracy of the proposed segmentation approach. The second application is related to the field of computer-aided surgery, specifically on ankle fusion surgery. The long-term goal of this work is to apply this technique to ankle fusion surgery to determine the proper size and orientation of the screws that are used for fusing the bones together. In addition, we try to localize the best bone region to fix these screws. To achieve these goals, the 2D-3D registration is introduced. The role of 2D-3D registration is to enhance the quality of the surgical procedure in terms of time and accuracy, and would greatly reduce the need for repeated surgeries; thus, saving the patients time, expense, and trauma

    Using contour information and segmentation for object registration, modeling and retrieval

    Get PDF
    This thesis considers different aspects of the utilization of contour information and syntactic and semantic image segmentation for object registration, modeling and retrieval in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic w ord recognition in handwritten historical manuscripts and shape registration. Also, the thesis explores the feasibility of contour-based syntactic features for improving the correspondence of the output of bottom-up segmentation to semantic objects present in the scene and discusses the feasibility of different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models or semi-automatic in selected application scenarios. There are three contributions in this thesis. The first contribution considers structure analysis based on the shape and spatial configuration of image regions (socalled syntactic visual features) and their utilization for automatic image segmentation. The second contribution is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis providing the basis for the third contribution which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach has been analyzed based on extensive rigorous experimentation using as large as possible test collections

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    Get PDF
    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (\eg, signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance
    corecore