7 research outputs found

    GPU deformable part model for object recognition

    We consider the problem of rapidly detecting objects in static images or videos. The task consists of locating and identifying objects of interest. With the progress of affordable high-performance computing hardware, we propose to analyse and evaluate the deformable part model on the Graphics Processing Unit (GPU). We make no prior assumptions about the scene or the location of the objects. We provide a fast implementation and analyse the different modules of the state-of-the-art detector. Our implementation accelerates both training and testing. While maintaining comparable classification performance, we report a speed-up of 10.6× using a standard GPU card compared to a baseline implemented in C++ on a single core, and of 5× compared to a multi-core OpenMP implementation (8 threads).

    Vision-language integration using constrained local semantic features

    This paper tackles two recent and promising issues in the field of computer vision, namely "the integration of linguistic and visual information" and "the use of semantic features to represent the image content". Semantic features represent images according to visual concepts that are detected in the image by a set of base classifiers. Recent work shows competitive performance in image classification and retrieval using such features. We propose to rely on this type of image description to facilitate its integration with linguistic data. More precisely, the contribution of this paper is threefold. First, we propose to automatically determine the most useful dimensions of a semantic representation according to the actual image content. This results in a level of sparsity for the semantic features that is adapted to each image independently. Our model takes into account both the confidence in each base classifier and the global amount of information in the semantic signature, defined in the Shannon sense. This contribution is further extended to better reflect the detection of a visual concept at a local scale. Second, we introduce a new strategy to learn an efficient mid-level representation with CNNs that boosts the performance of semantic signatures. Last, we propose several schemes to integrate a visual representation based on semantic features with linguistic information, leading to the nesting of linguistic information at two levels of the visual features. Experimental validation is conducted on four benchmarks (VOC 2007, VOC 2012, NUS-WIDE and MIT Indoor) for classification, three of them for retrieval and two of them for bi-modal classification. The proposed semantic feature achieves state-of-the-art performance on three classification benchmarks and on all retrieval benchmarks. Our vision-language integration method achieves state-of-the-art performance in bi-modal classification.
