15 research outputs found

    Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

    Deep learning has shown state-of-the-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, namely deep learning, probabilistic models and kernel methods, to obtain state-of-the-art performance on Microsoft COCO, which consists of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), where object co-occurrences are conditioned on the fc7 features extracted from a pre-trained ImageNet CNN. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network, which takes fc7 features as input. Object classification is carried out via inference on the learnt conditional tree model, and we obtain significant gains in precision-recall and F-measures on MS-COCO, especially for difficult object categories. Moreover, the latent variables in the CLTM capture scene information: the images with top activations for a latent node share common themes, such as being a grassland or a food scene, and so on. In addition, we show that a simple k-means clustering of the inferred latent nodes alone significantly improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining and without using scene labels during training. Thus, we present a unified framework for multi-object classification and unsupervised scene understanding.
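
    As a rough illustration of the unsupervised scene-clustering step described above, the Python sketch below runs k-means over per-image latent-node activations. The random latent_activations array and the cluster count are placeholders for the actual CLTM inferences and settings, which are not reproduced here.

    # Minimal sketch (not the authors' code): cluster inferred latent-node
    # activations with k-means to obtain unsupervised scene groupings.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n_images, n_latent_nodes = 500, 32                            # hypothetical sizes
    latent_activations = rng.random((n_images, n_latent_nodes))   # stand-in for CLTM inferences

    kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
    scene_cluster = kmeans.fit_predict(latent_activations)        # one cluster id per image
    print(scene_cluster[:10])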

    Contextual relabelling of detected objects

    This is the author accepted manuscript; the final version is available from IEEE via the DOI in this record. Contextual information, such as the co-occurrence of objects and the spatial and relative sizes among objects, provides deep and complex information about scenes. It can also play an important role in improving object detection. In this work, we present two contextual models (rescoring and relabelling models) that leverage contextual information (16 contextual relationships are applied in this paper) to enhance state-of-the-art RCNN-based object detection (Faster RCNN). We experimentally demonstrate that our models lead to improved detection performance on the most common dataset used in this field (MSCOCO).
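
    A minimal sketch of co-occurrence-based rescoring, under simplifying assumptions: a single hand-written class co-occurrence matrix and a multiplicative rescoring rule stand in for the paper's 16 contextual relationships and learned models.

    import numpy as np

    def rescore(detections, cooccur):
        """detections: list of (class_id, score); cooccur: [C, C] co-occurrence weights."""
        rescored = []
        for i, (ci, si) in enumerate(detections):
            # contextual support from the other detections in the same image
            support = [cooccur[ci, cj] * sj for j, (cj, sj) in enumerate(detections) if j != i]
            context = np.mean(support) if support else 1.0
            rescored.append((ci, si * context))
        return rescored

    cooccur = np.array([[1.0, 0.8],
                        [0.8, 1.0]])                  # toy 2-class co-occurrence matrix
    print(rescore([(0, 0.9), (1, 0.4)], cooccur))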

    Re-identification by Covariance Descriptors

    This chapter addresses the problem of appearance matching while employing the covariance descriptor. We tackle the extremely challenging case in which the same non-rigid object has to be matched across disjoint camera views. Covariance statistics averaged over a Riemannian manifold are fundamental for designing appearance models invariant to camera changes. We discuss different ways of extracting an object appearance by incorporating various training strategies. Appearance matching is enhanced either by discriminative analysis using images from a single camera or by selecting distinctive features in a covariance metric space employing data from two cameras. By selecting only essential features for a specific class of objects (e.g. humans) without defining an a priori feature vector for extracting the covariance, we remove redundancy from the covariance descriptor and ensure low computational cost. Using a feature selection technique instead of learning on a manifold, we avoid the over-fitting problem. The proposed models have been successfully applied to the person re-identification task, in which a human appearance has to be matched across non-overlapping cameras. We carry out detailed experiments on the suggested strategies, demonstrating their pros and cons w.r.t. recognition rate and suitability to video analytics systems.
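
    The sketch below shows one common way to build a region covariance descriptor and to compare two regions with a log-Euclidean distance on the manifold of covariance matrices. The per-pixel feature set (coordinates, colour channels, gradients) is a conventional choice used for illustration, not necessarily the feature set discussed in the chapter.

    import numpy as np
    from scipy.linalg import logm

    def covariance_descriptor(region):
        """region: [H, W, 3] image patch -> [d, d] covariance of per-pixel features."""
        h, w, _ = region.shape
        ys, xs = np.mgrid[0:h, 0:w]
        gray = region.mean(axis=2)
        gx, gy = np.gradient(gray, axis=1), np.gradient(gray, axis=0)
        feats = np.stack([xs, ys, region[..., 0], region[..., 1], region[..., 2], gx, gy], axis=-1)
        f = feats.reshape(-1, feats.shape[-1]).astype(float)
        return np.cov(f, rowvar=False) + 1e-6 * np.eye(f.shape[1])   # regularise for logm

    def log_euclidean_distance(c1, c2):
        return np.linalg.norm(logm(c1) - logm(c2), ord="fro")

    a, b = np.random.rand(64, 32, 3), np.random.rand(64, 32, 3)      # toy patches
    print(log_euclidean_distance(covariance_descriptor(a), covariance_descriptor(b)))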

    Improving object detection performance using scene contextual constraints

    Contextual information, such as the co-occurrence of objects and the spatial and relative sizes among objects, provides rich and complex information about digital scenes. It also plays an important role in improving object detection and determining out-of-context objects. In this work, we present contextual models that leverage contextual information (16 contextual relationships are applied in this paper) to enhance the performance of two state-of-the-art object detectors (i.e., Faster RCNN and YOLO). The models are applied as a post-processing step for most existing detectors, refining the confidences and associated categorical labels without refining the bounding boxes. We experimentally demonstrate that our models lead to improved detection performance on the most common dataset used in this field (MSCOCO); in some experiments, PASCAL2012 is also used. We also show that iterating the application of our contextual models enhances detection performance further.
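
    A small sketch of the iterated post-processing idea mentioned above: a contextual model (here a hypothetical apply_context callback) is reapplied to the detections until their scores stabilise or an iteration limit is reached.

    def iterate_context(detections, apply_context, max_iters=5, tol=1e-4):
        """Repeat a contextual post-processing step until scores stop changing."""
        for _ in range(max_iters):
            updated = apply_context(detections)
            delta = max(abs(s1 - s0) for (_, s0), (_, s1) in zip(detections, updated))
            detections = updated
            if delta < tol:                           # scores have stabilised
                break
        return detections

    dets = [("person", 0.90), ("tie", 0.30)]
    boost = lambda d: [(c, min(1.0, s * 1.05)) for c, s in d]         # stand-in contextual model
    print(iterate_context(dets, boost))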

    Person Re-identification by Articulated Appearance Matching

    Re-identification of pedestrians in video-surveillance settings can be effectively approached by treating each human figure as an articulated body, whose pose is estimated through the framework of Pictorial Structures (PS). In this way, we can focus selectively on similarities between the appearance of body parts to recognize a previously seen individual. In fact, this strategy resembles what humans employ to solve the same task in the absence of facial details or other reliable biometric information. Based on these insights, we show how to perform single-image re-identification by matching signatures coming from articulated appearances, and how to strengthen this process in multi-shot re-identification by using Custom Pictorial Structures (CPS) to produce improved body localizations and appearance signatures. Moreover, we provide a complete and detailed breakdown of the system that surrounds these core procedures, with several novel arrangements devised for efficiency and flexibility. Finally, we test our approach on several public benchmarks, obtaining convincing results.
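
    As a toy illustration of part-based appearance matching, the sketch below compares two pedestrian images by a weighted sum of per-part colour-histogram distances. In the actual pipeline the part regions would come from the Pictorial Structures pose estimate; here they are fixed placeholder boxes.

    import numpy as np

    def part_histogram(image, box, bins=8):
        """Normalised colour histogram of one body-part box (y0, y1, x0, x1)."""
        y0, y1, x0, x1 = box
        patch = image[y0:y1, x0:x1].reshape(-1, 3)
        hist, _ = np.histogramdd(patch, bins=bins, range=[(0, 1)] * 3)
        return hist.ravel() / max(hist.sum(), 1)

    def signature_distance(img_a, img_b, part_boxes, weights):
        """Weighted sum of L1 distances between per-part colour histograms."""
        return sum(w * np.abs(part_histogram(img_a, b) - part_histogram(img_b, b)).sum()
                   for b, w in zip(part_boxes, weights))

    parts = [(0, 32, 0, 32), (32, 96, 0, 32)]         # toy "head" and "torso" boxes
    a, b = np.random.rand(128, 32, 3), np.random.rand(128, 32, 3)
    print(signature_distance(a, b, parts, weights=[0.3, 0.7]))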

    Convolutional Networks for Historic Text Recognition

    This thesis deals with text-line recognition in historical documents. The historical texts date from the 17th to the 19th century and are written in Fraktur typeface. The character recognition problem is solved using a neural network architecture called sequence-to-sequence. This architecture is based on the encoder-decoder model and contains an attention mechanism. In this thesis, a dataset was created from texts originating from the German archive Deutsches Textarchiv. This archive contains 3,897 different German books with available transcripts and corresponding page images. The created dataset was used to train and experiment with the proposed neural network. During the experiments, several convolutional models, the network hyperparameters, and the effect of positional encoding on recognition results were investigated. The final model can recognize characters with an accuracy of 99.63%. The contribution of this work is the mentioned dataset and the neural network, which can be used to recognize historical documents.
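
    For context, character accuracy figures of this kind are typically derived from the character error rate (CER), i.e. an edit distance between the predicted and ground-truth transcripts divided by the reference length. The sketch below is a standard implementation of that metric, not code from the thesis.

    def levenshtein(a: str, b: str) -> int:
        """Minimum number of character edits turning a into b."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def cer(prediction: str, reference: str) -> float:
        return levenshtein(prediction, reference) / max(len(reference), 1)

    print(cer("Fraktur schrift", "Fraktur Schrift"))       # one substitution -> ~0.067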