
    Feature extraction using MPEG-CDVS and Deep Learning with application to robotic navigation and image classification

    The main contributions of this thesis are an evaluation of the MPEG Compact Descriptor for Visual Search (CDVS) in the context of indoor robotic navigation and the introduction of a new method for training Convolutional Neural Networks (CNNs), with applications to object classification. The choice of image descriptor in a visual navigation system is not straightforward. Visual descriptors must be distinctive enough to allow correct localisation while still offering low matching complexity and a short descriptor size for real-time applications. CDVS is a low-complexity image descriptor that offers several levels of compromise between descriptor distinctiveness and size. In this work, we describe how these trade-offs can be used for efficient loop detection in a typical indoor environment. We first describe a probabilistic approach to loop detection based on the standard's suggested similarity metric. We then evaluate the CDVS compression modes in terms of matching speed, feature extraction, and storage requirements, and compare them with the state-of-the-art SIFT descriptor for five different types of indoor floors. In the second part of this thesis we focus on the new paradigm in machine learning and computer vision called Deep Learning. Under this paradigm, visual features are no longer extracted using fine-grained, highly engineered feature extractors, but rather using a CNN that extracts hierarchical features learned directly from data, at the cost of long training periods. In this context, we propose a method for speeding up the training of CNNs by exploiting the spatial scaling property of convolutions. This is done by first pre-training a CNN with kernels of smaller resolution for a few epochs, then properly rescaling its kernels to the target's original dimensions and continuing training at full resolution. We show that the overall training time of a target CNN architecture can be reduced by exploiting the spatial scaling property of convolutions during the early stages of learning. Moreover, by rescaling the kernels at different epochs, we identify a trade-off between total training time and maximum obtainable accuracy. Finally, we propose a method for choosing when to rescale kernels and evaluate our approach on recent architectures, showing savings in training time of nearly 20% while test set accuracy is preserved.
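The kernel-rescaling step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact rescaling rule, so everything below is an assumption rather than the thesis's method: the function name `rescale_kernel` is hypothetical, the resize uses bilinear interpolation, and the result is multiplied by the area ratio so that the summed weight of the kernel stays roughly the same at the larger size.

```python
def _interp_weights(old_size, new_size):
    """For each output index, return (lo, hi, frac) into the old grid."""
    step = (old_size - 1) / (new_size - 1)
    weights = []
    for i in range(new_size):
        pos = i * step
        lo = min(int(pos), old_size - 1)
        hi = min(lo + 1, old_size - 1)
        weights.append((lo, hi, pos - lo))
    return weights


def rescale_kernel(kernel, new_size):
    """Resize a square 2-D kernel (list of lists) to new_size x new_size.

    Assumption: bilinear interpolation followed by an area-ratio gain,
    so the sum of the weights is approximately preserved.
    Requires new_size >= 2.
    """
    old_size = len(kernel)
    w = _interp_weights(old_size, new_size)
    gain = (old_size / new_size) ** 2
    out = []
    for r0, r1, fr in w:
        row = []
        for c0, c1, fc in w:
            # Interpolate along the columns, then along the rows.
            top = kernel[r0][c0] * (1 - fc) + kernel[r0][c1] * fc
            bottom = kernel[r1][c0] * (1 - fc) + kernel[r1][c1] * fc
            row.append(gain * (top * (1 - fr) + bottom * fr))
        out.append(row)
    return out


# Usage: grow a 3x3 kernel learned at low resolution to 5x5.
small = [[1.0, 1.0, 1.0] for _ in range(3)]
big = rescale_kernel(small, 5)
```

In a real network the same operation would be applied to every input/output channel pair of each convolutional layer before training resumes at full resolution.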

    Pattern matching of footwear Impressions

    Footwear impressions are among the most frequently secured types of evidence at crime scenes. Identifying the brand and model of the footwear can be crucial to narrowing the search for suspects. Forensic experts do this by comparing the evidence found at the crime scene with a large collection of reference impressions. To support the forensic experts, automatic retrieval of the most likely matches is desired. In this thesis, different techniques for recognizing and matching footwear impressions are evaluated, using reference and real crime scene shoeprint images. Because of the conditions in which shoeprints are found (partial occlusions, variation in shape), a translation-, rotation- and scale-invariant system is needed. A VLAD (Vector of Locally Aggregated Descriptors) encoder is used to group descriptors obtained using different approaches, such as SIFT (Scale-Invariant Feature Transform), Dense SIFT and a Triplet CNN (Convolutional Neural Network). The last two approaches provide the best performance once their parameters are correctly adjusted, with the Cumulative Matching Characteristic (CMC) curve used for evaluation.
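For readers unfamiliar with VLAD, the encoding step can be sketched as follows. This is the standard formulation (hard assignment to the nearest codeword, summed residuals, global L2 normalisation), not necessarily the exact variant used in the thesis. The function name `vlad_encode` is hypothetical, and the codebook is assumed to have been learned beforehand, for example with k-means.

```python
import math


def vlad_encode(descriptors, codebook):
    """Aggregate local descriptors into one L2-normalised VLAD vector.

    descriptors: list of D-dimensional lists (e.g. SIFT descriptors).
    codebook:    list of K D-dimensional centroids (assumed given).
    Returns a flat list of length K * D.
    """
    dim = len(codebook[0])
    residual_sums = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest codeword.
        k = min(
            range(len(codebook)),
            key=lambda j: sum((x - c) ** 2 for x, c in zip(d, codebook[j])),
        )
        # Accumulate the residual (descriptor minus centroid).
        for i in range(dim):
            residual_sums[k][i] += d[i] - codebook[k][i]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Usage: three toy 2-D descriptors and a two-word codebook.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
v = vlad_encode(descriptors, codebook)
```

Two shoeprint images can then be compared by a distance between their VLAD vectors, which is what makes ranking-based retrieval and CMC evaluation possible.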