    Learning Transformation-Invariant Local Descriptors With Low-Coupling Binary Codes

    Despite the success of prevailing binary local descriptors, they still suffer from two problems: 1) vulnerability to geometric transformations; 2) the lack of an effective treatment for the highly correlated bits generated by directly applying image-hashing schemes. To tackle both limitations, we propose an unsupervised Transformation-invariant Binary Local Descriptor learning method (TBLD). Specifically, transformation invariance is ensured by projecting the original patches and their transformed counterparts simultaneously into an identical high-dimensional feature space and an identical low-dimensional descriptor space. Meanwhile, the method enforces dissimilar image patches to have distinctive binary local descriptors. Moreover, to reduce high correlations between bits, we propose a bottom-up learning strategy, termed the Adversarial Constraint Module, in which low-coupling binary codes are introduced externally to guide the learning of binary local descriptors. With the aid of the Wasserstein loss, the framework is optimized so that the distribution of the generated binary local descriptors mimics that of the introduced low-coupling binary codes, eventually making the former less coupled. Experimental results on three benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
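
The abstract above does not define how bit coupling is measured. As a rough illustration only, one plausible diagnostic is the mean absolute Pearson correlation between every pair of bit positions across a set of binary descriptors. The function names `bit_correlation` and `mean_abs_bit_correlation` are hypothetical and are not taken from the TBLD paper.

```python
import math
from itertools import combinations


def bit_correlation(col_a, col_b):
    """Pearson correlation between two bit columns (0.0 if either is constant)."""
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        # A constant bit carries no information; treat it as uncorrelated (assumption).
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mean_abs_bit_correlation(codes):
    """Average |correlation| over all pairs of bit positions; lower means less coupled."""
    columns = list(zip(*codes))  # one tuple per bit position
    pairs = list(combinations(range(len(columns)), 2))
    total = sum(abs(bit_correlation(columns[i], columns[j])) for i, j in pairs)
    return total / len(pairs)


# Two bits that always agree are fully coupled; two independent bits are not.
coupled = [[0, 0], [1, 1], [0, 0], [1, 1]]
independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(mean_abs_bit_correlation(coupled), mean_abs_bit_correlation(independent))
```

A score near 0 suggests the bits carry largely non-redundant information, which is the property the low-coupling codes are meant to encourage.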

    On Aggregation of Unsupervised Deep Binary Descriptor with Weak Bits

    Despite the success achieved by existing binary descriptors, most of them still suffer from three limitations: 1) vulnerability to geometric transformations; 2) inability to preserve the manifold structure when learning binary codes; 3) no guarantee of finding the true match if multiple candidates happen to have the same Hamming distance to a given query. Together, these make binary descriptors less effective for large-scale visual recognition tasks. In this paper, we propose a novel learning-based feature descriptor, namely the Unsupervised Deep Binary Descriptor (UDBD), which learns transformation-invariant binary descriptors by projecting the original data and their transformed sets into a joint binary space. Moreover, we include an ℓ2,1-norm loss term in the binary embedding process to gain both robustness against data noise and a lower probability of mistakenly flipping bits of the binary descriptor. On top of this, a graph constraint is used to preserve the original manifold structure in the binary space. Furthermore, a weak bit mechanism is adopted to find the real match among candidates sharing the same minimum Hamming distance, thus enhancing matching performance. Extensive experimental results on public datasets show the superiority of UDBD in terms of matching and retrieval accuracy over state-of-the-art methods.
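
The abstract above does not spell out the weak bit mechanism. The sketch below shows one plausible reading, under stated assumptions: a bit is "weak" when its pre-binarization activation is close to the threshold, and ties in Hamming distance are broken in favor of the candidate whose mismatched bits are the weakest. The names `binarize`, `hamming`, and `match_with_weak_bits`, and the tie-breaking cost, are hypothetical and not taken from the UDBD paper.

```python
def binarize(values):
    """Threshold real-valued activations at zero to obtain a binary code."""
    return [1 if v > 0 else 0 for v in values]


def hamming(code_a, code_b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def match_with_weak_bits(query_values, candidates):
    """Return the index of the best candidate code for a real-valued query.

    Candidates are ranked by Hamming distance to the binarized query. Ties are
    broken by summing |activation| over the mismatched bits: a mismatch on a bit
    whose activation was near zero is cheap, since that bit could easily have
    flipped (assumed tie-breaking rule).
    """
    query_code = binarize(query_values)
    distances = [hamming(query_code, c) for c in candidates]
    best = min(distances)
    tied = [i for i, d in enumerate(distances) if d == best]
    if len(tied) == 1:
        return tied[0]

    def mismatch_cost(index):
        return sum(
            abs(v)
            for v, q, c in zip(query_values, query_code, candidates[index])
            if q != c
        )

    return min(tied, key=mismatch_cost)


# Candidates 0 and 1 are both at Hamming distance 1 from the query code [1, 0, 1, 1].
# Candidate 1 differs only on the third bit, whose activation (0.05) is weak.
query = [0.9, -0.8, 0.05, 0.7]
codes = [[0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0]]
print(match_with_weak_bits(query, codes))
```

Plain nearest-neighbor search by Hamming distance would return whichever tied candidate comes first; the extra cost term is what lets the query's confidence in each bit decide between them.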

    Deep Metric Learning Meets Deep Clustering: An Novel Unsupervised Approach for Feature Embedding

    Unsupervised Deep Distance Metric Learning (UDML) aims to learn sample similarities in the embedding space from an unlabeled dataset. Traditional UDML methods usually use a triplet loss or pairwise loss, which requires mining positive and negative samples with respect to anchor data points. This is challenging in an unsupervised setting, however, because label information is not available. In this paper, we propose a new UDML method that overcomes this challenge. In particular, we propose to use a deep clustering loss to learn centroids, i.e., pseudo labels, that represent semantic classes. During learning, these centroids are also used to reconstruct the input samples, which ensures that the centroids are representative: each centroid represents visually similar samples. The centroids therefore provide information about positive (visually similar) and negative (visually dissimilar) samples. Based on the pseudo labels, we propose a novel unsupervised metric loss that enforces positive concentration and negative separation of samples in the embedding space. Experimental results on benchmark datasets show that the proposed approach outperforms other UDML methods. Comment: Accepted in BMVC 202
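
To make the pseudo-label idea above concrete, the following minimal sketch assigns each embedding to its nearest centroid and treats pairs that share a centroid as positives and all other pairs as negatives. It assumes the centroids are already learned and uses Euclidean distance; the paper's clustering loss, reconstruction term, and metric loss are not reproduced here. The names `nearest_centroid` and `mine_pairs` are hypothetical.

```python
import math


def nearest_centroid(embedding, centroids):
    """Index of the centroid closest to the embedding (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(embedding, centroids[k]))


def mine_pairs(embeddings, centroids):
    """Derive pseudo labels from centroids, then split sample pairs by label.

    Returns (labels, positives, negatives), where positives are index pairs
    sharing a pseudo label and negatives are index pairs with different ones.
    """
    labels = [nearest_centroid(x, centroids) for x in embeddings]
    positives, negatives = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if labels[i] == labels[j]:
                positives.append((i, j))
            else:
                negatives.append((i, j))
    return labels, positives, negatives


# Toy 2-D embeddings grouped around two assumed centroids.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels, positives, negatives = mine_pairs(embeddings, centroids)
print(labels, positives, negatives)
```

The resulting positive and negative pairs could then feed any pairwise or triplet-style loss without ground-truth labels.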

    Combining local features and region segmentation: methods and applications

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 23-01-2020. Full-text access to this thesis is embargoed until 23-07-2021. A huge number of proposals have been developed in the area of computer vision for extracting information from images and putting it to further use. Among the most prominent are those known as local features, which detect points or areas of the image with certain characteristics of interest and describe them using information from their (local) environment. Regions also stand out in this area, and this work has focused in particular on region segmentation algorithms, whose objective is to group the information in the image according to different criteria. Despite the enormous potential of these techniques, and their proven success in a number of applications, their definition implies a series of functional limitations that have prevented them from exporting their capabilities to other application areas. This thesis aims to promote the use of these tools in those applications, and therefore to improve on state-of-the-art results, by proposing a framework for developing new solutions. Specifically, the main hypothesis of the project is that the capabilities of local features and region segmentation algorithms are complementary, and that their combination, carried out in the right way, maximizes those capabilities while minimizing their limitations. The main objective, and therefore the main contribution of the thesis, is to validate this hypothesis by proposing a framework for developing new solutions that combine local features and region segmentation algorithms to obtain techniques with improved capabilities. Because the framework combines two techniques, the validation process was carried out in two steps. The first step considers the use of region segmentation algorithms to enhance local features. To verify the viability and success of this combination, a specific proposal, SP-SIFT, was developed. This proposal was validated both experimentally and in a real application scenario, specifically as the main technique of object tracking algorithms. The second step considers the use of local features to enhance region segmentation algorithms. To verify the viability and success of this combination, a specific proposal, LF-SLIC, was developed. This proposal was validated both experimentally and in a real application scenario, specifically as the main technique of a segmentation algorithm for pigmented skin lesions. The conceptual results showed that the techniques improve at the level of capabilities. The application results showed that these improvements allow the use of these techniques in applications where they were previously unsuccessful. Thus, the hypothesis can be considered validated, and the definition of a framework for developing new techniques with improved capabilities can therefore be considered successful. In conclusion, the main contribution of the thesis is the framework for combining techniques, embodied in its two specific proposals (local features enhanced with region segmentation algorithms, and region segmentation algorithms enhanced with local features), and the success achieved in their applications. The work described in this thesis was carried out within the Video Processing and Understanding Lab at the Departamento de Tecnología Electrónica y de las Comunicaciones, Escuela Politécnica Superior, Universidad Autónoma de Madrid (from 2014 to 2019). It was partially supported by the Spanish Government (TEC2014-53176-R, HAVideo).

    Representing Input Transformations by Low-Dimensional Parameter Subspaces

    Deep models lack robustness to simple input transformations such as rotation, scaling, and translation, unless they feature a particular invariant architecture or undergo specific training, e.g., learning the desired robustness from data augmentations. Alternatively, input transformations can be treated as a domain shift problem and solved by post-deployment model adaptation. Although a large number of methods deal with transformed inputs, the fundamental relation between input transformations and optimal model weights is unknown. In this paper, we put forward the configuration subspace hypothesis: model weights that are optimal for parameterized continuous transformations can reside in low-dimensional linear subspaces. We introduce subspace-configurable networks to learn these subspaces and observe their structure and surprisingly low dimensionality on all tested transformations, datasets, and architectures from the computer vision and audio signal processing domains. Our findings enable efficient model reconfiguration, especially when storage and computing resources are limited.
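
The configuration subspace hypothesis above can be illustrated with a toy sketch: if optimal weights lie in a low-dimensional linear subspace, the weights for a given transformation parameter can be written as a linear combination of a few stored base weight vectors. Everything below is an assumption made for illustration, including the two-dimensional subspace and the `rotation_coeffs` mapping from a rotation angle to coefficients; the paper learns these quantities, and its actual parameterization is not given in the abstract.

```python
import math


def configure_weights(bases, coeffs):
    """Linear combination of base weight vectors: sum_i coeffs[i] * bases[i]."""
    dim = len(bases[0])
    return [sum(c * base[k] for c, base in zip(coeffs, bases)) for k in range(dim)]


def rotation_coeffs(angle):
    """Hypothetical mapping from a rotation angle (radians) to two coefficients."""
    return [math.cos(angle), math.sin(angle)]


# Two stored base weight vectors span the assumed 2-D subspace of a 3-weight model.
bases = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]

# Reconfiguring for a new angle only requires recomputing two coefficients.
weights_at_0 = configure_weights(bases, rotation_coeffs(0.0))
weights_at_90 = configure_weights(bases, rotation_coeffs(math.pi / 2))
print(weights_at_0, weights_at_90)
```

Only the base vectors need to be stored, and adapting to a new transformation reduces to evaluating a handful of coefficients, which is where the storage and compute savings mentioned in the abstract would come from.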