5,402 research outputs found

    Linking Image and Text with 2-Way Nets

    Full text link
    Linking two data sources is a basic building block in numerous computer vision problems. Canonical Correlation Analysis (CCA) achieves this by utilizing a linear optimizer in order to maximize the correlation between the two views. Recent work makes use of non-linear models, including deep learning techniques, that optimize the CCA loss in some feature space. In this paper, we introduce a novel, bi-directional neural network architecture for the task of matching vectors from two data sources. Our approach employs two tied neural network channels that project the two views into a common, maximally correlated space using the Euclidean loss. We show a direct link between the correlation-based loss and Euclidean loss, enabling the use of Euclidean loss for correlation maximization. To overcome common Euclidean regression optimization problems, we modify well-known techniques to our problem, including batch normalization and dropout. We show state of the art results on a number of computer vision matching tasks including MNIST image matching and sentence-image matching on the Flickr8k, Flickr30k and COCO datasets.Comment: 14 pages, 2 figures, 6 table

    Scalable Nonlinear Embeddings for Semantic Category-based Image Retrieval

    Full text link
    We propose a novel algorithm for the task of supervised discriminative distance learning by nonlinearly embedding vectors into a low dimensional Euclidean space. We work in the challenging setting where supervision is with constraints on similar and dissimilar pairs while training. The proposed method is derived by an approximate kernelization of a linear Mahalanobis-like distance metric learning algorithm and can also be seen as a kernel neural network. The number of model parameters and test time evaluation complexity of the proposed method are O(dD) where D is the dimensionality of the input features and d is the dimension of the projection space - this is in contrast to the usual kernelization methods as, unlike them, the complexity does not scale linearly with the number of training examples. We propose a stochastic gradient based learning algorithm which makes the method scalable (w.r.t. the number of training examples), while being nonlinear. We train the method with up to half a million training pairs of 4096 dimensional CNN features. We give empirical comparisons with relevant baselines on seven challenging datasets for the task of low dimensional semantic category based image retrieval.Comment: ICCV 2015 preprin

    An analysis of word embedding spaces and regularities

    Get PDF
    Word embeddings are widely use in several applications due to their ability to capture semantic relationships between words as relations between vectors in high dimensional spaces. One of the main problems to obtain the information is to deal with the phenomena known as the Curse of Dimensionality, the fact that some intuitive results for well known distances are not valid in high dimensional contexts. In this thesis we explore the problem to distinguish between synonyms or antonyms pairs of words and non-related pairs of words attending just to the distance between the words of the pair. We considerer several norms and explore the problem in the two principal kinds of embeddings, GloVe and Word2Vec
    corecore