1,117 research outputs found
Conditional Random Field Autoencoders for Unsupervised Structured Prediction
We introduce a framework for unsupervised learning of structured predictors
with overlapping, global features. Each input's latent representation is
predicted conditional on the observable data using a feature-rich conditional
random field. Then a reconstruction of the input is (re)generated, conditional
on the latent structure, using models for which maximum likelihood estimation
has a closed-form. Our autoencoder formulation enables efficient learning
without making unrealistic independence assumptions or restricting the kinds of
features that can be used. We illustrate insightful connections to traditional
autoencoders, posterior regularization and multi-view learning. We show
competitive results with instantiations of the model for two canonical NLP
tasks: part-of-speech induction and bitext word alignment, and show that
training our model can be substantially more efficient than comparable
feature-rich baselines
Audio self-supervised learning: a survey
Inspired by the humans' cognitive ability to generalise knowledge and skills,
Self-Supervised Learning (SSL) targets at discovering general representations
from large-scale data without requiring human annotations, which is an
expensive and time consuming task. Its success in the fields of computer vision
and natural language processing have prompted its recent adoption into the
field of audio and speech processing. Comprehensive reviews summarising the
knowledge in audio SSL are currently missing. To fill this gap, in the present
work, we provide an overview of the SSL methods used for audio and speech
processing applications. Herein, we also summarise the empirical works that
exploit the audio modality in multi-modal SSL frameworks, and the existing
suitable benchmarks to evaluate the power of SSL in the computer audition
domain. Finally, we discuss some open problems and point out the future
directions on the development of audio SSL
Multi-modal joint embedding for fashion product retrieval
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Finding a product in the fashion world can be a daunting task. Everyday, e-commerce sites are updating with thousands of images and their associated metadata (textual information), deepening the problem, akin to finding a needle in a haystack. In this paper, we leverage both the images and textual meta-data and propose a joint multi-modal embedding that maps both the text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to effectively perform retrieval in this latent space, which is both efficient and accurate. We train this embedding using large-scale real world e-commerce data by both minimizing the similarity between related products and using auxiliary classification networks to that encourage the embedding to have semantic meaning. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset. We also provide an analysis of the different metadata.Peer ReviewedPostprint (author's final draft
Deep supervised learning using local errors
Error backpropagation is a highly effective mechanism for learning
high-quality hierarchical features in deep networks. Updating the features or
weights in one layer, however, requires waiting for the propagation of error
signals from higher layers. Learning using delayed and non-local errors makes
it hard to reconcile backpropagation with the learning mechanisms observed in
biological neural networks as it requires the neurons to maintain a memory of
the input long enough until the higher-layer errors arrive. In this paper, we
propose an alternative learning mechanism where errors are generated locally in
each layer using fixed, random auxiliary classifiers. Lower layers could thus
be trained independently of higher layers and training could either proceed
layer by layer, or simultaneously in all layers using local error information.
We address biological plausibility concerns such as weight symmetry
requirements and show that the proposed learning mechanism based on fixed,
broad, and random tuning of each neuron to the classification categories
outperforms the biologically-motivated feedback alignment learning technique on
the MNIST, CIFAR10, and SVHN datasets, approaching the performance of standard
backpropagation. Our approach highlights a potential biological mechanism for
the supervised, or task-dependent, learning of feature hierarchies. In
addition, we show that it is well suited for learning deep networks in custom
hardware where it can drastically reduce memory traffic and data communication
overheads
- …