Hierarchically Clustered Representation Learning
The joint optimization of representation learning and clustering in the
embedding space has seen a breakthrough in recent years. Despite this
advance, clustering with representation learning has been limited to
flat-level categories, which typically amounts to cohesive clustering
focused on instance-level relations. To overcome the limitations of flat
clustering, we introduce hierarchically-clustered representation learning
(HCRL), which simultaneously optimizes representation learning and
hierarchical clustering in the embedding space. In contrast to the few
prior works, HCRL is the first to consider generating deep embeddings from
every component of the hierarchy, not just from the leaf components. In
addition to obtaining hierarchically clustered embeddings, we can
reconstruct data at various abstraction levels, infer the intrinsic
hierarchical structure, and learn the level-proportion features. We
conducted evaluations on image and text domains, and our quantitative
analyses showed competitive likelihoods and the best accuracies compared
with the baselines.
Comment: 10 pages, 7 figures, under review as a conference paper
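To make the core idea concrete, the following is a minimal, hypothetical sketch in PyTorch, not the authors' HCRL implementation: a VAE whose latent space carries one Gaussian component per node of a small two-level tree, with per-example soft assignments over all nodes loosely standing in for the level-proportion features. The class name, dimensions, and the simplified prior term are assumptions for illustration only.

# Hedged sketch (not the authors' code): a toy VAE whose latent prior is a
# two-level hierarchy of Gaussian components. Every node in the tree -- root,
# internal nodes, and leaves -- owns a component, loosely mirroring the idea
# of generating embeddings from every component of the hierarchy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHCRL(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, n_internal=3, leaves_per_internal=4):
        super().__init__()
        self.n_nodes = 1 + n_internal + n_internal * leaves_per_internal
        # One Gaussian mean per hierarchy node (root + internal + leaf components).
        self.node_means = nn.Parameter(torch.randn(self.n_nodes, z_dim) * 0.1)
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, z_dim)
        self.to_logvar = nn.Linear(256, z_dim)
        # Per-example mixing weights over all hierarchy nodes (soft assignment).
        self.to_node_logits = nn.Linear(256, self.n_nodes)
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        node_probs = F.softmax(self.to_node_logits(h), dim=-1)  # soft node assignment
        x_hat = self.decoder(z)
        # Reconstruction term plus a crude stand-in for the hierarchical prior:
        # pull z toward the responsibility-weighted mixture of node means.
        recon = F.mse_loss(x_hat, x)
        prior_mean = node_probs @ self.node_means
        kl_proxy = ((z - prior_mean) ** 2).mean()
        return x_hat, recon + 0.1 * kl_proxy

model = ToyHCRL()
x = torch.rand(8, 784)
x_hat, loss = model(x)
loss.backward()

In the actual model the crude KL proxy would be replaced by the full ELBO over the hierarchical mixture; the sketch only illustrates how every node of the hierarchy, not only the leaves, can contribute to generating an embedding.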
Combining Sentiment Lexica with a Multi-View Variational Autoencoder
When assigning quantitative labels to a dataset, different methodologies may
rely on different scales. In particular, when assigning polarities to words in
a sentiment lexicon, annotators may use binary, categorical, or continuous
labels. Naturally, it is of interest to unify these labels from disparate
scales, both to achieve maximal coverage over words and to create a single,
more robust sentiment lexicon, while retaining scale coherence. We introduce a
generative model of sentiment lexica to combine disparate scales into a common
latent representation. We realize this model with a novel multi-view
variational autoencoder (VAE), called SentiVAE. We evaluate our approach via a
downstream text classification task involving nine English-language sentiment
analysis datasets; our representation outperforms six individual sentiment
lexica, as well as a straightforward combination thereof.
Comment: To appear in NAACL-HLT 2019
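As a concrete illustration of the multi-view idea, the sketch below is a hypothetical PyTorch toy, not the released SentiVAE code: each word gets a shared latent code, and each lexicon ("view") gets its own decoder head whose likelihood matches that lexicon's label scale, Bernoulli for binary polarity, Categorical for discrete classes, and Gaussian for continuous scores. All names, dimensions, and the non-amortized inference are assumptions for illustration.

# Hedged sketch (not the SentiVAE implementation): a minimal multi-view VAE
# in which one shared latent per word is decoded by view-specific heads with
# likelihoods matching each lexicon's label scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultiViewVAE(nn.Module):
    def __init__(self, n_words, z_dim=8, n_categories=5):
        super().__init__()
        # Amortization is skipped: one free latent (mu, logvar) per word.
        self.mu = nn.Parameter(torch.zeros(n_words, z_dim))
        self.logvar = nn.Parameter(torch.zeros(n_words, z_dim))
        self.binary_head = nn.Linear(z_dim, 1)          # Bernoulli view
        self.cat_head = nn.Linear(z_dim, n_categories)  # Categorical view
        self.cont_head = nn.Linear(z_dim, 1)            # Gaussian view

    def forward(self, word_ids, y_bin, y_cat, y_cont):
        mu, logvar = self.mu[word_ids], self.logvar[word_ids]
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # View-specific negative log-likelihoods: each lexicon observes only
        # labels on its own scale.
        nll_bin = F.binary_cross_entropy_with_logits(
            self.binary_head(z).squeeze(-1), y_bin)
        nll_cat = F.cross_entropy(self.cat_head(z), y_cat)
        nll_cont = F.mse_loss(self.cont_head(z).squeeze(-1), y_cont)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return nll_bin + nll_cat + nll_cont + kl

model = ToyMultiViewVAE(n_words=100)
ids = torch.randint(0, 100, (16,))
loss = model(ids, torch.randint(0, 2, (16,)).float(),
             torch.randint(0, 5, (16,)), torch.randn(16))
loss.backward()

Tying all views to one latent per word is what lets labels from disparate scales inform a single, common sentiment representation.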