Representation Learning for Clustering: A Statistical Framework
We address the problem of communicating domain knowledge from a user to the
designer of a clustering algorithm. We propose a protocol in which the user
provides a clustering of a relatively small random sample of a data set. The
algorithm designer then uses that sample to come up with a data representation
under which k-means clustering results in a clustering (of the full data set)
that is aligned with the user's clustering. We provide a formal statistical
model for analyzing the sample complexity of learning a clustering
representation with this paradigm. We then introduce a notion of capacity of a
class of possible representations, in the spirit of the VC-dimension, showing
that classes of representations that have finite such dimension can be
successfully learned, with error bounds that depend on the sample size, and end our discussion with
an analysis of that dimension for classes of representations induced by linear
embeddings.
Comment: To be published in Proceedings of UAI 201
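The protocol can be made concrete with a toy sketch. This is not the authors' procedure. For each candidate representation, k-means is run on the sample, and its clustering is compared with the user's labels. The candidate with the lowest pairwise disagreement is kept.

The following are all assumptions made for illustration:
- the candidate class (per-feature scalings, a special case of linear embeddings);
- the pairwise-disagreement loss;
- the deterministic farthest-point initialization;
- the data.

```python
import itertools


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def embed(points, weights):
    """Hypothetical representation: scale each feature (a diagonal linear map)."""
    return [tuple(w * x for w, x in zip(weights, p)) for p in points]


def pair_disagreement(labels_a, labels_b):
    """Fraction of point pairs that one clustering joins and the other separates."""
    pairs = list(itertools.combinations(range(len(labels_a)), 2))
    bad = sum((labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j]) for i, j in pairs)
    return bad / len(pairs)


def learn_representation(sample, user_labels, candidates, k):
    """Pick the candidate whose k-means clustering of the sample best matches the user."""
    scored = [(pair_disagreement(kmeans(embed(sample, w), k), user_labels), w)
              for w in candidates]
    loss, best = min(scored)
    return best, loss


# The user groups the sample by the first feature; the second feature is large-scale noise.
sample = [(0.0, 0), (0.1, 10), (0.2, 0), (1.0, 10), (1.1, 0), (1.2, 10)]
user_labels = [0, 0, 0, 1, 1, 1]
candidates = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
best, loss = learn_representation(sample, user_labels, candidates, k=2)
print(best, loss)  # (1.0, 0.0) 0.0
```

In this made-up sample the second feature has a much larger spread. As a result, k-means on the raw features splits along it, and scaling that feature away recovers the user's grouping. Under the protocol, the selected representation would then be applied to the full data set before running k-means.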
Low-Complexity Audio Embedding Extractors
Solving tasks such as speaker recognition, music classification, or semantic
audio event tagging with deep learning models typically requires
computationally demanding networks. General-purpose audio embeddings (GPAEs)
are dense representations of audio signals that allow lightweight, shallow
classifiers to tackle various audio tasks. The idea is that a single complex
feature extractor would extract dense GPAEs, while shallow MLPs can produce
task-specific predictions. If the extracted dense representations are general
enough to allow the simple downstream classifiers to generalize to a variety of
tasks in the audio domain, a single costly forward pass suffices to solve
multiple tasks in parallel. In this work, we try to reduce the cost of GPAE
extractors to make them suitable for resource-constrained devices. We use
efficient MobileNets trained on AudioSet using Knowledge Distillation from a
Transformer ensemble as efficient GPAE extractors. We explore how to obtain
high-quality GPAEs from the model, study how model complexity relates to the
quality of extracted GPAEs, and conclude that low-complexity models can
generate competitive GPAEs, paving the way for analyzing audio streams on edge
devices w.r.t. multiple audio classification and recognition tasks.
Comment: In Proceedings of the 31st European Signal Processing Conference,
EUSIPCO 2023. Source Code available at:
https://github.com/fschmid56/EfficientAT_HEA
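The one-extractor, many-heads pattern can be sketched as follows. The sketch is purely hypothetical:
- `extract_embedding` is a cheap stand-in built from signal statistics, not a MobileNet;
- the task names are invented;
- the head weights are untrained placeholders.

What it does show is the cost structure: the costly extractor runs once per clip, and each task adds only a small MLP on top.

```python
import math

EXTRACTOR_CALLS = 0  # counts how often the expensive extractor runs


def extract_embedding(waveform):
    """Hypothetical stand-in for a GPAE extractor: returns a small dense vector."""
    global EXTRACTOR_CALLS
    EXTRACTOR_CALLS += 1
    n = len(waveform)
    mean = sum(waveform) / n
    energy = sum(x * x for x in waveform) / n
    crossings = sum((a < 0) != (b < 0) for a, b in zip(waveform, waveform[1:])) / (n - 1)
    return [mean, energy, crossings]


class ShallowMLP:
    """Task head: one ReLU hidden layer, then softmax over the task's classes."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def __call__(self, x):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def predict_all(waveform, heads):
    """Compute the embedding once, then let every task head reuse it."""
    embedding = extract_embedding(waveform)
    return {task: head(embedding) for task, head in heads.items()}


# Untrained placeholder weights; the task names are invented for illustration.
heads = {
    "speaker": ShallowMLP([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0],
                          [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    "music_genre": ShallowMLP([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0],
                              [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    "sound_event": ShallowMLP([[1.0, 1.0, 1.0]], [0.0],
                              [[1.0], [-1.0]], [0.0, 0.0]),
}
clip = [math.sin(0.3 * t) for t in range(100)]
predictions = predict_all(clip, heads)
print(EXTRACTOR_CALLS, {task: len(p) for task, p in predictions.items()})
# 1 {'speaker': 2, 'music_genre': 3, 'sound_event': 2}
```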
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Neural models have become ubiquitous in automatic speech recognition systems.
While neural networks are typically used as acoustic models in more complex
systems, recent studies have explored end-to-end speech recognition systems
based on neural networks, which can be trained to directly predict text from
input acoustic features. Although such systems are conceptually elegant and
simpler than traditional systems, it is less obvious how to interpret the
trained models. In this work, we analyze the speech representations learned by
a deep end-to-end model that is based on convolutional and recurrent layers,
and trained with a connectionist temporal classification (CTC) loss. We use a
pre-trained model to generate frame-level features which are given to a
classifier that is trained on frame classification into phones. We evaluate
representations from different layers of the deep model and compare their
quality for predicting phone labels. Our experiments shed light on important
aspects of the end-to-end model such as layer depth, model complexity, and
other design choices.
Comment: NIPS 201
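A layer-wise probing experiment of this kind can be sketched as below. As an assumption, a nearest-centroid probe stands in for the trained frame classifier. The layer names, phone labels and one-dimensional features are invented. In a real setup the features would come from the pre-trained CTC model.

```python
def centroid_probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit one centroid per phone label on training frames; score test frames."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        # Assign the frame to the phone whose centroid is nearest (squared distance).
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    correct = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


# Hypothetical frame-level features from two layers of a pre-trained model.
train_y = ["aa", "aa", "iy", "iy"]
test_y = ["aa", "iy", "aa", "iy"]
layers = {
    "conv1": ([[0.0], [1.0], [0.4], [1.2]], [[0.9], [0.3], [0.4], [1.0]]),
    "rnn3": ([[-1.0], [-1.2], [1.0], [1.2]], [[-0.9], [1.1], [-1.0], [0.8]]),
}
for name, (train_x, test_x) in layers.items():
    print(name, centroid_probe_accuracy(train_x, train_y, test_x, test_y))
# conv1 0.5
# rnn3 1.0
```

Higher probe accuracy is read as the layer encoding more phone information. The synthetic numbers here only illustrate the procedure and are not results from the paper.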
Local Causal States and Discrete Coherent Structures
Coherent structures form spontaneously in nonlinear spatiotemporal systems
and are found at all spatial scales in natural phenomena from laboratory
hydrodynamic flows and chemical reactions to ocean, atmosphere, and planetary
climate dynamics. Phenomenologically, they appear as key components that
organize the macroscopic behaviors in such systems. Despite a century of
effort, they have eluded rigorous analysis and empirical prediction, with
progress being made only recently. As a step in this direction, we present a formal
theory of coherent structures in fully-discrete dynamical field theories. It
builds on the notion of structure introduced by computational mechanics,
generalizing it to a local spatiotemporal setting. The analysis relies mainly
on the local causal states, which are used to uncover a system's hidden
spatiotemporal symmetries and which identify coherent structures as
spatially-localized deviations from those symmetries. The approach is
behavior-driven in the sense that it does not rely on directly analyzing
spatiotemporal equations of motion; rather, it considers only the spatiotemporal
fields a system generates. As such, it offers an unsupervised approach to
discover and describe coherent structures. We illustrate the approach by
analyzing coherent structures generated by elementary cellular automata,
comparing the results with an earlier, dynamic-invariant-set approach that
decomposes fields into domains, particles, and particle interactions.
Comment: 27 pages, 10 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/dcs.ht
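A toy version of local causal state reconstruction for elementary cellular automata is sketched below. It rests on simplifying assumptions that are ours, not the paper's:
- periodic boundaries;
- a speed of light of one cell per step;
- a configurable past lightcone depth and a future lightcone of depth one;
- grouping of past lightcones whose rounded empirical future distributions match exactly.

Practical reconstructions likely need deeper horizons and a tolerance when comparing distributions.

```python
from collections import Counter, defaultdict


def eca_step(row, rule):
    """One synchronous update of an elementary CA (Wolfram rule number), periodic boundaries."""
    n = len(row)
    return [(rule >> (4 * row[(i - 1) % n] + 2 * row[i] + row[(i + 1) % n])) & 1
            for i in range(n)]


def run_eca(init, rule, steps):
    """Return the spacetime field as a list of rows, starting from init."""
    field = [list(init)]
    for _ in range(steps):
        field.append(eca_step(field[-1], rule))
    return field


def past_lightcone(field, t, x, depth):
    """Site (t, x) plus every earlier site within `depth` steps that can influence it."""
    n = len(field[0])
    return tuple(field[t - s][(x + dx) % n]
                 for s in range(depth + 1) for dx in range(-s, s + 1))


def future_lightcone(field, t, x):
    """The three sites at time t + 1 that (t, x) can influence (future depth 1)."""
    n = len(field[0])
    return tuple(field[t + 1][(x + dx) % n] for dx in (-1, 0, 1))


def local_causal_states(field, depth=1, decimals=6):
    """Group past lightcones whose empirical future-lightcone distributions match."""
    futures = defaultdict(Counter)
    for t in range(depth, len(field) - 1):
        for x in range(len(field[0])):
            futures[past_lightcone(field, t, x, depth)][future_lightcone(field, t, x)] += 1
    states = defaultdict(list)
    for past, counts in futures.items():
        total = sum(counts.values())
        signature = frozenset((f, round(c / total, decimals)) for f, c in counts.items())
        states[signature].append(past)
    return list(states.values())


de_bruijn = [0, 0, 0, 1, 0, 1, 1, 1]  # every 3-cell pattern appears once, read cyclically
for rule in (204, 0):
    found = local_causal_states(run_eca(de_bruijn, rule, steps=5), depth=1)
    print(rule, len(found))
# 204 8
# 0 1
```

Under the identity rule 204, each past lightcone determines its future. The eight neighborhood patterns in the de Bruijn initial row therefore give eight states. Under rule 0 every future is blank, so all pasts collapse into a single state.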
Components of cultural complexity relating to emotions: A conceptual framework
Many cultural variations in emotions have been documented in previous research, but a general theoretical framework describing the cultural sources of these variations is still missing. The main goal of the present study was to determine which components of cultural complexity interact with the emotional experience and behavior of individuals. The proposed framework conceptually distinguishes five main components of cultural complexity relating to emotions: 1) emotion language, 2) conceptual knowledge about emotions, 3) emotion-related values, 4) feeling rules, i.e. norms for subjective experience, and 5) display rules, i.e. norms for emotional expression.
Neural Graph Collaborative Filtering
Learning vector representations (a.k.a. embeddings) of users and items lies at
the core of modern recommender systems. Ranging from early matrix factorization
to recently emerged deep learning based methods, existing efforts typically
obtain a user's (or an item's) embedding by mapping from pre-existing features
that describe the user (or the item), such as ID and attributes. We argue that
an inherent drawback of such methods is that the collaborative signal, which
is latent in user-item interactions, is not encoded in the embedding process.
As such, the resultant embeddings may not be sufficient to capture the
collaborative filtering effect.
In this work, we propose to integrate the user-item interactions -- more
specifically the bipartite graph structure -- into the embedding process. We
develop a new recommendation framework Neural Graph Collaborative Filtering
(NGCF), which exploits the user-item graph structure by propagating embeddings
on it. This leads to the expressive modeling of high-order connectivity in
the user-item graph, effectively injecting the collaborative signal into the
embedding process in an explicit manner. We conduct extensive experiments on
three public benchmarks, demonstrating significant improvements over several
state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further
analysis verifies the importance of embedding propagation for learning better
user and item representations, justifying the rationality and effectiveness of
NGCF. Code is available at
https://github.com/xiangwang1223/neural_graph_collaborative_filtering.
Comment: SIGIR 2019; the latest version of the NGCF paper, which is distinct from
the version published in ACM Digital Librar
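The embedding propagation step can be sketched at the node level as follows. We believe this matches NGCF's propagation rule:
- a user u receives the self-message W1 e_u;
- from each neighbor i, u receives (W1 e_i + W2 (e_i ⊙ e_u)) / sqrt(|N_u| |N_i|);
- the summed messages pass through LeakyReLU;
- the layer outputs are concatenated and scored by inner product.

The following are assumptions: the LeakyReLU slope of 0.2, the toy graph and the weights. Dropout and training are omitted. The linked repository remains the reference implementation.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(v, slope=0.2):
    """Element-wise LeakyReLU; the slope is an assumed value."""
    return [x if x > 0 else slope * x for x in v]


def ngcf_layer(emb, neighbors, W1, W2):
    """One round of embedding propagation over the user-item bipartite graph."""
    new = {}
    for node, e_u in emb.items():
        msg = matvec(W1, e_u)  # self-connection message: W1 e_u
        for nb in neighbors[node]:
            e_i = emb[nb]
            norm = 1.0 / math.sqrt(len(neighbors[node]) * len(neighbors[nb]))
            affinity = [a * b for a, b in zip(e_i, e_u)]  # element-wise e_i * e_u
            m = [norm * (p + q) for p, q in zip(matvec(W1, e_i), matvec(W2, affinity))]
            msg = [a + b for a, b in zip(msg, m)]
        new[node] = leaky_relu(msg)
    return new


def ngcf_embeddings(emb0, neighbors, weights):
    """Stack propagation layers and concatenate every layer's output per node."""
    layers = [emb0]
    for W1, W2 in weights:
        layers.append(ngcf_layer(layers[-1], neighbors, W1, W2))
    return {n: [x for layer in layers for x in layer[n]] for n in emb0}


def score(final, user, item):
    """Predicted preference as the inner product of final representations."""
    return sum(a * b for a, b in zip(final[user], final[item]))


# Toy bipartite graph: u1 interacted with i1 and i2; u2 with i2 and i3.
neighbors = {"u1": ["i1", "i2"], "u2": ["i2", "i3"],
             "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2"]}
emb0 = {"u1": [1.0, 0.0], "u2": [0.0, 1.0],
        "i1": [1.0, 0.0], "i2": [0.5, 0.5], "i3": [0.0, 1.0]}
I = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.5, 0.0], [0.0, 0.5]]
final = ngcf_embeddings(emb0, neighbors, [(I, W2), (I, W2)])
print(score(final, "u1", "i1"), score(final, "u1", "i3"))
```

Because u1 reaches i3 only through the shared item i2, its propagated representation stays closer to i1 than to i3 in this toy graph.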
- …