SOM-VAE: Interpretable Discrete Representation Learning on Time Series
High-dimensional time series are common in many domains. Since human
cognition is not optimized to work well in high-dimensional spaces, these areas
could benefit from interpretable low-dimensional representations. However, most
representation learning algorithms for time series data are difficult to
interpret. This is due to non-intuitive mappings from data features to salient
properties of the representation and non-smoothness over time. To address this
problem, we propose a new representation learning framework building on ideas
from interpretable discrete dimensionality reduction and deep generative
modeling. This framework allows us to learn discrete representations of time
series, which give rise to smooth and interpretable embeddings with superior
clustering performance. We introduce a new way to overcome the
non-differentiability in discrete representation learning and present a
gradient-based version of the traditional self-organizing map algorithm that is
more performant than the original. Furthermore, to allow for a probabilistic
interpretation of our method, we integrate a Markov model in the representation
space. This model uncovers the temporal transition structure, improves
clustering performance even further and provides additional explanatory
insights as well as a natural representation of uncertainty. We evaluate our
model in terms of clustering performance and interpretability on static
(Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST
images, a chaotic Lorenz attractor system with two macro states, as well as on
a challenging real world medical time series application on the eICU data set.
Our learned representations compare favorably with competitor methods and
facilitate downstream tasks on the real world data.
Comment: Accepted for publication at the Seventh International Conference on
Learning Representations (ICLR 2019)
Winner-relaxing and winner-enhancing Kohonen maps: Maximal mutual information from enhancing the winner
The magnification behaviour of a generalized family of self-organizing
feature maps, the Winner Relaxing and Winner Enhancing Kohonen algorithms, is
analyzed by the magnification law in the one-dimensional case, which can be
obtained analytically. The Winner-Enhancing case makes it possible to achieve a
magnification exponent of one and therefore provides optimal mapping in the
sense of information theory. A numerical verification of the magnification law
is included, and the ordering behaviour is analyzed. Compared to the original
Self-Organizing Map and some other approaches, the generalized Winner Enforcing
Algorithm requires minimal extra computations per learning step and is
conveniently easy to implement.
Comment: 6 pages, 5 figures. For an extended version refer to cond-mat/0208414
(Neural Computation 17, 996-1009)
Winner-Relaxing Self-Organizing Maps
A new family of self-organizing maps, the Winner-Relaxing Kohonen Algorithm,
is introduced as a generalization of a variant given by Kohonen in 1991. The
magnification behaviour is calculated analytically. For the original variant a
magnification exponent of 4/7 is derived; the generalized version allows the
magnification to be steered over the wide range from exponent 1/2 to 1 in the
one-dimensional case, and thus provides optimal mapping in the sense of information
theory. The Winner Relaxing Algorithm requires minimal extra computations per
learning step and is conveniently easy to implement.
Comment: 14 pages (6 figs included). To appear in Neural Computation
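For orientation, the following is a minimal sketch of the standard one-dimensional Kohonen update that the abstract above generalizes. It is not the Winner-Relaxing variant itself, whose extra winner term is not given in this listing. The function name `train_som_1d` and all parameter values (`n_units`, `n_steps`, `eps`, `sigma`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def train_som_1d(samples, n_units=20, n_steps=5000, eps=0.1, sigma=2.0, seed=0):
    """Train a standard 1-D Kohonen map on scalar samples (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    weights = rng.random(n_units)       # initial weights drawn from [0, 1)
    positions = np.arange(n_units)      # unit indices on the 1-D lattice
    for _ in range(n_steps):
        x = rng.choice(samples)                     # draw one training stimulus
        winner = np.argmin(np.abs(weights - x))     # best-matching unit
        # Gaussian neighbourhood on the lattice, centred on the winner
        h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
        # Standard Kohonen step: move every unit towards x, scaled by h
        weights += eps * h * (x - weights)
    return weights

# Hypothetical usage with uniformly distributed stimuli in [0, 1)
samples = np.random.default_rng(1).random(1000)
weights = train_som_1d(samples)
```

Because every step is a convex combination of a weight and a stimulus, the weights stay inside the range spanned by the initial weights and the data. The magnification exponent discussed in the abstract describes how the density of the trained weights scales with the density of the stimuli.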
Probabilistic estimation of microarray data reliability and underlying gene expression
Background: The availability of high throughput methods for measurement of
mRNA concentrations makes the reliability of conclusions drawn from the data
and global quality control of samples and hybridization important issues. We
address these issues by an information theoretic approach, applied to
discretized expression values in replicated gene expression data.
Results: Our approach yields a quantitative measure of two important
parameter classes: First, the probability that a gene is in a given
biological state in a certain variety, given its observed expression
in the samples of that variety. Second, sample-specific error probabilities
which serve as consistency indicators of the measured samples of each variety.
The method and its limitations are tested on gene expression data for
developing murine B-cells and a t-test is used as reference. On a set of
known genes it performs better than the t-test despite the crude
discretization into only two expression levels. The consistency indicators,
i.e. the error probabilities, correlate well with variations in the biological
material and thus prove efficient.
Conclusions: The proposed method is effective in determining differential
gene expression and sample reliability in replicated microarray data. Already
at two discrete expression levels in each sample, it gives a good explanation
of the data and is comparable to standard techniques.
Comment: 11 pages, 4 figures
Background modeling by shifted tilings of stacked denoising autoencoders
The effective processing of visual data without interruption is currently of supreme importance. For that purpose, the analysis system must adapt to events that may affect the data quality and maintain its performance level over time. A methodology for background modeling and foreground detection, whose main characteristic is its robustness against stationary noise, is presented in the paper. The system is based on a stacked denoising autoencoder which extracts a set of significant features for each patch of several shifted tilings of the video frame. A probabilistic model for each patch is learned. The distinct patches which include a particular pixel are considered for that pixel's classification. The experiments show that classical methods existing in the literature experience drastic performance drops when noise is present in the video sequences, whereas the proposed one seems to be only slightly affected. This fact corroborates the idea of robustness of our proposal, in addition to its usefulness for the processing and analysis of continuous data during uninterrupted periods of time.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
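The idea that several shifted tilings each contribute one patch per pixel can be sketched as follows. This is only an illustration of the indexing. The patch size, the offsets, and the function name `covering_patches` are assumptions, and the way the paper combines the per-patch probabilistic models is not described in this abstract.

```python
def covering_patches(x, y, patch=8, offsets=((0, 0), (4, 4))):
    """Return (tiling index, patch row, patch column) for each tiling's patch covering pixel (x, y).

    Tiling t is assumed to have patch boundaries at k * patch - offset,
    so the pixel's patch index is floor((coordinate + offset) / patch).
    """
    result = []
    for t, (dx, dy) in enumerate(offsets):
        col = (x + dx) // patch
        row = (y + dy) // patch
        result.append((t, row, col))
    return result

# Pixel (5, 9) lies in a different patch in each of the two assumed tilings
print(covering_patches(5, 9))  # [(0, 1, 0), (1, 1, 1)]
```

With more tilings, each pixel is covered by more overlapping patches, so more per-patch models can be consulted when classifying it.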
Minimum Energy Information Fusion in Sensor Networks
In this paper we consider how to organize the sharing of information in a
distributed network of sensors and data processors so as to provide
explanations for sensor readings with minimal expenditure of energy. We point
out that the Minimum Description Length principle provides an approach to
information fusion that is more naturally suited to energy minimization than
traditional Bayesian approaches. In addition we show that for networks
consisting of a large number of identical sensors Kohonen self-organization
provides an exact solution to the problem of combining the sensor outputs into
minimal description length explanations.
Comment: postscript, 8 pages. Paper 65 in Proceedings of The 2nd International
Conference on Information Fusion
Measuring concept similarities in multimedia ontologies: analysis and evaluations
The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing.