27,597 research outputs found
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measures correspond to building a dendrogram with single
linkage criterion, with defining specific forms of a level function and a
distance function over that. Therefore, we extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features sequentially in the spirit of multi-layered
architectures to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies
Differentially Private Mixture of Generative Neural Networks
Generative models are used in a wide range of applications building on large
amounts of contextually rich information. Due to possible privacy violations of
the individuals whose data is used to train these models, however, publishing
or sharing generative models is not always viable. In this paper, we present a
novel technique for privately releasing generative models and entire
high-dimensional datasets produced by these models. We model the generator
distribution of the training data with a mixture of generative neural
networks. These are trained together and collectively learn the generator
distribution of a dataset. Data is divided into clusters, using a novel
differentially private kernel -means, then each cluster is given to separate
generative neural networks, such as Restricted Boltzmann Machines or
Variational Autoencoders, which are trained only on their own cluster using
differentially private gradient descent. We evaluate our approach using the
MNIST dataset, as well as call detail records and transit datasets, showing
that it produces realistic synthetic samples, which can also be used to
accurately compute arbitrary number of counting queries.Comment: A shorter version of this paper appeared at the 17th IEEE
International Conference on Data Mining (ICDM 2017). This is the full
version, published in IEEE Transactions on Knowledge and Data Engineering
(TKDE
Saliency-guided integration of multiple scans
we present a novel method..
- …