A Note on Clustering Aggregation
We consider the clustering aggregation problem in which we are given a set of
clusterings and want to find an aggregated clustering which minimizes the sum
of mismatches to the input clusterings. In the binary case (each clustering is
a bipartition) this problem was known to be NP-hard under Turing reduction. We
strengthen this result by providing a polynomial-time many-one reduction. Our
result also implies that no -time algorithm exists
for any clustering instance with elements, unless the Exponential Time
Hypothesis fails. On the positive side, we show that the problem is
fixed-parameter tractable with respect to the number of input clusterings.
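As a rough illustration of the objective described in this abstract, the sketch below counts the mismatches between two clusterings and sums that count over all input clusterings. It assumes that a mismatch is a pair of elements grouped together by one clustering and separated by the other, and that each clustering is encoded as a list of cluster labels; both the encoding and the function names are hypothetical. Picking the cheapest input clustering is only a simple baseline, not the method of the paper.

```python
from itertools import combinations


def mismatches(a, b):
    """Count element pairs on which clusterings a and b disagree.

    a and b are label lists of equal length; a pair (i, j) is a mismatch
    when exactly one of the two clusterings puts i and j together.
    """
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def total_cost(candidate, clusterings):
    """Sum of mismatches between a candidate and every input clustering."""
    return sum(mismatches(candidate, c) for c in clusterings)


def best_input_clustering(clusterings):
    """Baseline: return the input clustering with the smallest total cost."""
    return min(clusterings, key=lambda c: total_cost(c, clusterings))


if __name__ == "__main__":
    inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
    best = best_input_clustering(inputs)
    print(best, total_cost(best, inputs))
```

The count depends only on which elements share a label, so renaming the labels of a clustering does not change the cost.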
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measures correspond to building a dendrogram with the single
linkage criterion and defining specific forms of a level function and a
distance function over it. Therefore, we extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features in the spirit of multi-layered
architectures to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies.
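To make the Minimax distance mentioned in this abstract concrete, here is a minimal sketch. It assumes the usual definition: the Minimax distance between two objects is the smallest achievable value, over all paths connecting them, of the largest pairwise distance along the path. The function name and the dense-matrix input are hypothetical, and the cubic-time update is chosen for clarity rather than efficiency; it is not the framework of the paper.

```python
def minimax_distances(dist):
    """All-pairs Minimax distances from a symmetric distance matrix.

    Uses a Floyd-Warshall style update in which the length of a path is
    its largest edge, and the best path is the one with the smallest
    such length.
    """
    n = len(dist)
    mm = [row[:] for row in dist]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(mm[i][k], mm[k][j])
                if via_k < mm[i][j]:
                    mm[i][j] = via_k
    return mm


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0, 10.0]  # four points on a line
    dist = [[abs(p - q) for q in points] for p in points]
    for row in minimax_distances(dist):
        print(row)
```

For points on a line, the Minimax distance between two points is the largest gap between consecutive points lying between them, which matches the height at which single linkage would merge their clusters.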
A Multiscale Approach for Statistical Characterization of Functional Images
Increasingly, scientific studies yield functional image data, in which the observed data consist of sets of curves recorded on the pixels of the image. Examples include temporal brain response intensities measured by fMRI and NMR frequency spectra measured at each pixel. This article presents a new methodology for improving the characterization of pixels in functional imaging, formulated as a spatial curve clustering problem. Our method operates on curves as a unit. It is nonparametric and involves multiple stages: (i) wavelet thresholding, aggregation, and Neyman truncation to effectively reduce dimensionality; (ii) clustering based on an extended EM algorithm; and (iii) multiscale penalized dyadic partitioning to create a spatial segmentation. We motivate the different stages with theoretical considerations and arguments, and illustrate the overall procedure on simulated and real datasets. Our method appears to offer substantial improvements over monoscale pixel-wise methods. An appendix giving some theoretical justifications of the methodology, together with computer code, documentation, and the dataset, is available in the online supplements.
Optimization for L1-Norm Error Fitting via Data Aggregation
We propose a data aggregation-based algorithm with monotonic convergence to a
global optimum for a generalized version of the L1-norm error fitting model
under an assumption on the fitting function. The proposed algorithm generalizes
the recent algorithm in the literature, aggregate and iterative disaggregate
(AID), which selectively solves three specific L1-norm error fitting problems.
With the proposed algorithm, any L1-norm error fitting model can be solved
optimally if it follows the form of the L1-norm error fitting problem and if
the fitting function satisfies the assumption. The proposed algorithm can also
solve multi-dimensional fitting problems with arbitrary constraints on the
fitting coefficients matrix. The generalized problem includes popular models
such as regression and the orthogonal Procrustes problem. The results of the
computational experiment show that the proposed algorithms are faster than the
state-of-the-art benchmarks for L1-norm regression subset selection and L1-norm
regression over a sphere. Further, the relative performance of the proposed
algorithm improves as data size increases.
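For readers unfamiliar with L1-norm error fitting, the sketch below evaluates the L1 error of a straight-line fit and compares two constant fits on data with one outlier. It illustrates only the error measure, not the aggregation-based algorithm of the paper; the function name, the linear model, and the sample data are assumptions chosen for illustration. It relies on the standard fact that the median minimizes the sum of absolute deviations.

```python
from statistics import mean, median


def l1_error(xs, ys, slope, intercept):
    """Sum of absolute residuals of the line y = slope * x + intercept."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [1, 2, 3, 4, 100]  # the last observation is an outlier

    # Constant models (slope 0): compare the median and the mean as fits.
    print(l1_error(xs, ys, 0, median(ys)))
    print(l1_error(xs, ys, 0, mean(ys)))
```

On this data the median-based fit has a smaller L1 error than the mean-based fit, which reflects the lower sensitivity of the L1 norm to outliers.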