EFFICIENT APPROXIMATION FOR LARGE-SCALE KERNEL CLUSTERING ANALYSIS
Kernel k-means is useful for clustering nonlinearly separable data, but it is hard to scale to large data sets because of its quadratic complexity. In this paper, we propose an approach that uses a low-dimensional feature approximation of the Gaussian kernel function so that a fast linear k-means solver can perform the nonlinear kernel k-means. This approach combines the efficiency of the linear solver with the nonlinear partitioning ability of kernel clustering. The experimental results show that the proposed approach is much more efficient than a standard kernel k-means solver and achieves similar clustering performance.
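The idea in the abstract above can be sketched as follows. The abstract does not say which low-dimensional feature approximation is used, so this sketch assumes random Fourier features as one common choice; the function names `rff_features` and `linear_kmeans` and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def rff_features(X, n_features=200, gamma=1.0, seed=0):
    """Map X (n, d) to Z (n, n_features) so that Z @ Z.T approximates
    the Gaussian kernel exp(-gamma * ||x - y||^2).
    Assumption: random Fourier features, not necessarily the paper's map."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the Gaussian kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def linear_kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iterations on the explicit features Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distances from every point to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = labels == j
            if np.any(members):  # leave an empty cluster's center unchanged
                centers[j] = Z[members].mean(axis=0)
    return labels

# Toy data: two concentric rings, which are not linearly separable.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
radii = np.where(np.arange(200) < 100, 1.0, 4.0)
X_demo = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
labels_demo = linear_kmeans(rff_features(X_demo, gamma=0.5), k=2)
```

Because the clustering runs on an explicit feature matrix, each Lloyd iteration costs time linear in the number of points, rather than requiring the full n-by-n kernel matrix.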
Selective inference after convex clustering with $\ell_1$ penalization
Classical inference methods notoriously fail when applied to data-driven test
hypotheses or inference targets. Instead, dedicated methodologies are required
to obtain statistical guarantees for these selective inference problems.
Selective inference is particularly relevant post-clustering, typically when
testing a difference in mean between two clusters. In this paper, we address
convex clustering with $\ell_1$ penalization, by leveraging related selective
inference tools for regression, based on Gaussian vectors conditioned to
polyhedral sets. In the one-dimensional case, we prove a polyhedral
characterization of obtaining given clusters, which enables us to suggest a test
procedure with statistical guarantees. This characterization also allows us to
provide a computationally efficient regularization path algorithm. Then, we
extend the above test procedure and guarantees to multi-dimensional clustering
with $\ell_1$ penalization, and also to more general multi-dimensional
clusterings that aggregate one-dimensional ones. With various numerical
experiments, we validate our statistical guarantees and we demonstrate the
power of our methods to detect differences in mean between clusters. Our
methods are implemented in the R package poclin.
Comment: 40 pages, 8 figures
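For orientation, the one-dimensional problem discussed in the abstract above can be written in the usual convex clustering form. This is a sketch assuming the standard formulation with an $\ell_1$ fusion penalty; the symbols $x_i$ (observations), $b_i$ (fitted values) and $\lambda$ (regularization parameter) are notation chosen here and may differ from the paper's.

```latex
\hat{b}(\lambda) \in \operatorname*{arg\,min}_{b \in \mathbb{R}^n}
  \; \frac{1}{2} \sum_{i=1}^{n} (b_i - x_i)^2
  \; + \; \lambda \sum_{1 \le i < j \le n} \lvert b_i - b_j \rvert
```

Under this formulation, observations $i$ and $j$ fall in the same cluster when $\hat{b}_i(\lambda) = \hat{b}_j(\lambda)$, and increasing $\lambda$ merges clusters along the regularization path.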
Roto-Translation Covariant Convolutional Networks for Medical Image Analysis
We propose a framework for rotation and translation covariant deep learning
using group convolutions. The group product of the special Euclidean
motion group SE(2) describes how a concatenation of two roto-translations
results in a net roto-translation. We encode this geometric structure into
convolutional neural networks (CNNs) via SE(2) group convolutional layers,
which fit into the standard 2D CNN framework, and which allow rotated input
samples to be handled generically without the need for data augmentation.
We introduce three layers: a lifting layer which lifts a 2D (vector valued)
image to an SE(2)-image, i.e., 3D (vector valued) data whose domain is
SE(2); a group convolution layer from and to an SE(2)-image; and a
projection layer from an SE(2)-image to a 2D image. The lifting and group
convolution layers are covariant (the output roto-translates with the
input). The final projection layer, a maximum intensity projection over
rotations, makes the full CNN rotation invariant.
We show with three different problems in histopathology, retinal imaging, and
electron microscopy that with the proposed group CNNs, state-of-the-art
performance can be achieved, without the need for data augmentation by rotation
and with increased performance compared to standard CNNs that do rely on
augmentation.
Comment: 8 pages, 2 figures, 1 table, accepted at MICCAI 201
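The lifting and projection layers described in the abstract above can be illustrated with a much reduced sketch. It assumes only the four rotations by multiples of 90 degrees, so that kernels can be rotated exactly with `np.rot90`; the abstract does not state how the rotation group is sampled. The names `correlate_valid`, `lift` and `project` are hypothetical, and the intermediate group convolution layer is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(img, kernel):
    """2D cross-correlation over all fully overlapping kernel positions."""
    windows = sliding_window_view(img, kernel.shape)
    return np.einsum("ijab,ab->ij", windows, kernel)

def lift(img, kernel):
    """Lifting layer sketch: correlate the image with rotated copies of one
    kernel, giving a stack indexed by (rotation, row, column)."""
    return np.stack([correlate_valid(img, np.rot90(kernel, r)) for r in range(4)])

def project(lifted):
    """Projection layer sketch: maximum intensity projection over rotations."""
    return lifted.max(axis=0)

rng = np.random.default_rng(0)
img = rng.normal(size=(9, 9))
kernel = rng.normal(size=(3, 3))

lifted = lift(img, kernel)
lifted_rot = lift(np.rot90(img), kernel)

# Covariance: rotating the input rotates each spatial slice and cyclically
# shifts the rotation axis by one step.
covariant = all(
    np.allclose(lifted_rot[r], np.rot90(lifted[(r - 1) % 4])) for r in range(4)
)

# After the max over rotations, the shift along the rotation axis disappears
# and only the spatial rotation of the output remains.
projected_matches = np.allclose(project(lifted_rot), np.rot90(project(lifted)))
```

In this reduced setting the check is exact up to floating-point rounding, because a 90-degree rotation maps the pixel grid onto itself; finer rotation sampling would need interpolated kernels.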
Quantum Cosmological Relational Model of Shape and Scale in 1-d
Relational particle models are useful toy models for quantum cosmology and
the problem of time in quantum general relativity. This paper shows how to
extend existing work on concrete examples of relational particle models in 1-d
to include a notion of scale. This is useful as regards forming a tight analogy
with quantum cosmology and the emergent semiclassical time and hidden time
approaches to the problem of time. This paper shows furthermore that the
correspondence between relational particle models and classical and quantum
cosmology can be strengthened using judicious choices of the mechanical
potential. This gives relational particle mechanics models with analogues of
spatial curvature, cosmological constant, dust and radiation terms. A number of
these models are then tractable at the quantum level. These models can be used
to study important issues (1) in canonical quantum gravity: the problem of time,
the semiclassical approach to it and timeless approaches to it (such as the
naive Schrödinger interpretation and records theory); and (2) in quantum cosmology,
such as the investigation of uniform states, robustness, and the qualitative
understanding of the origin of structure formation.
Comment: References and some more motivation added
Land use mapping and modelling for the Phoenix Quadrangle
The author has identified the following significant results. The mapping of generalized land use (level 1) from ERTS 1 images was shown to be feasible with better than 95% accuracy in the Phoenix quadrangle. The accuracy of level 2 mapping in urban areas is still a problem. Updating existing maps also proved to be feasible, especially in water categories and agricultural uses; however, expanding urban growth has presented problems with accuracy. ERTS 1 film images indicated where areas of change were occurring, which helped to focus more detailed investigation. ERTS color composite transparencies provided a cost-effective source of information for land use mapping of very large regions at small map scales.
Onset of an outline map to get a hold on the wildwood of clustering methods
The domain of cluster analysis is a meeting point for a very rich
multidisciplinary encounter, with cluster-analytic methods being studied and
developed in discrete mathematics, numerical analysis, statistics, data
analysis and data science, and computer science (including machine learning,
data mining, and knowledge discovery), to name but a few. The other side of the
coin, however, is that the domain suffers from a major accessibility problem as
well as from the fact that it is rife with division across many pretty isolated
islands. As a way out, the present paper offers an outline map for the
clustering domain as a whole, which takes the form of an overarching conceptual
framework and a common language. With this framework we wish to contribute to
structuring the domain, to characterizing methods that have often been
developed and studied in quite different contexts, to identifying links between
them, and to introducing a frame of reference for optimally setting up cluster
analyses in data-analytic practice.
Comment: 33 pages, 4 figures