Representation Learning for Clustering: A Statistical Framework
We address the problem of communicating domain knowledge from a user to the
designer of a clustering algorithm. We propose a protocol in which the user
provides a clustering of a relatively small random sample of a data set. The
algorithm designer then uses that sample to come up with a data representation
under which k-means clustering results in a clustering (of the full data set)
that is aligned with the user's clustering. We provide a formal statistical
model for analyzing the sample complexity of learning a clustering
representation with this paradigm. We then introduce a notion of capacity of a
class of possible representations, in the spirit of the VC-dimension, showing
that classes of representations that have finite such dimension can be
successfully learned with sample size error bounds, and end our discussion with
an analysis of that dimension for classes of representations induced by linear
embeddings.
Comment: To be published in Proceedings of UAI 201
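The protocol in this abstract can be illustrated with a toy sketch. It assumes a finite class of three hypothetical linear maps (`identity`, `axis0`, `axis1`), a deterministic Lloyd-style k-means, and the Rand index as the measure of agreement with the user's clustering. All of these are illustrative choices and not necessarily the loss or algorithm analyzed in the paper.

```python
from itertools import combinations

def kmeans_labels(points, k=2, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical class of candidate representations (all linear maps).
REPRESENTATIONS = {
    "identity": lambda p: p,
    "axis0": lambda p: (p[0],),
    "axis1": lambda p: (p[1],),
}

def choose_representation(sample, user_labels, k=2):
    """Pick the representation whose k-means clustering best matches the user's."""
    scores = {
        name: rand_index(user_labels, kmeans_labels([f(p) for p in sample], k))
        for name, f in REPRESENTATIONS.items()
    }
    return max(scores, key=scores.get), scores

# The user clusters a small sample by the first coordinate; the second is noise.
sample = [(0, -10), (0, -9), (0, 9), (0, 10), (1, -10), (1, -9), (1, 9), (1, 10)]
user_labels = [0, 0, 0, 0, 1, 1, 1, 1]
best, scores = choose_representation(sample, user_labels)
```

On this sample, k-means in the original coordinates splits along the noisy second axis, so only the projection onto the first axis reproduces the user's clustering.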
ε-Coresets for Clustering (with Outliers) in Doubling Metrics
We study the problem of constructing $\varepsilon$-coresets for the $(k, z)$-clustering problem in a doubling metric $M(X, d)$. An $\varepsilon$-coreset
is a weighted subset $S \subseteq X$ with weight function $w : S \to \mathbb{R}_{\geq 0}$, such that for any $k$-subset $C \in [X]^k$, it holds that
$\sum_{x \in S} w(x) \cdot d^z(x, C) \in (1 \pm \varepsilon) \cdot \sum_{x \in X} d^z(x, C)$.
We present an efficient algorithm that constructs an $\varepsilon$-coreset
for the $(k, z)$-clustering problem in $M(X, d)$, where the size of the coreset
only depends on the parameters $k, z, \varepsilon$ and the doubling dimension
of $M$. To the best of our knowledge, this is the first efficient
$\varepsilon$-coreset construction of size independent of $|X|$ for general
clustering problems in doubling metrics.
To this end, we establish the first relation between the doubling dimension
of the metric $M(X, d)$ and the shattering dimension (or VC-dimension) of the range space
induced by the distance $d$. Such a relation was not known before, since one
can easily construct instances in which neither one can be bounded by (some
function of) the other. Surprisingly, we show that if we allow a small
$(1 \pm \varepsilon)$-distortion of the distance function $d$, and consider an
error-tolerant notion of probabilistic shattering dimension, we can prove an
upper bound on the probabilistic shattering dimension,
even for weighted doubling metrics. We believe this new relation is of independent
interest and may find other applications.
We also study the robust coresets and centroid sets in doubling metrics. Our
robust coreset construction leads to new results in clustering and property
testing, and the centroid sets can be used to accelerate the local search
algorithms for clustering problems.
Comment: Appeared in FOCS 2018, this is the full versio
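The coreset condition in this abstract compares a weighted clustering cost on a subset with the unweighted cost on the full point set. The sketch below checks that condition for one given set of centers, assuming Euclidean points (one example of a doubling metric) and hypothetical helper names `clustering_cost` and `is_coreset_for`. A true coreset must satisfy the condition for every choice of k centers, which this check does not verify, and the sketch says nothing about the paper's construction.

```python
def clustering_cost(points, centers, z=2, weights=None):
    """Sum of w(x) * d(x, C)^z, where d(x, C) is the distance to the nearest centre."""
    if weights is None:
        weights = [1.0] * len(points)
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return sum(
        w * min(dist(p, c) for c in centers) ** z
        for p, w in zip(points, weights)
    )

def is_coreset_for(points, subset, weights, centers, eps, z=2):
    """Check the (1 +/- eps) cost condition for this one set of centres only."""
    full = clustering_cost(points, centers, z)
    approx = clustering_cost(subset, centers, z, weights)
    return (1 - eps) * full <= approx <= (1 + eps) * full

# Eight points with heavy duplication: collapsing duplicates into two
# weighted points preserves the cost for any centres.
points = [(0.0, 0.0)] * 4 + [(10.0, 0.0)] * 4
subset = [(0.0, 0.0), (10.0, 0.0)]
weights = [4.0, 4.0]

good = is_coreset_for(points, subset, weights, [(1.0, 0.0), (9.0, 0.0)], eps=1e-9)
# A single point carrying all the weight fails badly for a centre placed on it.
bad = is_coreset_for(points, [(0.0, 0.0)], [8.0], [(0.0, 0.0)], eps=0.1)
```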
Importance of small earthquakes for stress transfers and earthquake triggering
We estimate the relative importance of small and large earthquakes for static
stress changes and for earthquake triggering, assuming that earthquakes are
triggered by static stress changes and that earthquakes are located on a
fractal network of dimension D. This model predicts that both the number of
events triggered by an earthquake of magnitude m and the stress change induced
by this earthquake at the location of other earthquakes increase with m as
~10^(Dm/2). The stronger the spatial clustering, the larger the influence of
small earthquakes on stress changes at the location of a future event as well
as earthquake triggering. If earthquake magnitudes follow the Gutenberg-Richter
law with b>D/2, small earthquakes collectively dominate stress transfer and
earthquake triggering, because their greater frequency overcomes their smaller
individual triggering potential. Using a Southern-California catalog, we
observe that the rate of seismicity triggered by an earthquake of magnitude m
increases with m as 10^(alpha m), where alpha=1.00±0.05. We also find that the
magnitude distribution of triggered earthquakes is independent of the
triggering earthquake magnitude m. When alpha=b, small earthquakes are roughly
as important to earthquake triggering as larger ones. We evaluate the fractal
correlation dimension of hypocenters D=2 using two relocated catalogs for
Southern California, and removing the effect of short-term clustering. Thus
D=2alpha as predicted by assuming that earthquake triggering is due to static
stress. The value D=2 implies that small earthquakes are as important as larger
ones for stress transfers between earthquakes.
Comment: 14 pages, 7 eps figures, latex. In press in J. Geophys. Re
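The balance between frequency and individual triggering potential described in this abstract can be made concrete with a short calculation. The sketch assumes that the number of events of magnitude m scales as 10^(-b m) (the Gutenberg-Richter law) and that each event triggers a number of events proportional to 10^(alpha m); the function name and the sample parameter values are illustrative only.

```python
def relative_contribution(m, b=1.0, alpha=1.0):
    """Collective triggering weight of magnitude m, relative to m = 0.

    Assumes a count proportional to 10^(-b m) and a per-event
    productivity proportional to 10^(alpha m), so the product
    scales as 10^((alpha - b) m).
    """
    return 10.0 ** ((alpha - b) * m)

magnitudes = [2, 3, 4, 5, 6]

# alpha = b: every magnitude unit contributes equally.
balanced = [relative_contribution(m, b=1.0, alpha=1.0) for m in magnitudes]

# alpha < b: the contribution shrinks with m, so small events dominate.
small_dominate = [relative_contribution(m, b=1.0, alpha=0.8) for m in magnitudes]
```

With alpha equal to b the product is flat in m, which is the sense in which small earthquakes are roughly as important as larger ones.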
Ewald Sums for One Dimension
We derive analytic solutions for the potential and field in a one-dimensional
system of masses or charges with periodic boundary conditions, in other words
Ewald sums for one dimension. We also provide a set of tools for exploring the
system evolution and show that it's possible to construct an efficient
algorithm for carrying out simulations. In the cosmological setting we show
that two approaches for satisfying periodic boundary conditions, one overly
specified and the other completely general, provide a nearly identical
clustering evolution until the number of clusters becomes small, at which time
the influence of any size-dependent boundary cannot be ignored. Finally we
compare the results with other recent work with the hope of providing
clarification over differences these issues have induced. We explain that
modern formulations of physics require a well defined potential which is not
available if the forces are screened directly.
Comment: 2 figures added references expanded discussion of algorithm corrected
figures added discussion of screened forc
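As a numerical illustration of a periodic one-dimensional field, the sketch below compares two expressions for the field at separation u from a single source in a cell of length L, up to a constant prefactor and an overall sign that depend on whether masses or charges are meant. The first is a sum over periodic replicas with an exponential screening factor of small strength `kappa`. The second is the closed form sign(u) - 2u/L, which solves the 1D Poisson equation with a uniform neutralizing background for 0 < |u| < L/2. Both the function names and the screening device are assumptions made for illustration; this is not claimed to be the authors' derivation, which the abstract says works from a well-defined potential rather than from directly screened forces.

```python
import math

def screened_replica_field(u, L=1.0, kappa=1e-3, n_max=20000):
    """Sum of sign(u - nL) * exp(-kappa * |u - nL|) over replicas n = -n_max..n_max."""
    total = 0.0
    for n in range(-n_max, n_max + 1):
        r = u - n * L
        if r != 0.0:  # a source exerts no field on its own position
            total += math.copysign(1.0, r) * math.exp(-kappa * abs(r))
    return total

def closed_form_field(u, L=1.0):
    """sign(u) - 2u/L, valid for 0 < |u| < L/2."""
    return math.copysign(1.0, u) - 2.0 * u / L

# Largest disagreement over a few separations inside the cell.
max_error = max(
    abs(screened_replica_field(u) - closed_form_field(u))
    for u in (0.1, 0.25, -0.4)
)
```

For small `kappa` the two agree closely, and the linear term -2u/L is what makes the field vanish at the cell edges u = ±L/2.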