Incompatibility boundaries for properties of community partitions
We prove the incompatibility of certain desirable properties of community
partition quality functions. Our results generalize the impossibility result of
[Kleinberg 2003] by considering sets of weaker properties. In particular, we
use an alternative notion to resolve the central issue of the consistency
property (the latter means that modifying the graph in a way consistent with a
partition should not have counterintuitive effects). Our results clearly show
that community partition methods should not be expected to perfectly satisfy
all ideally desired properties.
We then proceed to show that this incompatibility no longer holds when
slightly relaxed versions of the properties are considered, and we provide in
fact examples of simple quality functions satisfying these relaxed properties.
An experimental study of these quality functions shows a behavior comparable to
established methods in some situations, but more debatable results in others.
This suggests that defining a notion of a good partition into communities probably
requires imposing additional properties.
Comment: 17 pages, 3 figures
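To make the notion of a partition quality function concrete, here is a minimal sketch of one very simple quality function, "coverage" (the fraction of edges that fall inside communities). This is a common baseline for illustration only; it is not one of the paper's relaxed quality functions.

```python
# Hypothetical illustration: "coverage" scores a partition by the fraction
# of edges whose endpoints share a community. Higher is better, but coverage
# alone is trivially maximized by the all-in-one partition, which is exactly
# why richer axiomatic properties are studied.

def coverage(edges, partition):
    """edges: iterable of (u, v) pairs; partition: dict node -> community id."""
    edges = list(edges)
    if not edges:
        return 0.0
    inside = sum(1 for u, v in edges if partition[u] == partition[v])
    return inside / len(edges)

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(coverage(edges, part))  # 6 of 7 edges are intra-community
```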
Incremental Clustering: The Case for Extra Clusters
The explosion in the amount of data available for analysis often necessitates
a transition from batch to incremental clustering methods, which process one
element at a time and typically store only a small subset of the data. In this
paper, we initiate the formal analysis of incremental clustering methods
focusing on the types of cluster structure that they are able to detect. We
find that the incremental setting is strictly weaker than the batch model,
proving that a fundamental class of cluster structures that can readily be
detected in the batch setting is impossible to identify using any incremental
method. Furthermore, we show how the limitations of incremental clustering can
be overcome by allowing additional clusters.
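The incremental setting described above can be sketched with a simple one-pass rule: store only running centroids, assign each arriving point to the nearest centroid within a radius, and otherwise open a new cluster. The threshold rule below is illustrative and is not the specific algorithm analyzed in the paper; opening extra clusters when points fall far from all centroids mirrors the paper's remedy.

```python
# Minimal sketch of incremental clustering: one element at a time, only
# centroids and counts are stored (a small summary, not the data itself).

def incremental_cluster(stream, radius):
    centroids = []   # running means, one per cluster
    counts = []      # number of points assigned to each cluster
    labels = []
    for x in stream:
        if centroids:
            j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        if not centroids or abs(x - centroids[j]) > radius:
            centroids.append(x)          # open an additional cluster
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[j] += 1
            centroids[j] += (x - centroids[j]) / counts[j]  # update mean
            labels.append(j)
    return labels, centroids

labels, cents = incremental_cluster([0.0, 0.1, 5.0, 5.2, 0.05], radius=1.0)
print(labels)  # [0, 0, 1, 1, 0]
```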
A Method to Improve the Analysis of Cluster Ensembles
Clustering is fundamental to understanding the structure of data. In the past decade, the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. However, there is disagreement about which are the best ensemble characteristics for good performance: some authors have suggested that highly different partitions within the ensemble are beneficial for the final performance, whereas others have stated that medium diversity among them is better. While there are several measures to quantify the diversity, a better method to analyze the best ensemble characteristics is necessary. This paper introduces a new ensemble generation strategy and a method to make slight changes in its structure. Experimental results on six datasets suggest that this is an important step towards a more systematic approach to analyze the impact of the ensemble characteristics on the overall consensus performance.
Fil: Pividori, Milton Damián; Stegmayer, Georgina; Milone, Diego Humberto (CONICET / Universidad Nacional del Litoral / Universidad Tecnológica Nacional, Argentina)
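For context, the standard way to combine an ensemble into one consensus partition is the co-association (evidence accumulation) matrix: count how often each pair of points lands in the same cluster across the ensemble, then group pairs that co-occur in a majority of partitions. The sketch below is this classic baseline, not the paper's generation strategy.

```python
# Co-association consensus sketch: ca[i][j] is the fraction of ensemble
# partitions that place points i and j in the same cluster; pairs with
# ca > 0.5 are merged greedily.

def co_association(partitions):
    n = len(partitions[0])
    ca = [[0.0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:
                    ca[i][j] += 1.0 / len(partitions)
    return ca

def majority_consensus(partitions):
    ca = co_association(partitions)
    n = len(ca)
    label = [-1] * n
    next_label = 0
    for i in range(n):
        if label[i] == -1:
            label[i] = next_label
            for j in range(i + 1, n):
                if label[j] == -1 and ca[i][j] > 0.5:
                    label[j] = label[i]
            next_label += 1
    return label

# Three member partitions of four points; members disagree on point 2.
ensemble = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
print(majority_consensus(ensemble))  # → [0, 0, 1, 1]
```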
Unpredictability of AI
The young field of AI Safety is still in the process of identifying its challenges and limitations. In this paper, we formally describe one such impossibility result, namely the Unpredictability of AI. We prove that it is impossible to precisely and consistently predict what specific actions a smarter-than-human intelligent system will take to achieve its objectives, even if we know the terminal goals of the system. Finally, the impact of Unpredictability on AI Safety is discussed.
Clustering Financial Time Series: How Long is Enough?
Researchers have used from 30 days to several years of daily returns as
source data for clustering financial time series based on their correlations.
This paper sets up a statistical framework to study the validity of such
practices. We first show that clustering correlated random variables from their
observed values is statistically consistent. Then, we also give a first
empirical answer to the much debated question: How long should the time series
be? If too short, the clusters found can be spurious; if too long, dynamics can
be smoothed out.
Comment: Accepted at IJCAI 201
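The correlation-based clustering the abstract refers to can be sketched simply: compute pairwise Pearson correlations of the return series, convert each to the common distance d = sqrt(2(1 - rho)), and merge series whose distance is below a threshold. Real studies typically use hierarchical clustering; the greedy threshold rule and the data below are purely illustrative.

```python
# Group return series by correlation distance d = sqrt(2 * (1 - rho)),
# which maps rho = 1 to d = 0 and rho = -1 to d = 2.

import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def corr_groups(series, max_dist):
    labels = [-1] * len(series)
    nxt = 0
    for i in range(len(series)):
        if labels[i] == -1:
            labels[i] = nxt
            for j in range(i + 1, len(series)):
                d = math.sqrt(2 * (1 - pearson(series[i], series[j])))
                if labels[j] == -1 and d <= max_dist:
                    labels[j] = labels[i]
            nxt += 1
    return labels

returns = [
    [0.01, -0.02, 0.03, 0.00],      # asset A
    [0.012, -0.018, 0.028, 0.001],  # moves with A
    [-0.01, 0.02, -0.03, 0.00],     # anti-correlated with A
]
print(corr_groups(returns, max_dist=0.5))  # A and B grouped, C apart
```

With too few observations, the estimated correlations (and hence the clusters) are noisy, which is exactly the "how long is enough" question the paper formalizes.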
Clustering processes
The problem of clustering is considered, for the case when each data point is
a sample generated by a stationary ergodic process. We propose a very natural
asymptotic notion of consistency, and show that simple consistent algorithms
exist, under most general non-parametric assumptions. The notion of consistency
is as follows: two samples should be put into the same cluster if and only if
they were generated by the same distribution. With this notion of consistency,
clustering generalizes such classical statistical problems as homogeneity
testing and process classification. We show that, for the case of a known
number of clusters, consistency can be achieved under the only assumption that
the joint distribution of the data is stationary ergodic (no parametric or
Markovian assumptions, no assumptions of independence, neither between nor
within the samples). If the number of clusters is unknown, consistency can be
achieved under appropriate assumptions on the mixing rates of the processes
(again, no parametric or independence assumptions). In both cases we give
examples of simple (at most quadratic in each argument) algorithms which are
consistent.
Comment: in proceedings of ICML 2010. arXiv-admin note: for version 2 of this article please see: arXiv:1005.0826v
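The consistency notion above (same cluster iff same generating distribution) suggests a simple recipe in the spirit of empirical distributional distance: compare the empirical frequencies of short words in each sample path, and group samples whose distance is small. The word length, weighting, and threshold below are arbitrary illustrative choices, not the paper's exact construction.

```python
# Sketch of clustering sample paths by an empirical distributional distance:
# sum weighted L1 differences of word frequencies over word lengths 1..max_k.

from collections import Counter

def word_freqs(sample, k):
    words = [tuple(sample[i:i + k]) for i in range(len(sample) - k + 1)]
    c = Counter(words)
    return {w: c[w] / len(words) for w in c}

def emp_dist(x, y, max_k=2):
    d = 0.0
    for k in range(1, max_k + 1):
        fx, fy = word_freqs(x, k), word_freqs(y, k)
        for w in set(fx) | set(fy):
            d += abs(fx.get(w, 0.0) - fy.get(w, 0.0)) / 2 ** k
    return d

def cluster_samples(samples, threshold):
    labels = [-1] * len(samples)
    nxt = 0
    for i in range(len(samples)):
        if labels[i] == -1:
            labels[i] = nxt
            for j in range(i + 1, len(samples)):
                if labels[j] == -1 and emp_dist(samples[i], samples[j]) < threshold:
                    labels[j] = labels[i]
            nxt += 1
    return labels

a = [0, 1] * 50   # deterministic alternation
b = [0, 1] * 50   # a sample from the same process
c = [0] * 100     # constant process
print(cluster_samples([a, b, c], threshold=0.5))  # → [0, 0, 1]
```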