Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm
One of the most fundamental concepts in statistics is the concept of sample
mean. Properties of the sample mean that are well-defined in Euclidean spaces
become unwieldy or even unclear in graph spaces. Open problems related to the
sample mean of graphs include: non-existence, non-uniqueness, statistical
inconsistency, lack of convergence results of mean algorithms, non-existence of
midpoints, and disparity to midpoints. We present conditions to resolve all six
problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on
graph datasets representing images and molecules show that the MMM-Algorithm
best approximates a sample mean of graphs compared to six other mean
algorithms.
Faster Balanced Clusterings in High Dimension
The problem of constrained clustering has attracted significant attention in
the past decades. In this paper, we study the balanced k-center, k-median,
and k-means clustering problems where the size of each cluster is constrained
by the given lower and upper bounds. The problems are motivated by the
applications in processing large-scale data in high dimension. Existing methods
often need to compute complicated matchings (or min cost flows) to satisfy the
balance constraint, and thus suffer from high complexities especially in high
dimension. We develop an effective framework for the three balanced clustering
problems to address this issue, and our method is based on a novel spatial
partition idea in geometry. For the balanced k-center clustering, we provide
a -approximation algorithm that improves the existing approximation factors;
for the balanced k-median and k-means clusterings, our algorithms yield
constant and -approximation factors with any . More
importantly, our algorithms achieve linear or nearly linear running times when
k is a constant, and significantly improve the existing ones. Our results can
be easily extended to metric balanced clusterings and the running times are
sub-linear in terms of the complexity of -point metric
A primer on substitution tilings of the Euclidean plane
This paper is intended to provide an introduction to the theory of
substitution tilings. For our purposes, tiling substitution rules are divided
into two broad classes: geometric and combinatorial. Geometric substitution
tilings include self-similar tilings such as the well-known Penrose tilings;
for this class there is a substantial body of research in the literature.
Combinatorial substitutions are just beginning to be examined, and some of what
we present here is new. We give numerous examples, mention selected major
results, discuss connections between the two classes of substitutions, include
current research perspectives and questions, and provide an extensive
bibliography. Although the author attempts to fairly represent the field as a whole,
the paper is not an exhaustive survey, and she apologizes for any important
omissions. Comment: 26 pages, 39 figures
Proper Scoring Rules and Bregman Divergences
We revisit the mathematical foundations of proper scoring rules (PSRs) and
Bregman divergences and present their characteristic properties in a unified
theoretical framework. In many situations it is preferable not to generate a
PSR directly from its convex entropy on the unit simplex but instead by the
sublinear extension of the entropy to the positive orthant. This gives the
scoring rule simply as a subgradient of the extended entropy, allowing for a
more elegant theory. The other convex extensions of the entropy generate affine
extensions of the scoring rule and induce the class of functional Bregman
divergences. We discuss the geometric nature of the relationship between PSRs
and Bregman divergences and extend and unify existing partial results. We also
approach the topic of differentiability of entropy functions. Not all entropies
of interest possess functional derivatives, but they do all have directional
derivatives in almost every direction. Relying on the notion of quasi-interior
of a convex set to quantify the latter property, we formalise under what
conditions a PSR may be considered to be uniquely determined from its entropy.
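The link between an entropy, its scoring rule, and the induced Bregman divergence can be checked numerically in the simplest finite case. The sketch below is not taken from the paper. It assumes the negative Shannon entropy as the convex function and the logarithmic score as the scoring rule, and all function names are illustrative. Under those assumptions the Bregman divergence between two strictly positive probability vectors equals the Kullback-Leibler divergence, which in turn equals the expected score lost by reporting q when the outcome follows p.

```python
import math

def neg_entropy(p):
    """Negative Shannon entropy sum_i p_i log p_i (convex on the simplex)."""
    return sum(x * math.log(x) for x in p if x > 0)

def grad_neg_entropy(p):
    """Gradient of the negative entropy; requires strictly positive entries."""
    return [math.log(x) + 1.0 for x in p]

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    g = grad_phi(q)
    inner = sum(gi * (pi - qi) for gi, pi, qi in zip(g, p, q))
    return phi(p) - phi(q) - inner

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_log_score(p, q):
    """Expected logarithmic score of forecast q when outcomes follow p."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (assumed values, not from the paper).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

divergence = bregman(neg_entropy, grad_neg_entropy, p, q)
score_gap = expected_log_score(p, p) - expected_log_score(p, q)
```

Here `divergence`, `kl_divergence(p, q)` and `score_gap` agree up to floating-point error. The score gap is nonnegative, which is what properness of the logarithmic score means: reporting the true distribution p maximizes the expected score.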
Hyperbolic Image Embeddings
Computer vision tasks such as image classification, image retrieval and
few-shot learning are currently dominated by Euclidean and spherical
embeddings, so that the final decisions about class belongings or the degree of
similarity are made using linear hyperplanes, Euclidean distances, or spherical
geodesic distances (cosine similarity). In this work, we demonstrate that in
many practical scenarios hyperbolic embeddings provide a better alternative.
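To make the contrast with Euclidean distance concrete, here is a minimal sketch of a hyperbolic distance. The abstract does not say which model of hyperbolic space is used. The code assumes the Poincaré ball with curvature -1, and `poincare_distance` is an illustrative name rather than a function from the paper.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance in the Poincare ball model (curvature -1 assumed).

    d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    Both points must lie strictly inside the unit ball.
    """
    sq_norm_x = sum(c * c for c in x)
    sq_norm_y = sum(c * c for c in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y)))

# Distance from the origin to a point at Euclidean radius 0.5.
d_origin = poincare_distance([0.0, 0.0], [0.5, 0.0])

# Two points near the boundary: close in Euclidean terms, far apart hyperbolically.
d_boundary = poincare_distance([0.95, 0.0], [0.0, 0.95])
```

The distance from the origin to a point at Euclidean radius r is 2·artanh(r), so `d_origin` equals ln 3. Distances grow without bound as points approach the boundary of the ball.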
Convergence of graphs with intermediate density
We propose a notion of graph convergence that interpolates between the
Benjamini–Schramm convergence of bounded degree graphs and the dense graph
convergence developed by László Lovász and his coauthors. We prove that
spectra of graphs, and also some important graph parameters such as numbers of
colorings or matchings, behave well in convergent graph sequences. Special
attention is given to graph sequences of large essential girth, for which
asymptotics of coloring numbers are explicitly calculated. We also treat
numbers of matchings in approximately regular graphs.
We introduce tentative limit objects that we call graphonings because they
are common generalizations of graphons and graphings. Special forms of these,
called Hausdorff and Euclidean graphonings, involve geometric measure theory.
We construct Euclidean graphonings that provide limits of hypercubes and of
finite projective planes, and, more generally, of a wide class of regular
sequences of large essential girth. For any convergent sequence of large
essential girth, we construct weaker limit objects: an involution invariant
probability measure on the sub-Markov space of consistent measure sequences
(this is unique), or an acyclic reversible sub-Markov kernel on a probability
space (non-unique). We also pose some open problems. Comment: 41 pages. Minor errors have been corrected
RetGK: Graph Kernels based on Return Probabilities of Random Walks
Graph-structured data arise in wide applications, such as computer vision,
bioinformatics, and social networks. Quantifying similarities among graphs is a
fundamental problem. In this paper, we develop a framework for computing graph
kernels, based on return probabilities of random walks. The advantages of our
proposed kernels are that they can effectively exploit various node attributes,
while being scalable to large datasets. We conduct extensive graph
classification experiments to evaluate our graph kernels. The experimental
results show that our graph kernels significantly outperform existing
state-of-the-art approaches in both accuracy and computational efficiency.
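As a rough illustration of what "return probabilities of random walks" can look like as node features, the sketch below computes, for each node, the probability that a simple random walk started there is back at the same node after 1, 2, ..., S steps. It assumes an unweighted undirected graph given as an adjacency matrix with no isolated nodes. The function names are hypothetical, and the kernel built on top of these features in the paper is not reproduced here.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def return_probabilities(adj, steps):
    """Per-node return probabilities of a simple random walk.

    Entry [i][s-1] is the probability that a walk starting at node i
    is at node i again after exactly s steps, for s = 1..steps.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    if any(d == 0 for d in degrees):
        raise ValueError("isolated nodes have no random-walk transitions")
    # Transition matrix P = D^{-1} A of the simple random walk.
    transition = [[adj[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    power = [row[:] for row in transition]  # holds P^s, starting at s = 1
    features = [[] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            features[i].append(power[i][i])
        power = mat_mul(power, transition)
    return features

# Path graph 0 - 1 - 2: the endpoints and the middle node get different features.
path_adj = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
path_features = return_probabilities(path_adj, 2)

# Triangle graph: every node has the same features.
triangle_adj = [[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]]
triangle_features = return_probabilities(triangle_adj, 3)
```

The features depend only on the graph structure around each node, not on node numbering, so relabelling the nodes permutes the rows without changing their values.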
Building pattern recognition applications with the SPARE library
This paper presents the SPARE C++ library, an open source software tool
conceived to build pattern recognition and soft computing systems. The library
is designed for generality: most of the implemented algorithms
are able to process user-defined input data types transparently, such as
labeled graphs and sequences of objects, as well as standard numeric vectors.
Here we present a high-level picture of the SPARE library's characteristics,
focusing on the practical possibility of constructing pattern
recognition systems for different input data types. In particular, as a proof
of concept, we discuss two application instances involving clustering of
real-valued multidimensional sequences and classification of labeled graphs. Comment: Home page: https://sourceforge.net/p/libspare/home/Spare
A unified framework for harmonic analysis of functions on directed graphs and changing data
We present a general framework for studying harmonic analysis of functions in
the settings of various emerging problems in the theory of diffusion geometry.
The starting point of the now classical diffusion geometry approach is the
construction of a kernel whose discretization leads to an undirected graph
structure on an unstructured data set. We study the question of constructing
such kernels for directed graph structures, and argue that our construction is
essentially the only way to do so using discretizations of kernels. We then use
our previous theory to develop harmonic analysis based on the singular value
decomposition of the resulting non-self-adjoint operators associated with the
directed graph. Next, we consider the question of how functions defined on one
space evolve to another space in the paradigm of changing data sets recently
introduced by Coifman and Hirn. While the approach of Coifman and Hirn requires
that the points on one space should be in a known one-to-one correspondence
with the points on the other, our approach allows the identification of only a
subset of landmark points. We introduce a new definition of distance between
points on two spaces, construct localized kernels based on the two spaces and
certain interaction parameters, and study the evolution of smoothness of a
function on one space to its lifting to the other space via the landmarks. We
develop novel mathematical tools that enable us to study these seemingly
different problems in a unified manner. Comment: Submitted earlier version on July 1, 2015; accepted for publication
in Appl. Comput. Harm. Anal. Available online June 28, 2016, 28 pages
On the Evaluation of Video Keyframe Summaries using User Ground Truth
Given the great interest in creating keyframe summaries from video, it is
surprising how little has been done to formalise their evaluation and
comparison. User studies are often carried out to demonstrate that a proposed
method generates a more appealing summary than one or two rival methods. But
larger comparison studies cannot feasibly use such user surveys. Here we
propose a discrimination capacity measure as a formal way to quantify the
improvement over the uniform baseline, assuming that one or more ground truth
summaries are available. Using the VSUMM video collection, we examine 10 video
feature types, including CNN and SURF, and 6 methods for matching frames from
two summaries. Our results indicate that a simple frame representation through
hue histograms suffices for the purposes of comparing keyframe summaries. We
subsequently propose a formal protocol for comparing summaries when ground
truth is available. Comment: 12 pages, 10 figures, 2 tables