
10,834 research outputs found

Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm

Author: Jain Brijnesh J.
Publication date: 03/11/2015

One of the most fundamental concepts in statistics is the concept of sample mean. Properties of the sample mean that are well-defined in Euclidean spaces become unwieldy or even unclear in graph spaces. Open problems related to the sample mean of graphs include: non-existence, non-uniqueness, statistical inconsistency, lack of convergence results of mean algorithms, non-existence of midpoints, and disparity to midpoints. We present conditions to resolve all six problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on graph datasets representing images and molecules show that the MMM-Algorithm best approximates a sample mean of graphs compared to six other mean algorithms.

arXiv.org e-Print Archive

Faster Balanced Clusterings in High Dimension

Author: Ding Hu
Publication date: 09/09/2018

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced $k$-center, $k$-median, and $k$-means clustering problems where the size of each cluster is constrained by the given lower and upper bounds. The problems are motivated by the applications in processing large-scale data in high dimension. Existing methods often need to compute complicated matchings (or min cost flows) to satisfy the balance constraint, and thus suffer from high complexities especially in high dimension. We develop an effective framework for the three balanced clustering problems to address this issue, and our method is based on a novel spatial partition idea in geometry. For the balanced $k$-center clustering, we provide a $4$-approximation algorithm that improves the existing approximation factors; for the balanced $k$-median and $k$-means clusterings, our algorithms yield constant and $(1+\epsilon)$-approximation factors with any $\epsilon>0$. More importantly, our algorithms achieve linear or nearly linear running times when $k$ is a constant, and significantly improve the existing ones. Our results can be easily extended to metric balanced clusterings and the running times are sub-linear in terms of the complexity of $n$-point metric.

arXiv.org e-Print Archive

A primer on substitution tilings of the Euclidean plane

Author: Frank Natalie Priebe
Publication date: 08/05/2007

This paper is intended to provide an introduction to the theory of substitution tilings. For our purposes, tiling substitution rules are divided into two broad classes: geometric and combinatorial. Geometric substitution tilings include self-similar tilings such as the well-known Penrose tilings; for this class there is a substantial body of research in the literature. Combinatorial substitutions are just beginning to be examined, and some of what we present here is new. We give numerous examples, mention selected major results, discuss connections between the two classes of substitutions, include current research perspectives and questions, and provide an extensive bibliography. Although the author attempts to fairly represent the field as a whole, the paper is not an exhaustive survey, and she apologizes for any important omissions. Comment: 26 pages, 39 figures

arXiv.org e-Print Archive

Proper Scoring Rules and Bregman Divergences

Author: Ovcharov Evgeni Y.
Publication date: 10/09/2015

We revisit the mathematical foundations of proper scoring rules (PSRs) and Bregman divergences and present their characteristic properties in a unified theoretical framework. In many situations it is preferable not to generate a PSR directly from its convex entropy on the unit simplex but instead by the sublinear extension of the entropy to the positive orthant. This gives the scoring rule simply as a subgradient of the extended entropy, allowing for a more elegant theory. The other convex extensions of the entropy generate affine extensions of the scoring rule and induce the class of functional Bregman divergences. We discuss the geometric nature of the relationship between PSRs and Bregman divergences and extend and unify existing partial results. We also approach the topic of differentiability of entropy functions. Not all entropies of interest possess functional derivatives, but they do all have directional derivatives in almost every direction. Relying on the notion of quasi-interior of a convex set to quantify the latter property, we formalise under what conditions a PSR may be considered to be uniquely determined from its entropy.
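The link between a proper scoring rule and a Bregman divergence mentioned in this abstract can be illustrated with a small sketch. It uses the textbook construction on the probability simplex, S(p, i) = G(p) + <grad G(p), e_i - p>, with the quadratic entropy G(p) = sum of p_i^2 chosen here purely as an assumption for illustration; it is not the sublinear-extension approach the abstract describes, and all function names are hypothetical.

```python
def quadratic_entropy(p):
    """Convex 'entropy' G(p) = sum_i p_i^2 (an illustrative choice)."""
    return sum(x * x for x in p)


def quadratic_gradient(p):
    """Gradient of G at p."""
    return [2.0 * x for x in p]


def score(p, outcome):
    """Score S(p, i) = G(p) + <grad G(p), e_i - p> for forecast p, outcome i."""
    g = quadratic_gradient(p)
    inner = sum(gi * pi for gi, pi in zip(g, p))
    return quadratic_entropy(p) + g[outcome] - inner


def expected_score(p, q):
    """Expected score of forecast p when outcomes are drawn from q."""
    return sum(qi * score(p, i) for i, qi in enumerate(q))


def bregman_divergence(q, p):
    """Bregman divergence of G: G(q) - G(p) - <grad G(p), q - p>."""
    g = quadratic_gradient(p)
    linear = sum(gi * (qi - pi) for gi, qi, pi in zip(g, q, p))
    return quadratic_entropy(q) - quadratic_entropy(p) - linear


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]   # true distribution (made-up numbers)
    p = [0.2, 0.2, 0.6]   # forecast (made-up numbers)
    # Loss in expected score from forecasting p instead of q.
    gap = expected_score(q, q) - expected_score(p, q)
    print(gap, bregman_divergence(q, p))
```

Under these assumptions the expected-score gap equals the Bregman divergence, which for the quadratic entropy is the squared Euclidean distance between q and p. The gap is non-negative, which is what makes the score proper.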

arXiv.org e-Print Archive

Hyperbolic Image Embeddings

Author: Khrulkov Valentin
Lempitsky Victor
Mirvakhabova Leyla
Oseledets Ivan
Ustinova Evgeniya
Publication date: 30/03/2020

Computer vision tasks such as image classification, image retrieval and few-shot learning are currently dominated by Euclidean and spherical embeddings, so that the final decisions about class belongings or the degree of similarity are made using linear hyperplanes, Euclidean distances, or spherical geodesic distances (cosine similarity). In this work, we demonstrate that in many practical scenarios hyperbolic embeddings provide a better alternative.
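To make the contrast with Euclidean distance concrete, here is a minimal sketch of a hyperbolic distance between two embedding vectors. The abstract does not say which model of hyperbolic space is used; the Poincaré ball with curvature -1 is assumed here for illustration only, and the function name is hypothetical.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance in the Poincare ball model (curvature -1, assumed).

    d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    Both points must lie strictly inside the unit ball.
    """
    norm_x_sq = sum(a * a for a in x)
    norm_y_sq = sum(b * b for b in y)
    if norm_x_sq >= 1.0 or norm_y_sq >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff_sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * diff_sq / ((1.0 - norm_x_sq) * (1.0 - norm_y_sq)))


if __name__ == "__main__":
    origin = [0.0, 0.0]
    # The same Euclidean step of 0.1 costs more hyperbolic distance near the boundary.
    print(poincare_distance(origin, [0.1, 0.0]))
    print(poincare_distance([0.8, 0.0], [0.9, 0.0]))
```

In this model distances grow rapidly near the boundary of the ball, which is the usual argument for why hyperbolic embeddings can suit hierarchical data.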

arXiv.org e-Print Archive

Convergence of graphs with intermediate density

Author: Frenkel Péter E.
Publication venue: 'American Mathematical Society (AMS)'
Publication date: 26/12/2017

We propose a notion of graph convergence that interpolates between the Benjamini-Schramm convergence of bounded degree graphs and the dense graph convergence developed by László Lovász and his coauthors. We prove that spectra of graphs, and also some important graph parameters such as numbers of colorings or matchings, behave well in convergent graph sequences. Special attention is given to graph sequences of large essential girth, for which asymptotics of coloring numbers are explicitly calculated. We also treat numbers of matchings in approximately regular graphs. We introduce tentative limit objects that we call graphonings because they are common generalizations of graphons and graphings. Special forms of these, called Hausdorff and Euclidean graphonings, involve geometric measure theory. We construct Euclidean graphonings that provide limits of hypercubes and of finite projective planes, and, more generally, of a wide class of regular sequences of large essential girth. For any convergent sequence of large essential girth, we construct weaker limit objects: an involution invariant probability measure on the sub-Markov space of consistent measure sequences (this is unique), or an acyclic reversible sub-Markov kernel on a probability space (non-unique). We also pose some open problems. Comment: 41 pages. Minor errors have been corrected

arXiv.org e-Print Archive

RetGK: Graph Kernels based on Return Probabilities of Random Walks

Author: Huang Yan
Nehorai Arye
Wang Mianzhi
Xiang Yijian
Zhang Zhen
Publication date: 07/09/2018

Graph-structured data arise in wide applications, such as computer vision, bioinformatics, and social networks. Quantifying similarities among graphs is a fundamental problem. In this paper, we develop a framework for computing graph kernels, based on return probabilities of random walks. The advantages of our proposed kernels are that they can effectively exploit various node attributes, while being scalable to large datasets. We conduct extensive graph classification experiments to evaluate our graph kernels. The experimental results show that our graph kernels significantly outperform existing state-of-the-art approaches in both accuracy and computational efficiency.
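The "return probabilities of random walks" that this abstract builds on can be sketched as follows. For a simple random walk with transition matrix P = D^{-1}A, the probability that a walk started at node i is back at i after s steps is the diagonal entry (P^s)_ii. The code below is a minimal illustration under that assumption, for an unweighted graph with no isolated nodes; it is not the kernel construction from the paper, and the function names are hypothetical.

```python
def transition_matrix(adj):
    """Row-normalise an adjacency matrix into random-walk transition probabilities."""
    matrix = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("isolated nodes are not handled in this sketch")
        matrix.append([a / degree for a in row])
    return matrix


def matmul(a, b):
    """Plain dense matrix product for square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def return_probabilities(adj, steps):
    """For each node i, return [(P^1)_ii, (P^2)_ii, ..., (P^steps)_ii]."""
    p = transition_matrix(adj)
    power = p
    features = [[] for _ in adj]
    for _ in range(steps):
        for i in range(len(adj)):
            features[i].append(power[i][i])
        power = matmul(power, p)
    return features


if __name__ == "__main__":
    triangle = [[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]]
    for node, feats in enumerate(return_probabilities(triangle, 3)):
        print(node, feats)
```

Each node receives a short vector that depends only on the graph structure around it, so the vectors can be compared across graphs without matching node labels.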

arXiv.org e-Print Archive

Building pattern recognition applications with the SPARE library

Author: Del Vescovo Guido
Livi Lorenzo
Mascioli Fabio Massimo Frattale
Rizzi Antonello
Publication date: 20/02/2015

This paper presents the SPARE C++ library, an open source software tool conceived to build pattern recognition and soft computing systems. The library is designed for generality: most of the implemented algorithms are able to process user-defined input data types transparently, such as labeled graphs and sequences of objects, as well as standard numeric vectors. Here we present only a high-level picture of the SPARE library's characteristics, focusing instead on the practical construction of pattern recognition systems for different input data types. In particular, as a proof of concept, we discuss two application instances involving clustering of real-valued multidimensional sequences and classification of labeled graphs. Comment: Home page: https://sourceforge.net/p/libspare/home/Spare

arXiv.org e-Print Archive

A unified framework for harmonic analysis of functions on directed graphs and changing data

Author: Mhaskar Hrushikesh N.
Publication venue: 'Elsevier BV'
Publication date: 14/07/2016

We present a general framework for studying harmonic analysis of functions in the settings of various emerging problems in the theory of diffusion geometry. The starting point of the now classical diffusion geometry approach is the construction of a kernel whose discretization leads to an undirected graph structure on an unstructured data set. We study the question of constructing such kernels for directed graph structures, and argue that our construction is essentially the only way to do so using discretizations of kernels. We then use our previous theory to develop harmonic analysis based on the singular value decomposition of the resulting non-self-adjoint operators associated with the directed graph. Next, we consider the question of how functions defined on one space evolve to another space in the paradigm of changing data sets recently introduced by Coifman and Hirn. While the approach of Coifman and Hirn requires that the points on one space be in a known one-to-one correspondence with the points on the other, our approach allows the identification of only a subset of landmark points. We introduce a new definition of distance between points on two spaces, construct localized kernels based on the two spaces and certain interaction parameters, and study the evolution of smoothness of a function on one space to its lifting to the other space via the landmarks. We develop novel mathematical tools that enable us to study these seemingly different problems in a unified manner. Comment: Submitted earlier version on July 1, 2015; accepted for publication in Appl. Comput. Harm. Anal. Available online June 28, 2016, 28 pages

arXiv.org e-Print Archive

On the Evaluation of Video Keyframe Summaries using User Ground Truth

Author: Gunn Iain A. D.
Kuncheva Ludmila I.
Yousefi Paria
Publication date: 19/12/2017

Given the great interest in creating keyframe summaries from video, it is surprising how little has been done to formalise their evaluation and comparison. User studies are often carried out to demonstrate that a proposed method generates a more appealing summary than one or two rival methods. But larger comparison studies cannot feasibly use such user surveys. Here we propose a discrimination capacity measure as a formal way to quantify the improvement over the uniform baseline, assuming that one or more ground truth summaries are available. Using the VSUMM video collection, we examine 10 video feature types, including CNN and SURF, and 6 methods for matching frames from two summaries. Our results indicate that a simple frame representation through hue histograms suffices for the purposes of comparing keyframe summaries. We subsequently propose a formal protocol for comparing summaries when ground truth is available. Comment: 12 pages, 10 figures, 2 tables

arXiv.org e-Print Archive