    Properties of the Sample Mean in Graph Spaces and the Majorize-Minimize-Mean Algorithm

    One of the most fundamental concepts in statistics is the concept of sample mean. Properties of the sample mean that are well-defined in Euclidean spaces become unwieldy or even unclear in graph spaces. Open problems related to the sample mean of graphs include: non-existence, non-uniqueness, statistical inconsistency, lack of convergence results of mean algorithms, non-existence of midpoints, and disparity to midpoints. We present conditions to resolve all six problems and propose a Majorize-Minimize-Mean (MMM) Algorithm. Experiments on graph datasets representing images and molecules show that the MMM-Algorithm best approximates a sample mean of graphs compared to six other mean algorithms.

    Faster Balanced Clusterings in High Dimension

    The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced k-center, k-median, and k-means clustering problems where the size of each cluster is constrained by the given lower and upper bounds. The problems are motivated by the applications in processing large-scale data in high dimension. Existing methods often need to compute complicated matchings (or min cost flows) to satisfy the balance constraint, and thus suffer from high complexities especially in high dimension. We develop an effective framework for the three balanced clustering problems to address this issue, and our method is based on a novel spatial partition idea in geometry. For the balanced k-center clustering, we provide a 4-approximation algorithm that improves the existing approximation factors; for the balanced k-median and k-means clusterings, our algorithms yield constant and (1+ϵ)-approximation factors with any ϵ > 0. More importantly, our algorithms achieve linear or nearly linear running times when k is a constant, and significantly improve the existing ones. Our results can be easily extended to metric balanced clusterings and the running times are sub-linear in terms of the complexity of n-point metric.

    A primer on substitution tilings of the Euclidean plane

    This paper is intended to provide an introduction to the theory of substitution tilings. For our purposes, tiling substitution rules are divided into two broad classes: geometric and combinatorial. Geometric substitution tilings include self-similar tilings such as the well-known Penrose tilings; for this class there is a substantial body of research in the literature. Combinatorial substitutions are just beginning to be examined, and some of what we present here is new. We give numerous examples, mention selected major results, discuss connections between the two classes of substitutions, include current research perspectives and questions, and provide an extensive bibliography. Although the author attempts to fairly represent the field as a whole, the paper is not an exhaustive survey, and she apologizes for any important omissions. Comment: 26 pages, 39 figures

    Proper Scoring Rules and Bregman Divergences

    We revisit the mathematical foundations of proper scoring rules (PSRs) and Bregman divergences and present their characteristic properties in a unified theoretical framework. In many situations it is preferable not to generate a PSR directly from its convex entropy on the unit simplex but instead by the sublinear extension of the entropy to the positive orthant. This gives the scoring rule simply as a subgradient of the extended entropy, allowing for a more elegant theory. The other convex extensions of the entropy generate affine extensions of the scoring rule and induce the class of functional Bregman divergences. We discuss the geometric nature of the relationship between PSRs and Bregman divergences and extend and unify existing partial results. We also approach the topic of differentiability of entropy functions. Not all entropies of interest possess functional derivatives, but they do all have directional derivatives in almost every direction. Relying on the notion of quasi-interior of a convex set to quantify the latter property, we formalise under what conditions a PSR may be considered to be uniquely determined from its entropy.

    Hyperbolic Image Embeddings

    Computer vision tasks such as image classification, image retrieval and few-shot learning are currently dominated by Euclidean and spherical embeddings, so that the final decisions about class belongings or the degree of similarity are made using linear hyperplanes, Euclidean distances, or spherical geodesic distances (cosine similarity). In this work, we demonstrate that in many practical scenarios hyperbolic embeddings provide a better alternative.

    Convergence of graphs with intermediate density

    We propose a notion of graph convergence that interpolates between the Benjamini–Schramm convergence of bounded degree graphs and the dense graph convergence developed by László Lovász and his coauthors. We prove that spectra of graphs, and also some important graph parameters such as numbers of colorings or matchings, behave well in convergent graph sequences. Special attention is given to graph sequences of large essential girth, for which asymptotics of coloring numbers are explicitly calculated. We also treat numbers of matchings in approximately regular graphs. We introduce tentative limit objects that we call graphonings because they are common generalizations of graphons and graphings. Special forms of these, called Hausdorff and Euclidean graphonings, involve geometric measure theory. We construct Euclidean graphonings that provide limits of hypercubes and of finite projective planes, and, more generally, of a wide class of regular sequences of large essential girth. For any convergent sequence of large essential girth, we construct weaker limit objects: an involution invariant probability measure on the sub-Markov space of consistent measure sequences (this is unique), or an acyclic reversible sub-Markov kernel on a probability space (non-unique). We also pose some open problems. Comment: 41 pages. Minor errors have been corrected

    RetGK: Graph Kernels based on Return Probabilities of Random Walks

    Graph-structured data arise in wide applications, such as computer vision, bioinformatics, and social networks. Quantifying similarities among graphs is a fundamental problem. In this paper, we develop a framework for computing graph kernels, based on return probabilities of random walks. The advantages of our proposed kernels are that they can effectively exploit various node attributes, while being scalable to large datasets. We conduct extensive graph classification experiments to evaluate our graph kernels. The experimental results show that our graph kernels significantly outperform existing state-of-the-art approaches in both accuracy and computational efficiency.

    Building pattern recognition applications with the SPARE library

    This paper presents the SPARE C++ library, an open source software tool conceived to build pattern recognition and soft computing systems. The library follows the requirement of generality: most of the implemented algorithms are able to process user-defined input data types transparently, such as labeled graphs and sequences of objects, as well as standard numeric vectors. Here we present a high-level picture of the SPARE library characteristics, focusing instead on the specific practical possibility of constructing pattern recognition systems for different input data types. In particular, as a proof of concept, we discuss two application instances involving clustering of real-valued multidimensional sequences and classification of labeled graphs. Comment: Home page: https://sourceforge.net/p/libspare/home/Spare

    A unified framework for harmonic analysis of functions on directed graphs and changing data

    We present a general framework for studying harmonic analysis of functions in the settings of various emerging problems in the theory of diffusion geometry. The starting point of the now classical diffusion geometry approach is the construction of a kernel whose discretization leads to an undirected graph structure on an unstructured data set. We study the question of constructing such kernels for directed graph structures, and argue that our construction is essentially the only way to do so using discretizations of kernels. We then use our previous theory to develop harmonic analysis based on the singular value decomposition of the resulting non-self-adjoint operators associated with the directed graph. Next, we consider the question of how functions defined on one space evolve to another space in the paradigm of changing data sets recently introduced by Coifman and Hirn. While the approach of Coifman and Hirn requires that the points on one space should be in a known one-to-one correspondence with the points on the other, our approach allows the identification of only a subset of landmark points. We introduce a new definition of distance between points on two spaces, construct localized kernels based on the two spaces and certain interaction parameters, and study the evolution of smoothness of a function on one space to its lifting to the other space via the landmarks. We develop novel mathematical tools that enable us to study these seemingly different problems in a unified manner. Comment: Submitted earlier version on July 1, 2015; accepted for publication in Appl. Comput. Harm. Anal. Available online June 28, 2016, 28 pages

    On the Evaluation of Video Keyframe Summaries using User Ground Truth

    Given the great interest in creating keyframe summaries from video, it is surprising how little has been done to formalise their evaluation and comparison. User studies are often carried out to demonstrate that a proposed method generates a more appealing summary than one or two rival methods. But larger comparison studies cannot feasibly use such user surveys. Here we propose a discrimination capacity measure as a formal way to quantify the improvement over the uniform baseline, assuming that one or more ground truth summaries are available. Using the VSUMM video collection, we examine 10 video feature types, including CNN and SURF, and 6 methods for matching frames from two summaries. Our results indicate that a simple frame representation through hue histograms suffices for the purposes of comparing keyframe summaries. We subsequently propose a formal protocol for comparing summaries when ground truth is available. Comment: 12 pages, 10 figures, 2 tables