787 research outputs found

    On the swap-distances of different realizations of a graphical degree sequence

    One of the first graph-theoretical problems to receive serious attention (as early as the 1950s) was to decide whether a given integer sequence is the degree sequence of a simple graph (that is, whether it is {\em graphical}). One method for solving this problem is the greedy algorithm of Havel and Hakimi, which is based on the {\em swap} operation. A closely related question is how to find a sequence of swap operations that transforms one graphical realization into another realization of the same degree sequence. This latter problem has received particular emphasis in connection with fast-mixing Markov chain approaches to sampling uniformly from all realizations of a given degree sequence. (This is of interest in, among other areas, the study of large social networks.) Previously, only crude upper bounds were known on the shortest possible length of such swap sequences between two realizations. In this paper we develop formulae (Gallai-type identities) for these {\em swap-distance}s of any two realizations of simple undirected or directed degree sequences. These identities considerably improve the known upper bounds on the swap-distances. Comment: to be published
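
The graphicality question in the abstract above can be illustrated with a minimal sketch of the Havel-Hakimi greedy test. The function name `is_graphical` and the list-based representation are assumptions made for illustration; they are not taken from the paper.

```python
def is_graphical(degrees):
    """Havel-Hakimi test: True if `degrees` is the degree sequence of a simple graph."""
    if any(d < 0 for d in degrees):
        return False
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)            # take the largest remaining degree
        if d > len(seq):          # not enough other vertices to connect to
            return False
        for i in range(d):        # connect it to the d next-largest vertices
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)    # restore non-increasing order
    return True
```

For example, `is_graphical([3, 3, 2, 2])` holds (the complete graph on four vertices minus one edge), while `is_graphical([3, 3, 1, 1])` does not.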

    Graph realizations constrained by skeleton graphs

    In 2008 Amanatidis, Green and Mihail introduced the Joint Degree Matrix (JDM) model to capture the fundamental difference in assortativity of networks in nature studied by the physical and life sciences and social networks studied in the social sciences. In 2014 Czabarka proposed a direct generalization of the JDM model, the Partition Adjacency Matrix (PAM) model. In the PAM model the vertices have specified degrees, and the vertex set itself is partitioned into classes. For each pair of vertex classes the number of edges between the classes in a graph realization is prescribed. In this paper we apply the new {\em skeleton graph} model to describe the same information as the PAM model. Our model is more convenient for handling problems with a low number of partition classes or with special topological restrictions among the classes. We investigate two particular cases in detail: (i) when there are only two vertex classes and (ii) when the skeleton graph contains at most one cycle. Comment: 19 pages

    Towards random uniform sampling of bipartite graphs with given degree sequence

    In this paper we consider a simple Markov chain for bipartite graphs with given degree sequence on n vertices. We show that the mixing time of this Markov chain is bounded above by a polynomial in n in the case of a {\em semi-regular} degree sequence. The novelty of our approach lies in the construction of the canonical paths in Sinclair's method. Comment: 47 pages, submitted for publication. In this version we explicitly explain our main contribution and correct a serious flaw in the cycle decomposition
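
The abstract above does not spell out the chain. A common choice in this setting is the swap chain, and the sketch below shows one step of it under that assumption. The names `swap_step` and `degree_profile`, and the encoding of edges as `(left, right)` pairs, are hypothetical choices made for illustration.

```python
import random
from collections import Counter

def degree_profile(edges):
    """Return (left degrees, right degrees) of a bipartite edge set."""
    left = Counter(u for u, _ in edges)
    right = Counter(v for _, v in edges)
    return left, right

def swap_step(edges, rng):
    """One step of a swap chain; `edges` is a frozenset of (left, right) pairs."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)  # two distinct edges
    # Reject the move if it would create a repeated edge or change nothing.
    if a == c or b == d or (a, d) in edges or (c, b) in edges:
        return edges
    # Exchange endpoints: every vertex keeps its degree.
    return (edges - {(a, b), (c, d)}) | {(a, d), (c, b)}
```

Calling `swap_step` repeatedly with a `random.Random` instance walks through realizations of the same bipartite degree sequence. The sketch says nothing about how many steps are needed, which is the mixing-time question the paper addresses.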

    Towards random uniform sampling of bipartite graphs with given degree sequence

    In this paper we consider a simple Markov chain for bipartite graphs with given degree sequence on n vertices. We show that the mixing time of this Markov chain is bounded above by a polynomial in n in the case of a half-regular degree sequence. The novelty of our approach lies in the construction of the multicommodity flow in Sinclair's method.

    New methods for fixed-margin binary matrix sampling, Fréchet covariance, and MANOVA tests for random objects in multiple metric spaces

    2022 Summer. Includes bibliographical references. Many approaches to the analysis of network data essentially view the data as Euclidean and apply standard multivariate techniques. In this dissertation, we refrain from this approach, exploring two alternate approaches to the analysis of networks and other structured data. The first approach seeks to determine how unique an observed simple, directed network is by comparing it to like networks which share its degree distribution. Generating networks for comparison requires sampling from the space of all binary matrices with the prescribed row and column margins, since enumeration of all such matrices is often infeasible for even moderately sized networks with 20-50 nodes. We propose two new sampling methods for this problem. First, we extend two Markov chain Monte Carlo methods to sample from the space non-uniformly, allowing flexibility in the case that some networks are more likely than others. We show that non-uniform sampling could impede the MCMC process, but in certain special cases is still valid. Critically, we illustrate the differential conclusions that could be drawn from uniform vs. nonuniform sampling. Second, we develop a generalized divide and conquer approach which recursively divides matrices into smaller subproblems which are much easier to count and sample. Each division step reveals interesting mathematics involving the enumeration of integer partitions and points in convex lattice polytopes. The second broad approach we explore is comparing random objects in metric spaces lacking a coordinate system. Traditional definitions of the mean and variance no longer apply, and standard statistical tests have needed reconceptualization in terms of only distances in the metric space. We consider the multivariate setting where random objects exist in multiple metric spaces, which can be thought of as distinct views of the random object. We define the notion of Fréchet covariance to measure dependence between two metric spaces, and establish consistency for the sample estimator. We then propose several tests for differences in means and covariance matrices among two or more groups in multiple metric spaces, and compare their performance on scenarios involving random probability distributions and networks with node covariates.

    Clustering Financial Time Series: How Long is Enough?

    Researchers have used from 30 days to several years of daily returns as source data for clustering financial time series based on their correlations. This paper sets up a statistical framework to study the validity of such practices. We first show that clustering correlated random variables from their observed values is statistically consistent. Then, we also give a first empirical answer to the much-debated question: How long should the time series be? If too short, the clusters found can be spurious; if too long, dynamics can be smoothed out. Comment: Accepted at IJCAI 201

    Parallel enumeration of degree sequences of simple graphs. II.

    In this paper we report on the parallel enumeration of the degree sequences (their number is denoted by G(n)) and zerofree degree sequences (their number is denoted by Gz(n)) of simple graphs on n = 30 and n = 31 vertices. Among other results, we find that the number of zerofree degree sequences of graphs on n = 30 vertices is Gz(30) = 5 876 236 938 019 300 and on n = 31 vertices is Gz(31) = 22 974 847 474 172 374. Due to Corollary 21 in [52] these results give the number of degree sequences of simple graphs on 30 and 31 vertices.
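
For very small n, the quantities G(n) and Gz(n) from the abstract above can be checked by brute force. The sketch below is not the paper's parallel algorithm: it simply tests every non-increasing candidate sequence with the Erdős-Gallai inequalities, and the function names are assumptions made for illustration. It becomes impractical long before n = 30.

```python
from itertools import combinations_with_replacement

def erdos_gallai(seq):
    """Erdős-Gallai test for a non-increasing sequence of non-negative integers."""
    if sum(seq) % 2:
        return False
    for k in range(1, len(seq) + 1):
        lhs = sum(seq[:k])
        rhs = k * (k - 1) + sum(min(d, k) for d in seq[k:])
        if lhs > rhs:
            return False
    return True

def count_degree_sequences(n, zerofree=False):
    """Count graphical sequences on n vertices; with zerofree=True, all degrees >= 1."""
    lowest = 1 if zerofree else 0
    count = 0
    # Each multiset of n degrees from {lowest, ..., n-1} is one candidate sequence.
    for combo in combinations_with_replacement(range(lowest, n), n):
        if erdos_gallai(combo[::-1]):  # reverse to non-increasing order
            count += 1
    return count
```

With these definitions, `count_degree_sequences(4)` gives 11 and `count_degree_sequences(4, zerofree=True)` gives 7. Removing one isolated vertex maps the sequences containing a zero onto the sequences on n - 1 vertices, so the small counts satisfy G(n) = G(n-1) + Gz(n).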