On the swap-distances of different realizations of a graphical degree sequence
One of the first graph theoretical problems which got serious attention
(already in the fifties of the last century) was to decide whether a given
integer sequence is equal to the degree sequence of a simple graph (or it is
{\em graphical} for short). One method to solve this problem is the greedy
algorithm of Havel and Hakimi, which is based on the {\em swap} operation.
Another, closely related question is to find a sequence of swap operations to
transform one graphical realization into another one of the same degree
sequence. This latter problem has received particular emphasis in connection with fast
mixing Markov chain approaches to sample uniformly all possible realizations of
a given degree sequence. (This becomes a matter of interest in connection with,
among others, the study of large social networks.) Earlier there were only
crude upper bounds on the shortest possible length of such swap sequences
between two realizations. In this paper we develop formulae (Gallai-type
identities) for these {\em swap-distance}s of any two realizations of simple
undirected or directed degree sequences. These identities improve considerably
the known upper bounds on the swap-distances.
Comment: to be publishe
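As an illustration of the two notions named in this abstract, the following is a minimal Python sketch, not the paper's own construction. The function names `is_graphical` and `apply_swap` are hypothetical. The first is a textbook Havel-Hakimi graphicality test; the second performs a single swap, replacing edges (a, b) and (c, d) with (a, d) and (c, b) when the result is still a simple graph.

```python
def is_graphical(degrees):
    """Havel-Hakimi test: True if `degrees` is the degree sequence of a simple graph."""
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)              # take the largest remaining degree
        if d > len(seq):            # not enough other vertices to connect to
            return False
        for i in range(d):          # connect it to the d next-largest vertices
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return all(x == 0 for x in seq)  # also rejects negative input entries


def apply_swap(edges, e1, e2):
    """Swap edges e1=(a,b), e2=(c,d) into (a,d), (c,b); return None if not allowed."""
    a, b = e1
    c, d = e2
    edge_set = {frozenset(e) for e in edges}
    if frozenset(e1) not in edge_set or frozenset(e2) not in edge_set:
        return None
    if len({a, b, c, d}) < 4:       # the four endpoints must be distinct
        return None
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    if new1 in edge_set or new2 in edge_set:  # would create a multi-edge
        return None
    return (edge_set - {frozenset(e1), frozenset(e2)}) | {new1, new2}
```

A swap leaves every vertex degree unchanged, which is why sequences of swaps can connect different realizations of the same degree sequence.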
Graph realizations constrained by skeleton graphs
In 2008 Amanatidis, Green and Mihail introduced the Joint Degree Matrix (JDM)
model to capture the fundamental difference in assortativity of networks in
nature studied by the physical and life sciences and social networks studied in
the social sciences. In 2014 Czabarka proposed a direct generalization of the
JDM model, the Partition Adjacency Matrix (PAM) model. In the PAM model the
vertices have specified degrees, and the vertex set itself is partitioned into
classes. For each pair of vertex classes the number of edges between the
classes in a graph realization is prescribed. In this paper we apply the new
{\em skeleton graph} model to describe the same information as the PAM model.
Our model is more convenient for handling problems with a low number of partition
classes or with special topological restrictions among the classes. We
investigate two particular cases in detail: (i) when there are only two vertex
classes and (ii) when the skeleton graph contains at most one cycle.
Comment: 19 page
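To make the PAM data concrete, here is a small Python sketch that computes, for a given graph and vertex partition, the number of edges between each pair of classes. The function name `partition_adjacency_matrix` and the dictionary-based representation are assumptions made for illustration; the abstract does not prescribe a data structure.

```python
from collections import Counter


def partition_adjacency_matrix(edges, vertex_class):
    """Count edges between each unordered pair of vertex classes.

    edges:        iterable of (u, v) pairs of an undirected simple graph
    vertex_class: dict mapping each vertex to its class label
    """
    counts = Counter()
    for u, v in edges:
        # Sort the two labels so that (A, B) and (B, A) share one entry.
        key = tuple(sorted((vertex_class[u], vertex_class[v])))
        counts[key] += 1
    return dict(counts)
```

In the realization problem the direction is reversed: these counts and the vertex degrees are prescribed, and one asks for a graph that matches them.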
Towards random uniform sampling of bipartite graphs with given degree sequence
In this paper we consider a simple Markov chain for bipartite graphs with
given degree sequence on n vertices. We show that the mixing time of this
Markov chain is bounded above by a polynomial in n in case of {\em
semi-regular} degree sequence. The novelty of our approach lies in the
construction of the canonical paths in Sinclair's method.
Comment: 47 pages, submitted for publication. In this version we explain
explicitly our main contribution and correct a serious flaw in the cycle
decompositio
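The abstract does not spell out the chain's transition rule. The sketch below assumes the commonly used swap (checkerboard) chain on the 0/1 biadjacency matrix: pick two rows and two columns at random and, if the 2x2 submatrix is a checkerboard, flip it; otherwise stay put. The function name `swap_step` is hypothetical, and this should be read as one plausible instance rather than the authors' exact chain.

```python
import random


def swap_step(m, rng):
    """One step of a swap chain on a 0/1 biadjacency matrix `m` (list of lists).

    Modifies `m` in place and returns True if a swap was performed.
    Requires at least two rows and two columns.
    """
    rows, cols = len(m), len(m[0])
    i, j = rng.sample(range(rows), 2)   # two distinct rows
    k, l = rng.sample(range(cols), 2)   # two distinct columns
    # Checkerboard pattern: equal diagonals, and the two diagonals differ.
    if m[i][k] == m[j][l] and m[i][l] == m[j][k] and m[i][k] != m[i][l]:
        m[i][k], m[i][l] = m[i][l], m[i][k]
        m[j][k], m[j][l] = m[j][l], m[j][k]
        return True
    return False                         # lazy step: the state is unchanged
```

Every step preserves all row and column sums, so the chain stays inside the set of realizations of the given bipartite degree sequence.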
Towards random uniform sampling of bipartite graphs with given degree sequence
In this paper we consider a simple Markov chain for bipartite graphs with given degree sequence on n vertices. We show that the mixing time of this Markov chain is bounded above by a polynomial in n in case of half-regular degree sequence. The novelty of our approach lies in the construction of the multicommodity flow in Sinclair's method
New methods for fixed-margin binary matrix sampling, Fréchet covariance, and MANOVA tests for random objects in multiple metric spaces
2022 Summer. Includes bibliographical references. Many approaches to the analysis of network data essentially view the data as Euclidean and apply standard multivariate techniques. In this dissertation, we refrain from this approach, exploring two alternate approaches to the analysis of networks and other structured data. The first approach seeks to determine how unique an observed simple, directed network is by comparing it to like networks which share its degree distribution. Generating networks for comparison requires sampling from the space of all binary matrices with the prescribed row and column margins, since enumeration of all such matrices is often infeasible for even moderately sized networks with 20-50 nodes. We propose two new sampling methods for this problem. First, we extend two Markov chain Monte Carlo methods to sample from the space non-uniformly, allowing flexibility in the case that some networks are more likely than others. We show that non-uniform sampling could impede the MCMC process, but in certain special cases is still valid. Critically, we illustrate the differential conclusions that could be drawn from uniform vs. nonuniform sampling. Second, we develop a generalized divide and conquer approach which recursively divides matrices into smaller subproblems which are much easier to count and sample. Each division step reveals interesting mathematics involving the enumeration of integer partitions and points in convex lattice polytopes. The second broad approach we explore is comparing random objects in metric spaces lacking a coordinate system. Traditional definitions of the mean and variance no longer apply, and standard statistical tests have needed reconceptualization in terms of only distances in the metric space. We consider the multivariate setting where random objects exist in multiple metric spaces, which can be thought of as distinct views of the random object.
We define the notion of Fréchet covariance to measure dependence between two metric spaces, and establish consistency for the sample estimator. We then propose several tests for differences in means and covariance matrices among two or more groups in multiple metric spaces, and compare their performance on scenarios involving random probability distributions and networks with node covariates
Clustering Financial Time Series: How Long is Enough?
Researchers have used from 30 days to several years of daily returns as
source data for clustering financial time series based on their correlations.
This paper sets up a statistical framework to study the validity of such
practices. We first show that clustering correlated random variables from their
observed values is statistically consistent. Then, we also give a first
empirical answer to the much debated question: How long should the time series
be? If too short, the clusters found can be spurious; if too long, dynamics can
be smoothed out.
Comment: Accepted at IJCAI 201
Parallel enumeration of degree sequences of simple graphs. II.
In the paper we report on the parallel enumeration of the degree sequences (their number is denoted by G(n)) and zerofree degree sequences (their number is denoted by Gz(n)) of simple graphs on n = 30 and n = 31 vertices. Among other results, we obtained that the number of zerofree degree sequences of graphs on n = 30 vertices is Gz(30) = 5 876 236 938 019 300 and on n = 31 vertices is Gz(31) = 22 974 847 474 172 374. Due to Corollary 21 in [52] these results give the number of degree sequences of simple graphs on 30 and 31 vertices.
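For very small n, the quantities G(n) and Gz(n) can be checked by brute force. The Python sketch below is not the paper's parallel algorithm; it simply enumerates all non-increasing sequences with entries between 0 and n-1 and filters them with the Erdős-Gallai criterion. The function names `erdos_gallai` and `count_degree_sequences` are hypothetical, and the approach is only feasible for single-digit n.

```python
from itertools import combinations_with_replacement


def erdos_gallai(seq):
    """Erdős-Gallai test for a non-increasing sequence of non-negative integers."""
    n = len(seq)
    if sum(seq) % 2:                    # the degree sum must be even
        return False
    for k in range(1, n + 1):
        lhs = sum(seq[:k])
        rhs = k * (k - 1) + sum(min(d, k) for d in seq[k:])
        if lhs > rhs:
            return False
    return True


def count_degree_sequences(n, zero_free=False):
    """Brute-force G(n), or Gz(n) when zero_free=True, for small n."""
    low = 1 if zero_free else 0
    # Descending values make each generated tuple non-increasing.
    values = range(n - 1, low - 1, -1)
    return sum(
        1
        for seq in combinations_with_replacement(values, n)
        if erdos_gallai(seq)
    )
```

For example, `count_degree_sequences(4)` counts the degree sequences of simple graphs on 4 vertices and `count_degree_sequences(4, zero_free=True)` counts those without a zero entry.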