Using discrepancy to control singular values for nonnegative matrices
Abstract: We consider two parameters which can be associated with a nonnegative matrix: the second largest singular value of the "normalized" matrix, and the discrepancy of the entries (which measures the difference between the actual sum of the entries in blocks and the expected sum). Our main result shows that these are related: the discrepancy can be bounded by the second largest singular value and vice versa. These matrix results are then used to derive some (edge/alternating walk) discrepancy properties of edge-weighted directed graphs.
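The two quantities the abstract relates can be illustrated numerically. The sketch below is an assumption about the setup, not the paper's exact definitions: it uses the common normalization $D_r^{-1/2} A D_c^{-1/2}$ (rows and columns scaled by their sums) and measures the discrepancy of one block $S \times T$ as the gap between its actual entry sum and the sum predicted by the row/column masses.

```python
import numpy as np

# Hedged sketch: the paper's precise normalization and block-discrepancy
# definition may differ; this uses standard choices for illustration.
rng = np.random.default_rng(0)
A = rng.random((8, 8))                # a nonnegative matrix

r = A.sum(axis=1)                     # row sums
c = A.sum(axis=0)                     # column sums
M = A / np.sqrt(np.outer(r, c))       # "normalized" matrix D_r^{-1/2} A D_c^{-1/2}

sigma = np.linalg.svd(M, compute_uv=False)  # sorted in decreasing order
sigma2 = sigma[1]                     # second largest singular value

# Discrepancy of a block S x T: actual block sum versus the sum
# "expected" from the row and column masses alone.
S, T = np.arange(4), np.arange(4)
actual = A[np.ix_(S, T)].sum()
expected = r[S].sum() * c[T].sum() / A.sum()
disc = abs(actual - expected)

print(sigma2, disc)
```

For this normalization the largest singular value is always 1 (with singular vectors $\sqrt{r}$ and $\sqrt{c}$), so $\sigma_2$ plays the role of a spectral gap; the abstract's result is that `disc` and `sigma2` bound each other.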
On the probability of planarity of a random graph near the critical point
Consider the uniform random graph $G(n,M)$ with $n$ vertices and $M$ edges.
Erdős and Rényi (1960) conjectured that the limit
$\lim_{n \to \infty} \Pr\{G(n,\textstyle\frac{n}{2}) \text{ is planar}\}$ exists
and is a constant strictly between 0 and 1. Łuczak, Pittel and Wierman (1994)
proved this conjecture, and Janson, Łuczak, Knuth and Pittel (1993) gave lower
and upper bounds for this probability.
In this paper we determine the exact probability of a random graph being
planar near the critical point $M = n/2$. For each $\lambda$, we find an exact
analytic expression for
$p(\lambda) = \lim_{n \to \infty} \Pr\{G(n, \textstyle\frac{n}{2}(1 + \lambda n^{-1/3})) \text{ is planar}\}.$
In particular, we obtain $p(0) \approx 0.99780$.
We extend these results to classes of graphs closed under taking minors. As
an example, we show that the probability of $G(n,\textstyle\frac{n}{2})$ being
series-parallel converges to 0.98003.
For the sake of completeness and exposition, we reprove in a concise way
several basic properties of a random graph near the critical point that we need.
Comment: 10 pages, 1 figure
Hypothesis testing and causal inference with heterogeneous medical data
Learning from data which associations hold and are likely to hold in the future is a fundamental part of scientific discovery. With increasingly heterogeneous data collection practices, exemplified by passively collected electronic health records or high-dimensional genetic data with only a few observed samples, biases and spurious correlations are prevalent. These correlations are called spurious because they do not contribute to the effect being studied. In this context, the modelling assumptions of existing statistical tests and causal inference methods are often found inadequate, and their practical utility is diminished even though these models are increasingly used as decision-support tools in practice. This thesis investigates how modern computational techniques may broaden the fields of hypothesis testing and causal inference to handle the subtleties of large heterogeneous data sets, and simultaneously improve the robustness and theoretical understanding of machine learning algorithms using insights from causality and statistics.
The first part of this thesis is concerned with hypothesis testing. We develop a framework for hypothesis testing on set-valued data, a representation that faithfully describes many real-world phenomena, including patient biomarker trajectories in the hospital. Using similar techniques, we next develop a two-sample test for making inference on selection-biased data, in the sense that not all individuals are equally likely to be included in the study, a fact that biases tests if not accounted for when the desideratum is conclusions that are generally applicable. We conclude this part with an investigation of conditional independence in high-dimensional data, such as gene expression data, and propose a test using generative adversarial networks.

The second part of this thesis is concerned with causal inference and discovery, with a special focus on the influence of unobserved confounders that distort the observed associations between variables and yet may not be ruled out or adjusted for using data alone. We start by demonstrating that unobserved confounders may substantially bias the generalization performance of machine learning algorithms trained with conventional learning paradigms such as empirical risk minimization. Acknowledging this spurious effect, we develop a new learning principle, inspired by causal insights, that provably generalizes to test data sampled from a larger set of distributions different from the training distribution. In the last chapter we consider the influence of unobserved confounders on causal discovery. We show that, under some assumptions on the type and influence of unobserved confounding, one may develop provably consistent causal discovery algorithms, formulated as the solution to a continuous optimization program.
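The thesis's specialized tests (for set-valued data, selection bias, and high-dimensional conditional independence) are not reproduced here; as a hedged illustration of the generic building block they extend, the sketch below implements a plain permutation two-sample test, with the difference in sample means as an assumed test statistic.

```python
import numpy as np

def permutation_two_sample_test(x, y, n_perm=2000, seed=0):
    """Approximate p-value for H0: x and y are drawn from the same
    distribution, using |mean(x) - mean(y)| as the test statistic.
    Illustrative only; the thesis develops more specialized tests."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # relabel the pooled sample
        stat = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        count += stat >= observed
    return (count + 1) / (n_perm + 1)           # add-one p-value estimate

rng = np.random.default_rng(1)
same = permutation_two_sample_test(rng.normal(0, 1, 100), rng.normal(0, 1, 100))
diff = permutation_two_sample_test(rng.normal(0, 1, 100), rng.normal(1, 1, 100))
```

Under selection bias (unequal inclusion probabilities), the exchangeability assumption behind this permutation scheme fails, which is exactly the failure mode the thesis's selection-bias-aware two-sample test is designed to address.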