    Digraph Complexity Measures and Applications in Formal Language Theory

    We investigate structural complexity measures on digraphs, in particular the cycle rank. This concept is intimately related to a classical topic in formal language theory, namely the star height of regular languages. We explore this connection, and obtain several new algorithmic insights regarding both cycle rank and star height. Among other results, we show that computing the cycle rank is NP-complete, even for sparse digraphs of maximum outdegree 2. Nevertheless, we provide both a polynomial-time approximation algorithm and an exponential-time exact algorithm for this problem. The former algorithm yields an O((log n)^(3/2))-approximation in polynomial time, whereas the latter yields the optimum solution, and runs in time and space O*(1.9129^n) on digraphs of maximum outdegree at most two. Regarding the star height problem, we identify a subclass of the regular languages for which we can precisely determine the computational complexity of the star height problem. Namely, the star height problem for bideterministic languages is NP-complete, and this holds already for binary alphabets. Then we translate the algorithmic results concerning cycle rank to the bideterministic star height problem, thus giving a polynomial-time approximation as well as a reasonably fast exact exponential algorithm for bideterministic star height. Comment: 19 pages, 1 figure
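The abstract does not restate what cycle rank is. The sketch below assumes the standard recursive definition (an assumption, not taken from the abstract): an acyclic digraph has cycle rank 0, a strongly connected digraph containing a cycle has rank 1 plus the minimum rank over all single-vertex deletions, and otherwise the rank is the maximum over the strongly connected components. It is a brute-force illustration for tiny digraphs, not the exact O*(1.9129^n) algorithm or the approximation algorithm mentioned above. The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set

# A digraph maps every vertex to the set of its out-neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def reachable(graph: Graph, start: Hashable) -> Set[Hashable]:
    """Vertices reachable from `start` (including itself), by iterative DFS."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in graph[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def strongly_connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Naive SCC computation: v and w share a component iff each reaches the other."""
    reach = {v: reachable(graph, v) for v in graph}
    components: List[Set[Hashable]] = []
    assigned: Set[Hashable] = set()
    for v in graph:
        if v not in assigned:
            comp = {w for w in reach[v] if v in reach[w]}
            components.append(comp)
            assigned |= comp
    return components


def induced(graph: Graph, keep: Set[Hashable]) -> Graph:
    """Subgraph induced by the vertex set `keep`."""
    return {v: graph[v] & keep for v in keep}


def cycle_rank(graph: Graph) -> int:
    """Cycle rank via the assumed recursive definition (exponential time)."""
    best = 0
    for comp in strongly_connected_components(graph):
        sub = induced(graph, comp)
        has_cycle = len(comp) > 1 or any(v in sub[v] for v in comp)
        if not has_cycle:
            continue  # a trivial component contributes rank 0
        rank = 1 + min(cycle_rank(induced(sub, comp - {v})) for v in comp)
        best = max(best, rank)
    return best


directed_triangle = {0: {1}, 1: {2}, 2: {0}}
complete_digraph_3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(cycle_rank(directed_triangle))   # 1
print(cycle_rank(complete_digraph_3))  # 2
```

Deleting any vertex of the directed triangle leaves an acyclic digraph, hence rank 1, while the complete digraph on three vertices still contains a 2-cycle after one deletion, hence rank 2.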

    Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

    Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including probabilistic topic models and latent linear Bayesian networks, using only second-order observed moments. The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks. Because no assumptions are made on the distribution among the latent variables, the approach can handle arbitrary correlations among the topics or latent factors. In addition, a tractable learning method via $\ell_1$ optimization is proposed and studied in numerical experiments. Comment: 38 pages, 6 figures, 2 tables, applications in topic models and Bayesian networks are studied. Simulation section is added

    On semi-transitive orientability of Kneser graphs and their complements

    An orientation of a graph is semi-transitive if it is acyclic, and for any directed path $v_0\rightarrow v_1\rightarrow \cdots\rightarrow v_k$ either there is no edge between $v_0$ and $v_k$, or $v_i\rightarrow v_j$ is an edge for all $0\leq i<j\leq k$. An undirected graph is semi-transitive if it admits a semi-transitive orientation. Semi-transitive graphs include several important classes of graphs such as 3-colorable graphs, comparability graphs, and circle graphs, and they are precisely the class of word-representable graphs studied extensively in the literature. In this paper, we study semi-transitive orientability of the celebrated Kneser graph $K(n,k)$, which is the graph whose vertices correspond to the $k$-element subsets of a set of $n$ elements, and where two vertices are adjacent if and only if the two corresponding sets are disjoint. We show that for $n\geq 15k-24$, $K(n,k)$ is not semi-transitive, while for $k\leq n\leq 2k+1$, $K(n,k)$ is semi-transitive. Also, we show computationally that a subgraph $S$ on 16 vertices and 36 edges of $K(8,3)$, and thus $K(8,3)$ itself on 56 vertices and 280 edges, is non-semi-transitive. $S$ and $K(8,3)$ are the first explicit examples of triangle-free non-semi-transitive graphs, whose existence was established via Erdős' theorem by Halldórsson et al. in 2011. Moreover, we show that the complement graph $\overline{K(n,k)}$ of $K(n,k)$ is semi-transitive if and only if $n\geq 2k$.
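The Kneser graph definition above can be checked directly on small cases. The following is a minimal sketch that builds $K(n,k)$ from the definition and confirms the vertex and edge counts quoted for $K(8,3)$ as well as its triangle-freeness. The helper names are hypothetical, and the sketch does not attempt the semi-transitivity computation itself.

```python
from itertools import combinations


def kneser_graph(n, k):
    """Return (vertices, edges) of K(n, k): k-subsets of {0..n-1}, adjacent iff disjoint."""
    vertices = [frozenset(c) for c in combinations(range(n), k)]
    edges = [(a, b) for a, b in combinations(vertices, 2) if a.isdisjoint(b)]
    return vertices, edges


def is_triangle_free(vertices, edges):
    """A graph is triangle-free iff no edge's endpoints share a common neighbour."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return all(not (adjacency[a] & adjacency[b]) for a, b in edges)


vertices, edges = kneser_graph(8, 3)
print(len(vertices), len(edges))          # 56 280
print(is_triangle_free(vertices, edges))  # True
```

The counts follow from $\binom{8}{3}=56$ vertices, each with $\binom{5}{3}=10$ disjoint partners, giving $56\cdot 10/2=280$ edges. A triangle would need three pairwise disjoint 3-subsets, that is, 9 elements, which an 8-element ground set cannot supply.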

    Geometry of the faithfulness assumption in causal inference

    Many algorithms for inferring causality rely heavily on the faithfulness assumption. The main justification for imposing this assumption is that the set of unfaithful distributions has Lebesgue measure zero, since it can be seen as a collection of hypersurfaces in a hypercube. However, due to sampling error the faithfulness condition alone is not sufficient for statistical estimation, and strong-faithfulness has been proposed and assumed to achieve uniform or high-dimensional consistency. In contrast to the plain faithfulness assumption, the set of distributions that is not strong-faithful has nonzero Lebesgue measure and in fact, can be surprisingly large as we show in this paper. We study the strong-faithfulness condition from a geometric and combinatorial point of view and give upper and lower bounds on the Lebesgue measure of strong-faithful distributions for various classes of directed acyclic graphs. Our results imply fundamental limitations for the PC-algorithm and potentially also for other algorithms based on partial correlation testing in the Gaussian case. Comment: Published in at http://dx.doi.org/10.1214/12-AOS1080 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
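To make the "hypersurface" picture concrete, here is a small sketch under stated assumptions, none of which come from the abstract: a three-variable linear Gaussian model X1 = e1, X2 = a*X1 + e2, X3 = c*X1 + b*X2 + e3 with independent unit-variance noise, and the usual reading of lambda-strong-faithfulness as requiring the relevant (partial) correlations to exceed lambda in absolute value. Only the single marginal correlation between X1 and X3 is examined here, and the function names are hypothetical.

```python
import math


def corr_x1_x3(a: float, b: float, c: float) -> float:
    """Correlation of X1 and X3 in the assumed model with edges 1->2 (a), 2->3 (b), 1->3 (c).

    X3 = (c + a*b)*e1 + b*e2 + e3, so Cov(X1, X3) = c + a*b and
    Var(X3) = (c + a*b)**2 + b**2 + 1, while Var(X1) = 1.
    """
    cov13 = c + a * b
    var3 = cov13 ** 2 + b ** 2 + 1.0
    return cov13 / math.sqrt(var3)


def exceeds_threshold(a: float, b: float, c: float, lam: float) -> bool:
    """Check one strong-faithfulness constraint: |corr(X1, X3)| > lam."""
    return abs(corr_x1_x3(a, b, c)) > lam


# On the surface c = -a*b the two paths cancel exactly: the distribution is unfaithful.
print(corr_x1_x3(1.0, 1.0, -1.0))

# Near that surface the distribution is faithful, yet fails the check for lam = 0.1.
print(exceeds_threshold(1.0, 1.0, -0.9, 0.1))
```

Exact cancellation happens only on the surface c = -a*b, which has measure zero, whereas the parameters failing the threshold check form a band of positive volume around that surface. This is the qualitative distinction the abstract draws between faithfulness and strong-faithfulness.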

    Graph Entropy, Network Coding and Guessing games

    We introduce the (private) entropy of a directed graph (in a new network coding sense) as well as a number of related concepts. We show that the entropy of a directed graph is identical to its guessing number and can be bounded from below by the number of vertices minus the size of the graph’s shortest index code. We show that the network coding solvability of each specific multiple unicast network is completely determined by the entropy (as well as by the shortest index code) of the directed graph that occurs by identifying each source node with its corresponding target node. Shannon’s information inequalities can be used to calculate upper bounds on a graph’s entropy as well as to calculate the size of the minimal index code. Recently, a number of new families of so-called non-Shannon-type information inequalities have been discovered. It has been shown that there exist communication networks with a capacity strictly less than required for solvability, but where this fact cannot be derived using Shannon’s classical information inequalities. Based on this result we show that there exist graphs with an entropy that cannot be calculated using only Shannon’s classical information inequalities, and show that better estimates can be obtained by use of certain non-Shannon-type information inequalities.

    Homotopy Type of the Boolean Complex of a Coxeter System

    In any Coxeter group, the set of elements whose principal order ideals are boolean forms a simplicial poset under the Bruhat order. This simplicial poset defines a cell complex, called the boolean complex. In this paper it is shown that, for any Coxeter system of rank n, the boolean complex is homotopy equivalent to a wedge of (n-1)-dimensional spheres. The number of such spheres can be computed recursively from the unlabeled Coxeter graph, and defines a new graph invariant called the boolean number. Specific calculations of the boolean number are given for all finite and affine irreducible Coxeter systems, as well as for systems with graphs that are disconnected, complete, or stars. One implication of these results is that the boolean complex is contractible if and only if a generator of the Coxeter system is in the center of the group. Comment: final version, to appear in Advances in Mathematics

    A Transition-Based Directed Acyclic Graph Parser for UCCA

    We present the first parser for UCCA, a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. To our knowledge, the conjunction of these formal properties is not supported by any existing parser. Our transition-based parser, which uses a novel transition set and features based on bidirectional LSTMs, has value not just for UCCA parsing: its ability to handle more general graph structures can inform the development of parsers for other semantic DAG structures, and in languages that frequently use discontinuous structures. Comment: 16 pages; Accepted as long paper at ACL201