
    Causal Discovery for Relational Domains: Representation, Reasoning, and Learning

    Many domains are experiencing a growing trend toward recording and analyzing massive, increasingly complex observational data sets. A commonly made claim is that these data sets hold the potential to transform their corresponding domains by providing previously unknown or unexpected explanations and by enabling informed decision-making. However, only knowledge of the underlying causal generative process, as opposed to knowledge of associational patterns, can support such tasks. Most methods for traditional causal discovery—the development of algorithms that learn causal structure from observational data—are restricted to representations that require limiting assumptions on the form of the data. Causal discovery has almost exclusively been applied to directed graphical models of propositional data that assume a single type of entity with independence among instances. However, most real-world domains are characterized by systems that involve complex interactions among multiple types of entities. Many state-of-the-art methods in statistics and machine learning that address such complex systems focus on learning associational models, and these are often mistakenly interpreted as causal. The intersection between causal discovery and machine learning in complex systems is small. The primary objective of this thesis is to extend causal discovery to such complex systems. Specifically, I formalize a relational representation and model that can express the causal and probabilistic dependencies among the attributes of interacting, heterogeneous entities. I show that the traditional method for reasoning about statistical independence from model structure fails to accurately derive conditional independence facts from relational models. I introduce a new theory—relational d-separation—and a novel, lifted representation—the abstract ground graph—that supports a sound, complete, and computationally efficient method for algorithmically deriving conditional independencies from probabilistic models of relational data. The abstract ground graph representation also presents causal implications that enable the detection of causal direction for bivariate relational dependencies without parametric assumptions. I leverage these implications and the theoretical framework of relational d-separation to develop a sound and complete algorithm—the relational causal discovery (RCD) algorithm—that learns causal structure from relational data.
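    The "traditional method for reasoning about statistical independence from model structure" referred to above is d-separation on a propositional DAG. As a point of reference, here is a minimal Python sketch of that classical check, using Lauritzen's ancestral-moralization criterion; this is exactly the propositional baseline the thesis shows to be inadequate for relational models, and all names in it are illustrative.

```python
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """Classical propositional d-separation on a DAG: X and Y are
    d-separated by Z iff X and Y are disconnected in the moralized
    ancestral graph of X, Y, and Z after deleting Z. `parents` maps
    each node to the set of its parents."""
    # 1. Restrict to the ancestral subgraph of X, Y, and Z.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, ()))
    # 2. Moralize: drop edge directions and "marry" co-parents.
    adj = {n: set() for n in relevant}
    for n in relevant:
        ps = list(parents.get(n, ()))
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Delete Z and test whether X can still reach Y.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        n = stack.pop()
        if n in seen or n in zs:
            continue
        seen.add(n)
        if n in ys:
            return False  # an active path exists
        stack.extend(adj[n])
    return True

# Chain A -> B -> C: A and C are d-separated given B, but not marginally.
parents = {"B": {"A"}, "C": {"B"}}
print(d_separated(parents, {"A"}, {"C"}, {"B"}))  # True
print(d_separated(parents, {"A"}, {"C"}, set()))  # False
```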

    Computing the Grounded Semantics in all the Subgraphs of an Argumentation Framework: an Empirical Evaluation

    Given an argumentation framework – a finite set of arguments together with an attack relation, which identify a graph – we study how the grounded labelling of a generic argument a varies across all the subgraphs of the framework. Since this is an intractable problem of above-polynomial complexity, we present two non-naïve algorithms that find the set of all the subgraphs in which the grounded semantics assigns a given label to argument a. We report the results of a series of empirical tests over graphs of increasing complexity. The value of researching the above problem is two-fold. First, knowing how an argument behaves in all the subgraphs represents strategic information for arguing agents. Second, the algorithms can be applied to the computation of the recently introduced probabilistic argumentation frameworks.
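    The grounded labelling mentioned here is the least fixed point of the framework's characteristic function, which makes a concrete baseline easy to sketch. The Python below computes grounded labellings and then does the naive sweep over all induced subgraphs containing an argument; it illustrates the problem only, not the paper's two non-naïve algorithms, and the example framework is invented.

```python
from itertools import combinations

def grounded_labelling(args, attacks):
    """Least-fixed-point computation: label IN any argument whose
    attackers are all OUT, label OUT any argument with an IN attacker,
    and leave the rest UNDEC."""
    attackers = {a: {s for (s, t) in attacks if t == a} for a in args}
    label = {a: "UNDEC" for a in args}
    changed = True
    while changed:
        changed = False
        for a in args:
            if label[a] != "UNDEC":
                continue
            if all(label[b] == "OUT" for b in attackers[a]):
                label[a] = "IN"; changed = True
            elif any(label[b] == "IN" for b in attackers[a]):
                label[a] = "OUT"; changed = True
    return label

def subgraphs_where(args, attacks, target, wanted):
    """Naive baseline: sweep all induced subgraphs containing `target`
    and keep those whose grounded labelling gives it `wanted`."""
    rest = [a for a in args if a != target]
    hits = []
    for r in range(len(rest) + 1):
        for keep in combinations(rest, r):
            sub = set(keep) | {target}
            sub_att = [(s, t) for (s, t) in attacks if s in sub and t in sub]
            if grounded_labelling(sub, sub_att)[target] == wanted:
                hits.append(sub)
    return hits

args = {"a", "b", "c"}
attacks = [("b", "a"), ("c", "b")]  # b attacks a, c attacks b
print(len(subgraphs_where(args, attacks, "a", "IN")))  # 3 of 4 subgraphs
```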

    A Finite-Model-Theoretic View on Propositional Proof Complexity

    We establish new, and surprisingly tight, connections between propositional proof complexity and finite model theory. Specifically, we show that the power of several propositional proof systems, such as Horn resolution, bounded-width resolution, and the polynomial calculus of bounded degree, can be characterised in a precise sense by variants of fixed-point logics that are of fundamental importance in descriptive complexity theory. Our main results are that Horn resolution has the same expressive power as least fixed-point logic, that bounded-width resolution captures existential least fixed-point logic, and that the polynomial calculus with bounded degree over the rationals solves precisely the problems definable in fixed-point logic with counting. By exploring these connections further, we establish finite-model-theoretic tools for proving lower bounds for the polynomial calculus over the rationals and over finite fields.
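    One direction of the correspondence for Horn resolution can be made concrete: deciding a set of definite Horn clauses by forward chaining is literally a least-fixed-point computation, which is the intuition behind its characterisation by least fixed-point logic. The Python sketch below captures only this intuition, with invented clauses, not the paper's precise logical equivalence.

```python
def horn_least_fixed_point(clauses):
    """Forward chaining on definite Horn clauses (body, head): compute
    the least set of atoms closed under all the rules, i.e. the least
    fixed point of one round of unit propagation."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived

# Facts are clauses with an empty body.
clauses = [((), "p"), (("p",), "q"), (("p", "q"), "r"), (("s",), "t")]
print(horn_least_fixed_point(clauses))  # {'p', 'q', 'r'}
```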

    An Abstract Machine for Unification Grammars

    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with a well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of Amalia results in much improved performance over existing systems. In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language.
    Comment: Doctoral thesis, 96 pages, many PostScript figures; uses pstricks, pst-node, psfig, fullname, and a macros file
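    The core operation Amalia's instructions implement is unification of feature structures. A much-simplified Python sketch of that operation is given below: untyped and destructive, with union-find style forwarding pointers. It is illustrative only; the real machine unifies typed structures and compiles the operation into abstract instructions rather than interpreting it like this.

```python
class FS:
    """An untyped feature-structure node (Amalia's are typed)."""
    def __init__(self, features=None):
        self.features = dict(features or {})
        self.forward = None  # union-find style redirection set by unify

def deref(fs):
    """Follow forwarding pointers to the representative node."""
    while fs.forward is not None:
        fs = fs.forward
    return fs

def unify(a, b):
    """Destructively merge two feature structures, recursively unifying
    the values of shared features. With a type hierarchy, a failed type
    meet would make this return None; untyped, it always succeeds."""
    a, b = deref(a), deref(b)
    if a is b:
        return a
    b.forward = a
    for feat, bval in b.features.items():
        if feat in a.features:
            if unify(a.features[feat], bval) is None:
                return None
        else:
            a.features[feat] = bval
    return a

x = FS({"AGR": FS(), "NUM": FS()})
y = FS({"AGR": FS({"PER": FS()})})
u = deref(unify(x, y))
print(sorted(u.features))                          # ['AGR', 'NUM']
print("PER" in deref(u.features["AGR"]).features)  # True
```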

    A general framework for enumerating equivalence classes of solutions

    When a problem has more than one solution, it is often important, depending on the underlying context, to enumerate (i.e., to list) them all. Even when the enumeration can be done in polynomial delay, that is, spending no more than polynomial time to go from one solution to the next, this can be costly, as the number of solutions itself may be huge, sometimes even exponential. Furthermore, depending on the application, many of these solutions can be considered equivalent. The problem of efficiently enumerating the equivalence classes, or one representative per class (without generating all the solutions), although identified as a need in many areas, has been addressed only for very few specific cases. In this paper, we provide a general framework that solves this problem in polynomial delay for a wide variety of contexts, including optimization contexts that can be addressed by dynamic programming algorithms, and for certain types of equivalence relations between solutions.
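    As a toy instance of the problem solved here, take the solutions to be the subsets of a list of integer weights and call two subsets equivalent when they have the same total. A subset-sum dynamic program then emits one representative per class with polynomial delay (in the table size) instead of listing the exponentially many subsets. The sketch below is this toy instance only, not the paper's general framework.

```python
def representatives_by_sum(weights):
    """Yield (total, subset) pairs: one representative subset for each
    achievable total, traced back through a subset-sum DP table."""
    n, total = len(weights), sum(weights)
    # reach[i][s] is True iff some subset of weights[i:] sums to s.
    reach = [[False] * (total + 1) for _ in range(n + 1)]
    reach[n][0] = True
    for i in range(n - 1, -1, -1):
        for s in range(total + 1):
            reach[i][s] = reach[i + 1][s] or (
                s >= weights[i] and reach[i + 1][s - weights[i]])
    for s in range(total + 1):
        if not reach[0][s]:
            continue
        subset, remaining = [], s
        for i in range(n):  # trace one witness through the DP table
            if remaining >= weights[i] and reach[i + 1][remaining - weights[i]]:
                subset.append(weights[i])
                remaining -= weights[i]
        yield s, subset

for s, rep in representatives_by_sum([1, 2, 2, 5]):
    print(s, rep)  # one subset per achievable total, 0 through 10
```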

    Gene expression programming for logic circuit design

    Finding an optimal solution for the logic circuit design problem is challenging and time-consuming, especially for complex logic circuits. As the number of logic gates increases, the task of designing optimal logic circuits extends beyond human capability. A number of evolutionary algorithms have been invented to tackle a range of optimisation problems, including logic circuit design. This dissertation explores two of these evolutionary algorithms, i.e. Gene Expression Programming (GEP) and Multi Expression Programming (MEP), with the aim of integrating their strengths into a new Genetic Programming (GP) algorithm. GEP was invented by Candida Ferreira in 1999 and published in 2001 [8]. The GEP algorithm inherits the advantages of the Genetic Algorithm (GA) and GP, and it uses a simple encoding method to solve complex problems [6, 32]. While GEP emerged as powerful due to its simplicity in implementation and flexibility in genetic operations, it is not without weaknesses. Some of these inherent weaknesses are discussed in [1, 6, 21]. Like GEP, MEP is a GP variant that uses linear chromosomes of fixed length [23]. A unique feature of MEP is its ability to store multiple solutions of a problem in a single chromosome. MEP also has the ability to implement code reuse, which is achieved through its representation, which allows multiple references to a single sub-structure. This dissertation proposes a new GP algorithm, Improved Gene Expression Programming (IGEP), which improves the performance of the traditional GEP by combining the code-reuse capability of MEP with the simplicity of the gene encoding method of GEP. The results obtained using the IGEP and the traditional GEP show that the two algorithms are comparable in terms of the success rate when applied to simple problems such as basic logic functions. However, for complex problems such as a one-bit Full Adder (FA) and an AND-OR Arithmetic Logic Unit (ALU), the IGEP performs better than the traditional GEP due to the code reuse in IGEP.
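    The code reuse that distinguishes MEP comes directly from its encoding: each gene either names an input or applies a gate to earlier genes, so a sub-circuit is computed once and may be referenced many times, and every gene is itself a candidate circuit. A hedged Python sketch of evaluating such a chromosome against a target truth table follows; the chromosome, gate set, and target are invented for illustration.

```python
from itertools import product

# An MEP-style linear chromosome: terminals or gates over earlier genes.
CHROMOSOME = [
    ("a",),          # gene 0: input a
    ("b",),          # gene 1: input b
    ("AND", 0, 1),   # gene 2: a AND b
    ("c",),          # gene 3: input c
    ("OR", 2, 3),    # gene 4: (a AND b) OR c
    ("XOR", 2, 4),   # gene 5: reuses gene 2 - MEP-style code reuse
]
GATES = {"AND": lambda x, y: x & y,
         "OR":  lambda x, y: x | y,
         "XOR": lambda x, y: x ^ y}

def evaluate(chromosome, inputs):
    """One left-to-right pass evaluates every gene, since gates only
    reference earlier genes; each entry is a candidate circuit."""
    values = []
    for gene in chromosome:
        if len(gene) == 1:                      # terminal
            values.append(inputs[gene[0]])
        else:
            op, i, j = gene
            values.append(GATES[op](values[i], values[j]))
    return values

# Score every gene against the target truth table; MEP keeps the best
# gene, one of its advantages over single-expression encodings.
target = lambda a, b, c: (a & b) ^ ((a & b) | c)
scores = [0] * len(CHROMOSOME)
for a, b, c in product((0, 1), repeat=3):
    vals = evaluate(CHROMOSOME, {"a": a, "b": b, "c": c})
    for g, v in enumerate(vals):
        scores[g] += (v == target(a, b, c))
print(scores)  # gene 5 matches the target on all 8 input rows
```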

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Exposing not only human-centered information but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: some providers choose XML, others RDF, still others JSON or OWL for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. (4) Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear-time and linear-space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient, data access in a “Web of Data”.
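    For tree-shaped data, the interval labelings mentioned above reduce structural queries to arithmetic: a depth-first traversal assigns each node a (start, end) interval such that u is an ancestor of v exactly when u's interval contains v's, a constant-time test. A minimal Python illustration of the tree case follows (the paper extends such labelings to graphs); the example document is invented.

```python
def interval_label(children, root):
    """DFS labeling: node u gets (start, end) such that u is an
    ancestor of v iff u's interval strictly contains v's."""
    label, clock = {}, 0
    def dfs(node):
        nonlocal clock
        start = clock; clock += 1
        for c in children.get(node, ()):
            dfs(c)
        label[node] = (start, clock); clock += 1
    dfs(root)
    return label

def is_ancestor(label, u, v):
    (su, eu), (sv, ev) = label[u], label[v]
    return su < sv and ev < eu  # interval containment, O(1)

# A tiny XML-like tree: <doc><title/><section><para/></section></doc>
children = {"doc": ["title", "section"], "section": ["para"]}
label = interval_label(children, "doc")
print(is_ancestor(label, "doc", "para"))    # True
print(is_ancestor(label, "title", "para"))  # False
```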

    Adaptive Constraint Solving for Information Flow Analysis

    In program analysis, unknown properties of terms are typically represented symbolically as variables. Bound constraints on these variables can then specify multiple optimisation goals for computer programs and find application in areas such as type theory, security, alias analysis, and resource reasoning. Resolution of bound constraints is a problem steeped in graph theory: interdependencies between the variables are represented as a constraint graph. Additionally, constants are introduced into the system as concrete bounds over these variables, and the constants themselves are ordered over a lattice which is, once again, represented as a graph. Despite graph algorithms being central to bound constraint solving, most approaches to program optimisation that use bound constraint solving have treated their graph-theoretic foundations as a black box. Little has been done to investigate the computational costs or to design efficient graph algorithms for constraint resolution. Emerging examples of these lattices and bound constraint graphs, particularly from the domain of language-based security, show that these graphs and lattices are structurally diverse and can be arbitrarily large. Therefore, there is a pressing need to investigate the graph-theoretic foundations of bound constraint solving. In this thesis, we investigate the computational costs of bound constraint solving from a graph-theoretic perspective for Information Flow Analysis (IFA); IFA is a subfield of language-based security which verifies whether confidentiality and integrity of classified information are preserved as it is manipulated by a program. We present a novel framework based on graph decomposition for solving the (atomic) bound constraint problem for IFA. Our approach enables us to abstract away from connections between individual vertices to those between sets of vertices in both the constraint graph and an accompanying security lattice which defines an ordering over the constants. Thereby, we are able to achieve significant speedups compared to state-of-the-art graph algorithms applied to bound constraint solving. More importantly, our algorithms are highly adaptive in nature and seamlessly adapt to the structure of the constraint graph and the lattice. The computational cost of our approach is a function of the latent scope of decomposition in the constraint graph and the lattice; therefore, we enjoy the fastest runtime for every point in the structure spectrum of these graphs and lattices. While the techniques in this dissertation are developed with IFA in mind, they can be extended to other applications of the bound constraint problem, such as type inference and program analysis frameworks which use annotated type systems, where constants are ordered over a lattice.
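    An atomic bound constraint system over a security lattice always has a least solution, computable by propagating least upper bounds along the constraint graph until a fixed point is reached. The naive sweep below, a Python sketch over the two-point lattice LOW ≤ HIGH with invented variable names, is the kind of baseline that the decomposition-based framework described above speeds up.

```python
LOW, HIGH = 0, 1  # two-point security lattice; join is max

def solve_bounds(lower_bounds, flows, variables):
    """Least solution of atomic bound constraints: start every variable
    at LOW, apply constant bounds c <= v, then propagate joins along
    flow edges u <= v until nothing changes."""
    sol = {v: LOW for v in variables}
    for v, c in lower_bounds:            # constraints of the form c <= v
        sol[v] = max(sol[v], c)
    changed = True
    while changed:
        changed = False
        for u, v in flows:               # constraints of the form u <= v
            if sol[u] > sol[v]:
                sol[v] = sol[u]
                changed = True
    return sol

# password is HIGH and flows into digest, which flows into log_entry.
variables = ["password", "digest", "log_entry", "banner"]
print(solve_bounds([("password", HIGH)],
                   [("password", "digest"), ("digest", "log_entry")],
                   variables))  # banner stays LOW; the rest become HIGH
```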

    A Tree Locality-Sensitive Hash for Secure Software Testing

    Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state-merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together.
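    The clustering idea can be sketched concretely: hash each path constraint's syntax tree to a key built only from shape features (operators and depths, ignoring variable names and constants), so structurally similar constraints land in the same bucket without any pairwise comparison. The Python below uses an exact feature-bag key for simplicity; it is an assumption-laden illustration of the idea, not the dissertation's actual tree locality-sensitive hash.

```python
from collections import Counter, defaultdict

def tree_features(node):
    """Bag of (depth, label) features of a constraint AST written as
    nested tuples, e.g. ("<", ("+", "x", "y"), "10"). Leaf identity is
    deliberately ignored so renamed constraints look alike."""
    feats = Counter()
    def walk(n, depth):
        if isinstance(n, tuple):
            feats[(depth, n[0])] += 1
            for child in n[1:]:
                walk(child, depth + 1)
        else:
            feats[(depth, "leaf")] += 1
    walk(node, 0)
    return feats

def bucket(node):
    # Hash key = the shape-feature bag; identical bags share a cluster.
    return frozenset(tree_features(node).items())

clusters = defaultdict(list)
constraints = [
    ("<", ("+", "x", "y"), "10"),
    ("<", ("+", "a", "b"), "99"),  # same shape, different names
    ("=", "x", "0"),               # different shape, different cluster
]
for c in constraints:
    clusters[bucket(c)].append(c)
print([len(v) for v in clusters.values()])  # [2, 1]
```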