1,663 research outputs found
Causal Discovery for Relational Domains: Representation, Reasoning, and Learning
Many domains are currently experiencing the growing trend to record and analyze massive, observational data sets with increasing complexity. A commonly made claim is that these data sets hold potential to transform their corresponding domains by providing previously unknown or unexpected explanations and enabling informed decision-making. However, only knowledge of the underlying causal generative process, as opposed to knowledge of associational patterns, can support such tasks.
Most methods for traditional causal discovery—the development of algorithms that learn causal structure from observational data—are restricted to representations that require limiting assumptions on the form of the data. Causal discovery has almost exclusively been applied to directed graphical models of propositional data that assume a single type of entity with independence among instances. However, most real-world domains are characterized by systems that involve complex interactions among multiple types of entities. Many state-of-the-art methods in statistics and machine learning that address such complex systems focus on learning associational models, and they are oftentimes mistakenly interpreted as causal. The intersection between causal discovery and machine learning in complex systems is small.
The primary objective of this thesis is to extend causal discovery to such complex systems. Specifically, I formalize a relational representation and model that can express the causal and probabilistic dependencies among the attributes of interacting, heterogeneous entities. I show that the traditional method for reasoning about statistical independence from model structure fails to accurately derive conditional independence facts from relational models. I introduce a new theory—relational d-separation—and a novel, lifted representation—the abstract ground graph—that supports a sound, complete, and computationally efficient method for algorithmically deriving conditional independencies from probabilistic models of relational data. The abstract ground graph representation also presents causal implications that enable the detection of causal direction for bivariate relational dependencies without parametric assumptions. I leverage these implications and the theoretical framework of relational d-separation to develop a sound and complete algorithm—the relational causal discovery (RCD) algorithm—that learns causal structure from relational data
Computing the Grounded Semantics in all the Subgraphs of an Argumentation Framework: an Empirical Evaluation
Given an argumentation framework – with a finite set of arguments and the attack relation identifying the graph – we study how the grounded labelling of a generic argument a varies in all the subgraphs of . Since this is an intractable problem of above-polynomial complexity, we present two non-naïve algorithms to find the set of all the subgraphs where the grounded semantic assigns to argument a specific label . We report the results of a series of empirical tests over graphs of increasing complexity. The value of researching the above problem is two-fold. First, knowing how an argument behaves in all the subgraphs represents strategic information for arguing agents. Second, the algorithms can be applied to the computation of the recently introduced probabilistic argumentation frameworks
A Finite-Model-Theoretic View on Propositional Proof Complexity
We establish new, and surprisingly tight, connections between propositional
proof complexity and finite model theory. Specifically, we show that the power
of several propositional proof systems, such as Horn resolution, bounded-width
resolution, and the polynomial calculus of bounded degree, can be characterised
in a precise sense by variants of fixed-point logics that are of fundamental
importance in descriptive complexity theory. Our main results are that Horn
resolution has the same expressive power as least fixed-point logic, that
bounded-width resolution captures existential least fixed-point logic, and that
the polynomial calculus with bounded degree over the rationals solves precisely
the problems definable in fixed-point logic with counting. By exploring these
connections further, we establish finite-model-theoretic tools for proving
lower bounds for the polynomial calculus over the rationals and over finite
fields
An Abstract Machine for Unification Grammars
This work describes the design and implementation of an abstract machine,
Amalia, for the linguistic formalism ALE, which is based on typed feature
structures. This formalism is one of the most widely accepted in computational
linguistics and has been used for designing grammars in various linguistic
theories, most notably HPSG. Amalia is composed of data structures and a set of
instructions, augmented by a compiler from the grammatical formalism to the
abstract instructions, and a (portable) interpreter of the abstract
instructions. The effect of each instruction is defined using a low-level
language that can be executed on ordinary hardware.
The advantages of the abstract machine approach are twofold. From a
theoretical point of view, the abstract machine gives a well-defined
operational semantics to the grammatical formalism. This ensures that grammars
specified using our system are endowed with well defined meaning. It enables,
for example, to formally verify the correctness of a compiler for HPSG, given
an independent definition. From a practical point of view, Amalia is the first
system that employs a direct compilation scheme for unification grammars that
are based on typed feature structures. The use of amalia results in a much
improved performance over existing systems.
In order to test the machine on a realistic application, we have developed a
small-scale, HPSG-based grammar for a fragment of the Hebrew language, using
Amalia as the development platform. This is the first application of HPSG to a
Semitic language.Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks,
pst-node, psfig, fullname and a macros fil
A general framework for enumerating equivalence classes of solutions
When a problem has more than one solution, it is often important, depending on the underlying context, to enumerate (i.e., to list) them all. Even when the enumeration can be done in polynomial delay, that is, spending no more than polynomial time to go from one solution to the next, this can be costly as the number of solutions themselves may be huge, including sometimes exponential. Furthermore, depending on the application, many of these solutions can be considered equivalent. The problem of an efficient enumeration of the equivalence classes or of one representative per class (without generating all the solutions), although identified as a need in many areas, has been addressed only for very few specific cases. In this paper, we provide a general framework that solves this problem in polynomial delay for a wide variety of contexts, including optimization ones that can be addressed by dynamic programming algorithms, and for certain types of equivalence relations between solutions
Gene expression programming for logic circuit design
Finding an optimal solution for the logic circuit design problem is challenging and time-consuming especially
for complex logic circuits. As the number of logic gates increases the task of designing optimal logic circuits
extends beyond human capability. A number of evolutionary algorithms have been invented to tackle a range
of optimisation problems, including logic circuit design. This dissertation explores two of these evolutionary
algorithms i.e. Gene Expression Programming (GEP) and Multi Expression Programming (MEP) with the
aim of integrating their strengths into a new Genetic Programming (GP) algorithm. GEP was invented by
Candida Ferreira in 1999 and published in 2001 [8]. The GEP algorithm inherits the advantages of the Genetic
Algorithm (GA) and GP, and it uses a simple encoding method to solve complex problems [6, 32]. While
GEP emerged as powerful due to its simplicity in implementation and
exibility in genetic operations, it is
not without weaknesses. Some of these inherent weaknesses are discussed in [1, 6, 21]. Like GEP, MEP is a
GP-variant that uses linear chromosomes of xed length [23]. A unique feature of MEP is its ability to store
multiple solutions of a problem in a single chromosome. MEP also has an ability to implement code-reuse which
is achieved through its representation which allow multiple references to a single sub-structure.
This dissertation proposes a new GP algorithm, Improved Gene Expression Programming (IGEP) which im-
proves the performance of the traditional GEP by combining the code-reuse capability and simplicity of gene encoding method from MEP and GEP, respectively. The results obtained using the IGEP and the traditional
GEP show that the two algorithms are comparable in terms of the success rate when applied on simple problems
such as basic logic functions. However, for complex problems such as one-bit Full Adder (FA) and AND-OR
Arithmetic Logic Unit (ALU) the IGEP performs better than the traditional GEP due to the code-reuse in IGEPMathematical SciencesM. Sc. (Applied Mathematics
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”
Adaptive Constraint Solving for Information Flow Analysis
In program analysis, unknown properties for terms are typically represented symbolically as variables. Bound constraints on these variables can then specify multiple optimisation goals for computer programs and nd application in areas such as type theory, security,
alias analysis and resource reasoning. Resolution of bound constraints is a problem steeped in graph theory; interdependencies between the variables is represented as a constraint graph. Additionally, constants are introduced into the system as concrete
bounds over these variables and constants themselves are ordered over a lattice which is, once again, represented as a graph. Despite graph algorithms being central to bound constraint solving, most approaches to program optimisation that use bound constraint
solving have treated their graph theoretic foundations as a black box. Little has been done to investigate the computational costs or design e cient graph algorithms for constraint resolution. Emerging examples of these lattices and bound constraint graphs, particularly
from the domain of language-based security, are showing that these graphs and lattices are structurally diverse and could be arbitrarily large. Therefore, there is a pressing need to investigate the graph theoretic foundations of bound constraint solving.
In this thesis, we investigate the computational costs of bound constraint solving from a graph theoretic perspective for Information Flow Analysis (IFA); IFA is a sub- eld of language-based security which veri es whether con dentiality and integrity of classified information is preserved as it is manipulated by a program. We present a novel framework based on graph decomposition for solving the (atomic) bound constraint problem for IFA. Our approach enables us to abstract away from connections between individual vertices to those between sets of vertices in both the constraint graph and an accompanying security lattice which defines ordering over constants. Thereby, we are able to achieve significant speedups compared to state-of-the-art graph algorithms applied to bound constraint solving. More importantly, our algorithms are highly adaptive in nature and seamlessly adapt
to the structure of the constraint graph and the lattice. The computational costs of our approach is a function of the latent scope of decomposition in the constraint graph and the lattice; therefore, we enjoy the fastest runtime for every point in the structure-spectrum of these graphs and lattices. While the techniques in this dissertation are developed with IFA in mind, they can be extended to other application of the bound constraints problem, such as type inference and program analysis frameworks which use annotated type systems, where constants are ordered over a lattice
A Tree Locality-Sensitive Hash for Secure Software Testing
Bugs in software that make it through testing can cost tens of millions of dollars each year, and in some cases can even result in the loss of human life. In order to eliminate bugs, developers may use symbolic execution to search through possible program states looking for anomalous states. Most of the computational effort to search through these states is spent solving path constraints in order to determine the feasibility of entering each state. State merging can make this search more efficient by combining program states, allowing multiple execution paths to be analyzed at the same time. However, a merge with dissimilar path constraints dramatically increases the time necessary to solve the path constraint. Currently, there are no distance measures for path constraints, and pairwise comparison of program states is not scalable. A hashing method is presented that clusters constraints in such a way that similar constraints are placed in the same cluster without requiring pairwise comparisons between queries. When combined with other state-of-the-art state merging techniques, the hashing method allows the symbolic executor to execute more instructions per second and find more terminal execution states than the other techniques alone, without decreasing the high path coverage achieved by merging many states together
- …