12,272 research outputs found

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    Get PDF
    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN

    Testing Linear-Invariant Non-Linear Properties

    Get PDF
    We consider the task of testing properties of Boolean functions that are invariant under linear transformations of the Boolean cube. Previous work in property testing, including the linearity test and the test for Reed-Muller codes, has mostly focused on such tasks for linear properties. The one exception is a test due to Green for "triangle freeness": a function f:\cube^{n}\to\cube satisfies this property if f(x),f(y),f(x+y)f(x),f(y),f(x+y) do not all equal 1, for any pair x,y\in\cube^{n}. Here we extend this test to a more systematic study of testing for linear-invariant non-linear properties. We consider properties that are described by a single forbidden pattern (and its linear transformations), i.e., a property is given by kk points v_{1},...,v_{k}\in\cube^{k} and f:\cube^{n}\to\cube satisfies the property that if for all linear maps L:\cube^{k}\to\cube^{n} it is the case that f(L(v1)),...,f(L(vk))f(L(v_{1})),...,f(L(v_{k})) do not all equal 1. We show that this property is testable if the underlying matroid specified by v1,...,vkv_{1},...,v_{k} is a graphic matroid. This extends Green's result to an infinite class of new properties. Our techniques extend those of Green and in particular we establish a link between the notion of "1-complexity linear systems" of Green and Tao, and graphic matroids, to derive the results.Comment: This is the full version; conference version appeared in the proceedings of STACS 200

    Gradual Certified Programming in Coq

    Full text link
    Expressive static typing disciplines are a powerful way to achieve high-quality software. However, the adoption cost of such techniques should not be under-estimated. Just like gradual typing allows for a smooth transition from dynamically-typed to statically-typed programs, it seems desirable to support a gradual path to certified programming. We explore gradual certified programming in Coq, providing the possibility to postpone the proofs of selected properties, and to check "at runtime" whether the properties actually hold. Casts can be integrated with the implicit coercion mechanism of Coq to support implicit cast insertion a la gradual typing. Additionally, when extracting Coq functions to mainstream languages, our encoding of casts supports lifting assumed properties into runtime checks. Much to our surprise, it is not necessary to extend Coq in any way to support gradual certified programming. A simple mix of type classes and axioms makes it possible to bring gradual certified programming to Coq in a straightforward manner.Comment: DLS'15 final version, Proceedings of the ACM Dynamic Languages Symposium (DLS 2015

    Learning pseudo-Boolean k-DNF and Submodular Functions

    Full text link
    We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a natural generalization of DNF representation for functions with integer range. Each term in such a formula has an associated integral constant. We show that an analog of Hastad's switching lemma holds for pseudo-Boolean k-DNFs if all constants associated with the terms of the formula are bounded. This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership queries under the uniform distribution for submodular functions of the form f:{0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n, k^{O(k \log k / \epsilon)}, 1/\epsilon and log(1/\delta) and works even in the agnostic setting. The line of previous work on learning submodular functions [Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi, Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for learning submodular functions in this setting, for fixed epsilon and delta. Our learning algorithm implies a property tester for submodularity of functions f:{0,1}^n -> {0, ..., k} with query complexity polynomial in n for k=O((\log n/ \loglog n)^{1/2}) and constant proximity parameter \epsilon

    Semantic Component Composition

    Full text link
    Building complex software systems necessitates the use of component-based architectures. In theory, of the set of components needed for a design, only some small portion of them are "custom"; the rest are reused or refactored existing pieces of software. Unfortunately, this is an idealized situation. Just because two components should work together does not mean that they will work together. The "glue" that holds components together is not just technology. The contracts that bind complex systems together implicitly define more than their explicit type. These "conceptual contracts" describe essential aspects of extra-system semantics: e.g., object models, type systems, data representation, interface action semantics, legal and contractual obligations, and more. Designers and developers spend inordinate amounts of time technologically duct-taping systems to fulfill these conceptual contracts because system-wide semantics have not been rigorously characterized or codified. This paper describes a formal characterization of the problem and discusses an initial implementation of the resulting theoretical system.Comment: 9 pages, submitted to GCSE/SAIG '0

    On active and passive testing

    Full text link
    Given a property of Boolean functions, what is the minimum number of queries required to determine with high probability if an input function satisfies this property or is "far" from satisfying it? This is a fundamental question in Property Testing, where traditionally the testing algorithm is allowed to pick its queries among the entire set of inputs. Balcan, Blais, Blum and Yang have recently suggested to restrict the tester to take its queries from a smaller random subset of polynomial size of the inputs. This model is called active testing, and in the extreme case when the size of the set we can query from is exactly the number of queries performed it is known as passive testing. We prove that passive or active testing of k-linear functions (that is, sums of k variables among n over Z_2) requires Theta(k*log n) queries, assuming k is not too large. This extends the case k=1, (that is, dictator functions), analyzed by Balcan et. al. We also consider other classes of functions including low degree polynomials, juntas, and partially symmetric functions. Our methods combine algebraic, combinatorial, and probabilistic techniques, including the Talagrand concentration inequality and the Erdos--Rado theorem on Delta-systems.Comment: 16 page

    Hard Properties with (Very) Short PCPPs and Their Applications

    Get PDF
    We show that there exist properties that are maximally hard for testing, while still admitting PCPPs with a proof size very close to linear. Specifically, for every fixed ?, we construct a property P^(?)? {0,1}^n satisfying the following: Any testing algorithm for P^(?) requires ?(n) many queries, and yet P^(?) has a constant query PCPP whose proof size is O(n?log^(?)n), where log^(?) denotes the ? times iterated log function (e.g., log^(2)n = log log n). The best previously known upper bound on the PCPP proof size for a maximally hard to test property was O(n?polylog(n)). As an immediate application, we obtain stronger separations between the standard testing model and both the tolerant testing model and the erasure-resilient testing model: for every fixed ?, we construct a property that has a constant-query tester, but requires ?(n/log^(?)(n)) queries for every tolerant or erasure-resilient tester
    corecore