12,272 research outputs found
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN
Testing Linear-Invariant Non-Linear Properties
We consider the task of testing properties of Boolean functions that are
invariant under linear transformations of the Boolean cube. Previous work in
property testing, including the linearity test and the test for Reed-Muller
codes, has mostly focused on such tasks for linear properties. The one
exception is a test due to Green for "triangle freeness": a function
f:\cube^{n}\to\cube satisfies this property if do not all
equal 1, for any pair x,y\in\cube^{n}.
Here we extend this test to a more systematic study of testing for
linear-invariant non-linear properties. We consider properties that are
described by a single forbidden pattern (and its linear transformations), i.e.,
a property is given by points v_{1},...,v_{k}\in\cube^{k} and
f:\cube^{n}\to\cube satisfies the property that if for all linear maps
L:\cube^{k}\to\cube^{n} it is the case that do
not all equal 1. We show that this property is testable if the underlying
matroid specified by is a graphic matroid. This extends
Green's result to an infinite class of new properties.
Our techniques extend those of Green and in particular we establish a link
between the notion of "1-complexity linear systems" of Green and Tao, and
graphic matroids, to derive the results.Comment: This is the full version; conference version appeared in the
proceedings of STACS 200
Gradual Certified Programming in Coq
Expressive static typing disciplines are a powerful way to achieve
high-quality software. However, the adoption cost of such techniques should not
be under-estimated. Just like gradual typing allows for a smooth transition
from dynamically-typed to statically-typed programs, it seems desirable to
support a gradual path to certified programming. We explore gradual certified
programming in Coq, providing the possibility to postpone the proofs of
selected properties, and to check "at runtime" whether the properties actually
hold. Casts can be integrated with the implicit coercion mechanism of Coq to
support implicit cast insertion a la gradual typing. Additionally, when
extracting Coq functions to mainstream languages, our encoding of casts
supports lifting assumed properties into runtime checks. Much to our surprise,
it is not necessary to extend Coq in any way to support gradual certified
programming. A simple mix of type classes and axioms makes it possible to bring
gradual certified programming to Coq in a straightforward manner.Comment: DLS'15 final version, Proceedings of the ACM Dynamic Languages
Symposium (DLS 2015
Learning pseudo-Boolean k-DNF and Submodular Functions
We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be
represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a
natural generalization of DNF representation for functions with integer range.
Each term in such a formula has an associated integral constant. We show that
an analog of Hastad's switching lemma holds for pseudo-Boolean k-DNFs if all
constants associated with the terms of the formula are bounded.
This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to
pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership
queries under the uniform distribution for submodular functions of the form
f:{0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n, k^{O(k
\log k / \epsilon)}, 1/\epsilon and log(1/\delta) and works even in the
agnostic setting. The line of previous work on learning submodular functions
[Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi,
Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for
learning submodular functions in this setting, for fixed epsilon and delta.
Our learning algorithm implies a property tester for submodularity of
functions f:{0,1}^n -> {0, ..., k} with query complexity polynomial in n for
k=O((\log n/ \loglog n)^{1/2}) and constant proximity parameter \epsilon
Semantic Component Composition
Building complex software systems necessitates the use of component-based
architectures. In theory, of the set of components needed for a design, only
some small portion of them are "custom"; the rest are reused or refactored
existing pieces of software. Unfortunately, this is an idealized situation.
Just because two components should work together does not mean that they will
work together.
The "glue" that holds components together is not just technology. The
contracts that bind complex systems together implicitly define more than their
explicit type. These "conceptual contracts" describe essential aspects of
extra-system semantics: e.g., object models, type systems, data representation,
interface action semantics, legal and contractual obligations, and more.
Designers and developers spend inordinate amounts of time technologically
duct-taping systems to fulfill these conceptual contracts because system-wide
semantics have not been rigorously characterized or codified. This paper
describes a formal characterization of the problem and discusses an initial
implementation of the resulting theoretical system.Comment: 9 pages, submitted to GCSE/SAIG '0
On active and passive testing
Given a property of Boolean functions, what is the minimum number of queries
required to determine with high probability if an input function satisfies this
property or is "far" from satisfying it? This is a fundamental question in
Property Testing, where traditionally the testing algorithm is allowed to pick
its queries among the entire set of inputs. Balcan, Blais, Blum and Yang have
recently suggested to restrict the tester to take its queries from a smaller
random subset of polynomial size of the inputs. This model is called active
testing, and in the extreme case when the size of the set we can query from is
exactly the number of queries performed it is known as passive testing.
We prove that passive or active testing of k-linear functions (that is, sums
of k variables among n over Z_2) requires Theta(k*log n) queries, assuming k is
not too large. This extends the case k=1, (that is, dictator functions),
analyzed by Balcan et. al.
We also consider other classes of functions including low degree polynomials,
juntas, and partially symmetric functions. Our methods combine algebraic,
combinatorial, and probabilistic techniques, including the Talagrand
concentration inequality and the Erdos--Rado theorem on Delta-systems.Comment: 16 page
Hard Properties with (Very) Short PCPPs and Their Applications
We show that there exist properties that are maximally hard for testing, while still admitting PCPPs with a proof size very close to linear. Specifically, for every fixed ?, we construct a property P^(?)? {0,1}^n satisfying the following: Any testing algorithm for P^(?) requires ?(n) many queries, and yet P^(?) has a constant query PCPP whose proof size is O(n?log^(?)n), where log^(?) denotes the ? times iterated log function (e.g., log^(2)n = log log n). The best previously known upper bound on the PCPP proof size for a maximally hard to test property was O(n?polylog(n)).
As an immediate application, we obtain stronger separations between the standard testing model and both the tolerant testing model and the erasure-resilient testing model: for every fixed ?, we construct a property that has a constant-query tester, but requires ?(n/log^(?)(n)) queries for every tolerant or erasure-resilient tester
- …