Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information-theoretic lower bounds for the
problem.
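Selecting a small distinguisher set is essentially a set-cover problem over genome pairs, which is why greedy selection comes close to the information-theoretic bound. The following toy sketch of such a greedy selector uses made-up genomes and candidate substrings; it illustrates the problem, not the authors' engineered implementation:

```python
from itertools import combinations

def greedy_barcode(genomes, candidates):
    """Greedily pick substring distinguishers until every pair of
    genomes is separated by at least one chosen distinguisher."""
    pairs = set(combinations(range(len(genomes)), 2))
    chosen = []
    while pairs:
        # pick the candidate that separates the most still-unseparated pairs
        best, best_sep = None, set()
        for cand in candidates:
            hits = [cand in g for g in genomes]
            sep = {(i, j) for (i, j) in pairs if hits[i] != hits[j]}
            if len(sep) > len(best_sep):
                best, best_sep = cand, sep
        if best is None:
            raise ValueError("candidates cannot distinguish all genome pairs")
        chosen.append(best)
        pairs -= best_sep
    return chosen

genomes = ["ACGTAC", "TTGACG", "GGCATT"]
barcode = greedy_barcode(genomes, ["ACG", "CAT", "TTG", "GTA"])
```

Each chosen substring answers one present/absent test, so k distinguishers can separate at most 2^k genomes; the information-theoretic lower bound of about log2 of the number of genomes follows directly.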
Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) - The Method
The sparsity of natural signals and images in a transform domain or
dictionary has been extensively exploited in several applications such as
compression, denoising and inverse problems. More recently, data-driven
adaptation of synthesis dictionaries has shown promise in many applications
compared to fixed or analytical dictionary models. However, dictionary learning
problems are typically non-convex and NP-hard, and the usual alternating
minimization approaches for these problems are often computationally expensive,
with the computations dominated by the NP-hard synthesis sparse coding step. In
this work, we investigate an efficient method for ℓ0 "norm"-based
dictionary learning by first approximating the training data set with a sum of
sparse rank-one matrices and then using a block coordinate descent approach to
estimate the unknowns. The proposed block coordinate descent algorithm involves
efficient closed-form solutions. In particular, the sparse coding step involves
a simple form of thresholding. We provide a convergence analysis for the
proposed block coordinate descent approach. Our numerical experiments show the
promising performance and significant speed-ups provided by our method over the
classical K-SVD scheme in sparse signal representation and image denoising.
Comment: This work is cited by the IEEE Transactions on Computational Imaging
paper arXiv:1511.06333 (DOI: 10.1109/TCI.2017.2697206).
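A minimal numerical sketch of the approach described above, assuming the common formulation with data matrix Y ≈ D Cᵀ, unit-norm atoms, and an ℓ0 penalty weighted by λ²; the exact normalization and update order of the authors' method may differ:

```python
import numpy as np

def soup_dil_step(Y, D, C, lam):
    """One block coordinate descent sweep over the atoms: for each
    rank-one term d_j c_j^T, hard-threshold the coefficients (the cheap
    sparse coding step) and then update the atom in closed form."""
    for j in range(D.shape[1]):
        # residual with atom j's rank-one contribution removed
        E = Y - D @ C.T + np.outer(D[:, j], C[:, j])
        # sparse coding step: hard thresholding of the correlations
        b = E.T @ D[:, j]
        C[:, j] = b * (np.abs(b) > lam)
        # dictionary update: closed-form unit-norm atom
        h = E @ C[:, j]
        norm = np.linalg.norm(h)
        if norm > 0:
            D[:, j] = h / norm
    return D, C

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 20))      # 20 training signals of dimension 8
D = rng.standard_normal((8, 4))
D /= np.linalg.norm(D, axis=0)        # unit-norm initial atoms
C = np.zeros((20, 4))
obj0 = np.linalg.norm(Y - D @ C.T) ** 2 + 0.5 ** 2 * np.count_nonzero(C)
D, C = soup_dil_step(Y, D, C, lam=0.5)
obj1 = np.linalg.norm(Y - D @ C.T) ** 2 + 0.5 ** 2 * np.count_nonzero(C)
```

Both block updates are exact minimizers of their subproblems, so the penalized objective is non-increasing across sweeps, which is the essential ingredient of a convergence analysis of this kind.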
On the Use of Cellular Automata in Symmetric Cryptography
In this work, pseudorandom sequence generators based on finite fields have
been analyzed from the point of view of their cryptographic application. In
fact, a class of nonlinear sequence generators has been modelled in terms of
linear cellular automata. The algorithm that converts the given generator into
a linear model based on automata is very simple and is based on the
concatenation of a basic structure. Once the generator has been linearized, a
cryptanalytic attack that exploits the weaknesses of such a model has been
developed. Linear cellular structures easily model sequence generators with
application in stream cipher cryptography.
Comment: 25 pages, 0 figures.
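For concreteness, here is a generic one-dimensional linear cellular automaton of the rule 90/150 kind commonly used in such linear models; it is an illustrative automaton, not the specific linearization constructed in the paper:

```python
def ca_step(state, rules):
    """One synchronous step of a null-boundary CA where cell i applies
    rule 90 (XOR of its neighbours) or rule 150 (XOR of neighbours and
    itself); both rules are linear over GF(2)."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        cell = left ^ right              # rule 90
        if rules[i] == 150:
            cell ^= state[i]             # rule 150 also XORs the cell itself
        nxt.append(cell)
    return nxt

def keystream(state, rules, tap, length):
    """Read a keystream from one cell while the automaton evolves."""
    out = []
    for _ in range(length):
        out.append(state[tap])
        state = ca_step(state, rules)
    return out
```

Because every update is a XOR, the whole automaton is a linear map over GF(2); it is exactly this linearity that makes the cryptanalytic attack on the linearized generator tractable.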
Partitioning Patches into Test-equivalence Classes for Scaling Program Repair
Automated program repair is a problem of finding a transformation (called a
patch) of a given incorrect program that eliminates the observable failures. It
has important applications such as providing debugging aids, automatically
grading assignments and patching security vulnerabilities. A common challenge
faced by all existing repair techniques is scalability to large patch spaces,
since there are many candidate patches that these techniques explicitly or
implicitly consider.
The correctness criterion for program repair is often given as a suite of
tests, since a formal specification of the intended program behavior may not be
available. Current repair techniques do not scale due to the large number of
test executions performed by the underlying search algorithms. We address this
problem by introducing a methodology of patch generation based on a
test-equivalence relation (if two programs are "test-equivalent" for a given
test, they produce indistinguishable results on this test). We propose two
test-equivalence relations based on runtime values and dependencies
respectively and present an algorithm that performs on-the-fly partitioning of
patches into test-equivalence classes.
Our experiments on real-world programs reveal that the proposed methodology
drastically reduces the number of test executions and therefore provides an
order of magnitude efficiency improvement over existing repair techniques,
without sacrificing patch quality.
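The partitioning idea can be sketched in a few lines: run each candidate patch on a test and key the classes by the observed result. Everything below is a toy stand-in (expression strings instead of real patches, eval instead of program execution), not the authors' on-the-fly algorithm:

```python
def partition_by_test(patches, test_input, run):
    """Group candidate patches into test-equivalence classes: patches that
    produce the same result on a test are interchangeable for that test,
    so only one representative per class needs to be executed on it."""
    classes = {}
    for patch in patches:
        outcome = run(patch, test_input)   # observed result on this test
        classes.setdefault(outcome, []).append(patch)
    return classes

# hypothetical patch space: candidate replacement expressions for one buggy line
patches = ["x + 1", "x - 1", "x + 2", "1 + x", "abs(x)"]
run = lambda expr, x: eval(expr)           # toy stand-in for patched execution
classes = partition_by_test(patches, 5, run)
```

On this test, five candidate patches collapse into four classes; the paper's value- and dependency-based relations build such classes on the fly during execution rather than by running every patch separately as done here.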
An Algorithm for Constructing a Smallest Register with Non-Linear Update Generating a Given Binary Sequence
Registers with Non-Linear Update (RNLUs) are a generalization of Non-Linear
Feedback Shift Registers (NLFSRs) in which both feedback and feedforward
connections are allowed and no chain connection between the stages is required.
In this paper, a new algorithm for constructing RNLUs generating a given binary
sequence is presented. The expected size of RNLUs constructed by the presented
algorithm is proved to be O(n/log(n/p)), where n is the sequence length and p
is the degree of parallelization. This is asymptotically smaller than the
expected size of RNLUs constructed by previous algorithms and the expected size
of LFSRs and NLFSRs generating the same sequence. The presented algorithm can
potentially be useful for many applications, including testing, wireless
communications, and cryptography.
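A small illustration of the register model itself (the update functions below are arbitrary examples, not output of the construction algorithm): each stage is refreshed by its own Boolean function of the whole state, so feedback and feedforward connections are both possible and no shift chain is needed.

```python
def run_rnlu(state, updates, steps):
    """Run a Register with Non-Linear Update: stage i is refreshed by
    its own Boolean function updates[i] of the whole state; stage 0 is
    read out as the generated sequence."""
    out = []
    for _ in range(steps):
        out.append(state[0])
        state = tuple(f(state) for f in updates)
    return out

updates = [
    lambda s: s[1] ^ (s[0] & s[2]),   # nonlinear update for stage 0
    lambda s: s[2],                   # feedforward from stage 2
    lambda s: s[0],                   # feedback from stage 0
]
seq = run_rnlu((0, 1, 1), updates, 5)
```

An (N)LFSR is the special case where each stage simply copies its neighbour and only the last stage computes a feedback function; dropping that restriction is what lets RNLUs reproduce a sequence with fewer stages.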
Unsynthesizable Cores - Minimal Explanations for Unsynthesizable High-Level Robot Behaviors
With the increasing ubiquity of multi-capable, general-purpose robots arises
the need for enabling non-expert users to command these robots to perform
complex high-level tasks. To this end, high-level robot control has seen the
application of formal methods to automatically synthesize
correct-by-construction controllers from user-defined specifications; synthesis
fails if and only if there exists no controller that achieves the specified
behavior. Recent work has also addressed the challenge of providing
easy-to-understand feedback to users when a specification fails to yield a
corresponding controller. Existing techniques provide feedback on portions of
the specification that cause the failure, but do so at a coarse granularity.
This work presents techniques for refining this feedback, extracting minimal
explanations of unsynthesizability.
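The extraction of a minimal explanation can be sketched with a standard greedy core-shrinking loop; the toy "synthesis check" below (intersection of allowed value sets) merely stands in for a real realizability test on the specification:

```python
def minimal_core(clauses, is_unsynthesizable):
    """Greedily drop clauses; keep a clause only if removing it makes the
    rest synthesizable. The result is a minimal (not necessarily
    minimum-size) unsynthesizable core."""
    core = list(clauses)
    for clause in list(core):
        trial = [c for c in core if c != clause]
        if is_unsynthesizable(trial):
            core = trial
    return core

def is_unsat(clauses):
    # toy stand-in for a synthesis check: the "spec" is unrealizable
    # iff the allowed value sets have empty intersection
    return bool(clauses) and not set.intersection(*clauses)

spec = [{1, 2}, {2, 3}, {4}, {1, 4}]
core = minimal_core(spec, is_unsat)
```

The returned core is minimal in the sense that removing any single remaining clause makes the rest synthesizable again, which is exactly the granularity of feedback a user can act on.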
Systematic Testing of Multicast Routing Protocols: Analysis of Forward and Backward Search Techniques
In this paper, we present a new methodology for developing systematic and
automatic test generation algorithms for multipoint protocols. These algorithms
attempt to synthesize network topologies and sequences of events that stress
the protocol's correctness or performance. This problem can be viewed as a
domain-specific search problem that suffers from the state space explosion
problem. One goal of this work is to circumvent the state space explosion
problem utilizing knowledge of network and fault modeling, and multipoint
protocols. The two approaches investigated in this study are based on forward
and backward search techniques. We use an extended finite state machine (FSM)
model of the protocol. The first algorithm uses forward search to perform
reduced reachability analysis. Using domain-specific information for multicast
routing over LANs, the algorithm complexity is reduced from exponential to
polynomial in the number of routers. This approach, however, does not fully
automate topology synthesis. The second algorithm, the fault-oriented test
generation, uses backward search for topology synthesis and uses backtracking
to generate event sequences instead of searching forward from initial states.
Using these algorithms, we have conducted studies for correctness of the
multicast routing protocol PIM. We propose to extend these algorithms to study
end-to-end multipoint protocols using a virtual LAN that represents delays of
the underlying multicast distribution tree.
Comment: 26 pages, 20 figures.
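The forward-search side can be illustrated with plain reachability over an extended FSM given as a transition table; the protocol states and events below are invented for illustration, not PIM's:

```python
from collections import deque

def reachable(initial, transitions):
    """Breadth-first forward search over {state: {event: next_state}};
    returns every state reachable from the initial one."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        for nxt in transitions.get(state, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# toy router FSM: can a spurious timeout drive the protocol into "error"?
fsm = {
    "idle":   {"join": "member"},
    "member": {"prune": "idle", "timeout": "error"},
}
states = reachable("idle", fsm)
```

Reduced reachability analysis keeps such a search polynomial by pruning states with domain knowledge (topology symmetries, fault models) instead of enumerating the full product state space, which is where the exponential blow-up comes from.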
Global Linear Complexity Analysis of Filter Keystream Generators
An efficient algorithm for computing lower bounds on the global linear
complexity of nonlinearly filtered PN-sequences is presented. The technique
developed here is based exclusively on bitwise logic operations, which makes
it appropriate for both software simulation and
hardware implementation. The present algorithm can be applied to any arbitrary
nonlinear function with a unique term of maximum order. Thus, the extent of its
application for different types of filter generators is quite broad.
Furthermore, the large lower bounds obtained confirm the exponential growth of
the global linear complexity for the class of nonlinearly filtered sequences.
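For reference, the quantity being bounded, global linear complexity, is the length of the shortest LFSR reproducing the sequence; it can be computed exactly for short sequences with the Berlekamp-Massey algorithm. The sketch below is standard textbook code, not the paper's lower-bound technique:

```python
def linear_complexity(seq):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR that
    generates the binary sequence seq."""
    n = len(seq)
    c = [1] + [0] * n      # current connection polynomial
    b = [1] + [0] * n      # polynomial before the last length change
    L, m = 0, -1
    for i in range(n):
        # discrepancy between seq[i] and the current LFSR's prediction
        d = seq[i]
        for j in range(1, L + 1):
            d ^= c[j] & seq[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

# two periods of the m-sequence from the recurrence s[i] = s[i-1] XOR s[i-3]
s = [1, 0, 0]
for i in range(3, 14):
    s.append(s[i - 1] ^ s[i - 3])
```

Exact computation is infeasible for the astronomically long filtered sequences the paper targets, which is why efficiently computable lower bounds on this quantity are the object of interest.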
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system has been originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols denoting the 20 amino acid residues. The
computational results obtained here, which we stress have been achieved with no
context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references.
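The dissimilarity space representation at the heart of ODSE can be sketched as follows; prototype selection and the system's optimization stage are omitted, and both the edit distance and the hand-picked prototypes are illustrative choices:

```python
def levenshtein(a, b):
    """Edit distance between two symbol sequences (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def embed(seqs, prototypes, dist=levenshtein):
    """Dissimilarity space embedding: each sequence becomes the vector
    of its distances to the prototypes, after which any ordinary
    feature-based classifier can be trained on the vectors."""
    return [[dist(s, p) for p in prototypes] for s in seqs]

vectors = embed(["AAB", "BBA"], ["AAB", "BBB"])
```

This is what makes the approach domain-agnostic: once a meaningful dissimilarity measure exists for the input objects, whether graphs or amino acid sequences, the downstream classifier only ever sees fixed-length numeric vectors.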