855 research outputs found
Recommended from our members
Optimizing the length of checking sequences
A checking sequence, generated from a finite state machine, is a test sequence that is guaranteed to lead to a failure if the system under test is faulty and has no more states than the specification. The problem of generating a checking sequence for a finite state machine M is simplified if M has a distinguishing sequence: an input sequence D~ with the property that the output sequence produced by M in response to D is different for the different states of M. Previous work has shown that, where a distinguishing sequence is known, an efficient checking sequence can be produced from the elements of a set A of sequences that verify the distinguishing sequence used and the elements of a set /spl gamma/ of subsequences that test the individual transitions by following each transition t by the distinguishing sequence that verifies the final state of t. In this previous work, A is a predefined set and /spl gamma/ is defined in terms of A. The checking sequence is produced by connecting the elements of /spl gamma/ and A to form a single sequence, using a predefined acyclic set E/sub c/ of transitions. An optimization algorithm is used in order to produce the shortest such checking sequence that can be generated on the basis of the given A and E/sub c/. However, this previous work did not state how the sets A and E/sub c/ should be chosen. This paper investigates the problem of finding appropriate A and E/sub c/ to be used in checking sequence generation. We show how a set A may be chosen so that it minimizes the sum of the lengths of the sequences to be combined. Further, we show that the optimization step, in the checking sequence generation algorithm, may be adapted so that it generates the optimal E/sub c/. Experiments are used to evaluate the proposed method
A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
From the output produced by a memoryless deletion channel from a uniformly
random input of known length , one obtains a posterior distribution on the
channel input. The difference between the Shannon entropy of this distribution
and that of the uniform prior measures the amount of information about the
channel input which is conveyed by the output of length , and it is natural
to ask for which outputs this is extremized. This question was posed in a
previous work, where it was conjectured on the basis of experimental data that
the entropy of the posterior is minimized and maximized by the constant strings
and and the alternating strings
and respectively. In the present
work we confirm the minimization conjecture in the asymptotic limit using
results from hidden word statistics. We show how the analytic-combinatorial
methods of Flajolet, Szpankowski and Vall\'ee for dealing with the hidden
pattern matching problem can be applied to resolve the case of fixed output
length and , by obtaining estimates for the entropy in
terms of the moments of the posterior distribution and establishing its
minimization via a measure of autocorrelation.Comment: 11 pages, 2 figure
Partition into heapable sequences, heap tableaux and a multiset extension of Hammersley's process
We investigate partitioning of integer sequences into heapable subsequences
(previously defined and established by Mitzenmacher et al). We show that an
extension of patience sorting computes the decomposition into a minimal number
of heapable subsequences (MHS). We connect this parameter to an interactive
particle system, a multiset extension of Hammersley's process, and investigate
its expected value on a random permutation. In contrast with the (well studied)
case of the longest increasing subsequence, we bring experimental evidence that
the correct asymptotic scaling is . Finally
we give a heap-based extension of Young tableaux, prove a hook inequality and
an extension of the Robinson-Schensted correspondence
Artificial Sequences and Complexity Measures
In this paper we exploit concepts of information theory to address the
fundamental problem of identifying and defining the most suitable tools to
extract, in a automatic and agnostic way, information from a generic string of
characters. We introduce in particular a class of methods which use in a
crucial way data compression techniques in order to define a measure of
remoteness and distance between pairs of sequences of characters (e.g. texts)
based on their relative information content. We also discuss in detail how
specific features of data compression techniques could be used to introduce the
notion of dictionary of a given sequence and of Artificial Text and we show how
these new tools can be used for information extraction purposes. We point out
the versatility and generality of our method that applies to any kind of
corpora of character strings independently of the type of coding behind them.
We consider as a case study linguistic motivated problems and we present
results for automatic language recognition, authorship attribution and self
consistent-classification.Comment: Revised version, with major changes, of previous "Data Compression
approach to Information Extraction and Classification" by A. Baronchelli and
V. Loreto. 15 pages; 5 figure
Near-Linear Time Insertion-Deletion Codes and (1+)-Approximating Edit Distance via Indexing
We introduce fast-decodable indexing schemes for edit distance which can be
used to speed up edit distance computations to near-linear time if one of the
strings is indexed by an indexing string . In particular, for every length
and every , one can in near linear time construct a string
with , such that, indexing
any string , symbol-by-symbol, with results in a string where for which edit
distance computations are easy, i.e., one can compute a
-approximation of the edit distance between and any other
string in time.
Our indexing schemes can be used to improve the decoding complexity of
state-of-the-art error correcting codes for insertions and deletions. In
particular, they lead to near-linear time decoding algorithms for the
insertion-deletion codes of [Haeupler, Shahrasbi; STOC `17] and faster decoding
algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi,
Sudan; ICALP `18]. Interestingly, the latter codes are a crucial ingredient in
the construction of fast-decodable indexing schemes
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
- ā¦