1,037 research outputs found
Polygraph: Automatically generating signatures for polymorphic worms
It is widely believed that content-signature-based intrusion detection systems (IDSes) are easily evaded by polymorphic worms, which vary their payload on every infection attempt. In this paper, we present Polygraph, a signature generation system that successfully produces signatures that match polymorphic worms. Polygraph generates signatures that consist of multiple disjoint content substrings. In doing so, Polygraph leverages our insight that for a real-world exploit to function properly, multiple invariant substrings must often be present in all variants of a payload; these substrings typically correspond to protocol framing, return addresses, and in some cases, poorly obfuscated code. We contribute a definition of the polymorphic signature generation problem; propose classes of signature suited for matching polymorphic worm payloads; and present algorithms for automatic generation of signatures in these classes. Our evaluation of these algorithms on a range of polymorphic worms demonstrates that Polygraph produces signatures for polymorphic worms that exhibit low false negatives and false positives. © 2005 IEEE
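The idea of a signature built from several disjoint substrings can be illustrated with a minimal sketch. The abstract does not spell out the signature classes or the generation algorithms, so the two matchers below (an unordered conjunction and an ordered token sequence) and all payload bytes are illustrative assumptions, not Polygraph's implementation.

```python
from typing import List


def matches_conjunction(payload: bytes, tokens: List[bytes]) -> bool:
    """True if every token occurs somewhere in the payload, in any order."""
    return all(token in payload for token in tokens)


def matches_token_subsequence(payload: bytes, tokens: List[bytes]) -> bool:
    """True if every token occurs in the payload, in order and without overlap."""
    pos = 0
    for token in tokens:
        idx = payload.find(token, pos)  # leftmost match at or after pos
        if idx == -1:
            return False
        pos = idx + len(token)  # the next token must start after this one
    return True


# Hypothetical signature: protocol framing plus a fixed 4-byte address.
signature = [b"GET ", b"HTTP/1.1\r\n", b"\xff\xbf\x12\x34"]

# Two made-up variants that differ everywhere except the invariant substrings.
variant_a = b"GET /aaaa HTTP/1.1\r\nHost: x\r\n\r\n\x90\x41\xff\xbf\x12\x34"
variant_b = b"GET /zq9 HTTP/1.1\r\nHost: yy\r\n\r\n\x42\x43\x44\xff\xbf\x12\x34"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
reordered = b"HTTP/1.1\r\nGET \xff\xbf\x12\x34"
```

Both variants match either form of the signature, while the benign request lacks the address token and matches neither. The reordered payload shows the difference between the two matchers: it satisfies the conjunction but not the ordered sequence.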
Knowledge Discovery in Documents by Extracting Frequent Word Sequences
published or submitted for publication
VirtualHome: Simulating Household Activities via Programs
In this paper, we are interested in modeling complex activities that occur in
a typical household. We propose to use programs, i.e., sequences of atomic
actions and interactions, as a high level representation of complex tasks.
Programs are interesting because they provide a non-ambiguous representation of
a task, and allow agents to execute them. However, nowadays, there is no
database providing this type of information. Towards this goal, we first
crowd-source programs for a variety of activities that happen in people's
homes, via a game-like interface used for teaching kids how to code. Using the
collected dataset, we show how we can learn to extract programs directly from
natural language descriptions or from videos. We then implement the most common
atomic (inter)actions in the Unity3D game engine, and use our programs to
"drive" an artificial agent to execute tasks in a simulated household
environment. Our VirtualHome simulator allows us to create a large activity
video dataset with rich ground-truth, enabling training and testing of video
understanding models. We further showcase examples of our agent performing
tasks in our VirtualHome based on language descriptions. Comment: CVPR 2018 (Oral)
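To make the notion of a program as a sequence of atomic actions concrete, here is a small sketch. The action names, the `Step` and `AgentState` types, and the precondition rule are all hypothetical and chosen for illustration; they are not the VirtualHome program format or simulator API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical atomic action name, e.g. "walk"
    target: str  # object the action applies to


@dataclass
class AgentState:
    near: Optional[str] = None
    holding: Set[str] = field(default_factory=set)
    switched_on: Set[str] = field(default_factory=set)


def execute(program: List[Step], state: AgentState) -> AgentState:
    """Run the steps in order, raising if a step's precondition fails."""
    for step in program:
        if step.action == "walk":
            state.near = step.target
        elif step.action in ("grab", "switch_on"):
            # Assumed precondition: the agent must be next to the object.
            if state.near != step.target:
                raise ValueError(f"cannot {step.action} {step.target}: not near it")
            if step.action == "grab":
                state.holding.add(step.target)
            else:
                state.switched_on.add(step.target)
        else:
            raise ValueError(f"unknown action: {step.action}")
    return state


# A made-up "watch TV" activity expressed as a program.
watch_tv = [
    Step("walk", "remote"),
    Step("grab", "remote"),
    Step("walk", "tv"),
    Step("switch_on", "tv"),
]
final_state = execute(watch_tv, AgentState())
```

Because each step is explicit and checked against the current state, the representation is unambiguous and directly executable, which is the property the abstract highlights.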
Long-Term Visual Object Tracking Benchmark
We propose a new long video dataset (called Track Long and Prosper - TLP) and
benchmark for single object tracking. The dataset consists of 50 HD videos from
real world scenarios, encompassing a duration of over 400 minutes (676K
frames), making it more than 20-fold larger in average duration per sequence
and more than 8-fold larger in terms of total covered duration, as compared to
existing generic datasets for visual tracking. The proposed dataset paves a way
to suitably assess long term tracking performance and train better deep
learning architectures (avoiding/reducing augmentation, which may not reflect
real world behaviour). We benchmark the dataset on 17 state of the art trackers
and rank them according to tracking accuracy and run time speeds. We further
present a thorough qualitative and quantitative evaluation highlighting the
importance of the long-term aspect of tracking. Our most interesting observations
are (a) existing short sequence benchmarks fail to bring out the inherent
differences in tracking algorithms which widen up while tracking on long
sequences and (b) the accuracy of trackers abruptly drops on challenging long
sequences, suggesting the potential need of research efforts in the direction
of long-term tracking. Comment: ACCV 2018 (Oral)
Multivariate Fine-Grained Complexity of Longest Common Subsequence
We revisit the classic combinatorial pattern matching problem of finding a
longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a
textbook algorithm solves LCS in time $O(n^2)$, but although much effort has
been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work
indeed shows that such an algorithm would refute the Strong Exponential Time
Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann,
Künnemann FOCS'15].
Despite the quadratic-time barrier, for over 40 years an enduring scientific
interest continued to produce fast algorithms for LCS and its variations.
Particular attention was put into identifying and exploiting input parameters
that yield strongly subquadratic time algorithms for special cases of interest,
e.g., differential file comparison. This line of research was successfully
pursued until 1990, at which time significant improvements came to a halt. In
this paper, using the lens of fine-grained complexity, our goal is to (1)
justify the lack of further improvements and (2) determine whether some special
cases of LCS admit faster algorithms than currently known.
To this end, we provide a systematic study of the multivariate complexity of
LCS, taking into account all parameters previously discussed in the literature:
the input size $n := |x| + |y|$, the length of the shorter string
$m := \min\{|x|, |y|\}$, the length $L$ of an LCS of $x$ and $y$, the numbers of
deletions $\delta := m - L$ and $\Delta := n - L$, the alphabet size, as well as
the numbers of matching pairs $M$ and dominant pairs $d$. For any class of
instances defined by fixing each parameter individually to a polynomial in
terms of the input size, we prove a SETH-based lower bound matching one of
three known algorithms. Specifically, we determine the optimal running time for
LCS under SETH as $(n + \min\{d, \delta\Delta, \delta m\})^{1 \pm o(1)}$.
[...] Comment: Presented at SODA'18. Full Version. 66 pages
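For reference, the textbook quadratic-time algorithm that the abstract above takes as its baseline can be sketched as a standard dynamic program. This is the classic method, not one of the parameterized algorithms studied in the paper; the function name is an assumption made for illustration.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence, in O(|x| * |y|) time."""
    # Iterate over the longer string so each stored row is as short as possible.
    if len(x) < len(y):
        x, y = y, x
    prev = [0] * (len(y) + 1)  # LCS lengths for the previous prefix of x
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                # Extend the best solution that uses neither character.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one character from either string, keep the better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` returns 4, with `BCBA` as one longest common subsequence. Keeping only two rows reduces memory to linear in the shorter string, but the running time stays quadratic.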
Growth rates for subclasses of Av(321)
Pattern classes which avoid 321 and other patterns are shown to have the same growth rates as similar (but strictly larger) classes obtained by adding articulation points to any or all of the other patterns. The method of proof is to show that the elements of the latter classes can be represented as bounded merges of elements of the original class, and that the bounded merge construction does not change growth rates
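As a small illustration of the base class Av(321) and of what a growth rate measures, the brute-force sketch below counts 321-avoiding permutations of each length. It relies on the well-known fact that these counts are the Catalan numbers, so the growth rate of Av(321) itself is 4. It does not implement the merge construction from the paper, and the function names are assumptions.

```python
from itertools import combinations, permutations
from math import comb


def avoids_321(perm) -> bool:
    """True if no three entries, read left to right, are strictly decreasing."""
    # combinations() keeps positional order, so (a, b, c) sit at positions i < j < k.
    return not any(a > b > c for a, b, c in combinations(perm, 3))


def count_av321(n: int) -> int:
    """Number of permutations of length n that avoid the pattern 321."""
    return sum(1 for p in permutations(range(n)) if avoids_321(p))


def catalan(n: int) -> int:
    """The n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


counts = [count_av321(n) for n in range(1, 7)]
```

Here `counts` is `[1, 2, 5, 14, 42, 132]`, matching `catalan(1)` through `catalan(6)`. The growth rate of a class is the limit of the n-th root of such counts, when that limit exists.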
JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition
This paper proposes a novel algorithm to reassemble an arbitrarily shredded
image to its original status. Existing reassembly pipelines commonly consist of
a local matching stage and a global composition stage. In the local stage, a
key challenge in fragment reassembly is to reliably compute and identify
correct pairwise matching, for which most existing algorithms use handcrafted
features, and hence, cannot reliably handle complicated puzzles. We build a
deep convolutional neural network to detect the compatibility of a pairwise
stitching, and use it to prune computed pairwise matches. To improve the
network efficiency and accuracy, we transfer the calculation of CNN to the
stitching region and apply a boost training strategy. In the global composition
stage, we modify the commonly adopted greedy edge selection strategies to two
new loop closure based searching algorithms. Extensive experiments show that
our algorithm significantly outperforms existing methods on solving various
puzzles, especially challenging ones with many fragment pieces.
Shape Optimization Problems for Metric Graphs
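We consider the shape optimization problem $\min\{\mathcal{E}(\Gamma) : \Gamma \in \mathcal{A},\ \mathcal{H}^1(\Gamma) = l\}$, where $\mathcal{H}^1$ is the one-dimensional Hausdorff measure and $\mathcal{A}$ is an
admissible class of one-dimensional sets connecting some prescribed set of
points $\mathcal{D} = \{D_1, \dots, D_k\} \subset \mathbb{R}^d$. The cost
functional $\mathcal{E}(\Gamma)$ is the Dirichlet energy of $\Gamma$ defined
through the Sobolev functions on $\Gamma$ vanishing on the points $D_i$. We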
analyze the existence of a solution in both the families of connected sets and
of metric graphs. At the end, several explicit examples are discussed. Comment: 23 pages, 11 figures, ESAIM Control Optim. Calc. Var. (to appear)
Comparing Java Programs: Syntactic and Contextual Semantic Differences
This thesis describes the foundation for developing a tool that compares Java programs, or different versions of a program. The tool captures both syntactic differences and contextual semantic differences. Syntactic differences are “ordinary” changes in the code. The tool works much like the Unix tool diff, but it is smarter because it exploits the fact that programs are structured differently from ordinary text. diff is designed to compare text, so on source code it gives imprecise or overly verbose results. The tool described in this thesis can identify contextual semantic differences because it knows the contexts of methods: whether a method is declared directly in the class, inherited from an implemented interface, or overrides a method of the class's parent.
The approach in this thesis for comparing Java programs is to transform the programs into abstract syntax trees. The transformation from source code to abstract syntax trees is done with the help of Strafunski, a software bundle that supports generic programming. The tool is implemented in Haskell, a functional programming language.
The work of comparing abstract syntax trees can be broken down into the problem of finding the largest common subtree of two abstract syntax trees and, furthermore, the problem of finding the longest common subsequence of two sequences. This thesis presents new algorithms for doing this and also describes working Haskell code implementing the tool.
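The reduction from differencing to longest common subsequence can be sketched for flat sequences as follows. This is the standard LCS-based diff written in Python for illustration. It is not the thesis's Haskell implementation or its new algorithms, and it works on lists of lines, not on abstract syntax trees.

```python
from typing import List, Sequence, Tuple


def lcs_table(a: Sequence[str], b: Sequence[str]) -> List[List[int]]:
    """table[i][j] holds the LCS length of the suffixes a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def diff(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Edit script: ' ' keeps an item, '-' deletes from a, '+' inserts from b."""
    table = lcs_table(a, b)
    ops: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))  # dropping a[i] keeps an optimal LCS reachable
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", item) for item in a[i:])  # leftovers in a are deletions
    ops.extend(("+", item) for item in b[j:])  # leftovers in b are insertions
    return ops


old = ["int x;", "x = 1;", "return x;"]
new = ["int x;", "x = 2;", "return x;"]
script = diff(old, new)
```

Here `script` keeps the first and last lines and reports the middle line as one deletion followed by one insertion. The kept items form a longest common subsequence of the two inputs.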
On space efficiency of algorithms working on structural decompositions of graphs
Dynamic programming on path and tree decompositions of graphs is a technique
that is ubiquitous in the field of parameterized and exponential-time
algorithms. However, one of its drawbacks is that the space usage is
exponential in the decomposition's width. Following the work of Allender et al.
[Theory of Computing, '14], we investigate whether this space complexity
explosion is unavoidable. Using the idea of reparameterization of Cai and
Juedes [J. Comput. Syst. Sci., '03], we prove that the question is closely
related to a conjecture that the Longest Common Subsequence problem
parameterized by the number of input strings does not admit an algorithm that
simultaneously uses XP time and FPT space. Moreover, we complete the complexity
landscape sketched for pathwidth and treewidth by Allender et al. by
considering the parameter tree-depth. We prove that computations on tree-depth
decompositions correspond to a model of non-deterministic machines that work in
polynomial time and logarithmic space, with access to an auxiliary stack of
maximum height equal to the decomposition's depth. Together with the results of
Allender et al., this describes a hierarchy of complexity classes for
polynomial-time non-deterministic machines with different restrictions on the
access to working space, which mirrors the classic relations between treewidth,
pathwidth, and tree-depth. Comment: An extended abstract appeared in the proceedings of STACS'16. The new
version is augmented with a space-efficient algorithm for Dominating Set
using the Chinese remainder theorem.