
    Polygraph: Automatically generating signatures for polymorphic worms

    It is widely believed that content-signature-based intrusion detection systems (IDSes) are easily evaded by polymorphic worms, which vary their payload on every infection attempt. In this paper, we present Polygraph, a signature generation system that successfully produces signatures that match polymorphic worms. Polygraph generates signatures that consist of multiple disjoint content substrings. In doing so, Polygraph leverages our insight that for a real-world exploit to function properly, multiple invariant substrings must often be present in all variants of a payload; these substrings typically correspond to protocol framing, return addresses, and, in some cases, poorly obfuscated code. We contribute a definition of the polymorphic signature generation problem; propose classes of signatures suited for matching polymorphic worm payloads; and present algorithms for automatic generation of signatures in these classes. Our evaluation of these algorithms on a range of polymorphic worms demonstrates that Polygraph produces signatures that exhibit low false negative and false positive rates. © 2005 IEEE
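
    As context for how a signature made of multiple disjoint substrings can be matched, here is a minimal Python sketch of two natural matching semantics: requiring every token to appear somewhere in the payload (a conjunction), and requiring the tokens to appear in order without overlap (a token subsequence). The token values in the example are illustrative only, not signatures from the paper.

    ```python
    def matches_conjunction(payload: bytes, tokens: list[bytes]) -> bool:
        """Conjunction semantics: every token must occur somewhere in the payload."""
        return all(tok in payload for tok in tokens)


    def matches_token_subsequence(payload: bytes, tokens: list[bytes]) -> bool:
        """Ordered semantics: tokens must occur in order, without overlapping."""
        pos = 0
        for tok in tokens:
            idx = payload.find(tok, pos)
            if idx == -1:
                return False
            pos = idx + len(tok)
        return True


    # Illustrative tokens only (protocol framing plus a fixed byte fragment);
    # real signatures would be produced by a system such as Polygraph.
    tokens = [b"GET /vuln.cgi?", b" HTTP/1.1\r\n", b"\xff\xbf"]
    payload = b"GET /vuln.cgi?x=AAAA HTTP/1.1\r\nHost: h\r\n\xff\xbf"
    assert matches_conjunction(payload, tokens)
    assert matches_token_subsequence(payload, tokens)
    ```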

    Knowledge Discovery in Documents by Extracting Frequent Word Sequences

    published or submitted for publication

    VirtualHome: Simulating Household Activities via Programs

    In this paper, we are interested in modeling complex activities that occur in a typical household. We propose to use programs, i.e., sequences of atomic actions and interactions, as a high-level representation of complex tasks. Programs are interesting because they provide a non-ambiguous representation of a task and allow agents to execute them. However, no existing database provides this type of information. Towards this goal, we first crowd-source programs for a variety of activities that happen in people's homes, via a game-like interface used for teaching kids how to code. Using the collected dataset, we show how we can learn to extract programs directly from natural language descriptions or from videos. We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to "drive" an artificial agent to execute tasks in a simulated household environment. Our VirtualHome simulator allows us to create a large activity video dataset with rich ground truth, enabling training and testing of video understanding models. We further showcase examples of our agent performing tasks in our VirtualHome based on language descriptions. Comment: CVPR 2018 (Oral)
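
    To make the program representation concrete, here is a hedged Python sketch of a program as a sequence of atomic (inter)action steps. The `Step` type, the example steps, and the `agent.perform` call are hypothetical stand-ins for the paper's actual step format and simulator API.

    ```python
    from dataclasses import dataclass


    @dataclass
    class Step:
        """One atomic (inter)action; loosely modeled on the paper's step
        syntax, e.g. "[Walk] <TELEVISION> (1)". Field names are our own."""
        action: str
        objects: tuple[str, ...]


    # A hypothetical "watch tv" program: a sequence of atomic steps.
    watch_tv = [
        Step("walk", ("livingroom",)),
        Step("walk", ("television",)),
        Step("switchon", ("television",)),
        Step("walk", ("sofa",)),
        Step("sit", ("sofa",)),
    ]


    def execute(program: list[Step], agent) -> None:
        """Drive an agent through the program; `agent.perform` is a
        hypothetical stand-in for a simulator binding such as an executor."""
        for step in program:
            agent.perform(step.action, *step.objects)
    ```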

    Long-Term Visual Object Tracking Benchmark

    We propose a new long video dataset (called Track Long and Prosper - TLP) and benchmark for single object tracking. The dataset consists of 50 HD videos from real-world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20-fold larger in average duration per sequence and more than 8-fold larger in total covered duration than existing generic datasets for visual tracking. The proposed dataset paves the way to suitably assess long-term tracking performance and to train better deep learning architectures (avoiding/reducing augmentation, which may not reflect real-world behaviour). We benchmark 17 state-of-the-art trackers on the dataset and rank them according to tracking accuracy and runtime speed. We further present a thorough qualitative and quantitative evaluation highlighting the importance of the long-term aspect of tracking. Our most interesting observations are (a) existing short-sequence benchmarks fail to bring out the inherent differences between tracking algorithms, which widen while tracking on long sequences, and (b) the accuracy of trackers drops abruptly on challenging long sequences, suggesting a potential need for research efforts in the direction of long-term tracking. Comment: ACCV 2018 (Oral)

    Multivariate Fine-Grained Complexity of Longest Common Subsequence

    We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a textbook algorithm solves LCS in time $O(n^2)$, but although much effort has been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann, Künnemann FOCS'15]. Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known. To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size $n := \max\{|x|,|y|\}$, the length of the shorter string $m := \min\{|x|,|y|\}$, the length $L$ of an LCS of $x$ and $y$, the numbers of deletions $\delta := m-L$ and $\Delta := n-L$, the alphabet size, as well as the numbers of matching pairs $M$ and dominant pairs $d$. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as $(n+\min\{d, \delta\Delta, \delta m\})^{1\pm o(1)}$. [...] Comment: Presented at SODA'18. Full version, 66 pages
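
    For reference, the quadratic-time textbook dynamic program the abstract alludes to can be written in a few lines of Python; this is a generic sketch with the standard one-row space optimization, not code from the paper. The parameter-sensitive algorithms the abstract surveys (running times depending on $\delta$, $\Delta$, $M$, or $d$) refine this baseline for special cases.

    ```python
    def lcs_length(x: str, y: str) -> int:
        """Textbook dynamic program: O(|x|*|y|) time, O(min(|x|,|y|)) space."""
        if len(x) < len(y):
            x, y = y, x  # keep the DP row as short as possible
        prev = [0] * (len(y) + 1)
        for xi in x:
            curr = [0]
            for j, yj in enumerate(y, 1):
                if xi == yj:
                    curr.append(prev[j - 1] + 1)  # extend an LCS by a match
                else:
                    curr.append(max(prev[j], curr[-1]))  # drop one character
            prev = curr
        return prev[-1]


    assert lcs_length("AGGTAB", "GXTXAYB") == 4  # LCS is "GTAB"
    ```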

    Growth rates for subclasses of Av(321)

    Pattern classes that avoid 321 and other patterns are shown to have the same growth rates as similar (but strictly larger) classes obtained by adding articulation points to any or all of the other patterns. The method of proof is to show that the elements of the latter classes can be represented as bounded merges of elements of the original class, and that the bounded-merge construction does not change growth rates.
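
    For readers unfamiliar with the notation: Av(321) is the class of permutations with no decreasing subsequence of length three. A simple linear-time membership test, included here as an illustrative Python sketch (not anything from the paper), checks whether some entry has a larger entry before it and a smaller entry after it.

    ```python
    def avoids_321(perm: list[int]) -> bool:
        """A permutation contains 321 iff some entry is the middle of a
        decreasing triple; test with prefix maxima and suffix minima, O(n)."""
        n = len(perm)
        if n < 3:
            return True
        prefix_max = [perm[0]] * n
        for i in range(1, n):
            prefix_max[i] = max(prefix_max[i - 1], perm[i])
        suffix_min = [perm[-1]] * n
        for i in range(n - 2, -1, -1):
            suffix_min[i] = min(suffix_min[i + 1], perm[i])
        return not any(
            prefix_max[j - 1] > perm[j] > suffix_min[j + 1]
            for j in range(1, n - 1)
        )


    assert avoids_321([2, 4, 1, 3, 5])      # no decreasing triple
    assert not avoids_321([4, 1, 3, 2])     # contains 4 > 3 > 2
    ```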

    JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition

    This paper proposes a novel algorithm to reassemble an arbitrarily shredded image to its original form. Existing reassembly pipelines commonly consist of a local matching stage and a global composition stage. In the local stage, a key challenge in fragment reassembly is to reliably compute and identify correct pairwise matches, for which most existing algorithms use handcrafted features and hence cannot reliably handle complicated puzzles. We build a deep convolutional neural network to detect the compatibility of a pairwise stitching, and use it to prune computed pairwise matches. To improve the network's efficiency and accuracy, we restrict the CNN computation to the stitching region and apply a boosted training strategy. In the global composition stage, we replace the commonly adopted greedy edge selection strategies with two new loop-closure-based search algorithms. Extensive experiments show that our algorithm significantly outperforms existing methods on various puzzles, especially challenging ones with many fragment pieces.
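
    A hedged Python sketch of the pruning step described above: given candidate pairwise alignments with geometric scores, keep only those the learned compatibility classifier accepts. `PairwiseMatch` and `compatibility_net` are hypothetical stand-ins for the paper's data structures and trained CNN.

    ```python
    from dataclasses import dataclass
    from typing import Callable, List


    @dataclass
    class PairwiseMatch:
        frag_a: int        # index of the first fragment
        frag_b: int        # index of the second fragment
        transform: object  # relative pose aligning frag_b against frag_a
        score: float       # score from the handcrafted/geometric matcher


    def prune_matches(matches: List[PairwiseMatch],
                      compatibility_net: Callable[[PairwiseMatch], float],
                      threshold: float = 0.5) -> List[PairwiseMatch]:
        """Keep only alignments the learned classifier accepts; the classifier
        would be evaluated on the stitching region of the aligned fragments."""
        return [m for m in matches if compatibility_net(m) >= threshold]
    ```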

    Shape Optimization Problems for Metric Graphs

    We consider the shape optimization problem $\min\big\{ \mathcal{E}(\Gamma) : \Gamma \in \mathcal{A},\ \mathcal{H}^1(\Gamma) = l \big\}$, where $\mathcal{H}^1$ is the one-dimensional Hausdorff measure and $\mathcal{A}$ is an admissible class of one-dimensional sets connecting some prescribed set of points $\mathcal{D} = \{D_1, \dots, D_k\} \subset \mathbb{R}^d$. The cost functional $\mathcal{E}(\Gamma)$ is the Dirichlet energy of $\Gamma$, defined through the Sobolev functions on $\Gamma$ vanishing on the points $D_i$. We analyze the existence of a solution in both the family of connected sets and the family of metric graphs. At the end, several explicit examples are discussed. Comment: 23 pages, 11 figures, ESAIM Control Optim. Calc. Var. (to appear)
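
    For concreteness, one standard form of such a Dirichlet energy is shown below; the constant unit source term is an assumption of this sketch, and the paper's exact normalization may differ.

    ```latex
    % Dirichlet energy of a metric graph \Gamma, with Dirichlet conditions at
    % the prescribed points D_i and a unit source term (assumed here):
    \mathcal{E}(\Gamma) \;=\; \min\Big\{ \int_\Gamma \Big( \tfrac{1}{2}\,|u'|^2 - u \Big)\, d\mathcal{H}^1
    \;:\; u \in H^1(\Gamma),\ u(D_i) = 0 \ \text{for } i = 1,\dots,k \Big\}.
    ```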

    Comparing Java Programs: Syntactic and Contextual Semantic Differences

    This thesis describes the foundation for developing a tool that compares Java programs, or different versions of a program. The tool captures syntactic differences as well as contextual semantic differences. Syntactic differences are "ordinary" changes in the code. The tool works much like the Unix tool diff, but it is smarter than diff because it exploits the fact that programs are structured differently than ordinary text. diff's purpose is to compare text, so on programs it gives imprecise or overly verbose results. The tool described in this thesis can identify contextual semantic differences because it knows the contexts of methods: whether a method is directly declared in the class, inherited from an implemented interface, or overrides a method of the class's parent. The approach in this thesis for comparing Java programs is to transform the programs into abstract syntax trees. The transformation from source code to abstract syntax trees is done with the help of Strafunski, a software bundle that supports generic programming. The tool is implemented in Haskell, a functional programming language. Comparing abstract syntax trees can be broken down into the problem of finding the largest common subtree of two abstract syntax trees and, furthermore, the problem of finding the longest common subsequence of two sequences. This thesis describes and presents new algorithms for these problems, and it also describes a working Haskell implementation of the tool.
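
    To illustrate what "knowing the context of a method" might look like, here is a small Python sketch (the thesis itself is implemented in Haskell). The class model below, with only a method set and a single parent and with interfaces omitted, is a deliberately simplified assumption.

    ```python
    from dataclasses import dataclass, field


    @dataclass
    class ClassInfo:
        """Simplified model of a parsed Java class: a method set and a parent."""
        methods: set[str] = field(default_factory=set)
        parent: "ClassInfo | None" = None


    def has_method(cls: ClassInfo, name: str) -> bool:
        return name in cls.methods or (
            cls.parent is not None and has_method(cls.parent, name)
        )


    def method_context(cls: ClassInfo, name: str) -> str:
        """Classify how `name` relates to `cls`, mirroring the kind of context
        used to report semantic (not merely textual) differences."""
        declared = name in cls.methods
        in_parent = cls.parent is not None and has_method(cls.parent, name)
        if declared and in_parent:
            return "overrides parent"
        if declared:
            return "directly declared"
        if in_parent:
            return "inherited"
        return "absent"


    parent = ClassInfo(methods={"toString", "close"})
    child = ClassInfo(methods={"toString"}, parent=parent)
    assert method_context(child, "toString") == "overrides parent"
    assert method_context(child, "close") == "inherited"
    ```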

    On space efficiency of algorithms working on structural decompositions of graphs

    Dynamic programming on path and tree decompositions of graphs is a technique that is ubiquitous in the field of parameterized and exponential-time algorithms. However, one of its drawbacks is that the space usage is exponential in the decomposition's width. Following the work of Allender et al. [Theory of Computing, '14], we investigate whether this space complexity explosion is unavoidable. Using the idea of reparameterization of Cai and Juedes [J. Comput. Syst. Sci., '03], we prove that the question is closely related to a conjecture that the Longest Common Subsequence problem parameterized by the number of input strings does not admit an algorithm that simultaneously uses XP time and FPT space. Moreover, we complete the complexity landscape sketched for pathwidth and treewidth by Allender et al. by considering the parameter tree-depth. We prove that computations on tree-depth decompositions correspond to a model of non-deterministic machines that work in polynomial time and logarithmic space, with access to an auxiliary stack of maximum height equal to the decomposition's depth. Together with the results of Allender et al., this describes a hierarchy of complexity classes for polynomial-time non-deterministic machines with different restrictions on the access to working space, which mirrors the classic relations between treewidth, pathwidth, and tree-depth. Comment: An extended abstract appeared in the proceedings of STACS'16. The new version is augmented with a space-efficient algorithm for Dominating Set using the Chinese remainder theorem
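
    To see where the space blow-up in this LCS parameterization comes from, consider the standard dynamic program for the LCS of k strings, sketched below in Python as a generic illustration: it runs in XP time, roughly O(n^k), but its memo table also takes Θ(n^k) space, and the conjecture mentioned above asserts that this space cannot be reduced to FPT size while keeping XP time.

    ```python
    from functools import lru_cache


    def multi_lcs(strings: list[str]) -> int:
        """LCS of k strings via dynamic programming over position tuples.
        Time and memo-table space are both about n^k for k strings of
        length n, which is exactly the space blow-up discussed above."""
        k = len(strings)

        @lru_cache(maxsize=None)
        def solve(pos: tuple[int, ...]) -> int:
            if any(p == len(s) for p, s in zip(pos, strings)):
                return 0
            best = 0
            heads = {s[p] for p, s in zip(pos, strings)}
            if len(heads) == 1:  # all k strings agree on the next character
                best = 1 + solve(tuple(p + 1 for p in pos))
            for i in range(k):   # otherwise skip a character in one string
                advanced = list(pos)
                advanced[i] += 1
                best = max(best, solve(tuple(advanced)))
            return best

        return solve((0,) * k)


    assert multi_lcs(["nematode", "empty", "mate"]) == 2  # LCS is "mt"
    ```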