115,627 research outputs found

    TopSig: Topology Preserving Document Signatures

    Get PDF
    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

    Mining Malware Specifications through Static Reachability Analysis

    Get PDF
    International audienceAbstract. The number of malicious software (malware) is growing out of control. Syntactic signature based detection cannot cope with such growth and manual construction of malware signature databases needs to be replaced by computer learning based approaches. Currently, a single modern signature capturing the semantics of a malicious behavior can be used to replace an arbitrarily large number of old-fashioned syntactical signatures. However teaching computers to learn such behaviors is a challenge. Existing work relies on dynamic analysis to extract malicious behaviors, but such technique does not guarantee the coverage of all behaviors. To sidestep this limitation we show how to learn malware signatures using static reachability analysis. The idea is to model binary programs using pushdown systems (that can be used to model the stack operations occurring during the binary code execution), use reachability analysis to extract behaviors in the form of trees, and use subtrees that are common among the trees extracted from a training set of malware files as signatures. To detect malware we propose to use a tree automaton to compactly store malicious behavior trees and check if any of the subtrees extracted from the file under analysis is malicious. Experimental data shows that our approach can be used to learn signatures from a training set of malware files and use them to detect a test set of malware that is 5 times the size of the training set

    Keyword-Based Delegable Proofs of Storage

    Full text link
    Cloud users (clients) with limited storage capacity at their end can outsource bulk data to the cloud storage server. A client can later access her data by downloading the required data files. However, a large fraction of the data files the client outsources to the server is often archival in nature that the client uses for backup purposes and accesses less frequently. An untrusted server can thus delete some of these archival data files in order to save some space (and allocate the same to other clients) without being detected by the client (data owner). Proofs of storage enable the client to audit her data files uploaded to the server in order to ensure the integrity of those files. In this work, we introduce one type of (selective) proofs of storage that we call keyword-based delegable proofs of storage, where the client wants to audit all her data files containing a specific keyword (e.g., "important"). Moreover, it satisfies the notion of public verifiability where the client can delegate the auditing task to a third-party auditor who audits the set of files corresponding to the keyword on behalf of the client. We formally define the security of a keyword-based delegable proof-of-storage protocol. We construct such a protocol based on an existing proof-of-storage scheme and analyze the security of our protocol. We argue that the techniques we use can be applied atop any existing publicly verifiable proof-of-storage scheme for static data. Finally, we discuss the efficiency of our construction.Comment: A preliminary version of this work has been published in International Conference on Information Security Practice and Experience (ISPEC 2018

    Heterogeneous substitution systems revisited

    Full text link
    Matthes and Uustalu (TCS 327(1-2):155-174, 2004) presented a categorical description of substitution systems capable of capturing syntax involving binding which is independent of whether the syntax is made up from least or greatest fixed points. We extend this work in two directions: we continue the analysis by creating more categorical structure, in particular by organizing substitution systems into a category and studying its properties, and we develop the proofs of the results of the cited paper and our new ones in UniMath, a recent library of univalent mathematics formalized in the Coq theorem prover.Comment: 24 page
    corecore