115,627 research outputs found
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
Mining Malware Specifications through Static Reachability Analysis
International audienceAbstract. The number of malicious software (malware) is growing out of control. Syntactic signature based detection cannot cope with such growth and manual construction of malware signature databases needs to be replaced by computer learning based approaches. Currently, a single modern signature capturing the semantics of a malicious behavior can be used to replace an arbitrarily large number of old-fashioned syntactical signatures. However teaching computers to learn such behaviors is a challenge. Existing work relies on dynamic analysis to extract malicious behaviors, but such technique does not guarantee the coverage of all behaviors. To sidestep this limitation we show how to learn malware signatures using static reachability analysis. The idea is to model binary programs using pushdown systems (that can be used to model the stack operations occurring during the binary code execution), use reachability analysis to extract behaviors in the form of trees, and use subtrees that are common among the trees extracted from a training set of malware files as signatures. To detect malware we propose to use a tree automaton to compactly store malicious behavior trees and check if any of the subtrees extracted from the file under analysis is malicious. Experimental data shows that our approach can be used to learn signatures from a training set of malware files and use them to detect a test set of malware that is 5 times the size of the training set
Keyword-Based Delegable Proofs of Storage
Cloud users (clients) with limited storage capacity at their end can
outsource bulk data to the cloud storage server. A client can later access her
data by downloading the required data files. However, a large fraction of the
data files the client outsources to the server is often archival in nature that
the client uses for backup purposes and accesses less frequently. An untrusted
server can thus delete some of these archival data files in order to save some
space (and allocate the same to other clients) without being detected by the
client (data owner). Proofs of storage enable the client to audit her data
files uploaded to the server in order to ensure the integrity of those files.
In this work, we introduce one type of (selective) proofs of storage that we
call keyword-based delegable proofs of storage, where the client wants to audit
all her data files containing a specific keyword (e.g., "important"). Moreover,
it satisfies the notion of public verifiability where the client can delegate
the auditing task to a third-party auditor who audits the set of files
corresponding to the keyword on behalf of the client. We formally define the
security of a keyword-based delegable proof-of-storage protocol. We construct
such a protocol based on an existing proof-of-storage scheme and analyze the
security of our protocol. We argue that the techniques we use can be applied
atop any existing publicly verifiable proof-of-storage scheme for static data.
Finally, we discuss the efficiency of our construction.Comment: A preliminary version of this work has been published in
International Conference on Information Security Practice and Experience
(ISPEC 2018
Heterogeneous substitution systems revisited
Matthes and Uustalu (TCS 327(1-2):155-174, 2004) presented a categorical
description of substitution systems capable of capturing syntax involving
binding which is independent of whether the syntax is made up from least or
greatest fixed points. We extend this work in two directions: we continue the
analysis by creating more categorical structure, in particular by organizing
substitution systems into a category and studying its properties, and we
develop the proofs of the results of the cited paper and our new ones in
UniMath, a recent library of univalent mathematics formalized in the Coq
theorem prover.Comment: 24 page
- …