
    Decoding billions of integers per second through vectorization

    In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding. Comment: For software, see https://github.com/lemire/FastPFor. For data, see http://boytsov.info/datasets/clueweb09gap
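As a rough illustration of the kind of integer-array compression the abstract above refers to, the following is a minimal scalar sketch of plain binary packing: every value in a block is stored with the same bit width, chosen as the smallest width that fits the largest value. It is not the paper's SIMD-BP128 algorithm and uses no SIMD instructions; the function names `bits_needed`, `pack_block`, and `unpack_block` are hypothetical and chosen only for this example.

```python
def bits_needed(values):
    """Smallest bit width that can represent every value in the block."""
    return max(values).bit_length() if values else 0


def pack_block(values):
    """Pack non-negative integers into one Python int using a shared bit width.

    Returns (width, count, packed) so the block can be decoded later.
    """
    width = bits_needed(values)
    packed = 0
    for i, v in enumerate(values):
        # Place value i in its own width-bit slot.
        packed |= v << (i * width)
    return width, len(values), packed


def unpack_block(width, count, packed):
    """Recover the original integers from a packed block."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


if __name__ == "__main__":
    block = [3, 7, 0, 5]
    width, count, packed = pack_block(block)
    print(width, unpack_block(width, count, packed))
```

Here four small integers need only 3 bits each instead of a full machine word. Vectorized schemes apply the same shift-and-mask idea to several integers at once.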

    The Inflation Technique Completely Solves the Causal Compatibility Problem

    The causal compatibility question asks whether a given causal structure graph -- possibly involving latent variables -- constitutes a genuinely plausible causal explanation for a given probability distribution over the graph's observed variables. Algorithms predicated on merely necessary constraints for causal compatibility typically suffer from false negatives, i.e. they admit incompatible distributions as apparently compatible with the given graph. In [arXiv:1609.00672], one of us introduced the inflation technique for formulating useful relaxations of the causal compatibility problem in terms of linear programming. In this work, we develop a formal hierarchy of such causal compatibility relaxations. We prove that inflation is asymptotically tight, i.e., that the hierarchy converges to a zero-error test for causal compatibility. In this sense, the inflation technique fulfills a longstanding desideratum in the field of causal inference. We quantify the rate of convergence by showing that any distribution which passes the $n^{th}$-order inflation test must be $O\left(n^{-1/2}\right)$-close in Euclidean norm to some distribution genuinely compatible with the given causal structure. Furthermore, we show that for many causal structures, the (unrelaxed) causal compatibility problem is faithfully formulated already by either the first or second order inflation test. Comment: Updated to match forthcoming journal publication as closely as possible. Some content removed for brevity. Expanded citations. Most footnotes moved into the main text. Significant changes to subsection 4.1, where we corrected an error in the example of second order inflation not converging, and added a converse example where second order inflation outperforms other techniques

    Instructions-Based Detection of Sophisticated Obfuscation and Packing

    Every day, thousands of malware samples are released online. The vast majority of these samples employ some kind of obfuscation, ranging from simple XOR encryption to more sophisticated anti-analysis, packing and encryption techniques. Dynamic analysis methods can unpack the file and reveal its hidden code. However, these methods are very time-consuming compared to static analysis. Moreover, considering the large amount of new malware being produced daily, it is not practical to depend solely on dynamic analysis methods. Therefore, finding an effective way to filter the samples and delegate only obfuscated and suspicious ones to more rigorous tests would significantly improve the overall scanning process. Current techniques for identifying obfuscation rely mainly on signatures of known packers, file entropy scores, or anomalies in the file header. However, these features are not only easily bypassable, but also do not cover all types of obfuscation. In this paper, we introduce a novel approach to identify obfuscated files based on anomalies in their instructions-based characteristics. We detect the presence of interleaving instructions, which are the result of the opaque predicate anti-disassembly trick, and present distinguishing statistical properties based on the opcodes and control flow graphs of obfuscated files. Our detection system combines these features with other file structural features and achieves very good results in detecting obfuscated malware.
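The file entropy score that the abstract above lists among current detection techniques can be sketched as the Shannon entropy of a file's byte distribution. This is a minimal illustration of that existing baseline, not the instructions-based method the paper proposes; the function name `byte_entropy` is hypothetical, and no particular detection threshold is implied.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A single repeated byte carries no information; uniformly
    # distributed bytes reach the maximum of 8 bits per byte.
    print(byte_entropy(b"aaaaaaaa"))
    print(byte_entropy(bytes(range(256))))
```

Packed or encrypted content tends to score close to the maximum, which is why entropy is used as a quick static filter, and also why, as the abstract notes, it can be bypassed.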

    Role of Secondary Motifs in Fast Folding Polymers: A Dynamical Variational Principle

    A fascinating and open question challenging biochemistry, physics and even geometry is the presence of highly regular motifs such as alpha-helices in the folded state of biopolymers and proteins. Stimulating explanations ranging from chemical propensity to simple geometrical reasoning have been invoked to rationalize the existence of such secondary structures. We formulate a dynamical variational principle for selection in conformation space based on the requirement that the backbone of the native state of biologically viable polymers be rapidly accessible from the denatured state. The variational principle is shown to result in the emergence of helical order in compact structures. Comment: 4 pages, RevTex, 4 eps figures