Search CORE

16 research outputs found

Stream VByte: Faster Byte-Oriented Integer Compression

Author: Kurz Nathan
Lemire Daniel
Rupp Christoph
Publication venue: 'Elsevier BV'
Publication date: 27/09/2017
Field of study

Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google's Varint-GB). They are appealing due to their simplicity and engineering convenience. Amazon's varint-G8IU is one of the fastest byte-oriented compression technique published so far. It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors. To surpass varint-G8IU, we present Stream VByte, a novel byte-oriented compression technique that separates the control stream from the encoded data. Like varint-G8IU, Stream VByte is well suited for SIMD instructions. We show that Stream VByte decoding can be up to twice as fast as varint-G8IU decoding over real data sets. In this sense, Stream VByte establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4GHz Haswell processor, it decodes more than 4 billion differentially-coded integers per second from RAM to L1 cache

arXiv.org e-Print Archive

R-libre

Crossref

Better bitmap performance with Roaring bitmaps

Author: Beyer
Colantonio
Culpepper
Fusco
Inoue
Kaser
Kaser
Lemire
Lemire
Lemire
Warren
Publication venue: 'Wiley'
Publication date: 15/03/2016
Field of study

Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracle's lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. We compare it to two high-performance RLE-based bitmap encoding techniques: WAH (Word Aligned Hybrid compression scheme) and Concise (Compressed `n' Composable Integer Set). On synthetic and real data, we find that Roaring bitmaps (1) often compress significantly better (e.g., 2 times) and (2) are faster than the compressed alternatives (up to 900 times faster for intersections). Our results challenge the view that RLE-based bitmap compression is best

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref

On the Unicity of Smartphone Applications

Author: Eckersley P.
Geweke J.
Levin D. A.
Metropolis N.
Olejnik L.
Sweeney L. A.
Publication venue
Publication date: 12/10/2015
Field of study

Prior works have shown that the list of apps installed by a user reveal a lot about user interests and behavior. These works rely on the semantics of the installed apps and show that various user traits could be learnt automatically using off-the-shelf machine-learning techniques. In this work, we focus on the re-identifiability issue and thoroughly study the unicity of smartphone apps on a dataset containing 54,893 Android users collected over a period of 7 months. Our study finds that any 4 apps installed by a user are enough (more than 95% times) for the re-identification of the user in our dataset. As the complete list of installed apps is unique for 99% of the users in our dataset, it can be easily used to track/profile the users by a service such as Twitter that has access to the whole list of installed apps of users. As our analyzed dataset is small as compared to the total population of Android users, we also study how unicity would vary with larger datasets. This work emphasizes the need of better privacy guards against collection, use and release of the list of installed apps.Comment: 10 pages, 9 Figures, Appeared at ACM CCS Workshop on Privacy in Electronic Society (WPES) 201

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation.

Author: Andreassen Ole A
Dale Anders M
Djurovic Srdjan
Fan Chun Chieh
Frei Oleksandr
Holland Dominic
Maeland Steffen
O'Connell Kevin S
Shadrin Alexey A
Smeland Olav B
Thompson Wesley K
Wang Yunpeng
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Accumulating evidence from genome wide association studies (GWAS) suggests an abundance of shared genetic influences among complex human traits and disorders, such as mental disorders. Here we introduce a statistical tool, MiXeR, which quantifies polygenic overlap irrespective of genetic correlation, using GWAS summary statistics. MiXeR results are presented as a Venn diagram of unique and shared polygenic components across traits. At 90% of SNP-heritability explained for each phenotype, MiXeR estimates that 8.3 K variants causally influence schizophrenia and 6.4 K influence bipolar disorder. Among these variants, 6.2 K are shared between the disorders, which have a high genetic correlation. Further, MiXeR uncovers polygenic overlap between schizophrenia and educational attainment. Despite a genetic correlation close to zero, the phenotypes share 8.3 K causal variants, while 2.5 K additional variants influence only educational attainment. By considering the polygenicity, discoverability and heritability of complex phenotypes, MiXeR analysis may improve our understanding of cross-trait genetic architectures

University of Bergen

eScholarship - University of California

NORA - Norwegian Open Research Archives