
    Decoding billions of integers per second through vectorization

    In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding. Comment: For software, see https://github.com/lemire/FastPFor. For data, see http://boytsov.info/datasets/clueweb09gap
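The abstract above names binary packing (the "BP" in SIMD-BP128) without spelling out the mechanics. As a rough, scalar illustration of the general idea only, and not the authors' vectorized scheme, the sketch below delta-encodes a sorted list of integers and packs each block of gaps using the bit width of the largest gap in that block. The function names, the block size constant, and the byte layout are assumptions made for this example.

```python
from itertools import accumulate
from typing import List, Tuple

BLOCK = 128  # block size assumed from the name "BP128"; not taken from the abstract


def delta_encode(sorted_ids: List[int]) -> List[int]:
    """Replace each value by its gap from the previous one (first gap is from 0)."""
    gaps, prev = [], 0
    for x in sorted_ids:
        gaps.append(x - prev)
        prev = x
    return gaps


def delta_decode(gaps: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    return list(accumulate(gaps))


def pack_block(values: List[int]) -> Tuple[int, bytes]:
    """Pack non-negative integers using the bit width of the largest value."""
    width = max(values, default=0).bit_length()
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)  # place value i at bit offset i * width
    nbytes = (len(values) * width + 7) // 8
    return width, acc.to_bytes(nbytes, "little")


def unpack_block(width: int, data: bytes, count: int) -> List[int]:
    """Recover `count` integers of `width` bits each from packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def compress(sorted_ids: List[int]) -> List[Tuple[int, int, bytes]]:
    """Return one (count, width, payload) triple per block of gaps."""
    gaps = delta_encode(sorted_ids)
    blocks = []
    for start in range(0, len(gaps), BLOCK):
        chunk = gaps[start:start + BLOCK]
        width, payload = pack_block(chunk)
        blocks.append((len(chunk), width, payload))
    return blocks


def decompress(blocks: List[Tuple[int, int, bytes]]) -> List[int]:
    gaps: List[int] = []
    for count, width, payload in blocks:
        gaps.extend(unpack_block(width, payload, count))
    return delta_decode(gaps)
```

For example, `compress([3, 7, 8, 15, 100])` stores the gaps 3, 4, 1, 7, 85 at 7 bits each, so the five integers fit in 5 bytes. A production scheme would operate on machine words with SIMD instructions rather than Python's arbitrary-precision integers, which are used here only to keep the sketch short.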

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management, including storage, indexing, manipulation, and querying of annotation and provenance as first-class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US

    About BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design)

    BIRDS stands for "Bioinformatics and Information Retrieval Data Structures analysis and design" and is a 4-year project (2016-2019) that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 690941. The overall goal of BIRDS is to establish a long-term international network involving leading researchers in the development of efficient data structures in the fields of Bioinformatics and Information Retrieval, to strengthen the partnership through the exchange of knowledge and expertise, and to develop integrated approaches that improve on current approaches in both fields. The research will address challenges in storing, processing, indexing, searching and navigating genome-scale data by designing new algorithms and data structures for sequence analysis, network representation, or compressing and indexing repetitive data. The BIRDS project is carried out by 7 research institutions from Australia (University of Melbourne), Chile (University of Chile and University of Concepción), Finland (University of Helsinki), Japan (Kyushu University), Portugal (Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa, INESC-ID), and Spain (University of A Coruña), and a Spanish SME (Enxenio S.L.). It is coordinated by the University of A Coruña (Spain). Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. CERI 201

    A Suggested Lightweight Lossless Compression Approach for Internet of Everything Devices

    Limited storage space, high sensor data traffic, and the need for energy-efficient transmission are examples of the challenges in developing Internet of Everything (IoE) applications. This paper addresses these issues by proposing a lossless compression approach based on lightweight operations. The suggested approach works efficiently even on low-performance devices. Furthermore, it improves IoE sensor nodes by reducing energy consumption and resource usage, thereby saving power and extending the lifetime of IoE devices. The suggested approach is evaluated on two benchmark datasets by calculating the compression ratio, first on messages in the person-to-person pattern and second on healthcare sensor data (heart rate and body temperature) in the machine-to-person pattern of IoE. In both tests, the suggested approach obtained a significant compression ratio.

    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than they are currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in the areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.