377 research outputs found

    Letter from the Special Issue Editor

    Get PDF
    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

    Time Series Management Systems: A 2022 Survey

    Get PDF

    Data-intensive Systems on Modern Hardware : Leveraging Near-Data Processing to Counter the Growth of Data

    Get PDF
    Over the last decades, a tremendous change toward using information technology in almost every daily routine of our lives can be perceived in our society, entailing an incredible growth of data collected day-by-day on Web, IoT, and AI applications. At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories, like Flash, which yield significantly faster access latencies and higher levels of parallelism. Likewise, the execution speed of processing units increased considerably as nowadays server architectures comprise up to multiple hundreds of independently working CPU cores along with a variety of specialized computing co-processors such as GPUs or FPGAs. However, the burden of moving the continuously growing data to the best fitting processing unit is inherently linked to today’s computer architecture that is based on the data-to-code paradigm. In the light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the speedup of systems is limited since the fraction of parallel work is largely I/O-bound. Therefore, throughout this cumulative dissertation, we investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves the contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located. Firstly, we identified Native Storage Management as the essential foundation for NDP due to its direct control of physical storage management within the database. Upon this, the interface is extended to propagate address mapping information and to invoke NDP functionality on the storage device. As the former can become very large, we introduce Physical Page Pointers as one novel NDP abstraction for self-contained immutable database objects. Secondly, the on-device navigation and interpretation of data are elaborated. Therefore, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. Thereby, the compute placement and resource configuration per NDP request is identified as a major performance criteria. Our experimental evaluation shows an improvement in the execution durations of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP. Thirdly, we investigate the interplay of NDP and modern workload characteristics like HTAP. Therefore, we present different offloading models and focus on an intervention-free execution. By propagating the Shared State with the latest modifications of the database to the computational storage device, it is able to process data with transactional guarantees. Thus, we achieve to extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we experience no significant drop in performance when an OLAP query is invoked but a steady and 30% faster throughput. Lastly, in-situ result-set management and consumption as well as NDP pipelines are proposed to achieve flexibility in processing data on heterogeneous hardware. As those produce final and intermediary results, we continue investigating their management and identified that an on-device materialization comes at a low cost but enables novel consumption modes and reuse semantics. Thereby, we achieve significant performance improvements of up to 400x by reusing once materialized results multiple times

    Enabling efficient OS paging for main-memory OLTP databases

    Full text link

    Gestion efficace et partage sécurisé des traces de mobilité

    Get PDF
    Nowadays, the advances in the development of mobile devices, as well as embedded sensors have permitted an unprecedented number of services to the user. At the same time, most mobile devices generate, store and communicate a large amount of personal information continuously. While managing personal information on the mobile devices is still a big challenge, sharing and accessing these information in a safe and secure way is always an open and hot topic. Personal mobile devices may have various form factors such as mobile phones, smart devices, stick computers, secure tokens or etc. It could be used to record, sense, store data of user's context or environment surrounding him. The most common contextual information is user's location. Personal data generated and stored on these devices is valuable for many applications or services to user, but it is sensitive and needs to be protected in order to ensure the individual privacy. In particular, most mobile applications have access to accurate and real-time location information, raising serious privacy concerns for their users.In this dissertation, we dedicate the two parts to manage the location traces, i.e. the spatio-temporal data on mobile devices. In particular, we offer an extension of spatio-temporal data types and operators for embedded environments. These data types reconcile the features of spatio-temporal data with the embedded requirements by offering an optimal data presentation called Spatio-temporal object (STOB) dedicated for embedded devices. More importantly, in order to optimize the query processing, we also propose an efficient indexing technique for spatio-temporal data called TRIFL designed for flash storage. TRIFL stands for TRajectory Index for Flash memory. It exploits unique properties of trajectory insertion, and optimizes the data structure for the behavior of flash and the buffer cache. These ideas allow TRIFL to archive much better performance in both Flash and magnetic storage compared to its competitors.Additionally, we also investigate the protect user's sensitive information in the remaining part of this thesis by offering a privacy-aware protocol for participatory sensing applications called PAMPAS. PAMPAS relies on secure hardware solutions and proposes a user-centric privacy-aware protocol that fully protects personal data while taking advantage of distributed computing. For this to be done, we also propose a partitioning algorithm an aggregate algorithm in PAMPAS. This combination drastically reduces the overall costs making it possible to run the protocol in near real-time at a large scale of participants, without any personal information leakage.Aujourd'hui, les progrès dans le développement d'appareils mobiles et des capteurs embarqués ont permis un essor sans précédent de services à l'utilisateur. Dans le même temps, la plupart des appareils mobiles génèrent, enregistrent et de communiquent une grande quantité de données personnelles de manière continue. La gestion sécurisée des données personnelles dans les appareils mobiles reste un défi aujourd’hui, que ce soit vis-à-vis des contraintes inhérentes à ces appareils, ou par rapport à l’accès et au partage sûrs et sécurisés de ces informations. Cette thèse adresse ces défis et se focalise sur les traces de localisation. En particulier, s’appuyant sur un serveur de données relationnel embarqué dans des appareils mobiles sécurisés, cette thèse offre une extension de ce serveur à la gestion des données spatio-temporelles (types et operateurs). Et surtout, elle propose une méthode d'indexation spatio-temporelle (TRIFL) efficace et adaptée au modèle de stockage en mémoire flash. Par ailleurs, afin de protéger les traces de localisation personnelles de l'utilisateur, une architecture distribuée et un protocole de collecte participative préservant les données de localisation ont été proposés dans PAMPAS. Cette architecture se base sur des dispositifs hautement sécurisés pour le calcul distribué des agrégats spatio-temporels sur les données privées collectées
    • …
    corecore