14 research outputs found

    The Case for Learned Index Structures

    Indexes are models: a B-Tree-Index can be seen as a model that maps a key to the position of a record within a sorted array, a Hash-Index as a model that maps a key to the position of a record within an unsorted array, and a BitMap-Index as a model that indicates whether a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that, by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs, and that this work provides just a glimpse of what might be possible.
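
    To make the premise concrete, the following is a minimal Python sketch of the core idea (an illustration of ours, not the paper's recursive model index): fit a simple linear model mapping keys to positions in a sorted array, then correct each prediction with a bounded local search using the model's maximum observed error.

        # Learned index sketch: a linear model predicts a record's position;
        # the worst-case training error bounds the correction search window.
        import bisect

        class LinearLearnedIndex:
            def __init__(self, sorted_keys):
                self.keys = sorted_keys
                n = len(sorted_keys)
                # Least-squares fit of position ~ slope * key + intercept.
                mean_k = sum(sorted_keys) / n
                mean_p = (n - 1) / 2
                cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(sorted_keys))
                var = sum((k - mean_k) ** 2 for k in sorted_keys)
                self.slope = cov / var if var else 0.0
                self.intercept = mean_p - self.slope * mean_k
                # Maximum prediction error over the training keys.
                self.err = max(abs(self._predict(k) - i) for i, k in enumerate(sorted_keys))

            def _predict(self, key):
                return int(self.slope * key + self.intercept)

            def lookup(self, key):
                p = self._predict(key)
                lo = max(0, p - self.err)
                hi = min(len(self.keys), p + self.err + 1)
                i = bisect.bisect_left(self.keys, key, lo, hi)
                return i if i < len(self.keys) and self.keys[i] == key else None

        idx = LinearLearnedIndex(list(range(0, 1000, 7)))
        assert idx.lookup(7 * 42) == 42

    For near-uniform key distributions a single linear model already yields a tiny error bound; skewed distributions are what motivate the hierarchical models discussed in the paper.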

    High Performance Computing using Infiniband-based clusters

    The abstract is provided in the attachment.

    Runtime protection of software programs against control- and data-oriented attacks

    Software programs are everywhere and continue to create value for us at an incredible pace. But this comes at the cost of facing new risks, as our well-being and the stability of societies become strongly dependent on their correctness. Even if the software loaded in memory is considered legitimate or benign, this does not mean that the code will execute as expected at runtime. Software programs, particularly those developed in unsafe languages (e.g., C/C++), inevitably contain many memory bugs. Attackers exploiting these bugs can achieve malicious computations outside the original specification of the program by corrupting its control and data variables in memory. A potential solution to such runtime attacks must either ensure the integrity of those variables or check the validity of the values they hold. A complete version of the former approach, which requires inspection of all memory accesses, can eliminate all the performance benefits of the language used. Alternatively, checking whether specific variables constitute a legitimate state is a non-trivial task that must handle state explosion and over-approximation issues. Regardless of the method preferred, most runtime protections face common challenges. For example, as the scope of protection widens, such as the inclusion of data-oriented attacks (in addition to control-oriented attacks), performance costs inevitably increase as well. This is especially true for software-based methods, which also suffer from weaker security guarantees. In contrast, most hardware-based techniques promise better security and performance, but they face substantial deployment challenges and offer no solution for existing devices already deployed. In this thesis, we aim to tackle these research challenges by delivering multiple runtime protections in different settings. First, the thesis presents the design of a non-invasive hardware module that enables attesting runtime correctness on critical embedded systems in real time. Second, we address the performance burden of covering data-oriented attacks by suggesting a novel technique to distinguish critical variables from those that are unlikely to be attacked. The goal is a selective protection scheme with practical performance overheads that does not have to check all data variables or the corresponding memory accesses. Third, the thesis presents a software-based solution that promises hardware-level protection for critical variables. For this purpose, it leverages the CPU registers available in any architecture, with extra help from cryptography. Lastly, we explore the use of runtime interactions with the operating system to identify malicious software executions.
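
    As a hedged illustration of the value-checking approach mentioned above, the Python sketch below pairs a critical variable with a keyed MAC so that a corrupted value is detected before use. This is our example, not the thesis's register-based mechanism; the key here is a stand-in for a secret that would need to live where an attacker cannot read it, such as a reserved CPU register.

        # Detecting data-oriented corruption of a critical variable by
        # verifying a keyed MAC over its value before security-sensitive use.
        import hmac, hashlib, os, struct

        KEY = os.urandom(16)  # stand-in for a secret kept out of attackable memory

        def seal(value):
            tag = hmac.new(KEY, struct.pack("<q", value), hashlib.sha256).digest()
            return value, tag

        def use(value, tag):
            expected = hmac.new(KEY, struct.pack("<q", value), hashlib.sha256).digest()
            if not hmac.compare_digest(tag, expected):
                raise RuntimeError("critical variable corrupted at runtime")
            return value

        is_admin, tag = seal(0)
        is_admin = 1              # simulated memory corruption by an attacker
        try:
            use(is_admin, tag)
        except RuntimeError as e:
            print(e)              # the stale MAC exposes the corruption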

    Communication Architectures for Scalable GPU-centric Computing Systems

    In recent years, power consumption has become the main concern in High Performance Computing (HPC). This has led to heterogeneous computing systems in which Central Processing Units (CPUs) are supported by accelerators, such as Graphics Processing Units (GPUs). While GPUs used to be seen as slave devices to which the main processor offloads computation, today's systems tend to deploy more GPUs than CPUs. Eventually, the GPU will become a first-class processor, bearing increasing responsibilities. Promoting the GPU to a first-class processor comes with many challenges, such as progress guarantees, dynamic memory management, and scheduling. However, one of the main challenges is the GPU's inability to orchestrate communication, which is currently handled entirely by the CPU. This work addresses that issue and presents solutions that allow GPUs to source and sink network traffic independently. Many important aspects are addressed, ranging from the application level down to how networking hardware is accessed. First, important large-scale exascale applications are studied to better understand their communication behavior and requirements. Several metrics are presented, including time spent on communication, message sizes, and the length of the queues that are required to match messages with receive requests. One finding of the analysis is that messages become smaller at scale, which makes the matching of messages and receive requests an important problem to address. The next part analyzes how the GPU can directly access the network, with various communication models being presented and benchmarked. It is shown that a flat address space of distributed GPU memories achieves higher bandwidth than put/get communication or CPU-controlled message passing, but allows less communication to be overlapped with computation. Overall, GPU-controlled communication is always superior, both in terms of time-to-solution and energy consumption. The final part addresses communication management on GPUs, which is required to provide high-level communication abstractions. Besides other fundamental building blocks, an algorithm for message matching is presented that yields performance similar to CPUs. However, it is also shown that the messaging protocol can be relaxed to improve performance significantly, leveraging the massive parallelism provided by the GPU's architecture.
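
    The message-matching problem mentioned above can be made concrete with a small Python sketch of the MPI-style semantics involved (our simplification, not the thesis's GPU algorithm): incoming messages are matched in order against a queue of posted receives, and unmatched messages wait in an unexpected-message queue. The queue lengths measured in the analysis correspond to how far these lists must be scanned per match.

        # MPI-style tag matching with posted-receive and unexpected-message queues.
        from collections import deque

        ANY = object()  # wildcard, in the spirit of MPI_ANY_SOURCE / MPI_ANY_TAG

        class MatchingEngine:
            def __init__(self):
                self.posted = deque()      # receive requests with no message yet
                self.unexpected = deque()  # messages with no matching receive yet

            @staticmethod
            def _matches(recv, msg):
                return (recv[0] is ANY or recv[0] == msg[0]) and \
                       (recv[1] is ANY or recv[1] == msg[1])

            def post_recv(self, source, tag):
                for i, msg in enumerate(self.unexpected):  # message arrived first?
                    if self._matches((source, tag), msg):
                        del self.unexpected[i]
                        return msg[2]
                self.posted.append((source, tag))
                return None

            def arrive(self, source, tag, payload):
                for i, recv in enumerate(self.posted):     # in-order scan
                    if self._matches(recv, (source, tag)):
                        del self.posted[i]
                        return payload
                self.unexpected.append((source, tag, payload))
                return None

        eng = MatchingEngine()
        eng.post_recv(source=0, tag=7)
        assert eng.arrive(source=0, tag=7, payload=b"data") == b"data"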

    Performance Benchmarking of State-of-the-Art Software Switches for NFV

    With the ultimate goal of replacing proprietary hardware appliances with Virtual Network Functions (VNFs) implemented in software, Network Function Virtualization (NFV) has been gaining popularity in the past few years. Software switches route traffic between VNFs and physical Network Interface Cards (NICs), so it is of paramount importance to compare the performance of different switch designs and architectures. In this paper, we propose a methodology to compare the performance of software switches fairly and comprehensively. We first explore the design spaces of seven state-of-the-art software switches and then compare their performance under four representative test scenarios. Each scenario corresponds to a specific case of routing NFV traffic between NICs and/or VNFs. In our experiments, we evaluate throughput and latency between VNFs in two of the most popular virtualization environments, namely virtual machines (VMs) and containers. Our experimental results show that no single software switch prevails in all scenarios. It is, therefore, crucial to choose the most suitable solution for the given use case. At the same time, the presented results and analysis provide a deeper insight into the design tradeoffs and identify potential performance bottlenecks that could inspire new designs.
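
    As a minimal illustration of the two metrics such a comparison rests on (our sketch, not the paper's measurement toolchain), throughput and tail latency can be derived from per-packet counts and timestamps as follows.

        # Throughput in Mpps and latency percentiles from send/receive timestamps.
        def throughput_mpps(n_packets, duration_s):
            return n_packets / duration_s / 1e6

        def latency_percentiles(send_ts, recv_ts, percentiles=(50, 99, 99.9)):
            lat = sorted(r - s for s, r in zip(send_ts, recv_ts))
            return {p: lat[min(len(lat) - 1, int(len(lat) * p / 100))]
                    for p in percentiles}

        # 64-byte frames at 10 GbE line rate correspond to about 14.88 Mpps.
        print(throughput_mpps(14_880_000, 1.0))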

    Analysis and application of hash-based similarity estimation techniques for biological sequence analysis

    In bioinformatics, a large group of problems requires the computation or estimation of sequence similarity. However, the analysis of biological sequence data faces, among many others, three major challenges: a large amount of generated data, technology-specific errors (which can be mistaken for biological signals), and the possible lack of a reference genome to analyze against. Through the use of locality-sensitive hashing methods, both efficient estimation of sequence similarity and tolerance against the errors specific to biological data can be achieved. We developed a variant of the winnowing algorithm for local minimizer computation, which is specifically geared to deal with repetitive regions within biological sequences. By compressing redundant information, we can both reduce the size of the hash tables required to store minimizer sketches and reduce the number of redundant low-quality alignment candidates. Analyzing the distribution of segment lengths generated by this approach, we can better judge the size of the required data structures and identify hash functions feasible for this technique. Our evaluation verified that simple and fast hash functions, even when using small hash value spaces (hash functions with a small codomain), are sufficient to compute compressed minimizers and perform comparably to uniformly randomly chosen hash values. We also outlined an index for a taxonomic protein database using multiple compressed winnowings to identify alignment candidates. To store MinHash values, we present a cache-optimized implementation of a hash table that uses Hopscotch hashing to resolve collisions. As a biological application of similarity-based analysis, we describe the analysis of double digest restriction site associated DNA sequencing (ddRADseq). We implemented simulation software able to model the biological and technological influences of this technology to allow better development and testing of ddRADseq analysis software. Using datasets generated by our software, as well as data obtained from population genetic experiments, we developed an analysis workflow for ddRADseq data based on the Stacks software. Since the quality of the results generated by Stacks strongly depends on how well the parameters are adapted to the specific dataset, we developed a Snakemake workflow that automates preprocessing tasks while also allowing the automatic exploration of different parameter sets. As part of this workflow, we developed a PCR deduplication approach able to generate consensus reads that incorporate the base quality values (as reported by the sequencing device), without performing an alignment first. As an outlook, we outline a MinHashing approach that can be used for faster and more robust clustering, while addressing incomplete digestion and null alleles, two effects specific to ddRADseq that current analysis tools cannot reliably detect.
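
    For readers unfamiliar with minimizers, the following Python sketch shows the plain winnowing idea the work builds on (without the repeat-aware compression developed in the thesis): slide a window of w consecutive k-mers over a sequence and keep, per window, the k-mer with the smallest hash value.

        # Plain winnowing: pick the smallest-hashing k-mer in each window of w k-mers.
        def minimizers(seq, k, w, h=hash):
            kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            picked = set()
            for start in range(len(kmers) - w + 1):
                window = range(start, start + w)
                best = min(window, key=lambda i: (h(kmers[i]), i))  # ties: leftmost
                picked.add((best, kmers[best]))
            return sorted(picked)

        # Note: Python's built-in hash is randomized per process; pass a
        # deterministic hash function for reproducible sketches.
        print(minimizers("ACGTACGTGACT", k=4, w=3))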

    Large scale parallel state space search utilizing graphics processing units and solid state disks

    The evolution of science is a double-track process composed of theoretical insights on the one hand and practical inventions on the other. While in most cases new theoretical insights motivate hardware developers to produce systems following the theory, in some cases the available hardware forces theoretical research to forecast the results to expect. Progress in computer science relies on two aspects: processing information and storing it. Improving one side without touching the other evidently creates new problems without producing a real alternative solution. While decreasing the time to solve a challenge may help with long-running problems, it will fail for problems that require much storage. Conversely, increasing the available amount of storage will eventually allow harder problems to be solved, given enough time. This work studies two recent hardware developments and utilizes them in the domain of graph search: the trend to discontinue information storage on magnetic disks in favor of electronic media, and the tendency to parallelize computation to speed up information processing. Storing information on rotating magnetic disks has been the standard for years and has reached a point where the storage capacity can be seen as infinite, since new drives can be added instantly at low cost. However, while the available storage capacity increases every year, the transfer speed does not. At the beginning of this work, solid-state media appeared on the market, slowly supplanting hard disks in speed-demanding applications. Today, as this work is being finished, solid-state drives are replacing magnetic disks in mobile computing, and computing centers use them as caching media to increase information retrieval speed. The reason is their huge advantage in random access, where the speed does not drop as significantly as with magnetic drives. While storing and retrieving huge amounts of information is one side of the coin, the other is processing speed. Here, the trend of increasing the clock frequency of single processors stagnated in 2006, and manufacturers started to combine multiple cores in one processor. While a CPU is a general-purpose processor, the manufacturers of graphics processing units (GPUs) face the challenge of performing the same computation for a large number of image points. Since parallelization offers huge advantages here, modern graphics cards have evolved into highly parallel computing devices with several hundred cores. The challenge is to utilize these processors in domains other than graphics processing. One of the most widely used tasks in computer science is search. Not only in disciplines with an obvious search component, but also in software testing, searching a graph is the crucial aspect. Strategies that make larger graphs tractable, be it by reducing the number of considered nodes or by increasing the search speed, have to be developed to meet the rising challenges. This work enhances search in multiple scientific domains, such as explicit-state Model Checking, Action Planning, Game Solving, and Probabilistic Model Checking, proposing strategies to find solutions to the respective search problems. Providing a universal search strategy that can be used in all environments to utilize solid-state media and graphics processing units is not possible due to the heterogeneous aspects of the domains.
    Thus, this work presents a tool kit of strategies tied together in a universal three-stage strategy. In the first stage, the edges leaving a node are determined; in the second stage, the algorithm follows those edges to generate successor nodes. The duplicate detection in stage three compares all newly generated nodes to existing ones and avoids multiple expansions. For each stage, at least two strategies are proposed, and decision hints are given to simplify the selection of the proper strategy. After describing the strategies, the tool kit is evaluated in four domains, explaining the choice of strategy, evaluating its outcome, and giving directions for future work.
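
    The following Python sketch illustrates the three-stage strategy on a toy implicit graph (an in-memory stand-in of ours for the external-memory and GPU variants the work develops): stage one determines the edges leaving each frontier node, stage two follows them to generate successor nodes, and stage three removes duplicates against everything seen so far.

        # Layered search with the three stages made explicit.
        def three_stage_search(start, out_edges, apply_edge, max_layers=10):
            visited = {start}
            frontier = [start]
            for _ in range(max_layers):
                edges = [(n, e) for n in frontier for e in out_edges(n)]    # stage 1
                generated = [apply_edge(n, e) for n, e in edges]            # stage 2
                frontier = [s for s in set(generated) if s not in visited]  # stage 3
                if not frontier:
                    break
                visited.update(frontier)
            return visited

        # Toy domain: states are integers; edges increment or double, capped at 100.
        reached = three_stage_search(
            1,
            out_edges=lambda n: ["inc", "dbl"] if n < 100 else [],
            apply_edge=lambda n, e: n + 1 if e == "inc" else 2 * n)
        print(len(reached))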

    Towards Network-Accelerated Databases

    In recent years, data processing systems have seen substantial changes, notably a move towards disaggregation of resources. This shift separates compute and storage resources into distinct servers for better resource utilization, as they can now be scaled independently based on demand. This development is crucial for cloud-native Database Management Systems (DBMSs), which largely build on such disaggregated structures. This thesis examines two significant hardware trends in disaggregated architectures for DBMSs: modern networks and heterogeneous computing. Modern networks such as Remote Direct Memory Access (RDMA) are critical for efficient, high-throughput, low-latency data transfer, but present challenges for achieving optimal performance in DBMSs. The reason is that RDMA comes with a low-level interface and a plethora of performance-critical aspects to consider. To address this challenge, this thesis introduces a high-level programming interface, the Data Flow Interface, specifically targeting the needs of data-intensive processing systems. In addition, this thesis highlights the emerging trend toward programmable network devices that offer data processing capabilities in the network. This trend is especially interesting for distributed DBMSs, which not only have to transfer large amounts of data over the network due to the disaggregated architecture, but whose typical distributed operations, such as joins, also have to shuffle data between compute nodes. In the thesis, in-network processing devices are evaluated with typical DBMS operations to investigate the benefits and potential shortcomings. Another trend in the data center is the increasing heterogeneity of computing units, such as GPUs and FPGAs, prized for their fast processing capabilities. Incorporating these heterogeneous devices into disaggregated architectures with fast networks has many merits: specialized compute units can be exposed as network-attached disaggregated accelerator pools and thus provide flexible and scalable high-performance data processing. This integration of heterogeneous compute units and fast RDMA-capable networks is, however, non-trivial, since networks like RDMA are typically not directly supported for devices other than CPUs and are therefore hard to integrate efficiently. The challenge of achieving efficient communication between different types of compute devices is addressed by proposing a network-driven communication scheme that leverages a programmable switch to carry out the network communication on behalf of the compute devices.
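
    To make the join-shuffle point concrete, here is a minimal Python sketch of ours (not the thesis's RDMA-based implementation): a hash-partitioned shuffle routes every tuple of both relations to the node that owns its join key, which is exactly the traffic a disaggregated architecture must carry over the network.

        # Hash-partitioned shuffle: tuples with equal join keys land on the same node.
        def shuffle(relation, key_idx, n_nodes):
            parts = [[] for _ in range(n_nodes)]
            for tup in relation:
                parts[hash(tup[key_idx]) % n_nodes].append(tup)
            return parts

        R = [(1, "a"), (2, "b"), (3, "c")]
        S = [(1, "x"), (3, "y")]
        r_parts, s_parts = shuffle(R, 0, 2), shuffle(S, 0, 2)
        for node in range(2):  # each node joins only its local partitions
            print(node, [(r, s) for r in r_parts[node]
                         for s in s_parts[node] if r[0] == s[0]])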

    Collaborative autonomy in heterogeneous multi-robot systems

    As autonomous mobile robots become increasingly connected and widely deployed in different domains, managing multiple robots and their interaction is key to the future of ubiquitous autonomous systems. Indeed, robots are no longer individual entities; instead, many robots today are deployed as part of larger fleets or in teams. The benefits of multi-robot collaboration, especially in heterogeneous groups, are manifold. Significantly higher degrees of situational awareness and understanding of the environment can be achieved when robots with different operational capabilities are deployed together. Examples of this include the Perseverance rover and the Ingenuity helicopter that NASA has deployed on Mars, or the highly heterogeneous robot teams that explored caves and other complex environments during the last DARPA Sub-T competition. This thesis delves into the wide topic of collaborative autonomy in multi-robot systems, encompassing some of the key elements required for achieving robust collaboration: solving collaborative decision-making problems; securing the robots' operation, management, and interaction; providing means for autonomous coordination in space and accurate global or relative state estimation; and achieving collaborative situational awareness through distributed perception and cooperative planning. The thesis covers novel formation control algorithms and new ways to achieve accurate absolute or relative localization within multi-robot systems. It also explores the potential of distributed ledger technologies as an underlying framework for collaborative decision-making in distributed robotic systems. Throughout the thesis, I introduce novel approaches to utilizing cryptographic elements and blockchain technology for securing the operation of autonomous robots, showing that sensor data and mission instructions can be validated in an end-to-end manner. I then shift the focus to localization and coordination, studying ultra-wideband (UWB) radios and their potential. I show how UWB-based ranging and localization can enable aerial robots to operate in GNSS-denied environments, with a study of the constraints and limitations. I also study the potential of UWB-based relative localization between aerial and ground robots for more accurate positioning in areas where GNSS signals degrade. In terms of coordination, I introduce two new algorithms for formation control that require zero to minimal communication, provided a sufficient degree of awareness of neighboring robots is available. These algorithms are validated in simulation and real-world experiments. The thesis concludes with the integration of a new approach to cooperative path planning with UWB-based relative localization for dense scene reconstruction using lidar and vision sensors on ground and aerial robots.
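
    As a hedged illustration of the UWB-based localization the thesis relies on (our sketch, not the thesis's estimator), a 2D position can be recovered from ranges to fixed anchors by minimizing the squared range residuals with a few Gauss-Newton steps.

        # Multilateration: estimate (x, y) from ranges to known anchor positions.
        import math

        def multilaterate(anchors, ranges, guess=(0.0, 0.0), iters=20):
            x, y = guess
            for _ in range(iters):
                # Accumulate the 2x2 normal equations J^T J d = -J^T r,
                # with residuals r_i = dist_i - range_i.
                a11 = a12 = a22 = b1 = b2 = 0.0
                for (ax, ay), r in zip(anchors, ranges):
                    d = math.hypot(x - ax, y - ay) or 1e-9
                    jx, jy = (x - ax) / d, (y - ay) / d
                    res = d - r
                    a11 += jx * jx; a12 += jx * jy; a22 += jy * jy
                    b1 += jx * res; b2 += jy * res
                det = a11 * a22 - a12 * a12 or 1e-12
                x -= (a22 * b1 - a12 * b2) / det
                y -= (a11 * b2 - a12 * b1) / det
            return x, y

        anchors = [(0, 0), (10, 0), (0, 10)]
        truth = (3.0, 4.0)
        ranges = [math.hypot(truth[0] - ax, truth[1] - ay) for ax, ay in anchors]
        print(multilaterate(anchors, ranges, guess=(1.0, 1.0)))  # ~(3.0, 4.0)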