37 research outputs found

    HLG: A framework for computing graphs in Residue Number System and its application in Fully Homomorphic Encryption

    Implementation of Fully Homomorphic Encryption (FHE) is challenging, especially when hardware acceleration is considered, where the major performance bottleneck is data transfer. Here we propose an algebraic framework called Heterogeneous Lattice Graph (HLG) to build and process computing graphs in the Residue Number System (RNS), which is the basis of high-performance implementations of mainstream FHE algorithms. The HLG framework has three main design goals:
    • Design a dedicated IR (HLG IR) for the RNS system, in which the splitting and combination of data placeholders has practical implications in an algebraic sense. Existing IRs cannot support these operations efficiently.
    • Lower the technical barrier for both crypto researchers and hardware engineers by decoupling front-end cryptographic algorithms from back-end hardware platforms. Algorithms and solutions built on the HLG framework can be written once and run everywhere; researchers and engineers do not need to understand each other's domain.
    • Reduce the cost of data transfer between the CPU and GPU/FPGA/dedicated hardware by providing an intermediate representation (IR) of the computing graph to the hardware compute engine, which allows task scheduling without help from the CPU.
    We have implemented the CKKS algorithm on the HLG framework, together with a compute engine for multiple CPU cores. Experiments show that we outperform the SEAL v3 library in several use cases in multi-threading scenarios.
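
    As a rough, hedged illustration of the RNS arithmetic that HLG targets, the sketch below splits an integer into residues over pairwise-coprime moduli and recombines them via the Chinese Remainder Theorem; the tiny moduli and helper names are ours, not part of the HLG framework.

        # Minimal sketch of Residue Number System (RNS) splitting/combination.
        # The tiny moduli are illustrative; real FHE implementations use large
        # NTT-friendly primes. Names are ours, not part of HLG.
        from math import prod

        MODULI = [13, 17, 19]  # pairwise-coprime basis

        def rns_split(x: int) -> list[int]:
            """Represent x by its residues modulo each basis element."""
            return [x % q for q in MODULI]

        def rns_combine(residues: list[int]) -> int:
            """Recombine residues via the Chinese Remainder Theorem."""
            Q = prod(MODULI)
            x = 0
            for r, q in zip(residues, MODULI):
                Qi = Q // q
                x += r * Qi * pow(Qi, -1, q)  # modular inverse of Qi mod q
            return x % Q

        # Addition and multiplication act component-wise on the residues,
        # which is what makes RNS attractive for parallel hardware:
        a, b = rns_split(123), rns_split(456)
        c = [(x + y) % q for x, y, q in zip(a, b, MODULI)]
        assert rns_combine(c) == 123 + 456

    Splitting and combining residue channels like this is the kind of operation the abstract says HLG IR models as a first-class algebraic step.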

    RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

    The original RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in the 1990s, the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data, obviating the need for further redundancy. Variations of RAID based on local recovery codes and partial MDS codes reduce recovery cost. NAND flash Solid State Disks (SSDs) have lower latency and higher bandwidth, are more reliable, consume less power, and have a lower TCO than Hard Disk Drives, making them more viable for hyperscalers.
    Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876
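
    To make the redundancy discussion concrete, here is a minimal sketch (ours, not the tutorial's) of the single-parity scheme underlying RAID 5: parity is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors.

        # RAID 5-style single parity: the parity block is the XOR of the data
        # blocks, so any single lost block can be reconstructed.
        from functools import reduce

        def xor_blocks(blocks: list[bytes]) -> bytes:
            return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

        data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks, three disks
        parity = xor_blocks(data)            # stored on a fourth disk

        # Disk holding data[1] fails: rebuild from the survivors plus parity.
        assert xor_blocks([data[0], data[2], parity]) == data[1]

    Local recovery codes and partial MDS codes generalize this idea so that a failed block can be rebuilt by reading fewer surviving blocks, which is the recovery-cost reduction the abstract refers to.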

    Improving Application Performance in the Emerging Hyper-converged Infrastructure

    University of Minnesota Ph.D. dissertation, April 2019. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); viii, 118 pages.

    In today's world, the hyper-converged infrastructure is emerging as a new type of infrastructure. In a hyper-converged infrastructure, service providers deploy compute, network, and storage services on inexpensive hardware rather than expensive proprietary hardware. This allows providers to customize the services they offer by deploying applications in Virtual Machines (VMs) or containers, and it gives them control over all resources, including compute, network, and storage. In this setting, improving application performance is an important issue. Throughout my Ph.D. research, I have studied how to improve the performance of applications in the emerging hyper-converged infrastructure, focusing on applications in VMs and in containers when accessing data, and on applications in the networked storage environment.

    In the hyper-converged infrastructure, administrators can provide desktop services by deploying a Virtual Desktop Infrastructure (VDI) application based on VMs. We first investigate how to identify the storage requirements of VDI and how to meet them with minimal storage resources. We create a model that describes the behavior of VDI and collect real VDI traces to populate it. The model allows us to identify the storage requirements of VDI and determine the potential bottlenecks in storage. Based on this information, we can tell what capacity and minimum capability a storage system needs in order to support and satisfy a given VDI configuration. We show that our model captures more fine-grained storage requirements of VDI than the rules of thumb currently used in industry.

    In the hyper-converged infrastructure, more and more applications run in containers. We design and implement a system, called k8sES (k8s Enhanced Storage), that efficiently supports applications with various storage SLOs (Service Level Objectives), along with all their other requirements, in the Kubernetes environment. Kubernetes (k8s) is a system for managing containerized applications across multiple hosts, but its current storage support for containerized applications is limited: to satisfy users' SLOs, administrators must manually configure storage in advance, and users must know the configurations and capabilities of the different types of storage provided. In k8sES, storage resources are dynamically allocated based on users' requirements. Given users' SLOs, k8sES selects a node and storage that can meet those requirements when scheduling applications (see the sketch below). The storage allocation mechanism in k8sES also improves storage utilization efficiency. In addition, we provide a tool to monitor the I/O activities of both applications and storage devices in Kubernetes.

    With the ability to control client, network, and storage in a hyper-converged setting, we study how to coordinate the different components along the I/O path to ensure latency SLOs for applications in the networked storage environment. We propose and implement JoiNS, a system that aims to ensure latency SLOs for applications that access data on remote networked storage. JoiNS carefully considers all the components along the I/O path and controls them in a coordinated fashion. It has global network and storage visibility through a logically centralized controller that continuously monitors the status of each involved component, and it adjusts the priority of I/Os in each component based on the latency SLO, network and storage status, time estimation, and the characteristics of each I/O request.
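
    The abstract does not spell out k8sES's selection algorithm; the following is a hedged sketch of the general idea only, with hypothetical names and fields: filter candidate storage by whether its measured capability meets the application's SLO, then avoid over-provisioning by picking the least capable feasible option.

        # Hypothetical sketch of SLO-aware storage selection in the spirit of
        # k8sES; field names and the scoring rule are ours, not the system's.
        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class StorageOption:
            node: str
            device: str
            iops: int            # measured capability
            latency_ms: float

        def select_storage(options: list[StorageOption], min_iops: int,
                           max_latency_ms: float) -> Optional[StorageOption]:
            feasible = [o for o in options
                        if o.iops >= min_iops and o.latency_ms <= max_latency_ms]
            # Least capable feasible device: fast storage stays free for
            # applications with stricter SLOs, improving utilization.
            return min(feasible, key=lambda o: o.iops, default=None)

        options = [StorageOption("node-1", "hdd0", 200, 8.0),
                   StorageOption("node-2", "ssd0", 50_000, 0.3)]
        print(select_storage(options, min_iops=1_000, max_latency_ms=1.0))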

    SimuBoost: Scalable Parallelization of Functional System Simulation

    In operating-systems and security research, functional system simulation is frequently used to collect detailed runtime information such as memory access patterns. The simulator executes the workload under investigation in a virtual machine (VM) by interpreting instructions step by step, or by translating them so that they operate on the state of the VM. This process permits extensive instrumentation and thus yields information about runtime behavior that is not accessible on a physical machine. Although functional system simulation is considered a powerful tool, the immense slowdown caused by interpretation or translation is a substantial limitation of the approach. Compared to native execution, we measure a 30x slowdown for QEMU, and recording memory accesses doubles this factor. With simulators that offer richer instrumentation capabilities than QEMU, the slowdown can be an order of magnitude higher. This makes functional simulation unattractive for long-running, networked, or interactive workloads. Moreover, the slowdown produces unrealistic timing behavior as soon as activities outside the VM (e.g., I/O) are involved.

    In this work we present SimuBoost, a method for drastically accelerating functional system simulation. SimuBoost first executes the workload under investigation in a fast hardware-assisted virtual machine, which allows full interactivity with users and network devices. During execution, SimuBoost periodically takes checkpoints of the VM. These serve as starting points for a parallel simulation in which each interval is simulated and analyzed independently. Heterogeneous deterministic replay guarantees that this phase exactly reproduces the preceding hardware-assisted execution of each interval, including interactions and realistic timing behavior.

    Our prototype is able to reduce the runtime of a functional system simulation considerably. While conventional approaches need over 5 hours to simulate the build process of a modern Linux, SimuBoost completes the simulation in only 15 minutes, merely 16% more time than the build takes in a fast hardware-assisted VM. SimuBoost maintains this speed even with full instrumentation for recording memory accesses.

    This thesis is the first project to apply the concept of partitioning and parallelizing execution time to interactive system virtualization in a way that permits immediate parallel functional simulation. We complement the practical implementation with a mathematical model that formally describes the speedup characteristics. This makes it possible to predict the expected parallel simulation time for a given scenario and provides guidance for choosing the optimal interval length. In contrast to previous work, SimuBoost places a strong focus on scalability beyond the boundaries of a single physical system. A central key to this is the use of modern checkpointing technologies. In this thesis we present two novel methods for the efficient and effective compression of periodic system checkpoints.
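
    The thesis's formal model is not reproduced in the abstract, but a hedged back-of-the-envelope version conveys the trade-off it captures: with enough workers, the parallel simulation finishes roughly when the recording run ends plus the time to simulate the final interval, while checkpointing adds a per-interval cost. All parameters below are illustrative.

        # Hedged sketch of a SimuBoost-style runtime estimate; the thesis's
        # actual model is more detailed.
        def parallel_sim_time(t_native_s: float, interval_s: float,
                              slowdown: float, ckpt_cost_s: float) -> float:
            n_intervals = t_native_s / interval_s
            recording = t_native_s + n_intervals * ckpt_cost_s
            trailing_sim = slowdown * interval_s  # simulating the last interval
            return recording + trailing_sim

        # 1 h native build, 2 s intervals, 30x simulation slowdown, 50 ms per
        # checkpoint: shorter intervals shrink the trailing simulation but
        # inflate checkpointing overhead, hence an optimal interval length.
        print(parallel_sim_time(3600, 2.0, 30.0, 0.05))  # ~3750 s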

    LocaRDS: A Localization Reference Data Set

    The use of wireless signals for localization enables a host of applications relating to determining and verifying the positions of network participants, ranging from radar to satellite navigation. Consequently, localization has been a longstanding interest of theoretical and practical research in mobile networks, and many solutions have been proposed in the scientific literature. However, it is hard to assess their performance in the real world and, more importantly, to compare their advantages and disadvantages in a controlled scientific manner. With this work, we attempt to improve the state-of-the-art methodology in localization research and to place it on a solid scientific grounding for future investigations. Concretely, we developed LocaRDS, an open reference data set of real-world crowdsourced flight data featuring more than 222 million measurements from over 50 million transmissions recorded by 323 sensors. We demonstrate how the quality of LocaRDS measurements can be verified so that the data set can be used to test, analyze, and directly compare different localization methods. Finally, we provide an example implementation for the aircraft localization problem and a discussion of possible metrics for use with LocaRDS.
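
    As a hedged illustration of the aircraft localization problem (one standard formulation, not necessarily LocaRDS's example implementation), TDoA multilateration solves for the position whose pairwise range differences to the sensors best match the measured differences in arrival times.

        # TDoA multilateration sketch: estimate an emitter position from
        # arrival times at known sensor positions. Synthetic data; one common
        # formulation, not LocaRDS's reference implementation.
        import numpy as np
        from scipy.optimize import least_squares

        C = 299_792_458.0  # speed of light, m/s

        def locate(sensors: np.ndarray, toas: np.ndarray) -> np.ndarray:
            """sensors: (n, 3) positions in m; toas: (n,) arrival times in s."""
            def residuals(p):
                d = np.linalg.norm(sensors - p, axis=1)
                # Range differences vs. sensor 0 against measured TDoAs.
                return (d[1:] - d[0]) - C * (toas[1:] - toas[0])
            return least_squares(residuals, x0=sensors.mean(axis=0)).x

        sensors = np.array([[0., 0., 0.], [40e3, 0., 0.], [0., 40e3, 0.],
                            [40e3, 40e3, 0.], [0., 0., 10e3]])
        truth = np.array([12e3, 9e3, 9.5e3])
        toas = np.linalg.norm(sensors - truth, axis=1) / C
        print(locate(sensors, toas))  # ~ [12000, 9000, 9500]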

    Automating Security Risk and Requirements Management for Cyber-Physical Systems

    Cyber-Physical Systems enable various modern use cases and business models such as connected vehicles, the Smart (power) Grid, or the Industrial Internet of Things. Their key characteristics, complexity, heterogeneity, and longevity make the long-term protection of these systems a demanding but indispensable task. In the physical world, the laws of physics provide a constant scope for risks and their treatment. In cyberspace, on the other hand, there is no such constant to counteract the erosion of security features. As a result, existing security risks can constantly change and new ones can arise. To prevent damage caused by malicious acts, it is necessary to identify high and unknown risks early and counter them appropriately. Considering the numerous dynamic security-relevant factors requires a new level of automation in the management of security risks and requirements, one that goes beyond the current state of the art. Only in this way can an appropriate, comprehensive, and consistent level of security be achieved in the long term.

    This work addresses the pressing lack of an automation methodology for security risk assessment as well as for the generation and management of security requirements for Cyber-Physical Systems. The presented framework comprises three components: (1) a model-based security risk assessment methodology, (2) methods to unify, deduce, and manage security requirements, and (3) a set of tools and procedures to detect and respond to security-relevant situations. The need for protection and the appropriate rigor are determined and evaluated by the security risk assessment using graphs and security-specific modeling. Based on the model and the assessed risks, well-founded security requirements for protecting the overall system and its functionality are systematically derived and formulated in a uniform, machine-readable structure. This machine-readable structure makes it possible to propagate security requirements automatically along the supply chain. It also enables the efficient reconciliation of present capabilities with external security requirements from regulations, processes, and business partners. Despite all measures taken, a residual risk of compromise always remains and must be met with an appropriate response. This residual risk is addressed by tools and processes that improve both the local and the large-scale detection, classification, and correlation of incidents. Integrating the findings from such incidents into the model often leads to updated assessments and new requirements, and improves further analyses. Finally, the presented framework is demonstrated on a recent application example from the automotive domain.
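
    The thesis's concrete schema is not given in the abstract; as a hedged sketch, a uniform, machine-readable security requirement might carry fields like the hypothetical ones below, enough for tooling to propagate it along the supply chain and match it against declared capabilities.

        # Hypothetical example of a machine-readable security requirement in
        # the spirit of framework component (2); the schema is ours, not the
        # thesis's actual structure.
        import json

        requirement = {
            "id": "REQ-0042",
            "target": "telematics-unit/firmware-update",  # protected function
            "property": "integrity",                      # security objective
            "rigor": "high",                              # from assessed risk
            "derived_from": ["RISK-0017"],                # traceability link
            "controls": ["signed-updates", "secure-boot"],
            "source": "internal-risk-assessment",         # vs. regulation/partner
        }

        # A structured form can be diffed automatically against a supplier's
        # declared capabilities or against external regulatory requirements.
        print(json.dumps(requirement, indent=2))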

    Improving Storage with Stackable Extensions

    Storage is a central part of computing. Driven by an exponentially increasing content generation rate and a widening performance gap between memory and secondary storage, researchers are in a perennial quest for further innovation. This has resulted in novel ways to “squeeze” more capacity and performance out of current and emerging storage technology. Adding intelligence and leveraging new types of storage devices has opened the door to a whole new class of optimizations to save cost, improve performance, and reduce energy consumption.

    In this dissertation, we first develop, analyze, and evaluate three storage extensions. Our first extension tracks application access patterns and writes data the way individual applications most commonly access it, to benefit from the sequential throughput of disks. Our second extension uses a lower-power flash device as a cache to save energy and turn off the disk during idle periods. Our third extension leverages the characteristics of both disks and solid state devices by placing data on the most appropriate device to improve performance and save power.

    In developing these systems, we learned that extending the storage stack is a complex process. Implementing new ideas incurs a prolonged and cumbersome development process and requires developers to have advanced knowledge of the entire system to ensure that extensions accomplish their goal without compromising data recoverability. Furthermore, storage administrators are often reluctant to deploy specific storage extensions without understanding how they interact with other extensions and whether an extension ultimately achieves its intended goal. We address these challenges with a combination of approaches. First, we simplify the storage extension development process with system-level infrastructure that implements core functionality commonly needed for storage extension development. Second, we develop a formal theory to help administrators deploy storage extensions while guaranteeing that the given high-level goals are satisfied. There are, however, some cases for which our theory is inconclusive. For such scenarios we present an experimental methodology that allows administrators to pick the extension that performs best for a given workload. Our evaluation demonstrates the benefits of both the infrastructure and the formal theory.
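
    As a hedged sketch of the stacking idea (ours, not the dissertation's actual infrastructure), each extension wraps the layer below it behind one common read/write interface, so extensions compose without knowledge of one another:

        # Stackable storage extensions: every layer exposes the same
        # read/write interface and wraps the layer below it. Illustrative
        # only; not the dissertation's actual infrastructure.
        class RamDisk:
            def __init__(self):
                self.blocks: dict[int, bytes] = {}
            def read(self, lba: int) -> bytes:
                return self.blocks.get(lba, b"\x00" * 512)
            def write(self, lba: int, data: bytes) -> None:
                self.blocks[lba] = data

        class CacheExtension:
            """Serves repeated reads from memory (cf. the flash-cache extension)."""
            def __init__(self, lower):
                self.lower, self.cache = lower, {}
            def read(self, lba: int) -> bytes:
                if lba not in self.cache:
                    self.cache[lba] = self.lower.read(lba)
                return self.cache[lba]
            def write(self, lba: int, data: bytes) -> None:
                self.cache[lba] = data
                self.lower.write(lba, data)

        stack = CacheExtension(RamDisk())   # extensions compose by wrapping
        stack.write(7, b"x" * 512)
        assert stack.read(7) == b"x" * 512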

    JetStream: Enabling high throughput live event streaming on multi-site clouds

    Scientific and commercial applications nowadays operate across tens of cloud datacenters around the globe, following similar patterns: they aggregate monitoring or sensor data, assess QoS, or run global data-mining queries based on inter-site event stream processing. Enabling fast data transfers across geographically distributed sites allows such applications to manage continuous streams of events in real time and react quickly to changes. However, traditional event processing engines often treat data resources as second-class citizens and support access to data only as a side effect of computation (i.e., they are not concerned with the transfer of events from their source to the processing site). This approach is efficient as long as the processing is executed in a single cluster whose nodes are interconnected by low-latency networks. In a distributed environment consisting of multiple datacenters with orders-of-magnitude differences in capabilities, connected by a WAN, it leads to significant latency and performance variations. This is precisely the challenge we address in this paper by proposing JetStream, a high-performance batch-based streaming middleware for efficient transfers of events between cloud datacenters. JetStream self-adapts to the streaming conditions by modeling and monitoring a set of context parameters. It further aggregates the available bandwidth by enabling multi-route streaming across cloud sites, while at the same time optimizing resource utilization and increasing cost efficiency. The prototype was validated on tens of nodes from US and European datacenters of the Windows Azure cloud with synthetic benchmarks and a real-life application monitoring the ALICE experiment at CERN. The results show a 3x increase in transfer rate with adaptive multi-route streaming compared to state-of-the-art solutions.
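
    JetStream's adaptation model is not detailed in the abstract; the hedged sketch below only illustrates why batch-based streaming helps: each transfer pays a fixed per-batch overhead, so larger batches raise throughput, while the latency budget caps how long events may be buffered. All parameter names and values are illustrative.

        # Hedged sketch of batch-based event streaming in the spirit of
        # JetStream; the model and parameter names are ours.
        def batch_throughput(n: int, event_bytes: int, overhead_s: float,
                             bandwidth_bps: float) -> float:
            """Events per second when shipping batches of n events."""
            transfer_s = overhead_s + n * event_bytes * 8 / bandwidth_bps
            return n / transfer_s

        def best_batch(event_rate_hz: float, max_delay_s: float,
                       event_bytes: int, overhead_s: float,
                       bandwidth_bps: float) -> int:
            # Buffering n events at rate r delays the first event by ~n / r,
            # so the latency budget caps n; pick the best n under that cap.
            n_max = max(1, int(event_rate_hz * max_delay_s))
            return max(range(1, n_max + 1),
                       key=lambda n: batch_throughput(n, event_bytes,
                                                      overhead_s, bandwidth_bps))

        # 1 KB events at 5000 ev/s, 30 ms per-batch overhead, 100 Mbps link.
        n = best_batch(5000, 0.1, 1024, 0.03, 100e6)
        print(n, batch_throughput(n, 1024, 0.03, 100e6))  # 500, ~7000 ev/s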