
    Efficient Oblivious Sorting and Shuffling for Hardware Enclaves

    Oblivious sorting is arguably the most important building block in the design of efficient oblivious algorithms. We propose new oblivious sorting algorithms for hardware enclaves. Our algorithms achieve asymptotic optimality in terms of both computational overhead and the number of page swaps the enclave has to make to fetch data from insecure memory or disk. We also aim to minimize the concrete constants inside the big-O; one of our algorithms achieves a page-swap bound that is tight up to the constant. We have implemented our algorithms and released them as open source. Compared with (an unoptimized version of) bitonic sort, which is asymptotically non-optimal but the de facto algorithm used in practice, we achieve a speedup of 2000× for 12 GB inputs.
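    For context, a minimal sketch of bitonic sort, the baseline the authors compare against. Its compare-exchange schedule depends only on the input length (assumed here to be a power of two), never on the data values, which is what makes it data-oblivious; the implementation below is ours, not the authors'.

```python
# Data-oblivious bitonic sort: the sequence of compare-exchange
# operations is fixed by n alone, so the memory access pattern
# leaks nothing about the data -- the property enclaves need.

def bitonic_sort(a):
    """Sort `a` in place; len(a) must be a power of two."""
    n = len(a)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                    # size of bitonic sequences being merged
        j = k // 2
        while j > 0:                 # compare-exchange distance this round
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2

data = [7, 3, 9, 1, 6, 2, 8, 4]
bitonic_sort(data)
print(data)  # [1, 2, 3, 4, 6, 7, 8, 9]
```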

    An Associativity Threshold Phenomenon in Set-Associative Caches

    In an α-way set-associative cache, the cache is partitioned into disjoint sets of size α, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms of latency or concurrency, over fully associative caches, but they often incur more cache misses. As the set size α decreases, the benefits increase, but the paging costs worsen. In this paper we characterize the performance of an α-way set-associative LRU cache of total size k, as a function of α = α(k). We prove the following, assuming that sets are selected using a fully random hash function:
    - For α = ω(log k), the paging cost of an α-way set-associative LRU cache is within additive O(1) of that of a fully-associative LRU cache of size (1 - o(1))k, with probability 1 - 1/poly(k), for all request sequences of length poly(k).
    - For α = o(log k), and for all c = O(1) and r = O(1), the paging cost of an α-way set-associative LRU cache is not within a factor c of that of a fully-associative LRU cache of size k/r, for some request sequence of length O(k^1.01).
    - For α = ω(log k), if the hash function can be occasionally changed, the paging cost of an α-way set-associative LRU cache is within a factor 1 + o(1) of that of a fully-associative LRU cache of size (1 - o(1))k, with probability 1 - 1/poly(k), for request sequences of arbitrary (e.g., super-polynomial) length.
    Some of our results generalize to other paging algorithms besides LRU, such as least-frequently used (LFU).
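    As a concrete reference for the object being analyzed, here is a minimal model of an α-way set-associative LRU cache; Python's built-in hash stands in for the fully random hash function assumed above, and all names and parameters are illustrative.

```python
# α-way set-associative LRU: k slots split into k/α sets of size α;
# a key hashes to one set and competes only within that set.

from collections import OrderedDict

class SetAssociativeLRU:
    def __init__(self, k, alpha):
        assert k % alpha == 0
        self.alpha = alpha
        self.num_sets = k // alpha
        self.sets = [OrderedDict() for _ in range(self.num_sets)]  # LRU order per set
        self.misses = 0

    def access(self, key):
        s = self.sets[hash(key) % self.num_sets]  # stand-in for a random hash
        if key in s:
            s.move_to_end(key)         # hit: mark most recently used
        else:
            self.misses += 1
            if len(s) >= self.alpha:
                s.popitem(last=False)  # evict this set's LRU item
            s[key] = True

cache = SetAssociativeLRU(k=1024, alpha=16)
for key in [1, 2, 3, 1, 2, 3]:
    cache.access(key)
print(cache.misses)  # 3 cold misses, then 3 hits
```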

    Air Traffic Management Abbreviation Compendium

    As in all fields of work, an unmanageable number of abbreviations are used today in aviation for terms, definitions, commands, standards and technical descriptions. This applies in general to the areas of aeronautical communication, navigation and surveillance, cockpit and air traffic control working positions, passenger and cargo transport, and all other areas of flight planning, organization and guidance. In addition, many abbreviations are used more than once or have different meanings in different languages. In order to provide an overview of the most common abbreviations used in air traffic management, organizations such as EUROCONTROL, FAA, DWD and DLR have published lists of abbreviations in the past, which have also been included in this document. In addition, abbreviations from some larger international projects related to aviation have been included to give users a directory that is as complete as possible. The second edition of the Air Traffic Management Abbreviation Compendium thus now includes around 16,500 abbreviations and acronyms from the field of aviation.

    TACKLING PERFORMANCE AND SECURITY ISSUES FOR CLOUD STORAGE SYSTEMS

    Building data-intensive applications and emerging computing paradigms (e.g., Machine Learning (ML), Artificial Intelligence (AI), and the Internet of Things (IoT)) in cloud computing environments is becoming the norm, given the many advantages in scalability, reliability, security and performance. However, under rapid changes in applications, system middleware and underlying storage devices, service providers face new challenges in delivering performance and security isolation in the context of resources shared among multiple tenants. The gap between the decades-old storage abstraction and modern storage devices keeps widening, calling for software/hardware co-designs to arrive at more effective performance and security protocols. This dissertation rethinks the storage subsystem from the device level to the system level and proposes new designs at different levels to tackle performance and security issues in cloud storage systems.

    In the first part, we present an event-based SSD (Solid State Drive) simulator that models modern protocols, firmware and storage backends in detail. The proposed simulator can capture the nuances of SSD internal state under various I/O workloads, which helps researchers understand the impact of various SSD designs and workload characteristics on end-to-end performance.

    In the second part, we study the security challenges of shared in-storage computing infrastructures. Many cloud providers offer isolation at multiple levels to secure data and instances; however, security measures in emerging in-storage computing infrastructures have not been studied. We first investigate the attacks that could be conducted by offloaded in-storage programs in a multi-tenant cloud environment. To defend against these attacks, we build a lightweight Trusted Execution Environment, IceClave, to enable security isolation between in-storage programs and internal flash management functions. We show that while enforcing security isolation in the SSD controller with minimal hardware cost, IceClave retains the performance benefit of in-storage computing by delivering up to 2.4x better performance than the conventional host-based trusted computing approach.

    In the third part, we investigate the performance interference problem caused by other tenants' I/O flows. We demonstrate that I/O resource sharing can often lead to performance degradation and instability, as the block device abstraction fails to expose SSD parallelism or pass down application requirements. To this end, we propose a software/hardware co-design that enforces performance isolation by bridging this semantic gap. Our design significantly improves QoS (Quality of Service) by reducing throughput penalties and tail latency spikes.

    Lastly, we explore more effective I/O control to address contention in the storage software stack. We illustrate that the state-of-the-art resource control mechanism, Linux cgroups, is insufficient for controlling I/O resources: inappropriate cgroup configurations may even hurt the performance of co-located workloads under memory-intensive scenarios. We add kernel support for limiting page cache usage per cgroup and achieving I/O proportionality.
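    As context for the last part, a minimal sketch of expressing per-cgroup I/O limits through the Linux cgroup v2 io.max interface that the dissertation's analysis starts from. The cgroup name and device numbers are assumptions for illustration; writing these files requires root and the io controller enabled in the parent's cgroup.subtree_control.

```python
# Throttle a tenant's block I/O with cgroup v2 io.max.
# Line format: "<major>:<minor> rbps=N wbps=N riops=N wiops=N" ("max" = unlimited).

from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/tenant-a")  # hypothetical cgroup
DEVICE = "259:0"                          # major:minor of the target disk (assumed)

def limit_io(rbps="max", wbps="max", riops="max", wiops="max"):
    CGROUP.mkdir(exist_ok=True)
    line = f"{DEVICE} rbps={rbps} wbps={wbps} riops={riops} wiops={wiops}"
    (CGROUP / "io.max").write_text(line)

def add_process(pid):
    (CGROUP / "cgroup.procs").write_text(str(pid))  # move the tenant's process in

limit_io(wbps=64 * 1024 * 1024, wiops=2000)  # cap writes at 64 MiB/s and 2000 IOPS
```

    Note that such limits govern block-device traffic; as the dissertation points out, stock kernels do not bound page cache usage per cgroup in this way, which is exactly the gap the added kernel support targets.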

    Hyperscale Data Processing With Network-Centric Designs

    Today’s largest data processing workloads are hosted in cloud data centers. Due to unprecedented data growth and the end of Moore’s Law, these workloads have ballooned to the hyperscale level, encompassing billions to trillions of data items and hundreds to thousands of machines per query. These workloads are enabled by, and grow with, highly scalable data center networks that connect up to hundreds of thousands of networked servers. These massive scales fundamentally challenge the designs of both data processing systems and data center networks, and the classic layered designs are no longer sustainable. Rather than optimize these massive layers in silos, we build systems across them with principled network-centric designs. In current networks, we redesign data processing systems with network awareness to minimize the cost of moving data in the network. In future networks, we propose new interfaces and services for the cloud infrastructure to offer to applications, and we co-design data processing systems to achieve optimal query processing performance. To transform the network toward these future designs, we facilitate network innovation at scale. This dissertation presents a line of systems work covering all three directions. It first discusses GraphRex, a network-aware system that combines classic database and systems techniques to push the performance of massive graph queries in current data centers. It then introduces data processing in disaggregated data centers, a promising new cloud proposal, detailing TELEPORT, a compute pushdown feature that eliminates data processing performance bottlenecks in disaggregated data centers, and Redy, which provides high-performance caches using remote disaggregated memory. Finally, it presents MimicNet, a fine-grained simulation framework that evaluates network proposals at data center scale with machine learning approximation. These systems demonstrate that our network-centric designs achieve orders-of-magnitude higher efficiency than the state of the art at hyperscale.

    Mobile Ad-Hoc Networks

    Being infrastructure-less and without central administration control, wireless ad-hoc networking is playing an ever more important role in extending the coverage of traditional wireless infrastructure (cellular networks, wireless LAN, etc.). This book includes state-of-the-art techniques and solutions for wireless ad-hoc networks. It focuses on the following topics in ad-hoc networks: vehicular ad-hoc networks, security and caching, TCP in ad-hoc networks, and emerging applications. It aims to provide network engineers and researchers with design guidelines for large-scale wireless ad-hoc networks.

    Embedded System Design

    A unique feature of this open access textbook is that it provides a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of Things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of Things (IoT), the evolution from single-core to multi-core processors, and the increased importance of energy efficiency and thermal issues.

    Resource Management in Multi-Access Edge Computing (MEC)

    This PhD thesis investigates effective ways of managing the resources of a Multi-Access Edge Computing (MEC) platform in 5th Generation Mobile Communication (5G) networks. The main characteristics of MEC include its distributed nature, proximity to users, and high availability; based on these key features, solutions have been proposed for effective resource management. Two aspects of resource management in MEC are addressed: the computational resources and the caching resources, which correspond to the services provided by the MEC. MEC is a new 5G enabling technology proposed to reduce latency by bringing cloud computing capability closer to end-user Internet of Things (IoT) and mobile devices. MEC is expected to support latency-critical user applications such as driverless cars and e-health, which will depend on the resources and services it provides. However, MEC has limited computational and storage resources compared to the cloud, so it is important to ensure reliable MEC network communication during resource provisioning by eliminating the chance of deadlock. Deadlock may occur when a huge number of devices contend for a limited amount of resources if adequate measures are not put in place, and eradicating it while scheduling and provisioning resources on MEC is crucial to achieving the highly reliable and readily available system that latency-critical applications demand.

    In this research, a deadlock avoidance resource provisioning algorithm is proposed for industrial IoT devices using MEC platforms to ensure higher reliability of network interactions. The proposed scheme incorporates Banker's resource-request algorithm using Software Defined Networking (SDN) to reduce communication overhead. Simulation and experimental results show that system deadlock can be prevented by applying the proposed algorithm, which ultimately leads to more reliable network interactions between mobile stations and MEC platforms; a sketch of the underlying safety check follows.
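    For reference, here is a minimal sketch of the classical safety check at the core of Banker's algorithm, which grants a request only if some completion order still lets every station finish. The stations, resource counts and numbers below are illustrative, not taken from the thesis.

```python
# Banker's safety check: simulate granting resources in some order in
# which each station's remaining need fits in what is currently free.

def is_safe(available, allocation, maximum):
    """available: free units per resource type; allocation/maximum: one row per station."""
    need = [[m - a for m, a in zip(max_row, alloc_row)]
            for max_row, alloc_row in zip(maximum, allocation)]
    work = list(available)
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # station i can run to completion and release its allocation
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progressed = True
    return all(finished)               # safe iff every station can finish

# three stations, one resource type, 3 units currently free
print(is_safe([3], [[1], [2], [2]], [[4], [3], [7]]))  # True: order 0, 1, 2 works
```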
    Additionally, this research explores the use of MEC as a caching platform, as it is proclaimed to be a key technology for reducing service processing delays in 5G networks. Caching on MEC decreases service latency and improves data content access by allowing direct content delivery through the edge without fetching data from the remote server; it is also deemed an effective approach that guarantees greater reachability due to the proximity to end-users. In this regard, a novel hybrid content caching algorithm is proposed for MEC platforms to increase their caching efficiency. The proposed algorithm unifies a modified Belady's algorithm (illustrated in the sketch below) and a distributed cooperative caching algorithm to improve data access while reducing latency; a polynomial fit with Lagrange interpolation is employed to predict the future request references that Belady's algorithm requires. Experimental results show that the proposed algorithm obtains 4% more cache hits, owing to its selective caching approach, than the case-study algorithms, and that the cooperative algorithm can improve total cache hits by up to 80%.

    Furthermore, this thesis explores a second predictive caching scheme to further improve caching efficiency, the motivation being to investigate another predictive caching approach as an improvement on the former. The result is a Predictive Collaborative Replacement (PCR) caching framework consisting of three schemes, each addressing a particular problem: the proactive predictive scheme addresses the continuous change in cache popularity trends, the collaborative scheme addresses cache redundancy in the collaborative space, and the replacement scheme evicts cold cache blocks to increase the hit ratio. Simulation experiments show that the replacement scheme achieves 3% more cache hits than existing replacement algorithms such as Least Recently Used, Multi-Queue and Frequency-Based Replacement. The PCR algorithm has been tested on a real dataset (the MovieLens 20M dataset) and compared with an existing contemporary predictive algorithm; results show that PCR performs better, with a 25% increase in hit ratio for a 10% CPU utilization overhead.
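    For reference, a minimal sketch of the classical Belady replacement rule that the hybrid scheme modifies: with the (here, predicted) future request sequence in hand, evict the cached item whose next use is farthest away. The trace and capacity are illustrative.

```python
# Belady's optimal replacement over a known (or predicted) request stream.

def belady_misses(requests, capacity):
    cache, misses = set(), 0
    for t, item in enumerate(requests):
        if item in cache:
            continue                       # hit
        misses += 1
        if len(cache) >= capacity:
            future = requests[t + 1:]
            # distance to next use; items never used again evict first
            next_use = lambda c: future.index(c) if c in future else float("inf")
            cache.discard(max(cache, key=next_use))
        cache.add(item)
    return misses

reqs = ["a", "b", "c", "a", "b", "d", "a", "b"]
print(belady_misses(reqs, capacity=2))  # 6 misses; LRU incurs 8 on the same trace
```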

    Using Tracing To Enhance Data Cache Performance in CPUs: The creation of a Trace-Assisted Cache to increase cache hits and decrease runtime

    The processor-memory gap is widening every year with no prospect of reprieve. More and more latency is added to program runtimes as memory cannot satisfy the demands of CPUs quickly enough. In the past, this has been alleviated through caches of increasing complexity or techniques like prefetching, to give the illusion of faster memory; however, these techniques have drawbacks because they are reactive or rely on incomplete information, which in general leads to large amounts of latency from processor stalls. It is our contention that by tracing a program's data accesses and feeding this information back to the cache, overall program runtime can be reduced. This is achieved through a new piece of hardware called a Trace-Assisted Cache (TAC), which uses traces to gain foreknowledge of the memory requests the processor is likely to make, allowing them to be actioned before the processor requests the data and thereby overlapping memory accesses with computation. Comparing the TAC against a standard CPU without a cache, we see improvements in runtimes of up to 65%; however, we see degraded performance of around 8% on average when compared to set-associative and direct-mapped caches, because the improvements are swamped by high overheads and synchronisation times between components. We also see that benchmarks exhibiting certain qualities, namely a balance of computation and memory instructions and data spread widely across memory, fare better with the TAC than other benchmarks on the same hardware. Overall, this demonstrates that whilst there is potential to reduce runtime by increasing the agency of the cache through trace assistance, a highly efficient implementation is required to be competitive; otherwise any potential gains are negated by the increase in overheads.
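    To illustrate the idea (this is a toy software model under our own assumptions, not the thesis's hardware design), the sketch below replays a recorded address trace and lets the cache pre-install the next few traced addresses before the processor asks for them; all names and parameters are ours.

```python
# Trace-assisted caching in miniature: a trace from a prior run tells
# the cache which addresses come next, so it can fetch them early.

from collections import OrderedDict

class TraceAssistedCache:
    def __init__(self, capacity, trace, lookahead=4):
        self.capacity, self.trace, self.lookahead = capacity, trace, lookahead
        self.lines = OrderedDict()   # cached addresses in LRU order
        self.pos = 0                 # current position in the recorded trace
        self.hits = self.misses = 0

    def _install(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)     # evict the LRU line
        self.lines[addr] = True

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
        else:
            self.misses += 1
            self._install(addr)
        # pre-install the next few addresses the trace says are coming
        for nxt in self.trace[self.pos + 1 : self.pos + 1 + self.lookahead]:
            self._install(nxt)
        self.pos += 1

trace = [0x100, 0x140, 0x180, 0x1c0, 0x100, 0x140]
cache = TraceAssistedCache(capacity=4, trace=trace)
for addr in trace:               # the run matches the trace exactly
    cache.access(addr)
print(cache.hits, cache.misses)  # 5 hits, 1 compulsory miss
```

    When the actual run diverges from the trace, the pre-installed lines stop matching, and a real implementation must also pay the synchronisation overheads the abstract describes.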

    Virtualization techniques for memory resource exploitation

    Cloud infrastructures have become indispensable in our daily lives with the rise of cloud-based services offered by companies like Facebook, Google, Amazon and many others. These cloud infrastructures use a large number of servers provisioned with their own computing resources. Each of these servers runs a piece of software, called the hypervisor (HV), that allows it to create multiple virtual instances of the server's physical computing resources and abstract them into "Virtual Machines" (VMs). A VM runs an Operating System, which in turn runs the applications. The VMs within the servers generate varying memory demand. When the demand increases, costly operations such as (virtual) disk accesses and/or VM migrations can occur, so it is necessary to optimize the utilization of the local memory resources within a single computing server. However, pressure on the memory resources can still increase, making it necessary to migrate the VM to a different server with more memory or to add more memory to the same server. At this point, it is important to consider that some of the servers in the cloud infrastructure might have memory resources that they are not using; with this possibility in mind, new architectures have been introduced that provide hardware support enabling servers to share their memory capacity.

    This thesis presents multiple contributions to the memory management problem. First, it addresses the problem of optimizing memory resources in a virtualized server through different types of memory abstractions: two full contributions, SmarTmem and CARLEMM, are presented for managing memory within a single server, along with a third contribution, CAVMem, that serves as the foundation for CARLEMM. Second, the thesis presents two contributions for memory capacity aggregation across multiple servers, offering two mechanisms called GV-Tmem and vMCA, the latter based on GV-Tmem but with significant enhancements. These mechanisms distribute the server's total memory within a single server and globally across computing servers, using a user-space process with high-level memory management policies.