
    Secure storage systems for untrusted cloud environments

    The cloud has become established for applications that need to be scalable and highly available. However, moving data to data centers owned and operated by a third party, i.e., the cloud provider, raises security concerns, because a cloud provider could easily access and manipulate the data or the program flow; this prevents the cloud from being used for certain classes of applications, such as medical or financial ones. Hardware vendors are addressing these concerns by developing Trusted Execution Environments (TEEs) that make the CPU state and parts of memory inaccessible to the host software. While TEEs protect the current execution state, they provide no security guarantees for data that does not fit in, or does not reside in, the protected memory area, such as data on the network or in persistent storage. In this work, we address these limitations of TEEs in three ways: first, we extend the trust of TEEs to persistent storage; second, we extend that trust across the network to multiple nodes; and third, we propose a compiler-based solution for accessing heterogeneous memory regions. More specifically:
    • SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER implements a key-value interface. Its design is based on LSM data structures, but extends them to provide confidentiality, integrity, and freshness for the stored data. Thus, SPEICHER can prove to the client that the data has not been tampered with by an attacker.
    • AVOCADO is a distributed in-memory key-value store (KVS) that extends the trust that TEEs provide across the network to multiple nodes, allowing KVSs to scale beyond the boundaries of a single node. On each node, AVOCADO carefully divides data between trusted memory and untrusted host memory to maximize the amount of data that can be stored on each node. AVOCADO leverages the fact that network attacks can be modeled as crash faults, allowing it to trust other nodes through a hardened ABD replication protocol.
    • TOAST is based on the observation that modern high-performance systems often use several heterogeneous memory regions that are not easily distinguishable by the programmer; the number of regions is further increased because TEEs divide memory into trusted and untrusted regions. TOAST is a compiler-based approach that unifies access to these heterogeneous memory regions and provides programmability and portability. TOAST uses a load/store interface to abstract most library interfaces for the different memory regions.
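
    As a rough illustration of the integrity and freshness guarantees described above, the sketch below pairs each value stored in untrusted memory with a keyed tag and a version number, while a small per-key counter stays in trusted (TEE) memory. This is a hedged sketch only, not SPEICHER's actual LSM-based design; the FreshKV class and the mac() helper are illustrative stand-ins (a real system would use a proper MAC such as HMAC inside the TEE).

        // Hedged sketch: detect tampering (integrity) and rollback (freshness)
        // by keeping a trusted version counter per key inside the TEE.
        #include <cstdint>
        #include <functional>
        #include <map>
        #include <optional>
        #include <stdexcept>
        #include <string>

        static std::uint64_t mac(const std::string& key, const std::string& value, std::uint64_t version) {
            std::hash<std::string> h;  // placeholder for a real keyed MAC
            return h(key) ^ h(value) ^ (version * 0x9e3779b97f4a7c15ULL);
        }

        class FreshKV {
            struct Record { std::string value; std::uint64_t version; std::uint64_t tag; };
            std::map<std::string, Record> untrusted_store_;          // lives outside the TEE
            std::map<std::string, std::uint64_t> trusted_versions_;  // small state kept in trusted memory
        public:
            void put(const std::string& key, const std::string& value) {
                std::uint64_t v = ++trusted_versions_[key];
                untrusted_store_[key] = Record{value, v, mac(key, value, v)};
            }
            std::optional<std::string> get(const std::string& key) {
                auto it = untrusted_store_.find(key);
                if (it == untrusted_store_.end()) return std::nullopt;
                const Record& r = it->second;
                // Integrity: the tag must recompute; freshness: the version must match the trusted counter.
                if (r.tag != mac(key, r.value, r.version) || r.version != trusted_versions_[key])
                    throw std::runtime_error("tampering or rollback detected");
                return r.value;
            }
        };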

    Vector-Processing for Mobile Devices: Benchmark and Analysis

    Vector processing has become commonplace in today's CPU microarchitectures. Vector instructions improve performance and energy efficiency, which is crucial for resource-constrained mobile devices. The research community currently lacks a comprehensive benchmark suite to study the benefits of vector processing for mobile devices. This paper presents Swan, an extensive vector processing benchmark suite for mobile applications. Swan consists of a diverse set of data-parallel workloads from four commonly used mobile applications: an operating system, a web browser, an audio/video messaging application, and a PDF rendering engine. Using the Swan benchmark suite, we conduct a detailed analysis of the performance, power, and energy consumption of vectorized workloads, and show that: (a) vectorized kernels increase the pressure on the cache hierarchy due to the higher rate of memory requests; (b) vector processing is more beneficial for workloads with lower-precision operations and higher cache hit rates; (c) limited instruction-level parallelism and strided memory accesses to multi-dimensional data structures prevent vector processing benefits from scaling with more SIMD functional units and wider registers; and (d) despite lower computation throughput than domain-specific accelerators, such as GPUs, vector processing outperforms these accelerators for kernels with lower operation counts. Finally, we show five common computation patterns in mobile data-parallel workloads that dominate the execution time. Comment: 2023 IEEE International Symposium on Workload Characterization (IISWC).
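
    To make findings (b) and (c) concrete, the hedged example below (not taken from Swan; both function names are illustrative) contrasts a unit-stride, low-precision kernel that auto-vectorizers handle well with a strided column walk over a row-major matrix that defeats unit-stride vector loads.

        #include <algorithm>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Unit-stride, low-precision kernel: compilers map this well onto SIMD lanes.
        void add_saturate_u8(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out, std::size_t n) {
            for (std::size_t i = 0; i < n; ++i)
                out[i] = static_cast<std::uint8_t>(std::min<int>(a[i] + b[i], 255));
        }

        // Strided column walk over a row-major matrix: each access jumps `cols` elements,
        // so vector loads become gathers and the SIMD benefit largely disappears.
        std::uint64_t column_sum(const std::vector<std::uint8_t>& m, std::size_t rows, std::size_t cols, std::size_t col) {
            std::uint64_t sum = 0;
            for (std::size_t r = 0; r < rows; ++r)
                sum += m[r * cols + col];
            return sum;
        }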

    Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI

    Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, tools largely remain restricted to a single machine, and in the case of RTL simulation we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve RTL simulation for modern 10-billion-transistor-scale chips. Metro-MPI exploits the natural boundaries present in chip designs to partition RTL simulations and leverages High Performance Computing (HPC) techniques to extract parallelism. For chip designs that scale in size by exploiting latency-insensitive interfaces like networks-on-chip and AXI, Metro-MPI offers a new paradigm for RTL simulation scalability. Our implementation of Metro-MPI in OpenPiton+Ariane delivers 2.7 MIPS of RTL simulation throughput for the first time on a design with more than 10 billion transistors and 1,024 Linux-capable cores, opening new avenues for distributed RTL simulation of emerging system-on-chip designs. Compared to sequential and multithreaded RTL simulations of smaller designs, Metro-MPI achieves up to 135.98× and 9.29× speedups. Similarly, for a representative regression run, Metro-MPI reduces energy consumption by up to 2.53× and 2.91×. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (contract PID2019-107255GB-C21), by the Generalitat de Catalunya (contract 2017-SGR-1328), by the European Union within the framework of the ERDF of Catalonia 2014-2020 under the DRAC project [001-P-001723], and by the Arm-BSC Center of Excellence. G. Lopez-Paradís has been supported by the Generalitat de Catalunya through FI fellowship 2021FI-B00994 and GSoC 2021, and M. Moreto by a Ramon y Cajal fellowship no. RYC-2016-21104. A. Armejach is a Serra Hunter Fellow.
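
    The sketch below illustrates, in a hedged way, the general pattern that partitioned RTL simulation builds on: splitting the design at a latency-insensitive boundary and exchanging interface traffic between MPI ranks once per simulated cycle. It is not Metro-MPI's code; step_tile() is an assumed stand-in for evaluating one (e.g. Verilated) partition for one cycle.

        #include <mpi.h>
        #include <cstdint>

        // Placeholder for evaluating one partition of the design for one cycle.
        std::uint64_t step_tile(int rank, std::uint64_t incoming_flit) {
            return incoming_flit + rank + 1;
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            std::uint64_t outgoing = 0, incoming = 0;
            int next = (rank + 1) % size, prev = (rank + size - 1) % size;

            for (int cycle = 0; cycle < 1000; ++cycle) {
                outgoing = step_tile(rank, incoming);
                // Exchange one interface word with neighbouring partitions each cycle;
                // a latency-insensitive protocol tolerates the extra cycle of delay.
                MPI_Sendrecv(&outgoing, 1, MPI_UINT64_T, next, 0,
                             &incoming, 1, MPI_UINT64_T, prev, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            MPI_Finalize();
            return 0;
        }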

    Efficient channelization on a Graphics Processing Unit

    We present an implementation of a channelizer (F-engine) running on a Graphics Processing Unit (GPU). While not the first GPU implementation of a channelizer, we have put significant effort into optimizing the implementation. We are able to process four antennas, each with 2 Gsample/s, 10-bit dual-polarized input and 8-bit output, on a single commodity GPU. This fully utilizes the available PCIe bandwidth of the GPU. The system is not as optimized for a single high-bandwidth antenna, but handles 6.2 Gsample/s, limited by single-core CPU performance. Comment: Submitted to The Journal of Astronomical Telescopes, Instruments, and Systems.
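
    For readers unfamiliar with F-engines, the CPU-side sketch below shows the classic polyphase filterbank structure a channelizer implements: window the most recent taps × channels input samples with a prototype filter, fold them into per-channel sums, and take a DFT. This is a generic illustration under those assumptions, not the paper's GPU kernel; a real implementation would run per spectrum on the GPU with an FFT library.

        #include <complex>
        #include <cstddef>
        #include <vector>

        // One polyphase-filterbank output spectrum from the `taps * channels`
        // most recent real samples; `window` holds the prototype filter taps
        // and must have the same length as `samples`.
        std::vector<std::complex<double>> pfb_spectrum(const std::vector<double>& samples,
                                                       const std::vector<double>& window,
                                                       std::size_t channels, std::size_t taps) {
            std::vector<double> folded(channels, 0.0);
            for (std::size_t t = 0; t < taps; ++t)          // polyphase fold: sum the windowed taps
                for (std::size_t c = 0; c < channels; ++c)
                    folded[c] += samples[t * channels + c] * window[t * channels + c];

            std::vector<std::complex<double>> spectrum(channels);
            const double two_pi = 6.283185307179586;
            for (std::size_t k = 0; k < channels; ++k) {    // naive DFT; a GPU would use an FFT library
                std::complex<double> acc(0.0, 0.0);
                for (std::size_t n = 0; n < channels; ++n)
                    acc += folded[n] * std::polar(1.0, -two_pi * double(k) * double(n) / double(channels));
                spectrum[k] = acc;
            }
            return spectrum;
        }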

    GPU devices for safety-critical systems: a survey

    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices' random hardware failures, systematic failures, and independence of execution. This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence, and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020-045931-I).

    Gravitational waves as a probe for cosmology and non-standard physics

    The direct detection of Gravitational Waves (GWs) in 2015 revealed an entirely new way of studying the Universe and fundamental physics. This tremendous achievement, coming almost 100 years after Einstein's initial study of GWs, allows us to see into regions invisible to our standard Electromagnetic (EM) observations, from the primordial universe to the interior of neutron stars. In this thesis, we exploit this new probe to investigate a number of physical characteristics of the Universe and to test our current understanding of Einstein's General Theory of Relativity. To accomplish this, we use powerful cosmological simulations, which are able to replicate the observed large-scale structure down to complex, non-linear scales. More specifically: In Chapter 1, we review the current paradigm of cosmology, the LCDM model, and its limitations. We motivate some of the fundamental assumptions and problems that we aim to tackle in the subsequent chapters, and provide a brief introduction to GWs and their propagation in spacetime. In Chapter 2, we describe our numerical simulations and their outputs for the values of the gravitational potentials. In Chapter 3, we exploit GW standard sirens, i.e., GW observations for which we are able to locate a counterpart source in the EM spectrum, in order to constrain the level of inhomogeneities in our Universe. In parallel, we test possible degeneracies of the former with theories that modify general relativity. In Chapter 4, we study the effect of matter inhomogeneities on the propagation of GWs. We derive a modified propagation equation due to the presence of matter anisotropies. We solve the latter numerically, exploiting realistic environments from numerical simulations; for this, we develop a novel ray-tracing approach. As a result, we are able to derive potential effects on the stochastic GW background and examine lensing observables and their cosmological applications. Finally, in Chapter 5, we investigate the power of galaxy clustering in a statistical, GW-based method for determining the Hubble-Lemaître parameter. Since the majority of GW observations will not have an EM counterpart, we can statistically combine the galaxies that reside within the GW sky-localisation region in order to determine H0. We estimate the capabilities of the method, in realistic environments with clustered galaxies, from numerical simulations. At the same time, we evaluate the importance of clustering when galaxies are missing from survey catalogues due to observational limitations.
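
    For reference, the standard (unmodified) propagation equation for a GW Fourier mode h_k in a flat FRW background, written in conformal time η with ℋ = a'/a, is the usual starting point; the thesis's modified equation, sourced by matter anisotropies along the line of sight, is not reproduced here:

        h_k'' + 2\mathcal{H}\, h_k' + k^2 h_k = 0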

    A study on the accelerated die-off of methanogenic archaea under acidic conditions and its mathematical modelling

    The study focused on the effects of low pH on the decay rate of methanogenic microorganisms and on potential VFA inhibition. Results showed that low-pH conditions, with a pH below 6.5, significantly enhanced the decay rate of methanogenic microorganisms, leading to irreversible inhibition. In contrast, acidogenic bacteria were found to be more tolerant of the acidic environment. The biocidal phenomenon was mainly attributed to the proton concentration in the system and was not affected by the presence of phosphate buffer. The impacts of the acid species (C1–C5 VFAs) and of the VFA concentration (up to 40 mM of undissociated VFAs) on biomass decay were negligibly small compared to the low-pH stress. A modified ADM1 equipped with a lag-phase sub-model could reproduce the deterioration of reactor performance during an acidic failure and resulted in a 37% improvement in the Nash-Sutcliffe efficiency coefficient compared to the calibrated ADM1 model.
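
    As a point of reference for the modelling, ADM1 represents biomass decay as a first-order process; a hedged way to express the finding above is a decay constant that depends on pH and becomes much larger below pH 6.5 (the specific form of k_dec(pH) is an assumption here, not the paper's calibrated model):

        \frac{dX}{dt} = -k_{\mathrm{dec}}(\mathrm{pH})\,X, \qquad k_{\mathrm{dec}}(\mathrm{pH} < 6.5) \gg k_{\mathrm{dec}}(\mathrm{pH} \approx 7)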

    Brain Computations and Connectivity [2nd edition]

    This is an open access title available under the terms of a CC BY-NC-ND 4.0 International licence. It is free to read on the Oxford Academic platform and offered as a free PDF download from OUP and selected open access locations. Brain Computations and Connectivity is about how the brain works. In order to understand this, it is essential to know what is computed by different brain systems and how the computations are performed. The aim of this book is to elucidate what is computed in different brain systems, and to describe current biologically plausible computational approaches and models of how each of these brain systems computes. Understanding the brain in this way has enormous potential for understanding ourselves better, in health and in disease. Potential applications of this understanding are to the treatment of the brain in disease, and to artificial intelligence, which will benefit from knowledge of how the brain performs many of its extraordinarily impressive functions. This book is pioneering in taking this approach to brain function: considering what is computed by many of our brain systems, and how it is computed. It updates, with much new evidence including the connectivity of the human brain, the earlier book Rolls (2021), Brain Computations: What and How, Oxford University Press. Brain Computations and Connectivity will be of interest to all scientists interested in brain function and how the brain works, whether they are from neuroscience, from medical sciences including neurology and psychiatry, from computational science including machine learning and artificial intelligence, or from areas such as theoretical physics.

    Lessons from Formally Verified Deployed Software Systems (Extended version)

    The technology of formal software verification has made spectacular advances, but how much does it actually benefit the development of practical software? Considerable disagreement remains about the practicality of building systems with mechanically checked proofs of correctness. Is this prospect confined to a few expensive, life-critical projects, or can the idea be applied to a wide segment of the software industry? To help answer this question, the present survey examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use. It considers the technologies used, the form of verification applied, the results obtained, and the lessons that can be drawn for the software industry at large and its ability to benefit from formal verification techniques and tools. Note: a short version of this paper is also available, covering in detail only a subset of the considered systems. The present version is intended for full reference. Comment: arXiv admin note: text overlap with arXiv:1211.6186 by other authors.

    Memory Safety Acceleration on RISC-V for C Programming Language

    Memory corruption vulnerabilities can lead to memory attacks. Three of the top ten most dangerous weaknesses in computer security are memory-related. Memory attacks are among a computer system's oldest yet still unsolved problems, and companies and governments have lost billions of dollars to memory security breaches. Memory safety is therefore paramount to securing memory systems. Pointer-based memory safety protection has been shown to be a promising solution, covering both out-of-bounds and use-after-free errors. However, pointer-based memory safety relies on additional information (metadata) to check validity when a pointer is dereferenced, and such operations on the metadata introduce significant performance overhead to the system. Existing hardware/software implementations are largely limited to proprietary closed-source microprocessors, simulation-only studies, or approaches that require changes to the input source code. To address the need for memory security, we created a memory-safe RISC-V platform with low performance overhead. In this thesis, a novel hardware/software co-design methodology is presented in which a RISC-V based processor is extended with new instructions and microarchitectural enhancements, enabling complete memory safety for the C programming language and faster memory safety checks. Furthermore, a compiler is instrumented to emit the security operations that match the changes to the processor. Moreover, a design-space exploration framework is proposed to provide an in-depth search for the optimal hardware/software configuration for application-specific workloads with respect to performance overhead, security coverage, area cost, and critical-path latency. The entire system is realized by enhancing a RISC-V Rocket Chip system-on-chip (SoC). The resulting processor SoC is implemented on an FPGA and evaluated for performance with applications from the SPEC 2006 (generic), MiBench (embedded), and Olden benchmark suites. The system, including the RISC-V Chisel implementation, compiler, and profiling and analysis tool-chain, is fully available and open source to the public.
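
    The software-only sketch below illustrates the pointer-based memory-safety idea the thesis accelerates: every pointer carries bounds metadata, and every dereference checks it. It is a hedged illustration, not the thesis's RISC-V extension; the CheckedPtr class is an assumption, and the point of the hardware/software co-design is to move exactly these checks into new instructions so they cost far less.

        #include <cstddef>
        #include <stdexcept>

        // A pointer guarded by bounds metadata (base and length of its allocation).
        template <typename T>
        class CheckedPtr {
            T* base_;            // start of the allocation (metadata)
            std::size_t n_;      // number of elements (metadata)
            std::ptrdiff_t idx_; // current offset of the guarded pointer
        public:
            CheckedPtr(T* base, std::size_t n) : base_(base), n_(n), idx_(0) {}
            CheckedPtr operator+(std::ptrdiff_t off) const {
                CheckedPtr q(*this);
                q.idx_ += off;   // pointer arithmetic only updates metadata, no check yet
                return q;
            }
            T& operator*() const {
                if (idx_ < 0 || static_cast<std::size_t>(idx_) >= n_)  // bounds check at dereference
                    throw std::out_of_range("out-of-bounds dereference");
                return base_[idx_];
            }
        };

        int main() {
            constexpr std::size_t n = 8;
            int* buf = new int[n]();
            CheckedPtr<int> p(buf, n);
            *(p + 3) = 42;     // in bounds: allowed
            // *(p + 8) = 1;   // one past the end: would throw instead of corrupting memory
            delete[] buf;
            return 0;
        }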