48 research outputs found

    An inter-cluster communication facility for lightweight manycore processors in the Nanvix OS

    Undergraduate thesis (TCC), Universidade Federal de Santa Catarina, Centro Tecnológico, Ciências da Computação. Along with greater scalability and energy efficiency, lightweight manycores brought a new set of software development challenges arising from their architectural particularities. In this context, Operating Systems (OSs) make application development less costly, less error-prone, and more efficient. The abstraction layer provided by OSs hides hardware characteristics behind a simplified and productive view. However, part of the development challenges encountered in lightweight manycores stems from the existing runtimes and OSs, which do not fully address the architectural complexity of these processors. We believe that OSs for the next generation of lightweight manycores must be redesigned from scratch to cope with their tight architectural constraints. In particular, communication abstractions play a crucial role in application scalability and performance due to the distributed nature of manycores. The purpose of this undergraduate dissertation is to propose an inter-cluster communication facility for the emerging MPPA-256 manycore processor. This facility is part of a generic and flexible Hardware Abstraction Layer (HAL) that deals directly with the key issues encountered in designing an OS for these processors. On top of this facility, communication services are also proposed for an OS based on the microkernel model, which seeks to provide a basic framework for communication abstractions. The contributions of this undergraduate dissertation are embedded in a broader research context that aims to investigate the creation of a distributed OS based on a multikernel approach, called Nanvix OS. Nanvix OS focuses on programmability and portability issues for lightweight manycores through a POSIX-compliant OS. The results show how well-known distributed algorithms can be efficiently supported by Nanvix OS and encourage improvements provided by the proper use of Direct Memory Access (DMA) accelerators.

    A Unified Operating System for Clouds and Manycore: fos

    Single-chip processors with thousands of cores will be available in the next ten years, and clouds of multicore processors afford the operating system designer thousands of cores today. Constructing operating systems for manycore and cloud systems faces similar challenges. This work identifies these shared challenges and introduces our solution: a factored operating system (fos) designed to meet the scalability, faultiness, variability of demand, and programming challenges of OSes for single-chip thousand-core manycore systems as well as current-day cloud computers. Current monolithic operating systems are not well suited for manycores and clouds because they have taken an evolutionary approach to scaling, such as adding fine-grain locks and redesigning subsystems; these approaches do not increase scalability quickly enough. fos addresses the OS scalability challenge by using a message-passing design and is composed of a collection of Internet-inspired servers. Each operating system service is factored into a set of communicating servers which in aggregate implement a system service. These servers are designed much in the way that distributed Internet services are designed, but provide traditional kernel services instead of Internet services. Also, fos embraces the elasticity of cloud and manycore platforms by adapting resource utilization to match demand. fos facilitates writing applications across the cloud by providing a single system image across both future 1000+ core manycores and current-day Infrastructure as a Service cloud computers. In contrast, current cloud environments do not provide a single system image and introduce complexity for the user by requiring different programming models for intra- vs. inter-machine communication, and by requiring the use of non-OS-standard management tools.

    RMem: An OS Service for Transparent Remote Memory Access in Lightweight Manycores

    Lightweight manycores deliver high performance and scalability at low power consumption. However, architectural intricacies of these processors impose programmability challenges that keep them away from mass adoption. While several efforts aim at introducing parallel programming environments to lightweight manycores, few initiatives are concerned with how to design rich Operating Systems (OSs) for them. In this work, we focus on the open challenges that arise from the constrained memory subsystems of lightweight manycores, such as the presence of multiple address spaces and limited on-chip memory. To cope with transparent data access in this scenario, we introduce an OS service, named RMem. This service provides a shared memory abstraction over multiple address spaces and exposes system calls that enable one-sided communication on top of this abstraction. We implemented a prototype of our service in the Nanvix research OS, and we deployed the system on the Kalray MPPA-256 lightweight manycore. Our experimental results with a microbenchmark showed that, while exposing an easier-to-program interface, the RMem service may deliver about 91% of the write performance and up to 2.4× better read performance than the primitives in the libraries of the experimental platform.
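The idea of a shared memory abstraction over multiple address spaces with one-sided access can be illustrated with a toy model. The sketch below is not the RMem interface: the class name `ToyRemoteMemory`, its methods, and the flat address layout are all assumptions made for illustration. Each `bytearray` stands in for memory owned by one node, and callers read or write through a single global address range without the owning node taking part in the call.

```python
class ToyRemoteMemory:
    """Toy global address space backed by several private memories.

    Hypothetical illustration only; this is not the RMem system-call API.
    """

    def __init__(self, num_nodes, node_bytes):
        self.node_bytes = node_bytes
        # One bytearray per node stands in for that node's private memory.
        self.nodes = [bytearray(node_bytes) for _ in range(num_nodes)]

    def _chunks(self, addr, length):
        """Split a global access into (node, offset, size) pieces."""
        total = self.node_bytes * len(self.nodes)
        if addr < 0 or length < 0 or addr + length > total:
            raise ValueError("access outside the global address range")
        while length > 0:
            node, offset = divmod(addr, self.node_bytes)
            size = min(length, self.node_bytes - offset)
            yield node, offset, size
            addr += size
            length -= size

    def write(self, addr, data):
        """One-sided write: the caller places data in the owning nodes."""
        pos = 0
        for node, offset, size in self._chunks(addr, len(data)):
            self.nodes[node][offset:offset + size] = data[pos:pos + size]
            pos += size

    def read(self, addr, length):
        """One-sided read: the caller fetches data from the owning nodes."""
        out = bytearray()
        for node, offset, size in self._chunks(addr, length):
            out += self.nodes[node][offset:offset + size]
        return bytes(out)


if __name__ == "__main__":
    mem = ToyRemoteMemory(num_nodes=4, node_bytes=16)
    mem.write(12, b"abcdefgh")  # spans the boundary between node 0 and node 1
    print(mem.read(12, 8))
```

In this model an access that crosses a node boundary is split transparently, which is the kind of detail a caller would otherwise have to handle by hand.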

    HARE: Final Report

    This report documents the results of work done over a 6-year period under the FAST-OS programs. The first effort, called Right-Weight Kernels (RWK), was concerned with improving measurements of OS noise so it could be treated quantitatively, and with evaluating the use of two operating systems, Linux and Plan 9, on HPC systems and determining how these operating systems needed to be extended or changed for HPC while still retaining their general-purpose nature. The second program, HARE, explored the creation of alternative runtime models, building on RWK. All of the HARE work was done on Plan 9. The HARE researchers were mindful of the very good Linux and LWK work being done at other labs and saw no need to recreate it. Even given this limited funding, the two efforts had outsized impact:
    - Helped Cray decide to use Linux, instead of a custom kernel, and provided the tools needed to make Linux perform well
    - Created a successor operating system to Plan 9, NIX, which has been taken in by Bell Labs for further development
    - Created a standard system measurement tool, Fixed Time Quantum or FTQ, which is widely used for measuring operating systems' impact on applications
    - Spurred the use of the 9p protocol in several organizations, including IBM
    - Built software in use at many companies, including IBM, Cray, and Google
    - Spurred the creation of alternative runtimes for use on HPC systems
    - Demonstrated that, with proper modifications, a general-purpose operating system can provide communications up to 3 times as effective as user-level libraries
    Open source was a key part of this work. The code developed for this project is in wide use and available at many places. The core Blue Gene code is available at https://bitbucket.org/ericvh/hare. We describe details of these impacts in the following sections. The rest of this report is organized as follows: first, we describe commercial impact; next, we describe the FTQ benchmark and its impact in more detail; operating systems and runtime research follows; we discuss infrastructure software; and we close with a description of the new NIX operating system, future work, and conclusions.
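The Fixed Time Quantum idea mentioned in this abstract can be sketched briefly: instead of timing a fixed amount of work, the benchmark counts how much trivial work completes in each fixed-length time window, so a dip in the count points to time taken away from the application in that window. The following is a minimal sketch under that assumption, not the FTQ tool itself; the function name `ftq` and its parameters are hypothetical.

```python
import time


def ftq(num_quanta=20, quantum_s=0.001):
    """Count units of trivial work completed in each fixed time quantum.

    Hypothetical sketch of the fixed-time-quantum idea, not the real tool.
    """
    counts = []
    # Deadlines advance by a fixed step, so quanta stay aligned to a grid
    # even when one quantum is cut short by interference.
    next_deadline = time.perf_counter() + quantum_s
    for _ in range(num_quanta):
        count = 0
        while time.perf_counter() < next_deadline:
            count += 1  # one unit of trivial work
        counts.append(count)
        next_deadline += quantum_s
    return counts


if __name__ == "__main__":
    samples = ftq()
    print("min work per quantum:", min(samples))
    print("max work per quantum:", max(samples))
```

A large gap between the minimum and maximum counts suggests that some quanta lost time to other activity on the machine. A Python loop adds its own interpreter jitter, so this only illustrates the measurement scheme.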

    PIKA: A Network Service for Multikernel Operating Systems

    PIKA is a network stack designed for multikernel operating systems that target potential future architectures lacking cache-coherent shared memory but supporting message passing. PIKA splits the network stack into several servers that communicate using a low-overhead message-passing layer. A key challenge faced by PIKA is the maintenance of shared state, such as a single accept queue and load-balancing information. PIKA addresses this challenge using a speculative 3-way handshake for connection acceptance, and a new distributed load-balancing scheme for spreading connections. A PIKA prototype achieves competitive performance, excellent scalability, and low service times under load imbalance on commodity hardware. Finally, we demonstrate that splitting network stack processing by function across separate cores is a net loss on commodity hardware, and we describe conditions under which it may be advantageous.

    Quest-V: A Virtualized Multikernel for High-Confidence Systems

    This paper outlines the design of Quest-V, which is implemented as a collection of separate kernels operating together as a distributed system on a chip. Quest-V uses virtualization techniques to isolate kernels and prevent local faults from affecting remote kernels. This leads to a high-confidence multikernel approach, where failures of system subcomponents do not render the entire system inoperable. A virtual machine monitor for each kernel keeps track of shadow page table mappings that control immutable memory access capabilities. This ensures a level of security and fault tolerance in situations where a service in one kernel fails, or is corrupted by a malicious attack. Communication is supported between kernels using shared memory regions for message passing. Similarly, device driver data structures are shareable between kernels to avoid the need for complex I/O virtualization, or communication with a dedicated kernel responsible for I/O. In Quest-V, device interrupts are delivered directly to a kernel, rather than via a monitor that determines the destination. Apart from bootstrapping each kernel, handling faults, and managing shadow page tables, the monitors are not needed. This differs from conventional virtual machine systems, in which a central monitor, or hypervisor, is responsible for scheduling and management of host resources amongst a set of guest kernels. In this paper we show how Quest-V can implement novel fault isolation and recovery techniques that are not possible with conventional systems. We also show that using virtualization to isolate system services does not add undue overhead to overall system performance.

    Rhymes: a shared virtual memory system for non-coherent tiled many-core architectures

    The rising core count per processor is pushing chip complexity to a level where hardware-based cache coherency protocols become too hard and costly to scale. We need new designs of many-core hardware and software, beyond traditional technologies, to keep up with the ever-increasing scalability demands. The Intel Single-chip Cloud Computer (SCC) is a recent research processor exemplifying a new cluster-on-chip architecture which promotes a software-oriented approach, instead of hardware support, to implementing shared memory coherence. This paper presents a shared virtual memory (SVM) system, dubbed Rhymes, tailored to this new kind of processor with non-coherent and hybrid memory architectures. Rhymes features a two-way cache coherence protocol to enforce release consistency for pages allocated in shared physical memory (SPM) and scope consistency for pages in per-core private memory. It also supports page remapping on a per-core basis to boost data locality. We implement Rhymes on the SCC port of the Barrelfish OS. Experimental results show that our SVM outperforms the pure SPM approach used by Intel's software managed coherence (SMC) library by up to 12 times, with superlinear speedups (due to the L2 cache effect) noted for applications with strong data reuse patterns.