169 research outputs found

    Manutenibilidade da semĂąntica de modelos de dados de produtos compartilhados em rede interoperĂĄvel

    Get PDF
    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro TecnolĂłgico.Os dados manipulados por aplicaçÔes de engenharia tĂȘm sido tratados usando sistemas de gerĂȘncia de banco de dados ou mecanismos dedicados embutidos em sistemas CAx. As tendĂȘncias atuais de competitividade industrial apontam para a necessidade de integrar as aplicaçÔes de engenharia. Duas demandas principais se manifestam: o uso de um mecanismo para o acesso interoperĂĄvel em rede aos dados, e a necessidade de manipular modelos de dados baseados em diferentes paradigmas. Esta tese de doutorado apresenta uma revisĂŁo sobre tecnologia de banco de dados com ĂȘnfase em aplicaçÔes de engenharia, introduz os problemas de intercĂąmbio de dados de produtos e interoperabilidade de aplicaçÔes, e discute o problema de perda semĂąntica na tradução de modelos de dados de produtos compartilhados em rede e baseados em formatos padrĂŁo. Uma anĂĄlise dos problemas que emergem nesta tradução, com o objetivo de avaliar a manutenibilidade da semĂąntica dos dados ao longo de uma rede interoperĂĄvel, Ă© executada e comentada

    A Fully Userspace Remote Storage Access Stack

    Get PDF
    As computer networking has evolved and the available throughput has increased, the efficiency of the network software stack has become increasingly important. This is because the latency introduced by software has gone from insignificant, compared to historically poor network performance, to the largest component of latency for a modern local-area network. Currently, the vast majority of code that accesses the hardware is part of the kernel, because the kernel is responsible for ensuring that user applications do not interfere with each other when accessing the hardware. Remote Direct Memory Access~(RDMA) provides a solution for applications to perform direct data transfers over the network without requiring context switches into the kernel, but relies instead on specialized hardware interfaces to handle the virtual address mappings and transport protocols. This more intelligent hardware allows for direct control from the userspace application, eliminating the cost of context switches into the kernel. This in turn reduces the overall latency of message transfers. Just like networking, storage is currently undergoing a similar evolution. For most of the recent history of computing, the most common durable storage mechanism has been mechanical hard disk drives, which can only be accessed at block level and have high latency compared to the software drivers used to access the data. However, the introduction of solid state disks~(SSDs) based on Flash significantly decreased the latency, as there are no mechanical parts that need to move to access the data. Upcoming non-volatile memory solutions reduce this latency even further, and even allow byte-level access to the storage medium. Thus, just like with networking, software drivers become the bottleneck and we look for solutions to bypass the kernel to improve the efficiency of direct userspace access to storage. This thesis offers two contributions as part of a solution to these problems. The first part introduces urdma, a software RDMA driver which leverages the Data Plane Development Kit (DPDK) to perform network data transfers in userspace without specialized RDMA interface hardware. The second part examines remote locking protocols, which are required for synchronization in distributed storage systems. We define an RDMA locking mechanism referred to as Verbs Offload Locking Technology (VOLT), which allows acquisition of a remote lock object without any CPU usage by the target node. This offloading allows VOLT to be used with disaggregated memory servers that have limited onboard CPU resources, while also lowering the application overhead for remote locking. Finally, we define a bytecode framework using enhanced Berkeley Packet Filter (eBPF) bytecode for extending the capabilities of an RDMA-capable network interface card (NIC) with new operations, and show how this can be used to implement our remote locking operation

    Master of Science

    Get PDF
    thesisThe advent of the era of cheap and pervasive many-core and multicore parallel sys-tems has highlighted the disparity of the performance achieved between novice and expert developers targeting parallel architectures. This disparity is most notiable with software for running general purpose computations on grachics processing units (GPGPU programs). Current methods for implementing GPGPU programs require an expert level understanding of the memory hierarchy and execution model of the hardware to reach peak performance. Even for experts, rewriting a program to exploit these hardware features can be tedious and error prone. Compilers and their ability to make code transformations can assist in the implementation of GPGPU programs, handling many of the target specic details. This thesis presents CUDA-CHiLL, a source to source compiler transformation and code generation framework for the parallelization and optimization of computations expressed in sequential loop nests for running on many-core GPUs. This system uniquely uses a complete scripting language to describe composable compiler transformations that can be written, shared and reused by nonexpert application and library developers. CUDA-CHiLL is built on the polyhedral program transformation and code generation framework CHiLL, which is capable of robust composition of transformations while preserving the correctness of the program at each step. Through its use of powerful abstractions and a scripting interface, CUDA-CHiLL allows for a developer to focus on optimization strategies and ignore the error prone details and low level constructs of GPGPU programming. The high level framework can be used inside an orthogonal auto-tuning system that can quickly evaluate the space of possible implementations. Although specicl to CUDA at the moment, many of the abstractions would hold for any GPGPU framework, particularly Open CL. The contributions of this thesis include a programming language approach to providing transformation abstraction and composition, a unifying framework for general and GPU specicl transformations, and demonstration of the framework on standard benchmarks that show it capable of matching or outperforming hand-tuned GPU kernels

    A shared-disk parallel cluster file system

    Get PDF
    Dissertação apresentada para obtenção do Grau de Doutor em InformĂĄtica Pela Universidade Nova de Lisboa, Faculdade de CiĂȘncias e TecnologiaToday, clusters are the de facto cost effective platform both for high performance computing (HPC) as well as IT environments. HPC and IT are quite different environments and differences include, among others, their choices on file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, but which are not fully POSIX-compliant and were devised to run on top of (fault prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX compliant file systems, (either general purpose or shared-disk cluster file systems, CFSs). These specialised file systems do perform very well in their target environments provided that applications do not require some lateral features, e.g., no file locking on parallel file systems, and no high performance writes over cluster-wide shared files on CFSs. In brief, we can say that none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds. Our pCFS proposal makes a contribution to change this situation: the rationale is to take advantage on the best of both – the reliability of cluster file systems and the high performance of parallel file systems. We don’t claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage – e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS’ main ideas include: · Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data. · Fine-grain locking, whereby processes running across distinct nodes may define nonoverlapping byte-range regions in a file (instead of the whole file) and access them in parallel, reading and writing over those regions at the infrastructure’s full speed (provided that no major metadata changes are required). A prototype was built on top of GFS (a Red Hat shared disk CFS): GFS’ kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragments) movement over the LAN. Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS’ bandwidth is 2 times greater than NFS’ while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. And pCFS’ bandwidth also surpasses GFS’ (600 times for small record sizes, e.g., 4 KB, decreasing down to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage.Lusitania, Companhia de Seguros S.A, Programa IBM Shared University Research (SUR

    Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors

    Get PDF
    International audienceParallel stochastic simulations tend to exploit more and more computing power and they are now also developed for General Purpose Graphics Process Units (GP-GPUs). Conse-quently, they need reliable random sources to feed their applications. We propose a survey of the current Pseudo Random Numbers Generators (PRNG) available on GPU. We give a particular focus to the recent Mersenne Twister for Graphics Processors (MTGP) that has just been released. Our work provides empirically checked statuses designed to initialize a particular configuration of this generator, in order to prevent any potential bias introduced by the parallelization of the PRNG

    Feasibility study of an Integrated Program for Aerospace vehicle Design (IPAD). Volume 6: IPAD system development and operation

    Get PDF
    The strategy of the IPAD implementation plan presented, proposes a three phase development of the IPAD system and technical modules, and the transfer of this capability from the development environment to the aerospace vehicle design environment. The system and technical module capabilities for each phase of development are described. The system and technical module programming languages are recommended as well as the initial host computer system hardware and operating system. The cost of developing the IPAD technology is estimated. A schedule displaying the flowtime required for each development task is given. A PERT chart gives the developmental relationships of each of the tasks and an estimate of the operational cost of the IPAD system is offered
    • 

    corecore