
    Clustered multithreading for speculative execution


    Infrastructure for washable computing

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1999. Includes bibliographical references (leaves 73-74). Wash-and-wear multilayer electronic circuitry can be constructed on fabric substrates, using conductive textiles and suitably packaged components. Fabrics are perhaps the first composite materials engineered by humanity; their evolution led to the development of the Jacquard loom, which itself led to the development of the modern computer. The development of fabric circuitry is a compelling closure of that cycle, pointing to a new class of textiles that interact with their users and their environments while retaining the properties that made fabric the first ubiquitous "smart material". Fabrics are in several respects superior to existing flexible substrates in terms of durability, conformability, and breathability. The present work adopts a modular approach to circuit fabrication, from which follow circuit design techniques and component packages optimized for use in fabric-based circuitry, flexible all-fabric interconnects, and multilayer circuits. While maintaining close compatibility with existing components, tools, and techniques, the present work demonstrates all steps of a process to create multilayer printed circuits on fabric substrates using conductive textiles. By E. Rehmi Post.

    A Hybrid Hardware/Software Architecture That Combines a 4-wide Very Long Instruction Word Software Processor (VLIW) with Application-specific Super-complex Instruction Set Hardware Functions

    Application-driven processor designs are becoming increasingly feasible. Advances in field-programmable gate array (FPGA) technology are opening the door to fast, practical hardware/software co-designed architectures: over 100,000 FPGA logic array blocks and nearly 100 ASIC multiply-accumulate cores combine with extensible CPU cores to foster the design of configurable, application-driven hybrid processors. This thesis proposes a hardware/software co-designed architecture targeted to an FPGA: a very long instruction word (VLIW) processor coupled with super-complex instruction set (SuperCISC) hardware co-processors. Results for the VLIW/SuperCISC architecture show speedups over a single-issue processor of 9x to 332x, and whole-application speedups of 4x to 127x. Contributions of this research include a 4-way VLIW processor designed from the ground up, a zero-overhead implementation of the hardware/software interface, an evaluation of the scalability of shared data stores, examples of application-specific hardware accelerators, a SystemC simulator, and an evaluation of shared memory configurations.
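
    For readers unfamiliar with VLIW execution, the following is a minimal sketch of the issue model such a processor implies, assuming a hypothetical three-operand register ISA; the thesis's actual instruction encoding and functional units are not described here. Each cycle, one instruction word carries up to four independent operations, and all operations read their sources before any destination is written.

        /* Minimal sketch of 4-wide VLIW issue (hypothetical ISA,
         * not the thesis's actual encoding). */
        #include <stdio.h>

        enum op { NOP, ADD, MUL };

        struct slot   { enum op op; int dst, src1, src2; };
        struct bundle { struct slot slots[4]; };  /* one instruction word, 4 issue slots */

        static int regs[16];

        /* All four slots read their sources before any slot writes its
         * destination, mimicking lock-step parallel issue. */
        static void issue(const struct bundle *b)
        {
            int results[4];
            for (int i = 0; i < 4; i++) {
                const struct slot *s = &b->slots[i];
                switch (s->op) {
                case ADD: results[i] = regs[s->src1] + regs[s->src2]; break;
                case MUL: results[i] = regs[s->src1] * regs[s->src2]; break;
                default:  results[i] = 0; break;
                }
            }
            for (int i = 0; i < 4; i++)
                if (b->slots[i].op != NOP)
                    regs[b->slots[i].dst] = results[i];
        }

        int main(void)
        {
            regs[1] = 3; regs[2] = 4;
            struct bundle b = {{ {ADD, 3, 1, 2}, {MUL, 4, 1, 2}, {NOP}, {NOP} }};
            issue(&b);                                  /* one cycle: two ops retire in parallel */
            printf("r3=%d r4=%d\n", regs[3], regs[4]);  /* r3=7 r4=12 */
            return 0;
        }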

    Performance Characteristics of an Intelligent Memory System

    The memory system is increasingly becoming a performance bottleneck. Several intelligent memory systems, such as the ActivePages, DIVA, and IRAM architectures, have been proposed to alleviate the processor-memory bottleneck. This thesis presents the Memory Arithmetic Unit and Interface (MAUI) architecture, which combines ideas from the ActivePages, DIVA, and ULMT architectures into a new intelligent memory system. A simulator of the MAUI architecture was added to the SimpleScalar v4.0 toolset. Simulation results indicate that the MAUI architecture provides the largest application speedup when operating on datasets far too large to fit in the processor's cache, and when integrated into systems that pair a high-performance DRAM system with a low-performance processor. With a 2000 MHz processor coupled to an 800 MHz DRDRAM memory system, the Stream benchmark, originally written by John D. McCalpin, completed 121% faster in simulations when optimized to use the MAUI architecture.
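
    For context, the loop that dominates the Stream benchmark's memory traffic is McCalpin's "triad" kernel, sketched below; the MAUI offload interface itself is not reproduced here, and the array size N is an assumption chosen to exceed any cache.

        /* Sketch of the STREAM "triad" kernel: two reads and one write
         * per element, so performance is bound by memory bandwidth. */
        #include <stdio.h>
        #include <stdlib.h>

        #define N (1L << 24)   /* assumed size, far larger than any cache */

        int main(void)
        {
            double *a = malloc(N * sizeof *a);
            double *b = malloc(N * sizeof *b);
            double *c = malloc(N * sizeof *c);
            if (!a || !b || !c) return 1;

            for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

            const double scalar = 3.0;
            for (long i = 0; i < N; i++)     /* triad: a = b + scalar*c */
                a[i] = b[i] + scalar * c[i];

            printf("a[0] = %f\n", a[0]);     /* 7.0 */
            free(a); free(b); free(c);
            return 0;
        }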

    An FPGA implementation of an investigative many-core processor, Fynbos: in support of a Fortran autoparallelising software pipeline

    Includes bibliographical references. In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. To achieve greater efficiency in an environment in which Moore's law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading hardware control complexity for hundreds to thousands of simple parallel processing elements, operating at a clock speed low enough to allow the efficiency gains of near-threshold-voltage operation. Performance is therefore dependent on exploiting a degree of fine-grained parallelism currently found only in GPGPUs, but in a manner less restrictive in its range of application domains. While removing the complex control hardware of traditional CPUs makes space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace that control largely with static scheduling, pushing the burden of control primarily to the software, and specifically the compiler, rather than to the programmer or to an application-specific means of control simplification. An existing legacy toolchain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists; this work implements a many-core architecture to match it. Prototyping the design on an FPGA makes it possible to examine the real-world performance of the compiler-architecture system to a greater degree than simulation alone would allow. Comparing theoretical peak performance against measured performance in a case-study application, the system is found to be more efficient than any other reviewed, but also to significantly underperform relative to current competing architectures. This failing is attributed to taking the need for simple hardware too far, and to an inability to implement tactics that mitigate the limits of static scheduling, owing to a lack of support for them in the compiler.
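
    A toy illustration of the static-scheduling idea follows, using an invented schedule-table format rather than Fynbos's actual ISA: the compiler fixes, for every cycle, which operation each processing element (PE) performs, so no run-time scheduling hardware is needed.

        /* Static scheduling sketch: a compile-time table drives each PE
         * every cycle (illustrative only, not Fynbos's instruction set). */
        #include <stdio.h>

        #define PES    4
        #define CYCLES 3

        enum op { NOP, LOAD, ADD, STORE };

        /* schedule[cycle][pe], fixed entirely at "compile time" */
        static const enum op schedule[CYCLES][PES] = {
            { LOAD, LOAD, NOP, NOP   },
            { NOP,  NOP,  ADD, NOP   },
            { NOP,  NOP,  NOP, STORE },
        };

        int main(void)
        {
            static const char *names[] = { "nop", "load", "add", "store" };
            for (int c = 0; c < CYCLES; c++)       /* global lock-step clock */
                for (int p = 0; p < PES; p++)
                    printf("cycle %d, pe %d: %s\n", c, p, names[schedule[c][p]]);
            return 0;
        }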

    Leveraging virtualization technologies for resource partitioning in mixed criticality systems

    Multi- and many-core processors are becoming increasingly popular in embedded systems. Many of these processors now feature hardware virtualization capabilities, such as the ARM Cortex-A15 and x86 processors with Intel VT-x or AMD-V support. Hardware virtualization offers opportunities to partition physical resources, including processor cores, memory, and I/O devices, amongst guest virtual machines. Mixed-criticality systems and services can then co-exist on the same platform in separate virtual machines. However, traditional virtual machine systems are too expensive because of the cost of trapping into the hypervisor to multiplex and manage machine physical resources on behalf of separate guests; for example, a hypervisor is needed to schedule separate VMs on physical processor cores. Additionally, traditional hypervisors have memory footprints that are often too large for many embedded computing systems. This dissertation presents the design of the Quest-V separation kernel, which partitions services of different criticality levels across separate virtual machines, or sandboxes. Each sandbox encapsulates a subset of machine physical resources that it manages without requiring intervention from a hypervisor. In Quest-V, a hypervisor is not needed for normal operation, except to bootstrap the system and establish communication channels between sandboxes. This approach not only reduces the memory footprint of the most privileged protection domain, it also removes it from the control path during normal system operation, thereby heightening security.
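
    As a rough sketch of what such a channel can look like once established, the following is a generic single-producer/single-consumer ring buffer over memory mapped into two domains; it is illustrative only and is not Quest-V's actual channel layout.

        /* Generic SPSC ring buffer of the kind inter-sandbox channels
         * are commonly built on (illustrative, not Quest-V's design). */
        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        #define SLOTS 8u   /* power of two */

        struct channel {
            _Atomic unsigned head;   /* advanced by the producer domain */
            _Atomic unsigned tail;   /* advanced by the consumer domain */
            int slots[SLOTS];        /* resides in memory shared by both */
        };

        static bool send(struct channel *ch, int msg)
        {
            unsigned h = atomic_load(&ch->head);
            if (h - atomic_load(&ch->tail) == SLOTS) return false;  /* full */
            ch->slots[h % SLOTS] = msg;
            atomic_store(&ch->head, h + 1);   /* publish after the write */
            return true;
        }

        static bool recv(struct channel *ch, int *msg)
        {
            unsigned t = atomic_load(&ch->tail);
            if (t == atomic_load(&ch->head)) return false;          /* empty */
            *msg = ch->slots[t % SLOTS];
            atomic_store(&ch->tail, t + 1);
            return true;
        }

        int main(void)
        {
            static struct channel ch;
            int out;
            send(&ch, 42);
            if (recv(&ch, &out)) printf("received %d\n", out);
            return 0;
        }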

    Memory Subsystems for Security, Consistency, and Scalability

    In response to the continuous demand for the ability to process ever-larger datasets, as well as discoveries in next-generation memory technologies, researchers have been vigorously studying memory-driven computing architectures that allow data-intensive applications to access enormous amounts of pooled non-volatile memory. As applications interact with growing numbers of components and datasets, existing systems struggle to efficiently enforce the principle of least privilege for security. While non-volatile memory can retain data even after a power loss and allows for large main memory capacity, programmers must bear the burden of maintaining the consistency of program memory for fault tolerance, as well as handling huge datasets through traditional yet expensive memory management interfaces for scalability. Today's computer systems have become too sophisticated for existing memory subsystems to handle many design requirements. In this dissertation, we introduce three memory subsystems that address challenges in security, consistency, and scalability. Specifically, we propose SMVs to give threads fine-grained control over access privileges for a partially shared address space for security, NVthreads to let programmers easily leverage non-volatile memory with automatic persistence for consistency, and PetaMem to enable memory-centric applications to freely access memory beyond the traditional process boundary with support for memory isolation and crash recovery for security, consistency, and scalability.
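
    A minimal sketch of the consistency burden in question, using the POSIX mmap/msync mechanism that many non-volatile-memory programming models build on; NVthreads' actual API is not shown, and the file name here is hypothetical.

        /* File-backed persistence via POSIX mmap/msync: data is durable
         * only once msync(MS_SYNC) returns (sketch, assumed file name). */
        #include <fcntl.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
            const size_t len = 4096;
            int fd = open("state.bin", O_RDWR | O_CREAT, 0644);
            if (fd < 0 || ftruncate(fd, len) != 0) return 1;

            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) return 1;

            strcpy(p, "survives a crash once msync returns");
            /* force the dirty page to stable storage before continuing */
            if (msync(p, len, MS_SYNC) != 0) return 1;

            munmap(p, len);
            close(fd);
            return 0;
        }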

    A Performance Evaluation of Hypervisor, Unikernel, and Container Network I/O Virtualization

    Hypervisors and containers are the two main virtualization techniques that enable cloud computing. Both techniques incur overheads in CPU, memory, networking, and disk performance compared to bare metal. Unikernels have recently been proposed as an optimization for hypervisor-based virtualization that reduces these overheads. In this thesis, we evaluate the network I/O performance overheads of hypervisor-based virtualization, using the Kernel-based Virtual Machine (KVM) and the OSv unikernel, and of container-based virtualization, using Docker, comparing different configurations and optimizations. We measure raw network latency, throughput, and CPU utilization with the Netperf benchmarking tool, and measure network-intensive application performance using the Memcached key-value store driven by the Mutilate benchmarking tool. We show that, compared to bare-metal Linux, Docker with bridged networking has the lowest performance overhead, with OSv using vhost-net a close second.
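
    The essence of a Netperf-style request/response (TCP_RR) latency test is timing many one-byte ping-pongs and reporting the mean round trip. The sketch below does this over a local socketpair for brevity, whereas Netperf performs the same exchange over a real TCP connection.

        /* Request/response latency measurement in the style of TCP_RR,
         * over a local socketpair (sketch; Netperf uses real TCP). */
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>
        #include <sys/socket.h>

        int main(void)
        {
            int sv[2];
            if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return 1;

            if (fork() == 0) {        /* echo "server" */
                char c;
                close(sv[0]);
                while (read(sv[1], &c, 1) == 1)
                    write(sv[1], &c, 1);
                _exit(0);
            }
            close(sv[1]);

            const int iters = 100000;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < iters; i++) {   /* one request, one response */
                char c = 'x';
                write(sv[0], &c, 1);
                read(sv[0], &c, 1);
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("mean round trip: %.0f ns\n", ns / iters);
            close(sv[0]);                       /* lets the child exit */
            return 0;
        }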

    Nested virtual environments

    Virtual machines (VMs) have been a common computation platform in cloud computing for some time. VMs offer a decent amount of isolation for security and system resources, and from the application perspective they behave much like native environments. Software containers are gaining popularity as a new application delivery technology. Just like VMs, applications started inside containers run in isolated environments, but without the performance overhead caused by virtualizing system resources. This makes containers appear to be a more efficient alternative to VMs. In this thesis, different combinations of containers and VMs are benchmarked. For each benchmark, the host environment is also measured, to understand the overhead caused by the underlying virtual environment technology. The benchmarks cover storage and network access, as well as an application benchmark of compiling the Linux kernel. In another part of the thesis, a CPU-intensive workload is run on the virtualization host server and the benchmarks are repeated, to determine how much the workload affects the benchmark scores, and whether the effect can be observed from the guest side by measuring CPU steal time. Results show that containers are only slightly slower than the host in the application benchmark; the main difference is expected to come from the way Docker handles storage accesses. With the default network configuration, the container also loses to the host in performance. In every benchmark we ran, VMs lost to both the host and containers in performance.
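
    On Linux, the steal time measured from the guest side is exposed as the eighth value on the "cpu" line of /proc/stat, counted in clock ticks; a minimal sketch of reading it follows.

        /* Read CPU "steal" time on Linux: the 8th value on the "cpu"
         * line of /proc/stat, i.e. ticks lost to the hypervisor. */
        #include <stdio.h>

        int main(void)
        {
            FILE *f = fopen("/proc/stat", "r");
            if (!f) return 1;

            unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
            if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                       &user, &nice, &sys, &idle, &iowait, &irq,
                       &softirq, &steal) == 8)
                printf("steal = %llu ticks\n", steal);

            fclose(f);
            return 0;
        }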