134 research outputs found

    Virtualization for a Network Processor Runtime System

    Get PDF
    The continuing ossification of the Internet is slowing the pace of network innovation. Network diversification presents one solution to this problem, by virtualizing the network at multiple layers. Diversified networks consist of a shared physical substrate, virtual routers (metarouters), and virtual links (metalinks). Virtualizing routers enables smooth and incremental upgrades to new network services. Our current priority for a diversified router prototype is to enable reserved slices of the network for researchers to perform repeatable, high-speed network experiments. General-purpose processors have well established techniques for virtualization, but do not scale efficiently to multi-gigabit speeds. To achieve these speeds, we employ network processors (NPs), typically consisting of multicore, multi-threaded processors with asymmetric, heterogeneous memories. The complexity and lack of hardware thread isolation in NP’s, combined with a lack of simple programming models, creates numerous challenges for effective sharing between metarouters. In this paper, we detail strategies for enabling NP virtualization at the link, memory, and processor levels, to better enable a research infrastructure for network innovation

    Instruction fusion and vector processor virtualization for higher throughput simultaneous multithreaded processors

    Get PDF
    The utilization wall, caused by the breakdown of threshold voltage scaling, hinders performance gains for new generation microprocessors. To alleviate its impact, an instruction fusion technique is first proposed for multiscalar and many-core processors. With instruction fusion, similar copies of an instruction to be run on multiple pipelines or cores are merged into a single copy for simultaneous execution. Instruction fusion applied to vector code enables the processor to idle early pipeline stages and instruction caches at various times during program implementation with minimum performance degradation, while reducing the program size and the required instruction memory bandwidth. Instruction fusion is applied to a MIPS-based dual-core that resembles an ideal multiscalar of degree two. Benchmarking using an FPGA prototype shows a 6-11% reduction in dynamic power dissipation as well as a 17-45% decrease in code size with frequent performance improvements due to higher instruction cache hit rates. The second part of this dissertation deals with vector processors (VPs) which are commonly assigned exclusively to a single thread/core, and are not often performance and energy efficient due to mismatches with the vector needs of individual applications. An easy-to-implement VP virtualization technology is presented to improve the VP in terms of utilization and energy efficiency. The proposed VP virtualization technology, when applied, improves aggregate VP utilization by enabling simultaneous execution of multiple threads of similar or disparate vector lengths on a multithreaded VP. With a vector register file (VRF) virtualization technique invented to dynamically allocate physical vector registers to threads, the virtualization approach improves programmer productivity by providing at run time a distinct physical register name space to each competing thread, thus eliminating the need to solve register name conflicts statically. The virtualization technique is applied to a multithreaded VP prototyped on an FPGA; it supports VP sharing as well as power gating for better energy efficiency. A throughput-driven scheduler is proposed to optimize the virtualized VP’s utilization in dynamic environments where diverse threads are created randomly. Simulations of various low utilization benchmarks show that, with the proposed scheduler and power gating, the virtualized VP yields a larger than 3-fold speedup while the reduction in the total energy consumption approaches 40% compared to the same VP running in the single-threaded mode. The third part of this dissertation focuses on combining the two aforementioned technologies to create an improved VP prototype that is fully virtualized to support thread fusion and dynamic lane-based power-gating (PG). The VP is capable of dynamically triggering thread fusion according to the availability of similar threads in the task queue. Once thread fusion is triggered, every vector instruction issued to the virtualized VP is interpreted as two similar instructions working in two independent virtual spaces, thus doubling the vector instruction issue rate. Based on an accurate power model of the VP prototype, two different policies are proposed to dynamically choose the optimal number of active VP lanes. With the combined effort of VP lane-based PG and thread fusion, compared to a conventional VP without the two proposed capabilities, benchmarking shows that the new prototype yields up to 33.8% energy reduction in addition to 40% runtime improvement, or up to 62.7% reduction in the product of energy and runtime

    Virtual network security: threats, countermeasures, and challenges

    Get PDF
    Network virtualization has become increasingly prominent in recent years. It enables the creation of network infrastructures that are specifically tailored to the needs of distinct network applications and supports the instantiation of favorable environments for the development and evaluation of new architectures and protocols. Despite the wide applicability of network virtualization, the shared use of routing devices and communication channels leads to a series of security-related concerns. It is necessary to provide protection to virtual network infrastructures in order to enable their use in real, large scale environments. In this paper, we present an overview of the state of the art concerning virtual network security. We discuss the main challenges related to this kind of environment, some of the major threats, as well as solutions proposed in the literature that aim to deal with different security aspects.Network virtualization has become increasingly prominent in recent years. It enables the creation of network infrastructures that are specifically tailored to the needs of distinct network applications and supports the instantiation of favorable environme61CNPQ - CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICORNP - REDE NACIONAL DE ENSINO E PESQUISAFAPERGS - FUNDAÇÃO DE AMPARO À PESQUISA DO ESTADO DO RIO GRANDE DO SULsem informaçãosem informaçãosem informaçã

    Accurate emulation of CPU performance

    Get PDF
    This paper addresses the question of CPU performance emulation, which allows experimenters to evaluate applications under a wide range of reproducible experimental conditions. Specifically, we propose Fracas, a CPU emulator that leverages the Linux Completely Fair Scheduler to achieve performance emulation of homogeneous or heterogeneous multi-core systems. Several benchmarks reproducing different types of workload (CPU-bound, IO-bound) are then used to thoroughly compare Fracas with another CPU emulator and hardware frequency scaling. We show that the design of Fracas results in a more accurate and a less intrusive CPU emulation solution

    Overlay virtualized wireless sensor networks for application in industrial internet of things : a review

    Get PDF
    Abstract: In recent times, Wireless Sensor Networks (WSNs) are broadly applied in the Industrial Internet of Things (IIoT) in order to enhance the productivity and efficiency of existing and prospective manufacturing industries. In particular, an area of interest that concerns the use of WSNs in IIoT is the concept of sensor network virtualization and overlay networks. Both network virtualization and overlay networks are considered contemporary because they provide the capacity to create services and applications at the edge of existing virtual networks without changing the underlying infrastructure. This capability makes both network virtualization and overlay network services highly beneficial, particularly for the dynamic needs of IIoT based applications such as in smart industry applications, smart city, and smart home applications. Consequently, the study of both WSN virtualization and overlay networks has become highly patronized in the literature, leading to the growth and maturity of the research area. In line with this growth, this paper provides a review of the development made thus far concerning virtualized sensor networks, with emphasis on the application of overlay networks in IIoT. Principally, the process of virtualization in WSN is discussed along with its importance in IIoT applications. Different challenges in WSN are also presented along with possible solutions given by the use of virtualized WSNs. Further details are also presented concerning the use of overlay networks as the next step to supporting virtualization in shared sensor networks. Our discussion closes with an exposition of the existing challenges in the use of virtualized WSN for IIoT applications. In general, because overlay networks will be contributory to the future development and advancement of smart industrial and smart city applications, this review may be considered by researchers as a reference point for those particularly interested in the study of this growing field

    Virtualization services: scalable methods for virtualizing multicore systems

    Get PDF
    Multi-core technology is bringing parallel processing capabilities from servers to laptops and even handheld devices. At the same time, platform support for system virtualization is making it easier to consolidate server and client resources, when and as needed by applications. This consolidation is achieved by dynamically mapping the virtual machines on which applications run to underlying physical machines and their processing cores. Low cost processor and I/O virtualization methods efficiently scaled to different numbers of processing cores and I/O devices are key enablers of such consolidation. This dissertation develops and evaluates new methods for scaling virtualization functionality to multi-core and future many-core systems. Specifically, it re-architects virtualization functionality to improve scalability and better exploit multi-core system resources. Results from this work include a self-virtualized I/O abstraction, which virtualizes I/O so as to flexibly use different platforms' processing and I/O resources. Flexibility affords improved performance and resource usage and most importantly, better scalability than that offered by current I/O virtualization solutions. Further, by describing system virtualization as a service provided to virtual machines and the underlying computing platform, this service can be enhanced to provide new and innovative functionality. For example, a virtual device may provide obfuscated data to guest operating systems to maintain data privacy; it could mask differences in device APIs or properties to deal with heterogeneous underlying resources; or it could control access to data based on the ``trust' properties of the guest VM. This thesis demonstrates that extended virtualization services are superior to existing operating system or user-level implementations of such functionality, for multiple reasons. First, this solution technique makes more efficient use of key performance-limiting resource in multi-core systems, which are memory and I/O bandwidth. Second, this solution technique better exploits the parallelism inherent in multi-core architectures and exhibits good scalability properties, in part because at the hypervisor level, there is greater control in precisely which and how resources are used to realize extended virtualization services. Improved control over resource usage makes it possible to provide value-added functionalities for both guest VMs and the platform. Specific instances of virtualization services described in this thesis are the network virtualization service that exploits heterogeneous processing cores, a storage virtualization service that provides location transparent access to block devices by extending the functionality provided by network virtualization service, a multimedia virtualization service that allows efficient media device sharing based on semantic information, and an object-based storage service with enhanced access control.Ph.D.Committee Chair: Schwan, Karsten; Committee Member: Ahamad, Mustaq; Committee Member: Fujimoto, Richard; Committee Member: Gavrilovska, Ada; Committee Member: Owen, Henry; Committee Member: Xenidis, Jim

    Doctor of Philosophy

    Get PDF
    dissertationWith the explosion of chip transistor counts, the semiconductor industry has struggled with ways to continue scaling computing performance in line with historical trends. In recent years, the de facto solution to utilize excess transistors has been to increase the size of the on-chip data cache, allowing fast access to an increased portion of main memory. These large caches allowed the continued scaling of single thread performance, which had not yet reached the limit of instruction level parallelism (ILP). As we approach the potential limits of parallelism within a single threaded application, new approaches such as chip multiprocessors (CMP) have become popular for scaling performance utilizing thread level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area where single threaded performance and multithreaded performance have often been ignored by computer architects. We propose that novel hardware and OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intense workloads and that this OS contribution grows to a significant fraction of total instructions when executing several common applications found in the datacenter. We demonstrate that architectural improvements have had little to no effect on the performance of the OS over the last 15 years, leaving ample room for improvements. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider the potential of a separate operating system processor (OSP) operating concurrently with general purpose processors (GPP) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors. Second, we consider the potential of segregating existing caching structures to decrease cache interference between the OS and application. Third, we propose that there are components within the OS itself that should be refactored to be both multithreaded and cache topology aware, which in turn, improves the performance and scalability of many-threaded applications
    corecore