1,653 research outputs found
Recommended from our members
Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Several communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several orders of magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu
Application-Level Performance Improvement for Stream Program on CGRA-based systems
Department of Computer EngineeringCoarse-Grained Reconfigurable Architectures (CGRAs), often used as coprocessors for DSP and multimedia kernels, can deliver highly energy-effcient execution for compute-intensive kernels. Simultaneously, stream applications, which consist of many actors and channels connecting them, can provide natural representations for DSP applications, and therefore be a good match for CGRAs. We present our results of mapping DSP applications written in StreamIt language to CGRAs, along with our mapping flow. One important challenge in mapping is how to manage the multitude of kernels in the application for the limited local memory of a CGRA, for which we present a novel integer linear programming-based solution. Our evaluation results demonstrate that our software and hardware optimizations can help generate highly effcient mapping of stream applications to CGRAs, enabling far more energy-effcient executions (7x worse to 50x better) compared to using state-of-theart GP-GPUs. Further, we eliminate communication overhead and reduce computation overhead using combination of sychronous/asynchronous processors and DMA. This optimization also improve performance by 17.1% on average comparing to baseline system.ope
Virtualization of network I/O on modern operating systems
Network I/O of modern operating systems is incomplete. In this networkage, users and their applications are still unable to control theirown traffic, even on their local host. Network I/O is a sharedresource of a host machine, and traditionally, to address problemswith a shared resource, system research has virtualized the resource.Therefore, it is reasonable to ask if the virtualization can providesolutions to problems in network I/O of modern operating systems, inthe same way as the other components of computer systems, such asmemory and CPU. With the aim of establishing the virtualization ofnetwork I/O as a design principle of operating systems, thisdissertation first presents a virtualization model, hierarchicalvirtualization of network interface. Systematic evaluation illustratesthat the virtualization model possesses desirable properties forvirtualization of network I/O, namely flexible control granularity,resource protection, partitioning of resource consumption, properaccess control and generality as a control model. The implementedprototype exhibits practical performance with expected functionality,and allowed flexible and dynamic network control by users andapplications, unlike existing systems designed solely for systemadministrators. However, because the implementation was hardcoded inkernel source code, the prototype was not perfect in its functionalcoverage and flexibility. Accordingly, this dissertation investigatedhow to decouple OS kernels and packet processing code throughvirtualization, and studied three degrees of code virtualization,namely, limited virtualization, partial virtualization, and completevirtualization. In this process, a novel programming model waspresented, based on embedded Java technology, and the prototypeimplementation exhibited the following characteristics, which aredesirable for network code virtualization. First, users program inJava to carry out safe and simple programming for packetprocessing. Second, anyone, even untrusted applications, can performinjection of packet processing code in the kernel, due to isolation ofcode execution. Third, the prototype implementation empirically provedthat such a virtualization does not jeopardize system performance.These cases illustrate advantages of virtualization, and suggest thatthe hierarchical virtualization of network interfaces can be aneffective solution to problems in network I/O of modern operatingsystems, both in the control model and in implementation
GPGPU-Enabled Physics Based Deformed Model Simulation
Computer simulation techniques are widely adopted nowadays in many areas like manufacturing, engineering, graphics, animation, virtual reality and so on. However, the standard finite element based simulation is notorious for its expensive computation. To address this challenge, I present a GPU-based parallel implementation for simulating large elastic deformation. Classic modal analysis provides a set of orthonormal bases vectors, which span a spectral space encoding the dynamics of the elastic body. As each basis vector is orthogonal to each other, the computation is completely decoupled and can be well-fit into the modern GPGPU platform. We further explore the latest feature of NVIDIA CUDA so that the result of GPU computation can be directly used for upcoming rendering/visualization and a significant amount of overheads for transmitting data from client GPU and host CPU via the PCI-Express bus are avoided. Real-time simulation is made possible with this technique for many cases that otherwise is not possible
HeTM: Transactional Memory for Heterogeneous Systems
Modern heterogeneous computing architectures, which couple multi-core CPUs
with discrete many-core GPUs (or other specialized hardware accelerators),
enable unprecedented peak performance and energy efficiency levels.
Unfortunately, though, developing applications that can take full advantage of
the potential of heterogeneous systems is a notoriously hard task. This work
takes a step towards reducing the complexity of programming heterogeneous
systems by introducing the abstraction of Heterogeneous Transactional Memory
(HeTM). HeTM provides programmers with the illusion of a single memory region,
shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with
support for atomic transactions. Besides introducing the abstract semantics and
programming model of HeTM, we present the design and evaluation of a concrete
implementation of the proposed abstraction, which we named Speculative HeTM
(SHeTM). SHeTM makes use of a novel design that leverages on speculative
techniques and aims at hiding the inherently large communication latency
between CPUs and discrete GPUs and at minimizing inter-device synchronization
overhead. SHeTM is based on a modular and extensible design that allows for
easily integrating alternative TM implementations on the CPU's and GPU's sides,
which allows the flexibility to adopt, on either side, the TM implementation
(e.g., in hardware or software) that best fits the applications' workload and
the architectural characteristics of the processing unit. We demonstrate the
efficiency of the SHeTM via an extensive quantitative study based both on
synthetic benchmarks and on a porting of a popular object caching system.Comment: The current work was accepted in the 28th International Conference on
Parallel Architectures and Compilation Techniques (PACT'19
- …