Search CORE

1,653 research outputs found

Recommended from our members

Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures

Author: Schmidt Douglas C.
Suda Tatsuya
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Several communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several orders of magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu

eScholarship - University of California

Application-Level Performance Improvement for Stream Program on CGRA-based systems

Author: Lee Hongsik
Publication venue: Graduate School of UNIST
Publication date: 01/02/2016
Field of study

Department of Computer EngineeringCoarse-Grained Reconfigurable Architectures (CGRAs), often used as coprocessors for DSP and multimedia kernels, can deliver highly energy-effcient execution for compute-intensive kernels. Simultaneously, stream applications, which consist of many actors and channels connecting them, can provide natural representations for DSP applications, and therefore be a good match for CGRAs. We present our results of mapping DSP applications written in StreamIt language to CGRAs, along with our mapping flow. One important challenge in mapping is how to manage the multitude of kernels in the application for the limited local memory of a CGRA, for which we present a novel integer linear programming-based solution. Our evaluation results demonstrate that our software and hardware optimizations can help generate highly effcient mapping of stream applications to CGRAs, enabling far more energy-effcient executions (7x worse to 50x better) compared to using state-of-theart GP-GPUs. Further, we eliminate communication overhead and reduce computation overhead using combination of sychronous/asynchronous processors and DMA. This optimization also improve performance by 17.1% on average comparing to baseline system.ope

ScholarWorks@UNIST

Virtualization of network I/O on modern operating systems

Author: Okumura Takashi
Publication venue
Publication date: 22/06/2007
Field of study

Network I/O of modern operating systems is incomplete. In this networkage, users and their applications are still unable to control theirown traffic, even on their local host. Network I/O is a sharedresource of a host machine, and traditionally, to address problemswith a shared resource, system research has virtualized the resource.Therefore, it is reasonable to ask if the virtualization can providesolutions to problems in network I/O of modern operating systems, inthe same way as the other components of computer systems, such asmemory and CPU. With the aim of establishing the virtualization ofnetwork I/O as a design principle of operating systems, thisdissertation first presents a virtualization model, hierarchicalvirtualization of network interface. Systematic evaluation illustratesthat the virtualization model possesses desirable properties forvirtualization of network I/O, namely flexible control granularity,resource protection, partitioning of resource consumption, properaccess control and generality as a control model. The implementedprototype exhibits practical performance with expected functionality,and allowed flexible and dynamic network control by users andapplications, unlike existing systems designed solely for systemadministrators. However, because the implementation was hardcoded inkernel source code, the prototype was not perfect in its functionalcoverage and flexibility. Accordingly, this dissertation investigatedhow to decouple OS kernels and packet processing code throughvirtualization, and studied three degrees of code virtualization,namely, limited virtualization, partial virtualization, and completevirtualization. In this process, a novel programming model waspresented, based on embedded Java technology, and the prototypeimplementation exhibited the following characteristics, which aredesirable for network code virtualization. First, users program inJava to carry out safe and simple programming for packetprocessing. Second, anyone, even untrusted applications, can performinjection of packet processing code in the kernel, due to isolation ofcode execution. Third, the prototype implementation empirically provedthat such a virtualization does not jeopardize system performance.These cases illustrate advantages of virtualization, and suggest thatthe hierarchical virtualization of network interfaces can be aneffective solution to problems in network I/O of modern operatingsystems, both in the control model and in implementation

D-Scholarship@Pitt

GPGPU-Enabled Physics Based Deformed Model Simulation

Author: Hernandez Samaniego Alejandro Enedino
Publication venue: UNM Digital Repository
Publication date: 12/09/2014
Field of study

Computer simulation techniques are widely adopted nowadays in many areas like manufacturing, engineering, graphics, animation, virtual reality and so on. However, the standard finite element based simulation is notorious for its expensive computation. To address this challenge, I present a GPU-based parallel implementation for simulating large elastic deformation. Classic modal analysis provides a set of orthonormal bases vectors, which span a spectral space encoding the dynamics of the elastic body. As each basis vector is orthogonal to each other, the computation is completely decoupled and can be well-fit into the modern GPGPU platform. We further explore the latest feature of NVIDIA CUDA so that the result of GPU computation can be directly used for upcoming rendering/visualization and a significant amount of overheads for transmitting data from client GPU and host CPU via the PCI-Express bus are avoided. Real-time simulation is made possible with this technique for many cases that otherwise is not possible

HeTM: Transactional Memory for Heterogeneous Systems

Author: Castro Daniel
Ilic Aleksandar
Khan Amin M.
Romano Paolo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/09/2019
Field of study

Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages on speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU's and GPU's sides, which allows the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of the SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a porting of a popular object caching system.Comment: The current work was accepted in the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19

arXiv.org e-Print Archive

Crossref