Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected by both communication infrastructure factors and computing infrastructure factors. Communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several-orders-of-magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors. This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks. A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems: System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu.
ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment
Applications in science and engineering often require huge computational
resources for solving problems within a reasonable time frame. Parallel
supercomputers provide the computational infrastructure for solving such
problems. A traditional application scheduler running on a parallel cluster
only supports static scheduling where the number of processors allocated to an
application remains fixed throughout the lifetime of execution of the job. Due
to the unpredictability in job arrival times and varying resource requirements,
static scheduling can result in idle system resources thereby decreasing the
overall system throughput. In this paper we present a prototype framework
called ReSHAPE, which supports dynamic resizing of parallel MPI applications
executed on distributed memory platforms. The framework includes a scheduler
that supports resizing of applications, an API to enable applications to
interact with the scheduler, and a library that makes resizing viable.
Applications executed using the ReSHAPE scheduler framework can expand to take
advantage of additional free processors or can shrink to accommodate a high
priority application, without getting suspended. In our research, we have
mainly focused on structured applications that have two-dimensional data arrays
distributed across a two-dimensional processor grid. The resize library
includes algorithms for processor selection and processor mapping. Experimental
results show that the ReSHAPE framework can improve individual job turn-around
time and overall system throughput. Comment: 15 pages, 10 figures, 5 tables. Submitted to the International Conference
on Parallel Processing (ICPP'07).
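The abstract mentions an API through which a running application interacts with the scheduler to grow or shrink its processor allocation between iterations. The following is a minimal sketch of what such a probe/resize loop might look like; all names here (`Scheduler`, `probe`, `run_resizable_job`) are hypothetical illustrations, not the actual ReSHAPE API.

```python
# Hypothetical sketch of a ReSHAPE-style resize loop. The real API names
# and the scheduler's expansion policy are not given in the abstract.

class Scheduler:
    """Toy stand-in for a resizing scheduler: decides processor counts."""
    def __init__(self, free_processors):
        self.free = free_processors

    def probe(self, current_np):
        """Offer an expanded processor count if free processors exist;
        otherwise keep the current allocation (a shrink request would
        arrive here too, e.g. to make room for a high-priority job)."""
        if self.free > 0:
            grant = min(self.free, current_np)  # e.g. grow by up to 2x
            self.free -= grant
            return current_np + grant
        return current_np

def run_resizable_job(scheduler, np_start, iterations):
    np_now = np_start
    history = []
    for _ in range(iterations):
        # ... one iteration of the parallel computation ...
        np_now = scheduler.probe(np_now)  # contact scheduler between iterations
        history.append(np_now)            # remap the 2-D data grid here
    return history

print(run_resizable_job(Scheduler(free_processors=4), np_start=4, iterations=3))
```

The key design point the abstract describes is that resizing happens at iteration boundaries chosen by the application, so the job is never suspended; it simply redistributes its data onto the new processor grid and continues.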
Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks
Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality in the same chip, has been the driving force behind delivering higher-performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as a Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running on network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote-memory-access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, and (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at the system level, in particular by exploiting the redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on lifetime reliability.
We propose two recovery schemes based on a checkpoint-and-rollback and a roll-forward technique. For the latter, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that they can be achieved without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library, or a hardware core to be added to the basic architecture.
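The abstract contrasts an optimal ILP formulation (for design time) with heuristics used at run-time for remapping tasks off faulty processing elements. As a hedged illustration of the run-time side only, here is a simple greedy remapping heuristic; it is not the thesis's actual algorithm, just a sketch of the idea of moving tasks from a failed PE onto the least-loaded healthy one.

```python
# Illustrative greedy fault-aware remapping (not the thesis's algorithm):
# tasks mapped to a permanently faulty PE are moved, one at a time, to
# whichever healthy PE currently has the lowest load.

def remap_on_fault(mapping, loads, faulty_pe):
    """mapping: task -> PE id; loads: PE id -> current load.
    Returns a new task-to-PE mapping that avoids the faulty PE."""
    healthy = {pe: load for pe, load in loads.items() if pe != faulty_pe}
    new_mapping = dict(mapping)
    for task, pe in mapping.items():
        if pe == faulty_pe:
            target = min(healthy, key=healthy.get)  # least-loaded healthy PE
            new_mapping[task] = target
            healthy[target] += 1                    # assume unit task load
    return new_mapping

# PE 0 fails; its two tasks are spread over PEs 2 and 1.
print(remap_on_fault({"t0": 0, "t1": 1, "t2": 0}, {0: 2, 1: 1, 2: 0}, faulty_pe=0))
```

An ILP formulation of the same problem would instead choose all placements jointly (e.g. minimizing maximum PE load or communication cost), which is why the thesis reserves it for design time and uses heuristics like the above online.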
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of mobile devices alone now
approaching the population of the Earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need for power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help researchers and application developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow.
A Survey of Techniques for Architecting TLBs
A translation lookaside buffer (TLB) caches virtual-to-physical address translation information and is used
in systems ranging from embedded devices to high-end servers. Since the TLB is accessed very frequently
and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance
and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and
managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and
distinctions. We believe that this paper will be useful for chip designers, computer architects, and system
engineers.
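To make the mechanism the survey studies concrete, here is a deliberately simplified model of a small, direct-mapped TLB. Real TLBs are hardware structures, usually set-associative and co-managed with the OS; the sizes and policy below are illustrative only.

```python
# Toy model of a direct-mapped TLB: a hit avoids the page-table walk,
# a miss walks the page table and refills the entry. Real TLBs are
# set-associative hardware; this is only an illustration.

PAGE_SHIFT = 12            # 4 KiB pages
TLB_ENTRIES = 16

class TinyTLB:
    def __init__(self):
        self.entries = [None] * TLB_ENTRIES   # slot -> (vpn, pfn)
        self.hits = self.misses = 0

    def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        idx = vpn % TLB_ENTRIES
        entry = self.entries[idx]
        if entry and entry[0] == vpn:          # TLB hit: no page-table walk
            self.hits += 1
            pfn = entry[1]
        else:                                   # TLB miss: walk and refill
            self.misses += 1
            pfn = page_table[vpn]
            self.entries[idx] = (vpn, pfn)
        return (pfn << PAGE_SHIFT) | offset

tlb = TinyTLB()
page_table = {0: 7, 1: 3}                       # vpn -> pfn
print(hex(tlb.translate(0x1234, page_table)))   # miss: walks, refills
print(hex(tlb.translate(0x1FFF, page_table)))   # hit: same page, no walk
print(tlb.hits, tlb.misses)
```

The survey's premise follows directly from this structure: because every memory access passes through the translation step, small changes in TLB reach, associativity, or replacement policy have outsized effects on both performance and energy.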
ACCELERATING STORAGE APPLICATIONS WITH EMERGING KEY VALUE STORAGE DEVICES
With the continuous data explosion in the big data era, traditional software and hardware stacks
are facing unprecedented challenges in operating at such data scales. Thus, designing new
architectures and efficient systems for data-oriented applications has become increasingly critical.
This motivates us to rethink conventional storage system design and re-architect both
software and hardware to meet the challenges of scale.
Besides the fast growth of data volume, the increasing demands of storage applications such
as video streaming and data analytics are pushing high-performance flash-based storage devices to
replace traditional spinning disks. This all-flash era increases data reliability concerns
due to the endurance problem of flash devices. Key-value stores (KVS) are an important storage
infrastructure for handling fast-growing unstructured data and have been widely deployed in a
variety of scale-out enterprise applications such as online retail, big data analytics, social networks,
etc. How to efficiently manage data redundancy for key-value stores to provide data reliability, and how
to efficiently support range queries for key-value stores to accelerate analytics-oriented applications
under emerging key-value store system architectures, have become important research problems.
In this research, we focus on how to design new software-hardware architectures for key-value
store applications to provide reliability and improve query performance. In order to address
the different issues identified in this dissertation, we propose to employ a logical key management
layer, a thin layer above the KV devices that maps logical keys into physical keys on the devices.
We show how such a layer can enable multiple solutions to improve the performance and reliability
of KVSSD-based storage systems. First, we present KVRAID, a high-performance, write-efficient
erasure coding management scheme for emerging key-value SSDs. The core innovation
of KVRAID is a logical key management layer that maps logical keys to physical keys
to efficiently pack similar-sized KV objects and dynamically manage the membership of erasure
coding groups. Unlike existing schemes, which manage erasure codes at the block level, KVRAID
manages the erasure codes at the KV object level. In order to achieve better storage efficiency for variable-sized objects, KVRAID predefines multiple fixed sizes (slabs) according to the object size
distribution for the erasure code. KVRAID uses a logical-to-physical key conversion to pack
KV objects of similar size into a parity group. KVRAID uses a lazy deletion mechanism with a
garbage collector for object updates. Our experiments show that in the 100% put case, KVRAID outperforms
software block RAID by 18x in throughput and reduces write amplification
(WAF) by 15x with only ~5% CPU utilization. In mixed update/get workloads, KVRAID achieves ~4x
better throughput with ~23% CPU utilization and reduces the storage overhead and WAF by 3.6x
and 11.3x on average, respectively.
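The two KVRAID mechanisms named above, slab-based size classes and a logical-to-physical key layer that groups similar-sized objects into parity groups, can be sketched as follows. The slab sizes, group size `k`, and bookkeeping here are illustrative assumptions, not the dissertation's actual parameters or data structures.

```python
# Hedged sketch of KVRAID-style slab selection and parity-group packing.
# SLAB_SIZES and the k objects-per-group policy are hypothetical.

import bisect

SLAB_SIZES = [256, 1024, 4096, 16384]   # illustrative predefined slabs (bytes)

def pick_slab(obj_size):
    """Return the smallest slab that fits the object (None if too large)."""
    i = bisect.bisect_left(SLAB_SIZES, obj_size)
    return SLAB_SIZES[i] if i < len(SLAB_SIZES) else None

class LogicalKeyMap:
    """Thin logical->physical key layer: collects k same-slab objects
    into an erasure-coding parity group, then seals the group."""
    def __init__(self, k=4):
        self.k = k
        self.open_groups = {}    # slab size -> logical keys awaiting sealing
        self.physical = {}       # logical key -> (slab, group id, slot)
        self.next_group = 0

    def put(self, key, size):
        slab = pick_slab(size)
        group = self.open_groups.setdefault(slab, [])
        group.append(key)
        self.physical[key] = (slab, self.next_group, len(group) - 1)
        if len(group) == self.k:            # group full: seal it; parity
            self.open_groups[slab] = []     # would be computed here
            self.next_group += 1
        return self.physical[key]

m = LogicalKeyMap(k=2)
print(m.put("a", 100), m.put("b", 200), m.put("c", 3000))
```

Packing only similar-sized objects into a group is what keeps the parity object close in size to its data objects, which is where the storage-efficiency win for variable-sized objects comes from.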
Second, we present KVRangeDB, an ordered, log-structured-tree-based key index that supports
range queries on a hash-based KVSSD. In addition, we propose to pack smaller application records
into a larger physical record on the device through the logical key management layer. We compared
the performance of KVRangeDB against a RocksDB implementation on KVSSD and the state-of-the-art
software KV store Wisckey on a block device, on three types of real-world applications:
cloud-serving workloads, the TABLEFS filesystem, and time-series databases. For cloud-serving applications,
KVRangeDB achieves 8.3x and 1.7x better 99.9th-percentile write tail latency compared
to the RocksDB implementation on KV-SSD and Wisckey on block SSD, respectively. On the query side,
KVRangeDB only performs worse for very long scans, but provides fast point queries and
closed range queries. The experiments on TABLEFS demonstrate that using KVRangeDB for
metadata indexing can boost performance by a factor of ~6.3x on average and reduce CPU cost by ~3.9x
for four metadata-intensive workloads compared to the RocksDB implementation on KVSSD.
Compared to Wisckey, KVRangeDB improves performance by ~2.6x on average and reduces
CPU usage by ~1.7x.
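The core idea, an ordered key index layered over a hash-based KV device so that point lookups stay one hash away while range scans walk the sorted index, can be sketched as below. A sorted list stands in for KVRangeDB's log-structured tree, and the hashing scheme is an assumption; neither reflects the dissertation's actual implementation.

```python
# Hedged sketch of an ordered index over a hash-based KV device.
# A sorted list stands in for the LSM-style key index; hashlib stands
# in for the device's hash-based physical key space.

import bisect
import hashlib

class OrderedKVIndex:
    def __init__(self):
        self.keys = []        # sorted logical keys (stand-in for an LSM tree)
        self.device = {}      # hash-based KV device: physical key -> value

    @staticmethod
    def _phys(key):
        return hashlib.sha1(key.encode()).hexdigest()[:16]

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)
        self.device[self._phys(key)] = value

    def get(self, key):
        return self.device.get(self._phys(key))   # point query: one lookup

    def scan(self, lo, hi):
        """Closed range query over logical keys in [lo, hi]."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [(k, self.device[self._phys(k)]) for k in self.keys[i:j]]

db = OrderedKVIndex()
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    db.put(k, v)
print(db.scan("a", "b"))
```

This split also explains the reported trade-off: point and short closed-range queries are cheap, while very long scans pay one device lookup per key in the range.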
Third, we propose a generic FPGA accelerator for emerging Minimum Storage Regenerating
(MSR) code encoding/decoding that maximizes computation parallelism and minimizes
data movement between off-chip DRAM and the on-chip SRAM buffers. To demonstrate the
efficiency of our proposed accelerator, we implemented the encoding/decoding algorithms for a
specific MSR code called Zigzag code on a Xilinx VCU1525 acceleration card. Our evaluation shows that our proposed accelerator can achieve ~2.4-3.1x better throughput and ~4.2-5.7x better
power efficiency compared to the state-of-the-art multi-core CPU implementation, and ~2.8-3.3x better
throughput and ~4.2-5.3x better power efficiency compared to a modern GPU accelerator.
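MSR/Zigzag codes are far more involved than can be shown here, but the basic erasure-coding operations the accelerator parallelizes, encoding data blocks into parity and regenerating a lost block from survivors, can be illustrated with the simplest possible code: XOR single parity. This is only a conceptual stand-in, not an MSR code.

```python
# XOR single-parity encode/decode: a minimal stand-in for the encode
# and regenerate operations that MSR-code accelerators parallelize.
# (Real MSR/Zigzag codes tolerate multiple erasures and minimize
# repair bandwidth; XOR parity tolerates exactly one.)

def encode_parity(blocks):
    """XOR equal-length byte blocks into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Rebuild one lost data block from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks)
                 if i != lost_index and b is not None]
    return encode_parity(survivors + [parity])

data = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
parity = encode_parity(data)
print(recover([data[0], None, data[2]], parity, lost_index=1))
```

Both operations are embarrassingly parallel across byte positions and stripes, which is the property the FPGA design exploits while streaming stripes between DRAM and on-chip SRAM.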