Search CORE

646 research outputs found

Improving GPGPU Energy-Efficiency through Concurrent Kernel Execution and DVFS

Author: JIAO QING
Publication venue
Publication date: 22/09/2014
Field of study

Master'sMASTER OF SCIENC

ScholarBank@NUS

Comparative study of a time diversity scheme applied to G3 systems for narrowband power-line communications

Author: Rivard Yves-François
Publication venue
Publication date: 01/01/2016
Field of study

A dissertation submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in ful lment of the requirements for the degree of Masters of Science in Engineering (Electrical). Johannesburg, 2016Power-line communications can be used for the transfer of data across electrical net- works in applications such as automatic meter reading in smart grid technology. As the power-line channel is harsh and plagued with non-Gaussian noise, robust forward error correction schemes are required. This research is a comparative study where a Luby transform code is concatenated with power-line communication systems provided by an up-to-date standard published by electricit e R eseau Distribution France named G3 PLC. Both decoding using Gaussian elimination and belief propagation are imple- mented to investigate and characterise their behaviour through computer simulations in MATLAB. Results show that a bit error rate performance improvement is achiev- able under non worst-case channel conditions using a Gaussian elimination decoder. An adaptive system is thus recommended which decodes using Gaussian elimination and which has the appropriate data rate. The added complexity can be well tolerated especially on the receiver side in automatic meter reading systems due to the network structure being built around a centralised agent which possesses more resources.MT201

Wits Institutional Repository on DSPACE

HARS, a Heterogeneity-Aware Runtime System for Self-Adaptive Multithreaded Applications

Author: Yun Jaeyoung
Publication venue: Graduate School of UNIST
Publication date: 01/02/2016
Field of study

Department of Computer EngineeringWe are in the age of rapid changes. In particular, the computer is not just a device for carrying out complex computations. Various users can use these devices in various ways. Someone could just use a computer for document writing, while others can use one for video encoding or playing computer games. Each process has different computational demands. If we use a supercomputer for writing documents only, it is truly a wasteful mistake. It is like using a sledgehammer to crack a nut. Each process has own computational demands. If we can give the proper computation power to each process, this would be more sufficient than the before. In this manner, heterogeneous multi-processing (HMP) arose. HMP is a promising technique that can support both high and low demand tasks efficiently. This topic has been investigated in some prior works but an efficient system software to support HMP with self-adaptive computing has been little researched, especially on multithreaded applications. Therefore, we propose HARS, a heterogeneity-aware runtime system for self-adaptive multithreaded applications. HARS monitors application-level performance data and dynamically controls the system state to achieve the performance target with efficient power consumption. As an extended version of HARS, we also propose MP-HARS, which that supports multiple applications. Through our evaluation, we can see that HARS and MP-HARS achieve higher efficiency then the baseline version and HARS is comparable to the static optimal version.ope

ScholarWorks@UNIST

Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

Author: Huck R. W.
Rafferty William
Reekie D. Hugh M.
Publication venue
Publication date
Field of study

Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression

NASA Technical Reports Server

Recommended from our members

A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI

Author: Alon E
Asanović K
Avizienis R
Bailey S
Blagojević M
Chen PH
Chiu PF
Flatresse P
Jevtić R
Keller B
Kwak J
Le HP
Lee Y
Nikolić B
Puggelli A
Richards B
Sutardja N
Waterman A
Zimmer B
Publication venue: eScholarship, University of California
Publication date: 01/04/2016
Field of study

This work demonstrates a RISC-V vector microprocessor implemented in 28 nm FDSOI with fully integrated simultaneous-switching switched-capacitor DC-DC (SC DC-DC) converters and adaptive clocking that generates four on-chip voltages between 0.45 and 1 V using only 1.0 V core and 1.8 V IO voltage inputs. The converters achieve high efficiency at the system level by switching simultaneously to avoid charge-sharing losses and by using an adaptive clock to maximize performance for the resulting voltage ripple. Details about the implementation of the DC-DC switches, DC-DC controller, and adaptive clock are provided, and the sources of conversion loss are analyzed based on measured results. This system pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20 ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80%-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices

eScholarship - University of California

Recommended from our members

Mobile Audiovisual Terminal: System Design and Subjective Testing in DECT and UMTS networks

Author: Cosmas J
Gill D
Pearmain A
Publication venue: IEEE*
Publication date: 01/07/2000
Field of study

It is anticipated that there will shortly be a requirement for multimedia terminals that operate via mobile communications systems. This paper presents a functional specification for such a terminal operating at 32 kb/s in a digital European cordless telecommunications (DECT) and universal mobile telecommunications system (UMTS) radio network. A terminal has been built, based on a PC with digital signal processor (DSP) boards for audio and video coding and decoding. Speech coding is by a phonetically driven code-excited linear prediction (CELP) speech coder and video coding by a block-oriented hybrid discrete cosine transform (DCT) coder. Separate channel coding is provided for the audio and video data. The paper describes the techniques used for audio and video coding, channel coding, and synchronization. Methods of subjective testing in a DECT network and in a UMTS network are also described. These consisted of subjective tests of first impressions of the mobile audio–visual terminal (MAVT) quality, interactive tests, and the completion of an exit questionnaire. The test results showed that the quality of the audio was sufficiently good for comprehension and the video was sufficiently good for following and repeating simple mechanical tasks. However, the quality of the MAVT was not good enough for general use where high-quality audio and video was needed, especially when transmission was in a noisy radio environment

Brunel University Research Archive

Rubik: fast analytical power management for latency-critical systems

Author: Bartolini Davide Basilio
Beckmann Nathan Zachary
Kasture Harshad
Sanchez Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2015
Field of study

Latency-critical workloads (e.g., web search), common in datacenters, require stable tail (e.g., 95th percentile) latencies of a few milliseconds. Servers running these workloads are kept lightly loaded to meet these stringent latency targets. This low utilization wastes billions of dollars in energy and equipment annually. Applying dynamic power management to latency-critical workloads is challenging. The fundamental issue is coping with their inherent short-term variability: requests arrive at unpredictable times and have variable lengths. Without knowledge of the future, prior techniques either adapt slowly and conservatively or rely on application-specific heuristics to maintain tail latency. We propose Rubik, a fine-grain DVFS scheme for latency-critical workloads. Rubik copes with variability through a novel, general, and efficient statistical performance model. This model allows Rubik to adjust frequencies at sub-millisecond granularity to save power while meeting the target tail latency. Rubik saves up to 66% of core power, widely outperforms prior techniques, and requires no application-specific tuning. Beyond saving core power, Rubik robustly adapts to sudden changes in load and system performance. We use this capability to design RubikColoc, a colocation scheme that uses Rubik to allow batch and latency-critical work to share hardware resources more aggressively than prior techniques. RubikColoc reduces datacenter power by up to 31% while using 41% fewer servers than a datacenter that segregates latency-critical and batch work, and achieves 100% core utilization.National Science Foundation (U.S.) (Grant CCF-1318384

DSpace@MIT

Crossref

Enhancing the efficiency and practicality of software transactional memory on massively multithreaded systems

Author: Kestor Gökçen
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013
Field of study

Chip Multithreading (CMT) processors promise to deliver higher performance by running more than one stream of instructions in parallel. To exploit CMT's capabilities, programmers have to parallelize their applications, which is not a trivial task. Transactional Memory (TM) is one of parallel programming models that aims at simplifying synchronization by raising the level of abstraction between semantic atomicity and the means by which that atomicity is achieved. TM is a promising programming model but there are still important challenges that must be addressed to make it more practical and efficient in mainstream parallel programming. The first challenge addressed in this dissertation is that of making the evaluation of TM proposals more solid with realistic TM benchmarks and being able to run the same benchmarks on different STM systems. We first introduce a benchmark suite, RMS-TM, a comprehensive benchmark suite to evaluate HTMs and STMs. RMS-TM consists of seven applications from the Recognition, Mining and Synthesis (RMS) domain that are representative of future workloads. RMS-TM features current TM research issues such as nesting and I/O inside transactions, while also providing various TM characteristics. Most STM systems are implemented as user-level libraries: the programmer is expected to manually instrument not only transaction boundaries, but also individual loads and stores within transactions. This library-based approach is increasingly tedious and error prone and also makes it difficult to make reliable performance comparisons. To enable an "apples-to-apples" performance comparison, we then develop a software layer that allows researchers to test the same applications with interchangeable STM back ends. The second challenge addressed is that of enhancing performance and scalability of TM applications running on aggressive multi-core/multi-threaded processors. Performance and scalability of current TM designs, in particular STM desings, do not always meet the programmer's expectation, especially at scale. To overcome this limitation, we propose a new STM design, STM2, based on an assisted execution model in which time-consuming TM operations are offloaded to auxiliary threads while application threads optimistically perform computation. Surprisingly, our results show that STM2 provides, on average, speedups between 1.8x and 5.2x over state-of-the-art STM systems. On the other hand, we notice that assisted-execution systems may show low processor utilization. To alleviate this problem and to increase the efficiency of STM2, we enriched STM2 with a runtime mechanism that automatically and adaptively detects application and auxiliary threads' computing demands and dynamically partition hardware resources between the pair through the hardware thread prioritization mechanism implemented in POWER machines. The third challenge is to define a notion of what it means for a TM program to be correctly synchronized. The current definition of transactional data race requires all transactions to be totally ordered "as if'' serialized by a global lock, which limits the scalability of TM designs. To remove this constraint, we first propose to relax the current definition of transactional data race to allow a higher level of concurrency. Based on this definition we propose the first practical race detection algorithm for C/C++ applications (TRADE) and implement the corresponding race detection tool. Then, we introduce a new definition of transactional data race that is more intuitive, transparent to the underlying TM implementation, can be used for a broad set of C/C++ TM programs. Based on this new definition, we proposed T-Rex, an efficient and scalable race detection tool for C/C++ TM applications. Using TRADE and T-Rex, we have discovered subtle transactional data races in widely-used STAMP applications which have not been reported in the past

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Design for scalability in 3D computer graphics architectures

Author: Holten-Lund Hans Erik
Publication venue
Publication date: 01/03/2002
Field of study

Online Research Database In Technology

On-board B-ISDN fast packet switching architectures. Phase 1: Study

Author: Faris Faris
Inukai Thomas
Lee Fred
Paul Dilip
Shyy Dong-Jye
Publication venue
Publication date
Field of study

The broadband integrate services digital network (B-ISDN) is an emerging telecommunications technology that will meet most of the telecommunications networking needs in the mid-1990's to early next century. The satellite-based system is well positioned for providing B-ISDN service with its inherent capabilities of point-to-multipoint and broadcast transmission, virtually unlimited connectivity between any two points within a beam coverage, short deployment time of communications facility, flexible and dynamic reallocation of space segment capacity, and distance insensitive cost. On-board processing satellites, particularly in a multiple spot beam environment, will provide enhanced connectivity, better performance, optimized access and transmission link design, and lower user service cost. The following are described: the user and network aspects of broadband services; the current development status in broadband services; various satellite network architectures including system design issues; and various fast packet switch architectures and their detail designs

NASA Technical Reports Server