Improving the Performance and Energy Efficiency of GPGPU Computing through Adaptive Cache and Memory Management Techniques
Department of Computer Science and Engineering. As the performance and energy-efficiency requirements of GPGPUs have risen, GPGPU memory management techniques have improved to meet them by employing hardware caches and utilizing heterogeneous memory. These techniques can improve GPGPUs by providing lower memory latency and higher memory bandwidth. However, they do not always guarantee improved performance and energy efficiency, owing to the small cache size and the heterogeneity of the memory nodes. While prior work has proposed various techniques to address this issue, relatively little work has investigated holistic support for memory management techniques.
In this dissertation, we analyze performance pathologies and propose various techniques to improve memory management. First, we investigate the effectiveness of advanced cache indexing (ACI) for high-performance and energy-efficient GPGPU computing. Specifically, we discuss the designs of various static and adaptive cache indexing schemes and present an implementation for GPGPUs. We then quantify and analyze the effectiveness of the ACI schemes using a cycle-accurate GPGPU simulator. Our quantitative evaluation shows that ACI schemes achieve significant performance and energy-efficiency gains over the baseline conventional indexing scheme. We also analyze the performance sensitivity of ACI to key architectural parameters (i.e., capacity, associativity, and ICN bandwidth) and to the cache indexing latency, and demonstrate that ACI continues to achieve high performance in various settings.
Second, we propose IACM, integrated adaptive cache management for high-performance and energy-efficient GPGPU computing. Based on the performance pathology analysis of GPGPUs, we integrate state-of-the-art adaptive cache management techniques (i.e., cache indexing, bypassing, and warp limiting) in a unified architectural framework to eliminate performance pathologies. Our quantitative evaluation demonstrates that IACM significantly improves the performance and energy efficiency of various GPGPU workloads over the baseline architecture (i.e., 98.1% and 61.9% on average, respectively) and achieves considerably higher performance than the state-of-the-art technique (i.e., 361.4% at maximum and 7.7% on average). Furthermore, IACM delivers significant performance and energy efficiency gains over the baseline GPGPU architecture even when enhanced with advanced architectural technologies (e.g., higher capacity, associativity).
Third, we propose bandwidth- and latency-aware page placement (BLPP) for GPGPUs with heterogeneous memory. BLPP analyzes the characteristics of an application and determines the optimal page allocation ratio between the GPU and CPU memory. Based on the optimal page allocation ratio, BLPP dynamically allocates pages across the heterogeneous memory nodes. Our experimental results show that BLPP considerably outperforms the baseline and state-of-the-art technique (i.e., 13.4% and 16.7%) and performs similarly to the static-best version (i.e., 1.2% difference), which requires extensive offline profiling.
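The abstract above does not say which indexing schemes the dissertation evaluates, so the following is only a minimal sketch of the general idea behind alternative cache indexing. It contrasts conventional modulo indexing with XOR-based indexing, a widely studied static scheme chosen here as an assumption. The line size, set count, and function names are hypothetical.

```python
# Illustrative only: cache geometry and function names are assumptions,
# not parameters taken from the dissertation.
LINE_SIZE = 128                        # bytes per cache line (assumed)
NUM_SETS = 32                          # number of sets, a power of two (assumed)
SET_BITS = NUM_SETS.bit_length() - 1   # log2(NUM_SETS)


def conventional_index(addr: int) -> int:
    """Set index taken from the low-order bits of the block address."""
    block = addr // LINE_SIZE
    return block % NUM_SETS


def xor_index(addr: int) -> int:
    """Set index formed by XOR-ing the low index bits with the next SET_BITS bits."""
    block = addr // LINE_SIZE
    low = block & (NUM_SETS - 1)
    high = (block >> SET_BITS) & (NUM_SETS - 1)
    return low ^ high


def sets_touched(index_fn, addrs) -> int:
    """Number of distinct cache sets that a sequence of addresses maps to."""
    return len({index_fn(a) for a in addrs})


# A strided access pattern whose stride equals NUM_SETS cache lines.
stride = LINE_SIZE * NUM_SETS
strided_addrs = [i * stride for i in range(NUM_SETS)]

print(sets_touched(conventional_index, strided_addrs))  # all addresses share one set
print(sets_touched(xor_index, strided_addrs))           # addresses spread over all sets
```

With this stride, conventional indexing sends every access to the same set, so the accesses conflict, while the XOR variant spreads them across all sets. Adaptive schemes of the kind the abstract mentions would select or tune the index function at run time, which this sketch does not attempt.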
High-speed Video from Asynchronous Camera Array
This paper presents a method for capturing high-speed video using an
asynchronous camera array. Our method sequentially fires each sensor in a
camera array with a small time offset and assembles captured frames into a
high-speed video according to the time stamps. The resulting video, however,
suffers from parallax jittering caused by the viewpoint difference among
sensors in the camera array. To address this problem, we develop a dedicated
novel view synthesis algorithm that transforms the video frames as if they were
captured by a single reference sensor. Specifically, for any frame from a
non-reference sensor, we find the two temporally neighboring frames captured by
the reference sensor. Using these three frames, we render a new frame with the
same time stamp as the non-reference frame but from the viewpoint of the
reference sensor. To do so, we segment these frames into super-pixels and
then apply local content-preserving warping to warp them to form the new frame.
We employ a multi-label Markov Random Field method to blend these warped
frames. Our experiments show that our method can produce high-quality and
high-speed video of a wide variety of scenes with large parallax, scene
dynamics, and camera motion and outperforms several baseline and
state-of-the-art approaches.
Comment: 10 pages, 82 figures, Published at IEEE WACV 201
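The capture-and-assembly step described in this abstract can be sketched as follows. This is a simplified illustration and not the authors' implementation: it covers only interleaving frames by time stamp and finding the two temporally neighboring reference frames, and it omits view synthesis, warping, and blending. The `Frame` type, the function names, and the timing values are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Tuple


class Frame(NamedTuple):
    timestamp: int   # capture time in milliseconds (assumed unit)
    sensor: int      # index of the sensor that captured the frame


def assemble(frames):
    """Interleave frames from all sensors into one sequence ordered by time stamp."""
    return sorted(frames, key=lambda f: f.timestamp)


def reference_neighbors(frame: Frame, ref_timestamps) -> Optional[Tuple[int, int]]:
    """Return the time stamps of the reference frames just before and just after
    `frame`, or None if either neighbor is missing. `ref_timestamps` must be sorted."""
    i = bisect_right(ref_timestamps, frame.timestamp)
    if i == 0 or i == len(ref_timestamps):
        return None
    return ref_timestamps[i - 1], ref_timestamps[i]


# Hypothetical setup: 4 sensors, each capturing every 40 ms,
# fired with a 10 ms offset between consecutive sensors.
NUM_SENSORS, PERIOD_MS, OFFSET_MS, FRAMES_PER_SENSOR = 4, 40, 10, 3
frames = [
    Frame(timestamp=k * PERIOD_MS + s * OFFSET_MS, sensor=s)
    for s in range(NUM_SENSORS)
    for k in range(FRAMES_PER_SENSOR)
]

video = assemble(frames)                                     # one frame every 10 ms
ref_ts = [f.timestamp for f in video if f.sensor == 0]       # sensor 0 as reference

print(reference_neighbors(Frame(50, 1), ref_ts))   # neighbors at 40 ms and 80 ms
print(reference_neighbors(Frame(90, 1), ref_ts))   # no later reference frame: None
```

In this toy setup, four sensors each running at 25 frames per second yield a combined sequence at 100 frames per second. The frames from sensors 1 to 3 are the ones that would then need to be re-rendered from the reference viewpoint.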
Design Considerations for a Highly Segmented Mirror
Design issues for a 30-m highly segmented mirror are explored, with emphasis on parametric models of simple, inexpensive segments. A mirror with many small segments offers cost savings through quantity production and permits high-order active and adaptive wave-front corrections. For a 30-m f/1.5 paraboloidal mirror made of spherical, hexagonal glass segments, with simple warping harnesses and three-point supports, the maximum segment diameter is ~100 mm, and the minimum segment thickness is ~5 mm. Large-amplitude, low-order gravitational deformations in the mirror cell can be compensated if the segments are mounted on a plate floating on astatic supports. Because gravitational deformations in the plate are small, the segment actuators require a stroke of only a few tens of micrometers, and the segment positions can be measured by a wave-front sensor
A Power Cap Oriented Time Warp Architecture
Controlling power usage has become a core objective in modern computing platforms. In this article, we present an innovative Time Warp architecture oriented to efficiently running parallel simulations under a power cap. Our architectural organization treats power usage as a foundational design principle, as opposed to classical power-unaware Time Warp designs. We provide early experimental results showing the potential of our proposal.
Spatio-temporal Video Re-localization by Warp LSTM
The need to efficiently find the video content a user wants is increasing
because of the surge of user-generated videos on the Web. Existing
keyword-based or content-based video retrieval methods usually determine what
occurs in a video but not when and where. In this paper, we answer
the question of when and where by formulating a new task, namely
spatio-temporal video re-localization. Specifically, given a query video and a
reference video, spatio-temporal video re-localization aims to localize
tubelets in the reference video such that the tubelets semantically correspond
to the query. To accurately localize the desired tubelets in the reference
video, we propose a novel warp LSTM network, which propagates the
spatio-temporal information for a long period and thereby captures the
corresponding long-term dependencies. Another issue for spatio-temporal video
re-localization is the lack of properly labeled video datasets. Therefore, we
reorganize the videos in the AVA dataset to form a new dataset for
spatio-temporal video re-localization research. Extensive experimental results
show that the proposed model achieves superior performance over the designed
baselines on the spatio-temporal video re-localization task.