Energy Efficient Support for All Levels of Parallelism for Complex Media Applications
Complex media applications are becoming increasingly common on general-purpose systems such as desktop, laptop, and handheld computers. However, real-time execution of such applications requires considerable processing power that often surpasses the capabilities of current superscalar processors. Further, high-performance processors are often constrained by power and energy consumption, especially in mobile systems, where media applications have become popular.
The objective of this dissertation is to develop general-purpose processors that can meet the performance demands of future media applications in an energy-efficient way, while also continuing to work well on other common workloads for desktop, laptop, and handheld systems. Fortunately, most media applications have multiple types of parallelism: thread-level, data-level, and instruction-level parallelism (TLP/DLP/ILP). In this work, we investigate exploiting these three forms of parallelism to provide both high performance and energy efficiency.
This dissertation makes three broad contributions. First, we analyze the parallelism in complex media applications and make the case that contemporary media applications require efficient support for multiple types of parallelism, including ILP, TLP, and various forms of data-level parallelism such as sub-word SIMD, short vectors, and streams. Second, to find the most energy-efficient way of exploiting TLP, we compare chip multi-processing (CMP) and simultaneous multi-threading (SMT). Finally, we propose a complete architecture, called ALP, that efficiently supports all of the levels of parallelism described above in an energy-efficient way, using evolutionary changes to the programming model and hardware. The most novel part of ALP is a DLP technique called SIMD vectors and streams, which is integrated within a conventional superscalar-based CMP/SMT architecture with sub-word SIMD. This technique lies between sub-word SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important.
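To make the position of SIMD vectors between sub-word SIMD and conventional vectors more concrete, here is a toy Python model. It is a sketch under stated assumptions, not ALP's ISA: it assumes a 128-bit sub-word SIMD record holds eight 16-bit lanes, that an SVector is a short sequence of such records, and that one SVector instruction applies the packed operation to every record. The function names (`packed_add_sat`, `svector_add_sat`, `count_instructions`) are hypothetical.

```python
from typing import List, Tuple

LANES = 8                      # assumed: 128-bit record = eight 16-bit sub-words
INT16_MIN, INT16_MAX = -32768, 32767

Record = List[int]             # one sub-word SIMD record (8 x int16)


def packed_add_sat(a: Record, b: Record) -> Record:
    """Sub-word SIMD: one instruction adds all lanes of one record,
    saturating each lane to the int16 range (like SSE2's PADDSW)."""
    assert len(a) == len(b) == LANES
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def svector_add_sat(va: List[Record], vb: List[Record]) -> List[Record]:
    """Hypothetical SVector-style operation: a single instruction applies
    the packed operation to every record in a short vector of records."""
    assert len(va) == len(vb)
    return [packed_add_sat(ra, rb) for ra, rb in zip(va, vb)]


def count_instructions(num_records: int, svector_len: int) -> Tuple[int, int]:
    """Instructions needed to add num_records records: one per record with
    plain sub-word SIMD, one per svector_len records with SVectors."""
    simd = num_records
    svec = -(-num_records // svector_len)  # ceiling division
    return simd, svec
```

For example, `count_instructions(32, 8)` returns `(32, 4)`: under these assumptions, adding 32 records takes 32 sub-word SIMD instructions but only 4 SVector instructions. Fewer instructions to fetch, decode, and rename is one plausible source of the benefit, but the model ignores memory behavior and ALP's actual register and stream organization.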
More broadly, our results show that conventional architectures augmented with evolutionary mechanisms can provide high performance and energy savings for complex media applications without resorting to radically different architectures and programming paradigms.
120 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2005. U of I only: restricted to the U of I community indefinitely during batch ingest of legacy ETD.
The ALPBench Benchmark Suite for Multimedia Applications
Multimedia applications are becoming increasingly important for a large class of general-purpose processors. Contemporary media applications are highly complex and demand high performance. A distinctive feature of these applications is that they have significant parallelism, including thread-, data-, and instruction-level parallelism, that is potentially well aligned with the increasing parallelism supported by emerging multi-core architectures. Designing systems to meet the demands of these applications therefore requires a benchmark suite that comprises these complex applications and exposes the parallelism present in them.
This paper makes two main contributions. First, it presents ALPBench, a publicly released benchmark suite that pulls together five complex media applications from various sources: speech recognition (CMU Sphinx 3.3), face recognition (CSU), ray tracing (Tachyon), MPEG-2 encode (MSSG), and MPEG-2 decode (MSSG). We have modified the original applications to expose thread-level and data-level parallelism using POSIX threads and Intel's SSE2 instructions, respectively. Second, the paper provides a performance characterization of the ALPBench benchmarks, with a focus on parallelism. Such a characterization is useful to architects and compiler writers designing systems and compiler optimizations for these applications.
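POSIX threads are how ALPBench exposes TLP; a common way to do this in media kernels is static fork-join partitioning of a frame across threads. The sketch below illustrates that general pattern in Python under assumptions: `split_rows`, `process_rows`, and `parallel_frame_sum` are hypothetical, the per-row kernel is a placeholder sum, and the actual partitioning in each ALPBench application may differ. `ThreadPoolExecutor.map` stands in for `pthread_create`/`pthread_join`; under CPython's global interpreter lock this shows the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_rows(num_rows: int, num_threads: int) -> List[range]:
    """Statically partition rows into contiguous, nearly equal chunks;
    the first (num_rows % num_threads) chunks get one extra row."""
    base, extra = divmod(num_rows, num_threads)
    chunks, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def process_rows(frame: List[List[int]], rows: range) -> int:
    """Per-thread work on its own rows: a placeholder kernel that sums pixels."""
    return sum(sum(frame[r]) for r in rows)


def parallel_frame_sum(frame: List[List[int]], num_threads: int = 4) -> int:
    """Fork one task per chunk, join them, and combine the partial results."""
    chunks = split_rows(len(frame), num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda rows: process_rows(frame, rows), chunks))
    return sum(partials)
```

Static contiguous chunks keep each thread's rows independent, so no locking is needed until the final combine step; applications with uneven per-row work might instead need dynamic scheduling.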
ALP: Efficient Support for All Levels of Parallelism for Complex Media Applications
The real-time execution of contemporary complex media applications requires energy-efficient processing capabilities beyond those of current superscalars. We observe that the complexity of contemporary media applications requires support for multiple forms of parallelism, including ILP, TLP, and various forms of DLP such as sub-word SIMD, short vectors, and streams. Based on our observations, we propose an architecture, called ALP, that efficiently integrates all of these forms of parallelism with evolutionary changes to the programming model and hardware. The novel part of ALP is a DLP technique called SIMD vectors and streams (SVectors/SStreams), which is integrated within a conventional superscalar-based CMP/SMT architecture with sub-word SIMD. This technique lies between sub-word SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important. Specifically, SVectors/SStreams are effective: compared to a system with the other enhancements in ALP, they give speedups of 1.1X to 3.4X and energy-delay product improvements of 1.1X to 5.1X for applications with DLP.
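The energy-delay product (EDP) is energy multiplied by execution time, so an EDP improvement factor equals the speedup times the energy-reduction factor: (E_base × D_base) / (E_new × D_new) = (D_base / D_new) × (E_base / E_new). The short sketch below computes both ratios; the function names are illustrative and the input numbers in the usage note are hypothetical, not taken from the ALP evaluation.

```python
from typing import Tuple


def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: energy (joules) times execution time (seconds)."""
    return energy_j * delay_s


def compare(base_e: float, base_d: float,
            new_e: float, new_d: float) -> Tuple[float, float]:
    """Return (speedup, EDP improvement) of a new design over a baseline.
    Both are ratios > 1 when the new design is better."""
    speedup = base_d / new_d
    edp_improvement = edp(base_e, base_d) / edp(new_e, new_d)
    return speedup, edp_improvement
```

With a hypothetical baseline of 2.0 J and 1.0 s and an enhanced run of 1.5 J and 0.5 s, `compare(2.0, 1.0, 1.5, 0.5)` gives a speedup of 2.0 and an EDP improvement of about 2.67. This is why EDP improvements can exceed speedups when the faster design also consumes less energy.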
Joint Local and Global Hardware Adaptations for Energy
This work concerns algorithms to control energy-driven architecture adaptations for multimedia applications, without and with dynamic voltage scaling (DVS). We identify a broad design space for adaptation control algorithms based on two attributes: (1) when to adapt, or temporal granularity, and (2) what structures to adapt, or spatial granularity. For each attribute, adaptation may be global or local. Our previous work developed a temporally and spatially global algorithm. It invokes adaptation at the granularity of a full frame of a multimedia application (temporally global) and considers the entire hardware configuration at a time (spatially global). It exploits inter-frame execution time variability, slowing computation just enough to eliminate idle time before the real-time deadline. This paper explores temporally and spatially local algorithms and their integration with the previous global algorithm. The local algorithms invoke architectural adaptation within an application frame to exploit intra-frame execution variability, and attempt to save energy without affecting execution time. We consider local algorithms previously studied for non-real-time applications and also propose new algorithms. We find that, for systems without and with DVS, the local algorithms are effective in saving energy for multimedia applications, but the new integrated global and local algorithm is best for the systems and applications studied.
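A minimal sketch of the DVS side of a temporally global policy ("slow computation just enough to eliminate idle time before the deadline"), assuming a per-frame cycle-count prediction is available and that execution time scales as cycles divided by frequency. The frequency levels and `pick_frequency` are hypothetical; the global algorithm described above also selects the hardware configuration jointly, which this sketch omits.

```python
from typing import Sequence


def pick_frequency(predicted_cycles: float, deadline_s: float,
                   freqs_hz: Sequence[float]) -> float:
    """Return the lowest available frequency that still finishes the frame
    by its deadline (assuming time = cycles / frequency). If even the
    highest frequency is too slow, fall back to the highest frequency."""
    required_hz = predicted_cycles / deadline_s
    feasible = [f for f in sorted(freqs_hz) if f >= required_hz]
    return feasible[0] if feasible else max(freqs_hz)
```

For example, with 18 million predicted cycles and a 30 frames/s deadline (1/30 s), the required frequency is about 540 MHz, so `pick_frequency(18e6, 1/30, [200e6, 400e6, 600e6, 800e6, 1000e6])` returns 600 MHz. Real memory-bound phases do not scale linearly with frequency, so a practical controller would need a better execution-time model than this one.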