3,428 research outputs found

    Exploiting a new level of DLP in multimedia applications

    Get PDF
    This paper proposes and evaluates MOM: a novel ISA paradigm targeted at multimedia applications. By fusing conventional vector ISA approaches together with more recent SIMD-like (Single Instruction Multiple Data) ISAs (such as MMX), we have developed a new matrix oriented ISA which efficiently deals with the small matrix structures typically found in multimedia applications. MOM exploits a level of DLP not reachable by neither conventional vector ISAs nor SIMD-like media ISA extensions. Our results show that MOM provides a factor of 1.3x to 4x performance improvement when compared with two different multimedia extensions (MMX and MDMX) on several kernels, which translates into up to a 50% of performance gain when measuring full applications (20% in average). Furthermore, the streaming nature of MOM provides additional advantages for executing multimedia applications, such as a very low fetch pressure or a high tolerance to memory latency, making MOM an ideal candidate for the embedded domain.Peer ReviewedPostprint (published version

    Three-dimensional memory vectorization for high bandwidth media memory systems

    Get PDF
    Vector processors have good performance, cost and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses the problem of the memory bandwidth. We propose a novel mechanism suitable for 2-dimensional vector architectures and targeted at providing high effective bandwidth for SIMD memory instructions. The basis of this mechanism is the extension of the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file. By fetching long blocks of data and by reusing 2-dimensional memory streams at this second-level register file, we obtain a significant increase in the effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide a high robustness to memory latency and a significant reduction of the cache activity, thus reducing power and energy requirements. At the investment of a 50% more area than a regular SIMD register file, we have measured and average speed-up of 13% and the potential for power savings in the L2 cache of a 30%.Peer ReviewedPostprint (published version

    DLP+TLP processors for the next generation of media workloads

    Get PDF
    Future media workloads will require about two levels of magnitude the performance achieved by current general purpose processors. High uni-threaded performance will be needed to accomplish real-time constraints together with huge computational throughput, as next generation of media workloads will be eminently multithreaded (MPEG-4/MPEG-7). In order to fulfil the challenge of providing both good uni-threaded performance and throughput, we propose to join the simultaneous multithreading execution paradigm (SMT) together with the ability to execute media-oriented streaming /spl mu/-SIMD instructions. This paper evaluates the performance of two different aggressive SMT processors: one with conventional /spl mu/-SIMD extensions (such as MMX) and one with longer streaming vector /spl mu/-SIMD extensions. We will show that future media workloads are, in fact, dominated by the scalar performance. The combination of SMT plus streaming vector /spl mu/-SIMD helps alleviate the performance bottleneck of the integer unit. SMT allowsPeer ReviewedPostprint (published version

    Quality Planning for Distributed Collaborative Multimedia Applications

    Get PDF
    The tremendous power and low price of today s computer systems have created the opportunity for exciting applications rich with graphics audio and video Despite this potential planning computer systems to support the intensity of these multimedia applications is an extremely difficult task We have developed a flexible model and method that allows us to predict multimedia application performance from the user s perspective Our model takes into account the components fundamental to multimedia application quality latency jitter and data loss In applying our method to three specific applications we have identified some general traits 1) processors are the bottleneck in performance for many multimedia applications 2) networks with more bandwidth often do not increase the quality of multimedia applications and 3) performance for many multimedia applications can be improved greatly by shifting capacity demand from computer system components that are heavily loaded to those that are more lightly loade
    • …
    corecore