83 research outputs found

    CABAC accelerator architectures for video compression in future multimedida : a survey

    Get PDF
    The demands for high quality, real-time performance and multi-format video support in consumer multimedia products are ever increasing. In particular, the future multimedia systems require efficient video coding algorithms and corresponding adaptive high-performance computational platforms. The H.264/AVC video coding algorithms provide high enough compression efficiency to be utilized in these systems, and multimedia processors are able to provide the required adaptability, but the algorithms complexity demands for more efficient computing platforms. Heterogeneous (re-)configurable systems composed of multimedia processors and hardware accelerators constitute the main part of such platforms. In this paper, we survey the hardware accelerator architectures for Context-based Adaptive Binary Arithmetic Coding (CABAC) of Main and High profiles of H.264/AVC. The purpose of the survey is to deliver a critical insight in the proposed solutions, and this way facilitate further research on accelerator architectures, architecture development methods and supporting EDA tools. The architectures are analyzed, classified and compared based on the core hardware acceleration concepts, algorithmic characteristics, video resolution support and performance parameters, and some promising design directions are discussed. The comparative analysis shows that the parallel pipeline accelerator architecture seems to be the most promising

    Performance - Complexity Comparison of Receivers for a LTE MIMO–OFDM System

    Get PDF
    Implementation of receivers for spatial multiplexing multiple-input multiple-output (MIMO) orthogonal-frequency-division-multiplexing (OFDM) systems is considered. The linear minimum mean-square error (LMMSE) and the K-best list sphere detector (LSD) are compared to the iterative successive interference cancellation (SIC) detector and the iterative K-best LSD. The performance of the algorithms is evaluated in 3G long-term evolution (LTE) system. The SIC algorithm is found to perform worse than the K-best LSD when the MIMO channels are highly correlated, while the performance difference diminishes when the correlation decreases. The receivers are designed for 2X2 and 4X4 antenna systems and three different modulation schemes. Complexity results for FPGA and ASIC implementations are found. A modification to the K-best LSD which increases its detection rate is introduced. The ASIC receivers are designed to meet the decoding throughput requirements in LTE and the K-best LSD is found to be the most complex receiver although it gives the best reliable data transmission throughput. The SIC receiver has the best performance–complexity tradeoff in the 2X2 system but in the 4X4 case, the K-best LSD is the most efficient. A receiver architecture which could be reconfigured to using a simple or a more complex detector as the channel conditions change would achieve the best performance while consuming the least amount of power in the receiver

    New benchmarking methodology and programming model for big data processing

    Get PDF
    Big data processing is becoming a reality in numerous real-world applications. With the emergence of new data intensive technologies and increasing amounts of data, new computing concepts are needed. The integration of big data producing technologies, such as wireless sensor networks, Internet of Things, and cloud computing, into cyber-physical systems is reducing the available time to find the appropriate solutions. This paper presents one possible solution for the coming exascale big data processing: a data flow computing concept. The performance of data flow systems that are processing big data should not be measured with the measures defined for the prevailing control flow systems. A new benchmarking methodology is proposed, which integrates the performance issues of speed, area, and power needed to execute the task. The computer ranking would look different if the new benchmarking methodologies were used; data flow systems would outperform control flow systems. This statement is backed by the recent results gained from implementations of specialized algorithms and applications in data flow systems. They show considerable factors of speedup, space savings, and power reductions regarding the implementations of the same in control flow computers. In our view, the next step of data flow computing development should be a move from specialized to more general algorithms and applications.Peer ReviewedPostprint (published version

    Constraint-Driven Instructions Selection and Application Scheduling in the DURASE system

    Get PDF
    International audienceThis paper presents a new constraint-driven method for computational pattern selection, mapping and application scheduling using reconfigurable processor extensions. The presented method is a part of DURASE system (Generic Environment for Design and Utilization of Reconfigurable Application-Specific Processors Extensions). The selected processor extensions are implemented as specialized processor instructions. They correspond to computational patterns identified as most frequently occurring or other interesting patterns in the application graph. Our methods can handle both time-constrained and resource-constrained scheduling. Experimental results obtained for the MediaBench and MiBench benchmarks show that the presented method ensures high speed-ups in application execution

    Reconfigurable Systems: A Potential Solution to the von Neumann Bottleneck

    Get PDF
    The difficulty of overcoming the disparity between processor speeds and data access speeds, a condition known as the von Neumann bottleneck, has been a source of consternation for computer hardware developers for many years. Although a number of temporary solutions have been proposed and implemented in modern machines, these solutions have only managed to treat the major symptoms, rather than solve the root problem. As the number of transistors on a chip roughly doubles every two years, the von Neumann bottleneck has continued to tighten in spite of these solutions, prompting some computer hardware professionals to advocate a paradigm shift away from the von Neumann architecture into something entirely new. Many have begun advocating the relatively new technology of reconfigurable systems, popularly known as morphware. The difficulty with adopting a new architectural paradigm, however, is that developers on both sides of the software-hardware spectrum must start from scratch, creating entirely new operating systems, hardware peripherals, application software, and user interfaces, all of which must seem familiar to the end user, yet still take advantage of the improvements morphware has to offer. With this in mind, this thesis builds off of the fundamental theory and current implementations of morphware to describe the processes and products necessary to develop and deliver morphware to the average user as a viable alternative to current technology

    Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise—Designing a Computer Architecture via HLS)

    Get PDF
    Translating a system requirement into a low-level representation (e.g., register transfer level or RTL) is the typical goal of the design of FPGA-based systems. However, the Design Space Exploration (DSE) needed to identify the final architecture may be time consuming, even when using high-level synthesis (HLS) tools. In this article, we illustrate our hybrid methodology, which uses a frontend for HLS so that the DSE is performed more rapidly by using a higher level abstraction, but without losing accuracy, thanks to the HP-Labs COTSon simulation infrastructure in combination with our DSE tools (MYDSE tools). In particular, this proposed methodology proved useful to achieve an appropriate design of a whole system in a shorter time than trying to design everything directly in HLS. Our motivating problem was to deploy a novel execution model called data-flow threads (DF-Threads) running on yet-to-be-designed hardware. For that goal, directly using the HLS was too premature in the design cycle. Therefore, a key point of our methodology consists in defining the first prototype in our simulation framework and gradually migrating the design into the Xilinx HLS after validating the key performance metrics of our novel system in the simulator. To explain this workflow, we first use a simple driving example consisting in the modelling of a two-way associative cache. Then, we explain how we generalized this methodology and describe the types of results that we were able to analyze in the AXIOM project, which helped us reduce the development time from months/weeks to days/hours

    Performance Evaluation of Multimedia Content Distribution over Multi-Homed Wireless Networks

    Get PDF
    ABSTRACT-The growing availability of IP based heterogeneous wireless access technologies coupled with the increasing capabilities of mobile devices is creating opportunities for multimedia distribution. Through its multi-homing feature, the ability to support multiple network connections in a single end to end association, the transport layer Stream Control Transmission Protocol (SCTP) can enable seamless and transparent communication sessions over multiple heterogeneous networks. This paper analyzes the performance of multimedia distribution when making use of two multihomingSCTPbasedapproaches: Single Path Transfer and Concurrent Multi-path Transfer, in which a single or all paths within an association are used simultaneously for data transmission. In this investigation various retransmission policies and different parameter sets are used in turn and recommendations are made for achieving best results during video delivery. In order to perform this study a novel realistic evaluation tool-set was proposed and is described, which can simulate video delivery over SCTP. Our simulation results and analysis show how to optimize the transmission of multimedia content over SCTP associations in both single and multipath scenarios. INDEX TERMS-Concurrent multi-path transfer, multi-homing, multimedia distribution, SCTP, single path transfer
    • …
    corecore