
    Homogeneous and Heterogeneous Parallel Architectures in Real-Time Signal Processing and Control

    This paper presents an investigation into the real-time performance of homogeneous and heterogeneous parallel architectures in signal processing and control applications. Several algorithms of regular and irregular nature are considered. These are implemented on a number of uniprocessor and multi-processor parallel architectures. Hardware and software resources, capabilities of the architectures and characteristics of the algorithms are considered for suitable matching between the algorithms and the architectures. The partitioning and mapping of the algorithms on the architectures and multi-processor communication techniques are investigated. Finally, a comparison of the results of the implementations is made to establish the merits of the design and development of parallel architectures for real-time signal processing and control applications.

    Heterogeneous and Homogeneous Parallel Architectures for Real-Time Adaptive Active Vibration Control

    This paper presents an investigation into parallel processing techniques for real-time adaptive control of a flexible beam structure. Three different algorithms, namely simulation, control and identification, are involved in the adaptive control algorithm. These are implemented on a number of computing platforms including a homogeneous network of transputer nodes, a homogeneous network of digital signal processing (DSP) devices, heterogeneous architectures involving transputers, a reduced instruction set computer (RISC) superscalar processor and a DSP device, single DSP devices and transputer nodes, and several general-purpose sequential processors. The partitioning and mapping of the algorithms on the homogeneous and heterogeneous architectures is also explored. The inter-processor communication speed is investigated to establish the real-time performance aspects of the processors on the basis of the nature of the algorithms involved. A close investigation into the performance of several compilers is made and discussed within the context of real-time implementations. Finally, a comparison of the results of the implementations is made, on the basis of real-time communications performance, computation performance and compiler performance, to establish the merits of designing parallel systems incorporating fast processing techniques for real-time control applications.

    Mapping and Scheduling Strategies for Heterogeneous Architectures

    Extensive and computationally complex signal processing and control applications are commonly constructed from small computational blocks where the load decomposition and balance may not be easily achieved. This requires the development of mapping and scheduling strategies based on application-to-processor matching. In this context, several application algorithms are utilised and investigated in this work within the Development Framework (DF) approach. The DF approach supports the specification, design and implementation of real-time control systems. It also contains several mapping and scheduling tools to improve the performance of systems as well as tools for code generation. To improve the performance of an application, a new approach, namely the Priority-Based Genetic Algorithm (PBGA), is developed and reported in this paper. The approach is applied to several applications using parallel and distributed heterogeneous architectures, and its performance is verified in comparison to several previously developed strategies.

    Efficient FPGA implementation of high-throughput mixed radix multipath delay commutator FFT processor for MIMO-OFDM

    This article presents and evaluates pipelined architecture designs for an improved high-frequency Fast Fourier Transform (FFT) processor implemented on Field Programmable Gate Arrays (FPGA) for Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM). The architecture presented is a Mixed-Radix Multipath Delay Commutator. The presented parallel architecture utilizes fewer hardware resources compared to a Radix-2 architecture, while maintaining the simple control and butterfly structures inherent to Radix-2 implementations. The high-frequency design allows enhancing system throughput without requiring the additional parallel data paths common in other current approaches. The presented design can process two and four independent data streams in parallel and is suitable for scaling to any power-of-two FFT size N. FPGA implementation of the architecture demonstrated significant resource efficiency and high throughput in comparison to relevant current approaches within the literature. The proposed architecture designs were realized with Xilinx System Generator (XSG) and evaluated on both Virtex-5 and Virtex-7 FPGA devices. Post-place-and-route results demonstrated maximum frequency values over 400 MHz and 470 MHz for Virtex-5 and Virtex-7 FPGA devices, respectively.

    Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

    Real-time and high-quality video coding is gaining wide interest in the research and industrial community for different applications. H.264/AVC, a recent standard for high-performance video coding, can be successfully exploited in several scenarios including digital video broadcasting, high-definition TV and DVD-based systems, which must sustain bit rates of up to tens of Mbits/s. To that purpose, this paper proposes optimized architectures for the most critical H.264/AVC tasks: motion estimation and context-adaptive binary arithmetic coding (CABAC). Post-synthesis results on sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences in real time at 30 frames/s and deliver more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for their integration in complex VLSI multimedia systems based either on an AHB bus-centric on-chip communication system or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip).

    Using the High Productivity Language Chapel to Target GPGPU Architectures

    It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA. Funding: NSF CCF 0702260; Cray Inc. Cray-SRA-2010-01696; 2010-2011 Nvidia Research Fellowship. Unpublished; not peer reviewed.

    Shining Light On Shadow Stacks

    Control-Flow Hijacking attacks are the dominant attack vector against C/C++ programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the forward edge, i.e., indirect calls through function pointers and virtual calls. Protecting the backward edge is left to stack canaries, which are easily bypassed through information leaks. Shadow Stacks are a fully precise mechanism for protecting backward edges, and should be deployed with CFI mitigations. We present a comprehensive analysis of all possible shadow stack mechanisms along three axes: performance, compatibility, and security. For performance comparisons we use SPEC CPU2006, while security and compatibility are qualitatively analyzed. Based on our study, we renew calls for a shadow stack design that leverages a dedicated register, resulting in low performance overhead and minimal memory overhead, but sacrifices compatibility. We present case studies of our implementation of such a design, Shadesmar, on Phoronix and Apache to demonstrate the feasibility of dedicating a general purpose register to a security monitor on modern architectures, and the deployability of Shadesmar. Our comprehensive analysis, including detailed case studies for our novel design, allows compiler designers and practitioners to select the correct shadow stack design for different usage scenarios. Comment: To Appear in IEEE Security and Privacy 201
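    The backward-edge check that a shadow stack performs can be illustrated with a small simulation. This is only a conceptual sketch, not Shadesmar or any design evaluated in the paper: it models neither a dedicated register nor real machine code, and the names `ShadowStackMachine`, `ReturnAddressMismatch` and `demo` are hypothetical. It assumes the shadow copy is kept where the attacker cannot write.

```python
class ReturnAddressMismatch(Exception):
    """Raised when the return address on the main stack differs from the shadow copy."""


class ShadowStackMachine:
    """Toy model of call/return with a shadow stack (illustrative assumption only)."""

    def __init__(self):
        self.main_stack = []    # writable by the attacker in this model
        self.shadow_stack = []  # assumed to be out of the attacker's reach

    def call(self, return_address):
        # On a call, the return address is recorded in both places.
        self.main_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # On a return, the two copies are compared before control is transferred.
        on_stack = self.main_stack.pop()
        expected = self.shadow_stack.pop()
        if on_stack != expected:
            raise ReturnAddressMismatch(
                f"expected {expected:#x}, found {on_stack:#x}"
            )
        return expected


def demo():
    machine = ShadowStackMachine()
    machine.call(0x401000)
    machine.call(0x402000)
    machine.ret()                         # untampered return passes the check
    machine.main_stack[-1] = 0xDEADBEEF   # simulated overwrite of a return address
    try:
        machine.ret()
    except ReturnAddressMismatch:
        return "hijack detected"
    return "hijack missed"
```

    Calling `demo()` runs one legitimate return and one return after a simulated overwrite; in this model the mismatch is caught because the shadow copy is compared on every return, whereas a stack canary is only a marker value that an information leak can reveal.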