542 research outputs found

    Multi-Softcore Architecture on FPGA

    To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on a custom-logic design methodology. Designing parallel multicore systems from available standard intellectual property (IP) cores while maintaining high performance remains a challenging issue. Softcore processors and field-programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology for rapidly prototyping parametric multicore systems. A study of the viability of building the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).
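The abstract above evaluates the system in terms of speedup and efficiency. As a minimal sketch of how these two metrics are conventionally computed (speedup as serial time over parallel time, efficiency as speedup per core), assuming the standard textbook definitions rather than the paper's exact measurement procedure; the function names and the timing values are hypothetical and are not taken from the paper:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup: single-core time divided by multicore time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_cores: int) -> float:
    """Conventional parallel efficiency: speedup divided by the core count."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup(t_serial, t_parallel) / n_cores


if __name__ == "__main__":
    # Hypothetical timings in seconds, keyed by core count (illustration only).
    timings = {1: 8.0, 2: 4.4, 4: 2.5}
    t1 = timings[1]
    for cores, t in sorted(timings.items()):
        print(cores, round(speedup(t1, t), 3), round(efficiency(t1, t, cores), 3))
```

An efficiency close to 1 would indicate that added cores are being used almost fully, while a falling value usually points to communication or synchronization overhead.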

    Radar signal processing for sensing in assisted living: the challenges associated with real-time implementation of emerging algorithms

    This article covers radar signal processing for sensing in the context of assisted living (AL). This is presented through three example applications: human activity recognition (HAR) for activities of daily living (ADL), respiratory disorders, and sleep stages (SSs) classification. The common challenge of classification is discussed within a framework of measurements/preprocessing, feature extraction, and classification algorithms for supervised learning. Then, the specific challenges of the three applications from a signal processing standpoint are detailed in their specific data processing and ad hoc classification strategies. Here, the focus is on recent trends in the field of activity recognition (multidomain, multimodal, and fusion), health-care applications based on vital signs (superresolution techniques), and comments related to outstanding challenges. Finally, this article explores challenges associated with the real-time implementation of signal processing/classification algorithms.

    Applications, tools and techniques on the road to exascale computing

    This volume of the book series “Advances in Parallel Computing” contains the proceedings of ParCo2011, the 14th biennial ParCo Conference, held from 31 August to 3 September 2011, in Ghent, Belgium. In an era when physical limitations have slowed down advances in the performance of single processing units, and new scientific challenges require exascale speed, parallel processing has gained momentum as a key gateway to HPC (High Performance Computing). Historically, the ParCo conferences have focused on three main themes: Algorithms, Architectures (both hardware and software) and Applications. Nowadays, the scenery has changed from traditional multiprocessor topologies to heterogeneous manycores, incorporating standard CPUs, GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays). These platforms are, at a higher abstraction level, integrated in clusters, grids, and clouds. This is reflected in the papers presented at the conference and the contributions as included in these proceedings. An increasing number of new algorithms are optimized for heterogeneous platforms and performance tuning is targeting extreme scale computing. Heterogeneous platforms utilising the compute power and energy efficiency of GPGPUs (General Purpose GPUs) are clearly becoming mainstream HPC systems for a large number of applications in a wide spectrum of application areas. These systems excel in areas such as complex system simulation, real-time image processing and visualisation, etc. High performance computing accelerators may well become the cornerstone of exascale computing applications such as 3-D turbulent combustion flows, nuclear energy simulations, brain research, financial and geophysical modelling. The exploration of new architectures, programming tools and techniques was evidenced by the mini-symposia “Parallel Computing with FPGAs” and “Exascale Programming Models”. 
The need for exascale hardware and software was also stressed in the industrial session, with contributions from Cray and the European exascale software initiative. Our sincere appreciation goes to the keynote speakers who gave their perspectives on the impact of parallel computing today and the road to exascale computing tomorrow. Our heartfelt thanks go to the authors for their valuable scientific contributions and to the programme committee who reviewed the papers and provided constructive remarks. The international audience was inspired by the quality of the presentations. Attendance and interaction were high, and the conference has been an agora where many fruitful ideas were exchanged and explored. We wish to express our sincere thanks to the organizers for the smooth operation of the conference. The University conference centre Het Pand offered an excellent environment for the conference as it allowed delegates to interact informally and easily. A special word of thanks is due to the management and support staff of Het Pand for their proficient and friendly support. The organizers managed to put together an extensive social programme. This included a reception at the medieval Town Hall of Ghent as well as a memorable conference dinner. These social events stimulated interaction amongst delegates and resulted in many new contacts being made. Finally, we wish to thank the many supporters who assisted in the organization and successful running of the event. Erik D'Hollander, Ghent University, Belgium; Koen De Bosschere, Ghent University, Belgium; Gerhard R. Joubert, TU Clausthal, Germany; David Padua, University of Illinois, USA; Frans Peters, Philips Research, Netherlands.

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary-sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which a modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application- and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.
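The QRD contribution above mentions Givens rotation as one of the two transformations the architecture can perform. For orientation only, the following is a minimal software sketch of the textbook Givens-rotation QR factorization in plain Python. It is not the dissertation's pipelined hardware design; the function name `givens_qr` and the list-of-lists matrix representation are assumptions made for this illustration.

```python
import math


def givens_qr(a):
    """Textbook QR factorization by Givens rotations (illustrative sketch).

    `a` is an m x n matrix given as a list of rows. Returns (q, r) with
    q an m x m orthogonal matrix and r an m x n upper-triangular matrix
    such that q @ r reproduces `a` up to rounding error.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero the entries below the diagonal in column j, bottom-up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # nothing to eliminate
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r
```

Each rotation touches only two rows, which is one reason Givens-based schemes are often considered for parallel or pipelined implementations, whereas Householder reflections eliminate a whole column per step.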