16,358 research outputs found

    Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device

    Get PDF
    Currently, most designers face a daunting task to research different design flows and learn the intricacies of specific software from various manufacturers in hardware/software co-design. An urgent need of creating a scalable hardware/software co-design platform has become a key strategic element for developing hardware/software integrated systems. In this paper, we propose a new design flow for building a scalable co-design platform on FPGA-based system-on-chip. We employ an integrated approach to implement a histogram oriented gradients (HOG) and a support vector machine (SVM) classification on a programmable device for pedestrian tracking. Not only was hardware resource analysis reported, but the precision and success rates of pedestrian tracking on nine open access image data sets are also analysed. Finally, our proposed design flow can be used for any real-time image processingrelated products on programmable ZYNQ-based embedded systems, which benefits from a reduced design time and provide a scalable solution for embedded image processing products

    CERN openlab Whitepaper on Future IT Challenges in Scientific Research

    Get PDF
    This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories and also input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the view of their organisations and/or affiliates

    Latency Performance for Real-Time Audio on BeagleBone Black

    Get PDF
    In this paper we present a set of tests aimed at evaluating the responsiveness of a BeagleBone Black board in real-time interactive audio applications. The default Angstrom Linux distribution was tested without modifying the underlying kernel. Latency measurements and audio quality were compared across the combination of different audio interfaces and audio synthesis models. Data analysis shows that the board is generally characterised by a remarkably high responsiveness; most of the tested configurations are affected by less than 7ms of latency and under-run activity proved to be contained using the correct optimisation techniques

    An investigation of the performance portability of OpenCL

    Get PDF
    This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL’s device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation

    Adaptive Partitioning for Large-Scale Dynamic Graphs

    Get PDF
    Abstract—In the last years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the num-ber of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by adapting continuously the graph partitions as the graph changes. We present a novel highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50 % when compared to commonly used hash-partitioning. I
    • …
    corecore