505 research outputs found

    Evaluation of scheduling heuristics for jitter reduction of real-time streaming applications on multi-core general purpose hardware

    The real-time system research community has paid a lot of attention to the design of safety-critical hard real-time systems for which the use of non-standard hardware and operating systems can be justified. However, stream processing applications like medical imaging systems are often not considered safety critical enough to justify the use of hard real-time techniques that would increase the cost of these systems significantly. Instead, commercial off-the-shelf (COTS) hardware and operating systems are used, and techniques at the application level are employed to reduce the variation in the end-to-end latency of these image processing systems. In this paper, we study the effectiveness of a number of scheduling heuristics that are intended to reduce the latency and the jitter of stream processing applications that are executed on COTS multiprocessor systems. The proposed scheduling heuristics take the execution times of tasks into account as well as dependencies between the tasks, the data structures accessed by the tasks, and the memory hierarchy. Experiments were carried out on a quad-core symmetric multiprocessing (SMP) Intel processor. These experiments show that the proposed heuristics can reduce the end-to-end latency by almost 60%, and reduce the variation in the latency by more than 90%, when compared with a naive scheduling heuristic that does not consider execution times, dependencies and the memory hierarchy.

    Empowering parallel computing with field programmable gate arrays

    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array (FPGA), a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminating instruction overhead and optimizing execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis (HLS) support as well as better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.

    An Investigation towards Effectiveness in Image Enhancement Process in MPSoC

    Image enhancement has a fundamental role in vision-based applications. It involves processing the input image to improve its visualization for various applications. The primary objective is to filter unwanted noise and clutter and to correct sharpening or blur. Characteristics such as resolution and contrast are constructively altered to obtain an enhanced image in the biomedical field. The paper highlights the different techniques proposed for the digital enhancement of images. After surveying the methods that utilize a Multiprocessor System-on-Chip (MPSoC), it is concluded that these methodologies have little accuracy and hence none of them is capable of efficiently enhancing digital biomedical images.

    Optimizing Harris Corner Detection on GPGPUs Using CUDA

    The objective of this thesis is to optimize the Harris corner detection algorithm implementation on NVIDIA GPGPUs using the CUDA software platform and measure the performance benefit. The Harris corner detection algorithm, developed by C. Harris and M. Stephens, discovers well-defined corner points within an image. The corner detection implementation has been proven to be computationally intensive, so real-time performance is difficult to achieve with a sequential software implementation. This thesis decomposes the Harris corner detection algorithm into a set of parallel stages, each of which is implemented and optimized on the CUDA platform. The performance results show that by applying strategic CUDA optimizations to the Harris corner detection implementation, real-time performance is feasible. The optimized CUDA implementation of the Harris corner detection algorithm showed significant speedup over several platforms: standard C, MATLAB, and OpenCV. The optimized CUDA implementation was then applied to a feature matching computer vision system, which showed significant speedup over the other platforms.

    Design and Implementation of an Embedded System for Software Defined Radio

    This paper proposes an approach to developing high-performance software for demanding real-time embedded systems. This software-based design will enable software engineers and system architects in emerging technology areas like 5G Wireless and Software Defined Networking (SDN) to build their algorithms. An ADSP-21364 floating-point SHARC Digital Signal Processor (DSP) running at 333 MHz is adopted as the platform for the embedded system. To evaluate the proposed embedded system, an implementation of frame, symbol and carrier phase synchronization is presented as an application. Its performance is investigated with an online Quadrature Phase Shift Keying (QPSK) receiver. The results show that the designed software is implemented successfully on the SHARC DSP, which can be utilized efficiently for such algorithms. In addition, it is shown that the proposed embedded system is practical and capable of dealing with the memory constraints and critical timing issues that arise from the long interleaved coded data used for channel coding.