552,946 research outputs found

    The Challenge of Time-Predictability in Modern Many-Core Architectures

    Get PDF
    The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time while EC systems are increasingly concerned with providing higher performance in real-time. The convergence of these two domains towards systems requiring both high performance and a predictable time-behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictability and high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way

    Bilayer Protograph Codes for Half-Duplex Relay Channels

    Get PDF
    Despite encouraging advances in the design of relay codes, several important challenges remain. Many of the existing LDPC relay codes are tightly optimized for fixed channel conditions and not easily adapted without extensive re-optimization of the code. Some have high encoding complexity and some need long block lengths to approach capacity. This paper presents a high-performance protograph-based LDPC coding scheme for the half-duplex relay channel that addresses simultaneously several important issues: structured coding that permits easy design, low encoding complexity, embedded structure for convenient adaptation to various channel conditions, and performance close to capacity with a reasonable block length. The application of the coding structure to multi-relay networks is demonstrated. Finally, a simple new methodology for evaluating the end-to-end error performance of relay coding systems is developed and used to highlight the performance of the proposed codes.Comment: Accepted in IEEE Trans. Wireless Com

    Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor

    Full text link
    In the construction of exascale computing systems energy efficiency and power consumption are two of the major challenges. Low-power high performance embedded systems are of increasing interest as building blocks for large scale high- performance systems. However, extracting maximum performance out of such systems presents many challenges. Various aspects from the hardware architecture to the programming models used need to be explored. The Epiphany architecture integrates low-power RISC cores on a 2D mesh network and promises up to 70 GFLOPS/Watt of processing efficiency. However, with just 32 KB of memory per eCore for storing both data and code, and only low level inter-core communication support, programming the Epiphany system presents several challenges. In this paper we evaluate the performance of the Epiphany system for a variety of basic compute and communication operations. Guided by this data we explore strategies for implementing scientific applications on memory constrained low-powered devices such as the Epiphany. With future systems expected to house thousands of cores in a single chip, the merits of such architectures as a path to exascale is compared to other competing systems.Comment: 14 pages, submitted to IJHPCA Journal special editio

    Batch solution of small PDEs with the OPS DSL

    Get PDF
    In this paper we discuss the challenges and optimisations opportunities when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow to automatically apply data structure transformations, as well as execution schedule transformations to deliver high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs, as well as NVIDIA GPUs

    Low latency vision-based control for robotics : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Mechatronics at Massey University, Manawatu, New Zealand

    Get PDF
    In this work, the problem of controlling a high-speed dynamic tracking and interception system using computer vision as the measurement unit was explored. High-speed control systems alone present many challenges, and these challenges are compounded when combined with the high volume of data processing required by computer vision systems. A semi-automated foosball table was chosen as the test-bed system because it combines all the challenges associated with a vision-based control system into a single platform. While computer vision is extremely useful and can solve many problems, it can also introduce many problems such as latency, the need for lens and spatial calibration, potentially high power consumption, and high cost. The objective of this work is to explore how to implement computer vision as the measurement unit in a high-speed controller, while minimising latencies caused by the vision itself, communication interfaces, data processing/strategy, instruction execution, and actuator control. Another objective was to implement the solution in one low-latency, low power, low cost embedded system. A field programmable gate array (FPGA) system on chip (SoC), which combines programmable digital logic with a dual core ARM processor (HPS) on the same chip, was hypothesised to be capable of running the described vision-based control system. The FPGA was used to perform streamed image pre-processing, concurrent stepper motor control and provide communication channels for user input, while the HPS performed the lens distortion mapping, intercept calculation and “strategy” control tasks, as well as controlling overall function of the system. Individual vision systems were compared for latency performance. Interception performance of the semi-automated foosball table was then tested for straight, moderate-speed shots with limited view time, and latency was artificially added to the system and the interception results for the same, centre-field shot tested with a variety of different added latencies. The FPGA based system performed the best in both steady-state latency, and novel event detection latency tests. The developed stepper motor control modules performed well in terms of speed, smoothness, resource consumption, and versatility. They are capable of constant velocity, constant acceleration and variable acceleration profiles, as well as being completely parameterisable. The interception modules on the foosball table achieved a 100% interception rate, with a confidence interval of 95%, and reliability of 98.4%. As artificial latency was added to the system, the performance dropped in terms of overall number of successful intercepts. The decrease in performance was roughly linear with a 60% in reduction in performance caused by 100 ms of added latency. Performance dropped to 0% successful intercepts when 166 ms of latency was added. The implications of this work are that FPGA SoC technology may, in future, enable computer vision to be used as a general purpose, high-speed measurement system for a wide variety of control problems

    The Mont-Blanc prototype: an alternative approach for high-performance computing systems

    Get PDF
    High-performance computing (HPC) is recognized as one of the pillars for further advance of science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging challenges in order to reach Exascale level of performance,which is expected by the year 2020. The much larger embedded and mobile market allows for rapid development of IP blocks, and provides more flexibility in designing an application-specific SoC, in turn giving possibility in balancing performance, energy-efficiency and cost. In the Mont-Blanc project, we advocate for HPC systems be built from such commodity IP blocks, currently used in embedded and mobile SoCs. As a first demonstrator of such approach, we present the Mont-Blanc prototype; the first HPC system built with commodity SoCs, memories, and NICs from the embedded and mobile domain, and off-the-shelf HPC networking, storage, cooling and integration solutions. We present the system’s architecture, and evaluation including both performance and energy efficiency. Further, we compare the system’s abilities against a production level supercomputer. At the end, we discuss parallel scalability, and estimate the maximum scalability point of this approach across a set of HPC applications.Postprint (published version

    Electronic health records to facilitate clinical research

    Get PDF
    Electronic health records (EHRs) provide opportunities to enhance patient care, embed performance measures in clinical practice, and facilitate clinical research. Concerns have been raised about the increasing recruitment challenges in trials, burdensome and obtrusive data collection, and uncertain generalizability of the results. Leveraging electronic health records to counterbalance these trends is an area of intense interest. The initial applications of electronic health records, as the primary data source is envisioned for observational studies, embedded pragmatic or post-marketing registry-based randomized studies, or comparative effectiveness studies. Advancing this approach to randomized clinical trials, electronic health records may potentially be used to assess study feasibility, to facilitate patient recruitment, and streamline data collection at baseline and follow-up. Ensuring data security and privacy, overcoming the challenges associated with linking diverse systems and maintaining infrastructure for repeat use of high quality data, are some of the challenges associated with using electronic health records in clinical research. Collaboration between academia, industry, regulatory bodies, policy makers, patients, and electronic health record vendors is critical for the greater use of electronic health records in clinical research. This manuscript identifies the key steps required to advance the role of electronic health records in cardiovascular clinical research

    A Survey and Comparative Study of Hard and Soft Real-time Dynamic Resource Allocation Strategies for Multi/Many-core Systems

    Get PDF
    Multi-/many-core systems are envisioned to satisfy the ever-increasing performance requirements of complex applications in various domains such as embedded and high-performance computing. Such systems need to cater to increasingly dynamic workloads, requiring efficient dynamic resource allocation strategies to satisfy hard or soft real-time constraints. This article provides an extensive survey of hard and soft real-time dynamic resource allocation strategies proposed since the mid-1990s and highlights the emerging trends for multi-/many-core systems. The survey covers a taxonomy of the resource allocation strategies and considers their various optimization objectives, which have been used to provide comprehensive comparison. The strategies employ various principles, such as market and biological concepts, to perform the optimizations. The trend followed by the resource allocation strategies, open research challenges, and likely emerging research directions have also been provided
    • …
    corecore