The Challenge of Time-Predictability in Modern Many-Core Architectures
Recent technological advances and market trends are driving a convergence of the High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time, while EC systems are increasingly concerned with providing higher performance in real time. The convergence of these two domains towards systems requiring both high performance and predictable timing behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many-core embedded platforms offers a chance to meet this converging need for predictability and high performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges, and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way.
Bilayer Protograph Codes for Half-Duplex Relay Channels
Despite encouraging advances in the design of relay codes, several important
challenges remain. Many of the existing LDPC relay codes are tightly optimized
for fixed channel conditions and not easily adapted without extensive
re-optimization of the code. Some have high encoding complexity and some need
long block lengths to approach capacity. This paper presents a high-performance
protograph-based LDPC coding scheme for the half-duplex relay channel that
addresses simultaneously several important issues: structured coding that
permits easy design, low encoding complexity, embedded structure for convenient
adaptation to various channel conditions, and performance close to capacity
with a reasonable block length. The application of the coding structure to
multi-relay networks is demonstrated. Finally, a simple new methodology for
evaluating the end-to-end error performance of relay coding systems is
developed and used to highlight the performance of the proposed codes.
Comment: Accepted in IEEE Trans. Wireless Com
Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor
In the construction of exascale computing systems energy efficiency and power
consumption are two of the major challenges. Low-power high performance
embedded systems are of increasing interest as building blocks for large scale
high-performance systems. However, extracting maximum performance out of such
systems presents many challenges. Various aspects from the hardware
architecture to the programming models used need to be explored. The Epiphany
architecture integrates low-power RISC cores on a 2D mesh network and promises
up to 70 GFLOPS/Watt of processing efficiency. However, with just 32 KB of
memory per eCore for storing both data and code, and only low level inter-core
communication support, programming the Epiphany system presents several
challenges. In this paper we evaluate the performance of the Epiphany system
for a variety of basic compute and communication operations. Guided by this
data we explore strategies for implementing scientific applications on memory
constrained low-powered devices such as the Epiphany. With future systems
expected to house thousands of cores in a single chip, the merits of such
architectures as a path to exascale are compared to other competing systems.
Comment: 14 pages, submitted to IJHPCA Journal special editio
Batch solution of small PDEs with the OPS DSL
In this paper we discuss the challenges and optimisation opportunities when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow data structure transformations, as well as execution schedule transformations, to be applied automatically to deliver high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs, as well as NVIDIA GPUs.
Low latency vision-based control for robotics : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Mechatronics at Massey University, Manawatu, New Zealand
In this work, the problem of controlling a high-speed dynamic tracking and interception system using computer vision as the measurement unit was explored.
High-speed control systems alone present many challenges, and these challenges are compounded when combined with the high volume of data processing required by computer vision systems. A semi-automated foosball table was chosen as the test-bed system because it combines all the challenges associated with a vision-based control system into a single platform. While computer vision is extremely useful and can solve many problems, it can also introduce many problems such as latency, the need for lens and spatial calibration, potentially high power consumption, and high cost.
The objective of this work is to explore how to implement computer vision as the measurement unit in a high-speed controller, while minimising latencies caused by the vision itself, communication interfaces, data processing/strategy, instruction execution, and actuator control. Another objective was to implement the solution in one low-latency, low power, low cost embedded system. A field programmable gate array (FPGA) system on chip (SoC), which combines programmable digital logic with a dual core ARM processor (HPS) on the same chip, was hypothesised to be capable of running the described vision-based control system.
The FPGA was used to perform streamed image pre-processing and concurrent stepper motor control, and to provide communication channels for user input, while the HPS performed the lens distortion mapping, intercept calculation and “strategy” control tasks, as well as controlling the overall function of the system. Individual vision systems were compared for latency performance. Interception performance of the semi-automated foosball table was then tested for straight, moderate-speed shots with limited view time. Latency was then artificially added to the system, and interception of the same centre-field shot was tested across a range of added latencies.
The FPGA-based system performed the best in both steady-state latency and novel event detection latency tests. The developed stepper motor control modules performed well in terms of speed, smoothness, resource consumption, and versatility. They are capable of constant velocity, constant acceleration and variable acceleration profiles, as well as being completely parameterisable. The interception modules on the foosball table achieved a 100% interception rate, with a confidence interval of 95%, and reliability of 98.4%. As artificial latency was added to the system, the performance dropped in terms of overall number of successful intercepts. The decrease in performance was roughly linear, with a 60% reduction in performance caused by 100 ms of added latency. Performance dropped to 0% successful intercepts when 166 ms of latency was added.
The implications of this work are that FPGA SoC technology may, in future, enable computer vision to be used as a general purpose, high-speed measurement system for a wide variety of control problems.
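The latency figures reported in this abstract can be checked against each other with a small calculation. The sketch below is not taken from the thesis: it assumes a strictly linear model starting from the 100% interception rate, and the function names and the `drop_per_ms` parameter are hypothetical, introduced only for illustration. Under that assumption, a 60% reduction at 100 ms implies a slope of 0.6 percentage points per millisecond, which places the zero-success point near 167 ms, in line with the reported 166 ms.

```python
# Assumed linear model (not from the thesis): success rate starts at 100%
# and falls by a fixed number of percentage points per ms of added latency.

def intercept_rate(added_latency_ms, drop_per_ms=0.6):
    """Estimated interception success (%) for a given added latency.

    drop_per_ms is a hypothetical parameter derived from the abstract's
    figure of a 60% reduction at 100 ms (60 / 100 = 0.6 points per ms).
    """
    return max(0.0, 100.0 - drop_per_ms * added_latency_ms)


def zero_crossing_ms(drop_per_ms=0.6):
    """Added latency (ms) at which the assumed model reaches 0% success."""
    return 100.0 / drop_per_ms


if __name__ == "__main__":
    for latency in (0, 50, 100, 166):
        print(latency, "ms ->", round(intercept_rate(latency), 1), "%")
    print("model reaches 0% at about", round(zero_crossing_ms(), 1), "ms")
```

The abstract describes the decline only as roughly linear, so this is a consistency check on the two quoted data points rather than a fitted model.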
The Mont-Blanc prototype: an alternative approach for high-performance computing systems
High-performance computing (HPC) is recognized as one of the pillars for the further advance of science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging challenges in order to reach Exascale level of performance, which is expected by the year 2020. The much larger embedded and mobile market allows for rapid development of IP blocks and provides more flexibility in designing an application-specific SoC, in turn making it possible to balance performance, energy efficiency and cost. In the Mont-Blanc project, we advocate for HPC systems to be built from such commodity IP blocks, currently used in embedded and mobile SoCs.
As a first demonstrator of such an approach, we present the Mont-Blanc prototype: the first HPC system built with commodity SoCs, memories, and NICs from the embedded and mobile domain, and off-the-shelf HPC networking, storage, cooling and integration solutions. We present the system’s architecture and its evaluation, including both performance and energy efficiency. Further, we compare the system’s abilities against a production-level supercomputer. Finally, we discuss parallel scalability and estimate the maximum scalability point of this approach across a set of HPC applications.
Postprint (published version
Electronic health records to facilitate clinical research
Electronic health records (EHRs) provide opportunities to enhance patient care, embed performance measures in clinical practice, and facilitate clinical research. Concerns have been raised about the increasing recruitment challenges in trials, burdensome and obtrusive data collection, and uncertain generalizability of the results. Leveraging electronic health records to counterbalance these trends is an area of intense interest. The initial application of electronic health records as the primary data source is envisioned for observational studies, embedded pragmatic or post-marketing registry-based randomized studies, or comparative effectiveness studies. Advancing this approach to randomized clinical trials, electronic health records may potentially be used to assess study feasibility, to facilitate patient recruitment, and to streamline data collection at baseline and follow-up. Ensuring data security and privacy, overcoming the difficulties of linking diverse systems, and maintaining infrastructure for repeat use of high-quality data are some of the challenges associated with using electronic health records in clinical research. Collaboration between academia, industry, regulatory bodies, policy makers, patients, and electronic health record vendors is critical for the greater use of electronic health records in clinical research. This manuscript identifies the key steps required to advance the role of electronic health records in cardiovascular clinical research.
A Survey and Comparative Study of Hard and Soft Real-time Dynamic Resource Allocation Strategies for Multi/Many-core Systems
Multi-/many-core systems are envisioned to satisfy the ever-increasing performance requirements of complex applications in various domains such as embedded and high-performance computing. Such systems need to cater to increasingly dynamic workloads, requiring efficient dynamic resource allocation strategies to satisfy hard or soft real-time constraints. This article provides an extensive survey of hard and soft real-time dynamic resource allocation strategies proposed since the mid-1990s and highlights the emerging trends for multi-/many-core systems. The survey covers a taxonomy of the resource allocation strategies and considers their various optimization objectives, which have been used to provide a comprehensive comparison. The strategies employ various principles, such as market and biological concepts, to perform the optimizations. The trend followed by the resource allocation strategies, open research challenges, and likely emerging research directions are also provided.