10,680 research outputs found
Model based code generation for distributed embedded systems
Embedded systems are becoming increasingly complex and more distributed. Cost and quality requirements necessitate reuse of the functional software components for multiple deployment architectures. An important step is the allocation of software components to hardware. During this process the differences between the hardware and application software architectures must be reconciled. In this paper we discuss an architecture driven approach involving model-based techniques to resolve these differences and integrate hardware and software components. The system architecture serves as the underpinning based on which distributed real-time components can be generated. Generation of various embedded system architectures using the same functional architecture is discussed. The approach leverages the following technologies – IME (Integrated Modeling Environment), the SAE AADL (Architecture Analysis and Design Language), and Ocarina. The approach is illustrated using the electronic throttle control system as a case study
Synthesis of application specific processor architectures for ultra-low energy consumption
In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm
MARACAS: a real-time multicore VCPU scheduling framework
This paper describes a multicore scheduling and load-balancing framework called MARACAS, to address shared cache and memory bus contention. It builds upon prior work centered around the concept of virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets. VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load balancing algorithm ensures VCPUs are mapped to cores to fairly distribute surplus CPU cycles, after ensuring VCPU timing guarantees. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This enables threads on other cores to make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration using page recoloring to improve performance isolation amongst VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress.http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdfAccepted manuscrip
Direct -body code on low-power embedded ARM GPUs
This work arises on the environment of the ExaNeSt project aiming at design
and development of an exascale ready supercomputer with low energy consumption
profile but able to support the most demanding scientific and technical
applications. The ExaNeSt compute unit consists of densely-packed low-power
64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architecture where computing power is supplied both by CPUs and
GPUs, and are emerging as a possible low-power and low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct -body code
suitable for astrophysical simulations has been re-engineered in order to
exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that are promising also because of their
energy efficiency, which is a crucial design in future exascale platforms.Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceeding
The MOLDY short-range molecular dynamics package
We describe a parallelised version of the MOLDY molecular dynamics program.
This Fortran code is aimed at systems which may be described by short-range
potentials and specifically those which may be addressed with the embedded atom
method. This includes a wide range of transition metals and alloys. MOLDY
provides a range of options in terms of the molecular dynamics ensemble used
and the boundary conditions which may be applied. A number of standard
potentials are provided, and the modular structure of the code allows new
potentials to be added easily. The code is parallelised using OpenMP and can
therefore be run on shared memory systems, including modern multicore
processors. Particular attention is paid to the updates required in the main
force loop, where synchronisation is often required in OpenMP implementations
of molecular dynamics. We examine the performance of the parallel code in
detail and give some examples of applications to realistic problems, including
the dynamic compression of copper and carbon migration in an iron-carbon alloy
- …