1,448 research outputs found
Modeling and visualizing networked multi-core embedded software energy consumption
In this report we present a network-level multi-core energy model and a
software development process workflow that allows software developers to
estimate the energy consumption of multi-core embedded programs. This work
focuses on a high performance, cache-less and timing predictable embedded
processor architecture, XS1. Prior modelling work is improved to increase
accuracy, then extended to be parametric with respect to voltage and frequency
scaling (VFS) and then integrated into a larger scale model of a network of
interconnected cores. The modelling is supported by enhancements to an open
source instruction set simulator to provide the first network timing aware
simulations of the target architecture. Simulation based modelling techniques
are combined with methods of results presentation to demonstrate how such work
can be integrated into a software developer's workflow, enabling the developer
to make informed, energy aware coding decisions. A set of single-,
multi-threaded and multi-core benchmarks are used to exercise and evaluate the
models and provide use case examples for how results can be presented and
interpreted. The models all yield accuracy within an average ±5% error
margin.
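The abstract above does not give the model's equations, but a first-order sketch of a VFS-parametric energy model can be written down from standard CMOS assumptions: dynamic power scales as C·V²·f, and region energy is power multiplied by execution time. The function names, the effective-capacitance value, and the static-power term below are illustrative assumptions, not the report's actual model.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """First-order CMOS dynamic power: P = C_eff * V^2 * f (watts)."""
    return c_eff * voltage ** 2 * freq_hz

def energy_estimate(n_cycles, c_eff, voltage, freq_hz, p_static=0.0):
    """Energy of a code region under a given VFS operating point:
    (P_dynamic + P_static) * execution_time."""
    t = n_cycles / freq_hz               # execution time in seconds
    return (dynamic_power(c_eff, voltage, freq_hz) + p_static) * t

# With zero static power, halving the frequency alone does not save
# dynamic energy (half the power for twice the time); only lowering the
# voltage does, which is why the V^2 term dominates VFS trade-offs.
e_full = energy_estimate(1000, c_eff=1e-9, voltage=1.0, freq_hz=1e8)
e_half = energy_estimate(1000, c_eff=1e-9, voltage=1.0, freq_hz=5e7)
```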
Energy Transparency for Deeply Embedded Programs
Energy transparency is a concept that makes a program's energy consumption
visible, from hardware up to software, through the different system layers.
Such transparency can enable energy optimizations at each layer and between
layers, and help both programmers and operating systems make energy-aware
decisions. In this paper, we focus on deeply embedded devices, typically used
for Internet of Things (IoT) applications, and demonstrate how to enable energy
transparency through existing Static Resource Analysis (SRA) techniques and a
new target-agnostic profiling technique, without hardware energy measurements.
Our novel mapping technique enables software energy consumption estimations at
a higher level than the Instruction Set Architecture (ISA), namely the LLVM
Intermediate Representation (IR) level, and therefore introduces energy
transparency directly to the LLVM optimizer. We apply our energy estimation
techniques to a comprehensive set of benchmarks, including single- and also
multi-threaded embedded programs from two commonly used concurrency patterns,
task farms and pipelines. Using SRA, our LLVM IR results demonstrate a high
accuracy with a deviation in the range of 1% from the ISA SRA. Our profiling
technique captures the actual energy consumption at the LLVM IR level with an
average error of 3%.
Comment: 33 pages, 7 figures.
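The paper's mapping technique itself is not reproduced in this abstract, but its core idea can be sketched: attribute per-instruction energy costs, known at the ISA level, to the LLVM IR instructions they are lowered from, so that an IR-level trace can be costed directly. The cost table, the IR-to-ISA mapping, and all numbers below are hypothetical placeholders, not values from the paper.

```python
# Hypothetical per-ISA-instruction energy costs in joules (e.g. from an
# ISA-level energy model of a deeply embedded core).
ISA_COST = {"add": 1.2e-10, "ld": 2.5e-10, "st": 2.4e-10, "bru": 1.8e-10}

# Hypothetical mapping: which ISA instructions each LLVM IR opcode
# typically lowers to on the target.
IR_TO_ISA = {
    "add":   ["add"],
    "load":  ["ld"],
    "store": ["st"],
    "br":    ["bru"],
}

def ir_energy(ir_trace):
    """Estimate the energy of an LLVM IR execution trace by summing the
    costs of the ISA instructions each IR instruction maps to."""
    return sum(ISA_COST[isa] for op in ir_trace for isa in IR_TO_ISA[op])
```

The interesting property, which the paper quantifies, is how closely such an IR-level estimate tracks the ISA-level analysis despite the lossy mapping.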
Effective memory management for mobile environments
Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called "apps" hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and of optimization techniques specific to mobile frameworks.
In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new techniques for memory management that address energy constraints, time performance, and responsiveness.
We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), Just-in-Time (JIT) compilation, and across established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (and not just locally), and across all layers of the stack.
Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot, where total on-chip energy is reduced (20-30%) with minimal impact on throughput and responsiveness (5-10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques.
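The dissertation does not spell out its governor's policy here, but the idea of a GC-aware DVFS governor can be sketched as a simple decision rule: boost the clock during collector pauses (to shorten them), and otherwise scale with mutator load. The frequency steps and thresholds below are invented for illustration and are not the dissertation's actual policy.

```python
FREQS_MHZ = [300, 600, 1200]   # hypothetical available DVFS steps

def gc_aware_frequency(in_gc_pause, utilization):
    """Pick a CPU frequency step.

    During a stop-the-world GC pause, run at the highest step so the
    pause (which blocks the app) finishes quickly; otherwise choose a
    step based on recent mutator utilization, as an ordinary
    load-tracking governor would."""
    if in_gc_pause:
        return FREQS_MHZ[-1]
    if utilization > 0.7:
        return FREQS_MHZ[1]
    return FREQS_MHZ[0]
```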
CoAP Infrastructure for IoT
The Internet of Things (IoT) can be seen as a large-scale network of billions of smart devices. Often IoT
devices exchange data in small but numerous messages, which requires IoT services to be more scalable and
reliable than ever. Traditional protocols that are well known in the Web world do not fit well in the constrained
environment that these devices operate in. Therefore, many lightweight protocols specialized for the IoT have
been studied, among which the Constrained Application Protocol (CoAP) stands out for its well-known REST
paradigm and easy integration with the existing Web. Meanwhile, new paradigms such as Fog Computing have
emerged, attempting to avoid the centralized bottleneck in IoT services by moving computation to the edge
of the network. Since a Fog node essentially belongs to a relatively constrained environment, CoAP fits
in well. Among the many attempts at building scalable and reliable systems, Erlang, a typical concurrency-oriented programming (COP) language, has been battle-tested in the telecom industry, which has requirements similar
to those of the IoT. To explore the possibility of applying Erlang, and COP in general, to the IoT, this thesis
presents an Erlang based CoAP server/client prototype ecoap with a flexible concurrency model that can
scale up to an unconstrained environment like the Cloud and scale down to a constrained environment like
an embedded platform. The flexibility of the presented server renders the same architecture applicable from
Fog to Cloud. To evaluate its performance, the proposed server is compared with the mainstream CoAP
implementation on an Amazon Web Service (AWS) Cloud instance and a Raspberry Pi 3, representing the
unconstrained and constrained environment respectively. The ecoap server achieves comparable throughput,
lower latency, and in general scales better than the other implementation in the Cloud and on the Raspberry
Pi. The thesis yields positive results and demonstrates the value of the philosophy of Erlang in the IoT space.
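The ecoap implementation itself is in Erlang and is not shown here, but the CoAP wire format it speaks is fixed by RFC 7252: a 4-byte header carrying version, message type, token length, code, and message ID, followed by the token. As a small illustration of what any CoAP endpoint must encode, here is a sketch of that fixed header in Python (the helper name is ours, not ecoap's):

```python
def coap_header(msg_type, code, message_id, token=b""):
    """Build the 4-byte CoAP fixed header (RFC 7252) plus token.

    msg_type: 0=CON, 1=NON, 2=ACK, 3=RST
    code:     request/response code, e.g. 0x01 for GET
    """
    assert 0 <= len(token) <= 8, "TKL field is 4 bits, tokens are 0-8 bytes"
    first = (1 << 6) | (msg_type << 4) | len(token)   # Ver=1 | Type | TKL
    return bytes([first, code, message_id >> 8, message_id & 0xFF]) + token

# A confirmable (CON) GET with message ID 0x1234 and no token encodes
# to the four bytes 40 01 12 34.
packet = coap_header(0, 0x01, 0x1234)
```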
On the Distribution of Control in Asynchronous Processor Architectures
Institute for Computing Systems Architecture
The effective performance of computer systems is to a large measure
determined by the synergy between the processor architecture, the
instruction set and the compiler. In the past, the sequencing of
information within processor architectures has normally been
synchronous: controlled centrally by a clock. However, this global
signal could possibly limit the future gains in performance that can
potentially be achieved through improvements in implementation
technology.
This thesis investigates the effects of relaxing this strict synchrony
by distributing control within processor architectures through the use
of a novel asynchronous design model known as a micronet. The impact
of asynchronous control on the performance of a RISC-style processor
is explored at different levels. Firstly, improvements in the
performance of individual instructions by exploiting actual run-time
behaviours are demonstrated. Secondly, it is shown that micronets are
able to exploit further (both spatial and temporal) instruction-level
parallelism (ILP) efficiently through the distribution of control to
datapath resources. Finally, exposing fine-grain concurrency within a
datapath can only be of benefit to a computer system if it can easily
be exploited by the compiler. Although compilers for micronet-based
asynchronous processors may be considered to be more complex than
their synchronous counterparts, it is shown that the variable
execution time of an instruction does not adversely affect the
compiler's ability to schedule code efficiently. In conclusion, the
modelling of a processor's datapath as a micronet permits the
exploitation of both fine-grain ILP and actual run-time delays, thus
leading to the efficient utilisation of functional units and, in turn,
resulting in an improvement in overall system performance.
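The performance argument in this abstract can be made concrete with a back-of-the-envelope model: a synchronous datapath must clock every step at the worst-case stage delay, while an asynchronous (handshake-driven) micronet-style design pays only each stage's actual delay. The functions below are our illustrative model, not the thesis's micronet simulator.

```python
def synchronous_time(stage_delays, clock_period=None):
    """A clocked design pays a fixed period per step; the period must
    cover the worst-case stage delay (or a chosen slower clock)."""
    period = clock_period if clock_period is not None else max(stage_delays)
    return period * len(stage_delays)

def asynchronous_time(stage_delays):
    """A handshake-driven design completes each stage as soon as its
    actual delay elapses, so total time is just the sum of delays."""
    return sum(stage_delays)

# With delays [2, 5, 3], the clocked design takes 3 * 5 = 15 units,
# the asynchronous one 2 + 5 + 3 = 10: the gain comes precisely from
# exploiting actual run-time delays instead of the worst case.
```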
Concurrency Platforms for Real-Time and Cyber-Physical Systems
Parallel processing is an important way to satisfy the increasingly demanding computational needs of modern real-time and cyber-physical systems, but existing parallel computing technologies primarily emphasize high-throughput and average-case performance metrics, which are largely unsuitable for direct application to real-time, safety-critical contexts. This work contrasts two concurrency platforms designed to achieve predictable worst-case parallel performance for soft real-time workloads with millisecond periods and higher. One of these is then the basis for the CyberMech platform, which enables parallel real-time computing for a novel yet representative application called Real-Time Hybrid Simulation (RTHS). RTHS combines demanding parallel real-time computation with real-time simulation and control in an earthquake engineering laboratory environment, and results concerning RTHS characterize a reasonably comprehensive survey of parallel real-time computing in the static context, where the size, shape, timing constraints, and computational requirements of workloads are fixed prior to system runtime. Collectively, these contributions constitute the first published implementations and evaluations of general-purpose concurrency platforms for real-time and cyber-physical systems, explore two fundamentally different design spaces for such systems, and successfully demonstrate the utility and tradeoffs of parallel computing for statically determined real-time and cyber-physical systems.
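The soft real-time setting described here (periodic workloads with millisecond-scale periods, where a deadline miss is undesirable but not fatal) can be illustrated with a minimal periodic executor that counts misses rather than aborting on them. This is a generic sketch of the workload model, not code from the concurrency platforms the work evaluates.

```python
import time

def run_periodic(job, period_s, n_iterations):
    """Run `job` at a fixed period and count deadline misses.

    Soft real-time semantics: if `job` overruns its period, the miss is
    recorded and the next release slips, but execution continues."""
    misses = 0
    next_release = time.monotonic()
    for _ in range(n_iterations):
        job()
        next_release += period_s
        slack = next_release - time.monotonic()
        if slack < 0:
            misses += 1              # job overran its period
        else:
            time.sleep(slack)        # wait for the next release point
    return misses
```

A hard real-time platform would instead have to bound `job`'s worst-case execution time in advance so that `slack` is provably never negative.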
Memory safety and untrusted extensions for TinyOS
Technical report
Sensor network applications should be reliable. However, TinyOS, the dominant sensor net OS, lacks basic building blocks for reliable software systems: memory protection, isolation, and safe termination. These features are typically found in general-purpose operating systems but are believed to be too expensive for tiny embedded systems with a few kilobytes of RAM. We dispel this notion and show that CCured, a safe dialect of C, can be leveraged to provide memory safety for largely unmodified TinyOS applications. We build upon safety to implement two very different environments for TinyOS applications. The first, Safe TinyOS, provides a minimal kernel for safely executing trusted applications. Safe execution traps and identifies bugs that would otherwise have silently corrupted RAM. The second environment, UTOS, implements a user-kernel boundary that supports isolation and safe termination of untrusted code. Existing TinyOS components can often be ported to UTOS with little effort. To create our environments, we substantially augmented the CCured toolchain to emit code that is safe under interrupt-driven concurrency, to reduce storage requirements by compressing error messages, to refactor direct hardware access into calls to trusted helper functions, and to make safe programs more efficient using whole-program optimization. A surprising result of our work is that a safe, optimized TinyOS program can be faster than the original unsafe, unoptimized application.
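CCured's central mechanism for pointers that need arithmetic is the "fat pointer": a pointer augmented with bounds metadata so every dereference can be checked at run time, trapping out-of-bounds accesses instead of silently corrupting memory. The following is a language-neutral sketch of that idea in Python (class and method names are ours), not CCured's actual C representation:

```python
class FatPtr:
    """CCured-style fat pointer: a pointer plus the bounds of the
    object it points into, so each dereference is checked."""

    def __init__(self, buf, index=0):
        self.buf = buf          # the underlying object (carries its bounds)
        self.index = index      # current offset within it

    def deref(self):
        """Checked dereference: trap instead of corrupting memory."""
        if not 0 <= self.index < len(self.buf):
            raise IndexError("out-of-bounds access trapped")
        return self.buf[self.index]

    def offset(self, n):
        """Pointer arithmetic keeps the metadata; the check happens
        only at dereference time, as in CCured."""
        return FatPtr(self.buf, self.index + n)
```

In Safe TinyOS the analogous trap identifies the bug (via a compressed error message) rather than letting RAM be silently overwritten.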
- …