
    Towards Increased Power Efficiency in Low End Embedded Processors: Can Cache Help?

    Embedded processors are often characterized by limited resources and are optimized for specific applications. The growing number of battery-powered applications has driven a trend towards increased energy efficiency, sometimes even at the expense of performance. In particular, low-power, low-specification embedded processors typically lack on-chip cache memories, mainly to avoid the additional energy overhead a cache structure would impose on an embedded processor. This paper proposes energy and throughput models that can be used to analyze the energy and time overhead a data cache architecture introduces for a particular application in a previously non-cached system, or, alternatively, to perform cache overhead analysis in reconfigurable systems.
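
    As a concrete illustration of the kind of analysis such models enable, the following is a minimal first-order sketch in Python; all energy and latency parameters are hypothetical placeholders, not values from the paper.

```python
# Hypothetical first-order model of the energy and time overhead of adding
# a data cache to a previously cache-less embedded processor.
# All parameter values are illustrative, not taken from the paper.

def cache_overhead(accesses, hit_rate,
                   e_cache=0.2, e_mem=1.0,       # energy per access (nJ)
                   t_cache=1, t_mem=20,          # latency per access (cycles)
                   e_leak_per_cycle=0.01):       # cache leakage (nJ/cycle)
    """Return (energy, cycles) for the uncached and cached configurations."""
    hits = accesses * hit_rate
    misses = accesses - hits

    # Uncached baseline: every access pays the full memory cost.
    t_uncached = accesses * t_mem
    e_uncached = accesses * e_mem

    # Cached: hits are cheap, misses pay cache lookup plus memory,
    # and the cache leaks energy for the whole execution.
    t_cached = hits * t_cache + misses * (t_cache + t_mem)
    e_cached = (accesses * e_cache + misses * e_mem
                + e_leak_per_cycle * t_cached)
    return (e_uncached, t_uncached), (e_cached, t_cached)

(eu, tu), (ec, tc) = cache_overhead(accesses=1_000_000, hit_rate=0.9)
print(f"energy: {ec/eu:.2f}x, time: {tc/tu:.2f}x of the uncached baseline")
```

    Such a model makes explicit when a cache pays off: only if the hit rate is high enough for the hit savings to outweigh the per-access lookup energy and leakage.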

    Connecting the World of Embedded Mobiles: The RIOT Approach to Ubiquitous Networking for the Internet of Things

    The Internet of Things (IoT) is rapidly evolving based on low-power compliant protocol standards that extend the Internet into the embedded world. Pioneering implementations have proven that it is feasible to inter-network very constrained devices, but they had to rely on peculiar cross-layered designs and offered a minimalistic set of features. In the long run, however, professional use and massive deployment of IoT devices require full-featured, cleanly composed, and flexible network stacks. This paper introduces the networking architecture that turns RIOT into a powerful IoT system able to support low-power wireless scenarios. RIOT networking offers (i) a modular architecture with generic interfaces for plugging in drivers, protocols, or entire stacks, (ii) support for multiple heterogeneous interfaces and stacks that can operate concurrently, and (iii) GNRC, its cleanly layered, recursively composed default network stack. We contribute an in-depth analysis of the communication performance and resource efficiency of RIOT, both at the micro-benchmarking level and by comparing IoT communication across different platforms. Our findings show that, although it is based on significantly different design trade-offs, the networking subsystem of RIOT achieves performance equivalent to that of Contiki and TinyOS, the two operating systems that pioneered IoT software platforms.
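
    The pluggability described in (i) can be sketched conceptually; the following Python illustration shows generic interfaces for swapping drivers and stacks. All names here are invented for illustration only; RIOT's actual networking API is written in C and differs.

```python
# Conceptual illustration of a modular network stack with pluggable device
# drivers and protocol layers, in the spirit of generic plug-in interfaces.
# All names are invented; this is not RIOT's real API.

class Netdev:
    """Generic device-driver interface: any radio or Ethernet driver fits."""
    def send(self, frame: bytes) -> None: raise NotImplementedError
    def recv(self) -> bytes: raise NotImplementedError

class LoopbackDev(Netdev):
    def __init__(self): self._q = []
    def send(self, frame): self._q.append(frame)
    def recv(self): return self._q.pop(0)

class SixLowpanLayer:
    """One protocol layer; layers compose into a stack."""
    def encapsulate(self, payload: bytes) -> bytes:
        return b"6lo|" + payload

class Stack:
    """A stack binds an ordered list of layers to any Netdev implementation,
    so drivers, layers, or whole stacks can be swapped independently."""
    def __init__(self, dev: Netdev, layers):
        self.dev, self.layers = dev, layers
    def send(self, payload: bytes):
        for layer in self.layers:       # each layer adds its header
            payload = layer.encapsulate(payload)
        self.dev.send(payload)

# Two heterogeneous stacks could coexist, each bound to a different device.
stack = Stack(LoopbackDev(), [SixLowpanLayer()])
stack.send(b"hello")
print(stack.dev.recv())  # b'6lo|hello'
```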

    Investigating the viability of adaptive caches as a defense mechanism against cache side-channel attacks

    The ongoing miniaturization of semiconductor manufacturing technologies has enabled the integration of tens to hundreds of processing cores on a single chip. Unlike frequency scaling, where performance is increased equally across the board, core scaling and hardware thread scaling harness the additional processing power through the concurrent execution of multiple processes or programs. This approach of interleaving process executions has engendered a new set of security challenges that risks undermining nearly three decades' worth of computer architecture design efforts. The complexity of runtime interactions and the aggressive sharing of resources among processes, e.g., caches or interconnect network paths, have created fertile ground for mounting attacks of ever-increasing acuteness against these computer systems. One such class of attacks is cache side-channel attacks. While caches are vital to the performance of current processors, they have also been the target of numerous side-channel attacks. As a result, a few cache architectures have been proposed to defend against these attacks. However, these designs tend to provide security at the expense of performance, area, and power. Therefore, the design of secure, high-performance cache architectures is still a pressing research challenge. In this thesis, we examine the viability of self-aware adaptive caches as a defense mechanism against cache side-channel attacks. We define an adaptive cache as a caching structure with (i) run-time reconfiguration capability and (ii) intelligent built-in logic to monitor itself and determine its parameter settings. Since the success of most cache side-channel attacks depends on the attacker's knowledge of key cache parameters such as associativity, set count, and replacement policy, among others, an adaptive cache can provide a moving-target defense against many of these attacks. We therefore hypothesize that runtime changes in certain cache parameters should render some side-channel attacks less effective, due to their dependence on knowing the exact configuration of the caches.
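
    To make the moving-target intuition concrete, here is a toy Python model of how re-keying a cache's set-index mapping invalidates a prime+probe attacker's eviction set; the hash function and parameters are invented, not taken from the thesis.

```python
# Toy model of a moving-target cache defense. A prime+probe attacker builds
# an eviction set for a victim address under one set-index mapping; if the
# cache periodically re-keys its index hash, that set becomes mostly useless.
# Purely illustrative; the hash and sizes are not from the thesis.
import random

SETS = 64

def set_index(addr: int, key: int) -> int:
    # Keyed index hash; a real design might encrypt or permute index bits.
    return (((addr ^ key) * 0x9E3779B1) >> 8) % SETS

key = 0
victim = 0x1234
# Attacker collects addresses that map to the victim's set under the
# current key; these form the eviction set.
eviction_set = [a for a in range(4096)
                if set_index(a, key) == set_index(victim, key)]

# Defense: the adaptive cache re-keys its mapping at runtime.
key = random.getrandbits(16)

still_conflicting = sum(set_index(a, key) == set_index(victim, key)
                        for a in eviction_set)
print(f"{still_conflicting}/{len(eviction_set)} "
      f"eviction-set entries still conflict after re-keying")
```

    After re-keying, only about 1/SETS of the old eviction set still collides with the victim on average, forcing the attacker to rebuild its sets faster than the cache reconfigures.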

    Combining Target-independent Analysis with Dynamic Profiling to Build the Performance Model of a DSP

    Fast and accurate performance estimation is a key aspect of the design flow for heterogeneous embedded systems, since cycle-accurate simulators, when they exist, are usually too slow to be used during design space exploration. Performance estimation techniques are usually based on combining estimates for the individual processing elements that compose the system. Architectural characteristics of Digital Signal Processors (DSPs), such as Single Instruction Multiple Data operations or special hardware units to control loop execution, introduce peculiar aspects into the performance estimation problem. In this paper we present a methodology to estimate the performance of a function on a given dataset on a DSP. Estimation is performed by combining host profiling data with the GNU GCC GIMPLE representation of the function. Starting from the results of this analysis, we build a performance model of the DSP using linear regression. Use of the GIMPLE representation directly takes into account the target-independent optimizations performed by the DSP compiler. We validate our approach by building a performance model of the MagicV DSP and by testing the model on a set of representative benchmarks.
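
    A minimal sketch of the calibration step, assuming operation counts as features and ordinary least squares as the regression technique; the feature names and numbers are invented for illustration.

```python
# Sketch of calibrating a linear performance model: predict DSP cycles from
# target-independent operation counts (e.g., extracted from a GIMPLE-level
# analysis) via least squares. Data are synthetic.
import numpy as np

# Rows: profiled benchmark functions; columns: counts of ALU operations,
# memory accesses, and loop iterations observed on the host.
X = np.array([[120,  40, 10],
              [300, 150, 25],
              [ 80,  20,  5],
              [500, 210, 60]], dtype=float)
cycles = np.array([900, 2600, 520, 4700], dtype=float)  # measured on the DSP

# Fit per-operation cycle costs with ordinary least squares.
coeffs, *_ = np.linalg.lstsq(X, cycles, rcond=None)
print("estimated cycles per operation class:", coeffs)

# Estimate a new function from its host-profiled operation counts.
new_counts = np.array([200, 90, 15], dtype=float)
print("predicted cycles:", new_counts @ coeffs)
```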

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment long-term efforts in space exploration, with particular reference to the research needs of NASA Langley Research Center (NASA/LaRC) for planetary exploration missions of the mid and late 1990s. The requirements for space missions given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concept appears to be a promising candidate, except that more detailed information is required. The feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for the architectural design of computing systems meant for planetary missions were also identified.

    Dynamic adaptation and distribution of binaries to heterogeneous architectures

    Real-time multimedia workloads require progressively more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real-time multimedia workloads. When even they are unable to satisfy processing power requirements, network distribution can provide many workloads with even more computing power. In this thesis, we present solutions that make it practical to use the processing power that networks of many-core architectures can provide. The research focuses on solutions that can be included in our Parallel Processing Graphs (P2G) project. We have developed the foundation for network distribution in P2G, and we have suggested a viable solution for the execution of workloads on heterogeneous multi-core architectures.

    Performance estimation of embedded software with confidence levels

    Since time constraints are a critical aspect of an embedded system, performance evaluation cannot be postponed to the end of the design flow; it has to be introduced from its early stages. Estimation techniques based on mathematical models are usually preferred during this phase, since they provide quite accurate estimates of application performance in a fast way. However, the estimation error has to be considered during design space exploration to evaluate whether a solution can be accepted (e.g., by discarding solutions whose estimated time is too close to the constraint). Evaluating whether the error could be significant from a single point estimate is not a trivial task. In this paper we propose a methodology, based on statistical analysis, that provides a prediction interval on the estimate and a confidence level on meeting a time constraint. This information can drive design space exploration, reducing the number of solutions to be validated. The results show how the produced intervals effectively capture the estimation error introduced by a linear model.
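
    A minimal sketch of how a prediction interval and a constraint-meeting confidence level can be derived for a simple linear model, using NumPy/SciPy on synthetic data; the paper's exact statistical machinery may differ.

```python
# Sketch: attach a prediction interval and a confidence level of meeting a
# time constraint to a linear performance estimate. Data are synthetic.
import numpy as np
from scipy import stats

x = np.array([10, 20, 30, 40, 50], dtype=float)       # model feature
y = np.array([105, 198, 310, 395, 510], dtype=float)  # measured times

res = stats.linregress(x, y)
n = len(x)
y_hat = res.intercept + res.slope * x
s = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))       # residual std. error

x0 = 35.0                                             # new design point
pred = res.intercept + res.slope * x0
# Standard error of a single new prediction at x0.
se = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / np.sum((x - x.mean())**2))

t = stats.t.ppf(0.975, df=n - 2)
print(f"estimate: {pred:.1f}")
print(f"95% prediction interval: [{pred - t*se:.1f}, {pred + t*se:.1f}]")

deadline = 380.0                                      # time constraint
conf = stats.t.cdf((deadline - pred) / se, df=n - 2)
print(f"confidence of meeting the constraint: {conf:.2%}")
```

    A designer could then discard only the solutions whose confidence of meeting the deadline falls below a chosen threshold, rather than validating every candidate.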

    A HyperNet Architecture

    Network virtualization is becoming a fundamental building block of future Internet architectures. By adding networking resources into the “cloud”, it is possible for users to rent virtual routers from the underlying network infrastructure, connect them with virtual channels to form a virtual network, and tailor the virtual network (e.g., load application-specific networking protocols, libraries, and software stacks onto the virtual routers) to carry out a specific task. In addition, network virtualization technology allows such special-purpose virtual networks to co-exist on the same network infrastructure without interfering with each other. Although the underlying network resources needed to support virtualized networks are rapidly becoming available, constructing a virtual network from the ground up and using it is a challenging and labor-intensive task, one best left to experts. To tackle this problem, we introduce the concept of a HyperNet: a pre-built, pre-configured network package that a user can easily deploy to create and access a virtual network for a specific task (e.g., multicast video conferencing). HyperNets package together the network topology configuration, software, and network services needed to create and deploy a custom virtual network. Users download HyperNets from HyperNet repositories and then “run” them on virtualized network infrastructure, much as users download and run virtual appliances on a virtual machine. To support the HyperNet abstraction, we created a Network Hypervisor service that provides a set of APIs that can be called to create a virtual network with certain characteristics. To evaluate the HyperNet architecture, we implemented several example HyperNets and ran them on our prototype implementation of the Network Hypervisor. Our experiments show that the Hypervisor API can be used to compose almost any special-purpose network, including networks capable of carrying out functions that the current Internet does not provide. Moreover, the design of our HyperNet architecture is highly extensible, enabling developers to write high-level libraries (using the Network Hypervisor APIs) to achieve complicated tasks.
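
    As a hypothetical illustration of the kind of API such a Network Hypervisor could expose (all names below are invented; the paper's actual API may differ substantially):

```python
# Hypothetical sketch of a Network Hypervisor API for building a HyperNet.
# Every name here is invented for illustration.

class NetworkHypervisor:
    def __init__(self):
        self.nodes, self.links = {}, []
    def create_router(self, name, image):
        # A virtual router provisioned with an application-specific image.
        self.nodes[name] = {"image": image}
        return name
    def create_channel(self, a, b, bandwidth_mbps):
        # A virtual channel connecting two virtual routers.
        self.links.append((a, b, bandwidth_mbps))
    def deploy(self):
        print(f"deploying {len(self.nodes)} routers, "
              f"{len(self.links)} channels")

# A "multicast video conferencing" HyperNet packaged as one unit: the
# topology, router software image, and channels travel together.
def multicast_hypernet(hv: NetworkHypervisor, participants):
    root = hv.create_router("rendezvous", image="multicast-router.img")
    for i, _ in enumerate(participants):
        leaf = hv.create_router(f"edge-{i}", image="multicast-router.img")
        hv.create_channel(root, leaf, bandwidth_mbps=10)
    hv.deploy()

multicast_hypernet(NetworkHypervisor(), ["alice", "bob", "carol"])
```

    Packaging the topology and software behind one function call is what lets a non-expert "run" the network the way one runs a virtual appliance.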

    Fine grain analysis of simulators accuracy for calibrating performance models

    In embedded system design, the tuning and validation of a cycle-accurate simulator is a difficult task. The designer has to ensure that the estimation error of the simulator meets the design constraints on every application. If an application is not correctly estimated, the designer has to identify on which parts of the application the simulator introduces an estimation error and fix the simulator accordingly. However, detecting the mispredicted parts of a very large application can be a difficult and time-consuming process. In this paper we propose a methodology that helps the designer to quickly and automatically isolate the portions of the application mispredicted by a simulator. This is accomplished by recursively analyzing the application source code trace and highlighting the mispredicted sections of source code. The results obtained by applying the methodology to the TSIM simulator show how it is able to quickly analyze large applications, isolating small portions of mispredicted code.
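
    A minimal sketch of the recursive isolation idea, assuming per-region simulated and measured cycle counts over a region tree; the tolerance threshold and data are invented for illustration.

```python
# Sketch: recursively isolate mispredicted code regions by comparing
# simulated vs. measured cycle counts over a region (e.g., call/loop) tree.
# Threshold and example data are invented.

REL_TOLERANCE = 0.10   # flag regions mispredicted by more than 10%

def isolate(region, simulated, measured, children, path=""):
    """Recurse into a region only if its estimate is off; return the
    smallest mispredicted sub-regions found."""
    name = f"{path}/{region}"
    err = abs(simulated[region] - measured[region]) / measured[region]
    if err <= REL_TOLERANCE:
        return []                      # well predicted: prune this subtree
    subs = children.get(region, [])
    if not subs:
        return [(name, round(err, 3))] # leaf: smallest mispredicted portion
    hits = []
    for c in subs:
        hits += isolate(c, simulated, measured, children, name)
    return hits or [(name, round(err, 3))]  # children fine: blame region

children  = {"main": ["init", "kernel"], "kernel": ["loop_a", "loop_b"]}
simulated = {"main": 1000, "init": 100, "kernel": 900,
             "loop_a": 700, "loop_b": 200}
measured  = {"main": 1300, "init": 105, "kernel": 1195,
             "loop_a": 980, "loop_b": 215}
print(isolate("main", simulated, measured, children))
# -> only /main/kernel/loop_a is flagged; the rest of the tree is pruned.
```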