33 research outputs found

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    A trace-driven approach for fast and accurate simulation of manycore architectures

    No full text
    International audienceThe evolution of manycore sytems, forecasted to feature hundreds of cores by the end of the decade calls for efficient solutions for design space exploration and debugging. Among the relevant existing solutions the well-known gem5 simu-lator provides a rich architecture description framework. However , these features come at the price of prohibitive simulation time that limits the scope of possible explorations to configurations made of tens of cores. To address this limitation, this paper proposes a novel trace-driven simulation approach for efficient exploration of manycore architectures

    Integrated Evaluation Platform for Secured Devices

    Get PDF
    International audienceIn this paper, we describe the structure of a FPGAsmart card emulator. The aim of such an emulator is to improvethe behaviour of the whole architecture when faults occur. Withinthis card, an embedded Advanced Encryption Standard (AES)protected against DFA is inserted as well as a fault injectionblock. We also present the microprocessor core which controlsthe whole card

    Security enhancements for FPGA-based MPSoCs: a boot-to-runtime protection flow for an embedded Linux-based system

    No full text
    International audienceNowadays, embedded systems become more and more complex: the hardware/software codesign approach is a method to create such systems in a single chip which can be based on reconfigurable technologies such as FPGAs (Field-Programmable Gate Arrays). In such systems, data exchanges are a key point as they convey critical and confidential information and data are transmitted between several hardware modules and software layers. In case of an FPGA development life cycle, OS (Operating System) / data updates as runtime communications can be done through an insecure link: attackers can use this medium to make the system misbehave (malicious injection) or retrieve bitstream-related information (eavesdropping). Recent works propose solutions to securely boot a bitstream and the associated OS while runtime transactions are not protected. This work proposes a full boot-to-runtime protection flow of an embedded Linux kernel during boot and confidentiality/integrity protection of the external memory containing the kernel and the main application code/data. This work shows that such a solution with hardware components induces an area occupancy of 10% of a xc6vlx240t Virtex-6 FPGA while having an improved throughput for Linux booting and lowlatency security for runtime protection

    Feedback-Based Admission Control for Firm Real-Time Task Allocation with Dynamic Voltage and Frequency Scaling

    Get PDF
    Feedback-based mechanisms can be employed to monitor the performance of Multiprocessor Systems-on-Chips (MPSoCs) and steer the task execution even if the exact knowledge of the workload is unknown a priori. In particular, traditional proportional-integral controllers can be used with firm real-time tasks to either admit them to the processing cores or reject in order not to violate the timeliness of the already admitted tasks. During periods with a lower computational power demand, dynamic voltage and frequency scaling (DVFS) can be used to reduce the dissipation of energy in the cores while still not violating the tasks’ time constraints. Depending on the workload pattern and weight, platform size and the granularity of DVFS, energy savings can reach even 60% at the cost of a slight performance degradation

    Exploiting memory allocations in clusterized many-core architectures

    Get PDF
    Power-efficient architectures have become the most important feature required for future embedded systems. Modern designs, like those released on mobile devices, reveal that clusterization is the way to improve energy efficiency. However, such architectures are still limited by the memory subsystem (i.e., memory latency problems). This work investigates an alternative approach that exploits on-chip data locality to a large extent, through distributed shared memory systems that permit efficient reuse of on-chip mapped data in clusterized many-core architectures. First, this work reviews the current literature on memory allocations and explore the limitations of cluster-based many-core architectures. Then, several memory allocations are introduced and benchmarked scalability, performance and energy-wise, compared to the conventional centralized shared memory solution to reveal which memory allocation is the most appropriate for future mobile architectures. Our results show that distributed shared memory allocations bring performance gains and opportunities to reduce energy consumption

    Extremely low overhead off-chip memory encryption

    Get PDF
    Over the last decade, advancements in performance and efficiency of portable computing devices have allowed them to provide much of the functionality previously restricted to larger computers. Instant communication, GPS navigation, remote banking, and even online shopping are only a few of the activities that can be performed from almost anywhere. However, these conveniences may come at the cost of physical security since portable devices are often operated in a public environment where there is a possibility of being physically exposed or obtained by untrustworthy users. While it is a common practice to secure the data that is transferred from one point to another, the contents of system memory often go unprotected. When physical access to a device is attained, this so called ``data-at-rest can be exploited to reveal private information. Emails, GPS location data, financial transactions, etc. could be harmful if revealed to the wrong party. This thesis investigates the design trade-offs of obscuring data stored within low latency memory on an embedded device. This was achieved by implementing a parameterizable system based on the keystream cache concept. While this solution could be implemented for almost any embedded system, the design was evaluated using reconfigurable hardware in order to reduce development costs. A prototype was built and tested on an Altera FPGA development board where parameters of the architecture were varied to find a solution that reduced performance overhead, while minimizing hardware usage. The resulting application benchmarks show as little as 1% performance overhead while using minimal hardware resources

    Design and development from single core reconfigurable accelerators to a heterogeneous accelerator-rich platform

    Get PDF
    The performance of a platform is evaluated based on its ability to deal with the processing of multiple applications of different nature. In this context, the platform under evaluation can be of homogeneous, heterogeneous or of hybrid architecture. The selection of an architecture type is generally based on the set of different target applications and performance parameters, where the applications can be of serial or parallel nature. The evaluation is normally based on different performance metrics, e.g., resource/area utilization, execution time, power and energy consumption. This process can also include high-level performance metrics, e.g., Operations Per Second (OPS), OPS/Watt, OPS/Hz, Watt/Area etc. An example of architecture selection can be related to a wireless communication system where the processing of computationally-intensive signal-processing algorithms has strict execution-time constraints and in this case, a platform with special-purpose accelerators is relatively more suitable than a typical homogeneous platform. A couple of decades ago, it was expensive to plant many special-purpose accelerators on a chip as the cost per unit area was relatively higher than today. The utilization wall is also becoming a limiting factor in homogeneous multicore scaling which means that all the cores on a platform cannot be operated at their maximum frequency due to a possible thermal meltdown. In this case, some of the processing cores have to be turned-off or to be operated at very low frequencies making most of the part of the chip to stay underutilized. A possible solution lies in the use of heterogeneous multicore platforms where many application-specific cores operate at lower frequencies, therefore reducing power dissipation density and increasing other performance parameters. However, to achieve maximum flexibility in processing, a general-purpose flavor can also be introduced by adding a few Reduced Instruction-Set Computing (RISC) cores. A power class of heterogeneous multicore platforms is an accelerator-rich platform where many application-specific accelerators are loosely connected with each other for work load distribution or to execute the tasks independently. This research work spans from the design and development of three different types of template-based Coarse-Grain Reconfigurable Arrays (CGRAs), i.e., CREMA, AVATAR and SCREMA to a Heterogeneous Accelerator-Rich Platform (HARP). The accelerators generated from the three CGRAs could perform different lengths and types of Fast Fourier Transform (FFT), real and complex Matrix-Vector Multiplication (MVM) algorithms. CREMA and AVATAR were fixed CGRAs with eight and sixteen number of Processing Element (PE) columns, respectively. SCREMA could flex between four, eight, sixteen and thirty two number of PE columns. Many case studies were conducted to evaluate the performance of the reconfigurable accelerators generated from these CGRA templates. All of these CGRAs work in a processor/coprocessor model tightly integrated with a Direct Memory Access (DMA) device. Apart from these platforms, a reconfigurable Application-Specific Instruction-set Processor (rASIP) is also designed, tested for FFT execution under IEEE-802.11n timing constraints and evaluated against a processor/coprocessor model. It was designed by integrating AVATAR generated radix-(2, 4) FFT accelerator into the datapath of a RISC processor. The instruction set of the RISC processor was extended to perform additional operations related to AVATAR. As mentioned earlier, the underutilized part of the chip, now-a-days called Dark Silicon is posing many challenges for the designers. Apart from software optimizations, clock gating, dynamic voltage/frequency scaling and other high-level techniques, one way of dealing with this problem is to use many application-specific cores. In an effort to maximize the number of reconfigurable processing resources on a platform, the accelerator-rich architecture HARP was designed and evaluated in terms of different performance metrics. HARP is constructed on a Network-on-Chip (NoC) of 3x3 nodes where with every node, a CGRA of application-specific size is integrated other than the central node which is attached to a RISC processor. The RISC establishes synchronization between the nodes for data transfer and also performs the supervisory control. While using the NoC as the backbone of communication between the cores, it becomes possible for all the cores to address each other and also perform execution simultaneously and independently of each other. The performance of accelerators generated from CREMA, AVATAR and SCREMA templates were evaluated individually and also when attached to HARP's NoC nodes. The individual CGRAs show promising results in their own capacity but when integrated all together in the framework of HARP, interesting comparisons were established in terms of overall execution times, resource utilization, operating frequencies, power and energy consumption. In evaluating HARP, estimates and measurements were also made in some advanced performance metrics, e.g., in MOPS/mW and MOPS/MHz. The overall research work promotes the idea of heterogeneous accelerator-rich platform as a solution to current problems and future needs of industry and academia

    Selective Noise Based Power-Efficient and Effective Countermeasure against Thermal Covert Channel Attacks in Multi-Core Systems

    Get PDF
    With increasing interest in multi-core systems, such as any communication systems, infra-structures can become targets for information leakages via covert channel communication. Covert channel attacks lead to leaking secret information and data. To design countermeasures against these threats, we need to have good knowledge about classes of covert channel attacks along with their properties. Temperature–based covert communication channel, known as Thermal Covert Channel (TCC), can pose a threat to the security of critical information and data. In this paper, we present a novel scheme against such TCC attacks. The scheme adds selective noise to the thermal signal so that any possible TCC attack can be wiped out. The noise addition only happens at instances when there are chances of correct information exchange to increase the bit error rate (BER) and keep the power consumption low. Our experiments have illustrated that the BER of a TCC attack can increase to 94% while having similar power consumption as that of state-of-the-art
    corecore