340 research outputs found
Recommended from our members
On thermal sensor calibration and software techniques for many-core thermal management
The high power density of a many-core processor results in increased temperature which negatively impacts system reliability and performance. Dynamic thermal management applies thermal-aware techniques at run time to avoid overheating using temperature information collected from on-chip thermal sensors. Temperature sensing and thermal control schemes are two critical technologies for successfully maintaining thermal safety. In this dissertation, on-line thermal sensor calibration schemes are developed to provide accurate temperature information.
Software-based dynamic thermal management techniques are proposed using calibrated thermal sensors. Due to process variation and silicon aging, on-chip thermal sensors require periodic calibration before use in DTM. However, the calibration cost for thermal sensors can be prohibitively high as the number of on-chip sensors increases. Linear models which are suitable for on-line calculation are employed to estimate temperatures at multiple sensor locations using performance counters. The estimated temperature and the actual sensor thermal profile show a very high similarity with correlation coefficient ~0.9 for SPLASH2 and SPEC2000 benchmarks.
A calibration approach is proposed to combine potentially inaccurate temperature values obtained from two sources: thermal sensor readings and temperature estimations. A data fusion strategy based on Bayesian inference, which combines information from these two sources, is demonstrated. The result shows the strategy can effectively recalibrate sensor readings in response to inaccuracies caused by process variation and environmental noise. The average absolute error of the corrected sensor temperature readings is
A dynamic task allocation strategy is proposed to address localized overheating in many-core systems. Our approach employs reinforcement learning, a dynamic machine learning algorithm that performs task allocation based on current temperatures and a prediction regarding which assignment will minimize the peak temperature. Our results show that the proposed technique is fast (scheduling performed in \u3c1 \u3ems) and can efficiently reduce peak temperature by up to 8 degree C in a 49-core processor (6% on average) versus a leading competing task allocation approach for a series of SPLASH-2 benchmarks. Reinforcement learning has also been applied to 3D integrated circuits to allocate tasks with thermal awareness
Recommended from our members
An Interconnection Network Topology Generation Scheme for Multicore Systems
Multi-Processor System on Chip (MPSoC) consisting of multiple processing cores connected via a Network on Chip (NoC) has gained prominence over the last decade. Most common way of mapping applications to MPSoCs is by dividing the application into small tasks and representing them in the form of a task graph where the edges connecting the tasks represent the inter task communication. Task scheduling involves mapping task to processor cores so as to meet a specified deadline for the application/task graph. With increase in system complexity and application parallelism, task communication times are tending towards task execution times. Hence the NoC which forms the communication backbone for the cores plays a critical role in meeting the deadlines. In this thesis we present an approach to synthesize a minimal network connecting a set of cores in a MPSoC in the presence of deadlines. Given a task graph and a corresponding task to processor schedule, we have developed a partitioning methodology to generate an efficient interconnection network for the cores. We adopt a 2-phase design flow where we synthesize the network in first phase and in second phase we perform statistical analysis of the network thus generated. We compare our model with a simulated annealing based scheme, a static graph based greedy scheme and the standard mesh topology. The proposed solution offers significant area and performance benefits over the alternate solutions compared in this work
Design Space Exploration for MPSoC Architectures
Multiprocessor system-on-chip (MPSoC) designs utilize the available technology and communication architectures to meet the requirements of the upcoming applications. In MPSoC, the communication platform is both the key enabler, as well as the key differentiator for realizing efficient MPSoCs. It provides product differentiation to meet a diverse, multi-dimensional set of design constraints, including performance, power, energy, reconfigurability, scalability, cost, reliability and time-to-market. The communication resources of a single interconnection platform cannot be fully utilized by all kind of applications, such as the availability of higher communication bandwidth for computation but not data intensive applications is often unfeasible in the practical implementation.
This thesis aims to perform the architecture-level design space exploration towards efficient and scalable resource utilization for MPSoC communication architecture. In order to meet the performance requirements within the design constraints, careful selection of MPSoC communication platform, resource aware partitioning and mapping of the application play important role. To enhance the utilization of communication resources, variety of techniques such as resource sharing, multicast to avoid re-transmission of identical data, and adaptive routing can be used. For implementation, these techniques should be customized according to the platform architecture. To address the resource utilization of MPSoC communication platforms, variety of architectures with different design parameters and performance levels, namely Segmented bus (SegBus), Network-on-Chip (NoC) and Three-Dimensional NoC (3D-NoC), are selected. Average packet latency and power consumption are the evaluation parameters for the proposed techniques.
In conventional computing architectures, fault on a component makes the connected fault-free components inoperative. Resource sharing approach can utilize the fault-free components to retain the system performance by reducing the impact of faults. Design space exploration also guides to narrow down the selection of MPSoC architecture, which can meet the performance requirements with design constraints.Siirretty Doriast
Recommended from our members
Smart Resource Sharing for Concurrency and Security
Different layers of the computer system, from the low-level hardware accelerators and networks-on-chip (NoC) in multi-core systems, to the upper-level operating systems and software applications, rely on the sharing of hardware computing resources. Unfortunately such sharing, when not carefully managed, can introduce a host of protection problems and sources of information leakage. We describe a set of methods by which it is possible to systematically scale performance via hardware sharing without exacerbating security properties by being aware of the design and characteristics of individual layers and components. The key to this is efficiently dealing with security vulnerabilities introduced by sharing in terms of time and space through the creation of new security-conscious sharing interfaces. In a systematic way is to first define coordination techniques into more detailed patterns, and by bridging the gap of less efficient universal measures with provably more performant and secure patterns.Specifically we demonstrate the usefulness of a sharing pattern for hardware and software systems where separation is of concern (interference and timing channel mitigation, etc). The most important insight is that in order to fully utilize computing resources (to improve performance and availability), the entities that share these resources must coordinate in a pre-calculated way. More dynamic approaches to improve performance and concurrency are likely to introduce new interference in the system. While we show that certain static scheduling measures in lower level hardware such as networks-on-chip can provably eliminate timing channels, the dynamic nature of software systems makes covert channels harder to be confined. Besides, software systems also face other types of security problems beyond side channels. To improve concurrency and performance without exacerbating security requires a slightly different approach.To study the obstacles that hinder software applications' scaling in a system because of security concerns, we delve into the Android operating system and its appification ecosystem structure. A prime avenue for attack is introduced because of its distributed sharing eco-pattern. We propose a centralized approach with a single reliable service as a method to enable computation reuse among applications. The proposed centralization technique favors well-protected application-to-system communications over vulnerable application-to-application communications. Thus not only computation concurrency is boosted but also the possibility of an app being attacked through the attack-prone Inter-Component Calls (ICCs) due to possible distributed computation sharing is eliminated. This approach further enables improvements to security with the addition of a novel application-centric grouping for isolation. We show through a prototype on Android how our approach supports and protects inter-app resource sharing, while improving concurrency at scale
A low-latency modular switch for CMP systems
[EN] As technology advances, the number of cores in Chip MultiProcessor systems and MultiProcessor Systems-on-Chips keeps increasing. The network must provide sustained throughput and ultra-low latencies. In this paper we propose new pipelined switch designs focused in reducing the switch latency. We identify the switch components that limit the switch frequency: the arbiter. Then, we simplify the arbiter logic by using multiple smaller arbiters, but increasing greatly the switch area. To solve this problem, a second design is presented where the routing traversal and arbitrations tasks are mixed. Results demonstrate a switch latency reduction ranging from 10% to 21%. Network latency is reduced in a range from 11% to 15%. © 2011 Elsevier B.V. All rights reserved.This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grants CSD2006-00046 and TIN2009-14475-C04. It was also partly supported by the project NaNoC (Project Label 248972) which is funded by the European Commission within the Research Programme FP7.Roca PĂ©rez, A.; Flich Cardo, J.; Silla JimĂ©nez, F.; Duato MarĂn, JF. (2011). A low-latency modular switch for CMP systems. Microprocessors and Microsystems. 35(8):742-754. https://doi.org/10.1016/j.micpro.2011.08.011S74275435
- …