1,642 research outputs found
Intelligent Management of Mobile Systems through Computational Self-Awareness
Runtime resource management for many-core systems is increasingly complex.
The complexity can be due to diverse workload characteristics with conflicting
demands, or limited shared resources such as memory bandwidth and power.
Resource management strategies for many-core systems must distribute shared
resource(s) appropriately across workloads, while coordinating the high-level
system goals at runtime in a scalable and robust manner.
To address the complexity of dynamic resource management in many-core
systems, state-of-the-art techniques that use heuristics have been proposed.
These methods lack the formalism in providing robustness against unexpected
runtime behavior. One of the common solutions for this problem is to deploy
classical control approaches with bounds and formal guarantees. Traditional
control theoretic methods lack the ability to adapt to (1) changing goals at
runtime (i.e., self-adaptivity), and (2) changing dynamics of the modeled
system (i.e., self-optimization).
In this chapter, we explore adaptive resource management techniques that
provide self-optimization and self-adaptivity by employing principles of
computational self-awareness, specifically reflection. By supporting these
self-awareness properties, the system can reason about the actions it takes by
considering the significance of competing objectives, user requirements, and
operating conditions while executing unpredictable workloads
Automatic synthesis and optimization of chip multiprocessors
The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed
FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels
Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach
A Survey of Research into Mixed Criticality Systems
This survey covers research into mixed criticality systems that has been published since Vestal’s seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault tolerant scheduling, hierarchical scheduling, cyber physical systems, probabilistic real-time systems, and industrial safety standards
Castell: a heterogeneous cmp architecture scalable to hundreds of processors
Technology improvements and power constrains have taken multicore architectures to dominate
microprocessor designs over uniprocessors. At the same time, accelerator based architectures
have shown that heterogeneous multicores are very efficient and can provide high throughput for
parallel applications, but with a high-programming effort. We propose Castell a scalable chip
multiprocessor architecture that can be programmed as uniprocessors, and provides the high
throughput of accelerator-based architectures.
Castell relies on task-based programming models that simplify software development. These
models use a runtime system that dynamically finds, schedules, and adds hardware-specific features
to parallel tasks. One of these features is DMA transfers to overlap computation and data
movement, which is known as double buffering. This feature allows applications on Castell
to tolerate large memory latencies and lets us design the memory system focusing on memory
bandwidth.
In addition to provide programmability and the design of the memory system, we have used
a hierarchical NoC and added a synchronization module. The NoC design distributes memory
traffic efficiently to allow the architecture to scale. The synchronization module is a consequence
of the large performance degradation of application for large synchronization latencies.
Castell is mainly an architecture framework that enables the definition of domain-specific
implementations, fine-tuned to a particular problem or application. So far, Castell has been
successfully used to propose heterogeneous multicore architectures for scientific kernels, video
decoding (using H.264), and protein sequence alignment (using Smith-Waterman and clustalW).
It has also been used to explore a number of architecture optimizations such as enhanced DMA
controllers, and architecture support for task-based programming models.
ii
- …