425 research outputs found

    Energy Saving in QoS Fog-supported Data Centers

    Get PDF
    One of the most important challenges that cloud providers face in the explosive growth of data is to reduce the energy consumption of their modern data centers. The majority of current research focuses on energy-efficient resource management in the infrastructure as a service (IaaS) model through resource virtualization, i.e., consolidation of virtual machines onto physical machines. However, current virtualized data centers do not support communication- and computing-intensive real-time applications, such as big data stream computing (info-mobility applications, real-time video co-decoding). Indeed, imposing hard limits on the overall per-job computing-plus-communication delay forces the overall networked computing infrastructure to quickly adapt its resource utilization to the (possibly unpredictable and abrupt) time fluctuations of the offered workload. Recently, Fog Computing centers have emerged as promising commodities of the Internet virtual computing platform, but their energy consumption raises critical issues. Green solutions (i.e., energy-aware provisioning) are therefore needed for fog-supported delay-sensitive web applications. Moreover, traffic engineering-based methods can dynamically adjust the number of active servers to match the current workload. It is therefore desirable to develop a flexible, reliable technological paradigm and a resource allocation algorithm that accounts for the consumed energy. Such algorithms should automatically adapt to time-varying workloads through joint reconfiguration and orchestration of the virtualized computing-plus-communication resources available at the computing nodes, and should allow IoT devices to operate under real-time constraints on the allowed computing-plus-communication delay and service latency. The purpose of this thesis is: i) to propose a novel technological paradigm, the Fog of Everything (FoE) paradigm, where we detail the main building blocks and services of the corresponding technological platform and protocol stack; ii) to propose a dynamic and adaptive energy-aware algorithm that models and manages virtualized networked data center Fog Nodes (FNs), to minimize the resulting networking-plus-computing average energy consumption; and, iii) to propose a novel Software-as-a-Service (SaaS) Fog Computing platform to integrate user applications over the FoE. SaaS Fog Computing centers are emerging as an Internet virtual computing commodity that supports delay-sensitive applications. The virtualized Fog node operates at the Middleware layer of the underlying protocol stack and comprises: i) admission control of the offered input traffic; ii) balanced control and dispatching of the admitted workload; iii) dynamic reconfiguration and consolidation of the Dynamic Voltage and Frequency Scaling (DVFS)-enabled Virtual Machines (VMs) instantiated onto the parallel computing platform; and, iv) rate control of the traffic injected into the TCP/IP connection.
The salient features of this algorithm are that: i) it is adaptive and admits a distributed, scalable implementation; ii) it is capable of providing hard QoS guarantees, in terms of minimum/maximum instantaneous rate of the traffic delivered to the client, instantaneous goodput and total processing delay; and, iii) it explicitly accounts for the dynamic interaction between computing and networking resources in order to maximize the resulting energy efficiency. The actual performance of the proposed scheduler in the presence of: i) client mobility; ii) wireless fading; iii) reconfiguration and two-threshold consolidation costs of the underlying networked computing platform; and, iv) abrupt changes in the transport quality of the available TCP/IP mobile connection, is numerically tested and compared against that of state-of-the-art static schedulers, under both synthetically generated and measured real-world workload traces.
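The energy-aware reconfiguration of DVFS-enabled VMs described above typically rests on the standard cubic dynamic-power model. As a hedged illustration (not the thesis' actual scheduler), the sketch below picks, for a single VM, the lowest DVFS frequency that still meets a hard per-job delay bound, which also minimizes dynamic energy under that model; the function name, the constant k, and the numeric values are illustrative assumptions.

```python
# Minimal sketch (not the thesis algorithm): choosing a DVFS frequency for a VM
# under a hard per-job delay bound, assuming the common cubic power model
# P_dyn(f) ~= k * f**3 and a workload of `cycles` CPU cycles.

def pick_min_energy_frequency(cycles, delay_bound_s, freq_levels_hz, k=1e-27):
    """Return (frequency, energy_J) minimizing dynamic energy while meeting the deadline.

    Energy at frequency f is E(f) = P(f) * t = k * f**3 * (cycles / f) = k * cycles * f**2,
    so the lowest feasible frequency is also the most energy-efficient one.
    """
    feasible = [f for f in sorted(freq_levels_hz) if cycles / f <= delay_bound_s]
    if not feasible:
        raise ValueError("No DVFS level meets the delay bound; job must be rejected or split.")
    f = feasible[0]                      # slowest frequency that still meets the deadline
    energy = k * cycles * f ** 2         # dynamic energy under the cubic power model
    return f, energy

# Example: a 2e9-cycle job with a 1.5 s delay bound on a VM with four DVFS levels.
print(pick_min_energy_frequency(2e9, 1.5, [0.8e9, 1.2e9, 1.6e9, 2.0e9]))
```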

    Multiprocessor Image-Based Control: Model-Driven Optimisation

    Get PDF
    Over the last few years, cameras have become an integral component of modern cyber-physical systems due to their versatility, relatively low cost and multi-functionality. Camera sensors form the backbone of modern applications like advanced driver assistance systems (ADASs), visual servoing, telerobotics, autonomous systems, electron microscopes, surveillance and augmented reality. Image-based control (IBC) systems refer to a class of data-intensive feedback control systems whose feedback is provided by the camera sensor(s). IBC systems have become popular with the advent of efficient image-processing algorithms, low-cost complementary metal-oxide semiconductor (CMOS) cameras with high resolution and embedded multiprocessor computing platforms with high performance. The combination of the camera sensor(s) and image-processing algorithms can detect a rich set of features in an image. These features help to compute the states of the IBC system, such as relative position, distance, or depth, and support tracking of the object-of-interest. Modern industrial compute platforms offer high performance by allowing parallel and pipelined execution of tasks on their multiprocessors. The challenge, however, is that the image-processing algorithms are compute-intensive and result in an inherent, relatively long sensing delay. State-of-the-art design methods do not fully exploit the IBC system characteristics and advantages of the multiprocessor platforms for optimising the sensing delay. The sensing delay of an IBC system is moreover variable, with a significant degree of variation between the best-case and worst-case delay due to application-specific image-processing workload variations and the impact of platform resources. A long variable sensing delay degrades system performance and stability. A tight, predictable sensing delay is required to optimise the IBC system performance and to guarantee the stability of the IBC system. Analytical computation of the sensing delay is often pessimistic due to image-dependent workload variations or challenging platform timing analysis. Therefore, this thesis explores techniques to cope with the long variable sensing delay by considering application-specific IBC system characteristics and exploiting the benefits of multiprocessor platforms. Effectively handling the long variable sensing delay helps to optimise IBC system performance while guaranteeing IBC system stability.
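Pipelined execution of the image-processing tasks across cores, as mentioned above, decouples the effective sampling period from the sensing delay. The toy sketch below is ours, not the thesis' model; it only illustrates that relationship under the simplifying assumption that frames can be started on `num_pipelined_cores` cores in a round-robin fashion, limited by the camera frame period.

```python
# Illustrative sketch (not from the thesis): effect of pipelining image processing
# across multiple cores in an image-based control (IBC) loop. With sensing delay
# tau and p pipelined cores, frames can be started roughly every tau/p seconds,
# so the effective sampling period shrinks while the per-frame delay stays tau.

def ibc_timing(sensing_delay_s, num_pipelined_cores, camera_frame_period_s):
    # A new frame cannot be started faster than the camera delivers frames.
    sampling_period = max(sensing_delay_s / num_pipelined_cores, camera_frame_period_s)
    return {"sampling_period_s": sampling_period,
            "sensor_to_actuation_delay_s": sensing_delay_s}

# Example: 60 ms worst-case sensing delay, 3 pipelined cores, 30 fps camera.
print(ibc_timing(0.060, 3, 1 / 30))   # sampling period limited by the camera (33.3 ms)
print(ibc_timing(0.060, 3, 1 / 120))  # sampling period 20 ms, delay still 60 ms
```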

    Cooperative mobility maintenance techniques for information extraction from mobile wireless sensor networks

    Get PDF
    Recent advances in the development of microprocessors, microsensors, ad-hoc wireless networking and information fusion algorithms have led to increasingly capable Wireless Sensor Networks (WSNs). Besides severe resource constraints, sensor node mobility is considered a fundamental characteristic of WSNs. Information Extraction (IE) is a key research area within WSNs that has been characterised in a variety of ways, ranging from a description of its purposes to reasonably abstract models of its processes and components. IE is a challenging task in mobile WSNs for several reasons: the topology changes rapidly; calculating trajectories and velocities is not trivial; data loss and data delivery delays increase; and there are other context- and application-specific challenges. These challenges offer fundamentally new research problems. There is a wide body of literature on IE from static WSNs, and these approaches have proven effective and efficient. However, there are few attempts to address the problem of IE from mobile WSNs. These attempts dealt with mobility only as the need arose and did not address the fundamental challenges and variations that mobility introduces into WSNs. The aim of this thesis is to develop a solution for IE from mobile WSNs. This aim is achieved through the development of a middle-layer solution, which enables IE approaches that were designed for static WSNs to operate in the presence of multiple mobile nodes. This thesis contributes the design of a new self-stabilisation algorithm that provides autonomous adaptability to node mobility in a manner transparent to both upper network layers and user applications. In addition, this thesis proposes a dynamic network partitioning protocol to achieve high quality of information, scalability and load balancing. The proposed solution is flexible, may be applied to different application domains, and is less complex than many existing approaches; its simplicity demands neither great computational effort nor large amounts of energy. Intensive simulation experiments with real-life parameters provide evidence of the efficiency of the proposed solution. Performance experiments demonstrate that the integrated DNP/SS protocol outperforms its rival in the literature in terms of timeliness (by up to 22%), packet delivery ratio (by up to 13%), network scalability (by up to 25%), network lifetime (by up to 40.6%), and energy consumption (by up to 39.5%). Furthermore, it proves that DNP/SS successfully allows the deployment of static-oriented IE approaches in hybrid networks without any modifications or adaptations.

    From Traditional Adaptive Data Caching to Adaptive Context Caching: A Survey

    Full text link
    Context data is in demand more than ever with the rapid increase in the development of context-aware Internet of Things applications. Research in context and context-awareness is being conducted to broaden its applicability in light of many practical and technical challenges. One of the challenges is improving performance when responding to a large number of context queries. Context Management Platforms that infer and deliver context to applications measure this problem using Quality of Service (QoS) parameters. Although caching is a proven way to improve QoS, the transiency of context and features such as the variability and heterogeneity of context queries pose an additional real-time cost management problem. This paper presents a critical survey of the state-of-the-art in adaptive data caching with the objective of developing a body of knowledge on cost- and performance-efficient adaptive caching strategies. We comprehensively survey a large number of research publications and evaluate, compare, and contrast different techniques, policies, approaches, and schemes in adaptive caching. Our critical analysis is motivated by the focus on adaptively caching context as a core research problem. A formal definition of adaptive context caching is then proposed, followed by identified features and requirements of a well-designed, objective optimal adaptive context caching strategy.
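To make the notion of adaptive context caching concrete, the sketch below shows a minimal cost-aware cache for transient context items; the class, the crude access-rate estimate, the TTL, and the benefit threshold are illustrative assumptions, not the survey's proposed strategy.

```python
# Minimal illustrative sketch (not from the survey): a cost-aware adaptive cache
# for transient context items. A context item is cached only while its expected
# benefit (access rate * retrieval cost saved) exceeds a threshold, and entries
# expire after a TTL because context becomes stale.

import time

class AdaptiveContextCache:
    def __init__(self, ttl_s=5.0, benefit_threshold=1.0):
        self.ttl_s = ttl_s
        self.benefit_threshold = benefit_threshold
        self.store = {}        # key -> (value, expiry_time)
        self.access_rate = {}  # key -> crude exponentially smoothed query rate

    def _benefit(self, key, retrieval_cost_ms):
        # Expected saving if the item is served from the cache instead of re-derived.
        return self.access_rate.get(key, 0.0) * retrieval_cost_ms

    def get(self, key, fetch_fn, retrieval_cost_ms):
        now = time.time()
        self.access_rate[key] = 0.7 * self.access_rate.get(key, 0.0) + 0.3
        entry = self.store.get(key)
        if entry and entry[1] > now:
            return entry[0]                                  # fresh cache hit
        value = fetch_fn()                                   # miss or stale: re-derive context
        if self._benefit(key, retrieval_cost_ms) >= self.benefit_threshold:
            self.store[key] = (value, now + self.ttl_s)      # adaptively decide to cache
        return value

# Usage: cache a (hypothetical) room-temperature context query costing 40 ms to derive.
cache = AdaptiveContextCache()
print(cache.get("room42/temperature", lambda: 21.5, retrieval_cost_ms=40))
```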

    Adaptive radio resource management schemes for the downlink of the OFDMA-based wireless communication systems

    Get PDF
    Due to its superior characteristics that make it suitable for high-speed mobile wireless systems, OFDMA has been adopted by next-generation broadband wireless standards including Worldwide Interoperability for Microwave Access (WiMAX) and Long Term Evolution-Advanced (LTE-A). Intelligent and adaptive Radio Resource Management (RRM) schemes are a fundamental tool in the design of wireless systems, enabling full and efficient utilization of the scarce available resources while meeting user data rate and QoS requirements. Previous works were concerned only with maximizing system efficiency and thus used opportunistic algorithms that allocate resources to the users with the best opportunities in order to optimize system capacity. Thus, only users with good channel conditions were considered for resource allocation, while users in bad channel conditions were left to starve of resources. The main objective of our study is to design adaptive radio resource allocation (RRA) algorithms that distribute the scarce resources more fairly among network users while efficiently using the resources to maximize system throughput. Four scheduling algorithms have been formulated and analysed based on fairness, throughput and delay, for users demanding different services and QoS requirements. Two of the scheduling algorithms, Maximum Sum Rate (MSR) and Round Robin (RR), are used as references to analyse throughput and fairness among network users, respectively. The other two algorithms are Proportional Fair Scheduling (PFS) and the Margin Adaptive Scheduling Scheme (MASS).
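For reference, PFS is classically defined by the ratio of a user's instantaneous achievable rate to its exponentially averaged throughput. The sketch below is a minimal, generic implementation of that metric for intuition only; it is not the thesis' MASS scheme, and the parameter beta and the example numbers are illustrative.

```python
# Minimal sketch of the classical Proportional Fair Scheduling (PFS) metric.
# Each subchannel goes to the user maximizing instantaneous_rate / average_throughput,
# which trades cell throughput (MSR behaviour) against fairness (RR behaviour).

def pfs_allocate(instant_rates, avg_throughput, beta=0.1):
    """Allocate each subchannel to one user and update EWMA average throughputs.

    instant_rates: dict subchannel -> {user: achievable rate this TTI}
    avg_throughput: dict user -> running average throughput (bits/s), updated in place
    """
    allocation, served_rate = {}, {u: 0.0 for u in avg_throughput}
    for sc, rates in instant_rates.items():
        best = max(rates, key=lambda u: rates[u] / max(avg_throughput[u], 1e-9))
        allocation[sc] = best
        served_rate[best] += rates[best]
    for u in avg_throughput:  # exponentially weighted moving average update
        avg_throughput[u] = (1 - beta) * avg_throughput[u] + beta * served_rate[u]
    return allocation

# Example: two subchannels, two users with different channel qualities.
avg = {"u1": 1.0e6, "u2": 0.2e6}
rates = {"sc0": {"u1": 2.0e6, "u2": 0.5e6}, "sc1": {"u1": 1.5e6, "u2": 0.6e6}}
print(pfs_allocate(rates, avg), avg)
```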

    Multi-objective resource optimization in space-aerial-ground-sea integrated networks

    Get PDF
    Space-air-ground-sea integrated (SAGSI) networks are envisioned to connect satellite, aerial, ground, and sea networks to provide connectivity everywhere and at all times in sixth-generation (6G) networks. However, the success of SAGSI networks is constrained by several challenges, including resource optimization when the users have diverse requirements and applications. We present a comprehensive review of SAGSI networks from a resource optimization perspective. We discuss use case scenarios and possible applications of SAGSI networks. The resource optimization discussion considers the challenges associated with SAGSI networks. In our review, we categorize resource optimization techniques based on throughput and capacity maximization, delay minimization, energy consumption, task offloading, task scheduling, resource allocation or utilization, network operation cost, outage probability, average age of information, joint optimization (data rate difference, storage or caching, CPU cycle frequency), overall network performance and performance degradation, software-defined networking, and intelligent surveillance and relay communication. We then formulate a mathematical framework for maximizing energy efficiency, resource utilization, and user association. We optimize user association while satisfying constraints on transmit power, data rate, and priority-based user association. A binary decision variable is used to associate users with system resources; since the decision variables are binary and the constraints are linear, the formulated problem is a binary linear program. Based on our formulated framework, we simulate and analyze the performance of three different algorithms (the branch and bound algorithm, the interior point method, and the barrier simplex algorithm) and compare the results. Simulation results show that the branch and bound algorithm achieves the best results, so we use it as the benchmark. However, the complexity of branch and bound increases exponentially as the number of users and stations in the SAGSI network grows, whereas the interior point method and the barrier simplex algorithm achieve results comparable to the benchmark at low complexity. Finally, we discuss future research directions and challenges of resource optimization in SAGSI networks.
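A generic binary linear program for user association of the kind described above (rate-weighted objective with single-association, capacity, and minimum-rate constraints) can be written as follows; the symbols, weights, and constraint set are illustrative assumptions rather than the paper's exact formulation.

```latex
% Illustrative binary linear program for user association (generic, not the paper's exact model).
% x_{u,s} = 1 if user u is associated with station s (satellite, aerial, ground, or sea node).
\begin{align}
\max_{x_{u,s} \in \{0,1\}} \quad & \sum_{u \in \mathcal{U}} \sum_{s \in \mathcal{S}} w_{u} \, r_{u,s} \, x_{u,s} \\
\text{s.t.} \quad & \sum_{s \in \mathcal{S}} x_{u,s} \le 1, && \forall u \in \mathcal{U} && \text{(each user joins at most one station)} \\
& \sum_{u \in \mathcal{U}} r_{u,s} \, x_{u,s} \le C_{s}, && \forall s \in \mathcal{S} && \text{(station capacity)} \\
& r_{u,s} \, x_{u,s} \ge R^{\min}_{u} \, x_{u,s}, && \forall u, s && \text{(minimum data-rate requirement)}
\end{align}
```

Because the objective and constraints are linear in the binary variables x_{u,s}, branch and bound solves the problem exactly, while the interior point and barrier simplex methods work on its relaxation at lower computational cost.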

    Cross-Layer Design of Highly Scalable and Energy-Efficient AI Accelerator Systems Using Photonic Integrated Circuits

    Get PDF
    Artificial Intelligence (AI) has experienced remarkable success in recent years, solving complex computational problems across various domains, including computer vision, natural language processing, and pattern recognition. Much of this success can be attributed to the advancements in deep learning algorithms and models, particularly Artificial Neural Networks (ANNs). In recent times, deep ANNs have achieved unprecedented levels of accuracy, surpassing human capabilities in some cases. However, these deep ANN models come at a significant computational cost, with billions to trillions of parameters. Recent trends indicate that the number of parameters per ANN model will continue to grow exponentially in the foreseeable future. To meet the escalating computational demands of ANN models, the hardware accelerators used for processing ANNs must offer lower latency and higher energy efficiency. Unfortunately, traditional electronic implementations of ANN hardware accelerators, including CPUs, Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field Programmable Gate Arrays (FPGAs), have fallen short of meeting the latency and energy efficiency requirements for processing deep ANN models. Furthermore, the interconnection network subsystems in these electronic accelerator systems, designed to facilitate large-scale data transfers between processing cores and memory/control units within the accelerator systems, have become bottlenecks that hinder the throughput, latency, and energy efficiency of deep ANN model processing. Fortunately, Photonic Integrated Circuits (PICs)-based accelerator systems, featuring photonic network subsystems, are promising alternatives to conventional electronic accelerators. PIC-based accelerator systems operate in the optical domain, delivering processing at the speed of light with ultra-low latency, minimal dynamic energy consumption, and high throughput. These advantages stem from the wavelength division multiplexing capabilities and the absence of distance-dependent impedance in PICs. Furthermore, these characteristics enable the implementation of high-performance photonic network subsystems within PIC-based accelerator systems. Additionally, PIC-based accelerator systems offer inherent optical nonlinearities. Despite these numerous advantages over electronic accelerators, PIC-based systems still encounter several challenges due to the limited optical power budget, susceptibility to crosstalk and other sources of noise caused by analog operation, high area consumption, and restricted functional flexibility of PICs. These challenges manifest in various ways: (i) a significant trade-off between the achievable processing core size and the supported bit precision impedes the scalability of processing cores; (ii) limited reconfigurability, in terms of supported computing size and precision, makes these systems less adaptable to modern ANN models with diverse computational and precision demands; and (iii) the reliance on electronic adder networks for accumulation diminishes the latency and energy consumption benefits of PIC-based accelerator systems due to the frequent analog-to-digital conversions and memory accesses involved in accumulations. My research has contributed several solutions that overcome a multitude of these challenges and improve the throughput, energy efficiency, and flexibility of PIC-based AI accelerator systems.
I identified and analyzed factors that affect the scalability and reconfigurability of PIC-based AI accelerator systems. I proposed several novel PIC-based accelerator architectures with enhancements at the circuit level, architecture level, and system level to improve scalability, reconfigurability, and functional flexibility. At the circuit level, these enhancements serve to decrease optical signal losses, reduce control complexity, enable adaptability for various ANN processing tasks, and lower power and area consumption. The architecture-level improvements mitigate crosstalk noise, facilitate functional reconfigurability, enable in-situ and flexible spatio-temporal accumulation, and provide flexible support for different dataflows. The system-level enhancements involve the integration of stochastic computing with PIC-based accelerators to break the inherent trade-off between scalability and supported bit precision. Additionally, applying stochastic computing enhances the flexibility of PIC-based accelerators, allowing them to support mixed-precision ANN models. These cross-layer enhancements collectively contribute to the design of PIC-based AI accelerator systems, resulting in improved throughput, energy efficiency, scalability, and reconfigurability.
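Stochastic computing, which the system-level enhancement above integrates with the photonic cores, represents values as the probability of a 1 in a random bitstream, so multiplication collapses to a single AND gate. The toy software model below is ours, not the dissertation's hardware design; it only shows how bitstream length, rather than hardware width, sets the precision.

```python
# Illustrative sketch of unipolar stochastic computing (SC): a value in [0, 1] is
# encoded as the probability of a 1 in a random bitstream, and multiplication
# reduces to a bitwise AND, trading precision for bitstream length instead of
# wider (harder-to-scale) analog hardware.

import random

def to_bitstream(x, length, rng):
    """Encode x in [0, 1] as a random bitstream of the given length."""
    return [1 if rng.random() < x else 0 for _ in range(length)]

def sc_multiply(x, y, length=4096, seed=0):
    rng = random.Random(seed)
    bs_x = to_bitstream(x, length, rng)
    bs_y = to_bitstream(y, length, rng)
    product_stream = [a & b for a, b in zip(bs_x, bs_y)]   # AND gate acts as the multiplier
    return sum(product_stream) / length                     # decode back to a value

# Example: 0.6 * 0.3 = 0.18; longer streams give more precise estimates.
print(sc_multiply(0.6, 0.3, length=1024))
print(sc_multiply(0.6, 0.3, length=65536))
```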

    Artificial intelligence empowered virtual network function deployment and service function chaining for next-generation networks

    Get PDF
    The entire Internet of Things (IoT) ecosystem is moving towards a high volume of diverse applications. From smart healthcare to smart cities, every ubiquitous digital sector provisions automation for an immersive experience. Augmented/virtual reality, remote surgery, and autonomous driving expect high data rates and ultra-low latency. The Network Function Virtualization (NFV) based IoT infrastructure, which decouples software services from proprietary devices, has become extremely popular because it cuts back significant deployment and maintenance expenditure in the telecommunication industry. Another substantially highlighted technological trend for delay-sensitive IoT applications is multi-access edge computing (MEC), which brings NFV to the network edge (in closer proximity to users) for faster computation. Among the massive pool of IoT services in the NFV context, the urgency for efficient edge service orchestration is constantly growing. The emerging challenges are the collaborative optimization of resource utilities and ensuring Quality-of-Service (QoS) with prompt orchestration in dynamic, congested, and resource-hungry IoT networks. Traditional mathematical programming models are NP-hard, hence inappropriate for time-sensitive IoT environments. In this thesis, we promote the need to go beyond these realms and leverage artificial intelligence (AI) based decision-makers for “smart” service management. We offer different methods of integrating supervised and reinforcement learning techniques to support future-generation wireless network optimization problems. Due to the combinatorial explosion of some service orchestration problems, supervised learning outperforms reinforcement learning. Unfortunately, open-access and standardized datasets for this research area are still in their infancy. Thus, we utilize the optimal results retrieved by Integer Linear Programming (ILP) to build labeled datasets for training supervised models (e.g., artificial neural networks, convolutional neural networks). Furthermore, we find that ensemble models are better than complex single networks for control-layer intelligent service orchestration. In contrast, we employ Deep Q-learning (DQL) for heavily constrained service function chaining optimization. We carefully address key performance indicators (e.g., optimality gap, service time, relocation and communication costs, resource utilization, scalability intelligence) to evaluate the viability of prospective orchestration schemes. We envision that AI-enabled network management can be regarded as a pioneering step to scale down massive IoT resource fabrication costs, upgrade profit margins for providers, and sustain QoS mutually.
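As a rough illustration of the reinforcement-learning side, the sketch below applies a plain epsilon-greedy Q-learning update to a toy VNF-placement loop; the state encoding, reward, and latency values are hypothetical and far simpler than the DQL agent and constraints used in the thesis.

```python
# Toy sketch of the Q-learning idea behind DQL-based service function chaining.
# Each action places the next VNF of the chain on one of the edge nodes; the
# (hypothetical) reward penalizes per-node latency.

import random

def q_learning_step(Q, state, actions, reward_fn, next_state_fn,
                    alpha=0.1, gamma=0.9, epsilon=0.1):
    """One epsilon-greedy Q-learning update; Q is a dict[(state, action)] -> value."""
    if random.random() < epsilon:
        action = random.choice(actions)                                # explore
    else:
        action = max(actions, key=lambda a: Q.get((state, a), 0.0))    # exploit
    reward = reward_fn(state, action)
    next_state = next_state_fn(state, action)
    best_next = max((Q.get((next_state, a), 0.0) for a in actions), default=0.0)
    Q[(state, action)] = (1 - alpha) * Q.get((state, action), 0.0) + \
                         alpha * (reward + gamma * best_next)
    return next_state

# Example: place a 3-VNF chain on edge nodes {0, 1, 2}, rewarding low latency.
latency = {0: 5.0, 1: 2.0, 2: 8.0}   # hypothetical per-node latencies
Q = {}
for _ in range(200):
    state = ("chain", 0)             # (chain id, index of next VNF to place)
    for vnf in range(3):
        state = q_learning_step(Q, state, [0, 1, 2],
                                reward_fn=lambda s, a: -latency[a],
                                next_state_fn=lambda s, a: (s[0], s[1] + 1))
```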

    Towards Closing the Programmability-Efficiency Gap using Software-Defined Hardware

    Full text link
    The past decade has seen the breakdown of two important trends in the computing industry: Moore’s law, an observation that the number of transistors in a chip roughly doubles every eighteen months, and Dennard scaling, which enabled the use of these transistors within a constant power budget. This has caused a surge in domain-specific accelerators, i.e., specialized hardware that delivers significantly better energy efficiency than general-purpose processors such as CPUs. While the performance and efficiency of such accelerators are highly desirable, the fast pace of algorithmic innovation and non-recurring engineering costs have deterred their widespread use, since they are only programmable across a narrow set of applications. This has engendered a programmability-efficiency gap across contemporary platforms. A practical solution that can close this gap is thus lucrative and likely to have broad impact in both academic research and the industry. This dissertation proposes such a solution with a reconfigurable Software-Defined Hardware (SDH) system that morphs parts of the hardware on-the-fly to tailor to the requirements of each application phase. This system is designed to deliver near-accelerator-level efficiency across a broad set of applications, while retaining CPU-like programmability. The dissertation first presents a fixed-function solution to accelerate sparse matrix multiplication, which forms the basis of many applications in graph analytics and scientific computing. The solution consists of a tiled hardware architecture, co-designed with the outer product algorithm for Sparse Matrix-Matrix multiplication (SpMM), that uses on-chip memory reconfiguration to accelerate each phase of the algorithm. A proof-of-concept is then presented in the form of a prototyped 40 nm Complementary Metal-Oxide Semiconductor (CMOS) chip that demonstrates energy efficiency and performance per die area improvements of 12.6x and 17.1x over a high-end CPU, and serves as a stepping stone towards a full SDH system. The next piece of the dissertation enhances the proposed hardware with reconfigurability of the dataflow and resource sharing modes, in order to extend acceleration support to a set of common parallelizable workloads. This reconfigurability lends the system the ability to cater to discrete data access and compute patterns, such as workloads with extensive data sharing and reuse and workloads with limited reuse and streaming access patterns, among others. Moreover, this system incorporates commercial cores and a prototyped software stack for CPU-level programmability. The proposed system is evaluated on a diverse set of compute-bound and memory-bound kernels that compose applications in the domains of graph analytics, machine learning, and image and language processing. The evaluation shows average performance and energy-efficiency gains of 5.0x and 18.4x over the CPU. The final part of the dissertation proposes a runtime control framework that uses low-cost monitoring of hardware performance counters to predict the next best configuration and reconfigure the hardware upon detecting a change in phase or nature of data within the application. In comparison to prior work, this contribution targets multicore CGRAs, uses low-overhead decision tree based predictive models, and incorporates reconfiguration cost-awareness into its policies.
Compared to the best-average static (non-reconfiguring) configuration, the dynamically reconfigurable system achieves a 1.6x improvement in performance-per-Watt in the Energy-Efficient mode of operation, or the same performance with 23% lower energy in the Power-Performance mode, for SpMM across a suite of real-world inputs. The proposed reconfiguration mechanism itself outperforms the state-of-the-art approach for dynamic runtime control by up to 2.9x in terms of energy efficiency.
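The outer-product SpMM dataflow that the fixed-function accelerator is co-designed with has a clean two-phase structure, sketched below in plain Python purely to show the multiply and merge phases; the tiling and on-chip memory reconfiguration of the actual chip are not modeled here.

```python
# Minimal software sketch of outer-product sparse matrix-matrix multiplication (SpMM).
# Phase 1 (multiply): for each k, form the outer product of column k of A and row k of B,
# producing partial products. Phase 2 (merge): accumulate the partial products into C.

from collections import defaultdict

def outer_product_spmm(A_cols, B_rows):
    """A_cols[k] = {i: A[i][k]}, B_rows[k] = {j: B[k][j]}, both sparse dicts."""
    partials = []
    for k in set(A_cols) & set(B_rows):          # multiply phase
        for i, a in A_cols[k].items():
            for j, b in B_rows[k].items():
                partials.append((i, j, a * b))
    C = defaultdict(float)
    for i, j, v in partials:                     # merge (accumulate) phase
        C[(i, j)] += v
    return dict(C)

# Example: A is 2x3, B is 3x2, both sparse.
A_cols = {0: {0: 1.0}, 2: {1: 2.0}}              # A[0][0]=1, A[1][2]=2
B_rows = {0: {1: 3.0}, 2: {0: 4.0}}              # B[0][1]=3, B[2][0]=4
print(outer_product_spmm(A_cols, B_rows))        # {(0, 1): 3.0, (1, 0): 8.0}
```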