Hybrid clouds for data-intensive, 5G-enabled IoT applications: an overview, key issues and relevant architecture
Hybrid cloud multi-access edge computing (MEC) deployments have been proposed as an efficient
means to support Internet of Things (IoT) applications that rely on a plethora of nodes and data. In this paper, an overview of the area of hybrid clouds and its relevant research directions is given, covering technologies and mechanisms for the formation of such MEC deployments and emphasizing several key issues that should be tackled by novel approaches, especially under the 5G paradigm. Furthermore, a decentralized hybrid cloud MEC architecture, realized as a Platform-as-a-Service (PaaS), is proposed, and its main building blocks and layers are thoroughly described. To offer a broad perspective on the business potential of such a platform, the stakeholder ecosystem is also analyzed. Finally, two use cases in the context of smart cities and mobile health are presented to show how the proposed PaaS enables the development of the respective IoT applications.
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing down research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing designers to assess the design choices for each
building block separately. Reported optimizations range from up to 10,000x
memory savings to 33x energy reductions, giving chip designers an overview
of design choices for implementing efficient low power neural network
accelerators.
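To make the idea of such a construction kit concrete, the sketch below (not taken from the survey, with invented tensor sizes) shows one of the most common memory-saving building blocks, post-training 8-bit weight quantization, which on its own cuts weight storage by roughly 4x relative to float32; the much larger savings reported above come from stacking several such techniques.

```python
# Minimal sketch (not from the survey): symmetric 8-bit post-training
# quantization of a weight tensor, a typical memory-saving building block.
import numpy as np

def quantize_symmetric_int8(weights: np.ndarray):
    """Map float32 weights to int8 values plus a single scale factor."""
    scale = np.max(np.abs(weights)) / 127.0                 # symmetric full-range scale
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_symmetric_int8(w)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"memory: {w.nbytes} B -> {q.nbytes} B, max abs error {err:.4f}")
```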
A service broker for Intercloud computing
This thesis aims at assisting users in finding the most suitable Cloud resources, taking into account their functional and non-functional SLA requirements. A key feature of the work is a Cloud service broker that acts as a mediator between consumers and Clouds. The research involves the implementation and evaluation of two SLA-aware match-making algorithms in a simulation environment. The work also investigates the optimal deployment of Multi-Cloud workflows on Intercloud environments.
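The abstract does not detail the two match-making algorithms; a minimal sketch of SLA-aware match-making might filter Cloud offers on hard functional requirements and then rank the remainder by a weighted utility over non-functional attributes. All names, attributes, and weights below are hypothetical.

```python
# Hypothetical SLA-aware match-making sketch: filter offers on hard
# functional requirements, rank the rest by a weighted utility over
# non-functional SLA attributes (availability, price).
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str
    vcpus: int
    ram_gb: int
    availability: float      # e.g. 0.999
    price_per_hour: float

def match(offers, min_vcpus, min_ram_gb, w_avail=0.7, w_price=0.3):
    feasible = [o for o in offers if o.vcpus >= min_vcpus and o.ram_gb >= min_ram_gb]
    if not feasible:
        return None
    max_price = max(o.price_per_hour for o in feasible)
    # Higher availability and lower price both raise the score.
    score = lambda o: w_avail * o.availability + w_price * (1 - o.price_per_hour / max_price)
    return max(feasible, key=score)

offers = [Offer("CloudA", 8, 32, 0.999, 0.40), Offer("CloudB", 16, 64, 0.995, 0.30)]
print(match(offers, min_vcpus=8, min_ram_gb=32))
```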
Optimized Biosignals Processing Algorithms for New Designs of Human Machine Interfaces on Parallel Ultra-Low Power Architectures
The aim of this dissertation is to explore Human Machine Interfaces (HMIs) in a variety of biomedical scenarios. The research addresses typical challenges in wearable and implantable devices for diagnostic, monitoring, and prosthetic purposes, suggesting a methodology for tailoring such applications to cutting edge embedded architectures.
The main challenge is the enhancement of high-level applications, including the introduction of Machine Learning (ML) algorithms, by using parallel programming and specialized hardware to improve performance.
The majority of these algorithms are computationally intensive, posing significant challenges for deployment on embedded devices, which have several limitations in terms of memory size, maximum operating frequency, and battery life.
The proposed solutions take advantage of a Parallel Ultra-Low Power (PULP) architecture, tailoring the processing to specific target architectures and heavily optimizing the execution by exploiting software and hardware resources.
The thesis starts by describing a methodology that can be considered a guideline to efficiently implement algorithms on embedded architectures.
This is followed by several case studies in the biomedical field, starting with the analysis of a hand gesture recognition application based on the Hyperdimensional Computing algorithm, which allows fast on-chip re-training, and a comparison with the state-of-the-art Support Vector Machine (SVM); a Brain-Computer Interface (BCI) that detects the response of the brain to a visual stimulus follows in the manuscript. Furthermore, a seizure detection application is also presented, exploring different solutions for the dimensionality reduction of the input signals. The last part is dedicated to an exploration of typical modules for the development of optimized ECG-based applications.
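To illustrate why Hyperdimensional Computing lends itself to fast on-chip re-training, the following toy sketch (not the thesis implementation; the dimensionality, encoding scheme, and feature handling are assumptions) encodes samples into bipolar hypervectors, bundles them into class prototypes, and classifies by dot-product similarity; re-training simply accumulates more encoded vectors into a prototype.

```python
# Toy hyperdimensional-computing classifier, illustrative only.
import numpy as np

D = 10_000                                  # hypervector dimensionality
rng = np.random.default_rng(0)

def random_hv():
    return rng.choice([-1, 1], size=D)

def encode(sample, feature_hvs, level_hvs, n_levels=8):
    """Bind each feature's identity vector with its quantized level, then bundle."""
    acc = np.zeros(D)
    for value, f_hv in zip(sample, feature_hvs):
        level = min(int(value * n_levels), n_levels - 1)
        acc += f_hv * level_hvs[level]       # binding = element-wise product
    return np.sign(acc)

def train(samples, labels, feature_hvs, level_hvs):
    protos = {}
    for x, y in zip(samples, labels):
        protos[y] = protos.get(y, np.zeros(D)) + encode(x, feature_hvs, level_hvs)
    return {y: np.sign(p) for y, p in protos.items()}

def classify(x, protos, feature_hvs, level_hvs):
    hv = encode(x, feature_hvs, level_hvs)
    return max(protos, key=lambda y: np.dot(hv, protos[y]))

if __name__ == "__main__":
    n_channels = 4                           # e.g. sensor channels scaled to [0, 1)
    feature_hvs = [random_hv() for _ in range(n_channels)]
    level_hvs = [random_hv() for _ in range(8)]
    X, y = rng.random((20, n_channels)), [i % 2 for i in range(20)]
    protos = train(X, y, feature_hvs, level_hvs)
    print(classify(X[0], protos, feature_hvs, level_hvs))
```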
Design and Implementation of Hardware Accelerators for Neural Processing Applications
The primary motivation for this work was the need to implement hardware
accelerators for a newly proposed ANN structure called the Auto Resonance Network
(ARN) for robotic motion planning. ARN is an approximating feed-forward
hierarchical and explainable network. It can be used in various AI applications
but the application base was small. Therefore, the objective of the research
was twofold: to develop a new application using ARN and to implement a hardware
accelerator for ARN. As per the suggestions given by the Doctoral Committee, an
image recognition system using ARN has been implemented. An accuracy of around
94% was achieved with only 2 layers of ARN. The network also required a small
training data set of about 500 images. The publicly available MNIST dataset was
used for this experiment. All the coding was done in Python. The massive
parallelism seen in ANNs presents several challenges to CPU design. For a given
functionality, e.g., multiplication, several copies of serial modules can be
realized within the same area as a parallel module. The advantage of using serial
modules compared to parallel modules under area constraints has been discussed.
One module often useful in ANNs is multi-operand addition. One problem
in its implementation is estimating the number of carry bits required when the number of
operands changes. A theorem to calculate the exact number of carry bits required
for a multi-operand addition is presented in the thesis, which alleviates
this problem. The main advantage of the modular approach to multi-operand
addition is the possibility of pipelined addition with low reconfiguration
overhead. This results in an overall increase in throughput for the large numbers of
additions typically seen in several DNN configurations.
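The theorem itself is not reproduced in the abstract; as a point of reference (a standard counting argument, not necessarily the thesis's exact formulation), the sum of $n$ unsigned $w$-bit operands satisfies

$$\sum_{i=1}^{n} a_i \;\le\; n\,(2^w - 1) \;<\; n \cdot 2^w \;\le\; 2^{\,w + \lceil \log_2 n \rceil},$$

so the result always fits in $w + \lceil \log_2 n \rceil$ bits, i.e. at most $\lceil \log_2 n \rceil$ carry bits beyond the operand width, and the bound is reached when every operand takes its maximum value.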
On the Reliability Assessment of Artificial Neural Networks Running on AI-Oriented MPSoCs
Nowadays, the usage of electronic devices running artificial neural network (ANN)-based applications is spreading in our everyday life. Due to their outstanding computational capabilities, ANNs have become appealing solutions for safety-critical systems as well. Frequently, they are considered intrinsically robust and fault tolerant for being brain-inspired and redundant computing models. However, when ANNs are deployed on resource-constrained hardware devices, single physical faults may compromise the activity of multiple neurons. Therefore, it is crucial to assess the reliability of the entire neural computing system, including both the software and the hardware components. This article systematically addresses reliability concerns for ANNs running on multiprocessor systems-on-chip (MPSoCs). It presents a methodology to assign resilience scores to individual neurons and, based on that, schedule the workload of an ANN on the target MPSoC so that critical neurons are neatly distributed among the available processing elements. This reliability-oriented methodology exploits an integer linear programming solver to find the optimal solution. Experimental results are given for three different convolutional neural networks trained on MNIST, SVHN, and CIFAR-10. We carried out a comprehensive assessment on an open-source artificial intelligence-based RISC-V MPSoC. The results show the reliability improvements of the proposed methodology over traditional scheduling.
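The article's exact ILP formulation is not given in the abstract; the sketch below is a hedged stand-in showing the general shape of such a problem: binary variables place each neuron on one processing element, while the objective minimizes the largest resilience-weighted load carried by any single PE. The scores, sizes, and the PuLP modeler are illustrative choices, not the paper's setup.

```python
# Illustrative min-max neuron-to-PE assignment via integer linear programming.
import pulp

def schedule(criticality, n_pes):
    """criticality: list of per-neuron resilience scores (higher = more critical)."""
    neurons, pes = range(len(criticality)), range(n_pes)
    prob = pulp.LpProblem("neuron_placement", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (neurons, pes), cat="Binary")
    worst = pulp.LpVariable("worst_pe_criticality", lowBound=0)

    prob += worst                                            # objective: min-max criticality
    for n in neurons:                                        # each neuron on exactly one PE
        prob += pulp.lpSum(x[n][p] for p in pes) == 1
    for p in pes:                                            # bound every PE's load by 'worst'
        prob += pulp.lpSum(criticality[n] * x[n][p] for n in neurons) <= worst

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {n: next(p for p in pes if x[n][p].value() > 0.5) for n in neurons}

print(schedule([0.9, 0.8, 0.1, 0.2, 0.7, 0.3], n_pes=2))
```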
Doctor of Philosophy dissertation
Recent breakthroughs in silicon photonics technology are enabling the integration of optical devices into silicon-based semiconductor processes. Photonics technology enables high-speed, high-bandwidth, and high-fidelity communications at the chip scale, an important development in an increasingly communications-oriented semiconductor world. Significant developments in silicon photonic manufacturing and integration are also enabling investigations into applications beyond traditional telecom: sensing, filtering, signal processing, quantum technology, and even optical computing. In effect, we are now seeing a convergence of communications and computation, where the traditional roles of optics and microelectronics are becoming blurred. As the applications for opto-electronic integrated circuits (OEICs) are developed and manufacturing capabilities expand, design support is necessary to fully exploit the potential of this optics technology. Such design support for moving beyond custom design to automated synthesis and optimization is not well developed. Scalability requires abstractions, which in turn enable and require the use of optimization algorithms and design methodology flows. Design automation represents an opportunity to take OEIC design to a larger scale, facilitating design-space exploration and laying the foundation for current and future optical applications, thus fully realizing the potential of this technology. This dissertation proposes design automation for integrated optic system design. Using a building-block model for optical devices, we provide an EDA-inspired design flow and methodologies for optical design automation. Underlying these flows and methodologies are new supporting techniques in behavioral and physical synthesis, as well as device-resynthesis techniques for thermal-aware system integration. We also provide modeling for optical devices and determine optimization and constraint parameters that guide the automation techniques. Our techniques and methodologies are then applied to the design and optimization of optical circuits and devices. Experimental results are analyzed to evaluate their efficacy. We conclude with discussions on the contributions and limitations of the approaches in the context of optical design automation, and describe the tremendous opportunities for future research in design automation for integrated optics.
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of the WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.
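As a back-of-the-envelope illustration of why a single shared broadcast channel scales differently from a wireline mesh (a toy first-order model with invented parameters, not the paper's evaluation): the wireless broadcast cost is set by channel serialization and is independent of the core count, whereas a mesh broadcast must still reach the farthest corner hop by hop.

```python
# Toy model: one broadcast on a shared wireless channel vs. a wireline 2-D mesh.
import math

def wireless_broadcast_ns(packet_bits, channel_gbps):
    # Serialization on the shared channel; every core hears it at once.
    return packet_bits / channel_gbps

def mesh_broadcast_ns(n_cores, hop_ns=1.0, serialization_ns=5.0):
    # The farthest corner of a sqrt(N) x sqrt(N) mesh is ~2*(sqrt(N)-1) hops away.
    side = math.ceil(math.sqrt(n_cores))
    return 2 * (side - 1) * hop_ns + serialization_ns

for n in (64, 256, 1024):
    print(n, wireless_broadcast_ns(128, 10), mesh_broadcast_ns(n))
```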
Exploiting heterogeneity in Chip-Multiprocessor Design
In the past decade, semiconductor manufacturers have been persistent in building faster and smaller transistors in order to boost processor performance as projected by Moore’s Law. Recently, as we enter the deep submicron regime, continuing the same pace of processor development has become increasingly difficult due to constraints on power, temperature, and the scalability of transistors. To overcome these challenges, researchers have proposed several innovations at both the architecture and device levels that are able to partially solve the problems. These diversities in processor architecture and manufacturing materials provide solutions for continuing Moore’s Law by effectively exploiting heterogeneity; however, they also introduce a set of unprecedented challenges that have rarely been addressed in prior work. In this dissertation, we present a series of in-depth studies to comprehensively investigate the design and optimization of future multi-core and many-core platforms through exploiting heterogeneities. First, we explore a large design space of heterogeneous chip multiprocessors by exploiting architectural- and device-level heterogeneities, aiming to identify the optimal design patterns leading to attractive energy- and cost-efficiencies in the pre-silicon stage. After this high-level study, we pay specific attention to architectural asymmetry, aiming to develop a heterogeneity-aware task scheduler that optimizes energy-efficiency on a given single-ISA heterogeneous multiprocessor. An advanced statistical tool is employed to facilitate the algorithm development. In the third study, we shift our concentration to device-level heterogeneity and propose to effectively leverage the advantages provided by different materials to solve the increasingly important reliability issue for future processors.
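As a hedged illustration of what a heterogeneity-aware task scheduler on a single-ISA asymmetric multiprocessor optimizes (core parameters, task model, and policy below are invented for the sketch, not the dissertation's algorithm): each task stays on the slow, energy-efficient core unless that would miss its deadline, in which case it is promoted to the fast, power-hungry core.

```python
# Toy energy-aware assignment for a big/little style single-ISA multiprocessor.
from dataclasses import dataclass

@dataclass
class CoreType:
    name: str
    speedup: float     # performance relative to the little core
    power_w: float     # active power while running a task

LITTLE = CoreType("little", 1.0, 0.5)
BIG = CoreType("big", 2.5, 2.0)

def assign(tasks):
    """tasks: list of (runtime_on_little_ms, deadline_ms) pairs."""
    plan = []
    for runtime_little, deadline in tasks:
        # Prefer the efficient core; promote only when it would miss the deadline.
        core = LITTLE if runtime_little / LITTLE.speedup <= deadline else BIG
        runtime_ms = runtime_little / core.speedup
        energy_mj = runtime_ms * core.power_w        # ms * W = mJ
        plan.append((core.name, round(runtime_ms, 2), round(energy_mj, 2)))
    return plan

if __name__ == "__main__":
    print(assign([(10, 12), (10, 5), (30, 40)]))
```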