Search CORE

20 research outputs found

Multiprocessor System-on-Chips based Wireless Sensor Network Energy Optimization

Author: Ali Haider
Publication venue: Department of Electronics, Computing and Mathematics
Publication date: 01/01/2020
Field of study

Wireless Sensor Network (WSN) is an integrated part of the Internet-of-Things (IoT) used to monitor the physical or environmental conditions without human intervention. In WSN one of the major challenges is energy consumption reduction both at the sensor nodes and network levels. High energy consumption not only causes an increased carbon footprint but also limits the lifetime (LT) of the network. Network-on-Chip (NoC) based Multiprocessor System-on-Chips (MPSoCs) are becoming the de-facto computing platform for computationally extensive real-time applications in IoT due to their high performance and exceptional quality-of-service. In this thesis a task scheduling problem is investigated using MPSoCs architecture for tasks with precedence and deadline constraints in order to minimize the processing energy consumption while guaranteeing the timing constraints. Moreover, energy-aware nodes clustering is also performed to reduce the transmission energy consumption of the sensor nodes. Three distinct problems for energy optimization are investigated given as follows: First, a contention-aware energy-efficient static scheduling using NoC based heterogeneous MPSoC is performed for real-time tasks with an individual deadline and precedence constraints. An offline meta-heuristic based contention-aware energy-efficient task scheduling is developed that performs task ordering, mapping, and voltage assignment in an integrated manner. Compared to state-of-the-art scheduling our proposed algorithm significantly improves the energy-efficiency. Second, an energy-aware scheduling is investigated for a set of tasks with precedence constraints deploying Voltage Frequency Island (VFI) based heterogeneous NoC-MPSoCs. A novel population based algorithm called ARSH-FATI is developed that can dynamically switch between explorative and exploitative search modes at run-time. ARSH-FATI performance is superior to the existing task schedulers developed for homogeneous VFI-NoC-MPSoCs. Third, the transmission energy consumption of the sensor nodes in WSN is reduced by developing ARSH-FATI based Cluster Head Selection (ARSH-FATI-CHS) algorithm integrated with a heuristic called Novel Ranked Based Clustering (NRC). In cluster formation parameters such as residual energy, distance parameters, and workload on CHs are considered to improve LT of the network. The results prove that ARSH-FATI-CHS outperforms other state-of-the-art clustering algorithms in terms of LT.University of Derby, Derby, U

UDORA - University of Derby Online Research Archive

Energy-aware scheduling of streaming applications on edge-devices in IoT based healthcare

Author: Ahmed Waqar
Ali Haider
Hardy James
Kazim Muhammad
Liu Lu
Tariq Umair Ullah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

The reliance on Network-on-Chip (NoC) based Multiprocessor Systems-on-Chips (MPSoCs) is proliferating in modern embedded systems to satisfy the higher performance requirement of multimedia streaming applications. Task level coarse grained software pipeling also called re-timing when combined with Dynamic Voltage and Frequency Scaling (DVFS) has shown to be an effective approach in significantly reducing energy consumption of the multiprocessor systems at the expense of additional delay. In this paper we develop a novel energy-aware scheduler considering tasks with conditional constraints on Voltage Frequency Island (VFI) based heterogeneous NoC-MPSoCs deploying re-timing integrated with DVFS for real-time streaming applications. We propose a novel task level re-timing approach called R-CTG and integrate it with non linear programming based scheduling and voltage scaling approach referred to as ALI-EBAD. The R-CTG approach aims to minimize the latency caused by re-timing without compromising on energy-efficiency. Compared to R-DAG, the state-of-the-art approach designed for traditional Directed Acyclic Graph (DAG) based task graphs, R-CTG significantly reduces the re-timing latency because it only re-times tasks that free up the wasted slack. To validate our claims we performed experiments on using 12 real benchmarks, the results demonstrate that ALI-EBAD out performs CA-TMES-Search and CA-TMES-Quick task schedulers in terms of energy-efficiency.N/

Crossref

De Montfort University Open Research Archive

aCQUIRe

UDORA - University of Derby Online Research Archive

Leicester Research Archive

Energy-aware Successor Tree Consistent EDF Scheduling for PCTGs on MPSoCs

Author: Ali H.
Grandhi S.
Jan S.R.
Liu L.
Nadeem M.S.
Sabrina F.
Tariq UU.
Wang Z.
Publication venue: IEEE Xplore
Publication date: 01/01/2024
Field of study

Multiprocessor System-on-Chips (MPSoCs) computing architectures are gaining popularity due to their high-performance capabilities and exceptional Quality-of-Service (QoS), making them a particularly well-suited computing platform for computationally intensive workloads and applications.} Nonetheless, The scheduling and allocation of a single task set with precedence restrictions on MPSoCs have presented a persistent research challenge in acquiring energy-efficient solutions. The complexity of this scheduling problem escalates when subject to conditional precedence constraints between the tasks, creating what is known as a Conditional Task Graph (CTG). Scheduling sets of Periodic Conditional Task Graphs (PCTGs) on MPSoC platforms poses even more challenges. This paper focuses on tackling the scheduling challenge for a group of PCTGs on MPSoCs equipped with shared memory. The primary goal is to minimize the overall anticipated energy usage, considering two distinct power models: dynamic and static power models. To address this challenge, this paper introduces an innovative scheduling method named Energy Efficient Successor Tree Consistent Earliest Deadline First (EESEDF). The EESEDF approach is primarily designed to maximize the worst-case processor utilization. Once the tasks are assigned to processors, it leverages the earliest successor tree consistent deadline-first strategy to arrange tasks on each processor. To minimize the overall expected energy consumption, EESEDF solves a convex Non-Linear Program (NLP) to determine the optimal speed for each task. Additionally, the paper presents a highly efficient online Dynamic Voltage Scaling (DVS) heuristic, which operates in O(1) time complexity and dynamically adjusts the task speeds in real-time}. We achieved the average improvement, maximum improvement, and minimum improvement of EESEDF+Online-DVS 15%, 17%, and 12%, respectively compared to EESEDF alone. Furthermore, in the second set of experiments, we compared EESEDF against state-of-the-art techniques LESA and NCM. The results showed that EESEDF+Online-DVS outperformed these existing approaches, achieving notable energy efficiency improvements of 25% and 20% over LESA and NCM, respectively. \hl{Our proposed scheduler, EESEDF+Online-DVS, also achieves significant energy efficiency gains compared to existing methods. It outperforms IOETCS-Heuristic by approximately 13% while surpassing BESS and CAP-Online by impressive margins of 25% and 35%, respectively

UDORA - University of Derby Online Research Archive

Energy-efficient Static Task Scheduling on VFI based NoC-HMPSoCs for Intelligent Edge Devices in Cyber-Physical Systems

Author: Ali Haider
Liu Lu
Panneerselvam John
Ullah Tariq Umair
Zhai Xiaojun
Publication venue: Association for Computing Machinery
Publication date: 01/01/2019
Field of study

The interlinked processing units in the modern Cyber-Physical Systems (CPS) creates a large network of connected computing embedded systems. Network-on-Chip (NoC) based multiprocessor system-on-chip (MPSoC) architecture is becoming a de-facto computing platform for real-time applications due to its higher performance and Quality-of-Service (QoS). The number of processors has increased significantly on the multiprocessor systems in CPS therefore, Voltage Frequency Island (VFI) recently adopted for effective energy management mechanism in the large scale multiprocessor chip designs. In this paper, we investigate energy and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI based NoC-MPSoCs with DVFS-enabled processors. Unlike the existing population-based optimization algorithms, we propose a novel population-based algorithm called ARSH-FATI that can dynamically switch between explorative and exploitative search modes at run-time. Our static scheduler ARHS-FATI collectively performs task mapping, scheduling, and voltage scaling. Consequently, its performance is superior to the existing state-of-the-art approach proposed for homogeneous VFI based NoC-MPSoCs. We also developed a communication contention-aware Earliest Edge Consistent Deadline First (EECDF) scheduling algorithm and gradient descent inspired voltage scaling algorithm called Energy Gradient Decent (EGD). We have introduced a notion of Energy Gradient (EG) that guides EGD in its search for islands voltage settings and minimize the total energy consumption. We conducted the experiments on 8 real benchmarks adopted from Embedded Systems Synthesis Benchmarks (E3S). Our static scheduling approach ARSH-FATI outperformed state-of-the-art technique and achieved an average energy-efficiency of ~ 24% and ~ 30% over CA-TMES-Search and CA-TMES-Quick respectively

University of Essex Research Repository

aCQUIRe

ACQUIRE

Leicester Research Archive

Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

Author: Goens Jokisch Andres Wilhelm
Publication venue
Publication date: 14/05/2021
Field of study

Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis. We focus on Kahn Process Networks and dataﬂow applications as abstractions, both for programming and for deriving an eﬃcient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general. This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataﬂow applications. We starting with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking. We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a ﬁrst-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataﬂow programming, we also focus on programmability. The focus of the thesis is in the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures

Technische Universität Dresden: Qucosa

Mixed Criticality Systems - A Review : (13th Edition, February 2022)

Author: Burns Alan
Davis Robert Ian
Publication venue
Publication date: 15/02/2022
Field of study

This review covers research on the topic of mixed criticality systems that has been published since Vestal’s 2007 paper. It covers the period up to end of 2021. The review is organised into the following topics: introduction and motivation, models, single processor analysis (including job-based, hard and soft tasks, fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, related topics, realistic models, formal treatments, systems issues, industrial practice and research beyond mixed-criticality. A list of PhDs awarded for research relating to mixed-criticality systems is also included

White Rose Research Online

Turku Centre for Computer Science – Annual Report 2013

Author: Ilona Tuominen
Ion Petre
Irmeli Laine
Johan Lilius
Outi Tuohi
Tomi Mäntylä
Publication venue: Society of Social and Economic Research in the Universities of Turku
Publication date: 28/10/2022
Field of study

Due to a major reform of organization and responsibilities of TUCS, its role, activities, and even structures have been under reconsideration in 2013. The traditional pillar of collaboration at TUCS, doctoral training, was reorganized due to changes at both universities according to the renewed national system for doctoral education. Computer Science and Engineering and Information Systems Science are now accompanied by Mathematics and Statistics in newly established doctoral programs at both University of Turku and Åbo Akademi University. Moreover, both universities granted sufficient resources to their respective programmes for doctoral training in these fields, so that joint activities at TUCS can continue. The outcome of this reorganization has the potential of proving out to be a success in terms of scientific profile as well as the quality and quantity of scientific and educational results.  International activities that have been characteristic to TUCS since its inception continue strong. TUCS’ participation in European collaboration through EIT ICT Labs Master’s and Doctoral School is now more active than ever. The new double degree programs at MSc and PhD level between University of Turku and Fudan University in Shaghai, P.R.China were succesfully set up and are  now running for their first year. The joint students will add to the already international athmosphere of the ICT House.  The four new thematic reseach programmes set up acccording to the decision by the TUCS Board have now established themselves, and a number of events and other activities saw the light in 2013. The TUCS Distinguished Lecture Series managed to gather a large audience with its several prominent speakers. The development of these and other research centre activities continue, and  new practices and structures will be initiated to support the tradition of close academic collaboration.  The TUCS’ slogan Where Academic Tradition Meets the Exciting Future has proven true throughout these changes. Despite of the dark clouds on the national and European economic sky, science and higher education in the field have managed to retain all the key ingredients for success. Indeed, the future of ICT and Mathematics in Turku seems exciting.</p

UTUPub

Overview of Intelligent Signal Processing Systems

Author: Chris Gwo Giun Lee
Kun-Chih (Jimmy) Chen
Wen-Hsiao Peng
Publication venue: Now Publishers
Publication date: 01/01/2023
Field of study

Directory of Open Access Journals

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Author: Bussolino B.
Capra M.
Marchisio A.
Martina M.
Masera G.
Shafique M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications ranging from computer vision for medicine to autonomous driving of modern cars as well as other sectors in security, healthcare, and finance. However, to achieve impressive performance, these algorithms employ very deep networks, requiring a significant computational power, both during the training and inference time. A single inference of a DL model may require billions of multiply-and-accumulated operations, making the DL extremely compute-and energy-hungry. In a scenario where several sophisticated algorithms need to be executed with limited energy and low latency, the need for cost-effective hardware platforms capable of implementing energy-efficient DL execution arises. This paper first introduces the key properties of two brain-inspired models like Deep Neural Network (DNN), and Spiking Neural Network (SNN), and then analyzes techniques to produce efficient and high-performance designs. This work summarizes and compares the works for four leading platforms for the execution of algorithms such as CPU, GPU, FPGA and ASIC describing the main solutions of the state-of-the-art, giving much prominence to the last two solutions since they offer greater design flexibility and bear the potential of high energy-efficiency, especially for the inference process. In addition to hardware solutions, this paper discusses some of the important security issues that these DNN and SNN models may have during their execution, and offers a comprehensive section on benchmarking, explaining how to assess the quality of different networks and hardware systems designed for them

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Power and Energy Aware Heterogeneous Computing Platform

Author: Nouri Sajjad
Publication venue: Tampere University of Technology
Publication date: 01/01/2018
Field of study

During the last decade, wireless technologies have experienced signiﬁcant development, most notably in the form of mobile cellular radio evolution from GSM to UMTS/HSPA and thereon to Long-Term Evolution (LTE) for increasing the capacity and speed of wireless data networks. Considering the real-time constraints of the new wireless standards and their demands for parallel processing, reconﬁgurable architectures and in particular, multicore platforms are part of the most successful platforms due to providing high computational parallelism and throughput. In addition to that, by moving toward Internet-of-Things (IoT), the number of wireless sensors and IP-based high throughput network routers is growing at a rapid pace. Despite all the progression in IoT, due to power and energy consumption, a single chip platform for providing multiple communication standards and a large processing bandwidth is still missing.The strong demand for performing different sets of operations by the embedded systems and increasing the computational performance has led to the use of heterogeneous multicore architectures with the help of accelerators for computationally-intensive data-parallel tasks acting as coprocessors. Currently, highly heterogeneous systems are the most power-area efﬁcient solution for performing complex signal processing systems. Additionally, the importance of IoT has increased signiﬁcantly the need for heterogeneous and reconﬁgurable platforms.On the other hand, subsequent to the breakdown of the Dennardian scaling and due to the enormous heat dissipation, the performance of a single chip was obstructed by the utilization wall since all cores cannot be clocked at their maximum operating frequency. Therefore, a thermal melt-down might be happened as a result of high instantaneous power dissipation. In this context, a large fraction of the chip, which is switched-off (Dark) or operated at a very low frequency (Dim) is called Dark Silicon. The Dark Silicon issue is a constraint for the performance of computers, especially when the up-coming IoT scenario will demand a very high performance level with high energy efﬁciency. Among the suggested solution to combat the problem of Dark-Silicon, the use of application-speciﬁc accelerators and in particular Coarse-Grained Reconﬁgurable Arrays (CGRAs) are the main motivation of this thesis work.This thesis deals with design and implementation of Software Deﬁned Radio (SDR) as well as High Efﬁciency Video Coding (HEVC) application-speciﬁc accelerators for computationally intensive kernels and data-parallel tasks. One of the most important data transmission schemes in SDR due to its ability of providing high data rates is Orthogonal Frequency Division Multiplexing (OFDM). This research work focuses on the evaluation of Heterogeneous Accelerator-Rich Platform (HARP) by implementing OFDM receiver blocks as designs for proof-of-concept. The HARP template allows the designer to instantiate a heterogeneous reconﬁgurable platform with a very large amount of custom-tailored computational resources while delivering a high performance in terms of many high-level metrics. The availability of this platform lays an excellent foundation to investigate techniques and methods to replace the Dark or Dim part of chip with high-performance silicon dissipating very low power and energy. Furthermore, this research work is also addressing the power and energy issues of the embedded computing systems by tailoring the HARP for self-aware and energy-aware computing models. In this context, the instantaneous power dissipation and therefore the heat dissipation of HARP are mitigated on FPGA/ASIC by using Dynamic Voltage and Frequency Scaling (DVFS) to minimize the dark/dim part of the chip. Upgraded HARP for self-aware and energy-aware computing can be utilized as an energy-efﬁcient general-purpose transceiver platform that is cognitive to many radio standards and can provide high throughput while consuming as little energy as possible. The evaluation of HARP has shown promising results, which makes it a suitable platform for avoiding Dark Silicon in embedded computing platforms and also for diverse needs of IoT communications.In this thesis, the author designed the blocks of OFDM receiver by crafting templatebased CGRA devices and then attached them to HARP’s Network-on-Chip (NoC) nodes. The performance of application-speciﬁc accelerators generated from templatebased CGRAs, the performance of the entire platform subsequent to integrating the CGRA nodes on HARP and the NoC trafﬁc are recorded in terms of several highlevel performance metrics. In evaluating HARP on FPGA prototype, it delivers a performance of 0.012 GOPS/mW. Because of the scalability and regularity in HARP, the author considered its value as architectural constant. In addition to showing the gain and the beneﬁts of maximizing the number of reconﬁgurable processing resources on a platform in comparison to the scaled performance of several state-of-the-art platforms, HARP’s architectural constant ensures application-independent ﬁgure of merit. HARP is further evaluated by implementing various sizes of Discrete Cosine transform (DCT) and Discrete Sine Transform (DST) dedicated for HEVC standard, which showed its ability to sustain Full HD 1080p format at 30 fps on FPGA. The author also integrated self-aware computing model in HARP to mitigate the power dissipation of an OFDM receiver. In the case of FPGA implementation, the total power dissipation of the platform showed 16.8% reduction due to employing the Feedback Control System (FCS) technique with Dynamic Frequency Scaling (DFS). Furthermore, by moving to ASIC technology and scaling both frequency and voltage simultaneously, signiﬁcant dynamic power reduction (up to 82.98%) was achieved, which proved the DFS/DVFS techniques as one step forward to mitigate the Dark Silicon issue

Trepo - Institutional Repository of Tampere University