82 research outputs found

    Multiprocessor System-on-Chips based Wireless Sensor Network Energy Optimization

    Wireless Sensor Networks (WSNs) are an integral part of the Internet of Things (IoT), used to monitor physical or environmental conditions without human intervention. One of the major challenges in WSNs is reducing energy consumption, both at the sensor nodes and at the network level. High energy consumption not only increases the carbon footprint but also limits the lifetime (LT) of the network. Network-on-Chip (NoC) based Multiprocessor System-on-Chips (MPSoCs) are becoming the de facto computing platform for computationally intensive real-time IoT applications due to their high performance and exceptional quality of service. In this thesis, a task scheduling problem is investigated on an MPSoC architecture for tasks with precedence and deadline constraints, with the goal of minimizing processing energy consumption while guaranteeing the timing constraints. Moreover, energy-aware node clustering is performed to reduce the transmission energy consumption of the sensor nodes. Three distinct energy optimization problems are investigated. First, contention-aware energy-efficient static scheduling on NoC-based heterogeneous MPSoCs is performed for real-time tasks with individual deadline and precedence constraints. An offline meta-heuristic based contention-aware energy-efficient task scheduler is developed that performs task ordering, mapping, and voltage assignment in an integrated manner; compared with state-of-the-art schedulers, the proposed algorithm significantly improves energy efficiency. Second, energy-aware scheduling is investigated for a set of tasks with precedence constraints on Voltage Frequency Island (VFI) based heterogeneous NoC-MPSoCs. A novel population-based algorithm called ARSH-FATI is developed that can dynamically switch between explorative and exploitative search modes at run-time; its performance is superior to that of existing task schedulers developed for homogeneous VFI-NoC-MPSoCs. Third, the transmission energy consumption of the sensor nodes in the WSN is reduced by developing an ARSH-FATI based Cluster Head Selection (ARSH-FATI-CHS) algorithm integrated with a heuristic called Novel Ranked Based Clustering (NRC). Cluster formation considers parameters such as residual energy, distance, and workload on the cluster heads to improve the LT of the network. The results show that ARSH-FATI-CHS outperforms other state-of-the-art clustering algorithms in terms of LT. (University of Derby, Derby, UK)
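    As a rough illustration of the ranked clustering idea, the following sketch scores cluster-head candidates by a weighted combination of residual energy, distance to the sink, and current workload. The linear scoring form, weights, and field names are assumptions made here for illustration; the thesis's ARSH-FATI-CHS and NRC formulation is more elaborate.

        // Hypothetical C++ sketch of ranked cluster-head selection.
        // Weights and the linear scoring form are illustrative assumptions,
        // not the thesis's actual NRC heuristic.
        #include <cstddef>
        #include <vector>

        struct Node {
            double residual_energy;  // energy remaining in the node's battery
            double dist_to_sink;     // distance to the base station
            double workload;         // traffic the node would relay as head
        };

        // Higher score = better cluster-head candidate: favour high residual
        // energy, penalise long sink distance and heavy existing workload.
        double chScore(const Node& n, double wE = 0.5, double wD = 0.3,
                       double wL = 0.2) {
            return wE * n.residual_energy - wD * n.dist_to_sink - wL * n.workload;
        }

        // Pick the highest-ranked node of a (non-empty) cluster as its head.
        std::size_t selectClusterHead(const std::vector<Node>& cluster) {
            std::size_t best = 0;
            for (std::size_t i = 1; i < cluster.size(); ++i)
                if (chScore(cluster[i]) > chScore(cluster[best])) best = i;
            return best;
        }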

    A Survey on Vertical and Horizontal Scaling Platforms for Big Data Analytics

    There is no doubt that we are entering the era of big data. The challenge is how to store, search, and analyze the huge amount of data being generated every second. One of the main obstacles for big data researchers is finding the appropriate big data analysis platform. The aim of this work is to present a thorough investigation of the available platforms for big data analysis in terms of vertical and horizontal scaling, together with their compatible frameworks and applications. Finally, this article outlines some research trends and other open issues in big data analytics.

    High Performance Embedded Computing

    Nowadays, computing systems are so ubiquitous in our lives that we live in a cyber-physical world dominated by them, from pacemakers to cars and airplanes. These systems demand ever more computational performance to process large amounts of data from multiple data sources with guaranteed processing times. Actuating outside the required timing bounds may cause system failure, which is critical for domains such as avionics, automotive, business monitoring, and e-trading. High-Performance and Time-Predictable Embedded Computing presents recent advances in software architecture and tools to support such complex systems, enabling the design of embedded computing devices that deliver high performance while guaranteeing the timing bounds the application requires. Technical topics discussed in the book include: parallel embedded platforms; programming models; mapping and scheduling of parallel computations; timing and schedulability analysis; and runtimes and operating systems. The work reflected in this book was done in the scope of the European project P-SOCRATES, funded under the FP7 framework programme of the European Commission. The book is ideal for practitioners in the computer, communication, and embedded industries, as well as academic staff and master's/research students in computer science, embedded systems, cyber-physical systems, and the Internet of Things.
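    As a flavour of the schedulability-analysis topic listed above, the sketch below applies the classic Liu-and-Layland utilisation bound for rate-monotonic scheduling; this is textbook material, not the book's own analysis framework.

        // Sufficient (not necessary) rate-monotonic schedulability test:
        // a periodic task set is schedulable if its total utilisation does
        // not exceed n * (2^(1/n) - 1).
        #include <cmath>
        #include <vector>

        struct Task { double wcet; double period; };  // worst-case exec time, period

        bool rmSchedulable(const std::vector<Task>& ts) {
            double u = 0.0;
            for (const auto& t : ts) u += t.wcet / t.period;
            double n = static_cast<double>(ts.size());
            return u <= n * (std::pow(2.0, 1.0 / n) - 1.0);
        }

    For two tasks the bound is about 0.828, so a pair with total utilisation 0.8 passes this sufficient test, while a pair at 0.9 would need an exact response-time analysis instead.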

    On the Design of Future Communication Systems with Coded Transport, Storage, and Computing

    Communication systems are experiencing a fundamental change. Novel applications demand increased performance from these systems, not only in throughput but also in latency, reliability, security, and heterogeneity support. To fulfil these requirements, future systems treat communication not only as the transport of bits but also as their storage, processing, and relation. In these systems, every network node has transport, storage, and computing resources that the network operator and its users can exploit through virtualisation and softwarisation of the resources. It is within this context that this work presents its results. We propose distributed coded approaches to improve communication systems. Our results improve the reliability and latency performance of the transport of information. They also increase the reliability, flexibility, and throughput of storage applications. Furthermore, building on the lesson that coded approaches improve the transport and storage performance of communication systems, we propose a distributed coded approach to the computing of novel in-network applications such as the steering and control of cyber-physical systems. Our approach can increase the reliability and latency performance of distributed in-network computing in the presence of errors, erasures, and attackers.
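    To make the coding idea concrete, the toy sketch below uses a single XOR parity block so that any one erased data block can be rebuilt from the survivors. Real coded transport and storage systems use stronger codes (e.g., Reed-Solomon or random linear network codes); this only illustrates the redundancy principle.

        // Toy single-parity erasure code over equal-sized blocks.
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        using Block = std::vector<std::uint8_t>;

        // Parity block = XOR of all data blocks.
        Block makeParity(const std::vector<Block>& data) {
            Block p(data.at(0).size(), 0);
            for (const auto& b : data)
                for (std::size_t i = 0; i < p.size(); ++i) p[i] ^= b[i];
            return p;
        }

        // Rebuild one erased block: XOR the parity with every surviving block.
        Block recover(const std::vector<Block>& surviving, const Block& parity) {
            Block lost = parity;
            for (const auto& b : surviving)
                for (std::size_t i = 0; i < lost.size(); ++i) lost[i] ^= b[i];
            return lost;
        }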


    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, a series of tools aims at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data, and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it becomes easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as often happens in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we argue that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language (DSL), sitting on top of a stack of layers that builds a prototypical framework for Big Data analytics. The contribution of this thesis is twofold. First, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit at each level. Second, we propose a programming environment based on this layered model in the form of a DSL for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. The model is intended to give the user a unique interface for both stream and batch processing, hiding data management completely and focusing only on operations, which are represented by Pipeline stages. Our DSL is built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and is implemented in C++11/14 with the aim of bringing C++ into the Big Data world.
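    As a minimal sketch of the pipeline-as-DAG idea, a Pipeline can be seen as a typed composition of stages applied to a data collection. The class and function names below are invented for illustration and are not PiCo's actual API.

        // A two-stage linear pipeline (a DAG degenerated to a chain).
        #include <functional>
        #include <iostream>
        #include <vector>

        template <typename In, typename Out>
        struct Stage {
            std::function<Out(const In&)> op;  // one processing element
        };

        // Run the composition stage1 -> stage2 over a batch collection.
        template <typename A, typename B, typename C>
        std::vector<C> run(const std::vector<A>& input,
                           const Stage<A, B>& s1, const Stage<B, C>& s2) {
            std::vector<C> out;
            for (const auto& x : input) out.push_back(s2.op(s1.op(x)));
            return out;
        }

        int main() {
            Stage<int, int> square{[](const int& x) { return x * x; }};
            Stage<int, int> inc{[](const int& x) { return x + 1; }};
            for (int v : run(std::vector<int>{1, 2, 3}, square, inc))
                std::cout << v << ' ';  // prints: 2 5 10
        }

    The same stage objects could, in principle, be applied to an unbounded stream instead of a finite vector, which is the unified batch/stream view the model aims for.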

    CPU Utilization Improvement of Multiple-Core Processors Through Cache Management and Task Scheduling

    Nowadays, modern multiprocessor and multicore architectures are widely used in data centers, providing the performance required for a variety of workloads such as the Cloud Radio Access Network (C-RAN). Wireless baseband signal processing for the 4G and 5G standards designates a range of tasks that must be executed within a specific time slot. For instance, the uplink stack of one virtualized 4G base station was decomposed into more than 1000 tasks executable within 5 ms. In 5G, the target latency for ultra-low-latency use cases is 1 ms in an end-to-end scenario, while the computational complexity is expected to be one to two orders of magnitude higher than that of 4G. It remains to be seen whether and how such computational demands can be met. Characterizing processing-time variability, alongside the features of the memory model, is crucial to guaranteeing a given processing time on mainstream computer clusters. Moreover, task scheduling on multicore systems is still an open issue; the problem needs to be analyzed in order to fully utilize the processing capacity and achieve low processing latency. To tackle the inefficient utilization of CPU cores, a queueing-based, data-driven task scheduling scheme focused on local parallel computing is proposed in this thesis. The thesis introduces multi-queue management for dynamic task scheduling, targeting 100% utilization of the local CPU cores given sufficient input tasks. Several simulations verify the proposed task scheduling scheme, and the reported results confirm its viability and efficiency. Moreover, cache memory usage is one of the primary sources of execution-time variability, and inefficient cache management proves problematic in systems where the Worst-Case Execution Time (WCET) is of concern. An efficient cache-management approach needs to take both task scheduling and cache management into account simultaneously. Optimal cache management critically requires considering the priorities associated with all tasks; knowledge of these priorities is essential for detecting and avoiding system bottlenecks, and it allows adequate resources to be allocated to critical tasks to facilitate better management. The work starts with the introduction of a simple, scalable, and configurable test method called an Array of Counters, whose purpose is to characterize the processing-time variations of multicore architectures. The technique helps to find system bottlenecks, which is conducive to a more optimized and enhanced cache-management algorithm.
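    A hedged skeleton of the multi-queue idea is sketched below: one queue per worker thread, with round-robin submission so that every local core stays fed while input tasks are plentiful. All names are illustrative; the thesis's data-driven triggering and queue-management details are richer than this.

        // Per-worker queues with round-robin dispatch (a sketch, not production
        // code: the destructor stops workers without draining pending tasks).
        #include <atomic>
        #include <cstdio>
        #include <functional>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        class MultiQueuePool {
        public:
            explicit MultiQueuePool(unsigned n) : queues_(n), mutexes_(n) {
                for (unsigned i = 0; i < n; ++i)
                    threads_.emplace_back([this, i] { workerLoop(i); });
            }
            // Round-robin: each task lands on the next worker's local queue.
            void submit(std::function<void()> task) {
                unsigned i = next_++ % queues_.size();
                std::lock_guard<std::mutex> lk(mutexes_[i]);
                queues_[i].push(std::move(task));
            }
            ~MultiQueuePool() {
                stop_ = true;
                for (auto& t : threads_) t.join();
            }
        private:
            void workerLoop(unsigned i) {  // each worker serves only its own queue
                while (!stop_) {
                    std::function<void()> task;
                    {
                        std::lock_guard<std::mutex> lk(mutexes_[i]);
                        if (!queues_[i].empty()) {
                            task = std::move(queues_[i].front());
                            queues_[i].pop();
                        }
                    }
                    if (task) task();
                    else std::this_thread::yield();
                }
            }
            std::vector<std::queue<std::function<void()>>> queues_;
            std::vector<std::mutex> mutexes_;  // one lock per queue, no global lock
            std::vector<std::thread> threads_;
            std::atomic<unsigned> next_{0};
            std::atomic<bool> stop_{false};
        };

        int main() {
            std::atomic<int> done{0};
            {
                MultiQueuePool pool(4);
                for (int i = 0; i < 100; ++i) pool.submit([&done] { ++done; });
                while (done.load() < 100) std::this_thread::yield();
            }
            std::printf("completed %d tasks\n", done.load());
        }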

    Compilers for portable programming of heterogeneous parallel & approximate computing systems

    Programming heterogeneous systems such as the System-on-Chip (SoC) processors in modern mobile devices can be extremely complex, because a single system may include multiple parallelism models, instruction sets, and memory hierarchies, and different systems use different combinations of these features. This is further complicated by software and hardware approximate-computing optimizations: different compute units on an SoC use different approximate-computing methods, and an application is usually composed of multiple compute kernels, each specialized to run on different hardware. Determining how best to map such an application to a modern heterogeneous system is an open research problem. First, we propose a parallel abstraction of heterogeneous hardware that is a carefully chosen combination of well-known parallel models and is able to capture the parallelism in a wide range of popular parallel hardware. This abstraction uses a hierarchical dataflow graph with side effects and vector SIMD instructions. We use it to define a parallel program representation called HPVM that aims to address both functional portability and performance portability across heterogeneous systems. Second, we extend the HPVM representation to enable accuracy-aware performance and energy tuning on heterogeneous systems with multiple compute units and approximation methods. We call the result ApproxHPVM; it automatically translates end-to-end application-level accuracy constraints into accuracy requirements for individual operations. ApproxHPVM uses a hardware-agnostic accuracy-tuning phase to do this translation, which greatly speeds up the analysis, enables greater portability, and enables future capabilities such as accuracy-aware dynamic scheduling and design-space exploration. We have implemented a prototype HPVM system, defining the HPVM IR as an extension of the LLVM compiler IR, compiler optimizations that operate directly on HPVM graphs, and code generators that translate the virtual ISA to NVIDIA GPUs, Intel AVX vector units, and multicore x86-64 processors. Experimental results show that HPVM optimizations achieve significant performance improvements, that HPVM translators achieve performance competitive with manually developed OpenCL code for both GPUs and vector hardware, and that runtime scheduling policies can use both program and runtime information to exploit the flexible compilation capabilities. Furthermore, our evaluation of ApproxHPVM shows that the framework can offload chunks of approximable computation to special-purpose accelerators, yielding significant gains in performance and energy while staying within a user-specified application-level accuracy constraint with high probability.
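    To give a feel for the hierarchical dataflow abstraction, the toy structure below has internal nodes grouping child subgraphs, leaves carrying kernel code, and edges recording dataflow among siblings. It is an illustration only, not the actual HPVM IR, which is defined as an extension of the LLVM compiler IR.

        // Toy hierarchical dataflow graph: leaves compute, internal nodes nest.
        #include <functional>
        #include <memory>
        #include <string>
        #include <utility>
        #include <vector>

        struct DFNode {
            std::string name;
            std::function<void()> compute;              // leaf: the kernel body
            std::vector<std::unique_ptr<DFNode>> kids;  // internal: child subgraph
            std::vector<std::pair<DFNode*, DFNode*>> edges;  // dataflow among kids
            bool isLeaf() const { return kids.empty(); }
        };

        // A real translator would map each leaf to a target (GPU, vector unit,
        // CPU) and order children by the edges; here children are assumed to
        // already be stored in a valid topological order.
        void execute(const DFNode& n) {
            if (n.isLeaf()) { if (n.compute) n.compute(); return; }
            for (const auto& c : n.kids) execute(*c);
        }

        int main() {
            DFNode root;
            root.name = "pipeline";
            auto blur = std::make_unique<DFNode>();
            blur->name = "blur";  blur->compute = [] { /* e.g. GPU kernel */ };
            auto sum = std::make_unique<DFNode>();
            sum->name = "sum";    sum->compute = [] { /* e.g. vector kernel */ };
            root.edges.push_back({blur.get(), sum.get()});  // blur feeds sum
            root.kids.push_back(std::move(blur));
            root.kids.push_back(std::move(sum));
            execute(root);
        }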