11 research outputs found

    A fuzzy logic based dynamic reconfiguration scheme for optimal energy and throughput in symmetric chip multiprocessors

    Embedded systems architectures have traditionally been investigated and designed to achieve greater throughput combined with minimum energy consumption. With the advent of reconfigurable architectures, it is now possible to support algorithms that find optimal solutions for an improved balance of energy and throughput. As a result of ongoing research, several online and offline techniques and algorithms have been proposed for hardware adaptation. This paper presents a novel coarse-grained reconfigurable symmetric chip multiprocessor (SCMP) architecture managed by a fuzzy logic engine that balances performance and energy consumption. The architecture incorporates reconfigurable level 1 (L1) caches, power-gated cores and adaptive on-chip network routers to minimize leakage energy in inactive components. A coarse-grained architecture was selected as the focus of this study because it typically allows faster reconfiguration than fine-grained architectures, making it more feasible for runtime adaptation schemes. The presented architecture is analyzed using a set of OpenMP-based parallel benchmarks, and the results show significant improvements in performance while maintaining minimum energy consumption.

    A Fuzzy Logic Reconfiguration Engine for Symmetric Chip Multiprocessors

    Recent developments in reconfigurable multiprocessor systems on chip (MPSoC) have offered system designers great flexibility to exploit task concurrency with higher throughput and lower energy consumption. This paper presents a novel fuzzy logic reconfiguration engine (FLRE) for coarse-grained MPSoC reconfiguration that helps identify an optimum balance between the power and performance of the system. The FLRE is composed of two abstraction layers. The system selects an optimal configuration of Level 1 / Level 2 cache size and associativity, processor operating frequency and voltage, and the number of cores, based on miss rate, energy and throughput information of the system at both core and SoC level. An 8-core symmetric chip multiprocessor has been used to evaluate the proposed scheme. The results show an overall decrease in energy consumption with no more than a 30% decrease in throughput.

    A performance, energy consumption and reliability evaluation of workload distribution on heterogeneous devices

    The constant need for higher performance and reduced power consumption has led vendors to design heterogeneous devices that embed a traditional Central Processing Unit (CPU) and an accelerator, such as a Graphics Processing Unit (GPU) or Field-Programmable Gate Array (FPGA). When the CPU and the accelerator are used collaboratively, the device's computational performance reaches its peak. However, the larger amount of resources employed for computation has, potentially, the side effect of increasing the soft error rate. This thesis evaluates the reliability behaviour of AMD Kaveri Accelerated Processing Units (APUs) executing four heterogeneous applications, each one representing an algorithm class. The workload is gradually shifted from the CPU to the GPU, and both the energy consumption and execution time are measured. Then, an accelerated neutron beam is used to measure realistic error rates of the different workload distributions. Finally, we evaluate which configuration provides the lowest error rate or allows the computation of the largest amount of data before experiencing a failure. As shown in this thesis, energy consumption and execution time follow the same trend, while error rates depend strongly on algorithm class and workload distribution. Additionally, we show that, in most cases, the most reliable workload distribution is the one that delivers the highest performance. As experimentally shown, choosing the correct workload distribution can increase device reliability by up to 90x.

    Weighted Random Sampling - Alias Tables on the GPU


    Shadow Price Guided Genetic Algorithms

    The Genetic Algorithm (GA) is a popular global search algorithm. Although it has been used successfully in many fields, performance challenges still prevent the GA's further success: it has difficulty reaching optimal solutions for complex problems, and it takes a very long time to solve difficult problems. This dissertation investigates new ways to improve the GA's solution quality and convergence speed. The main focus is to present the concept of shadow price and to propose a two-measurement GA. The new algorithm uses the fitness value to measure solutions and the shadow price to evaluate components. New shadow-price-guided operators are used to achieve good, measurable evolution. Simulation results show that the new shadow-price-guided genetic algorithm (SGA) is effective in terms of performance and efficient in terms of speed.

    Energy Saving in QoS Fog-supported Data Centers

    One of the most important challenges that cloud providers face amid the explosive growth of data is reducing the energy consumption of their modern data centers. The majority of current research focuses on energy-efficient resource management in the infrastructure as a service (IaaS) model through "resource virtualization", that is, the consolidation of virtual machines and physical machines. However, current virtualized data centers do not support communication- and computing-intensive real-time applications such as big data stream computing (info-mobility applications, real-time video co-decoding). Indeed, imposing hard limits on the overall per-job computing-plus-communication delay forces the overall networked computing infrastructure to quickly adapt its resource utilization to the (possibly unpredictable and abrupt) time fluctuations of the offered workload. Recently, Fog Computing centers have emerged as promising commodities in Internet virtual computing platforms, which raises energy consumption and makes it a critical issue on such platforms. Green solutions (i.e., solutions that support energy provisioning) are therefore needed for fog-supported delay-sensitive web applications. Moreover, traffic-engineering-based methods dynamically adjust the number of active servers to match the current workload. It is therefore desirable to develop a flexible, reliable technological paradigm and a resource allocation algorithm that account for the consumed energy. Furthermore, these algorithms could adapt automatically to time-varying workloads through the joint reconfiguration and orchestration of the virtualized computing-plus-communication resources available at the computing nodes. In addition, these methods help things devices operate under real-time constraints on the allowed computing-plus-communication delay and service latency.
The purpose of this thesis is: i) to propose a novel technological paradigm, the Fog of Everything (FoE) paradigm, detailing the main building blocks and services of the corresponding technological platform and protocol stack; ii) to propose a dynamic and adaptive energy-aware algorithm that models and manages virtualized networked data center Fog Nodes (FNs) so as to minimize the resulting networking-plus-computing average energy consumption; and iii) to propose a novel Software-as-a-Service (SaaS) Fog Computing platform that integrates user applications over the FoE. The emerging use of SaaS Fog Computing centers as an Internet virtual computing commodity is intended to support delay-sensitive applications. The virtualized Fog node operates at the Middleware layer of the underlying protocol stack and comprises the following main blocks: i) admission control of the offered input traffic; ii) balanced control and dispatching of the admitted workload; iii) dynamic reconfiguration and consolidation of the Dynamic Voltage and Frequency Scaling (DVFS)-enabled Virtual Machines (VMs) instantiated on the parallel computing platform; and iv) rate control of the traffic injected into the TCP/IP connection. The salient features of this algorithm are that: i) it is adaptive and admits a distributed, scalable implementation; ii) it can provide hard QoS guarantees in terms of the minimum/maximum instantaneous rate of the traffic delivered to the client, instantaneous goodput and total processing delay; and iii) it explicitly accounts for the dynamic interaction between computing and networking resources in order to maximize the resulting energy efficiency.
The actual performance of the proposed scheduler in the presence of: i) client mobility; ii) wireless fading; iii) reconfiguration and two-threshold consolidation costs of the underlying networked computing platform; and iv) abrupt changes in the transport quality of the available TCP/IP mobile connection, is numerically tested and compared with that of some state-of-the-art static schedulers, under both synthetically generated and measured real-world workload traces.