89 research outputs found

    Task Oriented Programming for the RC64 Manycore DSP

    Get PDF
    RC64 is a rad-hard manycore DSP combining 64 VLIW/SIMD DSP cores, lock-free shared memory, a hardware scheduler and a task-based programming model. The hardware scheduler enables fast scheduling and allocation of fine grain tasks to all cores. Parallel programming is based on Tasks

    A Simple MPI Library for Lightweight Manycore Processors

    Get PDF
    TCC(graduação) - Universidade Federal de Santa Catarina. Centro Tecnológico. Ciências da Computação.Nas últimas décadas, melhorar o desempenho de núcleos individuais e aumentar o nú- mero de núcleos de alta potência por chip foram as principais tendências na construção de processadores. No entanto, esta combinação levou não apenas a um aumento no poder computacional, mas também a um aumento considerável no seu consumo de energia. Há uma preocupação crescente entre a comunidade científica a respeito da eficiência ener- gética dos supercomputadores modernos. Nos últimos anos, muitos esforços têm sido feitos em pesquisas, buscando soluções alternativas capazes de resolver este problema de escalabilidade e eficiência energética. O desempenho e a eficiência energética providos pelos manycores leves são inegáveis. Contudo, a falta de suporte avançado e portátil para esses processadores, como interfaces padrão de alto desempenho para o desenvolvi- mento de código portável, torna o desenvolvimento de software um desafio. Atualmente, duas abordagens são empregadas tentando aumentar a programabilidade em manycores leves: Sistemas operacionais (SOs) e sistemas de execução (runtimes). A primeira fornece portabilidade mas expõe interfaces de programação complexas no nível do SO aos desen- volvedores. Já a segunda se concentra em fornecer interfaces ricas e de alto desempenho, as quais são específicas do fabricante e resultam em software não portável. Portanto, as soluções existentes forçam os desenvolvedores a escolher entre a portabilidade do software ou um processo de desenvolvimento mais rápido. Para resolver esse dilema, neste traba- lho é proposta uma biblioteca MPI leve e portável (LWMPI) projetada do zero para lidar com as restrições e complexidades dos manycores leves. A LWMPI foi integrada a um SO direcionado a esses processadores, oferecendo assim uma melhor programabilidade e portabilidade implícita para manycores leves, sem incorrer em sobrecargas de desempe- nho excessivas que inviabilizariam o seu uso. Para fornecer uma avaliação abrangente da LWMPI, foram utilizadas três aplicações de uma suíte de benchmarking representativa, usada para avaliar o desempenho de manycores leves, além de um benchmark sintético. Os resultados obtidos no processador Kalray MPPA-256 revelaram que a LWMPI atinge uma performance e uma escalabilidade de desempenho melhor do que uma solução feita especificamente para essa análise e que se utiliza puramente das abstrações de IPC do Nanvix, ao mesmo tempo em que oferece uma interface de programação mais rica.In the last decades, improving the performance of individual cores and increasing the number of high power cores per chip were the main trends in the construction of proces- sors. However, this combination led not only to an increase in the computing capacity, but also to a considerable growth in energy consumption. There is a crescent concern among the scientific community about the energy efficiency of modern supercomputers. In the last years, many efforts have been made in research, searching for alternative solutions capable of solving this problem of scalability and energy efficiency. The performance and energy efficiency provided by lightweight manycores is undeniable. Although, the lack of rich and portable support for these processors, such as high-performance standard inter- faces that deliver portable source codes, makes software development a challenging task. Currently, two approaches are employed trying to improve programmability in lightweight manycores: Operating Systems (OSes) and baremetal runtime systems. The former pro- vides portability but exposes complex OS-level programming interfaces to developers. The latter focuses on providing rich and high performance interfaces, which are vendor- specific and yield to non-portable software. Thus, the existing solutions force software engineers to choose between software portability or a faster development process. To address this dilemma, we propose a portable and lightweight MPI library (LWMPI) de- signed from scratch to cope with restrictions and intricacies of lightweight manycores. We integrated LWMPI into a distributed OS that targets these processors, thus featuring bet- ter programmability and implicit portability for lightweight manycores, without incurring excessive performance overheads that could hinder its use. To deliver a comprehensive evaluation of LWMPI, we relied on three applications from a representative benchmark suite used to assess the performance of lightweight manycores, and a synthetic benchmark. Our results obtained on the Kalray MPPA-256 processor unveiled that LWMPI present better performance and scalability when compared with a specifically made solution that uses the raw Nanvix Inter-Process Communication (IPC) abstractions, while exposing a richer programming interface

    Learning-based run-time power and energy management of multi/many-core systems: current and future trends

    Get PDF
    Multi/Many-core systems are prevalent in several application domains targeting different scales of computing such as embedded and cloud computing. These systems are able to fulfil the everincreasing performance requirements by exploiting their parallel processing capabilities. However, effective power/energy management is required during system operations due to several reasons such as to increase the operational time of battery operated systems, reduce the energy cost of datacenters, and improve thermal efficiency and reliability. This article provides an extensive survey of learning-based run-time power/energy management approaches. The survey includes a taxonomy of the learning-based approaches. These approaches perform design-time and/or run-time power/energy management by employing some learning principles such as reinforcement learning. The survey also highlights the trends followed by the learning-based run-time power management approaches, their upcoming trends and open research challenges

    Inter-cluster Thread-to-core Mapping and DVFS on Heterogeneous Multi-cores

    Get PDF
    Heterogeneous multi-core platforms that contain different types of cores, organized as clusters, are emerging, e.g. ARM's big.LITTLE architecture. These platforms often need to deal with multiple applications, having different performance requirements, executing concurrently. This leads to generation of varying and mixed workloads (e.g. compute and memory intensive) due to resource sharing. Run-time management is required for adapting to such performance requirements and workload variabilities and to achieve energy efficiency. Moreover, the management becomes challenging when the applications are multi-threaded and the heterogeneity needs to be exploited. The existing run-time management approaches do not efficiently exploit cores situated in different clusters simultaneously (referred to as inter-cluster exploitation) and DVFS potential of cores, which is the aim of this paper. Such exploitation might help to satisfy the performance requirement while achieving energy savings at the same time. Therefore, in this paper, we propose a run-time management approach that first selects thread-to-core mapping based on the performance requirements and resource availability. Then, it applies online adaptation by adjusting the voltage-frequency (V-f) levels to achieve energy optimization, without trading-off application performance. For thread-to-core mapping, offline profiled results are used, which contain performance and energy characteristics of applications when executed on the heterogeneous platform by using different types of cores in various possible combinations. For an application, thread-to-core mapping process defines the number of used cores and their type, which are situated in different clusters. The online adaptation process classifies the inherent workload characteristics of concurrently executing applications, incurring a lower overhead than existing learning-based approaches as demonstrated in this paper. The classification of workload is performed using the metric Memory Reads Per Instruction (MRPI). The adaptation process pro-actively selects an appropriate V-f pair for a predicted workload. Subsequently, it monitors the workload prediction error and performance loss, quantified by instructions per second (IPS), and adjusts the chosen V-f to compensate. We validate the proposed run-time management approach on a hardware platform, the Odroid-XU3, with various combinations of multi-threaded applications from PARSEC and SPLASH benchmarks. Results show an average improvement in energy efficiency up to 33% compared to existing approaches while meeting the performance requirements

    Computing and communications for the software-defined metamaterial paradigm: a context analysis

    Get PDF
    Metamaterials are artificial structures that have recently enabled the realization of novel electromagnetic components with engineered and even unnatural functionalities. Existing metamaterials are specifically designed for a single application working under preset conditions (e.g., electromagnetic cloaking for a fixed angle of incidence) and cannot be reused. Software-defined metamaterials (SDMs) are a much sought-after paradigm shift, exhibiting electromagnetic properties that can be reconfigured at runtime using a set of software primitives. To enable this new technology, SDMs require the integration of a network of controllers within the structure of the metamaterial, where each controller interacts locally and communicates globally to obtain the programmed behavior. The design approach for such controllers and the interconnection network, however, remains unclear due to the unique combination of constraints and requirements of the scenario. To bridge this gap, this paper aims to provide a context analysis from the computation and communication perspectives. Then, analogies are drawn between the SDM scenario and other applications both at the micro and nano scales, identifying possible candidates for the implementation of the controllers and the intra-SDM network. Finally, the main challenges of SDMs related to computing and communications are outlined.Peer ReviewedPostprint (published version

    Combined on-line lifetime-energy optimization for asymmetric multicores

    Get PDF
    In this paper we present an architectural and on-line resource management solution to optimize lifetime reliability of asymmetric multicores while minimizing the system energy consumption, targeting both single nodes (multicores) as well as multiple ones (cluster of multicores). The solution exploits the different characteristics of the computing resources to achieve the desired performance while optimizing the lifetime/energy trade-off. The experimental results show that a combined optimization of energy and lifetime allows for achieving an extended lifetime (similar to the one pursued by lifetime-only optimization solutions) with a marginal energy consumption detriment (less than 2%) with respect to energy-aware but aging-unaware systems

    Dynamic Energy and Thermal Management of Multi-Core Mobile Platforms: A Survey

    Get PDF
    Multi-core mobile platforms are on rise as they enable efficient parallel processing to meet ever-increasing performance requirements. However, since these platforms need to cater for increasingly dynamic workloads, efficient dynamic resource management is desired mainly to enhance the energy and thermal efficiency for better user experience with increased operational time and lifetime of mobile devices. This article provides a survey of dynamic energy and thermal management approaches for multi-core mobile platforms. These approaches do either proactive or reactive management. The upcoming trends and open challenges are also discussed

    Power, Energy, and Thermal Management for Clustered Manycores

    Get PDF
    Efficient and effective system-level power, energy, and thermal management are very important issues in modern computing systems, for which clustered architectures with multiple voltage islands are an expected compromise between global and per-core DVFS. In this dissertation, we focus on two of the most relevant problems for such architectures, specifically, optimizing performance under power/thermal constraints, and minimizing energy under performance constraints
    corecore