
    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. Integrating an increasing number of transistors into compact chip designs boosts computational capacity but also creates challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared-cache interference analysis for set-associative caches, advancing the calculation of the Worst-Case Execution Time (WCET). This enables an accurate assessment of cache interference and of the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task- and cache-aware partitioned scheduler that optimizes cache partitioning based on each task's WCET sensitivity, leading to improved schedulability and predictability. We also explore various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in static non-uniform cache access (S-NUCA) many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems that reduces the need to activate Dynamic Thermal Management (DTM). Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory-bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. Together, these contributions enhance performance and thermal management in advanced processor-memory systems.
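As a rough illustration of cache partitioning driven by WCET sensitivity, the Python sketch below hands out cache ways one at a time to whichever task's utilization drops the most, then applies the uniprocessor EDF utilization test. It is a minimal sketch under assumed inputs (a per-task table of WCET versus number of private ways, a single core, preemptive EDF, implicit-deadline periodic tasks). It is not TCPS, and `allocate_ways`, `utilization`, and the example task set are hypothetical.

```python
# Hypothetical sketch: greedy cache-way partitioning driven by per-task WCET
# sensitivity on one core. Not TCPS; all names and numbers are illustrative.


def utilization(period, wcet_by_ways, ways):
    """Utilization of a task that owns `ways` private cache ways.

    wcet_by_ways[k] is the assumed WCET with k + 1 ways; allocations beyond
    the table reuse the last entry (no further benefit).
    """
    return wcet_by_ways[min(ways, len(wcet_by_ways)) - 1] / period


def allocate_ways(tasks, total_ways):
    """tasks: name -> (period, wcet_by_ways).

    Returns (ways per task, total utilization, EDF-schedulable flag).
    """
    if total_ways < len(tasks):
        raise ValueError("need at least one cache way per task")
    alloc = {name: 1 for name in tasks}  # every task starts with one private way

    def gain(name):
        # WCET sensitivity: utilization saved by granting this task one more way
        period, wcet = tasks[name]
        return utilization(period, wcet, alloc[name]) - utilization(period, wcet, alloc[name] + 1)

    for _ in range(total_ways - len(tasks)):
        best = max(tasks, key=gain)  # ties go to the first task in insertion order
        alloc[best] += 1

    total_u = sum(utilization(p, w, alloc[n]) for n, (p, w) in tasks.items())
    # Preemptive uniprocessor EDF with implicit deadlines is schedulable iff U <= 1.
    return alloc, total_u, total_u <= 1.0


if __name__ == "__main__":
    demo = {"A": (10, [5, 3, 2, 2]), "B": (20, [8, 7, 7, 7])}
    print(allocate_ways(demo, total_ways=4))
```

The greedy step is only guaranteed to be optimal when each task's utilization is convex in the number of ways (diminishing returns); WCET curves with plateaus or sudden drops can make it miss better allocations.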

    Improvement Energy Efficiency for a Hybrid Multibank Memory in Energy Critical Applications

    A high-performance, low-power multiprocessor/multibank memory system requires a compiler that provides efficient data partitioning and mapping procedures. This paper introduces two compiler techniques for mapping data to multibank memory, since data mapping is still an open problem that needs a better solution. To support ultra-low-power wearable devices, the multibank memory can consist of both volatile and non-volatile memory components, and this hybrid organization makes mapping data onto it more complex. To solve this mapping problem efficiently, we formulate it as a simple decision problem and, based on that definition, propose two efficient algorithms that determine the placement of data in the multibank memory. The proposed techniques take into account that a write to non-volatile memory consumes more energy than the same operation on volatile memory, even though non-volatile memory offers ultra-low operating power and nearly zero leakage current. They mitigate this drawback through efficient data placement and the hybrid memory architecture. Experimental results show that the proposed techniques improve energy savings by up to 59.5% for the hybrid multibank memory architecture.
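The trade-off described above (expensive NVM writes versus SRAM leakage) can be sketched as a greedy placement: estimate each variable's energy on SRAM and on NVM from profiled read/write counts, then fill the limited SRAM with the variables that save the most energy per word. This is an assumed model with made-up energy constants, not either of the paper's two algorithms; `Var`, `place`, `SRAM`, and `NVM` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-access energies in arbitrary units. The only property taken
# from the abstract is that NVM writes cost more than SRAM writes while NVM has
# (near) zero leakage; SRAM leakage is charged per word placed there.
SRAM = {"read": 1.0, "write": 1.0, "leak_per_word": 2.0}
NVM = {"read": 0.8, "write": 6.0}


@dataclass
class Var:
    name: str
    size: int    # words
    reads: int   # profiled read count
    writes: int  # profiled write count


def sram_energy(v: Var) -> float:
    return v.reads * SRAM["read"] + v.writes * SRAM["write"] + v.size * SRAM["leak_per_word"]


def nvm_energy(v: Var) -> float:
    return v.reads * NVM["read"] + v.writes * NVM["write"]


def place(variables, sram_capacity):
    """Greedy placement: SRAM goes to the variables saving the most energy per word."""
    placement, free = {}, sram_capacity
    ranked = sorted(variables, key=lambda v: (nvm_energy(v) - sram_energy(v)) / v.size, reverse=True)
    for v in ranked:
        if nvm_energy(v) > sram_energy(v) and v.size <= free:
            placement[v.name] = "SRAM"
            free -= v.size
        else:
            placement[v.name] = "NVM"  # read-mostly data, or data that does not fit
    total = sum(sram_energy(v) if placement[v.name] == "SRAM" else nvm_energy(v) for v in variables)
    return placement, total


if __name__ == "__main__":
    vs = [Var("buf", 4, 10, 50), Var("table", 8, 100, 0), Var("ctr", 1, 20, 20)]
    print(place(vs, sram_capacity=4))
```

Ranking by savings per word is a knapsack heuristic, so it can be suboptimal when a large, write-heavy variable narrowly misses the remaining SRAM space; an exact 0/1 knapsack solution would avoid that at higher cost.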

    Efficient coherence and consistency for specialized memory hierarchies

    As the benefits from transistor scaling slow down, specialization is becoming increasingly important for a wide range of applications. Although traditional heterogeneous systems work well for streaming, data parallel applications, they are inefficient for emerging applications, like graph analytics workloads, with fine-grained synchronization, relaxed atomics, and more general sharing patterns. Heterogeneous systems are also difficult to program, which makes it harder for programmers to take advantage of the potential benefits of specialization. This thesis redesigns the memory hierarchy of heterogeneous systems to make heterogeneous systems more efficient and easier to use. In particular, we focus on three key sources of inefficiency in the memory hierarchy of modern heterogeneous systems: (1) a unified global address space, (2) the cache coherence protocol, and (3) the memory consistency model. A unified global address space makes it easier to write programs for heterogeneous systems. Although industry has recently begun to provide a unified global address space across CPUs and accelerators (primarily GPUs), there are many inefficiencies. For example, emerging applications with fine-grained synchronization need better support for coherence and consistency. We find that simple coherence and complex consistency are key sources of inefficiency. To resolve this problem, we adjust the division of complexity between the cache coherence protocol and memory consistency model: we introduce DeNovo for accelerators (DeNovoA), which extends DeNovo’s hybrid, software-driven hardware coherence protocol to heterogeneous systems. Unlike current coherence protocols for heterogeneous systems, DeNovoA obtains ownership for written data, enables heterogeneous systems to use the simpler sequentially consistent for data-race-free (SC-for-DRF, or DRF) memory consistency model, and provides both efficiency and programmability. 
Across a wide variety of applications, DeNovoA with a DRF memory consistency model either outperforms or provides comparable efficiency to the state-of-the-art approach. Although DRF is easier to use and works well for most applications, there are some corner cases where its overheads are unnecessary and hurt performance. This led to the introduction of relaxed atomics in the memory consistency models for multi-core CPUs and heterogeneous systems. Although relaxed atomics can significantly improve performance, they are very difficult to use correctly. We address the impact of relaxed atomics on memory consistency models for heterogeneous systems by creating a new memory consistency model, Data-Race-Free-Relaxed or DRFrlx. DRFrlx extends the existing DRF memory consistency models to provide SC-centric semantics for all common uses of relaxed atomics in heterogeneous systems while retaining their efficiency benefits. Thus, DRFrlx makes it easier for programmers to safely use relaxed atomics. Although current heterogeneous systems are adopting unified global address spaces, specialized memories such as scratchpads still exist in disjoint, private address spaces. This increases programming complexity and causes inefficiencies that negate some of the benefits of specialization. We introduce a new memory organization, stash, that mitigates the inefficiencies of specialized memories by integrating them into the coherent, globally visible address space. Stash makes it easier for programmers to use specialized memories and retains their efficiency benefits. Finally, to better understand the tradeoffs and scalability of different coherence protocols and consistency models, we created a suite of synchronization microbenchmarks, HeteroSync. HeteroSync contains various fine-grained synchronization and relaxed atomics algorithms. 
Moreover, HeteroSync is highly configurable and provides a standard set of fine-grained synchronization microbenchmarks to compare the efficiency of different approaches. In summary, this thesis questions the state-of-the-art approaches for designing memory hierarchies of heterogeneous systems, and shows that the current techniques provide neither efficiency nor programmability for emerging workloads. We demonstrate how DeNovoA with a DRFrlx memory consistency model improves efficiency and programmability for many heterogeneous applications and makes it easier for programmers to use heterogeneous systems.
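To make "obtains ownership for written data" concrete, the following toy model tracks per-word states in the style of DeNovo: a write registers the word at a shared registry (revoking any previous owner), a read miss brings in a Valid copy, and a synchronization acquire self-invalidates Valid copies but keeps Registered ones, so a data-race-free program sees up-to-date values after it synchronizes. It is a simplified sketch under stated assumptions (word granularity, no timing, no release actions, no cache-line or LLC data path); `Registry`, `L1Cache`, and their methods are hypothetical and do not reflect DeNovoA's actual implementation.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    VALID = 1       # readable copy; may go stale, dropped at the next acquire
    REGISTERED = 2  # this cache owns the up-to-date copy of the word


class Registry:
    """Hypothetical stand-in for the shared last-level cache's registry."""

    def __init__(self):
        self.owner = {}   # addr -> L1Cache that has the word registered
        self.memory = {}  # backing values for words nobody has registered


class L1Cache:
    def __init__(self, registry):
        self.registry = registry
        self.lines = {}   # addr -> (State, value)
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        state, value = self.lines.get(addr, (State.INVALID, None))
        if state is not State.INVALID:
            self.hits += 1
            return value
        self.misses += 1
        owner = self.registry.owner.get(addr)
        # Fetch from the registered owner if there is one, else from the shared level.
        value = owner.lines[addr][1] if owner is not None else self.registry.memory.get(addr, 0)
        self.lines[addr] = (State.VALID, value)
        return value

    def write(self, addr, value):
        # A write obtains ownership; other caches' VALID copies are NOT invalidated.
        prev = self.registry.owner.get(addr)
        if prev is not None and prev is not self:
            prev.lines[addr] = (State.INVALID, None)  # registry revokes the old owner
        self.registry.owner[addr] = self
        self.lines[addr] = (State.REGISTERED, value)

    def acquire(self):
        # Self-invalidation at an acquire: drop VALID copies, keep REGISTERED
        # ones because they are guaranteed to be up to date.
        self.lines = {a: (s, v) for a, (s, v) in self.lines.items() if s is State.REGISTERED}
```

Because a write does not invalidate other cores' Valid copies, a racy read without an intervening acquire can return a stale value; under a DRF model this is acceptable, since only data-race-free programs are promised sequentially consistent behavior.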

    System-level management of hybrid memory hierarchies (Gestión de jerarquías de memoria híbridas a nivel de sistema)

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadoras y Automática, and KU Leuven, Arenberg Doctoral School, Faculty of Engineering Science; defended on 11/05/2017. In electronics and computer science, the term 'memory' generally refers to devices that store the information we use in appliances ranging from PCs to hand-held devices and smart appliances. Primary (main) memory holds the data and instructions of running processes and must therefore operate at high speed (i.e., RAM). It is usually implemented with addressable semiconductor memory, i.e., integrated circuits built from silicon-based transistors, which are also used for other purposes in computers and other digital electronic devices. Secondary (auxiliary) memory, in comparison, provides program and data storage that is slower to access but offers larger capacity. Examples include external hard drives, portable flash drives, CDs, and DVDs. These devices and media must be plugged in or inserted into a computer to be accessed. Because secondary storage is not always connected to the computer, it is commonly used for backing up data. The term storage is often used to describe secondary memory. Secondary memory stores large amounts of data at a lower cost per byte than primary memory, typically about two orders of magnitude less. There are two main types of semiconductor memory: volatile and non-volatile. Examples of non-volatile memory are Flash memory (sometimes used as secondary, sometimes as primary computer memory) and ROM/PROM/EPROM/EEPROM memory (used for firmware such as boot programs). Examples of volatile memory are primary memory (typically dynamic RAM, DRAM, currently the predominant choice for main memory) and fast CPU cache memory (typically static RAM, SRAM, which is fast but energy-consuming and offers lower capacity per unit area than DRAM).
Non-volatile memory technologies in Si-based electronics date back to the 1990s. Flash memory, a charge-storage memory, is widely used in consumer electronic products such as cellphones and music players, and NAND Flash-based solid-state disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. However, several factors threaten the current predominance of charge-based (capacitive) semiconductor memories. The integration limit of Flash memories is approaching, which compromises their scaling in the medium term, and many new types of memory have been proposed to replace conventional Flash. The rapid increase of leakage currents in silicon CMOS transistors with scaling poses a big challenge for the integration of SRAM memories, which are also increasingly susceptible to read/write failures in low-power designs. Because these problems worsen with each new technology generation, there has been an extensive pooling of time, resources, and effort over the past decade towards developing new technologies that replace, or at least complement, current ones. Ferroelectric field-effect transistors (FeFETs) are considered one of the most promising alternatives to replace both Flash (due to their higher density) and DRAM (due to their higher speed), but they are still at a very early stage of development. Somewhat more mature technologies are found among resistive memories, notably Resistive RAM (ReRAM/RRAM), STT-MRAM, Domain Wall Memory, and Phase Change Memory (PRAM). Emerging non-volatile memory technologies promise to store more data at less cost than the expensive-to-build silicon chips used by popular consumer gadgets, including digital cameras, cell phones, and portable music players. These new memory technologies aim to combine the speed of static random-access memory (SRAM), the density of dynamic random-access memory (DRAM), and the non-volatility of Flash memory, which makes them very attractive for future memory hierarchies. Research on these Non-Volatile Memory (NVM) technologies has matured over the last decade, and NVMs are now being explored thoroughly as viable replacements for conventional SRAM-based memories, even at the higher levels of the memory hierarchy.
Many other new classes of emerging memory technologies, such as transparent and plastic, three-dimensional (3-D), and quantum dot memories, have also gained tremendous popularity in recent years...

    An efficient cloud storage system for tele-health services

    Healthcare service is a critical aspect of our daily lives. Enabled by technologies such as wearable devices and wireless sensor networks, tele-health has become a promising new field in the IT industry. Wearable devices, which detect real-time human body conditions, form body sensor networks (BSNs) for patients. In a cloud-enabled tele-health ecosystem, health data are collected by the BSN and sent to mobile devices such as smartphones and tablets. These embedded devices process the data and forward them to remote data centers. Due to the energy and time constraints of embedded systems, the effectiveness of storage systems becomes a critical issue. For years, memory technologies such as SRAM and DRAM have been widely used in computer systems. SRAM is fast, while DRAM has high density; however, SRAM suffers from power leakage and low density, and DRAM is slower in read and write operations. New memory technology is therefore needed for embedded tele-health. In this paper, we propose a hybrid memory system for embedded tele-health that combines phase-change memory (PCM) with flash memory to meet energy and latency requirements while reducing capital expenditure. Data allocation and storage on the server side is also a challenging problem in tele-health: effective storage system designs are needed to efficiently store and manage health care data from users. We therefore design an ecosystem for tele-health that includes the memory storage for embedded devices and the data storage for tele-health data centers. To fully utilize the proposed ecosystem, we design several resource allocation algorithms based on dynamic programming and heuristics. Experiments show that our approaches achieve up to 30% performance enhancement compared to greedy approaches. This work has been partially supported by the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China ICT1600236 (Prof. Meikang Qiu).
Chen, L.; Qiu, M.; Dai, W.; Hassan Mohamed, H. (2017). An efficient cloud storage system for tele-health services. The Journal of Supercomputing. 73(7):2949-2965. https://doi.org/10.1007/s11227-017-1977-y
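One common way to phrase the embedded side of a hybrid PCM/flash design is as a multiple-choice knapsack: each data item is placed in exactly one memory, each placement has an access time and an energy, and the goal is minimum total energy within a time budget. The dynamic program below sketches that formulation; it is an assumption made for illustration, not the paper's algorithms, and `min_energy_assignment` and the example numbers are hypothetical.

```python
import math


def min_energy_assignment(items, deadline):
    """Choose one memory per data item, minimizing energy with total time <= deadline.

    items: list of dicts mapping memory name -> (latency, energy), integer latency.
    Returns (energy, [memory per item]) or (math.inf, None) if infeasible.
    """
    n = len(items)
    # dp[i][t]: minimum energy for the first i items using exactly t time units
    dp = [[math.inf] * (deadline + 1) for _ in range(n + 1)]
    pick = [[None] * (deadline + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i, options in enumerate(items, start=1):
        for t in range(deadline + 1):
            for mem, (latency, energy) in options.items():
                if latency <= t and dp[i - 1][t - latency] + energy < dp[i][t]:
                    dp[i][t] = dp[i - 1][t - latency] + energy
                    pick[i][t] = mem
    t_best = min(range(deadline + 1), key=lambda t: dp[n][t])
    if math.isinf(dp[n][t_best]):
        return math.inf, None
    # Walk the table backwards to recover the memory chosen for each item.
    choices, t = [], t_best
    for i in range(n, 0, -1):
        mem = pick[i][t]
        choices.append(mem)
        t -= items[i - 1][mem][0]
    return dp[n][t_best], choices[::-1]


if __name__ == "__main__":
    demo = [{"PCM": (1, 2.0), "Flash": (3, 1.0)}, {"PCM": (1, 3.0), "Flash": (4, 1.0)}]
    print(min_energy_assignment(demo, deadline=5))
```

The table has (items + 1) × (deadline + 1) entries, so the running time is pseudo-polynomial in the time budget; a greedy rule that always picks each item's cheapest-energy memory can overshoot the deadline, which is the gap a dynamic program closes.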

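    Analysis of the scientific production on the role of the project office in knowledge management, 2004 to 2014 (Análise da produção científica sobre o papel do escritório de projetos na gestão do conhecimento período de 2004 a 2014)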

    Advisor: Prof. Dr. José Simão de Paula Pinto. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Sociais Aplicadas, Programa de Pós-Graduação em Ciência, Gestão e Tecnologia da Informação. Defense: Curitiba, 27/02/2015. Includes references.
Abstract: This research analyzes the scientific production from 2004 to 2014 of articles published in scientific journals indexed by indexing databases, on knowledge management and project, program, and portfolio management (Group I), and on knowledge management in projects, programs, and portfolios and the project, program, and portfolio management office (Group II). An informetric method based on Zipf's law (word frequency), Luhn (position of the words with the highest semantic content), and Goffman (determination of the transition point) was used to identify the terms that best describe the content of the two groups of articles. To characterize the scientific production, bibliometric methods based on the laws of Lotka (author productivity) and Bradford (journal productivity) and scientometric methods (countries and institutions with the highest production and citations) were used, with the HistCite software supporting the bibliometric analyses. For Group I, the Zipf-based informetric experiment identified 237 keywords, which showed 79% compatibility with the keywords provided by the authors. For Group II, it identified 8 keywords, which showed 63% compatibility with the keywords provided by the authors. Group I has 41 authors with 3 or more contributions in the period, and Group II has 1 author with 3 contributions in the period.
Keywords: Informetrics. Bibliometrics. Scientometrics. Knowledge management. Project management. Project program management. Project portfolio management. Project management office.
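Goffman's transition point is commonly computed from I₁, the number of distinct words that occur exactly once, as T = (−1 + √(1 + 8·I₁)) / 2; words whose frequency lies near T are then taken as candidate index terms. The Python sketch below applies that formula to raw text. The tokenization, the absence of stop-word removal, and the ±1 window are assumptions, and `goffman_transition_point` and `terms_near_transition` are hypothetical helpers rather than the dissertation's exact procedure.

```python
import math
import re
from collections import Counter


def goffman_transition_point(text):
    """Return (T, word frequencies) with T = (-1 + sqrt(1 + 8 * I1)) / 2,

    where I1 is the number of distinct words that occur exactly once.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())  # letter-only tokens
    freq = Counter(words)
    i1 = sum(1 for count in freq.values() if count == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2, freq


def terms_near_transition(text, window=1.0):
    """Candidate index terms: words whose frequency is within `window` of T."""
    t, freq = goffman_transition_point(text)
    return sorted(word for word, count in freq.items() if abs(count - t) <= window)


if __name__ == "__main__":
    print(terms_near_transition("a a a b b c d e f"))
```

In practice the window and a stop-word list strongly affect which terms survive, so the keyword-compatibility percentages reported above depend on those choices as well as on the formula.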

    Resource Allocation for Software Pipelines in Many-core Systems

    Many-core systems integrate a growing number of cores on a single chip and are expected to integrate hundreds or even thousands of cores soon. To benefit from their massive processing power through parallel processing, their resources must be employed efficiently. This dissertation tackles a major challenge, resource allocation, for complex, memory-intensive applications. The proposed methods significantly improve performance over the state of the art in many scenarios.
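The abstract gives no algorithmic detail, so the following is only a generic illustration of resource allocation for a software pipeline: steady-state throughput is limited by the slowest stage, so each spare core goes to the current bottleneck. It assumes stages can be replicated freely and ignores memory and communication costs, which the dissertation explicitly targets; `allocate_cores` is a hypothetical helper.

```python
def allocate_cores(stage_times, cores):
    """Greedily replicate the bottleneck stage of a software pipeline.

    stage_times[s] is the assumed per-item processing time of stage s on one
    core; stages are assumed replicable (stateless) with no communication cost.
    Returns (replicas per stage, steady-state throughput in items per time unit).
    """
    if cores < len(stage_times):
        raise ValueError("need at least one core per stage")
    replicas = [1] * len(stage_times)
    for _ in range(cores - len(stage_times)):
        # The bottleneck has the longest effective time per item, t / replicas.
        bottleneck = max(range(len(stage_times)), key=lambda s: stage_times[s] / replicas[s])
        replicas[bottleneck] += 1
    # Each stage completes replicas / time items per time unit; the slowest one wins.
    throughput = min(r / t for t, r in zip(stage_times, replicas))
    return replicas, throughput


if __name__ == "__main__":
    print(allocate_cores([4, 2, 1], cores=6))
```

Memory-intensive stages would break the linear-speedup assumption, since replicas contend for shared memory bandwidth, which is one reason allocation for such applications is harder than this sketch suggests.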

    Programming models for many-core architectures: a co-design approach

    Common many-core processors contain tens of cores and distributed memory. Compared to a multicore system, which has only a few tightly coupled cores sharing a single bus and memory, a many-core system raises several complex problems. Notably, many parallel tasks are required to fully utilize the cores, and communication happens in a distributed, decentralized way. Programming such a processor therefore requires the application to exhibit concurrency. In contrast to a single-core application, a concurrent application has to deal with memory state changes that have an observable (non-deterministic) intermediate state. The complexity introduced by these problems makes programming a many-core system with a single-core-based programming approach notoriously hard.
    The central concept of this thesis is that the abstractions related to (many-core) programming are structured in a single platform model. A platform is a layered view of the hardware, a memory model, a concurrency model, a model of computation, and compile-time and run-time tooling. A programming model is then a specific view of this platform, used by a programmer, in which some details can be hidden from the programmer and others cannot. For example, an operating system presents an unbounded number of parallel virtual execution units to the application while hiding scheduling details; on the other hand, a programmer usually has to balance the workload among threads by hand.
    This thesis presents modifications to different abstraction layers of a many-core architecture in order to make the system as a whole more efficient and to reduce programming complexity. These modifications influence other abstractions in the platform, especially the programming model, so this thesis applies co-design to all models. Notably, co-design of the memory model, concurrency model, and model of computation is required for a scalable implementation of lambda calculus. Moreover, a memory model abstraction only emerges from combining the requirements of the many-core hardware on one side with those of the concurrency model on the other. Hence, this thesis shows that to cope with current trends in many-core architectures from a programming perspective, it is essential and feasible to inspect and adapt all abstractions collectively.

    Power-Efficient and Low-Latency Memory Access for CMP Systems with Heterogeneous Scratchpad On-Chip Memory

    The gradually widening speed gap between CPU and memory has become an overwhelming bottleneck for the development of Chip Multiprocessor (CMP) systems. In addition, the increasing penalties caused by frequent on-chip memory accesses make it critically challenging to deliver high memory access performance within tight power and latency budgets. To overcome the daunting memory wall and energy wall, this thesis proposes a new heterogeneous scratchpad memory architecture configured from SRAM, MRAM, and Z-RAM. Based on this architecture, we propose two algorithms, a dynamic programming algorithm and a genetic algorithm, that allocate data to the different memory units, thereby reducing memory access cost in terms of power consumption and latency. Extensive experiments show the merits of the heterogeneous scratchpad architecture over a traditional pure (single-technology) memory system, as well as the effectiveness of the proposed algorithms.
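Data allocation over SRAM, MRAM, and Z-RAM can be written as a dynamic program whose state is the index of the next data object plus the remaining capacity of each memory. The sketch below memoizes that recursion, with hypothetical per-access costs standing in for a weighted mix of power and latency; it is an illustrative formulation, not the thesis's own dynamic program, and `allocate_spm` and the example numbers are hypothetical.

```python
import math
from functools import lru_cache


def allocate_spm(items, capacity, cost):
    """Exact memoized DP over data objects for a heterogeneous scratchpad.

    items:    list of (size_words, reads, writes) per data object
    capacity: memory name -> capacity in words (e.g. SRAM / MRAM / Z-RAM)
    cost:     memory name -> (cost_per_read, cost_per_write), hypothetical
    Returns (total cost, tuple of chosen memories), or (math.inf, ()) if infeasible.
    """
    mems = sorted(capacity)

    @lru_cache(maxsize=None)
    def best(i, free):
        # free[k] is the remaining capacity of mems[k] before placing item i
        if i == len(items):
            return 0.0, ()
        size, reads, writes = items[i]
        result = (math.inf, ())
        for k, mem in enumerate(mems):
            if size <= free[k]:
                r_cost, w_cost = cost[mem]
                rest, rest_choice = best(i + 1, free[:k] + (free[k] - size,) + free[k + 1:])
                total = reads * r_cost + writes * w_cost + rest
                if total < result[0]:
                    result = (total, (mem,) + rest_choice)
        return result

    return best(0, tuple(capacity[m] for m in mems))


if __name__ == "__main__":
    caps = {"SRAM": 1, "MRAM": 2, "ZRAM": 4}
    costs = {"SRAM": (1, 1), "MRAM": (1, 4), "ZRAM": (2, 2)}
    print(allocate_spm([(1, 10, 10), (1, 10, 0), (2, 0, 10)], caps, costs))
```

The number of states grows with the product of the memory capacities, which is why a heuristic such as a genetic algorithm becomes attractive for large scratchpads.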