
    Hardware schemes for early register release

    Register files are becoming one of the critical components of current out-of-order processors in terms of delay and power consumption, since a processor's potential to exploit instruction-level parallelism is closely related to the size and number of ports of its register file. In conventional register renaming schemes, a register is conservatively released only after the instruction that redefines the same logical register commits. Instead, we propose a scheme that releases registers as soon as the processor knows that they will not be used again. We present two hardware implementations of early release with different performance/complexity trade-offs. Detailed cycle-level simulations show either a significant speedup for a given register file size or a reduction in register file size for a given performance level.
    Peer Reviewed. Postprint (published version)
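The difference between conventional and early release can be illustrated with a toy model. The sketch below is not either of the paper's two hardware implementations; it assumes a hypothetical fixed timing (rename at cycle `i`, operand read at `i + read_delay`, commit at `i + commit_delay`) and ignores speculation and recovery, which real early-release hardware must handle. The function `release_times` and its parameters are illustrative names only.

```python
# Toy model of conventional vs. early physical-register release.
# Assumed timing (hypothetical): instruction i is renamed at cycle i, reads its
# source operands at cycle i + read_delay, and commits at cycle i + commit_delay.
# Speculation and recovery (branch mispredictions, exceptions) are NOT modelled.

def release_times(trace, read_delay=3, commit_delay=10, early=False):
    """Return {producer_index: release_cycle} for each value that gets redefined.

    trace: list of (dest, srcs) in program order, e.g. ("r1", ["r2", "r3"]).
    """
    last_def = {}   # logical register -> index of the instruction that produced it
    last_read = {}  # producer index -> index of its latest consumer so far
    releases = {}
    for j, (dest, srcs) in enumerate(trace):
        # Record reads first: an instruction may read the register it redefines.
        for s in srcs:
            if s in last_def:
                last_read[last_def[s]] = j
        if dest in last_def:
            i = last_def[dest]  # the old version of dest has no readers after j
            if early:
                # Free once the redefinition has been renamed and the last reader
                # (or, with no readers, the producer itself) has read its operands.
                last_use = last_read.get(i, i) + read_delay
                releases[i] = max(j, last_use)
            else:
                # Conventional: free when the redefining instruction commits.
                releases[i] = j + commit_delay
        last_def[dest] = j
    return releases


trace = [
    ("r1", []),      # 0: r1 <- ...
    ("r2", ["r1"]),  # 1: r2 <- f(r1)
    ("r1", ["r2"]),  # 2: r1 <- g(r2)   (redefines r1)
    ("r3", ["r1"]),  # 3: r3 <- h(r1)
    ("r2", ["r3"]),  # 4: r2 <- k(r3)   (redefines r2)
]
print(release_times(trace))              # {0: 12, 1: 14}
print(release_times(trace, early=True))  # {0: 4, 1: 5}
```

In this toy trace, the two versions renamed at cycles 0 and 1 would occupy physical registers for 12 and 13 cycles conventionally, but only 4 cycles each with early release.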

    Late allocation and early release of physical registers

    The register file is one of the critical components of current processors in terms of access time and power consumption. Among other things, the potential to exploit instruction-level parallelism is closely related to the size and number of ports of the register file. In conventional register renaming schemes, both register allocation and release are conservative: registers are allocated at the rename stage, before they are loaded with values, and released at the commit stage of the instruction that redefines the same logical register, well after they have stopped being used. We introduce VP-LAER, a renaming scheme that allocates registers later and releases them earlier than conventional schemes. Specifically, physical registers are allocated at the end of the execution stage and released as soon as the processor realizes that they will not be used again. VP-LAER improves register utilization, that is, the fraction of allocated registers holding a value that will be read in the future. Detailed cycle-level simulations show either a significant speedup for a given register file size or a reduction in register file size for a given performance level, especially for floating-point codes, where register file pressure is usually high.
    Peer Reviewed. Postprint (published version)
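The register-utilization metric can be made concrete with a small calculation. The sketch below assumes hypothetical per-value lifetimes (rename, write, last read, release cycles) and only models the late-allocation half of VP-LAER: the release cycle is taken as given. `Version` and `utilization` are illustrative names, not part of the paper.

```python
from dataclasses import dataclass


@dataclass
class Version:
    """Lifetime of one destination value (all cycle numbers are hypothetical)."""
    rename: int     # cycle the destination is renamed
    write: int      # cycle the value is produced, i.e. end of execution
    last_read: int  # cycle of the last consumer read
    release: int    # cycle the physical register is freed


def utilization(versions, late_alloc=False):
    """Fraction of allocated register-cycles that hold a value still to be read.

    late_alloc=False: register allocated at rename (conventional).
    late_alloc=True:  register allocated when the value is written (VP-LAER-style).
    """
    allocated = useful = 0
    for v in versions:
        start = v.write if late_alloc else v.rename
        for t in range(start, v.release):
            allocated += 1
            # Useful: value already written and a read is still pending.
            if v.write <= t < v.last_read:
                useful += 1
    return useful / allocated if allocated else 0.0


versions = [
    Version(rename=0, write=8, last_read=10, release=12),  # long-latency producer
    Version(rename=1, write=3, last_read=5, release=6),
]
print(utilization(versions))                   # 4/17, about 0.235
print(utilization(versions, late_alloc=True))  # 4/7, about 0.571
```

Allocating at write time removes the idle cycles between rename and the end of execution, which is where most of the wasted register-cycles sit for long-latency producers.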

    A comparison of cache hierarchies for SMT processors

    In the multithread and multicore era, programs are forced to share part of the processor structures. On one hand, the state of the art in multithreading describes how to efficiently manage and distribute inner resources such as the reorder buffer or issue windows. On the other hand, a substantial body of work focuses on outer resources, mainly on how to effectively share last-level caches in multicores. Between these two ends, first- and second-level caches have received little attention, even though they are shared in most commercial multithreaded processors. This work analyzes multiprogrammed workloads as the worst-case scenario for cache sharing among threads. To obtain representative results, we present a sampling-based methodology that, for multiple metrics such as STP, ANTT, IPC throughput, and fairness, reduces simulation time by up to 4 orders of magnitude when running 8-thread workloads, with an error below 3% at a 97% confidence level. Using this methodology, we compare several state-of-the-art cache hierarchies and observe that Light NUCA provides performance benefits in SMT processors regardless of the organization of the last-level cache. Most importantly, Light NUCA gains are consistent across the entire range of simulated thread counts, from one to eight.
    Peer Reviewed. Postprint (author's final draft)
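For reference, the multiprogram metrics named above can be computed from per-thread IPCs measured alone and under SMT. The sketch below uses the commonly cited definitions (STP as the sum of normalized progress, ANTT as the mean normalized slowdown, fairness as min/max normalized progress); the paper's exact definitions and its sampling procedure are assumed, not confirmed. `samples_needed` is a hypothetical helper showing the standard normal-approximation sample size for a 3% error at 97% confidence, given a coefficient of variation from a pilot run.

```python
import math
from statistics import NormalDist


def smt_metrics(ipc_alone, ipc_shared):
    """STP, ANTT and fairness from per-thread IPCs (single-threaded vs. SMT run)."""
    progress = [s / a for a, s in zip(ipc_alone, ipc_shared)]  # normalized progress
    stp = sum(progress)                                  # system throughput
    antt = sum(1 / p for p in progress) / len(progress)  # average normalized turnaround time
    fairness = min(progress) / max(progress)             # 1.0 means perfectly fair
    return stp, antt, fairness


def samples_needed(cv, rel_error=0.03, confidence=0.97):
    """Samples for a mean estimate within rel_error at the given confidence.

    cv is the coefficient of variation of the metric across samples (assumed known
    from a pilot run). Uses the normal approximation n >= (z * cv / rel_error)^2.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * cv / rel_error) ** 2)


print(smt_metrics([2.0, 1.0], [1.5, 0.25]))  # STP 1.0, ANTT about 2.67, fairness about 0.33
print(samples_needed(cv=0.5))
```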

    Concertina: Squeezing in cache content to operate at near-threshold voltage

    © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Scaling supply voltage to values near the threshold voltage allows a dramatic decrease in processor power consumption; however, the lower the voltage, the higher the sensitivity to process variation and, hence, the lower the reliability. Large SRAM structures, like the last-level cache (LLC), are extremely vulnerable to process variation because they are aggressively sized to satisfy high density requirements. In this paper, we propose Concertina, an LLC designed to enable reliable operation at low voltages with conventional SRAM cells. Based on the observation that, for many applications, the LLC contains large amounts of null data, Concertina compresses cache blocks so that they can be allocated to cache entries with faulty cells, enabling use of 100 percent of the LLC capacity. To distribute blocks among cache entries, Concertina implements a compression- and fault-aware insertion/replacement policy that reduces the LLC miss rate. Concertina reaches the performance of an ideal system whose LLC is unaffected by parameter variation, at a modest storage overhead. Specifically, performance degrades by less than 2 percent even when using small SRAM cells, which implies that over 90 percent of cache entries have defective cells; this is a notable improvement over previously proposed techniques.
    Peer Reviewed. Postprint (author's final draft)
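The core idea of fitting compressed blocks into partially faulty entries can be sketched with simple null-subblock compression. This is an illustration under stated assumptions, not Concertina's actual compression scheme or replacement policy: it assumes 8-byte subblocks, fault maps tracked per subblock, and a hypothetical best-fit placement (`pick_entry`) standing in for the compression- and fault-aware insertion policy.

```python
SUBBLOCK = 8  # bytes per subblock (assumed granularity)


def compress_null_subblocks(block: bytes):
    """Drop all-zero subblocks; return (bitmap of non-null subblocks, payload list)."""
    if len(block) % SUBBLOCK:
        raise ValueError("block size must be a multiple of SUBBLOCK")
    bitmap, payload = [], []
    for off in range(0, len(block), SUBBLOCK):
        sub = block[off:off + SUBBLOCK]
        nonnull = any(sub)  # bytes iterate as ints, so all zeros -> False
        bitmap.append(nonnull)
        if nonnull:
            payload.append(sub)
    return bitmap, payload


def decompress(bitmap, payload):
    """Rebuild the block, re-inserting zero subblocks where the bitmap says so."""
    it = iter(payload)
    return b"".join(next(it) if nonnull else bytes(SUBBLOCK) for nonnull in bitmap)


def fits(block: bytes, faulty_subblocks: int) -> bool:
    """True if the compressed block fits in the entry's fault-free subblocks."""
    _, payload = compress_null_subblocks(block)
    usable = len(block) // SUBBLOCK - faulty_subblocks
    return len(payload) <= usable


def pick_entry(block: bytes, faulty_per_entry):
    """Hypothetical best-fit placement: the fitting entry with the fewest spare subblocks."""
    _, payload = compress_null_subblocks(block)
    total = len(block) // SUBBLOCK
    candidates = [(total - f - len(payload), idx)
                  for idx, f in enumerate(faulty_per_entry)
                  if total - f >= len(payload)]
    return min(candidates)[1] if candidates else None


block = bytes(48) + b"\x01" * 16         # 6 null subblocks + 2 non-null ones
bitmap, payload = compress_null_subblocks(block)
assert decompress(bitmap, payload) == block
print(fits(block, faulty_subblocks=6))  # True: 2 usable subblocks suffice
print(fits(block, faulty_subblocks=7))  # False
print(pick_entry(block, [0, 5, 7, 6]))  # 3: fits with no spare subblocks
```

Best-fit keeps the least faulty entries free for blocks that compress poorly; a real policy would also weigh recency, which this sketch ignores.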

    Feminization of migration in Mexico

    In recent times we have been observing an increase in the presence of women in the migration process. Although some factors render their presence invisible, it calls for considering new aspects that allow us to coin the term feminization of migration. The term refers not only to the quantitative increase of women in the process but also to the causes and consequences of their migration project, which takes on characteristics different from that of men. Although one of the main causes of female migration is still to follow men's migration project, more and more women start the migration process on their own to improve their lives and those of their children. The feminization of migration responds to socioeconomic transformations that affect both sending and receiving countries. In the latter, the demand for women to care for people keeps growing. In the countries of origin, women's desire for autonomy drives them to leave. These features, common to international and internal migration, are the ones we found in our research carried out in Tamaulipas (on Mexico's border with the US).

    L2C2: Last-level compressed-contents non-volatile cache and a procedure to forecast performance and lifetime

    Several emerging non-volatile (NV) memory technologies are appearing as interesting alternatives for building the Last-Level Cache (LLC). Their advantages over SRAM are higher density and lower static power, but write operations wear out the bitcells until they eventually lose their storage capacity. In this context, this paper presents a novel LLC organization designed to extend the lifetime of the NV data array, and a procedure to forecast in detail the capacity and performance of such an NV-LLC over its lifetime. From a methodological point of view, although the literature uses different approaches to analyze the degradation of an NV-LLC, none of them allows studying its temporal evolution in detail. To this end, this work proposes a forecasting procedure that combines detailed simulation and prediction, allowing an accurate analysis of the impact of different cache control policies and mechanisms (replacement, wear leveling, compression, etc.) on the temporal evolution of the indices of interest, such as the effective capacity of the NV-LLC or the system IPC. We also introduce L2C2, an LLC design intended for implementation in NV memory technology that, for the first time, combines fault tolerance, compression, and internal write wear leveling. Compression is used not to store more blocks and increase the hit rate, but to reduce the write rate and extend the period during which the cache supports near-peak performance. In addition, to tolerate byte loss without a performance drop, L2C2 inherently allows N redundant bytes to be added to each cache entry. Thus, L2C2+N, the endurance-scaled version of L2C2, allows balancing the cost of redundant capacity against the benefit of a longer lifetime. As a use case, we have implemented the L2C2 cache with STT-RAM technology. It has affordable hardware overheads in area, latency, and energy consumption compared to a baseline NV-LLC without compression, and it increases the time until 50% of the effective capacity is degraded by up to 6 to 37 times, depending on the variability of the manufacturing process. Compared to L2C2, L2C2+6, which adds 6 bytes of redundant capacity per entry (a 9.1% storage overhead), can increase the time until the system's initial peak performance degrades by up to 1.4 to 4.3 times.
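The interplay between compression, redundant bytes, and manufacturing variability can be explored with a toy Monte Carlo model. This is not the paper's forecasting procedure, which combines detailed simulation and prediction; every parameter here (64-byte blocks, normally distributed byte endurance, ideal intra-entry wear leveling, uniform writes across entries, an entry counted as worn out once it can no longer hold an uncompressed block) is an assumption, and `entry_lifetime` and `half_capacity_time` are hypothetical names.

```python
import random


def entry_lifetime(n_redundant, block_bytes=64, written_bytes=32,
                   mean_endurance=1e8, cv=0.2, writes_per_sec=1e3, rng=None):
    """Seconds until an entry can no longer hold an uncompressed block (toy model).

    - the entry has block_bytes + n_redundant byte cells;
    - cell endurance ~ Normal(mean_endurance, cv * mean_endurance), floored at 1;
    - each write stores written_bytes (compressed) bytes, and ideal intra-entry
      wear leveling spreads them evenly over the live cells.
    """
    rng = rng or random.Random(0)
    endurance = sorted(max(1.0, rng.gauss(mean_endurance, cv * mean_endurance))
                       for _ in range(block_bytes + n_redundant))
    live = len(endurance)
    wear = 0.0  # writes absorbed so far by every live cell (equal under ideal leveling)
    t = 0.0
    for e in endurance:  # cells die in order of increasing endurance
        rate = writes_per_sec * written_bytes / live  # writes per second per live cell
        t += (e - wear) / rate
        wear = e
        live -= 1
        if live < block_bytes:
            return t
    return t


def half_capacity_time(n_redundant, entries=1000, seed=1, **kw):
    """Time at which half of the entries are worn out (toy effective-capacity forecast)."""
    rng = random.Random(seed)
    lifetimes = sorted(entry_lifetime(n_redundant, rng=rng, **kw) for _ in range(entries))
    return lifetimes[entries // 2]


print(entry_lifetime(6) > entry_lifetime(0))  # True: redundancy extends lifetime
print(half_capacity_time(0, entries=200), half_capacity_time(6, entries=200))
```

Halving the bytes written per block (a 2:1 compression ratio in this model) exactly doubles the lifetime of a non-redundant entry, because the per-cell write rate is proportional to `written_bytes`.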

    Virtual registers

    The number of physical registers is one of the critical issues in current superscalar out-of-order processors. Conventional architectures allocate, in the decoding stage, a new storage location (e.g., a physical register) for each operation that has a destination register. When an instruction commits, it frees the physical register allocated to the previous instruction that had the same destination logical register. Thus, an additional register (i.e., beyond the number of logical registers) is used for each instruction with a destination register from the time it is decoded until it commits. In this paper, we propose a novel register organization that allocates physical registers when instructions complete their execution. In this way, register pressure is significantly reduced, since the additional register is only used from the time execution completes until the instruction commits. For some long-latency instructions (e.g., loads that miss in the cache) and for code regions with little parallelism, the savings could be very high. We have evaluated the new scheme for a superscalar processor and obtained a significant speedup.
    Peer Reviewed. Postprint (published version)
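The register-pressure argument amounts to comparing two occupancy intervals per instruction: decode-to-commit versus complete-to-commit. The sketch below sums those intervals for a few instructions with made-up timings (the `Timing` tuple and both helpers are hypothetical), and divides by the elapsed cycles to get the average number of extra registers in use.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    decode: int    # cycle the instruction is decoded (conventional allocation point)
    complete: int  # cycle execution completes (late allocation point)
    commit: int    # cycle the instruction commits


def extra_register_cycles(instrs, late_allocation):
    """Total cycles during which the 'additional' registers are held."""
    return sum(i.commit - (i.complete if late_allocation else i.decode) for i in instrs)


def avg_extra_registers(instrs, late_allocation):
    """Average number of additional registers in use over the observed window."""
    span = max(i.commit for i in instrs) - min(i.decode for i in instrs)
    return extra_register_cycles(instrs, late_allocation) / span


instrs = [
    Timing(decode=0, complete=3, commit=5),
    Timing(decode=1, complete=40, commit=42),  # load that misses in the cache
    Timing(decode=2, complete=4, commit=43),   # waits to commit behind the load
]
print(extra_register_cycles(instrs, late_allocation=False))  # 87
print(extra_register_cycles(instrs, late_allocation=True))   # 43
```

The long-latency load accounts for most of the saving: it holds a register for 41 cycles conventionally but only 2 cycles with allocation at completion.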

    Gender studies: Knowledge transfer between Europe and Latin America

    In this paper we present the GENDERCIT project ("Gender and Citizenship"), coordinated by Pablo de Olavide University and funded by the People Programme of the European Union (IRSES, one of the Marie Curie Actions) under the Seventh Framework Programme (FP7/2007-2013, grant agreement 318960). Its main objective is to create an international scientific network to promote interdisciplinary, original, and innovative gender studies. The project studies and analyzes gender relations in various countries in Europe and Latin America, focusing on four areas: a) citizenship: equality of social and political rights; b) gender violence; c) education; d) migration. Knowledge transfer between Europe (Spain, Portugal, France, and Italy) and Latin America (Argentina and Mexico) takes place through academic stays, joint research, and training on the subject. In the following pages we analyze the need for gender studies in different areas and present the design of our project and the results obtained so far, with the aim of strengthening the network of researchers that is developing internationally. We also intend to promote gender studies in contexts where patriarchy is still deeply rooted in society and in areas where gender inequalities are still very present.

    Light NUCA: a proposal for bridging the inter-cache latency gap

    To deal with the “memory wall” problem, microprocessors include large secondary on-chip caches. But as these caches grow, they open a new latency gap between themselves and the fast L1 caches (the inter-cache latency gap). Recently, Non-Uniform Cache Architectures (NUCAs) have been proposed to sustain the growth in secondary cache size, which is threatened by wire-delay problems. NUCAs are size-oriented and were not conceived to close the inter-cache latency gap. To tackle this problem, we propose Light NUCAs (L-NUCAs), which leverage on-chip wire density to interconnect small tiles through specialized networks that convey packets with distributed and dynamic routing. Our design reduces the tile delay (cache access plus one-hop routing) to a single processor cycle and places cache lines at a finer granularity than conventional caches, reducing cache latency. Our evaluations show that, in general, L-NUCA simultaneously improves performance, energy, and area when integrated into either conventional or D-NUCA hierarchies.
    Postprint (author's final draft)