145 research outputs found

    Blasting Through The Front-End Bottleneck With Shotgun

    Get PDF

    Warming Up a Cold Front-End with Ignite

    Get PDF
    Serverless computing is a popular software deployment model for the cloud, in which applications are designed as a collection of stateless tasks. Developers are charged for the CPU time and memory footprint during the execution of each  serverless function, which incentivizes them to reduce both runtime and memory usage. As a result, functions tend to be short (often on the order of a few milliseconds) and compact (128–256 MB). Cloud providers can pack thousands of such functions on a server, resulting in frequent context switches and a tremendous degree of interleaving. As a result, when a given memory-resident function is re-invoked, it commonly finds its on-chip microarchitectural state completelycold due to thrashing by other functions — a phenomenon termed lukewarm invocation. Our analysis shows that the cold microarchitectural state due to lukewarm invocations is highly detrimental to performance, which corroborates prior work. The main source of performance degradation is the front-end, composed of instruction delivery, branch identification via the BTB and the conditional branch prediction. State-of-the-art front-end prefetchers show only limited effectiveness on lukewarm invocations, falling considerably short of an ideal front-end. We demonstrate that the reason for this is the cold microarchitectural state of the branch identification and prediction units. In response, we introduce Ignite, a comprehensive restoration mechanism for front-end microarchitectural state targeting instructions, BTB and branch predictor via unified metadata. Ignite records an invocation’s control flow graph in compressed format and uses that to restore the front-end structures the next time the function is invoked. Ignite outperforms state-of-the-art front-end prefetchers, improving performance by an average of 43% by significantly reducing instruction, BTB and branch predictor MPKI

    Molecular Mechanisms of Crop Domestication Revealed by Comparative Analysis of the Transcriptomes Between Cultivated and Wild Soybeans

    Get PDF
    Soybean is one of the key crops necessary to meet the food requirement of the increasing global population. However, in order to meet this need, the quality and quantity of soybean yield must be greatly enhanced. Soybean yield advancement depends on the presence of favorable genes in the genome pool that have significantly changed during domestication. To make use of those domesticated genes, this study involved seven cultivated, G. max, and four wild-type, G. soja, soybeans. Their genomes were studied from developing pods to decipher the molecular mechanisms underlying crop domestication. Specifically, their transcriptomes were analyzed comparatively to previous related studies, with the intention of contributing further to the literature. For these goals, several bioinformatics applications were utilized, including De novo transcriptome assembly, transcriptome abundance quantification, and discovery of differentially expressed genes (DEGs) and their functional annotations and network visualizations. The results revealed 1,247 DEGs, 916 of which were upregulated in the cultivated soybean in comparison to wild type. Findings were mostly corresponded to literature review results, especially regarding genes affecting two focused, domesticated-related pod-shattering resistance and seed size traits. These traits were shown to be upregulated in cultivated soybeans and down-regulated in wild type. However, the opposite trend was shown in disease-related genes, which were down-regulated or not even present in the cultivated soybean genome. Further, 47 biochemical functions of the identified DEGs at the cellular level were revealed, providing some knowledge about the molecular mechanisms of genes related to the two aforementioned subjected traits. While our findings provide valuable insight about the molecular mechanisms of soybean domestication attributed to annotation of differentially expressed genes and transcripts, these results must be dissected further and/or reprocessed with a higher number of samples in order to advance the field

    Morrigan: A Composite Instruction TLB Prefetcher

    Get PDF
    The effort to reduce address translation overheads has typically targeted data accesses since they constitute the overwhelming portion of the second-level TLB (STLB) misses in desktop and HPC applications. The address translation cost of instruction accesses has been relatively neglected due to historically small instruction footprints. However, state-of-the-art datacenter and server applications feature massive instruction footprints owing to deep software stacks, resulting in high STLB miss rates for instruction accesses. This paper demonstrates that instruction address translation is a performance bottleneck in server workloads. In response, we propose Morrigan, a microarchitectural instruction STLB prefetcher whose design is based on new insights regarding instruction STLB misses. At the core of Morrigan there is an ensemble of table-based Markov prefetchers that build and store variable length Markov chains out of the instruction STLB miss stream. Morrigan further employs a sequential prefetcher and a scheme that exploits page table locality to maximize miss coverage. An important contribution of the work is showing that access frequency is more important than access recency when choosing replacement candidates. Based on this insight, Morrigan introduces a new replacement policy that identifies victims in the Markov prefetchers using a frequency stack while adapting to phase-change behavior. On a set of 45 industrial server workloads, Morrigan eliminates 69% of the memory references in demand page walks triggered by instruction STLB misses and improves geometric mean performance by 7.6%.This work is partially supported by the Spanish Ministry of Science and Technology through the PID2019-107255GB project, the Generalitat de Catalunya (contract 2017-SGR-1414), the NSF grant CCF-1912617, the Semiconductor Research Corporation grant 2936.001, and generous gifts from Intel Labs. Georgios Vavouliotis has been supported by the Spanish Ministry of Economy, Industry and Competitiveness and the European Social Fund under the FPI fellowship No. PRE2018-087046. Marc Casas has been supported by the Spanish Ministry of Economy, Industry and Competitiveness under the Ramon y Cajal fellowship No. RYC-2017-23269.Peer ReviewedPostprint (author's final draft

    Morrigan: A composite instruction TLB prefetcher

    Get PDF
    The effort to reduce address translation overheads has typically targeted data accesses since they constitute the overwhelming portion of the second-level TLB (STLB) misses in desktop and HPC applications. The address translation cost of instruction accesses has been relatively neglected due to historically small instruction footprints. However, state-of-the-art datacenter and server applications feature massive instruction footprints owing to deep software stacks, resulting in high STLB miss rates for instruction accesses. This paper demonstrates that instruction address translation is a performance bottleneck in server workloads. In response, we propose Morrigan, a microarchitectural instruction STLB prefetcher whose design is based on new insights regarding instruction STLB misses. At the core of Morrigan there is an ensemble of table-based Markov prefetchers that build and store variable length Markov chains out of the instruction STLB miss stream. Morrigan further employs a sequential prefetcher and a scheme that exploits page table locality to maximize miss coverage. An important contribution of the work is showing that access frequency is more important than access recency when choosing replacement candidates. Based on this insight, Morrigan introduces a new replacement policy that identifies victims in the Markov prefetchers using a frequency stack while adapting to phase-change behavior. On a set of 45 industrial server workloads, Morrigan eliminates 69% of the memory references in demand page walks triggered by instruction STLB misses and improves geometric mean performance by 7.6%.This work is partially supported by the Spanish Ministry of Science and Technology through the PID2019-107255GB project, the Generalitat de Catalunya (contract 2017-SGR-1414), the NSF grant CCF-1912617, the Semiconductor Research Corporation grant 2936.001, and generous gifts from Intel Labs. Georgios Vavouliotis has been supported by the Spanish Ministry of Economy, Industry and Competitiveness and the European Social Fund under the FPI fellowship No. PRE2018-087046. Marc Casas has been supported by the Spanish Ministry of Economy, Industry and Competitiveness under the Ramon y Cajal fellowship No. RYC-2017-23269.Peer ReviewedPostprint (author's final draft

    A Storage-Effective BTB Organization for Servers

    Get PDF

    Wrong-Path-Aware Entangling Instruction Prefetcher

    Get PDF
    © 2023.IEEE. This document is made available under the CC-BY 4.0 license http://creativecommons.org/licenses/by /4.0/ This document is the Accepted version of a Published Work that appeared in final form in IEEE Transactions on Computers. To access the final edited and published work see DOI 10.1109/TC.2023.3337308Instruction prefetching is instrumental for guaranteeing a high flow of instructions through the processor front end for applications whose working set does not fit in the lowerlevel caches. Examples of such applications are server workloads, whose instruction footprints are constantly growing. There are two main techniques to mitigate this problem: fetch directed prefetching (or decoupled front end) and instruction cache (L1I) prefetching. This work extends the state-of-the-art Entangling prefetcher to avoid training during wrong-path execution. Our Entangling wrong-path-aware prefetcher is equipped with microarchitectural techniques that eliminate more than 99% of wrong-path pollution, thus reaching 98.9% of the performance of an ideal wrongpath-aware solution. Next, we propose two microarchitectural optimizations able to further increase performance benefits by 1.8%, on average. All this is achieved with just 304 bytes. Finally, we study the interplay between the L1I prefetcher and a decoupled front end. Our analysis shows that due to pollution caused by wrong-path instructions, the degree of decoupling cannot be increased unlimitedly without negative effects on the energy-delay product (EDP). Furthermore, the closer to ideal is the L1I prefetcher, the less decoupling is required. For example, our Entangling prefetcher reaches an optimal EDP with a decoupling degree of 64 instructions

    Ancient DNA studies: new perspectives on old samples

    Get PDF
    In spite of past controversies, the field of ancient DNA is now a reliable research area due to recent methodological improvements. A series of recent large-scale studies have revealed the true potential of ancient DNA samples to study the processes of evolution and to test models and assumptions commonly used to reconstruct patterns of evolution and to analyze population genetics and palaeoecological changes. Recent advances in DNA technologies, such as next-generation sequencing make it possible to recover DNA information from archaeological and paleontological remains allowing us to go back in time and study the genetic relationships between extinct organisms and their contemporary relatives. With the next-generation sequencing methodologies, DNA sequences can be retrieved even from samples (for example human remains) for which the technical pitfalls of classical methodologies required stringent criteria to guaranty the reliability of the results. In this paper, we review the methodologies applied to ancient DNA analysis and the perspectives that next-generation sequencing applications provide in this field
    • 

    corecore