
    Software trace cache

    We explore the use of compiler optimizations that optimize the layout of instructions in memory. The goal is to enable the code to make better use of the underlying hardware resources, regardless of the specific details of the processor/architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations: we target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, and then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and of code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have some special characteristics that make them more amenable to high-performance instruction fetch: they have a very high rate of not-taken branches and execute long chains of sequential instructions; they also make very effective use of instruction cache lines, mapping only useful instructions which will execute close in time, increasing both spatial and temporal locality.
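    As a rough illustration of the chain-building phase described in this abstract, the sketch below greedily merges basic blocks along their hottest profiled edges so that frequently consecutive blocks end up adjacent in the layout. The block names, the profile format, and the greedy merge rule are assumptions for illustration, not the paper's exact algorithm.

```python
# Hypothetical sketch of profile-guided basic-block chaining: blocks that
# usually execute one after another are placed in the same chain so they end
# up in consecutive memory positions.

def build_chains(blocks, edge_counts):
    """Greedily chain basic blocks along their most frequent successor edges.

    blocks:      iterable of basic-block identifiers
    edge_counts: dict mapping (src, dst) -> profiled execution count
    Returns a list of chains (each chain is a list of block ids).
    """
    # Consider the hottest edges first so frequently executed fall-through
    # paths become straight-line code.
    edges = sorted(edge_counts.items(), key=lambda kv: kv[1], reverse=True)

    chain_of = {b: [b] for b in blocks}        # block -> chain it belongs to
    for (src, dst), _count in edges:
        c_src, c_dst = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts another, so the
        # resulting layout is a simple concatenation of blocks.
        if c_src is not c_dst and c_src[-1] == src and c_dst[0] == dst:
            c_src.extend(c_dst)
            for b in c_dst:
                chain_of[b] = c_src

    # Each merged chain object should appear only once in the result.
    seen, chains = set(), []
    for c in chain_of.values():
        if id(c) not in seen:
            seen.add(id(c))
            chains.append(c)
    return chains

# Example: a tiny profiled control-flow graph with invented counts.
blocks = ["A", "B", "C", "D"]
edge_counts = {("A", "B"): 900, ("B", "C"): 850, ("A", "D"): 100}
print(build_chains(blocks, edge_counts))   # [['A', 'B', 'C'], ['D']]
```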

    Inverse Uncertainty Quantification using the Modular Bayesian Approach based on Gaussian Process, Part 2: Application to TRACE

    Inverse Uncertainty Quantification (UQ) is a process to quantify the uncertainties in random input parameters while achieving consistency between code simulations and physical observations. In this paper, we performed inverse UQ using an improved modular Bayesian approach based on Gaussian Process (GP) for TRACE physical model parameters, using the BWR Full-size Fine-Mesh Bundle Tests (BFBT) benchmark steady-state void fraction data. The model discrepancy is described with a GP emulator. Numerical tests have demonstrated that such treatment of model discrepancy can avoid over-fitting. Furthermore, we constructed a fast-running and accurate GP emulator to replace the TRACE full model during Markov Chain Monte Carlo (MCMC) sampling. The computational cost was demonstrated to be reduced by several orders of magnitude. A sequential approach was also developed for efficient test source allocation (TSA) for inverse UQ and validation. This sequential TSA methodology first selects experimental tests for validation that have full coverage of the test domain, to avoid extrapolating the model discrepancy term when it is evaluated at the input settings of the tests used for inverse UQ. It then selects tests that tend to reside in the unfilled zones of the test domain for inverse UQ, so that the most information can be extracted for the posterior probability distributions of the calibration parameters using only a relatively small number of tests. This research addresses the "lack of input uncertainty information" issue for TRACE physical input parameters, which was usually ignored or described using expert opinion or user self-assessment in previous work. The resulting posterior probability distributions of TRACE parameters can be used in future uncertainty, sensitivity and validation studies of the TRACE code for nuclear reactor system design and safety analysis.
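    A minimal sketch of the emulator-plus-MCMC idea described in this abstract: a Gaussian Process fitted to a handful of code runs stands in for the expensive simulator inside a Metropolis sampler. The toy simulator, prior range, and synthetic observation below are invented stand-ins; the actual study calibrates TRACE physical model parameters against BFBT void-fraction measurements.

```python
# GP emulator replacing an expensive code during Metropolis sampling (sketch).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def expensive_code(theta):                 # stand-in for a full simulator run
    return np.sin(3.0 * theta) + 0.5 * theta

# 1. Fit the GP emulator on a small design of code runs.
X_train = np.linspace(0.0, 2.0, 15).reshape(-1, 1)
y_train = expensive_code(X_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(X_train, y_train)

# 2. Metropolis sampling of the calibration parameter, querying the emulator
#    instead of the code itself.
y_obs, sigma_obs = 1.2, 0.05               # synthetic "measurement"

def log_post(theta):
    if not 0.0 <= theta <= 2.0:            # uniform prior on [0, 2]
        return -np.inf
    pred = gp.predict(np.array([[theta]]))[0]
    return -0.5 * ((y_obs - pred) / sigma_obs) ** 2

samples, theta = [], 1.0
for _ in range(5000):
    prop = theta + 0.1 * rng.normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

print("posterior mean:", np.mean(samples[1000:]))   # after burn-in
```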

    Analyzing collaborative learning processes automatically

    In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and, more specifically, in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning, by adapting and applying recent text classification technologies, would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context-sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multidimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important piece of the work towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions the CSCL community is interested in.
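    The following sketch illustrates the general shape of such a pipeline: surface word features feeding a linear classifier, with Cohen's kappa as the agreement measure against human codes. The utterances, coding categories, and feature choice are invented for illustration; TagHelper's actual coding scheme and linguistic pattern detectors are richer than bag-of-words features.

```python
# Toy text-classification pipeline for coding discussion utterances (sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

train_texts = [
    "I think we should try the second approach",
    "Can you explain why that step works?",
    "Yes, that makes sense to me",
    "What value did you get for the last problem?",
    "We could also check the units first",
    "I agree with your answer",
]
train_codes = ["proposal", "question", "agreement",
               "question", "proposal", "agreement"]

# A small held-out set coded by a (hypothetical) human rater.
test_texts = ["Why does the second step work?", "I agree, that looks right"]
test_codes = ["question", "agreement"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_codes)

# Agreement between the automatic codes and the human codes.
print("kappa:", cohen_kappa_score(test_codes, clf.predict(test_texts)))
```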

    Instruction fetch architectures and code layout optimizations

    The design of higher performance processors has been following two major trends: increasing the pipeline depth to allow faster clock rates, and widening the pipeline to allow parallel execution of more instructions. Designing a higher performance processor implies balancing all the pipeline stages to ensure that overall performance is not dominated by any of them. This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy. This paper explores the challenges faced by the instruction fetch stage for a variety of processor designs, from early pipelined processors to the more aggressive wide-issue superscalars. We describe the different fetch engines proposed in the literature, the performance issues involved, and some of the proposed improvements. We also show how compiler techniques that optimize the layout of the code in memory can be used to improve the fetch performance of the different engines described. Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to fetching several basic blocks per cycle, tracing the evolution of the mechanisms surrounding the instruction cache and the compiler optimizations used to better exploit them.
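    As a toy illustration of the "effective fetch width" notion that this paper and the Software Trace Cache work revolve around, the sketch below models a fetch engine that reads one aligned cache line per cycle and stops at the first taken branch, so long runs of not-taken branches raise the number of useful instructions delivered per cycle. The line size, addresses, and branch placements are invented parameters, not figures from the paper.

```python
# Effective fetch width of a one-line-per-cycle fetch engine (toy model).
LINE_WORDS = 8          # instructions per cache line

def fetched_per_cycle(pc, taken_branch_pcs):
    """Instructions delivered in one cycle when fetch starts at pc."""
    line_end = (pc // LINE_WORDS + 1) * LINE_WORDS   # stay within one line
    for addr in range(pc, line_end):
        if addr in taken_branch_pcs:
            return addr - pc + 1      # fetch stops after the taken branch
    return line_end - pc              # full sequential run to end of line

# A layout with a taken branch every 3 instructions vs. every 12 instructions.
dense  = {a for a in range(64) if a % 3 == 2}
sparse = {a for a in range(64) if a % 12 == 11}
print(sum(fetched_per_cycle(pc, dense)  for pc in range(0, 64, 8)) / 8)  # 2.0
print(sum(fetched_per_cycle(pc, sparse) for pc in range(0, 64, 8)) / 8)  # 7.0
```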