16,132 research outputs found

    Trace-level reuse

    Get PDF
    Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, the instructions that make up such traces have the same source operand values. The execution of such traces will obviously produce the same outcome and thus, their execution can be skipped if the processor records the outcome of previous executions. This paper presents an analysis of the performance potential of trace-level reuse and discusses a preliminary realistic implementation. Like instruction-level reuse, trace-level reuse can improve performance by decreasing resource contention and the latency of some instructions. However, we show that trace-level reuse is more effective than instruction-level reuse because the former can avoid fetching the instructions of reused traces. This has two important benefits: it reduces the fetch bandwidth requirements, and it increases the effective instruction window size since these instructions do not occupy window entries. Moreover, trace-level reuse can compute all at once the result of a chain of dependent instructions, which may allow the processor to avoid the serialization caused by data dependences and thus, to potentially exceed the dataflow limit.Peer ReviewedPostprint (published version

    An empirical study on behavioural intention to reuse e-learning systems in rural China

    Get PDF
    The learner’s acceptance of e-learning systems has received extensive attention in prior studies, but how their experience of using e-learning systems impacts on their behavioural intention to reuse those systems has attracted limited research. As the applications of e-learning are still gaining momentum in developing countries, such as China, it is necessary to examine the relationships between e-learners’ experience and perceptions and their behavioural intention to reuse, because it is argued that system reuse is an important indicator of the system’s success. Therefore, a better understanding of the multiple factors affecting the e-learner’s intention to reuse could help e-learning system researchers and providers to develop more effective and acceptable e-learning systems. Underpinned by the information system success model, technology acceptance model and self-efficacy theory, a theoretical framework was developed to investigate the learner’s behavioural intention to reuse e-learning systems. A total of 280 e-learners were surveyed to validate the measurements and proposed research model. The results demonstrated that e-learning service quality, course quality, perceived usefulness, perceived ease of use and self-efficacy had direct effects on users’ behavioural intention to reuse. System functionality and system response have an indirect effect, but system interactivity had no significant effect. Furthermore, self-efficacy affected perceived ease of use that positively influenced perceived usefulness

    Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft

    Full text link
    Analytic performance models are essential for understanding the performance characteristics of loop kernels, which consume a major part of CPU cycles in computational science. Starting from a validated performance model one can infer the relevant hardware bottlenecks and promising optimization opportunities. Unfortunately, analytic performance modeling is often tedious even for experienced developers since it requires in-depth knowledge about the hardware and how it interacts with the software. We present the "Kerncraft" tool, which eases the construction of analytic performance models for streaming kernels and stencil loop nests. Starting from the loop source code, the problem size, and a description of the underlying hardware, Kerncraft can ideally predict the single-core performance and scaling behavior of loops on multicore processors using the Roofline or the Execution-Cache-Memory (ECM) model. We describe the operating principles of Kerncraft with its capabilities and limitations, and we show how it may be used to quickly gain insights by accelerated analytic modeling.Comment: 11 pages, 4 figures, 8 listing

    A Functional Architecture Approach to Neural Systems

    Get PDF
    The technology for the design of systems to perform extremely complex combinations of real-time functionality has developed over a long period. This technology is based on the use of a hardware architecture with a physical separation into memory and processing, and a software architecture which divides functionality into a disciplined hierarchy of software components which exchange unambiguous information. This technology experiences difficulty in design of systems to perform parallel processing, and extreme difficulty in design of systems which can heuristically change their own functionality. These limitations derive from the approach to information exchange between functional components. A design approach in which functional components can exchange ambiguous information leads to systems with the recommendation architecture which are less subject to these limitations. Biological brains have been constrained by natural pressures to adopt functional architectures with this different information exchange approach. Neural networks have not made a complete shift to use of ambiguous information, and do not address adequate management of context for ambiguous information exchange between modules. As a result such networks cannot be scaled to complex functionality. Simulations of systems with the recommendation architecture demonstrate the capability to heuristically organize to perform complex functionality

    Exploring the mathematics of motion through construction and collaboration

    Get PDF
    In this paper we give a detailed account of the design principles and construction of activities designed for learning about the relationships between position, velocity and acceleration, and corresponding kinematics graphs. Our approach is model-based, that is, it focuses attention on the idea that students constructed their own models – in the form of programs – to formalise and thus extend their existing knowledge. In these activities, students controlled the movement of objects in a programming environment, recording the motion data and plotting corresponding position-time and velocity-time graphs. They shared their findings on a specially-designed web-based collaboration system, and posted cross-site challenges to which others could react. We present learning episodes that provide evidence of students making discoveries about the relationships between different representations of motion. We conjecture that these discoveries arose from their activity in building models of motion and their participation in classroom and online communities

    Multicore-Aware Reuse Distance Analysis

    Get PDF
    This paper presents and validates methods to extend reuse distance analysis of application locality characteristics to shared-memory multicore platforms by accounting for invalidation-based cache-coherence and inter-core cache sharing. Existing reuse distance analysis methods track the number of distinct addresses referenced between reuses of the same address by a given thread, but do not model the effects of data references by other threads. This paper shows several methods to keep reuse stacks consistent so that they account for invalidations and cache sharing, either as references arise in a simulated execution or at synchronization points. These methods are evaluated against a Simics-based coherent cache simulator running several OpenMP and transaction-based benchmarks. The results show that adding multicore-awareness substantially improves the ability of reuse distance analysis to model cache behavior, reducing the error in miss ratio prediction (relative to cache simulation for a specific cache size) by an average of 69% for per-core caches and an average of 84% for shared caches
    corecore