1,005 research outputs found

    The FermiFab Toolbox for Fermionic Many-Particle Quantum Systems

    This paper introduces the FermiFab toolbox for many-particle quantum systems. It is mainly concerned with the representation of (symbolic) fermionic wavefunctions and the calculation of corresponding reduced density matrices (RDMs). The toolbox transparently handles the inherent antisymmetrization of wavefunctions and incorporates the creation/annihilation formalism. It thus aims to provide a solid base for a broad audience to use fermionic wavefunctions with the same ease as, say, matrices in Matlab. Leveraging symbolic computation, the toolbox can greatly simplify tedious pen-and-paper calculations for concrete quantum mechanical systems, and serves as a "sandbox" for theoretical hypothesis testing. FermiFab (including full source code) is freely available as a plugin for both Matlab and Mathematica. Comment: 17 pages, 5 figures
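The antisymmetrization mentioned in the abstract can be illustrated with a minimal sketch. This is not FermiFab's API; the function name `normal_order` and the representation of a Slater determinant as a tuple of orbital indices are assumptions made only for this example. Reordering the orbitals of a fermionic basis state flips the sign once per adjacent swap, and a repeated orbital gives zero (Pauli exclusion).

```python
def normal_order(orbitals):
    """Sort orbital indices into ascending order and track the fermionic sign.

    Returns (sign, sorted_orbitals). A repeated orbital yields (0, ()),
    reflecting the Pauli exclusion principle. Hypothetical helper, not part
    of FermiFab.
    """
    orbs = list(orbitals)
    if len(set(orbs)) != len(orbs):
        return 0, ()
    sign = 1
    # Bubble sort: every adjacent transposition contributes a factor of -1.
    for i in range(len(orbs)):
        for j in range(len(orbs) - 1 - i):
            if orbs[j] > orbs[j + 1]:
                orbs[j], orbs[j + 1] = orbs[j + 1], orbs[j]
                sign = -sign
    return sign, tuple(orbs)
```

For instance, `normal_order((2, 1))` yields a sign of -1 with the orbitals sorted as `(1, 2)`, which is the statement that swapping two fermions negates the wavefunction.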

    Annual report


    Global citizenship report


    Exploiting asynchrony from exact forward recovery for DUE in iterative solvers

    This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) relying on error detection techniques already available in commodity hardware. Detection operates at the memory page level, which enables the use of simple algorithmic redundancies to correct errors. Such redundancies would be inapplicable under coarse-grained error detection, but become very powerful when the hardware is able to detect errors precisely. Relations extracted directly from the solver allow lost data to be recovered exactly. This method is free of the overheads of backward recovery schemes such as checkpointing, and does not compromise the mathematical convergence properties of the solver as restarting would. We apply this recovery to three widely used Krylov subspace methods, CG, GMRES and BiCGStab, and their preconditioned versions. We implement our resilience techniques on CG considering scenarios from small (8 cores) to large (1024 cores) scales, and demonstrate very low overheads compared to state-of-the-art solutions. We deploy our recovery techniques either by overlapping them with algorithmic computations or by forcing them to be in the critical path of the application. A trade-off exists between the two approaches depending on the error rate the solver is suffering. Under realistic error rates, overlapping decreases overheads from 5.37% down to 3.59% for a non-preconditioned CG on 8 cores.
    This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Advanced Grant 321253, and by the Spanish Ministry of Science and Innovation under grant TIN2012-34557. L. Jaulmes has been partially supported by the Spanish Ministry of Education, Culture and Sports under grant FPU2013/06982. M. Moreto has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the European Union's 7th FP (contract 2013 BP B 00243). Peer Reviewed. Postprint (author's final draft)
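The idea of recovering lost data from relations that hold inside the solver can be sketched as follows. The abstract does not state which relations the authors use; this example assumes the standard residual identity r = b - A x that CG maintains (exactly so in exact arithmetic), and the function names are hypothetical. If the entries of r on a faulty memory page are lost, they can be recomputed row by row from the surviving A, b and x, without rolling back to a checkpoint.

```python
def matvec_row(A, x, i):
    """Dot product of row i of the dense matrix A with the vector x."""
    return sum(A[i][j] * x[j] for j in range(len(x)))


def recover_residual(A, b, x, r, lost):
    """Rebuild the lost entries of r from the relation r = b - A x.

    `lost` lists the indices reported as corrupted (for example, those that
    lived on a faulty memory page). Entries outside `lost` are left as-is.
    """
    repaired = list(r)
    for i in lost:
        repaired[i] = b[i] - matvec_row(A, x, i)
    return repaired
```

Only the rows touching the lost indices are recomputed, which is why fine-grained (page-level) detection matters: the cost of the repair scales with the amount of data lost rather than with the problem size.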

    Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators

    Today’s operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control, which guarantees fairness, and Disengaged Fair Queueing, which is effective in limiting resource idleness but is probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct devic

    Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory

    We propose an axiomatic generic framework for modelling weak memory. We show how to instantiate this framework for SC, TSO, C++ restricted to release-acquire atomics, and Power. For Power, we compare our model to a preceding operational model in which we found a flaw. To do so, we define an operational model that we show equivalent to our axiomatic model. We also propose a model for ARM. Our testing on this architecture revealed a behaviour later acknowledged as a bug by ARM, and more recently 31 additional anomalies. We offer a new simulation tool, called herd, which allows users to specify the model of their choice in a concise way. Given a specification of a model, the tool becomes a simulator for that model. The tool relies on an axiomatic description; this choice allows us to outperform all previous simulation tools. Additionally, we confirm that verification time is vastly improved in the case of bounded model checking. Finally, we put our models in perspective, in the light of empirical data obtained by analysing the C and C++ code of a Debian Linux distribution. We present our new analysis tool, called mole, which explores a piece of code to find the weak memory idioms that it uses.

    MoonGen: A Scriptable High-Speed Packet Generator

    We present MoonGen, a flexible high-speed packet generator. It can saturate 10 GbE links with minimum-sized packets using only a single CPU core by running on top of the packet processing framework DPDK. Linear multi-core scaling allows for even higher rates: we have tested MoonGen with up to 178.5 Mpps at 120 Gbit/s. We move the whole packet generation logic into user-controlled Lua scripts to achieve the highest possible flexibility. In addition, we utilize hardware features of Intel NICs that have not been used for packet generators previously. A key feature is the measurement of latency with sub-microsecond precision and accuracy by using hardware timestamping capabilities of modern commodity NICs. We address timing issues with software-based packet generators and apply methods to mitigate them with both hardware support on commodity NICs and with a novel method to control the inter-packet gap in software. Features that were previously only possible with hardware-based solutions are now provided by MoonGen on commodity hardware. MoonGen is available as free software under the MIT license at https://github.com/emmericp/MoonGen. Comment: Published at IMC 201
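The packet rates quoted in the abstract can be sanity-checked with a short calculation. The sketch below assumes standard Ethernet framing, which the abstract does not spell out: a minimum frame of 64 bytes plus 20 bytes of per-packet wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The function name is hypothetical.

```python
# Per-packet bytes on the wire that are not part of the frame itself:
# 7 (preamble) + 1 (start-of-frame delimiter) + 12 (inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps, frame_bytes=64):
    """Maximum packets per second on an Ethernet link for a given frame size."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_packet


ten_gbe_mpps = line_rate_pps(10e9) / 1e6    # about 14.88 Mpps
total_mpps = line_rate_pps(120e9) / 1e6     # about 178.57 Mpps
```

Under these assumptions a 10 GbE link tops out at about 14.88 Mpps with minimum-sized packets, and 120 Gbit/s corresponds to about 178.57 Mpps, in line with the 178.5 Mpps figure reported above.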

    Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

    The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

    An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

    Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
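The cluster figures in the abstract imply some per-device numbers that are easy to derive. The short calculation below uses only the quantities quoted above (64 nodes, 128 GPUs, 30,720 processing elements, roughly 2.4 TeraFLOPS sustained); the variable names are illustrative, and the per-GPU throughput is an average under the assumption that the load is spread evenly.

```python
NODES = 64
GPUS = 128
PROCESSING_ELEMENTS = 30_720
SUSTAINED_GFLOPS = 2_400  # approximately 2.4 TeraFLOPS, as quoted

gpus_per_node = GPUS // NODES                    # 2
elements_per_gpu = PROCESSING_ELEMENTS // GPUS   # 240
gflops_per_gpu = SUSTAINED_GFLOPS / GPUS         # 18.75 on average
```

So each node hosts 2 GPUs, each GPU contributes 240 processing elements, and the sustained rate averages roughly 18.75 GFLOPS per GPU.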

    The EAP competencies in a group case study project as revealed by a task analysis

    In EAP in the UK, preparing students for real academic assignments in higher education is a central focus. As assessed group project assignments are becoming increasingly popular in UK universities, this study investigates the demands of a first-year undergraduate group case study project in business. Task-based syllabuses are common in EAP, so a task analysis framework developed from task-based learning was used to examine the project documents, the requirements of the project and its chain of integrated tasks. The findings show that the project document was dense and multifaceted, and that the integrated tasks were highly interactive and extremely complex in terms of cognitive and code complexity and communicative stress. In this particular instance, the difficulties faced by international students are examined and the demands in terms of team dynamics and management, language and EAP competencies are identified. The study recommends that EAP practitioners be more aware of the academic competencies and group dynamics involved in both complex group projects and case study projects in the receiving disciplines and programmes.