
1,005 research outputs found

The FermiFab Toolbox for Fermionic Many-Particle Quantum Systems

Author: Anderson
Ando
Christian B. Mendl
Coleman
Dirac
Friesecke
Hohenberg
Intel
Kohn
Loewdin
Mazziotti
Mendl
Peskin
von Neumann
Publication venue: 'Elsevier BV'
Publication date: 04/03/2011

This paper introduces the FermiFab toolbox for many-particle quantum systems. It is mainly concerned with the representation of (symbolic) fermionic wavefunctions and the calculation of the corresponding reduced density matrices (RDMs). The toolbox transparently handles the inherent antisymmetrization of wavefunctions and incorporates the creation/annihilation formalism. It thus aims to provide a solid base for a broad audience to work with fermionic wavefunctions as easily as with matrices in Matlab. Leveraging symbolic computation, the toolbox can greatly simplify tedious pen-and-paper calculations for concrete quantum mechanical systems, and serves as a "sandbox" for testing theoretical hypotheses. FermiFab (including full source code) is freely available as a plugin for both Matlab and Mathematica.

Comment: 17 pages, 5 figures
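The creation/annihilation formalism mentioned above can be illustrated with a minimal sketch. It is written in Python, not against FermiFab's Matlab or Mathematica interface, and assumes Slater determinants are stored as occupation-number bitstrings with the common sign convention of counting occupied orbitals of lower index. The names `create`, `annihilate`, and `apply_ops` are hypothetical.

```python
# Slater determinants as occupation-number bitstrings: orbital i is occupied
# iff bit i of `state` is set. The sign convention (parity of occupied
# orbitals with index < i) is an assumption, not necessarily FermiFab's.


def _parity_sign(i, state):
    """(-1)^(number of occupied orbitals below orbital i)."""
    below = state & ((1 << i) - 1)
    return -1 if bin(below).count("1") % 2 else 1


def create(i, state):
    """Apply a_i^dagger to |state>; return (sign, new_state) or None for zero."""
    if (state >> i) & 1:
        return None  # Pauli exclusion: orbital already occupied
    return _parity_sign(i, state), state | (1 << i)


def annihilate(i, state):
    """Apply a_i to |state>; return (sign, new_state) or None for zero."""
    if not (state >> i) & 1:
        return None  # cannot remove a particle from an empty orbital
    return _parity_sign(i, state), state & ~(1 << i)


def apply_ops(ops, state):
    """Apply a product of operators, written left to right, to |state>.

    `ops` is a list of ("c", i) for a_i^dagger or ("a", i) for a_i; as in
    operator notation, the rightmost operator acts first.
    """
    sign = 1
    for kind, i in reversed(ops):
        result = (create if kind == "c" else annihilate)(i, state)
        if result is None:
            return None
        s, state = result
        sign *= s
    return sign, state
```

For example, `apply_ops([("c", 0), ("c", 1)], 0)` returns `(1, 3)`, while swapping the two operators returns `(-1, 3)`: the antisymmetry that the toolbox is described as handling transparently.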

arXiv.org e-Print Archive

Crossref

Annual report

Author: Intel Corporation
Publication venue: [S.l.] : Intel,

Diposit Digital de Documents de la UAB

Global citizenship report

Author: Intel Corporation
Publication venue: [S.l.] : Intel,

Diposit Digital de Documents de la UAB

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers

Author: Architectures Software Developer's Intel®
Berry M.
Degalahal V.
Family Intel® Xeon®
Kleen A.
Li X.
Manual Architecture Programmer's
Shewchuk J. R.
Sorin D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) that relies on error detection techniques already available in commodity hardware. Detection operates at the memory page level, which enables the use of simple algorithmic redundancies to correct errors. Such redundancies would be inapplicable under coarse-grained error detection, but become very powerful when the hardware is able to detect errors precisely. Relations extracted directly from the solver allow lost data to be recovered exactly. This method avoids the overheads of backward recovery techniques such as checkpointing and, unlike restarting, does not compromise the solver's mathematical convergence properties. We apply this recovery to three widely used Krylov subspace methods, CG, GMRES and BiCGStab, and their preconditioned versions. We implement our resilience techniques on CG considering scenarios from small (8 cores) to large (1024 cores) scales, and demonstrate very low overheads compared to state-of-the-art solutions. We deploy our recovery techniques either by overlapping them with algorithmic computations or by forcing them to be in the critical path of the application. The trade-off between the two approaches depends on the error rate the solver experiences. Under realistic error rates, overlapping decreases overheads from 5.37% down to 3.59% for a non-preconditioned CG on 8 cores.

This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Advanced Grant 321253, and by the Spanish Ministry of Science and Innovation under grant TIN2012-34557. L. Jaulmes has been partially supported by the Spanish Ministry of Education, Culture and Sports under grant FPU2013/06982. M. Moreto has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the European Union's 7th FP (contract 2013 BP B 00243). Peer Reviewed. Postprint (author's final draft)
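To illustrate the kind of exact forward relation the abstract refers to, here is a minimal sketch assuming the standard CG invariant r = b - Ax. If a block x_I of the iterate is lost, it can be rebuilt by solving A_II x_I = b_I - r_I - A_IJ x_J, where J indexes the surviving entries. The index list `lost` stands in for a hypothetical lost memory page; the helper names, the tiny dense matrix, and the Gaussian-elimination solve are illustrative assumptions, not the paper's implementation, which targets real hardware error reports and preconditioned solvers.

```python
# Minimal sketch of exact forward recovery for CG, assuming the invariant
# r = b - A x holds up to rounding. All names here are hypothetical.


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_steps(A, b, steps):
    """Run `steps` plain CG iterations from x0 = 0; return (x, r)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for _ in range(steps):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]  # recursive residual
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, r


def solve(M, y):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(y)
    aug = [list(row) + [yi] for row, yi in zip(M, y)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - s) / aug[i][i]
    return sol


def recover_x_block(A, b, r, x, lost):
    """Rebuild x[lost] from A_II x_I = b_I - r_I - A_IJ x_J (J = surviving)."""
    kept = [j for j in range(len(x)) if j not in lost]
    A_II = [[A[i][j] for j in lost] for i in lost]  # SPD, hence nonsingular
    rhs = [b[i] - r[i] - sum(A[i][j] * x[j] for j in kept) for i in lost]
    repaired = list(x)
    for i, value in zip(lost, solve(A_II, rhs)):
        repaired[i] = value
    return repaired
```

For instance, after `x, r = cg_steps(A, b, 2)` on a small symmetric positive definite matrix, overwriting `x[1]` and `x[2]` with NaN and calling `recover_x_block(A, b, r, corrupted, [1, 2])` reproduces the original entries up to rounding, with no rollback to a checkpoint.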

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

UPCommons (Universitat Politècnica de Catalunya)

Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators

Author: Dwarakinath A.
GPU
Gupta V.
Intel Corporation
Kato S.
Kyriazis G.
Menychtas K.
Shen K.
Soares L.
Publication date: 04/09/2014

Today’s operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator only infrequently, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice, whose overuse control guarantees fairness, and Disengaged Fair Queueing, which is effective at limiting resource idleness but is probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct devic
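The idea of intervening only once per interval can be sketched as a virtual-time fair queue: at the end of each interval the kernel charges each application for the accelerator time it was observed to use, scaled by a weight, and grants the next interval to the application that is furthest behind. This is a hypothetical, deterministic sketch with invented class and method names; it is not the paper's Disengaged Fair Queueing, which the abstract describes as probabilistic and which builds on an interface inferred for Nvidia GPUs.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    weight: float = 1.0        # relative share of the accelerator
    virtual_time: float = 0.0  # weighted usage charged so far


class DisengagedFairQueue:
    """Hypothetical virtual-time scheduler invoked once per interval.

    Between calls, the granted application talks to the device directly;
    the kernel only sees how much accelerator time each one used.
    """

    def __init__(self, apps):
        self.apps = {app.name: app for app in apps}

    def add(self, app):
        # A newcomer starts at the current minimum so it cannot claim
        # credit for intervals in which it did not exist.
        if self.apps:
            app.virtual_time = min(a.virtual_time for a in self.apps.values())
        self.apps[app.name] = app

    def end_interval(self, observed_usage):
        """Charge observed usage, then pick who gets the next interval."""
        for name, used in observed_usage.items():
            app = self.apps[name]
            app.virtual_time += used / app.weight
        if not self.apps:
            return None
        # Ties are broken by name to keep the sketch deterministic.
        return min(self.apps.values(), key=lambda a: (a.virtual_time, a.name)).name
```

With weights 1 and 2 for applications `a` and `b`, and each grant consuming a full 10-unit interval, the grant sequence settles into a, b, b, a, b, b, ..., so `b` receives twice the accelerator time even if `a` never yields voluntarily.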

CiteSeerX

Crossref

Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory

Author: Alglave Jade
Bertot Yves
Boudol Gérard
Burckhardt Sebastian
Collier William
Compaq Computer Corp. 2002.
Grisenthwaite Richard
Howells David
IBM Corp. 2009.
Intel Corp. 2002.
Intel Corp. 2009.
Kuperstein Michael
Ltd ARM
Nardelli Francesco Zappa
Neiger Gil
Paul
SPARC International Inc. 1992.
SPARC International Inc. 1994.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

We propose an axiomatic generic framework for modelling weak memory. We show how to instantiate this framework for SC, TSO, C++ restricted to release-acquire atomics, and Power. For Power, we compare our model to a preceding operational model in which we found a flaw. To do so, we define an operational model that we show equivalent to our axiomatic model. We also propose a model for ARM. Our testing on this architecture revealed a behaviour later acknowledged as a bug by ARM, and more recently 31 additional anomalies. We offer a new simulation tool, called herd, which allows users to specify the model of their choice concisely. Given a specification of a model, the tool becomes a simulator for that model. The tool relies on an axiomatic description; this choice allows us to outperform all previous simulation tools. Additionally, we confirm that verification time is vastly improved in the case of bounded model checking. Finally, we put our models in perspective in the light of empirical data obtained by analysing the C and C++ code of a Debian Linux distribution. We present our new analysis tool, called mole, which explores a piece of code to find the weak memory idioms that it uses.
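In the axiomatic style the abstract describes, a model is a set of acyclicity constraints over the relations of a candidate execution. The Python sketch below checks the classic store-buffering litmus test: sequential consistency (SC) requires po ∪ rf ∪ co ∪ fr to be acyclic, while a TSO-like model drops write-to-read program order and internal reads-from. It is a simplified illustration, not herd's model language; it omits the per-location coherence check, fences, and atomics, and all names are hypothetical.

```python
# Candidate execution of the store-buffering (SB) litmus test:
#   P0: a: W x=1 ; b: R y -> r0      P1: c: W y=1 ; d: R x -> r1
# Events: name -> (kind, location, thread); initial writes have thread None.
EVENTS = {
    "ix": ("W", "x", None), "iy": ("W", "y", None),
    "a": ("W", "x", 0), "b": ("R", "y", 0),
    "c": ("W", "y", 1), "d": ("R", "x", 1),
}
PO = {("a", "b"), ("c", "d")}   # program order
CO = {("ix", "a"), ("iy", "c")}  # coherence order per location
RF = {("iy", "b"), ("ix", "d")}  # outcome r0 = 0, r1 = 0


def from_reads(rf, co):
    """fr = rf^-1 ; co: a read precedes writes co-after the one it read."""
    return {(r, w2) for (w, r) in rf for (w1, w2) in co if w1 == w}


def is_acyclic(nodes, edges):
    """Kahn's algorithm: acyclic iff every node can be topologically sorted."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for s, d in edges:
        succ[s].append(d)
        indeg[d] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return seen == len(nodes)


def sc_allows(events, po, rf, co):
    return is_acyclic(events, po | rf | co | from_reads(rf, co))


def tso_allows(events, po, rf, co):
    # Store buffering: a write need not be ordered before a later read.
    ppo = {(s, d) for (s, d) in po
           if not (events[s][0] == "W" and events[d][0] == "R")}
    # Only reads-from between different threads contributes global order.
    rfe = {(w, r) for (w, r) in rf if events[w][2] != events[r][2]}
    return is_acyclic(events, ppo | rfe | co | from_reads(rf, co))
```

For the outcome r0 = r1 = 0, `sc_allows(EVENTS, PO, RF, CO)` is `False` because of the cycle a, b, c, d, a through po and fr, whereas `tso_allows(EVENTS, PO, RF, CO)` is `True`, matching the well-known fact that x86-TSO permits store buffering.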

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Queen Mary Research Online

HAL: Hyper Article en Ligne

MoonGen: A Scriptable High-Speed Packet Generator

Author: Datasheet Intel Ethernet
Gallenmüller Sebastian
Gigabit Ethernet Controller Datasheet Intel
IEEE
Rizzo Luigi
Salim Jamal Hadi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 31/03/2016

We present MoonGen, a flexible high-speed packet generator. It can saturate 10 GbE links with minimum-sized packets using only a single CPU core by running on top of the packet processing framework DPDK. Linear multi-core scaling allows for even higher rates: we have tested MoonGen with up to 178.5 Mpps at 120 Gbit/s. We move the whole packet generation logic into user-controlled Lua scripts to achieve the highest possible flexibility. In addition, we utilize hardware features of Intel NICs that have not previously been used for packet generators. A key feature is the measurement of latency with sub-microsecond precision and accuracy by using the hardware timestamping capabilities of modern commodity NICs. We address timing issues with software-based packet generators and apply methods to mitigate them, both with hardware support on commodity NICs and with a novel method to control the inter-packet gap in software. Features that were previously only possible with hardware-based solutions are now provided by MoonGen on commodity hardware. MoonGen is available as free software under the MIT license at https://github.com/emmericp/MoonGen

Comment: Published at IMC 201
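The 10 GbE claim can be checked with a short calculation, assuming minimum-size 64-byte Ethernet frames plus the standard 20 bytes of per-frame wire overhead (preamble, start frame delimiter, and minimum inter-frame gap). The same arithmetic at 120 Gbit/s gives about 178.57 Mpps, consistent with the reported 178.5 Mpps. The function names are illustrative.

```python
# Per-frame wire overhead: 7-byte preamble + 1-byte start frame delimiter
# + 12-byte minimum inter-frame gap = 20 bytes.
WIRE_OVERHEAD_BYTES = 20
MIN_FRAME_BYTES = 64  # minimum Ethernet frame, including the 4-byte FCS


def max_packet_rate(link_bits_per_s, frame_bytes=MIN_FRAME_BYTES):
    """Packets per second needed to saturate a link at this frame size."""
    return link_bits_per_s / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def wire_time_ns(link_bits_per_s, frame_bytes=MIN_FRAME_BYTES):
    """Time one frame (with overhead) occupies the wire, in nanoseconds."""
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 / link_bits_per_s * 1e9
```

`max_packet_rate(10e9)` is about 14.88 million packets per second, so a single core driving a 10 GbE link at line rate has roughly 67.2 ns per packet, which is why both throughput and inter-packet gap control are demanding in software.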

arXiv.org e-Print Archive

Crossref

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal
Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
freescale semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
José M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010

Upcoming many-core architectures require software developers to exploit concurrency to utilize the available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose on VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology for designing instruction sets with concurrency support. We therefore also propose a list of trade-offs that have to be investigated to inform the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared-memory and one for non-shared-memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

Author: Bolz J.
Brandvik T.
Buck I.
Elsen E.
Fan Z.
Goodnight N.
Griebel M.
Gropp W.
Göddeke D.
Harris M.J.
Hempel R.
Intel
Kindratenko V.
Krüger J.
Liu Y.
Owens J.D.
Schive H.
Showerman M.
Simek V.
Tölke J.
Wan D.C.
Zhao Y.
Publication venue: 'IUScholarWorks'
Publication date: 01/01/2010

Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can greatly accelerate simulation science applications. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
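The reported figures imply the following per-GPU and per-node quantities. The calculation uses only numbers stated in the abstract; since the 2.4 TeraFLOPS figure is approximate, so are the derived rates.

```python
# Figures from the abstract (the FLOP rate is stated as approximate).
NODES = 64
GPUS = 128
PROCESSING_ELEMENTS = 30_720
SUSTAINED_FLOPS = 2.4e12

pe_per_gpu = PROCESSING_ELEMENTS // GPUS  # processing elements per GPU
gpus_per_node = GPUS // NODES             # GPUs sharing one compute node
flops_per_gpu = SUSTAINED_FLOPS / GPUS    # sustained rate per GPU
flops_per_node = SUSTAINED_FLOPS / NODES  # sustained rate per node
```

That is 240 processing elements per GPU, two GPUs per node, and roughly 18.75 GFLOPS sustained per GPU (37.5 GFLOPS per node).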

Crossref

Boise State University - ScholarWorks

The EAP competencies in a group case study project as revealed by a task analysis

Author: Anderson
Ann F.V. Smith
Ashraf
BALEAP
Basturkmen
Brindley
Brown
Bygate
Canale
Candlin
Celce-Murcia
Chapman
Chen
Child
Colbeck
Coxhead
Crookes
Crossley
Davies
Dudley-Evans
Duff
Ede
Ellis
Evans
Finkbeiner
Flowerdew
Foster
Garcia-Mayo
Gillett
Hansen
Hunt
Hyland
Intel.
Jackson
Johns
Juliet Thondhlana
Kagan
Kim
Leki
Long
Lowry
Mohan
Morton-Holmes
Nation
Nunan
Pamlin
Pica
Prabhu
Robinson
School of Business
Shepperd
Skehan
Smith
Spencer-Oatey
Summers
Thondhlana
Tuckman
Van den Branden
Victoria University of Wellington
Watson Todd
Willis
Young
Publication venue: 'Elsevier BV'
Publication date: 01/12/2015

In EAP in the UK, preparing students for real academic assignments in higher education is a central focus. As assessed group project assignments are becoming increasingly popular in UK universities, this study investigates the demands of a first-year undergraduate group case study project in business. Task-based syllabuses are common in EAP, so a task analysis framework developed from task-based learning was used to examine the project documents, the requirements of the project, and its chain of integrated tasks. The findings show that the project document was dense and multifaceted, and that the integrated tasks were highly interactive and extremely complex in terms of cognitive and code complexity and communicative stress. In this particular instance, the difficulties faced by international students are examined, and the demands in terms of team dynamics and management, language, and EAP competencies are identified. The study recommends that EAP practitioners become more aware of the academic competencies and group dynamics involved in both complex group projects and case study projects in the receiving disciplines and programmes.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham