The FermiFab Toolbox for Fermionic Many-Particle Quantum Systems
This paper introduces the FermiFab toolbox for many-particle quantum systems.
It is mainly concerned with the representation of (symbolic) fermionic
wavefunctions and the calculation of corresponding reduced density matrices
(RDMs). The toolbox transparently handles the inherent antisymmetrization of
wavefunctions and incorporates the creation/annihilation formalism. Thus, it
aims at providing a solid base for a broad audience to use fermionic
wavefunctions as easily as matrices in, say, Matlab. Leveraging
symbolic computation, the toolbox can greatly simplify tedious pen-and-paper
calculations for concrete quantum mechanical systems, and serves as a "sandbox"
for theoretical hypothesis testing. FermiFab (including full source code) is
freely available as a plugin for both Matlab and Mathematica. Comment: 17 pages, 5 figures
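FermiFab's own API does not appear in this listing. The following Python sketch only illustrates the creation/annihilation formalism the toolbox builds on. Basis states are stored as occupation bitmasks, and the sign factor (−1)^(number of occupied orbitals below i) is what encodes antisymmetry. The names `create`, `annihilate` and `apply_ops` are hypothetical.

```python
def _parity_below(i, state):
    """(-1)^(number of occupied orbitals with index < i) in the bitmask `state`."""
    return -1 if bin(state & ((1 << i) - 1)).count("1") % 2 else 1


def create(i, state):
    """Apply a_i^dagger to an occupation-number basis state (bitmask).

    Returns (sign, new_state), or None when orbital i is already occupied
    (Pauli exclusion: the result is the zero vector).
    """
    if (state >> i) & 1:
        return None
    return _parity_below(i, state), state | (1 << i)


def annihilate(i, state):
    """Apply a_i to `state`; returns (sign, new_state) or None if orbital i is empty."""
    if not (state >> i) & 1:
        return None
    return _parity_below(i, state), state & ~(1 << i)


def apply_ops(ops, state=0):
    """Apply an operator product right-to-left to `state` (default: vacuum).

    [("c", 0), ("c", 1)] means a_0^dagger a_1^dagger |state>; "a" means a_i.
    Returns (sign, basis_state) or None if the product annihilates the state.
    """
    sign = 1
    for kind, i in reversed(ops):
        result = (create if kind == "c" else annihilate)(i, state)
        if result is None:
            return None
        s, state = result
        sign *= s
    return sign, state


# Swapping two creation operators flips the sign of the two-particle state.
print(apply_ops([("c", 0), ("c", 1)]), apply_ops([("c", 1), ("c", 0)]))
```

The two printed results differ only in sign, which is the antisymmetry that FermiFab is described as handling transparently.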
MoonGen: A Scriptable High-Speed Packet Generator
We present MoonGen, a flexible high-speed packet generator. It can saturate
10 GbE links with minimum sized packets using only a single CPU core by running
on top of the packet processing framework DPDK. Linear multi-core scaling
allows for even higher rates: We have tested MoonGen with up to 178.5 Mpps at
120 Gbit/s. We move the whole packet generation logic into user-controlled Lua
scripts to achieve the highest possible flexibility. In addition, we utilize
hardware features of Intel NICs that have not been used for packet generators
previously. A key feature is the measurement of latency with sub-microsecond
precision and accuracy by using hardware timestamping capabilities of modern
commodity NICs. We address timing issues with software-based packet generators
and apply methods to mitigate them with both hardware support on commodity NICs
and with a novel method to control the inter-packet gap in software. Features
that were previously only possible with hardware-based solutions are now
provided by MoonGen on commodity hardware. MoonGen is available as free
software under the MIT license at https://github.com/emmericp/MoonGen
Comment: Published at IMC 201
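The rates quoted above follow from Ethernet framing arithmetic. Every frame carries 20 extra bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap), and a minimum-sized frame is 64 bytes including the FCS. The Python sketch below is illustrative only and is not MoonGen code. The helper names are hypothetical, and `extra_gap_bytes` just shows how much idle time a generator must insert per frame to hit a target rate. It does not model how MoonGen actually controls the gap.

```python
# Per-frame overhead on the wire, in bytes:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
ETH_WIRE_OVERHEAD = 20


def line_rate_pps(link_bps, frame_bytes=64):
    """Maximum packet rate of an Ethernet link; frame_bytes includes the 4-byte FCS."""
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8)


def extra_gap_bytes(link_bps, target_pps, frame_bytes=64):
    """Idle bytes needed after each frame (beyond the standard gap) to hit target_pps."""
    slot_bytes = link_bps / 8 / target_pps
    return slot_bytes - (frame_bytes + ETH_WIRE_OVERHEAD)


print(f"10 GbE, 64 B frames:     {line_rate_pps(10e9) / 1e6:.2f} Mpps")
print(f"120 Gbit/s, 64 B frames: {line_rate_pps(120e9) / 1e6:.2f} Mpps")
print(f"10 GbE at 1 Mpps:        {extra_gap_bytes(10e9, 1e6):.0f} extra gap bytes per frame")
```

This gives about 14.88 Mpps for a saturated 10 GbE link and about 178.57 Mpps at 120 Gbit/s. The reported 178.5 Mpps is therefore consistent with minimum-sized frames at close to line rate.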
Exploiting asynchrony from exact forward recovery for DUE in iterative solvers
This paper presents a method to protect iterative solvers from Detected and Uncorrected Errors (DUE) by relying on error detection techniques already available in commodity hardware. Detection operates at the memory page level, which enables the use of simple algorithmic redundancies to correct errors. Such redundancies would be inapplicable under coarse-grained error detection, but become very powerful when the hardware is able to precisely detect errors.
Relations straightforwardly extracted from the solver allow lost data to be recovered exactly. This method is free of the overheads of backward recovery techniques such as checkpointing, and does not compromise the mathematical convergence properties of the solver as restarting would. We apply this recovery to three widely used Krylov subspace methods, CG, GMRES and BiCGStab, and their preconditioned versions.
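The abstract does not list the relations it exploits, so here is one assumed example. In CG the residual satisfies r = b − Ax (exactly in exact arithmetic, up to rounding in floating point). If the page holding a block of x is lost, that block can be recomputed from surviving data by solving a small square system. The Python sketch below (NumPy; `recover_lost_block` is a hypothetical name, not the authors' code) shows this idea.

```python
import numpy as np


def recover_lost_block(A, b, x, r, lost):
    """Recompute the entries x[lost] after a detected loss, using r = b - A x.

    Rows `lost` of that identity rearrange to the small square system
        A[lost, lost] @ x[lost] = b[lost] - r[lost] - A[lost, kept] @ x[kept],
    which only touches surviving data (b, r and the kept entries of x).
    """
    lost = np.asarray(lost)
    kept = np.setdiff1d(np.arange(len(b)), lost)
    rhs = b[lost] - r[lost] - A[np.ix_(lost, kept)] @ x[kept]
    return np.linalg.solve(A[np.ix_(lost, lost)], rhs)


# Demo on a small symmetric positive definite system.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)
b = rng.standard_normal(8)
x_true = rng.standard_normal(8)   # stands in for the current CG iterate
r = b - A @ x_true                # residual the solver keeps alongside x

x = x_true.copy()
lost = [2, 3, 4]                  # entries on the "failed" memory page
x[lost] = np.nan
x[lost] = recover_lost_block(A, b, x, r, lost)
print(np.allclose(x, x_true))
```

Because A is symmetric positive definite, the principal block A[lost, lost] is too, so the local solve is well posed. Other solver invariants (for example a stored product q = A p) could be rearranged in the same way. Which relations the paper uses for CG, GMRES and BiCGStab is not stated here.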
We implement our resilience techniques on CG considering scenarios from small (8 cores) to large (1024 cores) scales, and demonstrate very low overheads compared to state-of-the-art solutions. We deploy our recovery techniques either by overlapping them with algorithmic computations or by forcing them into the critical path of the application. The trade-off between the two approaches depends on the error rate the solver experiences. Under realistic error rates, overlapping decreases overheads from 5.37% down to 3.59% for a non-preconditioned CG on 8 cores.
This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Advanced Grant 321253, and by the Spanish Ministry of Science and Innovation under grant TIN2012-34557. L. Jaulmes has been partially supported by the Spanish Ministry of Education, Culture and Sports under grant FPU2013/06982. M. Moreto has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the European Union's 7th FP (contract 2013 BP B 00243).
Peer Reviewed. Postprint (author's final draft)
Disengaged Scheduling for Fair, Protected Access to Fast Computational Accelerators
Today’s operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice, which uses overuse control to guarantee fairness, and Disengaged Fair Queueing, which is effective at limiting resource idleness but only probabilistically fair. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct device access…
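To make "overuse control" concrete, here is a toy Python model. It is an assumption-laden sketch, not the paper's Disengaged Timeslice scheduler, and `disengaged_timeslice` is a hypothetical name. Applications take turns in round-robin order. An application may consume more than its slice because in-flight accelerator work cannot be preempted. The excess is recorded as debt, and an application whose debt covers a whole slice forfeits its next turn.

```python
def disengaged_timeslice(usage_per_turn, slice_len, turns):
    """Toy round-robin timeslice scheduler with overuse control.

    usage_per_turn maps each application to the accelerator time it actually
    consumes when granted a slice; values above slice_len model work that
    cannot be preempted mid-flight. Overuse accumulates as debt, and an
    application whose debt covers a whole slice forfeits its next turn.
    Returns the total accelerator time consumed per application.
    """
    apps = list(usage_per_turn)
    debt = {app: 0.0 for app in apps}
    used = {app: 0.0 for app in apps}
    for turn in range(turns):
        app = apps[turn % len(apps)]
        if debt[app] >= slice_len:
            debt[app] -= slice_len  # pay back one slice of earlier overuse
            continue
        consumed = usage_per_turn[app]
        used[app] += consumed
        debt[app] += max(0.0, consumed - slice_len)
    return used


# "b" overruns every slice by 50%, yet both applications end up with equal time.
print(disengaged_timeslice({"a": 10, "b": 15}, slice_len=10, turns=600))
```

Without the debt rule, "b" would receive 50% more accelerator time than "a". With it, the overrun is repaid by skipped turns, which is the kind of fairness guarantee that overuse control is meant to provide.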
Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory
We propose an axiomatic generic framework for modelling weak memory. We show how to instantiate this framework for SC, TSO, C++ restricted to release-acquire atomics, and Power. For Power, we compare our model to a preceding operational model in which we found a flaw. To do so, we define an operational model that we show equivalent to our axiomatic model. We also propose a model for ARM. Our testing on this architecture revealed a behaviour later acknowledged as a bug by ARM, and more recently 31 additional anomalies. We offer a new simulation tool, called herd, which allows the user to specify the model of their choice concisely. Given a specification of a model, the tool becomes a simulator for that model. The tool relies on an axiomatic description; this choice allows us to outperform all previous simulation tools. Additionally, we confirm that verification time is vastly improved in the case of bounded model checking. Finally, we put our models in perspective, in the light of empirical data obtained by analysing the C and C++ code of a Debian Linux distribution. We present our new analysis tool, called mole, which explores a piece of code to find the weak memory idioms that it uses.
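In the axiomatic style the abstract describes, a model is a set of constraints on relations over the events of a candidate execution. The hypothetical Python sketch below is not herd or its model language. It applies two such constraints to the classic store-buffering (SB) test. SC requires program order together with the communication relations rf, co and fr to be acyclic. The simplified TSO check drops write-to-read program order and assumes all rf edges are external, which holds in this test.

```python
def acyclic(edges):
    """True iff the directed graph given as a set of (src, dst) pairs has no cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {node: "new" for node in graph}

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state[nxt] == "active":
                return False  # back edge: cycle found
            if state[nxt] == "new" and not visit(nxt):
                return False
        state[node] = "done"
        return True

    return all(state[n] != "new" or visit(n) for n in graph)


def from_reads(rf, co):
    """fr = rf^-1 ; co: a read precedes every write co-after the write it read from."""
    return {(r, w2) for (w1, r) in rf for (w1b, w2) in co if w1 == w1b}


# SB: T0: x = 1; r0 = y    T1: y = 1; r1 = x    candidate outcome r0 = r1 = 0.
# ix, iy are the initial writes of 0; wx, wy the stores; ry, rx the loads.
kind = {"ix": "W", "iy": "W", "wx": "W", "wy": "W", "ry": "R", "rx": "R"}
po = {("wx", "ry"), ("wy", "rx")}
rf = {("iy", "ry"), ("ix", "rx")}  # both loads read the initial value 0
co = {("ix", "wx"), ("iy", "wy")}
com = rf | co | from_reads(rf, co)

sc_allows = acyclic(po | com)
# Simplified TSO: program order from a write to a later read is not preserved.
ppo_tso = {(a, b) for (a, b) in po if not (kind[a] == "W" and kind[b] == "R")}
tso_allows = acyclic(ppo_tso | com)
print("SB outcome allowed under SC:", sc_allows, "| under TSO:", tso_allows)
```

Under SC the cycle wx → ry → wy → rx → wx forbids the outcome. Removing the two write-to-read program-order edges breaks that cycle, so the same outcome is allowed under TSO, as store buffers permit.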
Towards optimal packed string matching
Dedicated to Professor Gad M. Landau, on the occasion of his 60th birthday
Keywords: String matching, Word-RAM, Packed strings
In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters, so an n-character string occupies n/α memory words. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during pattern pre-processing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work. (b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for theoretically efficient emulation, using parallel-algorithm techniques to emulate wssm and the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string-matching algorithm. As a byproduct that facilitates our results, we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in O(1) time.
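Two word-level facts from the abstract can be made concrete with a small Python sketch. The first is that α characters fit in one word, so n characters need ⌈n/α⌉ words. The second is that the rightmost 1 bit is easy to isolate with the two's-complement identity x & −x. The sketch assumes 8-bit characters and 64-bit words (so α = 8). The names `pack` and `rightmost_one` are illustrative only and are not the paper's code or the wssm/wslm instructions.

```python
WORD_BITS = 64
CHAR_BITS = 8
ALPHA = WORD_BITS // CHAR_BITS  # characters per word (alpha)
MASK = (1 << WORD_BITS) - 1     # emulate 64-bit wraparound on Python ints


def pack(text):
    """Pack 8-bit characters into 64-bit words, ALPHA per word (first char in the low byte)."""
    data = text.encode("latin-1")
    words = []
    for start in range(0, len(data), ALPHA):
        word = 0
        for k, byte in enumerate(data[start:start + ALPHA]):
            word |= byte << (CHAR_BITS * k)
        words.append(word)
    return words


def rightmost_one(x):
    """Isolate the lowest set bit of a 64-bit word with a constant number of word ops."""
    return x & (-x & MASK)


print(len(pack("abracadabra")), bin(rightmost_one(0b101000)))
```

No comparably simple identity is known for the leftmost 1 bit. As the abstract notes, such an identity would follow if the bits of a word could be reversed in O(1) time.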
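The Strategic Partnership Between Dell and Intel
1 About This Study and This Paper
2 Phenomenal Growth and the Mystery of "Intel Inside"
3 Dell's Enterprise Solutions Business
4 Product Strategy and Quality Control
5 Civil Litigation: 05-441
6 System Lock-In Strategy
7 小…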
Robot object manipulation using stereoscopic vision and conformal geometric algebra
This paper uses geometric algebra to formulate, in a single framework, the kinematics of a three-finger robotic hand, a binocular robotic head, and the interactions between 3D objects, all of which are seen in stereo images. The main objective is the formulation of a kinematic control law that closes the loop between perception and action, enabling smooth, visually guided object manipulation.
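A property that makes conformal geometric algebra convenient for modelling interactions between 3D objects is that points become null vectors. Their inner product then encodes Euclidean distance: with X = x + ½|x|² e∞ + e₀, one gets X·Y = −½|x − y|². The minimal Python sketch below uses hypothetical names and is not the paper's code. It stores only the five vector coefficients on (e₁, e₂, e₃, e₀, e∞) and does not implement the full algebra.

```python
def conformal_point(x):
    """Embed a Euclidean 3D point as X = x + 0.5*|x|^2 e_inf + e_0.

    Returned as coefficients (x1, x2, x3, c0, cinf) on (e1, e2, e3, e0, e_inf).
    """
    sq = sum(c * c for c in x)
    return (x[0], x[1], x[2], 1.0, 0.5 * sq)


def inner(X, Y):
    """Inner product with e_i.e_i = 1, e0.e0 = einf.einf = 0 and e0.einf = -1."""
    euclid = X[0] * Y[0] + X[1] * Y[1] + X[2] * Y[2]
    return euclid - (X[3] * Y[4] + X[4] * Y[3])


P = conformal_point((1, 2, 3))
Q = conformal_point((4, 6, 3))
print(inner(P, P), inner(P, Q))  # null vector; -0.5 * |p - q|^2 with |p - q| = 5
```

Because distances reduce to a single inner product, constraints such as "fingertip touches object point" can be written as linear conditions on conformal vectors. This is one reason a single geometric-algebra framework can cover the hand, the head and the objects.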