Distributed Simulation of Statevectors and Density Matrices
Classical simulation of quantum computers is an irreplaceable step in the
design of quantum algorithms. Exponential simulation costs demand the use of
high-performance computing techniques, and in particular distribution, whereby
the quantum state description is partitioned between a network of cooperating
computers - necessary for the exact simulation of more than approximately 30
qubits. Distributed computing is notoriously difficult, requiring bespoke
algorithms dissimilar to their serial counterparts, with different resource
considerations, which appear to restrict the utility of a quantum
simulator. This manuscript presents a plethora of novel algorithms for
distributed full-state simulation of gates, operators, noise channels and other
calculations in digital quantum computers. We show how a simple, common but
seemingly restrictive distribution model actually permits a rich set of
advanced facilities including Pauli gadgets, many-controlled many-target
general unitaries, density matrices, general decoherence channels, and partial
traces. These algorithms include asymptotically, polynomially improved
simulations of exotic gates, and thorough motivations for high-performance
computing techniques which will be useful even for non-distributed simulators.
Our results are derived in language familiar to a quantum information theory
audience, and our algorithms formalised for the scientific simulation
community. We have implemented all algorithms presented herein in an
isolated, minimalist C++ project, hosted open-source on GitHub under a
permissive MIT license and with extensive testing. This manuscript aims both to
significantly improve the high-performance quantum simulation tools available,
and offer a thorough introduction to, and derivation of, full-state simulation
techniques.
Comment: 56 pages, 18 figures, 28 algorithms, 1 table
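As a concrete illustration of the distribution model, the sketch below (ours, not the paper's formal algorithms; the MPI layout and names are assumptions) applies a one-qubit gate to a statevector of n qubits split evenly over 2^w ranks. When the target qubit indexes within a rank's local buffer the update is purely local; otherwise each rank exchanges its whole buffer with the single partner rank whose index differs in one bit, then combines the two buffers.

```cpp
// Minimal sketch: apply a 2x2 unitary m to qubit `targetQubit` of a
// statevector partitioned evenly across 2^w MPI ranks. Illustrative only;
// the paper develops far more general algorithms on top of this layout.
#include <mpi.h>
#include <cmath>
#include <complex>
#include <vector>

using amp = std::complex<double>;

void applyOneQubitGate(std::vector<amp>& local, int targetQubit,
                       const amp m[2][2], MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    const std::size_t localDim = local.size();   // 2^(n-w) amplitudes per rank
    const int localQubits = static_cast<int>(std::log2((double)localDim));

    if (targetQubit < localQubits) {
        // Both amplitudes of every pair live on this rank: no communication.
        const std::size_t stride = std::size_t(1) << targetQubit;
        for (std::size_t i = 0; i < localDim; i += 2 * stride)
            for (std::size_t j = i; j < i + stride; ++j) {
                const amp a = local[j], b = local[j + stride];
                local[j]          = m[0][0] * a + m[0][1] * b;
                local[j + stride] = m[1][0] * a + m[1][1] * b;
            }
    } else {
        // Pair partners live on the rank whose index differs in one bit:
        // swap entire buffers, then combine locally.
        const int pairRank = rank ^ (1 << (targetQubit - localQubits));
        std::vector<amp> recv(localDim);
        MPI_Sendrecv(local.data(), (int)localDim, MPI_CXX_DOUBLE_COMPLEX, pairRank, 0,
                     recv.data(),  (int)localDim, MPI_CXX_DOUBLE_COMPLEX, pairRank, 0,
                     comm, MPI_STATUS_IGNORE);
        const int bit = (rank >> (targetQubit - localQubits)) & 1;
        for (std::size_t j = 0; j < localDim; ++j)
            local[j] = (bit == 0) ? m[0][0] * local[j] + m[0][1] * recv[j]
                                  : m[1][0] * recv[j]  + m[1][1] * local[j];
    }
}
```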
Quantifying pervasive authentication: the case of the Hancke-Kuhn protocol
As mobile devices pervade physical space, the familiar authentication
patterns are becoming insufficient: besides entity authentication, many
applications require, e.g., location authentication. Many interesting protocols
have been proposed and implemented to provide such strengthened forms of
authentication, but there are very few proofs that such protocols satisfy the
required security properties. The logical formalisms, devised for reasoning
about security protocols on standard computer networks, turn out to be
difficult to adapt for reasoning about hybrid protocols used in pervasive and
heterogeneous networks.
We refine the Dolev-Yao-style algebraic method for protocol analysis by a
probabilistic model of guessing, needed to analyze protocols that mix weak
cryptography with physical properties of nonstandard communication channels.
Applying this model, we provide a precise security proof for a proximity
authentication protocol, due to Hancke and Kuhn, that uses a subtle form of
probabilistic reasoning to achieve its goals.
Comment: 31 pages, 2 figures; short version of this paper appeared in the Proceedings of MFPS 201
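A useful sanity check on the probabilistic guessing model: in the Hancke-Kuhn rapid phase, an attacker who pre-queries the prover with a guessed challenge learns the correct reply for a round only when that guess matches the verifier's later coin flip, and must otherwise guess blindly, giving per-round success 1/2 + 1/4 and overall success (3/4)^n. The toy Monte Carlo below (our illustration, not code from the paper) reproduces that bound.

```cpp
// Monte Carlo check of the attacker's success probability against
// Hancke-Kuhn's n timed challenge-response rounds. Illustrative only.
#include <cmath>
#include <iostream>
#include <random>

int main() {
    constexpr int n = 8;                 // timed rounds (small so wins are visible)
    constexpr int trials = 1000000;
    std::mt19937_64 rng(1);
    auto coin = [&] { return static_cast<bool>(rng() & 1); };

    int wins = 0;
    for (int t = 0; t < trials; ++t) {
        bool passedAll = true;
        for (int i = 0; i < n && passedAll; ++i) {
            bool g = coin();             // challenge the attacker pre-queried
            bool c = coin();             // challenge the verifier actually sends
            bool knowsReply = (g == c);  // learned the right register bit iff g == c
            passedAll = knowsReply || coin();  // otherwise a blind 50/50 guess
        }
        wins += passedAll;
    }
    std::cout << "estimated: " << double(wins) / trials
              << "  analytic (3/4)^n: " << std::pow(0.75, n) << '\n';
}
```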
JANUS: an FPGA-based System for High Performance Scientific Computing
This paper describes JANUS, a modular massively parallel and reconfigurable
FPGA-based computing system. Each JANUS module has a computational core and a
host. The computational core is a 4x4 array of FPGA-based processing elements
with nearest-neighbor data links. Processors are also directly connected to an
I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for,
but not limited to, the requirements of a class of hard scientific applications
characterized by regular code structure, unconventional data-manipulation
instructions, and a modest database size. We discuss the architecture of
this configurable machine, and focus on its use on Monte Carlo simulations of
statistical mechanics. On this class of applications JANUS achieves impressive
performance: in some cases a single JANUS processing element outperforms
high-end PCs by a factor of ~1000. We also discuss the role of JANUS in other
classes of scientific applications.
Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted to Computing in Science & Engineering
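To ground "Monte Carlo simulations of statistical mechanics": the archetypal workload here is local spin updates on a regular lattice. The sketch below (a plain CPU reference of ours, not JANUS firmware) shows one Metropolis sweep of a 2D Ising model; an FPGA processing element can pipeline many such cheap, integer-dominated updates per clock cycle, which is the kind of parallelism behind the quoted speedups.

```cpp
// One Metropolis sweep of a 2D Ising model with periodic boundaries: the kind
// of regular, nearest-neighbour Monte Carlo update JANUS is built to run
// massively in parallel. CPU reference sketch only; names are illustrative.
#include <cmath>
#include <random>
#include <vector>

// `s` holds L*L spins, each +1 or -1; `beta` is the inverse temperature.
void metropolisSweep(std::vector<int>& s, int L, double beta, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    auto at = [&](int x, int y) -> int& { return s[((y + L) % L) * L + (x + L) % L]; };
    for (int y = 0; y < L; ++y)
        for (int x = 0; x < L; ++x) {
            const int nn = at(x + 1, y) + at(x - 1, y) + at(x, y + 1) + at(x, y - 1);
            const int dE = 2 * at(x, y) * nn;  // energy change if this spin flips
            if (dE <= 0 || u(rng) < std::exp(-beta * dE))
                at(x, y) = -at(x, y);          // accept the flip
        }
}
```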
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview
Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprising conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first-order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.
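The core mechanism such fault-tolerant architectures rely on is redundant execution with voting: the same computation runs on multiple processing elements and a voter masks a faulty member's output. A minimal generic sketch of the idea follows (hypothetical illustration only; AFTA's actual redundancy management is far more involved).

```cpp
// Bitwise majority vote over three redundant channels: any single faulty
// result is outvoted, since each bit takes the value two healthy channels
// agree on. Hypothetical illustration, not AFTA's actual mechanism.
#include <cstdint>

uint32_t majority3(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}

int main() {
    // One corrupted channel is masked by the two agreeing ones.
    return majority3(0x12345678u, 0x12345678u, 0xDEADBEEFu) == 0x12345678u ? 0 : 1;
}
```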
A general and efficient divide-and-conquer algorithm framework for multi-core clusters
This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-017-0766-y
Abstract: Divide-and-conquer is one of the most important patterns of parallelism, being applicable to a large variety of problems. In addition, the most powerful parallel systems available nowadays are computer clusters composed of distributed-memory nodes that contain an increasing number of cores that share a common memory. The optimal exploitation of these systems often requires resorting to a hybrid model that mimics the underlying hardware by combining distributed- and shared-memory parallel programming models. This results in longer development times and increased maintenance costs. In this paper we present a very general skeleton library that allows any divide-and-conquer problem to be parallelized on hybrid distributed/shared-memory systems with little effort while providing much flexibility and good performance. Our proposal combines a message-passing paradigm at the process level and a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that this skeleton provides performance comparable to, and often better than, that of manually optimized codes, while requiring considerably less effort when parallelizing applications on multi-core clusters.
Ministerio de Economía y Competitividad; TIN2013-42148-P
Ministerio de Economía y Competitividad; TIN2016-75845-P
Xunta de Galicia; GRC2013/05
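To make the skeleton idea concrete, here is a minimal sequential sketch of the interface pattern such libraries expose (our illustration; the names and signatures are not the paper's actual API). The user writes only the problem-specific pieces, and parallelism, distribution, and load balancing live entirely inside the skeleton.

```cpp
// Illustrative sequential core of a divide-and-conquer skeleton. The hybrid
// version described above would farm top-level subproblems out over MPI
// processes and run deeper recursion levels on threads within each process.
#include <functional>
#include <vector>

template <typename P, typename R>
R dac(const P& p,
      std::function<bool(const P&)> isBase,
      std::function<R(const P&)> base,
      std::function<std::vector<P>(const P&)> divide,
      std::function<R(const P&, const std::vector<R>&)> combine) {
    if (isBase(p)) return base(p);
    std::vector<R> partial;
    for (const P& sub : divide(p))   // the parallel skeleton distributes these
        partial.push_back(dac(sub, isBase, base, divide, combine));
    return combine(p, partial);
}

// Usage: naive Fibonacci expressed against the skeleton.
int fib(int n) {
    return dac<int, int>(
        n,
        [](const int& x) { return x < 2; },
        [](const int& x) { return x; },
        [](const int& x) { return std::vector<int>{x - 1, x - 2}; },
        [](const int&, const std::vector<int>& r) { return r[0] + r[1]; });
}
```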
Exploring New Computing Paradigms for Data-Intensive Applications
The abstract is in the attachment.
GPU Accelerated protocol analysis for large and long-term traffic traces
This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time-consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight register-based state machine that supports massively parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution and adjusts processing at runtime to avoid redundant memory transactions and unnecessary computation through warp-voting. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to speed up access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry-level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device-side IO (600 MB/s) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark.
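As a rough intuition for what "multi-match filter predicate evaluation" means, the sketch below (a heavily simplified CPU illustration of ours, nothing like GPF+'s actual GPU state machine) extracts two header fields from a raw frame and records every matching filter instead of returning on the first hit.

```cpp
// Illustrative multi-match packet classification: decode fixed header fields
// from an Ethernet/IPv4 frame and collect all matching filters. GPF+ compiles
// filters from its DSL into a register-based state machine run on the GPU.
#include <cstdint>
#include <vector>

struct Filter {
    uint16_t etherType;  // e.g. 0x0800 for IPv4
    uint8_t  ipProto;    // e.g. 6 for TCP, 17 for UDP
};

std::vector<int> classify(const uint8_t* pkt, std::size_t len,
                          const std::vector<Filter>& filters) {
    std::vector<int> matches;
    if (len < 14 + 20) return matches;            // Ethernet header + minimal IPv4
    const uint16_t etherType = uint16_t(pkt[12]) << 8 | pkt[13];
    const uint8_t  proto     = pkt[14 + 9];       // IPv4 protocol field
    for (int i = 0; i < int(filters.size()); ++i)
        if (filters[i].etherType == etherType && filters[i].ipProto == proto)
            matches.push_back(i);                 // multi-match: keep collecting
    return matches;
}
```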
External Verification of SCADA System Embedded Controller Firmware
Critical infrastructures such as oil and gas pipelines, the electric power grid, and railways rely on the proper operation of supervisory control and data acquisition (SCADA) systems. Current SCADA systems, however, do not have sufficient tailored electronic security solutions. Solutions available are developed primarily for information technology (IT) systems. Indeed, the toolkit for SCADA incident prevention and response is unavailing, as the operating parameters associated with SCADA systems differ from those of IT systems. The unique environment necessitates tailored solutions. Consider the programmable logic controllers (PLCs) that directly connect to end physical systems for control and monitoring of operating parameters -- the compromise of a PLC could result in devastating physical consequences. Yet PLCs remain particularly vulnerable due to a lack of firmware auditing capabilities. This research presents a tool we developed specifically for the SCADA environment to verify PLC firmware. The tool does not require any modifications to the SCADA system and can be implemented on a variety of systems and platforms. The tool captures serial data during firmware uploads and then verifies it against a known-good firmware baseline. Attempts to inject modified and/or malicious firmware are identified by the tool. Additionally, the tool can replay and analyze captured data by emulating a PLC during firmware upload. The emulation capability enables verification of the firmware upload from an interface computer without requiring modifications to or interactions with the operational SCADA system. The ability to isolate the tool from production systems and verify the validity of firmware makes the tool a viable application for SCADA incident response teams and security engineers.
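The verification step itself reduces to comparing the captured upload against a trusted image. A minimal sketch of that comparison follows (ours; the actual tool additionally handles serial capture, replay, and PLC emulation as described above).

```cpp
// Compare a captured firmware image byte-for-byte against a known-good
// baseline and report the first divergence. Illustrative sketch only.
#include <algorithm>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

static std::vector<unsigned char> readAll(const char* path) {
    std::ifstream f(path, std::ios::binary);
    return {std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>()};
}

int main(int argc, char** argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s <baseline> <captured>\n", argv[0]);
        return 2;
    }
    const auto base = readAll(argv[1]);
    const auto cap  = readAll(argv[2]);
    const std::size_t n = std::min(base.size(), cap.size());
    for (std::size_t i = 0; i < n; ++i)
        if (base[i] != cap[i]) {
            std::printf("MODIFIED firmware: first difference at offset 0x%zx\n", i);
            return 1;
        }
    if (base.size() != cap.size()) {
        std::puts("MODIFIED firmware: size mismatch");
        return 1;
    }
    std::puts("firmware verified against known-good baseline");
    return 0;
}
```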