
    Distributed Simulation of Statevectors and Density Matrices

    Classical simulation of quantum computers is an irreplaceable step in the design of quantum algorithms. Exponential simulation costs demand the use of high-performance computing techniques, and in particular distribution, whereby the description of the quantum state is partitioned across a network of cooperating computers; this is necessary for the exact simulation of more than approximately 30 qubits. Distributed computing is notoriously difficult, requiring bespoke algorithms dissimilar to their serial counterparts, with different resource considerations, and which appear to restrict the utility of a quantum simulator. This manuscript presents a plethora of novel algorithms for distributed full-state simulation of gates, operators, noise channels and other calculations in digital quantum computers. We show how a simple, common but seemingly restrictive distribution model actually permits a rich set of advanced facilities including Pauli gadgets, many-controlled many-target general unitaries, density matrices, general decoherence channels, and partial traces. These algorithms include asymptotically (polynomially) improved simulations of exotic gates, and thorough motivations for high-performance computing techniques which will be useful even for non-distributed simulators. Our results are derived in language familiar to a quantum information theory audience, and our algorithms are formalised for the scientific simulation community. We have implemented all algorithms presented herein in an isolated, minimalist C++ project, hosted open-source on GitHub under a permissive MIT license and with extensive testing. This manuscript aims both to significantly improve the high-performance quantum simulation tools available and to offer a thorough introduction to, and derivation of, full-state simulation techniques. Comment: 56 pages, 18 figures, 28 algorithms, 1 table
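    As a hedged illustration of the partitioning such a distribution model implies (a minimal sketch under the common convention that the 2^n amplitudes are split evenly over 2^w nodes; this is not the manuscript's actual code), the arithmetic below shows which node owns a given amplitude and which partner node a single-qubit gate must exchange buffers with when its target qubit addresses the node-index bits:

```cpp
// Sketch only: index arithmetic for a statevector of n qubits distributed
// evenly across 2^w nodes, each storing 2^(n-w) contiguous amplitudes.
#include <cstdint>
#include <iostream>

// Rank that stores global amplitude index i: the top w bits of i select the node.
std::uint64_t rankOfAmplitude(std::uint64_t i, int n, int w) {
    return i >> (n - w);
}

// Partner rank for a single-qubit gate on target qubit t. If t addresses one of
// the top w index bits, each amplitude's pair partner lives on another node and
// a pairwise exchange of local buffers is required; otherwise the update is local.
std::int64_t partnerRank(std::uint64_t myRank, int t, int n, int w) {
    if (t < n - w) return -1;                         // pair partner is already local
    return static_cast<std::int64_t>(myRank ^ (1ULL << (t - (n - w))));
}

int main() {
    int n = 30, w = 4;                                // 2^30 amplitudes over 16 nodes (illustrative)
    std::cout << rankOfAmplitude(1ULL << 29, n, w) << "\n";   // node holding amplitude 2^29 -> 8
    std::cout << partnerRank(3, 28, n, w) << "\n";            // exchange partner of rank 3 for qubit 28 -> 7
}
```

    Gates targeting the low n - w qubits therefore require no communication at all, which is one reason such a simple contiguous partitioning is attractive despite appearing restrictive.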

    Quantifying pervasive authentication: the case of the Hancke-Kuhn protocol

    As mobile devices pervade physical space, the familiar authentication patterns are becoming insufficient: besides entity authentication, many applications require, e.g., location authentication. Many interesting protocols have been proposed and implemented to provide such strengthened forms of authentication, but there are very few proofs that such protocols satisfy the required security properties. The logical formalisms devised for reasoning about security protocols on standard computer networks turn out to be difficult to adapt for reasoning about hybrid protocols, used in pervasive and heterogeneous networks. We refine the Dolev-Yao-style algebraic method for protocol analysis by a probabilistic model of guessing, needed to analyze protocols that mix weak cryptography with physical properties of nonstandard communication channels. Applying this model, we provide a precise security proof for a proximity authentication protocol, due to Hancke and Kuhn, that uses a subtle form of probabilistic reasoning to achieve its goals. Comment: 31 pages, 2 figures; a short version of this paper appeared in the Proceedings of MFPS 201
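    As a hedged illustration of the kind of quantitative statement such a probabilistic model of guessing supports (this is the well-known bound from Hancke and Kuhn's own analysis of their protocol, not necessarily the exact quantity proved in this paper): in each rapid challenge-response round, an adversary who does not know the shared key can pre-query the prover with a guessed challenge bit and therefore answer correctly with probability 1/2 + (1/2)(1/2) = 3/4, so over n independent rounds

\[
  \Pr[\text{adversary passes all } n \text{ rounds}] \;\le\; \left(\tfrac{3}{4}\right)^{n},
\]

    and the verifier tunes its assurance simply by choosing the number of rounds n.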

    JANUS: an FPGA-based System for High Performance Scientific Computing

    This paper describes JANUS, a modular, massively parallel, and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data-manipulation instructions, and a database size that is not too large. We discuss the architecture of this configurable machine and focus on its use for Monte Carlo simulations of statistical mechanics. On this class of applications JANUS achieves impressive performance: in some cases one JANUS processing element outperforms high-end PCs by a factor of ~1000. We also discuss the role of JANUS in other classes of scientific applications. Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted to Computing in Science & Engineering
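    To make concrete the kind of regular, nearest-neighbour workload JANUS is built for, here is a hedged sketch of a single Metropolis sweep of a 2D Edwards-Anderson spin glass; the lattice size, temperature, and sequential sweep order are illustrative choices, and this is not JANUS firmware or its host code:

```cpp
// Illustrative sketch: one Metropolis sweep of a 2D +/-1 spin glass with
// periodic boundaries -- the regular, local update pattern that maps well
// onto an array of FPGA processing elements with nearest-neighbour links.
#include <cmath>
#include <random>
#include <vector>

int main() {
    const int L = 16;                                 // lattice side (illustrative size)
    const double beta = 1.0;                          // inverse temperature (assumed value)
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::bernoulli_distribution coin(0.5);

    std::vector<int> s(L * L), Jx(L * L), Jy(L * L);  // spins and quenched couplings
    for (auto& v : s)  v = coin(rng) ? 1 : -1;
    for (auto& v : Jx) v = coin(rng) ? 1 : -1;
    for (auto& v : Jy) v = coin(rng) ? 1 : -1;

    auto idx = [L](int x, int y) { return ((y + L) % L) * L + (x + L) % L; };

    // One sweep: visit every site and propose a single-spin flip.
    for (int y = 0; y < L; ++y)
        for (int x = 0; x < L; ++x) {
            int i = idx(x, y);
            // Local field from the four nearest neighbours, with Jx[i] coupling
            // site (x,y) to (x+1,y) and Jy[i] coupling it to (x,y+1).
            int h = Jx[i] * s[idx(x + 1, y)] + Jx[idx(x - 1, y)] * s[idx(x - 1, y)]
                  + Jy[i] * s[idx(x, y + 1)] + Jy[idx(x, y - 1)] * s[idx(x, y - 1)];
            double dE = 2.0 * s[i] * h;               // energy cost of flipping spin i
            if (dE <= 0.0 || uni(rng) < std::exp(-beta * dE))
                s[i] = -s[i];                         // Metropolis accept/reject
        }
}
```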

    Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

    Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprising conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first-order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.

    A general and efficient divide-and-conquer algorithm framework for multi-core clusters

    This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-017-0766-y [Abstract] Divide-and-conquer is one of the most important patterns of parallelism, being applicable to a large variety of problems. In addition, the most powerful parallel systems available nowadays are computer clusters composed of distributed-memory nodes that contain an increasing number of cores sharing a common memory. The optimal exploitation of these systems often requires resorting to a hybrid model that mimics the underlying hardware by combining a distributed-memory and a shared-memory parallel programming model. This results in longer development times and increased maintenance costs. In this paper we present a very general skeleton library that makes it possible to parallelize any divide-and-conquer problem on hybrid distributed-shared memory systems with little effort, while providing much flexibility and good performance. Our proposal combines a message-passing paradigm at the process level with a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that this skeleton provides performance comparable to, and often better than, that of manually optimized codes, while requiring considerably less effort when parallelizing applications on multi-core clusters. Ministerio de Economía y Competitividad; TIN2013-42148-P. Ministerio de Economía y Competitividad; TIN2016-75845-P. Xunta de Galicia; GRC2013/05.
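    A hedged sketch of the divide-and-conquer pattern such a skeleton generalises is shown below. The interface (the names dac, isBase, base, divide, and combine) is hypothetical and not the library's actual API, and a real skeleton would execute the recursive subtasks across MPI processes and threads rather than sequentially:

```cpp
// Minimal sequential sketch of the divide-and-conquer pattern: the user supplies
// four problem-specific pieces; a skeleton library would hide how the recursion
// is spread over distributed-memory processes and shared-memory threads.
#include <iostream>
#include <numeric>
#include <vector>

template <typename Problem, typename Result,
          typename IsBase, typename Base, typename Divide, typename Combine>
Result dac(const Problem& p, IsBase isBase, Base base, Divide divide, Combine combine) {
    if (isBase(p)) return base(p);
    // A real skeleton would run these subproblems in parallel; this sketch
    // recurses sequentially to keep the pattern visible.
    Result acc{};
    bool first = true;
    for (const Problem& sub : divide(p)) {
        Result r = dac<Problem, Result>(sub, isBase, base, divide, combine);
        acc = first ? r : combine(acc, r);
        first = false;
    }
    return acc;
}

int main() {
    // Toy use: sum a vector by repeatedly halving it.
    std::vector<int> v(1000);
    std::iota(v.begin(), v.end(), 1);
    auto isBase  = [](const std::vector<int>& p) { return p.size() <= 16; };
    auto base    = [](const std::vector<int>& p) { return std::accumulate(p.begin(), p.end(), 0); };
    auto divide  = [](const std::vector<int>& p) {
        auto mid = p.begin() + p.size() / 2;
        std::vector<std::vector<int>> parts;
        parts.emplace_back(p.begin(), mid);
        parts.emplace_back(mid, p.end());
        return parts;
    };
    auto combine = [](int a, int b) { return a + b; };
    std::cout << dac<std::vector<int>, int>(v, isBase, base, divide, combine) << "\n";  // 500500
}
```

    The value of the skeleton approach is that the user supplies only these problem-specific pieces while the library hides the message-passing-plus-threads orchestration.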

    Exploring New Computing Paradigms for Data-Intensive Applications

    The abstract is in the attachment.

    GPU Accelerated protocol analysis for large and long-term traffic traces

    This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time-consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight register-based state machine that supports massively parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution and adjusts processing at runtime, using warp voting to avoid redundant memory transactions and unnecessary computation. GPF+ additionally incorporates a 128-bit in-thread cache, implemented with register shuffling, to accelerate access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry-level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device-side I/O (600 MB/s) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark.
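    As a hedged, CPU-side illustration of the per-packet work a classifier performs (field extraction at protocol-defined offsets followed by filter-predicate evaluation), the sketch below checks a few simple predicates on a raw Ethernet/IPv4 frame. This is not GPF+ code: GPF+ runs this kind of logic massively in parallel on the GPU, driven by a register-based state machine compiled from its DSL.

```cpp
// Sketch only: extract fields at known offsets from a raw packet buffer and
// evaluate a small set of filter predicates against them.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Read a big-endian 16-bit field from a raw packet buffer.
std::uint16_t be16(const std::uint8_t* p) { return (std::uint16_t(p[0]) << 8) | p[1]; }

struct Matches { bool isIPv4; bool isTCP; bool dstPort80; };

Matches classify(const std::uint8_t* pkt, std::size_t len) {
    Matches m{false, false, false};
    if (len < 14) return m;
    m.isIPv4 = be16(pkt + 12) == 0x0800;                 // Ethernet EtherType
    if (!m.isIPv4 || len < 34) return m;
    std::size_t ihl = (pkt[14] & 0x0F) * 4;              // IPv4 header length in bytes
    m.isTCP = pkt[14 + 9] == 6;                          // IPv4 protocol field
    if (m.isTCP && len >= 14 + ihl + 4)
        m.dstPort80 = be16(pkt + 14 + ihl + 2) == 80;    // TCP destination port
    return m;
}

int main() {
    std::vector<std::uint8_t> pkt(64, 0);                // synthetic frame for the demo
    pkt[12] = 0x08; pkt[13] = 0x00;                      // EtherType = IPv4
    pkt[14] = 0x45;                                      // version 4, IHL 5 (20 bytes)
    pkt[14 + 9] = 6;                                     // protocol = TCP
    pkt[14 + 20 + 3] = 80;                               // TCP destination port 80
    Matches m = classify(pkt.data(), pkt.size());
    std::cout << m.isIPv4 << m.isTCP << m.dstPort80 << "\n";  // prints 111
}
```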

    External Verification of SCADA System Embedded Controller Firmware

    Critical infrastructures such as oil and gas pipelines, the electric power grid, and railways rely on the proper operation of supervisory control and data acquisition (SCADA) systems. Current SCADA systems, however, do not have sufficient tailored electronic security solutions; available solutions are developed primarily for information technology (IT) systems. Indeed, the toolkit for SCADA incident prevention and response is ineffective because the operating parameters of SCADA systems differ from those of IT systems. This unique environment necessitates tailored solutions. Consider the programmable logic controllers (PLCs) that directly connect to end physical systems for control and monitoring of operating parameters -- the compromise of a PLC could result in devastating physical consequences. Yet PLCs remain particularly vulnerable due to a lack of firmware auditing capabilities. This research presents a tool we developed specifically for the SCADA environment to verify PLC firmware. The tool does not require any modifications to the SCADA system and can be implemented on a variety of systems and platforms. The tool captures serial data during firmware uploads and then verifies it against a known-good firmware baseline. Attempts to inject modified and/or malicious firmware are identified by the tool. Additionally, the tool can replay and analyze captured data by emulating a PLC during a firmware upload. The emulation capability enables verification of the firmware upload from an interface computer without requiring modifications to, or interactions with, the operational SCADA system. The ability to isolate the tool from production systems and verify the validity of firmware makes it a viable application for SCADA incident response teams and security engineers.
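    A hedged sketch of the core comparison such a verification tool performs is shown below: a captured firmware image is checked against a known-good baseline. The command-line names and the use of a simple FNV-1a digest (rather than a cryptographic hash such as SHA-256, which a real tool would use) are illustrative simplifications, not the thesis tool's design.

```cpp
// Sketch only: compare a captured firmware image against a known-good baseline,
// both by size and by a digest of the bytes.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// 64-bit FNV-1a digest of a byte buffer (illustrative; not cryptographically strong).
std::uint64_t fnv1a(const std::vector<std::uint8_t>& data) {
    std::uint64_t h = 14695981039346656037ull;            // FNV offset basis
    for (std::uint8_t b : data) { h ^= b; h *= 1099511628211ull; }   // FNV prime
    return h;
}

// Read an entire binary file into memory.
std::vector<std::uint8_t> readAll(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(f),
                                     std::istreambuf_iterator<char>());
}

int main(int argc, char** argv) {
    if (argc != 3) { std::cerr << "usage: verify <captured.bin> <baseline.bin>\n"; return 2; }
    auto captured = readAll(argv[1]);                      // serial capture of the firmware upload
    auto baseline = readAll(argv[2]);                      // known-good firmware image
    bool ok = captured.size() == baseline.size() && fnv1a(captured) == fnv1a(baseline);
    std::cout << (ok ? "firmware matches baseline\n" : "MISMATCH: possible modified firmware\n");
    return ok ? 0 : 1;
}
```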