
    Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name

    An `obfuscation' for encrypted computing is quantified exactly here, leading to an argument that security against polynomial-time attacks has been achieved for user data via the deliberately `chaotic' compilation required for security properties in that environment. Encrypted computing is the emerging science and technology of processors that take encrypted inputs to encrypted outputs via encrypted intermediate values (at nearly conventional speeds). The aim is to make user data in general-purpose computing secure against the operator and operating system as potential adversaries. A stumbling block has always been that memory addresses are data, and good encryption means an encrypted value varies randomly, which makes hitting any target in memory problematic without address decryption; yet decryption anywhere on the memory path would open up many easily exploitable vulnerabilities. This paper `solves (chaotic) compilation' for processors without address decryption, covering all of ANSI C while satisfying the required security properties, and opens up the field to the standard software tool-chain and infrastructure. That produces the argument referred to above, which may also hold without encryption.
    Comment: 31 pages. Version update adds "Chaotic" in the title and throughout the paper, and recasts the abstract, introduction and other sections of the text for better access by cryptologists. To the same end it introduces the polynomial-time defense argument explicitly in the final section, having now set that denouement out in the abstract and introduction.
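    The paper's compiler is not reproduced here, but the flavor of `chaotic' compilation can be illustrated with a toy sketch (all names and the masking scheme below are invented for illustration, not the paper's scheme): each recompilation gives every variable a fresh random additive offset, and the emitted arithmetic carries correcting constants so that masked values still compute the right results.

        # Toy illustration of obfuscation by per-compilation random offsets.
        # A sketch of the general idea only, not the paper's compiler.
        import random

        WORD = 2**32  # 32-bit modular arithmetic

        class ChaoticCompiler:
            def __init__(self, seed=None):
                self.rng = random.Random(seed)
                self.mask = {}  # variable -> secret offset, fresh each compilation

            def declare(self, var):
                self.mask[var] = self.rng.randrange(WORD)

            def store(self, var, value):
                # the runtime only ever sees value + mask (mod 2^32)
                return (value + self.mask[var]) % WORD

            def compile_add(self, dst, a, b):
                # emit a constant that reconciles the three masks
                k = (self.mask[dst] - self.mask[a] - self.mask[b]) % WORD
                return lambda ra, rb: (ra + rb + k) % WORD

            def load(self, var, runtime_value):
                return (runtime_value - self.mask[var]) % WORD

        cc = ChaoticCompiler(seed=1)
        for v in ("x", "y", "z"):
            cc.declare(v)
        add = cc.compile_add("z", "x", "y")
        rx, ry = cc.store("x", 7), cc.store("y", 35)
        assert cc.load("z", add(rx, ry)) == 42  # masked operands, correct result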

    Parallel/distributed direct method for solving linear systems

    A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit near-optimal performance and enjoy several important features: (1) for large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors, as performance grows monotonically with them; (2) the schemes are especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) they can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) this set of algorithms can be mapped onto any parallel architecture without major programming difficulties or algorithmic changes.
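    As one concrete, standard instance of a direct parallel scheme (the abstract does not specify the family's algorithms, so this is illustrative only), the sketch below simulates Gaussian elimination with rows dealt cyclically to P processors; in a real distributed run the per-processor update loops execute concurrently and the pivot row is broadcast.

        # Row-cyclic Gaussian elimination, processors simulated in one process.
        # Illustrative sketch only; no pivoting, for brevity.
        import numpy as np

        def cyclic_gaussian_elimination(A, b, P=4):
            A, b = A.astype(float), b.astype(float)
            n = len(b)
            owner = lambda row: row % P              # cyclic row distribution
            for k in range(n):
                pivot_row, pivot_rhs = A[k], b[k]    # owner(k) broadcasts these
                for p in range(P):                   # each processor, concurrently,
                    for i in range(k + 1, n):        # updates only the rows it owns
                        if owner(i) != p:
                            continue
                        f = A[i, k] / pivot_row[k]
                        A[i, k:] -= f * pivot_row[k:]
                        b[i] -= f * pivot_rhs
            x = np.zeros(n)                          # back substitution (serial here)
            for i in reversed(range(n)):
                x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
            return x

        A = np.array([[4.0, 1, 2], [1, 5, 1], [2, 1, 6]])
        b = np.array([7.0, 8, 9])
        assert np.allclose(cyclic_gaussian_elimination(A, b), np.linalg.solve(A, b))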

    Synchronization Landscapes in Small-World-Connected Computer Networks

    Motivated by a synchronization problem in distributed computing, we studied a simple growth model on regular and small-world networks, embedded in one and two dimensions. We find that the synchronization landscape (corresponding to the progress of the individual processors) exhibits Kardar-Parisi-Zhang-like kinetic roughening on regular networks with short-range communication links. Although the processors, on average, progress at a nonzero rate, their spread (the width of the synchronization landscape) diverges with the number of nodes (desynchronized state), hindering efficient data management. When random communication links are added on top of the one- and two-dimensional regular networks (resulting in a small-world network), large fluctuations in the synchronization landscape are suppressed and the width approaches a finite value in the large system-size limit (synchronized state). In the resulting synchronization scheme, the processors make close-to-uniform progress at a nonzero rate without global intervention. We obtain our results by "simulating the simulations", based on the exact algorithmic rules, supported by coarse-grained arguments.
    Comment: 20 pages, 22 figures
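    A minimal sketch of the growth model as described (the conservative update rule for parallel discrete-event simulation; all parameters below are illustrative): a node may advance its local virtual time only if it does not lead any of its neighbors, and random shortcuts turn the ring into a small world, which bounds the width of the time landscape.

        # One-dimensional ring plus random shortcuts; measures the mean and
        # width of the virtual-time ("synchronization") landscape.
        import math
        import random

        def simulate(N=1000, sweeps=200, p_shortcut=0.1, seed=0):
            rng = random.Random(seed)
            tau = [0.0] * N                                  # local virtual times
            neigh = [[(i - 1) % N, (i + 1) % N] for i in range(N)]
            for i in range(N):                               # small-world links
                if rng.random() < p_shortcut:
                    j = rng.randrange(N)
                    neigh[i].append(j)
                    neigh[j].append(i)
            for _ in range(sweeps):
                for _ in range(N):                           # one sweep = N attempts
                    i = rng.randrange(N)
                    if all(tau[i] <= tau[j] for j in neigh[i]):
                        tau[i] += rng.expovariate(1.0)       # exponential increment
            mean = sum(tau) / N
            width = math.sqrt(sum((t - mean) ** 2 for t in tau) / N)
            return mean, width

        print(simulate())        # with p_shortcut=0 the width keeps growing with N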

    Integrated spatial multiplexing of heralded single photon sources

    The non-deterministic nature of photon sources is a key limitation for single-photon quantum processors. Spatial multiplexing overcomes this by enhancing the heralded single-photon yield without enhancing the output noise. Here the intrinsic statistical limit of an individual source is surpassed by spatially multiplexing two monolithic silicon correlated photon-pair sources, demonstrating a 62.4% increase in the heralded single-photon output without an increase in unwanted multi-pair generation. We further demonstrate the scalability of this scheme by multiplexing photons generated in two waveguides pumped via an integrated coupler, with a 63.1% increase in the heralded photon rate. This demonstration paves the way for a scalable architecture for multiplexing many photon sources in a compact integrated platform and achieving efficient two-photon interference, required at the core of optical quantum computing and quantum communication protocols.
    Comment: 10 pages, 3 figures, comments welcome
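    Why multiplexing raises the heralded rate without raising noise can be seen with a back-of-the-envelope calculation (the numbers below are assumed, not the paper's measured values): with N identical sources each heralding a pair with probability p per pulse, the chance that at least one fires is 1 - (1 - p)^N, while only the firing source is switched to the output, so multi-pair noise per heralded photon does not grow with N.

        # Ideal multiplexing gain for N heralded single-photon sources.
        def herald_rate(p, N):
            return 1 - (1 - p) ** N

        p = 0.05                               # assumed per-pulse pair probability
        one, two = herald_rate(p, 1), herald_rate(p, 2)
        print(f"1 source : {one:.4f} per pulse")
        print(f"2 sources: {two:.4f} per pulse ({100 * (two / one - 1):.1f}% gain)")
        # The ideal two-source gain approaches 100% at small p; switching loss
        # and component non-idealities explain measured gains near 62-63%.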

    Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

    Energy efficiency is becoming increasingly important for computing systems, in particular for large-scale HPC facilities. In this work we evaluate, from a user's perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS) techniques, assisted by the power and energy monitoring capabilities of modern processors, in order to tune applications for energy efficiency. We run selected kernels and a full HPC application on two high-end processors widely used in the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate the available trade-offs between energy-to-solution and time-to-solution, attempting a function-by-function frequency tuning. We finally estimate the benefits obtainable by running the full code on an HPC multi-GPU node, with respect to the default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need for any additional hardware, and we enable it to change CPU and GPU clock frequencies while running. We analyze our results on the different architectures using a simple energy-performance model, and derive a number of energy-saving strategies which can easily be adopted on recent high-end HPC systems for generic applications.
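    The kind of in-code instrumentation described is possible with standard OS interfaces alone. Below is a minimal sketch (assuming Linux with the Intel RAPL powercap sysfs interface exposed; reading it may require elevated privileges) of measuring package energy-to-solution and time-to-solution around a code region.

        # Measure CPU package energy and wall time around a function call
        # using the RAPL counters under /sys/class/powercap (Linux).
        import time

        RAPL = "/sys/class/powercap/intel-rapl:0"    # package-0 energy domain

        def read_uj(name):
            with open(f"{RAPL}/{name}") as f:
                return int(f.read())

        def measure(fn, *args):
            max_uj = read_uj("max_energy_range_uj")
            e0, t0 = read_uj("energy_uj"), time.perf_counter()
            result = fn(*args)
            e1, t1 = read_uj("energy_uj"), time.perf_counter()
            energy_j = ((e1 - e0) % max_uj) / 1e6    # handles counter wrap-around
            return result, energy_j, t1 - t0

        _, joules, seconds = measure(sum, range(10**7))
        print(f"energy-to-solution: {joules:.2f} J, time: {seconds:.3f} s")
        # GPU application clocks can be changed from the same script, e.g. by
        # invoking `nvidia-smi -ac <mem_clock,graphics_clock>` (needs privileges).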

    DLP acceleration on general purpose cores

    High-performance and power-efficient multimedia computing drives the design of modern and increasingly utilized mobile devices. State-of-the-art low-power processors already utilize chip multiprocessors (CMPs) that add dedicated DLP accelerators for emerging multimedia applications and 3D games. Such heterogeneous processors deliver the desired performance and efficiency at the cost of extra specialized accelerator hardware. In this paper, we propose dynamically-tuned vector execution (DVX), morphing one or more available cores in a CMP into a DLP accelerator. DVX improves the performance and power efficiency of the CMP without the additional cost of dedicated accelerators.
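    As a loose software analogy for the morphing idea (this is not the paper's microarchitecture, only an illustration of data-level parallelism spread across cores): a single data-parallel loop can be split across the cores of a CMP so that together they behave like one wide vector unit.

        # SAXPY (y += a*x) split across "cores"; a software analogy only.
        from concurrent.futures import ThreadPoolExecutor

        def saxpy_chunk(args):
            a, x, y, lo, hi = args
            for i in range(lo, hi):          # each core handles one lane group
                y[i] += a * x[i]

        def saxpy_dvx(a, x, y, cores=4):
            n = len(x)
            step = -(-n // cores)            # ceiling division
            chunks = [(a, x, y, i, min(i + step, n)) for i in range(0, n, step)]
            with ThreadPoolExecutor(max_workers=cores) as pool:
                list(pool.map(saxpy_chunk, chunks))

        x, y = [1.0] * 8, [2.0] * 8
        saxpy_dvx(3.0, x, y)
        assert y == [5.0] * 8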

    rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

    Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self-scheduling (DLS) is used to achieve balanced load execution of such applications on high-performance computing (HPC) systems. Large HPC systems are vulnerable to processor or node failures and to perturbations in the availability of resources. Most self-scheduling approaches do not consider fault-tolerant scheduling, or they depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self-scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and that its cost decreases quadratically as the system size increases. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show that rDLB tolerates up to (P - 1) processor failures, where P is the number of processors executing an application. In the presence of perturbations, rDLB boosted the robustness of DLS techniques up to 30 times and decreased application execution time up to 7 times compared to their counterparts without rDLB.
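    A minimal sketch of the proactive idea as described (the queue-and-redo mechanics below are our illustration, not the library's implementation): no failure detection at all; processors that run out of new tasks simply re-execute tasks nobody has yet reported finished, so the computation completes as long as at least one processor survives.

        # Self-scheduling with proactive task re-execution; two workers
        # "fail" silently and their tasks are redone by the survivors.
        import queue
        import threading

        def rdlb_run(tasks, P=4, fail=(1, 2)):
            pending = queue.Queue()
            for t in tasks:
                pending.put(t)
            done, lock = set(), threading.Lock()

            def worker(rank):
                while True:
                    try:
                        t = pending.get_nowait()         # normal self-scheduling
                    except queue.Empty:
                        with lock:                       # queue drained: proactively
                            redo = [t for t in tasks if t not in done]
                        if not redo:
                            return                       # everything confirmed done
                        t = redo[0]                      # re-execute unfinished task
                    if rank in fail:
                        return                           # silent failure, no signal
                    with lock:
                        done.add(t)                      # "execute" and report

            threads = [threading.Thread(target=worker, args=(r,)) for r in range(P)]
            for th in threads:
                th.start()
            for th in threads:
                th.join()
            return done

        assert rdlb_run(list(range(20))) == set(range(20))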