Chaotic Compilation for Encrypted Computing: Obfuscation but Not in Name
An `obfuscation' for encrypted computing is quantified exactly here, leading
to an argument that security against polynomial-time attacks has been achieved
for user data via the deliberately `chaotic' compilation required for security
properties in that environment. Encrypted computing is the emerging science and
technology of processors that take encrypted inputs to encrypted outputs via
encrypted intermediate values (at nearly conventional speeds). The aim is to
make user data in general-purpose computing secure against the operator and
operating system as potential adversaries. A stumbling block has always been
that memory addresses are data, and good encryption means an encrypted value
varies randomly, which makes hitting any target in memory problematic
without address decryption; yet decryption anywhere on the memory path would
open up many easily exploitable vulnerabilities. This paper `solves (chaotic)
compilation' for processors without address decryption, covering all of ANSI C
while satisfying the required security properties and opening up the field for
the standard software tool-chain and infrastructure. That produces the argument
referred to above, which may also hold without encryption.
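
As a rough illustration of the kind of data obfuscation the abstract describes (a toy model only; the paper's instruction set, cipher, and exact scheme are not reproduced here), a compiler can displace every intermediate value by a randomly chosen per-site offset and fold the compensating constant into the emitted code, so any runtime value the operator observes is shifted by an offset unknown to them:

    # Toy sketch of delta-offset (`chaotic') compilation: every program point
    # gets a fresh random offset, and the compiled operation compensates so
    # the data owner can still recover the true result. The 32-bit word model
    # and all names are illustrative assumptions, not the paper's design.
    import random

    rng = random.SystemRandom()
    WORD = 2 ** 32  # toy 32-bit machine words

    def compile_add(dx, dy, dz):
        """Emit an add mapping (x + dx, y + dy) -> (x + y) + dz mod 2^32."""
        k = (dz - dx - dy) % WORD  # constant the compiler folds into the code
        return lambda a, b: (a + b + k) % WORD

    dx, dy, dz = (rng.randrange(WORD) for _ in range(3))  # per-site offsets
    add = compile_add(dx, dy, dz)

    x, y = 7, 35
    observed = add((x + dx) % WORD, (y + dy) % WORD)  # what an operator sees
    print((observed - dz) % WORD)                     # owner recovers 42

Because the offsets are drawn fresh and uniformly at random, the observed operands and result are uniformly distributed, which is the flavor of entropy argument on which a polynomial-time defense claim can be built.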
Parallel/distributed direct method for solving linear systems
A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit near-optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors, as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.
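
The abstract does not spell out the schemes, but the flavor of a parallel direct solve can be sketched as row-striped Gaussian elimination, where each of P workers owns a cyclic stripe of rows and all row updates at a given elimination step are independent (a hypothetical illustration, not the paper's algorithm):

    # Minimal sketch: forward elimination with rows assigned cyclically to P
    # workers. In a real run, row i is updated by worker i % P in parallel;
    # here the update loop runs serially, since the point is the data
    # partitioning, not the runtime. Illustrative only.
    import numpy as np

    def striped_gaussian_elimination(A, b, P=4):
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        for k in range(n - 1):
            pivot = A[k, k:]  # owner of row k broadcasts the pivot row
            for i in range(k + 1, n):  # independent updates: worker i % P
                m = A[i, k] / A[k, k]
                A[i, k:] -= m * pivot
                b[i] -= m * b[k]
        x = np.zeros(n)  # back substitution, serial here for brevity
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

    A = np.random.rand(8, 8) + 8 * np.eye(8)  # diagonally dominant: no pivoting
    b = np.random.rand(8)
    print(np.allclose(striped_gaussian_elimination(A, b), np.linalg.solve(A, b)))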
Synchronization Landscapes in Small-World-Connected Computer Networks
Motivated by a synchronization problem in distributed computing, we studied a
simple growth model on regular and small-world networks, embedded in one and
two dimensions. We find that the synchronization landscape (corresponding to
the progress of the individual processors) exhibits Kardar-Parisi-Zhang-like
kinetic roughening on regular networks with short-range communication links.
Although the processors, on average, progress at a nonzero rate, their spread
(the width of the synchronization landscape) diverges with the number of nodes
(desynchronized state) hindering efficient data management. When random
communication links are added on top of the one- and two-dimensional regular
networks (resulting in a small-world network), large fluctuations in the
synchronization landscape are suppressed and the width approaches a finite
value in the large system-size limit (synchronized state). In the resulting
synchronization scheme, the processors make close-to-uniform progress with a
nonzero rate without global intervention. We obtain our results by ``simulating
the simulations'', based on the exact algorithmic rules, supported by
coarse-grained arguments.
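
The underlying update rule from this line of work on conservative parallel discrete-event simulation is simple enough to sketch: a node may advance its local virtual time only if it does not lead any of its neighbors. The toy simulation below (sizes and the one-shortcut-per-node small-world construction are assumptions) shows the width of the landscape staying far smaller once random links are added:

    # Hedged sketch of the conservative update rule: node i increments its
    # virtual time tau_i by a random exponential amount only when
    # tau_i <= tau_j for every neighbor j. The squared width of the landscape
    # measures desynchronization.
    import random

    def width_squared(N=200, sweeps=2000, shortcuts=False, seed=1):
        rng = random.Random(seed)
        tau = [0.0] * N
        nbrs = [{(i - 1) % N, (i + 1) % N} for i in range(N)]  # 1D ring
        if shortcuts:  # add one random long-range link per node
            for i in range(N):
                j = rng.randrange(N)
                if j != i:
                    nbrs[i].add(j)
                    nbrs[j].add(i)
        for _ in range(sweeps):
            for i in rng.sample(range(N), N):  # random sequential updates
                if all(tau[i] <= tau[j] for j in nbrs[i]):
                    tau[i] += rng.expovariate(1.0)
        mean = sum(tau) / N
        return sum((t - mean) ** 2 for t in tau) / N

    print("regular ring width^2:", width_squared(shortcuts=False))
    print("small-world  width^2:", width_squared(shortcuts=True))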
Integrated spatial multiplexing of heralded single photon sources
The non-deterministic nature of photon sources is a key limitation for single
photon quantum processors. Spatial multiplexing overcomes this by enhancing the
heralded single photon yield without increasing the output noise. Here the
intrinsic statistical limit of an individual source is surpassed by spatially
multiplexing two monolithic silicon correlated photon pair sources,
demonstrating a 62.4% increase in the heralded single photon output without an
increase in unwanted multi-pair generation. We further demonstrate the
scalability of this scheme by multiplexing photons generated in two waveguides
pumped via an integrated coupler with a 63.1% increase in the heralded photon
rate. This demonstration paves the way for a scalable architecture for
multiplexing many photon sources in a compact integrated platform and achieving
efficient two photon interference, required at the core of optical quantum
computing and quantum communication protocols.
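
A back-of-envelope calculation (with an assumed pair-generation probability, not the paper's measured parameters) shows why multiplexing helps: the chance that at least one of two sources heralds exceeds the single-source rate, while the multi-pair probability of whichever source fires is unchanged. Component losses explain why the measured 62.4% gain falls short of the lossless bound:

    # Ideal two-source spatial multiplexing: a switch routes whichever source
    # heralded to the single output, so output noise per pulse stays at the
    # single-source level. p is an assumed per-pulse pair probability.
    p = 0.05
    single = p                      # heralding probability, one source
    two = 1 - (1 - p) ** 2          # at least one of two sources heralds
    print(f"single source: {single:.4f}")
    print(f"two sources  : {two:.4f}")
    print(f"ideal gain   : {100 * (two / single - 1):.1f}%")  # = 100*(1-p)%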
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
Energy efficiency is becoming increasingly important for computing systems,
in particular for large-scale HPC facilities. In this work we evaluate, from a
user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS)
techniques, assisted by the power and energy monitoring capabilities of modern
processors in order to tune applications for energy efficiency. We run selected
kernels and a full HPC application on two high-end processors widely used in
the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate
the available trade-offs between energy-to-solution and time-to-solution,
attempting a function-by-function frequency tuning. We finally estimate the
benefits obtainable by running the full code on an HPC multi-GPU node, with respect
to default clock frequency governors. We instrument our code to accurately
monitor power consumption and execution time without the need for any additional
hardware, and we enable it to change CPUs and GPUs clock frequencies while
running. We analyze our results on the different architectures using a simple
energy-performance model, and derive a number of energy saving strategies which
can be easily adopted on recent high-end HPC systems for generic applications.
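
A toy version of the kind of energy-performance model the abstract mentions (all constants below are illustrative assumptions, not measured values) already shows why the energy-optimal frequency is generally neither the lowest nor the highest: static power favors finishing fast, while dynamic power, growing roughly as f*V^2, favors slowing down:

    # Toy energy-to-solution model for a compute-bound kernel: time ~ 1/f,
    # dynamic power ~ C * f * V(f)^2, static power paid for the whole run.
    # The V-f relation and all constants are assumptions for illustration.
    def energy_to_solution(f_ghz, ops=1e9, p_static=20.0, c=8.0):
        t = ops / (f_ghz * 1e9)        # seconds, assuming one op per cycle
        v = 0.6 + 0.25 * f_ghz         # crude linear voltage-frequency relation
        p_dyn = c * f_ghz * v ** 2     # dynamic power in watts
        return (p_static + p_dyn) * t  # joules

    for f in (1.2, 1.6, 2.0, 2.4, 2.8):
        print(f"{f:.1f} GHz -> {energy_to_solution(f):6.2f} J")

With these assumed numbers the minimum falls near 2.0 GHz; the same sweep, fed with measured power and timings per function, is the essence of the function-by-function tuning described above.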
DLP acceleration on general purpose cores
High-performance and power-efficient multimedia
computing drives the design of modern and increasingly utilized
mobile devices. State-of-the-art low-power processors already utilize
chip multiprocessors (CMP) that add dedicated DLP accelerators
for emerging multimedia applications and 3D games. Such
heterogeneous processors deliver the desired performance and efficiency
at the cost of extra hardware for specialized accelerators. In this paper,
we propose dynamically-tuned vector execution (DVX) by morphing
one or more available cores in a CMP into a DLP accelerator. DVX
improves performance and power efficiency of the CMP, without
additional costs for dedicated accelerators.
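
For readers outside the architecture community, the workload class targeted here is data-level parallelism: one operation applied across many independent data elements. The sketch below merely contrasts element-at-a-time with whole-array execution; DVX itself is a hardware technique and is not modeled:

    # Illustration of DLP, not of DVX: the same multiply-add applied across a
    # million pixels, once as a scalar loop and once in vectorized form.
    import time
    import numpy as np

    pixels = np.random.rand(1_000_000).astype(np.float32)

    t0 = time.perf_counter()
    scalar = [p * 1.5 + 0.1 for p in pixels]  # one element per step
    t1 = time.perf_counter()
    vector = pixels * 1.5 + 0.1               # whole array per operation
    t2 = time.perf_counter()

    print(f"scalar loop: {t1 - t0:.3f} s")
    print(f"vectorized : {t2 - t1:.3f} s")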
rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Scientific applications often contain large and computationally intensive
parallel loops. Dynamic loop self-scheduling (DLS) is used to achieve a
balanced load execution of such applications on high performance computing
(HPC) systems. Large HPC systems are vulnerable to processor or node failures
and perturbations in the availability of resources. Most self-scheduling
approaches either do not consider fault-tolerant scheduling at all or depend
on failure or perturbation detection and react by rescheduling failed tasks.
In this work, a robust dynamic load balancing (rDLB) approach is proposed for
the self-scheduling of independent tasks. The proposed approach is proactive and does
not depend on failure or perturbation detection. The theoretical analysis of
the proposed approach shows that it is linearly scalable and that its cost
decreases quadratically as the system size increases. rDLB is integrated into an MPI DLS
library to evaluate its performance experimentally with two computationally
intensive scientific applications. Results show that rDLB enables the tolerance
of up to P-1 processor failures, where P is the number of processors
executing an application. In the presence of perturbations, rDLB boosted the
robustness of DLS techniques up to 30 times and decreased application execution
time up to 7 times compared to their counterparts without rDLB.
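
The proactive idea is easy to sketch: idle workers keep asking for work, and once the fresh queue is drained, any task whose result has not arrived is simply re-issued to the next requester, so failures never have to be detected. A minimal toy model follows (the real rDLB is an MPI library; every name and number below is illustrative):

    # Toy model of detection-free fault tolerance: one worker never returns
    # results, yet all tasks finish because unacknowledged tasks are re-issued.
    import random

    def run(num_tasks=12, workers=("w0", "w1", "w2_failed", "w3"), seed=1):
        rng = random.Random(seed)
        fresh = list(range(num_tasks))  # tasks never handed out yet
        inflight = []                   # handed out, result not yet received
        done = set()
        while len(done) < num_tasks:
            w = rng.choice(workers)     # any worker, even a dead one, asks
            task = fresh.pop(0) if fresh else inflight[0]
            if task not in inflight:
                inflight.append(task)
            if "failed" not in w:       # live workers return the result
                done.add(task)
                inflight.remove(task)
            # a failed worker returns nothing: the task stays in-flight and
            # is re-issued once the fresh queue is empty
        return len(done)

    print(run())  # -> 12: every task completes without failure detection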