Byzantine Generals in the Permissionless Setting
Consensus protocols have traditionally been studied in a setting where all
participants are known to each other from the start of the protocol execution.
In the parlance of the 'blockchain' literature, this is referred to as the
permissioned setting. What differentiates Bitcoin from these previously studied
protocols is that it operates in a permissionless setting, i.e. it is a
protocol for establishing consensus over an unknown network of participants
that anybody can join, with as many identities as they like in any role. The
arrival of this new form of protocol brings with it many questions. Beyond
Bitcoin, what can we prove about permissionless protocols in a general sense?
How does recent work on permissionless protocols in the blockchain literature
relate to the well-developed history of research on permissioned protocols in
distributed computing?
To answer these questions, we describe a formal framework for the analysis of
both permissioned and permissionless systems. Our framework allows for
"apples-to-apples" comparisons between different categories of protocols and,
in turn, the development of theory to formally discuss their relative merits. A
major benefit of the framework is that it facilitates the application of a rich
history of proofs and techniques in distributed computing to problems in
blockchain and the study of permissionless systems. Within our framework, we
then address the questions above. We consider the Byzantine Generals Problem as
a formalisation of the problem of reaching consensus, and address a programme
of research that asks, "Under what adversarial conditions, and for what types
of permissionless protocol, is consensus possible?" We prove a number of
results for this programme, our main result being that deterministic consensus
is not possible for decentralised permissionless protocols. To close, we give a
list of eight open questions.
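For readers unfamiliar with the problem, the following minimal Python sketch (our illustration; the names `General` and `naive_round` are hypothetical, not from the paper) shows a single naive voting round in the permissioned setting and why a traitor's ability to equivocate makes agreement difficult. Real Byzantine fault-tolerant protocols need multiple rounds and, in the deterministic unauthenticated setting, at least 3f + 1 participants to tolerate f traitors.

```python
# Minimal sketch of one naive voting round of the Byzantine Generals Problem
# in the permissioned setting (all participants known in advance). This is
# an illustration of the difficulty, not a correct consensus protocol.
import random

class General:
    def __init__(self, name, value, traitor=False):
        self.name, self.value, self.traitor = name, value, traitor

    def send(self, peer):
        # A traitor can equivocate: report different values to different peers.
        return random.choice([0, 1]) if self.traitor else self.value

def naive_round(generals):
    # Each general takes a majority over the values it received plus its own.
    decisions = {}
    for g in generals:
        votes = [p.send(g) for p in generals if p is not g] + [g.value]
        decisions[g.name] = max(set(votes), key=votes.count)
    return decisions

generals = [General("A", 1), General("B", 1),
            General("C", 0, traitor=True), General("D", 1)]
print(naive_round(generals))
# One round is not enough in general: because the traitor may tell different
# generals different things, loyal generals can be driven to disagree, which
# is why multi-round protocols (and bounds like n >= 3f + 1) are needed.
```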
Parallel TREE code for two-component ultracold plasma analysis
The TREE method has been widely used for long-range interaction N-body
problems. We have developed a parallel TREE code for two-component classical
plasmas with open boundary conditions and highly non-uniform charge
distributions. The program efficiently handles millions of particles evolved
over long relaxation times requiring millions of time steps. Appropriate domain
decomposition and dynamic data management were employed, and large-scale
parallel processing was achieved using an intermediate level of granularity of
domain decomposition and ghost TREE communication. Even though the
computational load is not fully distributed in fine grains, high parallel
efficiency was achieved for ultracold plasma systems of charged particles. As
an application, we performed simulations of an ultracold neutral plasma with a
half million particles and a half million time steps. For the long temporal
trajectories of relaxation between heavy ions and light electrons, large
configurations of ultracold plasmas can now be investigated, which was not
possible in past studies.
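To illustrate the underlying idea (our serial sketch, not the authors' parallel implementation), a tree code approximates the field of a distant cell by an aggregate of the charges it contains, with an opening-angle parameter controlling the accuracy/speed trade-off:

```python
# Minimal serial sketch of the Barnes-Hut tree idea behind tree codes
# (illustrative only; the authors' code is parallel, with domain
# decomposition and ghost-tree communication). Units are arbitrary.
import numpy as np

class Cell:
    def __init__(self, center, size, pts, q):
        self.center, self.size = center, size
        self.q = q.sum()                      # net charge of the cell
        # Centre of charge; for a nearly neutral cell this monopole picture
        # breaks down, which is why real two-component codes also keep
        # dipole and higher multipole moments.
        self.coc = (pts * q[:, None]).sum(0) / self.q if self.q != 0 else center
        self.children = []
        if len(pts) > 1 and size > 1e-6:      # subdivide into octants
            for dx in (-1, 1):
                for dy in (-1, 1):
                    for dz in (-1, 1):
                        c = center + 0.25 * size * np.array([dx, dy, dz])
                        m = np.all(np.abs(pts - c) <= 0.25 * size, axis=1)
                        if m.any():
                            self.children.append(Cell(c, size / 2, pts[m], q[m]))

def efield(cell, x, theta=0.5):
    d = x - cell.coc
    r = np.linalg.norm(d)
    if r == 0:
        return np.zeros(3)
    # Far-away or leaf cells are treated as a single aggregate charge;
    # otherwise descend into the children (the opening-angle criterion).
    if cell.size / r < theta or not cell.children:
        return cell.q * d / r**3              # Coulomb field, Gaussian units
    return sum((efield(c, x, theta) for c in cell.children), np.zeros(3))

pts = np.random.rand(1000, 3)                        # positions in the unit box
q = np.where(np.random.rand(1000) < 0.5, 1.0, -1.0)  # ions and electrons
root = Cell(np.array([0.5, 0.5, 0.5]), 1.0, pts, q)
print(efield(root, np.array([2.0, 2.0, 2.0])))
```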
TensorFlow Doing HPC
TensorFlow is a popular emerging open-source programming framework supporting
the execution of distributed applications on heterogeneous hardware. While
TensorFlow was initially designed for developing Machine Learning (ML)
applications, it aims to support a much broader range of applications
beyond the ML domain, potentially including HPC applications. However, few
experiments have been conducted to evaluate TensorFlow's performance when
running HPC workloads on supercomputers. This work addresses that gap by
implementing four traditional HPC
benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG)
solver and Fast Fourier Transform (FFT). We analyze their performance on two
supercomputers with accelerators and evaluate the potential of TensorFlow for
developing HPC applications. Our tests show that TensorFlow can fully take
advantage of high performance networks and accelerators on supercomputers.
Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical
communication bandwidth on our testing platform. We find an approximately 2x,
1.7x and 1.8x performance improvement when increasing the number of GPUs from
two to four in the matrix-matrix multiply, CG and FFT applications
respectively. All our performance results demonstrate that TensorFlow has high
potential to emerge also as an HPC programming framework for heterogeneous
supercomputers.
Comment: Accepted for publication at the Ninth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES'19).
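As a flavour of what such a benchmark looks like (a minimal sketch under our own assumptions about array size and timing, not the paper's code), the STREAM triad kernel c = a + s*b can be expressed directly in TensorFlow:

```python
# Minimal sketch of a STREAM-style "triad" bandwidth benchmark expressed in
# TensorFlow (illustrative; not the paper's benchmark code). The array size
# N and repetition count are placeholder assumptions.
import time
import tensorflow as tf

N = 1 << 26  # number of float32 elements per array (assumption)
a = tf.random.uniform([N], dtype=tf.float32)
b = tf.random.uniform([N], dtype=tf.float32)
scalar = tf.constant(3.0)

@tf.function
def triad(a, b):
    return a + scalar * b  # c = a + s * b, the STREAM triad kernel

triad(a, b)  # warm-up: trigger tracing/compilation before timing
reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    c = triad(a, b)
_ = c.numpy()  # force execution to finish before stopping the clock
dt = (time.perf_counter() - t0) / reps
bytes_moved = 3 * N * 4  # read a, read b, write c (float32)
print(f"~{bytes_moved / dt / 1e9:.1f} GB/s")
```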
An Evaluation of the X10 Programming Language
As predicted by Moore's law, the number of transistors on a chip has doubled approximately every two years. For many years, these extra transistors massively benefited the whole computer industry: they were used to increase CPU clock speed and thus boost performance. However, due to the heat wall and power constraints, clock speed cannot be increased limitlessly. Hardware vendors now have to take a path other than increasing clock speed, namely using the transistors to increase the number of processor cores on each chip. This hardware structural change presents inevitable challenges to software structure, since single-thread-targeted software will not benefit from newer chips and may even suffer from lower clock speeds. The two fundamental challenges are: 1. How to deal with the stagnation of single-core clock speed and cache memory. 2. How to utilize the additional processing power from more cores on a chip. Most programming languages nowadays have distributed computing support, such as C and Java [1]. Meanwhile, some new programming languages were invented from scratch to take advantage of the more distributed hardware structures. The X10 Programming Language is one of them. The goal of this project is to evaluate X10 in terms of performance, programmability and tool support.
A strategy for reducing turnaround time in design optimization using a distributed computer system
There is a need to explore methods for reducing the lengthy computer turnaround or clock time associated with engineering design problems. Different strategies can be employed to reduce this turnaround time. One strategy is to run validated analysis software on a network of existing smaller computers so that portions of the computation can be done in parallel. This paper focuses on the implementation of this method using two types of problems. The first type is a traditional structural design optimization problem, which is characterized by simple data flow and complicated analysis. The second type uses an existing computer program designed to study multilevel optimization techniques; this problem is characterized by complicated data flow and simple analysis. The paper shows that distributed computing can be a viable means of reducing computational turnaround time for engineering design problems that lend themselves to decomposition. Parallel computing can be accomplished with minimal cost in terms of hardware and software.
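A minimal sketch of the strategy (our illustration, not the paper's code; the subproblem structure and `analyze` function are hypothetical): within each design iteration, the independent analysis subproblems produced by decomposition are farmed out to separate workers so they run concurrently.

```python
# Each design iteration evaluates its decomposed analysis subproblems in
# parallel, so turnaround time is governed by the slowest piece rather than
# the sum of all pieces (analogous to running on a network of machines).
from concurrent.futures import ProcessPoolExecutor

def analyze(subproblem):
    # Stand-in for a validated analysis code run on one piece of the design;
    # in the paper this would be, e.g., a structural analysis.
    x = subproblem["load"]
    return x ** 2  # hypothetical response quantity

def design_iteration(subproblems):
    with ProcessPoolExecutor() as pool:
        responses = list(pool.map(analyze, subproblems))
    return sum(responses)  # combine responses into one objective value

if __name__ == "__main__":
    subs = [{"load": v} for v in (1.0, 2.0, 3.0)]
    print(design_iteration(subs))
```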
Feedback and time are essential for the optimal control of computing systems
The performance, reliability, cost, size and energy usage of computing systems can be improved by one or more orders of magnitude by the systematic use of modern control and optimization methods. Computing systems rely on feedback algorithms to schedule tasks, data and resources, but the models that are used to design these algorithms are validated using open-loop metrics. By using closed-loop metrics instead, such as the gap metric developed in the control community, it should be possible to develop improved scheduling algorithms and computing systems that have not been over-engineered. Furthermore, scheduling problems are most naturally formulated as constraint satisfaction or mathematical optimization problems, but these are seldom implemented using state-of-the-art numerical methods, nor do they explicitly take into account the fact that the scheduling problem itself takes time to solve. This paper makes the case that recent results in real-time model predictive control, where optimization problems are solved in order to control a process that evolves in time, are likely to form the basis of the scheduling algorithms of the future. We therefore outline some of the research problems and opportunities that could arise by explicitly considering feedback and time when designing optimal scheduling algorithms for computing systems.
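To make the receding-horizon idea concrete, here is a minimal Python/NumPy sketch (our toy illustration, not the paper's formulation; the queue model, costs, horizon and candidate service rates are all assumptions): at each step a short-horizon optimization is solved, only the first action is applied, and the loop repeats as the system evolves.

```python
# Toy receding-horizon (MPC-style) scheduler for a single queue. At every
# step we solve a horizon-H problem, apply only the first action, then
# re-solve as the system moves on. A real-time MPC scheduler would use a
# structured numerical method with an explicit deadline instead of the
# brute-force enumeration below.
import numpy as np

H = 5            # prediction horizon (steps), assumed
arrivals = 1.0   # jobs arriving per step, assumed constant
q = 4.0          # current queue length

def predict_cost(q0, plan):
    # Cost of a service-rate plan: queue backlog plus the price of capacity.
    q, cost = q0, 0.0
    for u in plan:
        q = max(q + arrivals - u, 0.0)   # queue dynamics
        cost += q + 0.5 * u              # backlog penalty + resource price
    return cost

candidates = [0.0, 1.0, 2.0]             # admissible service rates per step
for t in range(10):
    # Enumerate all len(candidates)**H plans over the horizon (toy solver).
    plans = np.array(np.meshgrid(*[candidates] * H)).T.reshape(-1, H)
    best = min(plans, key=lambda p: predict_cost(q, p))
    u = best[0]                          # apply only the first action
    q = max(q + arrivals - u, 0.0)       # the real system moves on
    print(f"t={t} u={u:.0f} queue={q:.1f}")
```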