The future of computing beyond Moore's Law
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continued scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
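The "doubling every 2 years" model the abstract describes can be sketched as a simple exponential; the baseline figures below (the Intel 4004's roughly 2,300 transistors in 1971) are illustrative, not taken from the article.

```python
def moores_law_projection(base_count: float, base_year: int, year: int,
                          doubling_period: float = 2.0) -> float:
    """Transistor count predicted by a fixed doubling period (Moore's Law)."""
    return base_count * 2 ** ((year - base_year) / doubling_period)

# Illustrative baseline: ~2,300 transistors in 1971.
# Twenty years later the model predicts a 2**10 = 1024x increase.
print(moores_law_projection(2300, 1971, 1991))  # ~2.36 million
```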
Performance Analysis and Evaluation of Dynamic Loop Scheduling Techniques in a Competitive Runtime Environment for Distributed Memory Architectures
Parallel computing offers immense potential to solve large, complex scientific problems. Load imbalance is a major impediment to obtaining high performance on a parallel system. One principal form of parallelism found in scientific applications is data parallelism; loops without dependencies are data parallel. During the execution of large parallel loops, computational requirements vary due to problem, algorithmic and systemic characteristics. These factors lead to load imbalance, which in turn degrades the performance of an application. Over the years, a number of dynamic loop scheduling techniques have been proposed to address one or more of these factors. However, no single strategy works well across different problem domains and system characteristics. Moreover, load balancing at runtime is complicated by its need for dynamic data redistribution. Therefore, there is a distinct need to integrate the dynamic loop scheduling techniques into a single package and provide them as an application programming interface (API) to the application developer. In recent years, a number of dynamic loop scheduling techniques have been integrated into compiler technologies for shared memory environments; however, no such integrated approach exists for distributed memory applications. The purpose of this thesis is to present the design, implementation and effectiveness of an integrated approach: the dynamic loop scheduling techniques are integrated into a runtime system for distributed memory architectures. For this purpose, we choose the newly developed parallel runtime environment for multicomputer architecture (PREMA) with its main components: the data movement and control substrate (DMCS) and the mobile object layer (MOL). This runtime system has been demonstrated to be one of the most competitive runtime systems for distributed memory architectures.
The significance of this work is that the proposed API will enhance the performance of parallel applications by reducing the load imbalance among processors caused by a wide range of factors, and will reduce the software development cost required for load balancing. With the integration of the scheduling capabilities into the runtime system, its applicability has been expanded. The performance of the API has been evaluated qualitatively and quantitatively, and its overhead has been studied analytically and measured experimentally. Three parallel benchmarks covering scientific applications of general interest (N-body simulations, an automatic quadrature routine and an unstructured grid heat solver) were considered for experimentation purposes. Based on the experiments conducted, a cost improvement of up to 76% over the straightforward parallel benchmark was obtained. For certain application characteristics, the overhead of the runtime system was found to be within 10% of the underlying messaging layer. These results demonstrate that, in large scientific applications, it is possible and desirable to combine the rich functionality of a runtime system with the advantages of scheduling techniques to achieve high performance.
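To make the idea of dynamic loop scheduling concrete, here is a minimal sketch of one classic chunk-size rule, guided self-scheduling, in which each idle worker grabs a chunk proportional to the remaining iterations. This is an illustrative stand-in, not the PREMA/DMCS API or the thesis's actual scheduler.

```python
def guided_chunks(n_iters: int, n_procs: int, min_chunk: int = 1):
    """Guided self-scheduling: each chunk is ~remaining/P, shrinking over time.
    Large early chunks keep overhead low; small late chunks smooth imbalance."""
    remaining = n_iters
    while remaining > 0:
        chunk = max(min_chunk, remaining // n_procs)
        chunk = min(chunk, remaining)
        yield chunk
        remaining -= chunk

# For 100 iterations on 4 processors, the first chunks are 25, 18, 14, 10, ...
chunks = list(guided_chunks(100, 4))
print(chunks)
```

Idle workers would pop chunks from this sequence at runtime; the decreasing sizes are what lets the schedule absorb per-iteration variability near the end of the loop.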
Mask aligner for ultrahigh vacuum with capacitive distance control
We present a mask aligner driven by three piezo motors which guides and aligns a SiN shadow mask under capacitive control towards a sample surface. The three capacitors for read-out are located at the backside of the thin mask such that the mask can be placed at micrometre distance from the sample surface while keeping it parallel to the surface. Samples and masks can be exchanged in situ, and the mask can additionally be displaced parallel to the surface. We demonstrate an edge sharpness of the deposited structures below 100 nm, which is likely limited by the diffusion of the deposited Au on Si(111).
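The capacitive distance control rests on the parallel-plate relation C = ε₀·A/d, so a measured capacitance maps directly to a mask-sample gap. The sketch below illustrates that conversion; the plate area and capacitance values are assumed for illustration, not taken from the apparatus described above.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def gap_from_capacitance(capacitance_F: float, area_m2: float) -> float:
    """Gap d = eps0 * A / C for an ideal parallel-plate capacitor."""
    return EPS0 * area_m2 / capacitance_F

# Illustrative: a 1 mm^2 electrode reading ~8.85 pF implies a ~1 um gap.
d = gap_from_capacitance(8.854e-12, 1e-6)
print(d)  # ~1e-6 m
```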
Evolving Gene Regulatory Networks with Mobile DNA Mechanisms
This paper uses a recently presented abstract, tuneable Boolean regulatory
network model extended to consider aspects of mobile DNA, such as transposons.
The significant role of mobile DNA in the evolution of natural systems is
becoming increasingly clear. This paper shows how dynamically controlling
network node connectivity and function via transposon-inspired mechanisms can
be selected for in computational intelligence tasks to give improved
performance. The designs of dynamical networks intended for implementation
within the slime mould Physarum polycephalum and for the distributed control of
a smart surface are considered.
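A minimal random Boolean network sketch in the spirit of the tuneable model described above, plus a toy "transposition" move that rewires one node's input as a stand-in for the mobile-DNA mechanisms. All structural details here (node count, connectivity, update rule) are illustrative assumptions, not the paper's actual model.

```python
import random

def step(state, inputs, tables):
    """Synchronous update: each node applies its Boolean table to its inputs."""
    return tuple(tables[i][tuple(state[j] for j in inputs[i])]
                 for i in range(len(state)))

def transpose(inputs, n_nodes, rng):
    """Toy mobile-DNA move: rewire one randomly chosen input connection."""
    node = rng.randrange(n_nodes)
    k = rng.randrange(len(inputs[node]))
    inputs[node] = inputs[node][:k] + (rng.randrange(n_nodes),) + inputs[node][k + 1:]

rng = random.Random(0)
n, k = 5, 2  # 5 nodes, 2 inputs each
inputs = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(n)]
tables = [{(a, b): rng.randrange(2) for a in (0, 1) for b in (0, 1)}
          for _ in range(n)]

state = (1, 0, 1, 1, 0)
state = step(state, inputs, tables)  # one synchronous network update
transpose(inputs, n, rng)            # mutate the wiring, then keep iterating
```

In an evolutionary setting, such transposition moves would be applied between fitness evaluations, letting selection act on connectivity changes as well as on the Boolean functions themselves.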
When the path is never shortest: a reality check on shortest path biocomputation
Shortest path problems are a touchstone for evaluating the computing
performance and functional range of novel computing substrates. Much has been
published in recent years regarding the use of biocomputers to solve minimal
path problems such as route optimisation and labyrinth navigation, but their
outputs are typically difficult to reproduce and somewhat abstract in nature,
suggesting that both experimental design and analysis in the field require
standardising. This chapter details laboratory experimental data which probe
the path finding process in two single-celled protistic model organisms,
Physarum polycephalum and Paramecium caudatum, comprising a shortest path
problem and labyrinth navigation, respectively. The results presented
illustrate several of the key difficulties that are encountered in categorising
biological behaviours in the language of computing, including biological
variability, non-halting operations and adverse reactions to experimental
stimuli. It is concluded that neither organism examined is able to efficiently
or reproducibly solve shortest path problems in the specific experimental
conditions that were tested. Data presented are contextualised with biological
theory and design principles for maximising the usefulness of experimental
biocomputer prototypes. To appear in: Adamatzky, A. (Ed.), Shortest path
solvers. From software to wetware. Springer, 201
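For contrast with the biological solvers discussed above, the conventional electronic baseline for the same touchstone problem is Dijkstra's algorithm, which is deterministic, halting, and reproducible. A minimal sketch on an illustrative weighted graph (the graph itself is invented for the example):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a dict-of-dicts weighted graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

maze = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}, "D": {}}
print(dijkstra(maze, "A"))  # {'A': 0, 'B': 2, 'C': 3, 'D': 4}
```

The properties the chapter finds lacking in the protists (halting, reproducibility across runs) are exactly what this algorithmic baseline guarantees.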
Slime mould computes planar shapes
Computing a polygon defining a set of planar points is a classical problem of
modern computational geometry. In laboratory experiments we demonstrate that a
concave hull, a connected alpha-shape without holes, of a finite planar set is
approximated by slime mould Physarum polycephalum. We represent planar points
with sources of long-distance attractants and short-distance repellents and
inoculate a piece of plasmodium outside the data set. The plasmodium moves
towards the data and envelops it by pronounced protoplasmic tubes
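The concave hull (alpha-shape) approximated by the slime mould has a classical convex counterpart that is easy to compute exactly; a standard monotone-chain convex hull sketch makes the comparison concrete. The point set below is illustrative, not experimental data.

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # endpoints shared, drop duplicates

pts = [(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 3)]
print(convex_hull(pts))  # interior point (1, 1) is excluded
```

Unlike the convex hull, the concave (alpha) hull the plasmodium traces can follow indentations in the point set, which is why it is the harder, more interesting target for a biological computer.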