10,634 research outputs found
Optimising Simulation Data Structures for the Xeon Phi
In this paper, we propose a lock-free architecture
to accelerate logic gate circuit simulation using SIMD multi-core
machines. We evaluate its performance on different test circuits
simulated on the Intel Xeon Phi and 2 other machines. Comparisons
are presented of this software/hardware combination with
reported performances of GPU and other multi-core simulation
platforms. Comparisons are also given between the lock free
architecture and a leading commercial simulator running on the
same Intel hardware
Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators
In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as long as imprecise computing is concerned, the simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applicationsâ output correlating the number of corrupted elements with their spatial locality. Also, we provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude.
We apply the selected metrics to experimental results obtained in various radiation test campaigns for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while Xeon Phi is more reliable when executing particles interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures.This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research
project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement
n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.Postprint (author's final draft
Optimising Simulation Data Structures for the Xeon Phi
In this paper, we propose a lock-free architecture
to accelerate logic gate circuit simulation using SIMD multi-core
machines. We evaluate its performance on different test circuits
simulated on the Intel Xeon Phi and 2 other machines. Comparisons
are presented of this software/hardware combination with
reported performances of GPU and other multi-core simulation
platforms. Comparisons are also given between the lock free
architecture and a leading commercial simulator running on the
same Intel hardware
Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1
Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified
Recommended from our members
Ray: A Distributed Execution Engine for the Machine Learning Ecosystem
In recent years, growing data volumes and more sophisticated computational procedures have greatly increased the demand for computational power. Machine learning and artificial intelligence applications, for example, are notorious for their computational requirements. At the same time, Moores law is ending and processor speeds are stalling. As a result, distributed computing has become ubiquitous. While the cloud makes distributed hardware infrastructure widely accessible and therefore offers the potential of horizontal scale, developing these distributed algorithms and applications remains surprisingly hard. This is due to the inherent complexity of concurrent algorithms, the engineering challenges that arise when communicating between many machines, the requirements like fault tolerance and straggler mitigation that arise at large scale and the lack of a general-purpose distributed execution engine that can support a wide variety of applications.In this thesis, we study the requirements for a general-purpose distributed computation model and present a solution that is easy to use yet expressive and resilient to faults. At its core our model takes familiar concepts from serial programming, namely functions and classes, and generalizes them to the distributed world, therefore unifying stateless and stateful distributed computation. This model not only supports many machine learning workloads like training or serving, but is also a good t for cross-cutting machine learning applications like reinforcement learning and data processing applications like streaming or graph processing. We implement this computational model as an open-source system called Ray, which matches or exceeds the performance of specialized systems in many application domains, while also offering horizontally scalability and strong fault tolerance properties
Intelligent systems in manufacturing: current developments and future prospects
Global competition and rapidly changing customer requirements are demanding increasing changes in manufacturing environments. Enterprises are required to constantly redesign their products and continuously reconfigure their manufacturing systems. Traditional approaches to manufacturing systems do not fully satisfy this new situation. Many authors have proposed that artificial intelligence will bring the flexibility and efficiency needed by manufacturing systems. This paper is a review of artificial intelligence techniques used in manufacturing systems. The paper first defines the components of a simplified intelligent manufacturing systems (IMS), the different Artificial Intelligence (AI) techniques to be considered and then shows how these AI techniques are used for the components of IMS
Simulating FRSN P Systems with Real Numbers in P-Lingua on sequential and CUDA platforms
Fuzzy Reasoning Spiking Neural P systems (FRSN P systems,
for short) is a variant of Spiking Neural P systems incorporating
fuzzy logic elements that make it suitable to model fuzzy diagnosis knowledge
and reasoning required for fault diagnosis applications. In this sense,
several FRSN P system variants have been proposed, dealing with real
numbers, trapezoidal numbers, weights, etc. The model incorporating
real numbers was the first introduced [13], presenting promising applications
in the field of fault diagnosis of electrical systems. For this variant,
a matrix-based algorithm was provided which, when executed on parallel
computing platforms, fully exploits the model maximally parallel
capacities. In this paper we introduce a P-Lingua framework extension
to parse and simulate FRSN P systems with real numbers. Two simulators,
implementing a variant of the original matrix-based simulation
algorithm, are provided: a sequential one (written in Java), intended to
run on traditional CPUs, and a parallel one, intended to run on CUDAenabled
devices.Ministerio de EconomĂa y Competitividad TIN2012-3743
- âŠ