2,214 research outputs found
Energy Saving Techniques for Phase Change Memory (PCM)
In recent years, the energy consumption of computing systems has increased
and a large fraction of this energy is consumed in main memory. Towards this,
researchers have proposed use of non-volatile memory, such as phase change
memory (PCM), which has low read latency and power; and nearly zero leakage
power. However, the write latency and power of PCM are very high and this,
along with limited write endurance of PCM present significant challenges in
enabling wide-spread adoption of PCM. To address this, several
architecture-level techniques have been proposed. In this report, we review
several techniques to manage power consumption of PCM. We also classify these
techniques based on their characteristics to provide insights into them. The
aim of this work is encourage researchers to propose even better techniques for
improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM
Parallel Simulations for Analysing Portfolios of Catastrophic Event Risk
At the heart of the analytical pipeline of a modern quantitative
insurance/reinsurance company is a stochastic simulation technique for
portfolio risk analysis and pricing process referred to as Aggregate Analysis.
Support for the computation of risk measures including Probable Maximum Loss
(PML) and the Tail Value at Risk (TVAR) for a variety of types of complex
property catastrophe insurance contracts including Cat eXcess of Loss (XL), or
Per-Occurrence XL, and Aggregate XL, and contracts that combine these measures
is obtained in Aggregate Analysis.
In this paper, we explore parallel methods for aggregate risk analysis. A
parallel aggregate risk analysis algorithm and an engine based on the algorithm
is proposed. This engine is implemented in C and OpenMP for multi-core CPUs and
in C and CUDA for many-core GPUs. Performance analysis of the algorithm
indicates that GPUs offer an alternative HPC solution for aggregate risk
analysis that is cost effective. The optimised algorithm on the GPU performs a
1 million trial aggregate simulation with 1000 catastrophic events per trial on
a typical exposure set and contract structure in just over 20 seconds which is
approximately 15x times faster than the sequential counterpart. This can
sufficiently support the real-time pricing scenario in which an underwriter
analyses different contractual terms and pricing while discussing a deal with a
client over the phone.Comment: Proceedings of the Workshop at the International Conference for High
Performance Computing, Networking, Storage and Analysis (SC), 2012, 8 page
Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and
current von Neumann processors architectures is the way in which memory and
processing is organized. As Information and Communication Technologies continue
to address the need for increased computational power through the increase of
cores within a digital processor, neuromorphic engineers and scientists can
complement this need by building processor architectures where memory is
distributed with the processing. In this paper we present a survey of
brain-inspired processor architectures that support models of cortical networks
and deep neural networks. These architectures range from serial clocked
implementations of multi-neuron systems to massively parallel asynchronous ones
and from purely digital systems to mixed analog/digital systems which implement
more biological-like models of neurons and synapses together with a suite of
adaptation and learning mechanisms analogous to the ones found in biological
nervous systems. We describe the advantages of the different approaches being
pursued and present the challenges that need to be addressed for building
artificial neural processing systems that can display the richness of behaviors
seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed
neuromorphic computing platforms and system
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin
- …