Evaluating Asymmetric Multicore Systems-on-Chip using Iso-Metrics
The end of Dennard scaling has pushed power consumption into a first-order
concern for current systems, on par with performance. As a result,
near-threshold voltage computing (NTVC) has been proposed as a potential means
to tackle the limited cooling capacity of CMOS technology. Hardware operating
in NTV consumes significantly less power, at the cost of lower frequency, and
thus reduced performance, as well as increased error rates. In this paper, we
investigate whether a low-power system-on-chip, based on ARM's asymmetric
big.LITTLE technology, can be an alternative to conventional high performance
multicore processors in terms of power/energy in an unreliable scenario. For
our study, we use the Conjugate Gradient solver, an algorithm representative of
the computations performed by a wide range of scientific and engineering
codes.
Comment: Presented at HiPEAC EEHCO '15, 6 pages
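The Conjugate Gradient method iteratively solves A x = b for a symmetric positive-definite matrix A. As a minimal sketch, assuming a dense NumPy matrix (scientific codes usually store A in a sparse format) and illustrative defaults for the tolerance and iteration cap, the textbook iteration looks like this; it is not the implementation used in the paper.

```python
from typing import Optional

import numpy as np


def conjugate_gradient(A: np.ndarray, b: np.ndarray, tol: float = 1e-10,
                       max_iter: Optional[int] = None) -> np.ndarray:
    """Solve A x = b for a symmetric positive-definite A (textbook CG)."""
    n = b.shape[0]
    if max_iter is None:
        max_iter = 10 * n              # exact arithmetic needs at most n steps
    x = np.zeros(n)
    r = b - A @ x                      # initial residual
    p = r.copy()                       # first search direction
    rs_old = r @ r
    if np.sqrt(rs_old) < tol:
        return x
    for _ in range(max_iter):
        Ap = A @ p                     # matrix-vector product: the dominant cost
        alpha = rs_old / (p @ Ap)      # optimal step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next A-conjugate search direction
        rs_old = rs_new
    return x
```

Each iteration consists of one matrix-vector product, two dot products and a few vector updates, which are the kernels a multicore or big.LITTLE implementation would parallelize.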
ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads
ARM processors have dominated the mobile device market in the last decade due
to their favorable computing-to-energy ratio. In this age of cloud data centers
and Big Data analytics, the focus is increasingly on power-efficient
processing, rather than just high-throughput computing. ARM's first commodity
server-grade processor is the recent AMD A1100-series processor, based on the
64-bit ARM Cortex-A57 architecture. In this paper, we study the performance and
energy efficiency of a server based on this ARM64 CPU, relative to a comparable
server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads.
Specifically, we study these for Intel's HiBench suite of web, query and
machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed
setup, for data sizes up to files, web pages and tuples. Our
results show that the ARM64 server's runtime performance is comparable to the
x64 server for integer-based workloads like Sort and Hive queries, and only
lags behind for floating-point intensive benchmarks like PageRank, when they do
not exploit data parallelism adequately. We also see that the ARM64 server
takes the energy, and has an Energy Delay Product (EDP) that
is lower than that of the x64 server. These results hold promise for ARM64
data centers hosting Big Data workloads to reduce their operational costs,
while opening up opportunities for further analysis.
Comment: Accepted for publication in the Proceedings of the 24th IEEE
International Conference on High Performance Computing, Data, and Analytics
(HiPC), 201
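The numerical energy and EDP results in the abstract above did not survive extraction. For reference, energy is conventionally average power multiplied by runtime, and the Energy Delay Product multiplies that energy by runtime once more (EDP = E × t = P × t²), so it penalizes slow executions. A minimal sketch, assuming a constant average power draw; the server labels and the power and runtime values are purely hypothetical and are not measurements from the paper.

```python
def energy_joules(avg_power_watts: float, runtime_s: float) -> float:
    """Energy consumed, assuming a constant average power draw."""
    return avg_power_watts * runtime_s


def energy_delay_product(avg_power_watts: float, runtime_s: float) -> float:
    """EDP = E * t = P * t^2, in joule-seconds."""
    return energy_joules(avg_power_watts, runtime_s) * runtime_s


# Hypothetical servers: A is slower but draws less power, B is faster but hungrier.
for name, power, runtime in [("A", 40.0, 100.0), ("B", 80.0, 70.0)]:
    print(name, energy_joules(power, runtime), energy_delay_product(power, runtime))
```

With these made-up numbers, A consumes less energy (4000 J versus 5600 J) while B has the lower EDP (392000 J·s versus 400000 J·s), which is why the two metrics are usually reported side by side.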
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
Asymmetric multicore processors (AMPs) have recently emerged as an appealing
technology for severely energy-constrained environments, especially in mobile
appliances where heterogeneity in applications is mainstream. In addition,
given the growing interest in low-power high-performance computing, this type
of architecture is also being investigated as a means to improve the
throughput-per-Watt of complex scientific applications.
In this paper, we design and embed several architecture-aware optimizations
into a multi-threaded general matrix multiplication (gemm), a key operation of
the BLAS, in order to obtain a high performance implementation for ARM
big.LITTLE AMPs. Our solution is based on the reference implementation of gemm
in the BLIS library, and integrates a cache-aware configuration as well as
asymmetric static and dynamic scheduling strategies that carefully tune and
distribute the operation's micro-kernels among the big and LITTLE cores of the
target processor. The experimental results on a Samsung Exynos 5422, a
system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the
big.LITTLE model, show that our cache-aware versions of gemm with asymmetric
scheduling attain significant performance gains with respect to their
architecture-oblivious counterparts, while exploiting all the resources of the
AMP to deliver considerable energy efficiency.
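A static asymmetric schedule gives each core a share of the work proportional to its speed before execution starts. A minimal sketch, assuming the parallelized loop has n_blocks independent iterations (for instance, column blocks of C in gemm) and a hypothetical big-to-LITTLE throughput ratio; the Exynos 5422 has four Cortex-A15 and four Cortex-A7 cores. The function names and the ratio of 3.0 are illustrative assumptions, not the BLIS code or the paper's exact partitioning; in practice the ratio would be measured.

```python
from typing import List, Tuple


def asymmetric_static_split(n_blocks: int, n_big: int, n_little: int,
                            big_speedup: float) -> List[int]:
    """Blocks per core (big cores first), proportional to relative core speed.

    big_speedup is a hypothetical ratio: throughput of one big core divided
    by that of one LITTLE core.
    """
    weights = [big_speedup] * n_big + [1.0] * n_little
    total = sum(weights)
    shares = [n_blocks * w / total for w in weights]
    counts = [int(s) for s in shares]
    # Hand out the blocks lost to truncation, largest fractional part first
    # (sorted() stays stable with reverse=True, so ties keep core order).
    leftover = n_blocks - sum(counts)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def to_ranges(counts: List[int]) -> List[Tuple[int, int]]:
    """Turn per-core counts into contiguous half-open [start, end) ranges."""
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


counts = asymmetric_static_split(100, n_big=4, n_little=4, big_speedup=3.0)
print(counts)                 # [19, 19, 19, 19, 6, 6, 6, 6]
print(to_ranges(counts)[:2])  # [(0, 19), (19, 38)]
```

A dynamic strategy would instead let each core fetch the next block from a shared counter whenever it finishes, which tolerates a mis-estimated ratio at the cost of extra synchronization.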
Direct N-body code on low-power embedded ARM GPUs
This work arises in the context of the ExaNeSt project, which aims at the design
and development of an exascale-ready supercomputer with a low energy consumption
profile that is nevertheless able to support the most demanding scientific and technical
applications. The ExaNeSt compute unit consists of densely packed low-power
64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architectures in which computing power is supplied by both CPUs and
GPUs, and are emerging as a possible low-power and low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct N-body code
suitable for astrophysical simulations has been re-engineered in order to
exploit heterogeneous SoC platforms based on ARM CPUs and embedded GPUs.
Performance tests show that embedded GPUs can be effectively used to accelerate
real-life scientific calculations, and that they are also promising because of their
energy efficiency, a crucial design concern for future exascale platforms.
Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceedings
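A direct N-body code computes, for every body i, the acceleration a_i = G Σ_{j≠i} m_j (r_j − r_i) / (|r_j − r_i|² + ε²)^{3/2}, which costs O(N²) operations per time step; GPU versions commonly assign one thread per body i that loops over all j. The following NumPy sketch shows only this arithmetic, assuming G = 1 and a Plummer softening length eps, both illustrative choices; it is not the re-engineered code described in the paper.

```python
import numpy as np


def direct_accelerations(pos: np.ndarray, mass: np.ndarray,
                         G: float = 1.0, eps: float = 1e-3) -> np.ndarray:
    """All-pairs (direct summation) gravitational accelerations.

    pos: (N, 3) positions, mass: (N,) masses. eps is an assumed softening
    length that keeps close encounters finite.
    """
    # diff[i, j] = r_j - r_i
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    dist2 = np.einsum('ijk,ijk->ij', diff, diff) + eps ** 2
    np.fill_diagonal(dist2, 1.0)       # avoid 0 ** -1.5 on the self-pair
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)      # a body exerts no force on itself
    # a_i = G * sum_j m_j * (r_j - r_i) / d_ij^3
    return G * np.einsum('ij,ijk->ik', inv_d3 * mass[np.newaxis, :], diff)
```

Because every pair interaction is independent, the inner sum maps naturally onto GPU threads; the quadratic cost is what makes accelerator throughput, and its energy cost, the deciding factor.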
Evaluation of Compton scattering sequence reconstruction algorithms for a portable position sensitive radioactivity detector based on pixelated Cd(Zn)Te crystals
We present extensive simulation studies on the performance of algorithms for
Compton sequence reconstruction used in the development of a portable
spectroscopic instrument (COCAE) capable of localizing and identifying
radioactive sources by exploiting Compton scattering imaging. Various
Compton sequence reconstruction algorithms have been compared using a large
number of simulated events. These algorithms are based on Compton kinematics,
as well as on statistical test criteria that exploit the redundant information
of events having two or more photon interactions in the active detector's
volume. The efficiency of the best-performing technique is estimated for
incident gamma-ray photons spanning a wide energy range, emitted from point-like gamma sources.
Comment: 16 pages, 17 figures
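One common kinematic criterion compares, at each intermediate interaction, the scattering angle implied by the Compton formula, cos θ = 1 − m_e c² (1/E′ − 1/E), with the angle implied by the hit positions, and keeps the ordering with the smallest mismatch. A minimal sketch, assuming full absorption of the photon, at least three interactions, and no constraint on the first scattering angle (the source direction is unknown); the squared-difference score, the brute-force search over orderings, and all helper names are illustrative assumptions, not necessarily the best-performing technique in the study.

```python
import itertools
import math
from typing import List, Sequence, Tuple

ME_C2_KEV = 510.999  # electron rest energy in keV

Hit = Tuple[Tuple[float, float, float], float]  # (position, deposited energy in keV)


def kinematic_cos(e_in: float, deposit: float) -> float:
    """cos(theta) from the Compton formula, given incoming energy and deposit."""
    e_out = e_in - deposit
    return 1.0 - ME_C2_KEV * (1.0 / e_out - 1.0 / e_in)


def geometric_cos(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """cos of the angle between a->b and b->c (positions assumed distinct)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(u[i] * v[i] for i in range(3))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))


def order_score(hits: Sequence[Hit]) -> float:
    """Sum of squared kinematic-vs-geometric mismatches; inf if unphysical."""
    remaining = sum(e for _, e in hits)  # photon energy before the current hit
    score = 0.0
    for k, (pos, dep) in enumerate(hits):
        if 0 < k < len(hits) - 1:        # only intermediate hits are testable
            c_kin = kinematic_cos(remaining, dep)
            if abs(c_kin) > 1.0:
                return math.inf          # energies forbid this ordering
            c_geo = geometric_cos(hits[k - 1][0], pos, hits[k + 1][0])
            score += (c_kin - c_geo) ** 2
        remaining -= dep
    return score


def best_sequence(hits: List[Hit]) -> Tuple[Hit, ...]:
    """Brute-force search over all orderings (n! grows fast; fine for few hits)."""
    return min(itertools.permutations(hits), key=order_score)
```

Exhaustive search is only practical for the handful of interactions typical of a single event; a statistical test would replace the plain squared difference with weights reflecting energy and position resolution.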