Memory and Parallelism Analysis Using a Platform-Independent Approach
Emerging computing architectures such as near-memory computing (NMC) promise
improved performance for applications by reducing the data movement between CPU
and memory. However, detecting such applications is not a trivial task. In this
ongoing work, we extend the state-of-the-art platform-independent software
analysis tool with NMC related metrics such as memory entropy, spatial
locality, data-level, and basic-block-level parallelism. These metrics help to
identify the applications more suitable for NMC architectures.
Comment: 22nd ACM International Workshop on Software and Compilers for Embedded Systems (SCOPES '19), May 2019
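The memory-entropy metric named in this abstract can be illustrated as Shannon entropy over an address trace. This is a minimal sketch, assuming cache-line-granular addresses and a plain frequency-based entropy; the tool's actual definition and trace granularity may differ.

```python
from collections import Counter
import math

def memory_entropy(addresses):
    """Shannon entropy (in bits) of a memory-address trace.

    Higher entropy means the trace spreads its accesses over more
    distinct locations, i.e. the access pattern is harder to cache.
    """
    counts = Counter(addresses)
    total = len(addresses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A unit-stride trace touches n distinct cache lines, so its entropy
# is log2(n); a trace that ping-pongs between two lines scores 1 bit.
strided = [i * 64 for i in range(256)]   # 256 distinct line addresses
reused  = [0, 64] * 128                  # only 2 distinct addresses
print(memory_entropy(strided))  # 8.0
print(memory_entropy(reused))   # 1.0
```

In practice such a trace would come from a binary-instrumentation front end; the synthetic traces here only demonstrate how the metric separates regular from reuse-heavy patterns.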
Railroads and the Making of Modern America -- Tools for Spatio-Temporal Correlation, Analysis, and Visualization
This project aims to integrate large-scale data sources from the Digging into Data repositories with other types of relevant data on the railroad system, already assembled by the project directors. Our project seeks to develop useful tools for spatio-temporal visualization of these data and the relationships among them. Our interdisciplinary team includes computer science, history, and geography researchers. Because the railroad "system" and its spatio-temporal configuration appeared differently from locality to locality and region to region, we need to adjust how we "locate" and "see" the system. By applying data mining and pattern recognition techniques, software systems can be created that dynamically redefine the way spatial data are represented. Utilizing processes common to analysis in computer science, we propose to develop a software framework that allows these embedded concepts to be visualized and further studied.
Performance Aspects of Sparse Matrix-Vector Multiplication
Sparse matrix-vector multiplication (shortly SpM×V) is an important building block in algorithms solving sparse systems of linear equations, e.g., FEM. Due to matrix sparsity, the memory access patterns are irregular and utilization of the cache can suffer from low spatial or temporal locality. Approaches to improve the performance of SpM×V are based on matrix reordering and register blocking [1, 2], sometimes combined with software pipelining [3]. Due to its overhead, register blocking achieves good speedups only for a large number of executions of SpM×V with the same matrix A. We have investigated the impact of two simple software transformation techniques (software pipelining and loop unrolling) on the performance of SpM×V, and have compared it with several implementation modifications aimed at reducing computational and memory complexity and improving the spatial locality. We investigate performance gains of these modifications on four CPU platforms.
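As a concrete reference point for the transformations this abstract names, below is a sketch of SpM×V over the common CSR storage format with the inner loop unrolled by two. CSR and the unroll factor are illustrative assumptions; the paper's measurements were made with compiled code on real CPU platforms, where unrolling can actually pay off, unlike in plain Python.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A*x for a sparse matrix A stored in CSR format.

    values  -- nonzero entries, row by row
    col_idx -- column index of each nonzero
    row_ptr -- row i's nonzeros live in values[row_ptr[i]:row_ptr[i+1]]

    The inner loop is unrolled by two, one of the simple software
    transformations whose effect the paper studies.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        start, end = row_ptr[i], row_ptr[i + 1]
        acc = 0.0
        j = start
        # 2-way unrolled accumulation over the row's nonzeros
        while j + 1 < end:
            acc += values[j] * x[col_idx[j]] + values[j + 1] * x[col_idx[j + 1]]
            j += 2
        if j < end:  # leftover element for rows with an odd nonzero count
            acc += values[j] * x[col_idx[j]]
        y[i] = acc
    return y

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 5, 6]] in CSR form:
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cols = [0, 2, 1, 0, 1, 2]
rptr = [0, 2, 3, 6]
print(spmv_csr(vals, cols, rptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 15.0]
```

The irregular `x[col_idx[j]]` gathers are exactly where SpM×V loses spatial and temporal locality, which is what the reordering and blocking approaches cited above try to mitigate.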
EVALUATING THE IMPACT OF MEMORY SYSTEM PERFORMANCE ON SOFTWARE PREFETCHING AND LOCALITY OPTIMIZATIONS
Software prefetching and locality optimizations are two techniques for
overcoming the speed gap between processor and memory, known as the
memory wall, a term coined by Wulf and McKee. This
thesis evaluates the impact of memory trends on the effectiveness
of software prefetching and locality optimizations for three types
of applications: regular scientific codes, irregular scientific
codes, and pointer-chasing codes. For many applications, software
prefetching outperforms locality optimizations when there is
sufficient bandwidth in the underlying memory system, but locality
optimizations outperform software prefetching when the underlying
memory system does not provide sufficient bandwidth. The break-even
point, or equivalently the crossover bandwidth point, occurs at
roughly 2.4 GBytes/sec for 1 GHz processors on today's memory systems, and will increase on future memory systems. This thesis
also studies the interactions between software prefetching and
locality optimizations when applied in concert. Naively combining
the two techniques provides a more robust application performance
in the face of variations in memory bandwidth and/or latency, but
does not yield additional performance gains. In other words, the
performance will not be better than the best performance of the two
techniques alone. Also, several algorithms are proposed and
evaluated to better combine software prefetching and locality
optimizations, including an enhanced tiling algorithm, padding for software prefetching, and index prefetching.
(Also UMIACS-TR-2002-72)
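A loop-tiling (cache-blocking) kernel of the kind this thesis groups under locality optimizations can be sketched as a blocked matrix transpose. The tile size and the choice of transpose as the kernel are illustrative assumptions, not the thesis's enhanced tiling algorithm; software prefetching itself is a compiler/ISA-level technique that cannot be expressed in Python and is omitted here.

```python
def transpose_tiled(a, n, tile=4):
    """Transpose an n x n matrix stored row-major as a flat list.

    Iterating in tile x tile blocks keeps both the rows being read
    and the columns being written inside a small working set, which
    is the locality benefit that tiling buys on a real cache.
    """
    b = [0.0] * (n * n)
    for ii in range(0, n, tile):          # block row
        for jj in range(0, n, tile):      # block column
            # transpose one tile; min() handles edge tiles when
            # n is not a multiple of the tile size
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    b[j * n + i] = a[i * n + j]
    return b

n = 5
a = list(range(n * n))
b = transpose_tiled(a, n, tile=2)
```

In compiled code the tile size would be chosen so a tile of the source and a tile of the destination both fit in cache; in Python the structure is the same but the cache effect is masked by interpreter overhead.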
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications.