5,644 research outputs found
Lessons Learned from a Decade of Providing Interactive, On-Demand High Performance Computing to Scientists and Engineers
For decades, the use of HPC systems was limited to those in the physical
sciences who had mastered their domain in conjunction with a deep understanding
of HPC architectures and algorithms. During these same decades, consumer
computing device advances produced tablets and smartphones that allow millions
of children to interactively develop and share code projects across the globe.
As the HPC community faces the challenges associated with guiding researchers
from disciplines using high productivity interactive tools to effective use of
HPC systems, it seems appropriate to revisit the assumptions surrounding the
necessary skills required for access to large computational systems. For over a
decade, MIT Lincoln Laboratory has been supporting interactive, on-demand high
performance computing by seamlessly integrating familiar high productivity
tools to provide users with an increased number of design turns, rapid
prototyping capability, and faster time to insight. In this paper, we discuss
the lessons learned while supporting interactive, on-demand high performance
computing from the perspectives of the users and the team supporting the users
and the system. Building on these lessons, we present an overview of current
needs and the technical solutions we are building to lower the barrier to entry
for new users from the humanities, social, and biological sciences.Comment: 15 pages, 3 figures, First Workshop on Interactive High Performance
Computing (WIHPC) 2018 held in conjunction with ISC High Performance 2018 in
Frankfurt, German
Process-Oriented Parallel Programming with an Application to Data-Intensive Computing
We introduce process-oriented programming as a natural extension of
object-oriented programming for parallel computing. It is based on the
observation that every class of an object-oriented language can be instantiated
as a process, accessible via a remote pointer. The introduction of process
pointers requires no syntax extension, identifies processes with programming
objects, and enables processes to exchange information simply by executing
remote methods. Process-oriented programming is a high-level language
alternative to multithreading, MPI and many other languages, environments and
tools currently used for parallel computations. It implements natural
object-based parallelism using only minimal syntax extension of existing
languages, such as C++ and Python, and has therefore the potential to lead to
widespread adoption of parallel programming. We implemented a prototype system
for running processes using C++ with MPI and used it to compute a large
three-dimensional Fourier transform on a computer cluster built of commodity
hardware components. Three-dimensional Fourier transform is a prototype of a
data-intensive application with a complex data-access pattern. The
process-oriented code is only a few hundred lines long, and attains very high
data throughput by achieving massive parallelism and maximizing hardware
utilization.Comment: 20 pages, 1 figur
Maestro: a remote execution tool for visualization clusters
In recent years immersive visualization systems have transitioned from running on large shared memory systems to clusters of commodity PCs. While there has been much research done to create middleware to manage application synchronization, there has been very little work done to allow easy execution of immersive applications on a cluster. Although there are existing remote execution tools, they are targeted at high performance computing (HPC) and enterprise administration. This thesis presents Maestro, a cross-platform remote execution tool designed specifically for visualization clusters. The goals of Maestro, a description of its design, and a detailed discussion of its implementation are provided. The design description gives explanations of the three components of Maestro: the core that handles networking and security, the user interface that controls the cluster, and the daemon on each cluster node. Maestro has been successfully deployed on numerous large visualization clusters, which have led to refinements and improvements to the tool
Big Data solutions for law enforcement
Big Data, the data too large and complex for most current information infrastructure to store and analyze, has changed every sector in government and industry.
Today’s sensors and devices produce an overwhelming amount of information that is often unstructured, and solutions developed to handle Big Data now allowing us to track more information and run more complex analytics to gain a level of insight once thought impossible.
The dominant Big Data solution is the Apache Hadoop ecosystem which provides an open source platform for reliable, scalable, distributed computing on commodity hardware. Hadoop has exploded in the private sector and is the back end to many of the leading Web 2.0 companies and services. Hadoop also has a growing footprint in government, with numerous Hadoop clusters run by the Departments of Defense and Energy, as well as smaller deployments by other agencies.
One sector currently exploring Hadoop is law enforcement. Big Data analysis has already been highly effective in law enforcement and can make police departments more effective, accountable, efficient, and proactive. As Hadoop continues to spread through law enforcement agencies, it has the potential to permanently change the way policing is practiced and administered
Experiences with porting and modelling wavefront algorithms on many-core architectures
We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA).
This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC "Tesla” and "Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high perfor- mance computing (HPC) clusters based on a range of multi- core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC).
In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale
The OMII Software – Demonstrations and Comparisons between two different deployments for Client-Server Distributed Systems
This paper describes the key elements of the OMII software and the scenarios which OMII software can be deployed to achieve distributed computing in the UK e-Science Community, where two different deployments for Client-Server distributed systems are demonstrated. Scenarios and experiments for each deployment have been described, with its advantages and disadvantages compared and analyzed. We conclude that our first deployment is more relevant for system administrators or developers, and the second deployment is more suitable for users’ perspective which they can send and check job status for hundred job submissions
Parallel detrended fluctuation analysis for fast event detection on massive PMU data
("(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.")Phasor measurement units (PMUs) are being rapidly deployed in power grids due to their high sampling rates and synchronized measurements. The devices high data reporting rates present major computational challenges in the requirement to process potentially massive volumes of data, in addition to new issues surrounding data storage. Fast algorithms capable of processing massive volumes of data are now required in the field of power systems. This paper presents a novel parallel detrended fluctuation analysis (PDFA) approach for fast event detection on massive volumes of PMU data, taking advantage of a cluster computing platform. The PDFA algorithm is evaluated using data from installed PMUs on the transmission system of Great Britain from the aspects of speedup, scalability, and accuracy. The speedup of the PDFA in computation is initially analyzed through Amdahl's Law. A revision to the law is then proposed, suggesting enhancements to its capability to analyze the performance gain in computation when parallelizing data intensive applications in a cluster computing environment
- …