120,692 research outputs found
A Virtual Tile Approach to Rastel-based Calculations of Large Digital Elevation Models in a Shared-Memory System
Grid digital elevation models (DEMs) are commonly used in hydrology to derive information related to topographically driven flow. Advances in technology for creating DEMs have increased their resolution and data size with the result that algorithms for processing them are frequently memory limited. This paper presents a new approach to the management of memory in the parallel solution of hydrologic terrain processing using a user-level virtual memory system for shared-memory multithreaded systems. The method includes tailored virtual memory management of raster-based calculations for datasets that are larger than available memory and a novel order-of-calculations approach to parallel hydrologic terrain analysis applications. The method is illustrated for the pit filling algorithm used first in most hydrologic terrain analysis workflows
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201
Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays
The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism
Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications
A challenge facing network device designers, besides increasing the speed of network gear, is improving its programmability in order to simplify the implementation of new applications (see for example, active networks, content networking, etc). This paper presents our work on designing and implementing a virtual network processor, called NetVM, which has an instruction set optimized for packet processing applications, i.e., for handling network traffic. Similarly to a Java Virtual Machine that virtualizes a CPU, a NetVM virtualizes a network processor. The NetVM is expected to provide a compatibility layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various packet processing applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from expensive routers to small appliances (e.g. smart phones). Moreover, the NetVM will provide efficient mapping of the elementary functionalities used to realize the above mentioned networking tasks upon specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special purpose hardware systems possibly deployed to implement network devices
Scalable Range Locks for Scalable Address Spaces and Beyond
Range locks are a synchronization construct designed to provide concurrent
access to multiple threads (or processes) to disjoint parts of a shared
resource. Originally conceived in the file system context, range locks are
gaining increasing interest in the Linux kernel community seeking to alleviate
bottlenecks in the virtual memory management subsystem. The existing
implementation of range locks in the kernel, however, uses an internal spin
lock to protect the underlying tree structure that keeps track of acquired and
requested ranges. This spin lock becomes a point of contention on its own when
the range lock is frequently acquired. Furthermore, where and exactly how
specific (refined) ranges can be locked remains an open question.
In this paper, we make two independent, but related contributions. First, we
propose an alternative approach for building range locks based on linked lists.
The lists are easy to maintain in a lock-less fashion, and in fact, our range
locks do not use any internal locks in the common case. Second, we show how the
range of the lock can be refined in the mprotect operation through a
speculative mechanism. This refinement, in turn, allows concurrent execution of
mprotect operations on non-overlapping memory regions. We implement our new
algorithms and demonstrate their effectiveness in user-space and kernel-space,
achieving up to 9 speedup compared to the stock version of the Linux
kernel. Beyond the virtual memory management subsystem, we discuss other
applications of range locks in parallel software. As a concrete example, we
show how range locks can be used to facilitate the design of scalable
concurrent data structures, such as skip lists.Comment: 17 pages, 9 figures, Eurosys 202
- âŠ