610 research outputs found
The "MIND" Scalable PIM Architecture
MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a
Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on
each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND
architecture
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, a one order of magnitude higher peak
performance than traditional CPUs. This drop in the cost of computation, as any
order-of-magnitude drop in the cost per unit of performance for a class of
system components, triggers the opportunity to redesign systems and to explore
new ways to engineer them to recalibrate the cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201
CARISMA: a context-sensitive approach to race-condition sample-instance selection for multithreaded applications
Dynamic race detectors can explore multiple thread schedules of a multithreaded program over the same input to detect data races. Although existing sampling-based precise race detectors reduce overheads effectively so that lightweight precise race detection can be performed in testing or post-deployment environments, they are ineffective in detecting races if the sampling rates are low. This paper presents CARISMA to address this problem. CARISMA exploits the insight that along an execution trace, a program may potentially handle many accesses to the memory locations created at the same site for similar purposes. Iterating over multiple execution trials of the same input, CARISMA estimates and distributes the sampling budgets among such location creation sites, and probabilistically collects a fraction of all accesses to the memory locations associated with such sites for subsequent race detection. Our experiment shows that, compared with PACER on the same platform and at the same sampling rate (such as 1%), CARISMA is significantly more effective. © 2012 ACM.postprin
A Context-Sensitive Coverage Criterion for Test Suite Reduction
Modern software is increasingly developed using multi-language implementations, large supporting libraries and frameworks, callbacks, virtual function calls, reflection, multithreading, and object- and aspect-oriented programming. The predominant example of such software is the graphical user interface (GUI), which is used as a front-end to most of today's software applications. The characteristics of GUIs and other modern software present new challenges to software testing. Because recently developed techniques for automated test case generation can generate more tests than are practical to regularly execute, one important challenge is test suite reduction. Test suite reduction seeks to decrease the size of a test suite without overly compromising its original fault detection ability. This research advances the state-of-the-art in test suite reduction by empirically studying a coverage criterion which considers the context in which program concepts are covered. Conventional approaches to test suite reduction were developed and evaluated on batch-style applications and, due to the aforementioned considerations, are not always easily applicable to modern software. Furthermore, many existing techniques fail to consider the context in which code executes inside an event-driven paradigm, where programs wait for and interactively respond to user- and system-generated events. Consequently, they yield reduced test suites with severely impaired fault detection ability. The novel feature of this research is a test suite reduction technique based on the call stack coverage criterion which addresses many of the challenges associated with coverage-based test suite reduction in modern applications. Results show that reducing test suites while maintaining call stack coverage yields good tradeoffs between size reduction and fault detection effectiveness compared to traditional techniques. The output of this research includes models, metrics, algorithms, and techniques based upon this approach
Rigorous concurrency analysis of multithreaded programs
technical reportThis paper explores the practicality of conducting program analysis for multithreaded software using constraint solv- ing. By precisely defining the underlying memory consis- tency rules in addition to the intra-thread program seman- tics, our approach orders a unique advantage for program ver- ification | it provides an accurate and exhaustive coverage of all thread interleavings for any given memory model. We demonstrate how this can be achieved by formalizing sequen- tial consistency for a source language that supports control branches and a monitor-style mutual exclusion mechanism. We then discuss how to formulate programmer expectations as constraints and propose three concrete applications of this approach: execution validation, race detection, and atom- icity analysis. Finally, we describe the implementation of a formal analysis tool using constraint logic programming, with promising initial results for reasoning about small but non-trivial concurrent programs
A Configurable Transport Layer for CAF
The message-driven nature of actors lays a foundation for developing scalable
and distributed software. While the actor itself has been thoroughly modeled,
the message passing layer lacks a common definition. Properties and guarantees
of message exchange often shift with implementations and contexts. This adds
complexity to the development process, limits portability, and removes
transparency from distributed actor systems.
In this work, we examine actor communication, focusing on the implementation
and runtime costs of reliable and ordered delivery. Both guarantees are often
based on TCP for remote messaging, which mixes network transport with the
semantics of messaging. However, the choice of transport may follow different
constraints and is often governed by deployment. As a first step towards
re-architecting actor-to-actor communication, we decouple the messaging
guarantees from the transport protocol. We validate our approach by redesigning
the network stack of the C++ Actor Framework (CAF) so that it allows to combine
an arbitrary transport protocol with additional functions for remote messaging.
An evaluation quantifies the cost of composability and the impact of individual
layers on the entire stack
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications
We present Pathway, a new unified data processing framework that can run
workloads on both bounded and unbounded data streams. The framework was created
with the original motivation of resolving challenges faced when analyzing and
processing data from the physical economy, including streams of data generated
by IoT and enterprise systems. These required rapid reaction while calling for
the application of advanced computation paradigms (machinelearning-powered
analytics, contextual analysis, and other elements of complex event
processing). Pathway is equipped with a Table API tailored for Python and
Python/SQL workflows, and is powered by a distributed incremental dataflow in
Rust. We describe the system and present benchmarking results which demonstrate
its capabilities in both batch and streaming contexts, where it is able to
surpass state-of-the-art industry frameworks in both scenarios. We also discuss
streaming use cases handled by Pathway which cannot be easily resolved with
state-of-the-art industry frameworks, such as streaming iterative graph
algorithms (PageRank, etc.)
- …