A simple deterministic algorithm for guaranteeing the forward progress of transactions
This paper describes a remarkably simple deterministic (not probabilistic) contention-management algorithm for guaranteeing the forward progress of transactions - avoiding deadlocks, livelocks, and other anomalies. The transactions must be finite (no infinite loops), but on each restart, a transaction may access different shared-memory locations. The algorithm supports irrevocable transactions as long as the transaction satisfies a simple ordering constraint. In particular, a transaction that accesses only one shared-memory location is never aborted. The algorithm is suitable for both hardware and software transactional-memory systems. It also can be used in some contexts as a locking protocol for implementing transactions "by hand."
Randomized Routing on Fat-Trees
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor lambda on a fat-tree with n processors, the number of delivery cycles (routing attempts) that the algorithm requires is O(lambda + lg n lg lg n) with probability 1-O(1/n). The best previous bound was O(lambda lg n) for the offline problem, in which the set of messages is known in advance. In the context of a VLSI model that equates hardware cost with physical volume, the routing algorithm can be used to demonstrate that fat-trees are universal routing networks. Specifically, we prove that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
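As a rough illustration of the load factor used in the abstract above, the following Python sketch computes it for a message set on a complete binary fat-tree. The modelling choices are assumptions made for illustration, not details taken from the paper: processors are leaves numbered 0 to n-1, each tree edge carries one upward and one downward channel, the load of a channel is the number of messages that must cross it, and the hypothetical `capacity(h)` callback gives the capacity of the channels above height h. The load factor is then the largest ratio of load to capacity over all channels.

```python
from collections import Counter
from fractions import Fraction


def load_factor(messages, capacity):
    """Return the max over channels of load / capacity (assumed model).

    messages: iterable of (src, dst) leaf indices.
    capacity: function mapping a height h (0 = leaf level) to the
              capacity of each channel above a node at that height.
    """
    load = Counter()
    for src, dst in messages:
        h = 0
        # Climb until both endpoints fall in the same subtree; every
        # level climbed uses one up channel and one down channel.
        while (src >> h) != (dst >> h):
            load[("up", h, src >> h)] += 1
            load[("down", h, dst >> h)] += 1
            h += 1
    return max(
        (Fraction(count, capacity(channel[1])) for channel, count in load.items()),
        default=Fraction(0),
    )


# Four messages that all cross the root of an 8-leaf tree.
msgs = [(0, 7), (1, 6), (2, 5), (3, 4)]
print(load_factor(msgs, lambda h: 1))       # unit-capacity tree: 4
print(load_factor(msgs, lambda h: 2 ** h))  # capacities doubling per level: 1
```

With unit capacities the root channels are the bottleneck, whereas capacities that grow toward the root (the defining feature of a fat-tree) bring the load factor down to 1 for this message set.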
Transactions Everywhere
Arguably, one of the biggest deterrents for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic problem inherent in the coordination of concurrent tasks is the enforcing of atomicity so that the partial results of one task do not inadvertently corrupt another task. Atomicity is typically enforced through locking protocols, but these protocols can introduce other complications, such as deadlock, unless restrictive methodologies in their use are adopted. We have recently begun a research project focusing on transactional memory [18] as an alternative mechanism for enforcing atomicity, since it allows the user to avoid many of the complications inherent in locking protocols. Rather than viewing transactions as infrequent occurrences in a program, as has generally been done in the past, we have adopted the point of view that all user code should execute in the context of some transaction. Making this viewpoint viable requires the development of two key technologies: effective hardware support for scalable transactional memory, and linguistic and compiler support. This paper describes our preliminary research results on making “transactions everywhere” a practical reality. Singapore-MIT Alliance (SMA)
Data-race detection in transactions-everywhere parallel programming
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 69-72). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. This thesis studies how to perform dynamic data-race detection in programs using "transactions everywhere", a new methodology for shared-memory parallel programming. Since the conventional definition of a data race does not make sense in the transactions-everywhere methodology, this thesis develops a new definition based on a weak assumption about the correctness of the target program's parallel-control flow, which is made in the same spirit as the assumption underlying the conventional definition. This thesis proves, via a reduction from the problem of 3CNF-formula satisfiability, that data-race detection in the transactions-everywhere methodology is an NP-complete problem. In view of this result, it presents an algorithm that approximately detects data races. The algorithm never reports false negatives. When a possible data race is detected, the algorithm outputs simple information that allows the programmer to efficiently resolve the root of the problem. The algorithm requires running time that is worst-case quadratic in the size of a graph representing all the scheduling constraints in the target program. By Kai Huang. M.Eng.
Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers
When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called “out-of-core” algorithm, which moves data between primary and secondary memories, must be used. In this paper, we study out-of-core algorithms for sparse linear relaxation problems in which each iteration of the algorithm updates the state of every vertex in a graph with a linear combination of the states of its neighbors. We give a general method that can save substantially on the I/O traffic for many problems. For example, our technique allows a computer with M words of primary memory to perform T = Ω(M^(1/5)) cycles of a multigrid algorithm for a two-dimensional elliptic solver over an n-point domain using only Θ(nT/M^(1/5)) I/O transfers, as compared with the naive algorithm, which requires Ω(nT) I/O's. Our method depends on the existence of a “blocking” cover of the graph that underlies the linear relaxation. A blocking cover has the property that the subgraphs forming the cover have large diameters once a small number of vertices have been removed. The key idea in our method is to introduce a variable for each removed vertex for each time step of the algorithm. We maintain linear dependences among the removed vertices, thereby allowing each subgraph to be iteratively relaxed without external communication. We give a general theorem relating blocking covers to I/O-efficient relaxation schemes. We also give an automatic method for finding blocking covers for certain classes of graphs, including planar graphs and d-dimensional simplicial graphs with constant aspect ratio (i.e., graphs that arise from dividing d-space into “well-shaped” polyhedra). As a result, we can perform T iterations of linear relaxation on any n-vertex planar graph using only Θ(n + nT lg n / M^(1/4)) I/O's or on any n-node d-dimensional simplicial graph with constant aspect ratio using only Θ(n + nT lg n / M^(Ω(1/d))) I/O's.
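To make the notion of linear relaxation in the abstract above concrete, here is a minimal in-memory Python sketch of the naive iteration, in which every vertex state is replaced by a linear combination of its neighbors' states. The function name `relax` and the dictionary layout of `weights` are assumptions chosen for illustration; the sketch does not implement blocking covers or any out-of-core data movement.

```python
def relax(state, weights, iterations):
    """Run the naive linear relaxation for a number of iterations.

    state:   dict mapping vertex -> current value.
    weights: dict mapping vertex -> {neighbor: coefficient}; the new
             value of a vertex is the weighted sum of its neighbors'
             old values (assumed layout, for illustration only).
    """
    for _ in range(iterations):
        # Build the new state from the old one so that every update in
        # an iteration reads values from the previous iteration.
        state = {
            v: sum(coeff * state[u] for u, coeff in nbrs.items())
            for v, nbrs in weights.items()
        }
    return state


# A three-vertex path 0 - 1 - 2 where each vertex averages its neighbors.
weights = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}
print(relax({0: 0.0, 1: 1.0, 2: 2.0}, weights, 1))
```

Each iteration reads the state of every vertex, which is why a direct out-of-core version of this loop needs on the order of nT I/O's when the n states do not fit in primary memory.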
Provably Efficient Adaptive Scheduling for Parallel Jobs
Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user jobs. Various degrees of success have been achieved over the years. However, few existing schemes address both efficiency and fairness over a wide range of workloads. Moreover, in order to obtain analytical results, most of them require prior information about jobs, which may be difficult to obtain in real applications.
This paper presents two novel adaptive scheduling algorithms -- GRAD for centralized scheduling, and WRAD for distributed scheduling. Both GRAD and WRAD ensure fair allocation under all levels of workload, and they offer provable efficiency without requiring prior information about a job's parallelism. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. To the best of our knowledge, they are the first non-clairvoyant scheduling algorithms that offer such guarantees. We also believe that our new approach, a resource request-allotment protocol, deserves further exploration.
Specifically, both GRAD and WRAD are O(1)-competitive with respect to mean response time for batched jobs, and O(1)-competitive with respect to makespan for non-batched jobs with arbitrary release times. The simulation results show that, for non-batched jobs, the makespan produced by GRAD is no more than 1.39 times the optimal on average and never exceeds 4.5 times the optimal. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times the optimal on average and never exceeds 5.5 times the optimal. Singapore-MIT Alliance (SMA)
Charge structure in volcanic plumes: a comparison of plume properties predicted by an integral plume model to observations of volcanic lightning during the 2010 eruption of Eyjafjallajökull, Iceland
Cancer is a heterogeneous disease with different combinations of genetic alterations driving its development in different individuals. We introduce CoMEt, an algorithm to identify combinations of alterations that exhibit a pattern of mutual exclusivity across individuals, often observed for alterations in the same pathway. CoMEt includes an exact statistical test for mutual exclusivity and techniques to perform simultaneous analysis of multiple sets of mutually exclusive and subtype-specific alterations. We demonstrate that CoMEt outperforms existing approaches on simulated and real data. We apply CoMEt to five different cancer types, identifying both known cancer genes and pathways, and novel putative cancer genes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0700-7) contains supplementary material, which is available to authorized users
The Cilkview scalability analyzer
The Cilkview scalability analyzer is a software tool for profiling, estimating scalability, and benchmarking multithreaded Cilk++ applications. Cilkview monitors logical parallelism during an instrumented execution of the Cilk++ application on a single processing core. As Cilkview executes, it analyzes logical dependencies within the computation to determine its work and span (critical-path length). These metrics allow Cilkview to estimate parallelism and predict how the application will scale with the number of processing cores. In addition, Cilkview analyzes scheduling overhead using the concept of a “burdened dag,” which allows it to diagnose performance problems in the application due to an insufficient grain size of parallel subcomputations. Cilkview employs the Pin dynamic-instrumentation framework to collect metrics during a serial execution of the application code. It operates directly on the optimized code rather than on a debug version. Metadata embedded by the Cilk++ compiler in the binary executable identifies the parallel control constructs in the executing application. This approach introduces little or no overhead to the program binary in normal runs. Cilkview can perform real-time scalability benchmarking automatically, producing gnuplot-compatible output that allows developers to compare an application’s performance with the tool’s predictions. If the program performs beneath the range of expectation, the programmer can be confident in seeking a cause such as insufficient memory bandwidth, false sharing, or contention, rather than inadequate parallelism or insufficient grain size.
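The work and span metrics named in the abstract above can be illustrated on a small dependency dag. The Python sketch below is not Cilkview's implementation and ignores the burdened-dag analysis; it assumes a hypothetical representation in which `cost` maps each task to its running time and `deps` maps a task to the tasks it must wait for. Work is the total cost, span is the cost of the longest dependency chain, and the standard work and span laws bound the speedup on P cores by min(P, work/span).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def work_and_span(cost, deps):
    """Return (work, span) for a dag of tasks.

    cost: dict mapping task -> running time.
    deps: dict mapping task -> iterable of tasks it depends on
          (tasks without an entry have no dependencies).
    """
    work = sum(cost.values())
    finish = {}
    graph = {task: deps.get(task, ()) for task in cost}
    # Visit tasks so that all predecessors are finished first.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + cost[task]
    span = max(finish.values(), default=0)
    return work, span


def speedup_bound(work, span, cores):
    """Upper bound on speedup implied by the work and span laws."""
    return min(cores, work / span)


# Diamond-shaped dag: a precedes b and c, which both precede d.
cost = {"a": 1, "b": 2, "c": 3, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
work, span = work_and_span(cost, deps)
print(work, span)                     # 7 5
print(speedup_bound(work, span, 4))   # 1.4
```

In this example the parallelism is 7/5 = 1.4, so adding cores beyond two cannot help: the critical path a, c, d dominates regardless of how many processors are available.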
A Media Player for Use in Distance Education
We have developed a media player for use in distance education. The player can incorporate several time-indexed sources, including video, audio, PowerPoint, and text index. We have converted all the SMA 5503 Introduction to Algorithms lecture videos for use with the player. Student response to the player has been excellent.
The media player's graphical user interface (see Fig. 1) permits a student viewer to navigate a video by the text index. The text index can be produced either automatically by a PowerPoint plug-in developed by David Mycue of MIT's Academic Media Production Systems, or manually by a teaching assistant using an ordinary text editor. The student can flip easily through PowerPoint slides, invoking the video only when needed, rather than being forced to watch video presentation of material the student already understands.
The player also allows playback to be accelerated up to twice real-time speed without changing the speaker's tone, thereby allowing the student to watch a video in much less time than would otherwise be possible. A single-click button allows the student to skip backward in the video a few seconds, easing the replay of critical material. Another single-click button allows the user to skip forward.
Although none of the features in this media player is unique or original, their combination provides a platform for lecture viewing superior to the one currently being used by SMA. Ongoing work on the player includes adding more features, such as allowing students to add multiple streams of other information (e.g., notes or snapshots of the lecture), which will be synchronized with the video in the same manner as the text indices and the slides in the current version. Singapore-MIT Alliance (SMA)