9 research outputs found

    A Parallel, Multithreaded Decision Tree Builder

    No full text
    Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and the number of processors. This research is supported by ARPA Contract No. DABT63-96-C-0071. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation thereon.
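    The outer level of divide-and-conquer parallelism described above maps naturally onto Pthreads. The following is a minimal sketch of that outer level only (one lightweight thread per child tree node); the node type, the depth-based stopping test, and the spawn_child() helper are illustrative stand-ins rather than details from the paper, and the inner within-node parallelism is omitted.

        /* Sketch: one Pthread per child node in a divide-and-conquer
           tree build. Compile with: cc sketch.c -lpthread */
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>

        typedef struct node {
            int depth;
            struct node *left, *right;
        } node_t;

        static node_t *spawn_child(int depth) {
            node_t *n = malloc(sizeof *n);
            n->depth = depth;
            n->left = n->right = NULL;
            return n;
        }

        /* Outer-level parallelism: build each subtree in its own
           lightweight thread; the Pthreads scheduler balances this
           irregular, dynamic load. */
        static void *build_node(void *arg) {
            node_t *n = arg;
            if (n->depth >= 3)          /* stand-in for a purity/size test */
                return NULL;
            pthread_t lt, rt;
            n->left  = spawn_child(n->depth + 1);
            n->right = spawn_child(n->depth + 1);
            pthread_create(&lt, NULL, build_node, n->left);
            pthread_create(&rt, NULL, build_node, n->right);
            pthread_join(lt, NULL);
            pthread_join(rt, NULL);
            return NULL;
        }

        int main(void) {
            node_t *root = spawn_child(0);
            build_node(root);
            puts("tree built");
            return 0;
        }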

    Scheduling Threads for Low Space Requirement and Good Locality

    No full text
    The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S_1, we show that the expected space requirement is S_1 + O(K · p · D) on p processors. Here, K is a user-adjustable runtime parameter, which provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads, or Pthreads, and evaluated its performance…
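    Read concretely, the bound above trades space against time through K. A brief restatement, with illustrative numbers that are assumed rather than taken from the paper:

        % Expected-space bound from the abstract.
        % S_1: serial space, D: depth, p: processors,
        % K: user-adjustable memory quantum per thread.
        \[
          \mathbb{E}[\text{space on } p \text{ processors}] \;=\; S_1 + O(K \cdot p \cdot D)
        \]
        % Illustrative numbers (assumed): S_1 = 100 MB, K = 1 MB, p = 8,
        % D = 50 makes the additive term O(400 MB); halving K halves it,
        % at the cost of more frequent preemption and hence higher
        % scheduling overhead.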

    Pthreads for Dynamic Parallelism

    No full text
    Expressing a large number of lightweight, parallel threads in a shared address space significantly eases the task of writing a parallel program. Threads can be dynamically created to execute individual parallel tasks; the implementation schedules these threads onto the processors and effectively balances the load. However, unless the thread scheduler is designed carefully, such a parallel program may suffer poor space and time performance. In this paper, we evaluate the performance of a native, lightweight POSIX threads (Pthreads) library on a shared memory machine using a set of parallel benchmarks that dynamically create a large number of threads. By studying the performance of one of the benchmarks, matrix multiply, we show how simple, yet provably good modifications to the library can result in significantly improved space and time performance. With the modified Pthreads library, each of the parallel benchmarks performs as well as its coarse-grained, hand-partitioned counterpart…
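    The style of benchmark the abstract evaluates can be suggested with a short sketch: a fine-grained matrix multiply that creates one Pthread per output row and leaves scheduling and load balance entirely to the library. The size N and the row_worker() helper are illustrative choices, not details from the paper.

        /* Fine-grained dynamic parallelism: one thread per row of C = A*B. */
        #include <pthread.h>
        #include <stdio.h>

        #define N 64
        static double A[N][N], B[N][N], C[N][N];

        static void *row_worker(void *arg) {
            int i = (int)(long)arg;            /* row computed by this thread */
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
            return NULL;
        }

        int main(void) {
            pthread_t tid[N];
            for (int i = 0; i < N; i++) { A[i][i] = 1.0; B[i][i] = 2.0; }
            for (int i = 0; i < N; i++)        /* dynamically create N threads */
                pthread_create(&tid[i], NULL, row_worker, (void *)(long)i);
            for (int i = 0; i < N; i++)
                pthread_join(tid[i], NULL);
            printf("C[0][0] = %f\n", C[0][0]); /* 2.0 for these diagonals */
            return 0;
        }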

    A framework for space and time efficient scheduling of parallelism

    No full text
    Many of today’s high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. In this paper, we first present a general framework to model non-preemptive parallel computations based on task graphs, in which schedules of the graphs represent executions of the computations. We then prove bounds on the space and time requirements of certain classes of schedules that can be generated by an offline scheduler. Next, we present an online scheduling algorithm that is provably space-efficient and time-efficient for multithreaded computations with nested parallelism. If a serial execution requires S_1 units of memory for a computation of depth D and work W, our algorithm results in an execution on p processors that requires S_1 + O(D · p · log p) units of memory, and O(W/p + D · log p) time, including scheduling overheads. Finally, we demonstrate that our scheduling algorithm is efficient in practice. We have implemented a runtime system that uses our algorithm to schedule parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
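    The work-depth accounting behind these bounds can be made concrete with a toy calculation; the numbers below are assumed for illustration, not taken from the paper.

        % Bounds as stated in the abstract, with W = work, D = depth,
        % p = processors, S_1 = serial space.
        \[
          T_p = O\!\left(\tfrac{W}{p} + D \log p\right), \qquad
          S_p = S_1 + O(D \cdot p \cdot \log p)
        \]
        % Example (assumed): W = 10^9 operations, D = 10^3, p = 64.
        % Then W/p ~ 1.6 x 10^7 dominates D log p ~ 6 x 10^3, so the
        % schedule stays work-bound and speedup remains near-linear.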

    Space-Efficient Scheduling of Nested Parallelism

    No full text
    This article presents an online scheduling algorithm that is provably space-efficient and time-efficient for nested-parallel languages. For a computation with depth D and serial space requirement S_1, the algorithm generates a schedule that requires at most S_1 + O(K · D · p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space- and time-efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
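    The role of the parameter K can be illustrated with a toy allocation wrapper: a thread tracks how much memory it has allocated and yields once it exceeds K, standing in for preemption by the scheduler. The quantum_alloc() helper and the 1 MB value of K are assumptions made for this sketch, not the paper's runtime.

        /* Toy model of the K memory quantum; single-threaded for brevity. */
        #include <sched.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define K (1 << 20)            /* 1 MB quantum (illustrative) */
        static size_t allocated;       /* per-thread in a real runtime */

        static void *quantum_alloc(size_t n) {
            allocated += n;
            if (allocated >= K) {      /* quantum exhausted: */
                allocated = 0;
                sched_yield();         /* stand-in for scheduler preemption */
            }
            return malloc(n);
        }

        int main(void) {
            for (int i = 0; i < 4096; i++)
                free(quantum_alloc(1024));  /* yields roughly every K bytes */
            puts("done");
            return 0;
        }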

    Space-efficient scheduling of parallelism with synchronization variables

    No full text
    Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or user-specified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such as ID. The main result is an online scheduling algorithm which, given a computation with w work (total operations), σ synchronizations, d depth (critical path) and s_1 sequential space, will run in O(w/p + …
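    A synchronization variable of the kind the abstract refers to is, in essence, a write-once cell that any thread may block on until it is written, which is how a future resolves. The sync_var type below is an illustrative Pthreads sketch, not an artifact of the paper.

        /* Write-once synchronization variable: one put, many blocking gets. */
        #include <pthread.h>
        #include <stdio.h>

        typedef struct {
            pthread_mutex_t mu;
            pthread_cond_t  cv;
            int ready;
            int value;
        } sync_var;

        static void sync_put(sync_var *s, int v) {   /* single write */
            pthread_mutex_lock(&s->mu);
            s->value = v;
            s->ready = 1;
            pthread_cond_broadcast(&s->cv);
            pthread_mutex_unlock(&s->mu);
        }

        static int sync_get(sync_var *s) {           /* block until written */
            pthread_mutex_lock(&s->mu);
            while (!s->ready)
                pthread_cond_wait(&s->cv, &s->mu);
            int v = s->value;
            pthread_mutex_unlock(&s->mu);
            return v;
        }

        static void *producer(void *arg) {
            sync_put(arg, 42);
            return NULL;
        }

        int main(void) {
            sync_var s = { PTHREAD_MUTEX_INITIALIZER,
                           PTHREAD_COND_INITIALIZER, 0, 0 };
            pthread_t t;
            pthread_create(&t, NULL, producer, &s);
            printf("future value: %d\n", sync_get(&s));
            pthread_join(t, NULL);
            return 0;
        }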