
    PARMA-CC: Parallel Multiphase Approximate Cluster Combining

    Clustering is a common component in data analysis applications. Despite the extensive literature, the continuously increasing volumes of data produced by sensors (e.g. rates of several MB/s by 3D scanners such as LIDAR sensors) and the time-sensitivity of the applications leveraging the clustering outcomes (e.g. detecting critical situations, which are known to be accuracy-dependent) demand novel approaches that respond faster while coping with large data sets. The latter is the challenge we address in this paper. We propose an algorithm, PARMA-CC, that complements existing density-based and distance-based clustering methods. PARMA-CC is based on approximate, data-parallel cluster combining, where parallel threads compute summaries of clusters of data (sub)sets and, through combining, together construct a comprehensive summary of the sets of clusters. By approximating clusters with their respective geometrical summaries, our technique scales well with increased data volumes, and, by computing and efficiently combining the summaries in parallel, it enables latency improvements. PARMA-CC combines the summaries using special data structures that enable parallelism through in-place data processing. As we show in our analysis and evaluation, PARMA-CC can complement and outperform well-established methods, with significantly better scalability, while still providing highly accurate results on a variety of data sets, even with skewed data distributions, which cause traditional approaches to exhibit their worst-case behaviour. In the paper we also describe how PARMA-CC can facilitate time-critical applications through appropriate use of the summaries.
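    To make the combining idea concrete, here is a minimal Python sketch, not the paper's implementation: each cluster is reduced to a constant-size geometric summary (a point count, coordinate sums, and a bounding box; the names Summary, summarize, overlap and combine are invented for this illustration, and PARMA-CC's actual geometrical summaries and in-place combining structures are richer). The point is that two summaries merge in O(1), so threads can summarize data partitions independently and combine the results cheaply.

        # Hypothetical sketch of mergeable cluster summaries; the paper's
        # actual summaries and combining data structures differ.
        from dataclasses import dataclass

        @dataclass
        class Summary:
            n: int        # number of points summarized
            sx: float     # sum of x coordinates (centroid x = sx / n)
            sy: float     # sum of y coordinates (centroid y = sy / n)
            min_x: float  # bounding box of the cluster
            min_y: float
            max_x: float
            max_y: float

        def summarize(points):
            # One pass over a thread's local cluster of 2-D points.
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            return Summary(len(points), sum(xs), sum(ys),
                           min(xs), min(ys), max(xs), max(ys))

        def overlap(a, b):
            # Candidate test: only summaries whose boxes intersect combine.
            return (a.min_x <= b.max_x and b.min_x <= a.max_x and
                    a.min_y <= b.max_y and b.min_y <= a.max_y)

        def combine(a, b):
            # O(1) merge of two summaries: the step performed in parallel.
            return Summary(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                           min(a.min_x, b.min_x), min(a.min_y, b.min_y),
                           max(a.max_x, b.max_x), max(a.max_y, b.max_y))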

    Run-Time Dependence Testing by Integer Sequence Analysis

    A simple run-time data dependence test is presented, based on a new formulation of the dependence problem. This test makes it possible to discern independence in the case of a potential self-output dependence in a loop (a case where the GCD test is useless) and in certain potential anti- and flow-dependences. The test handles subscript expression forms which arise in linearized arrays, making it possible to handle coupled subscripts with ease and to do dependence testing on multiple dimensions at once. The test is useful for arbitrarily deep loop nests, and even allows the testing of a group of dependences in one step.

    Keywords: parallelizing compilers, data dependence, integer sequences, linearization.
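    For context, the following Python sketch shows the classical GCD test that the abstract uses as a point of comparison (the integer-sequence test itself is not reproduced here; gcd_test is an invented name). For accesses A[a*i + b] and A[c*j + d], the equation a*i - c*j = d - b has an integer solution exactly when gcd(a, c) divides d - b, so a negative answer proves independence. For a potential self-output dependence the two subscript expressions coincide, the divisibility condition holds trivially, and the test can never disprove the dependence; that is the gap a run-time test can close.

        # Classical GCD dependence test (baseline, not the paper's test).
        from math import gcd

        def gcd_test(a, b, c, d):
            # Can A[a*i + b] and A[c*j + d] name the same element for some
            # integers i, j?  Solvable iff gcd(a, c) divides d - b, so
            # False proves independence; True is merely "maybe dependent".
            return (d - b) % gcd(a, c) == 0

        # Self-output candidate: A[2*i] written on every iteration.  The
        # subscripts are identical, so the GCD test is always inconclusive.
        print(gcd_test(2, 0, 2, 0))   # True: cannot disprove dependence
        print(gcd_test(2, 0, 2, 1))   # False: accesses proven independent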

    Interprocedural Parallelization Using Memory Classification Analysis

    140 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1998.
    This thesis outlines a way of addressing the goal of precise interprocedural analysis, based on a combination of techniques: for representing memory accesses within interprocedural sections of code, for summarizing dependence information in program contexts, and for testing that dependence. The thesis presents a new technique for summarizing the memory access activity in an arbitrary section of code, called Memory Classification Analysis (MCA), using a precise form for representing memory access patterns, called the Access Region Descriptor (ARD). A new, simple dependence test, the Access Region Test (ART), is also described, which uses the summary sets of ARDs produced by MCA. This test is capable of parallelizing loops containing non-affine subscript expressions, such as those found in FFT codes. A unified parallelization framework is described, which combines privatization, reduction and induction analysis. Array references using subscripting arrays, such as are found in sparse codes, are precisely representable using ARDs and can sometimes be parallelized using the parallelization framework. Parallelization conditions are generated at critical points in the analysis when dependence cannot be disproved. These can be used to drive on-demand deeper program analysis. Whatever conditions remain unproven can then be generated as code to be used for run-time dependence testing. Its precise memory access representation makes the ARD useful within algorithms for generating data movement messages.
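    To make the Access Region Test idea concrete, here is a deliberately simplified Python sketch (the helpers region and independent are invented for this illustration; real ARDs describe strided, multi-dimensional and symbolic access patterns, not plain index sets): summarize what a section of code writes and reads, then disprove dependence by showing the summaries cannot intersect.

        # Interval stand-ins for ARDs and a toy Access Region Test.

        def region(first, last, stride=1):
            # The set of array indices a code section touches.
            return set(range(first, last + 1, stride))

        def independent(writes, other_accesses):
            # No overlap between the summaries means no dependence.
            return writes.isdisjoint(other_accesses)

        # for i in 0..9:  A[2*i] = ... A[2*i + 1] ...
        # Writes cover even indices, reads odd ones, so the loop can run
        # in parallel even though the references look coupled.
        writes = region(0, 18, 2)
        reads = region(1, 19, 2)
        print(independent(writes, reads))   # True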

    Intraprocedural Parallelization Using Memory Classification Analysis

    This thesis outlines a way of addressing the goal of precise interprocedural analysis, based on a combination of techniques: for representing memory accesses within interprocedural sections of code, for summarizing dependence information in program contexts, and for testing that dependence. The thesis presents a new technique for summarizing the memory access activity in an arbitrary section of code, called Memory Classification Analysis (MCA), using a precise form for representing memory access patterns, called the Access Region Descriptor (ARD). A new, simple dependence test, the Access Region Test (ART), is also described, which uses the summary sets of ARDs produced by MCA. This test is capable of parallelizing loops containing non-affine subscript expressions, such as those found in FFT codes. A unified parallelization framework is described, which combines privatization, reduction and induction analysis. Array references using subscripting arrays, such as are found in sparse codes, are precisely representable using ARDs and can sometimes be parallelized using the parallelization framework. Parallelization conditions are generated at critical points in the analysis when dependence cannot be disproved. These can be used to drive on-demand deeper program analysis. Whatever conditions remain unproven can then be generated as code to be used for run-time dependence testing. Its precise memory access representation makes the ARD useful within algorithms for generating data movement messages.

    On the Automatic Parallelization of the Perfect Benchmarks

    This paper presents the results of the Cedar Hand-Parallelization Experiment, conducted from 1989 through 1992 within the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. In this experiment we manually transformed the Perfect Benchmarks® into parallel program versions. In doing so, we used techniques that may be automated in an optimizing compiler. We then ran these programs on the Cedar multiprocessor (built at CSRD during the 1980s) and measured the speed improvement due to each technique.
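    As an illustration of a hand transformation of the kind such an experiment applies (scalar privatization is assumed here as a representative technique; the abstract does not list the techniques used), the Python sketch below removes an apparent cross-iteration dependence by giving each iteration a private copy of a temporary:

        # Scalar privatization, sketched with a thread pool.  The shared
        # temporary t looks like a dependence, but it carries no value
        # between iterations, so each task may own a private copy.
        from concurrent.futures import ThreadPoolExecutor

        data = list(range(8))

        # Serial form: t is reused across iterations.
        out = [0] * len(data)
        for i in range(len(data)):
            t = data[i] * data[i]
            out[i] = t + 1

        # Parallel form: t becomes local to each task, removing the race.
        def body(i):
            t = data[i] * data[i]   # private copy of t
            return t + 1

        with ThreadPoolExecutor() as pool:
            out_parallel = list(pool.map(body, range(len(data))))

        assert out == out_parallel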