2,760 research outputs found

    Breaking Instance-Independent Symmetries In Exact Graph Coloring

    Code optimization and high level synthesis can be posed as constraint satisfaction and optimization problems, such as graph coloring used in register allocation. Graph coloring is also used to model more traditional CSPs relevant to AI, such as planning, time-tabling and scheduling. Provably optimal solutions may be desirable for commercial and defense applications. Additionally, for applications such as register allocation and code optimization, naturally-occurring instances of graph coloring are often small and can be solved optimally. A recent wave of improvements in algorithms for Boolean satisfiability (SAT) and 0-1 Integer Linear Programming (ILP) suggests generic problem-reduction methods, rather than problem-specific heuristics, because (1) heuristics may be upset by new constraints, (2) heuristics tend to ignore structure, and (3) many relevant problems are provably inapproximable. Problem reductions often lead to highly symmetric SAT instances, and symmetries are known to slow down SAT solvers. In this work, we compare several avenues for symmetry breaking, in particular when certain kinds of symmetry are present in all generated instances. Our focus on reducing CSPs to SAT allows us to leverage recent dramatic improvements in SAT solvers and automatically benefit from future progress. We can use a variety of black-box SAT solvers without modifying their source code because our symmetry-breaking techniques are static, i.e., we detect symmetries and add symmetry breaking predicates (SBPs) during pre-processing. An important result of our work is that among the types of instance-independent SBPs we studied and their combinations, the simplest and least complete constructions are the most effective. Our experiments also clearly indicate that instance-independent symmetries should mostly be processed together with instance-specific symmetries rather than at the specification level, contrary to what has been suggested in the literature.
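
To make the reduction concrete, here is a minimal sketch of encoding K-coloring as CNF clauses and then adding one static, instance-independent symmetry-breaking predicate during pre-processing. The predicate shown (pinning the two endpoints of one edge to colors 0 and 1) is an illustrative assumption, not necessarily one of the constructions the abstract evaluates; it is sound because colors are interchangeable, so any proper coloring can be relabeled to satisfy it. The helper names (`var`, `coloring_cnf`, `pin_edge_sbp`, `count_models`) are hypothetical, and the brute-force counter stands in for a real black-box SAT solver.

```python
from itertools import product

def var(v, c, k):
    """1-based DIMACS-style variable meaning 'vertex v has color c'."""
    return v * k + c + 1

def coloring_cnf(n, edges, k):
    """CNF clauses (lists of signed ints) for proper k-coloring of a graph."""
    clauses = []
    for v in range(n):
        # Each vertex gets at least one color ...
        clauses.append([var(v, c, k) for c in range(k)])
        # ... and at most one color.
        for c1 in range(k):
            for c2 in range(c1 + 1, k):
                clauses.append([-var(v, c1, k), -var(v, c2, k)])
    for u, v in edges:
        # Adjacent vertices never share a color.
        for c in range(k):
            clauses.append([-var(u, c, k), -var(v, c, k)])
    return clauses

def pin_edge_sbp(edge, k):
    """Assumed illustrative SBP: fix the endpoints of one edge to colors 0 and 1.
    Requires k >= 2; removes color-permutation symmetry only partially."""
    u, v = edge
    return [[var(u, 0, k)], [var(v, 1, k)]]

def count_models(num_vars, clauses):
    """Brute-force model counter; only suitable for tiny instances."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in cl) for cl in clauses):
            count += 1
    return count

n, k = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
base = coloring_cnf(n, edges, k)
broken = base + pin_edge_sbp(edges[0], k)

plain_count = count_models(n * k, base)
broken_count = count_models(n * k, broken)
print(plain_count, broken_count)
```

On this 4-cycle with three colors, the plain encoding has 18 models and the symmetry-broken one has 3, so the solver's search space shrinks by the 3 × 2 ways of coloring the pinned edge while satisfiability is preserved.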

    Hierarchical Parallelization of Gene Differential Association Analysis

    Background: Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today.
    Results: Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm.
    Conclusions: The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.

    Doing-it-All with Bounded Work and Communication

    We consider the Do-All problem, where $p$ cooperating processors need to complete $t$ similar and independent tasks in an adversarial setting. Here we deal with a synchronous message passing system with processors that are subject to crash failures. Efficiency of algorithms in this setting is measured in terms of work complexity (also known as total available processor steps) and communication complexity (total number of point-to-point messages). When work and communication are considered to be comparable resources, then the overall efficiency is meaningfully expressed in terms of effort defined as work + communication. We develop and analyze a constructive algorithm that has work $O(t + p \log p\, (\sqrt{p\log p}+\sqrt{t\log t}\,))$ and a nonconstructive algorithm that has work $O(t + p \log^2 p)$. The latter result is close to the lower bound $\Omega(t + p \log p/ \log \log p)$ on work. The effort of each of these algorithms is proportional to its work when the number of crashes is bounded above by $c\,p$, for some positive constant $c < 1$. We also present a nonconstructive algorithm that has effort $O(t + p^{1.77})$.
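
The cost measures defined above (work, communication, effort) can be illustrated with a small simulation. The sketch below is not either of the algorithms from the abstract; it is the trivial baseline in which every processor performs every task itself and sends no messages, which gives work close to $p \cdot t$ and zero communication. The function name `do_all_baseline` and the crash model (a list giving the round at which each processor crashes, or `None` for never) are assumptions made for illustration.

```python
def do_all_baseline(p, t, crash_round):
    """Simulate the no-communication baseline for Do-All.

    p            -- number of processors
    t            -- number of tasks; in round r every live processor does task r
    crash_round  -- crash_round[i] is the round at which processor i crashes
                    (it takes no step in that round or later), or None
    """
    work = 0          # total available processor steps
    messages = 0      # point-to-point messages; this baseline sends none
    done = set()
    for rnd in range(t):
        for i in range(p):
            crashed_at = crash_round[i]
            if crashed_at is not None and rnd >= crashed_at:
                continue  # crashed processors contribute no steps
            work += 1
            done.add(rnd)
    return {
        "work": work,
        "communication": messages,
        "effort": work + messages,   # effort = work + communication
        "all_done": len(done) == t,
    }

result = do_all_baseline(p=4, t=10, crash_round=[None, 3, 0, None])
print(result)
```

With four processors, ten tasks, and crashes at rounds 3 and 0, the two survivors each take 10 steps and the early crasher takes 3, so work and effort are both 23. The bounds quoted in the abstract show how far coordinated algorithms can improve on this baseline's $O(p \cdot t)$ work.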

    Static Scheduling Strategies for Heterogeneous Systems

    In this paper, we consider static scheduling techniques for heterogeneous systems, such as clusters and grids. We successively deal with minimum makespan scheduling, divisible load scheduling and steady-state scheduling. Finally, we discuss the limitations of static scheduling approaches.

    Least space-time first scheduling algorithm : scheduling complex tasks with hard deadline on parallel machines

    Both time constraints and logical correctness are essential to real-time systems and failure to specify and observe a time constraint may result in disaster. Two orthogonal issues arise in the design and analysis of real-time systems: one is the specification of the system, and the semantic model describing the properties of real-time programs; the other is the scheduling and allocation of resources that may be shared by real-time program modules. The problem of scheduling tasks with precedence and timing constraints onto a set of processors in a way that minimizes maximum tardiness is here considered. A new scheduling heuristic, Least Space Time First (LSTF), is proposed for this NP-Complete problem. Basic properties of LSTF are explored; for example, it is shown that (1) LSTF dominates Earliest-Deadline-First (EDF) for scheduling a set of tasks on a single processor (i.e., if a set of tasks is schedulable under EDF, it is also schedulable under LSTF); and (2) LSTF is more effective than EDF for scheduling a set of independent simple tasks on multiple processors. Within an idealized framework, theoretical bounds on maximum tardiness for scheduling algorithms in general, and tighter bounds for LSTF in particular, are proven for worst case behavior. Furthermore, simulation benchmarks are developed, comparing the performance of LSTF with other scheduling disciplines for average case behavior. Several techniques are introduced to integrate overhead (for example, scheduler and context switch) and more realistic assumptions (such as inter-processor communication cost) in various execution models. A workload generator and symbolic simulator have been implemented for comparing the performance of LSTF (and a variant -- LSTF+) with that of several standard scheduling algorithms. LSTF's execution model, basic theories, and overhead considerations have been defined and developed.
    Based upon the evidence, it is proposed that LSTF is a good and practical scheduling algorithm for building predictable, analyzable, and reliable complex real-time systems. There remain some open issues to be explored, such as relaxing some current restrictions, discovering more properties and theorems of LSTF under different models, etc. We strongly believe that LSTF can be a practical scheduling algorithm in the near future.
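
For readers unfamiliar with the baseline, the following is a minimal sketch of Earliest-Deadline-First on a single processor together with the maximum-tardiness objective mentioned above. The abstract does not state LSTF's priority rule, so LSTF itself is not shown. The sketch assumes non-preemptive tasks that are all released at time 0 and have no precedence constraints; the function name `edf_max_tardiness` and the task tuples are hypothetical.

```python
def edf_max_tardiness(tasks):
    """Run tasks in order of increasing deadline on one processor.

    tasks -- list of (name, duration, deadline), all released at time 0
    Returns the execution order and the maximum tardiness, where a task's
    tardiness is max(0, completion_time - deadline).
    """
    clock = 0
    worst = 0  # starting at 0 implements the max(0, ...) in the definition
    order = []
    for name, duration, deadline in sorted(tasks, key=lambda task: task[2]):
        clock += duration                      # task completes at this time
        worst = max(worst, clock - deadline)   # track the latest overrun
        order.append(name)
    return order, worst

tasks = [("a", 3, 4), ("b", 2, 3), ("c", 4, 12)]
order, worst = edf_max_tardiness(tasks)
print(order, worst)
```

Here the order is b, a, c: task b finishes at time 2, a at time 5 (one unit past its deadline of 4), and c at time 9, so the maximum tardiness is 1. A task set is schedulable under this rule exactly when the returned maximum tardiness is 0.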