
    Multi-criteria scheduling of pipeline workflows

    Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria should be optimized, such as throughput and latency (or a combination of both). In this paper, we study the complexity of the bi-criteria mapping problem for pipeline graphs on communication-homogeneous platforms. In particular, we assess the complexity of the well-known chains-to-chains problem for different-speed processors, which turns out to be NP-hard. We provide several efficient polynomial-time bi-criteria heuristics and evaluate their relative performance through extensive simulations.
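
    The chains-to-chains problem underlying this mapping is easiest to see in its classical form with identical processors: partition a chain of weighted tasks into at most p consecutive intervals so that the largest interval sum (which bounds the achievable period, i.e., the inverse of throughput) is minimized. Below is a minimal Python sketch of that simplified case using binary search over the candidate period; the task weights and processor count are illustrative assumptions, and the different-speed variant studied in the paper is the one shown to be NP-hard.

```python
def chains_to_chains(weights, p):
    """Partition a chain of task weights into at most p consecutive
    intervals, minimizing the maximum interval sum (the period).
    Classical identical-processor case: binary search + greedy check."""
    def intervals_needed(period):
        # Greedily extend each interval as far as the period allows.
        count, current = 1, 0
        for w in weights:
            if w > period:
                return float("inf")  # a single task exceeds the period
            if current + w > period:
                count += 1           # start a new interval at this task
                current = w
            else:
                current += w
        return count

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if intervals_needed(mid) <= p:
            hi = mid
        else:
            lo = mid + 1
    return lo  # minimal achievable period

# Illustrative data: 8 pipeline stages mapped onto 3 identical processors.
print(chains_to_chains([4, 2, 7, 1, 3, 5, 2, 6], 3))  # -> 13
```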

    OBDD-Based Representation of Interval Graphs

    A graph $G = (V,E)$ can be described by the characteristic function of the edge set $\chi_E$, which maps a pair of binary encoded nodes to 1 iff the nodes are adjacent. Using \emph{Ordered Binary Decision Diagrams} (OBDDs) to store $\chi_E$ can lead to a (hopefully) compact representation. Given the OBDD as an input, symbolic/implicit OBDD-based graph algorithms can solve optimization problems by mainly using functional operations, e.g., quantification or binary synthesis. While the OBDD representation size cannot be small in general, it can be provably small for special graph classes and then also lead to fast algorithms. In this paper, we show that the OBDD size of unit interval graphs is $O(|V|/\log |V|)$ and the OBDD size of interval graphs is $O(|V| \log |V|)$, which both improve a known result from Nunkesser and Woelfel (2009). Furthermore, we can show that using our variable order and node labeling for interval graphs, the worst-case OBDD size is $\Omega(|V| \log |V|)$. We use the structure of the adjacency matrices to prove these bounds. This method may be of independent interest and can be applied to other graph classes. We also develop a maximum matching algorithm on unit interval graphs using $O(\log |V|)$ operations and a coloring algorithm for unit and general interval graphs using $O(\log^2 |V|)$ operations, and evaluate the algorithms empirically.
    Comment: 29 pages, accepted for the 39th International Workshop on Graph-Theoretic Concepts (WG 2013)
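
    To make the encoding concrete, here is a toy Python sketch of $\chi_E$ for a small unit interval graph: each node's binary encoding is concatenated with its neighbor's, and the function evaluates to 1 exactly on adjacent pairs. The interval data is an illustrative assumption, and a real symbolic algorithm would store $\chi_E$ as an OBDD via a BDD library rather than as an explicit truth table.

```python
from itertools import combinations

def unit_interval_graph(starts):
    """Edge set of a unit interval graph: node i spans
    [starts[i], starts[i] + 1]; nodes are adjacent iff intervals overlap."""
    n = len(starts)
    return {(i, j) for i, j in combinations(range(n), 2)
            if abs(starts[i] - starts[j]) <= 1}

def chi_E(edges, n_bits):
    """Explicit truth table of the characteristic function chi_E: it maps
    the concatenated binary encodings of a node pair to 1 iff adjacent.
    (An OBDD represents the same function symbolically and compactly.)"""
    table = {}
    for (i, j) in edges:
        for a, b in ((i, j), (j, i)):    # chi_E is symmetric
            key = format(a, f"0{n_bits}b") + format(b, f"0{n_bits}b")
            table[key] = 1
    return table

# Illustrative unit intervals; 4 nodes need 2 encoding bits each.
edges = unit_interval_graph([0.0, 0.6, 1.4, 3.0])
print(sorted(edges))                # -> [(0, 1), (1, 2)]
print(sorted(chi_E(edges, 2)))      # inputs where chi_E evaluates to 1
```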

    Efficient Utilization of Fine-Grained Parallelism using a microHeterogeneous Environment

    The goal of this thesis is to propose a new computing paradigm, called microHeterogeneous computing or mHC, which incorporates PCI-based (or other high-speed local system bus) processing elements (vector processors, digital signal processors, etc.) into a general-purpose machine. In this manner the benefits of heterogeneous computing on scientific applications can be achieved while avoiding some of the limitations. Overall performance is increased by exploiting fine-grained parallelism on the most efficient architecture available, while reducing the high communication overhead and costs of traditional heterogeneous environments. Furthermore, mHC-based machines can be combined into a cluster, allowing both the coarse-grained and fine-grained parallelism to be fully exploited in order to achieve even greater levels of performance. An existing high-performance computing API (GSL) was chosen as the interface to the system to allow for easy integration with applications that were previously developed using this API. The ensuing chapters will provide the motivation for this work, an overview of heterogeneous computing, and the details pertaining to microHeterogeneous computing. The framework implemented to demonstrate a microHeterogeneous computing environment will be examined, as well as the results. Finally, the future of microHeterogeneous computing will be discussed.
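
    The core scheduling idea can be sketched as follows: route each fine-grained library call to whichever local processing element is expected to complete it fastest, falling back to the host CPU when the bus transfer cost would dominate. This is a hypothetical Python dispatcher, not the thesis framework: the device names and cost model are invented for illustration, and a real mHC system would sit behind the GSL interface in C.

```python
# Hypothetical sketch of mHC-style dispatch: each PCI processing element
# advertises an estimated cost (bus transfer + compute) per operation,
# and each call is routed to the cheapest element. Names are illustrative.

class ProcessingElement:
    def __init__(self, name, compute_rate, bus_overhead):
        self.name = name
        self.compute_rate = compute_rate  # elements processed per microsecond
        self.bus_overhead = bus_overhead  # fixed PCI transfer cost, microseconds

    def estimated_cost(self, n):
        return self.bus_overhead + n / self.compute_rate

ELEMENTS = [
    ProcessingElement("host-cpu", compute_rate=1.0, bus_overhead=0.0),
    ProcessingElement("dsp-board", compute_rate=8.0, bus_overhead=50.0),
    ProcessingElement("vector-card", compute_rate=20.0, bus_overhead=120.0),
]

def dispatch(op_name, n):
    """Pick the element with the lowest estimated cost for an n-element op.
    Small problems stay on the host; large ones amortize the bus overhead."""
    best = min(ELEMENTS, key=lambda pe: pe.estimated_cost(n))
    print(f"{op_name}(n={n}) -> {best.name}")
    return best

dispatch("dot_product", 20)    # tiny: PCI overhead dominates -> host-cpu
dispatch("dot_product", 5000)  # large: fine-grained parallelism pays off
```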

    Sequencing and scheduling: algorithms and complexity


    Determining the optimal redistribution

    The classical redistribution problem aims at optimally scheduling communications when moving from an initial data distribution $\mathcal{D}_{ini}$ to a target distribution $\mathcal{D}_{tar}$ where each processor $P_i$ will host a subset $P(i)$ of the data items. However, modern computing platforms are equipped with a powerful interconnection switch, and the cost of a given communication is (almost) independent of the location of its sender and receiver. This leads to the following generalization of the redistribution problem: find the optimal permutation $\sigma$ of processors such that $P_i$ will host the set $P(\sigma(i))$, and for which the cost of the redistribution is minimal. This report studies the complexity of this generalized problem. We provide optimal algorithms and evaluate their gain over classical redistribution through simulations. We also show the NP-hardness of the problem of finding the optimal data partition and processor permutation (defined by new subsets $P(\sigma(i))$) that minimize the cost of redistribution followed by a simple computation kernel.
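
    When the switch makes communication cost location-independent, one natural formulation of choosing the permutation $\sigma$ is a linear assignment problem: let cost[i][j] be the volume of data processor $P_i$ would have to receive in order to host the target subset $P(j)$, then pick the permutation minimizing total received volume. Below is a small Python sketch using SciPy's Hungarian-algorithm solver; the data layout and the total-volume objective are illustrative assumptions (the report also considers other cost models), and the joint partition-plus-permutation problem is the one shown NP-hard.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative layout: held[i] is the set of items currently on processor i,
# target[j] is the subset P(j) that the target distribution defines.
held = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
target = [{3, 4, 5}, {0, 1, 2}, {6, 7}]

# cost[i][j] = items processor i must receive if sigma assigns P(j) to it.
cost = np.array([[len(t - h) for t in target] for h in held])

rows, cols = linear_sum_assignment(cost)  # min-total-volume permutation
sigma = dict(zip(rows.tolist(), cols.tolist()))
print("sigma:", sigma)                            # {0: 1, 1: 0, 2: 2}
print("total received items:", cost[rows, cols].sum())  # 1
```

    Here processor 0 keeps its items by hosting $P(1)$, processor 2 keeps its items by hosting $P(2)$, and only one item moves, against three or more for the identity mapping of the classical problem.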