    Partitioning Optimization for Massively Parallel Transport Sweeps on Unstructured Grids

    The field of radiation transport studies the distribution of radiation throughout a seven-dimensional phase space consisting of time, space, energy, and direction. Radiation transport is described by the Boltzmann equation, which can be solved stochastically or deterministically. The work presented in this dissertation uses the deterministic method known as the transport sweep, a popular technique that has been the subject of a large amount of research. We focus specifically on parallel implementations of the transport sweep and on predicting the time it takes to sweep across a structured or unstructured mesh for a given set of partitioning parameters. The prediction is made by a time-to-solution estimator written in Python. The estimator is tested against PDT, Texas A&M’s massively parallel deterministic transport code, and its sweep time is within 10% of PDT’s sweep time for the majority of problems tested. We use the estimator as the objective function in an optimization scheme that seeks the partitions leading to the fastest sweep time for a given problem and partitioning scheme. Two optimization methods are discussed: a black-box tool (scipy’s optimize library) and an intuitive method that prioritizes placing partitions in mesh locations that do not increase the number of cells (which we chose to name the CDF method). The time-to-solution estimator proved not to be smooth enough for a black-box tool to work, so the CDF optimization method became the primary method. The CDF method proved effective for the majority of problems run, improving the time to solution over previously used partitioning schemes.

    A Computational Paradigm on Network-Based Models of Computation

    The maturation of computer science has strengthened the need to consolidate isolated algorithms and techniques into general computational paradigms. The main goal of this dissertation is to provide a unifying framework that captures the essence of a number of problems in seemingly unrelated contexts: database design, pattern recognition, image processing, VLSI design, computer vision, and robot navigation. The main contribution of this work is a computational paradigm comprising the unifying framework, referred to as the Multiple Query problem, together with a generic solution to the Multiple Query problem. To demonstrate the applicability of the paradigm, a number of problems from different areas of computer science are solved by formulating them in this framework. To show practical relevance, two fundamental problems were also implemented in the C language using MPI. The code can be ported to many commercially available parallel computers; in particular, it was tested on an IBM SP2 and on a network of workstations.

    QuickCSG: Arbitrary and Faster Boolean Combinations of N Solids

    While studied over several decades, the computation of boolean operations on polyhedra is almost always addressed by focusing on the case of two polyhedra. For multiple input polyhedra and an arbitrary boolean operation to be applied, the operation is decomposed over a binary CSG tree, each node being processed separately in quasilinear time. For large trees, this is both error prone due to intermediate geometry and error accumulation, and inefficient because each node yields a specific overhead. We introduce a fundamentally new approach to polyhedral CSG evaluation, addressing the general N-polyhedron case. We propose a new vertex-centric view of the problem, which both simplifies the algorithm computing resulting geometric contributions, and vastly facilitates its spatial decomposition. We then embed the entire problem in a single KD-tree, specifically geared toward the final result by early pruning of any region of space not contributing to the final surface. This not only improves the robustness of the approach, it also gives it a fundamental speed advantage, with an output complexity depending on the output mesh size instead of the input size as with usual approaches. Complemented with a task-stealing parallelization, the algorithm achieves breakthrough performance, one to two orders of magnitude speedups with respect to state-of-the-art CPU algorithms, on boolean operations over two to several dozen polyhedra. The algorithm is also shown to outperform recent GPU implementations and approximate discretizations, while producing an exact output without redundant facets.

    Topology Agnostic Methods for Routing, Reconfiguration and Virtualization of Interconnection Networks

    Modern computing systems, such as supercomputers, data centers and multicore chips, generally require efficient communication between their different system units; tolerance towards component faults; flexibility to expand or merge; and a high utilization of their resources. Interconnection networks are used in a variety of such computing systems in order to enable communication between their diverse system units. The main objectives of this thesis are to investigate and propose new or improved solutions to topology agnostic routing and reconfiguration of interconnection networks. In addition, topology agnostic routing and reconfiguration algorithms are utilized in the development of new and flexible approaches to processor allocation. The thesis aims to present versatile solutions that can be used for the interconnection networks of a number of different computing systems. No particular routing algorithm was specified for an interconnection network technology which is now incorporated in Dolphin Express. The thesis states a set of criteria for a suitable routing algorithm, evaluates a number of existing routing algorithms, and recommends that one of the algorithms, which fulfils all of the criteria, be used. Further investigations demonstrate how this routing algorithm inherently supports fault-tolerance, and how it can be optimized for some network topologies. These considerations are also relevant for the InfiniBand interconnection network technology. Reconfiguration of interconnection networks (change of routing function) is a deadlock prone process. Some existing reconfiguration strategies include deadlock avoidance mechanisms that significantly reduce the network service offered to running applications. The thesis expands the area of application for one of the most versatile and efficient reconfiguration algorithms available in the literature, and proposes an optimization of this algorithm that improves the network service offered to running applications. Moreover, a new reconfiguration algorithm is presented that supports a replacement of the routing function without causing performance penalties. Processor allocation strategies that guarantee traffic-containment commonly pose strict requirements on the shape of partitions, and thus achieve only a limited utilization of a system’s computing resources. The thesis introduces two new approaches that are more flexible. Both approaches utilize the properties of a topology agnostic routing algorithm in order to enforce traffic-containment within arbitrarily shaped partitions. Consequently, a high resource utilization as well as isolation of traffic between different partitions is achieved.

    Feasibility study for a numerical aerodynamic simulation facility. Volume 1

    A Numerical Aerodynamic Simulation Facility (NASF) was designed for the simulation of fluid flow around three-dimensional bodies, both in wind tunnel environments and in free space. The application of numerical simulation to this field of endeavor promised to yield economies in aerodynamic and aircraft body designs. A model for a NASF/FMP (Flow Model Processor) ensemble using a possible approach to meeting NASF goals is presented. The computer hardware and software are presented, along with the entire design and performance analysis and evaluation

    Efficient Domain Decomposition Algorithms and Applications in Transportation and Structural Engineering

    Domain decomposition is a divide-and-conquer strategy. In the first part of this dissertation, a new, simple and efficient domain decomposition partitioning algorithm is proposed to break a large domain into smaller sub-domains in such a way as to minimize the number of system boundary nodes and to balance the work load across sub-domains. This new domain decomposition algorithm is based on the network’s shortest path solution. Numerical results indicate that the new Shortest Distance Decomposition Algorithm outperformed the most widely used METIS algorithm in 21 out of 27 tested (transportation) examples. In the second part of this dissertation, another new, simple and highly efficient shortest path algorithm is described for finding the all-to-all shortest paths (from all source nodes to all destination nodes). This new Domain Decomposition-based Shortest Path algorithm finds the all-to-all shortest paths for each sub-domain, and assembles the sub-domains’ shortest path solutions to correctly obtain the shortest path solution of the original (un-partitioned) network. Numerical results for real-life transportation networks have shown that the algorithm is much faster than the existing Dijkstra’s shortest path algorithm. Finally, the Shortest Distance Decomposition Algorithm has also been shown to perform better than METIS when minimizing the non-zero fill-in terms of structural engineering stiffness matrices used during the finite element simultaneous linear equation solution process.
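
    For orientation, the all-to-all baseline that the abstract compares against can be sketched as one run of Dijkstra's algorithm per source node. This is only the textbook baseline, not the dissertation's Domain Decomposition-based Shortest Path algorithm, whose details the abstract does not give. The function names and the toy network are hypothetical and chosen purely for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances; edge weights must be non-negative.

    graph maps each node to a dict {neighbour: edge_weight}.
    Returns a dict of distances for the nodes reachable from source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_pairs(graph):
    """All-to-all baseline: repeat Dijkstra from every node."""
    return {s: dijkstra(graph, s) for s in graph}

# Hypothetical four-node directed network (illustrative weights only).
toy_network = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 1.0},
    "C": {"B": 2.0, "D": 5.0},
    "D": {},
}
distances = all_pairs(toy_network)
print(distances["A"])
```

    The cost of this baseline grows with the number of source nodes times the cost of one Dijkstra run over the whole network, which is the motivation for solving smaller sub-domains separately and assembling the results.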

    High-performance direct solution of finite element problems on multi-core processors

    A direct solution procedure is proposed and developed which exploits the parallelism that exists in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems. Furthermore, operation count estimations are developed to further assess various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements for the solution. I/O is performed asynchronously without blocking the thread that makes the I/O request. Asynchronous I/O allows overlapping factorization and triangular solution computations with I/O. The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degrees of freedom is solved on a low-price desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared memory solver. Ph.D. Committee Chair: Will, Kenneth; Committee Member: Emkin, Leroy; Committee Member: Kurc, Ozgur; Committee Member: Vuduc, Richard; Committee Member: White, Donal

    HCTNav: A path planning algorithm for low-cost autonomous robot navigation in indoor environments

    © 2013 by MDPI (http://www.mdpi.org). Reproduction is permitted for noncommercial purposes. Low-cost robots are characterized by low computational resources and limited energy supply. Path planning algorithms aim to find the optimal path between two points so the robot consumes as little energy as possible. However, these algorithms were not developed considering computational limitations (i.e., processing and memory capacity). This paper presents the HCTNav path-planning algorithm (HCTLab research group’s navigation algorithm). This algorithm was designed to run on low-cost robots for indoor navigation. The results of the comparison between HCTNav and Dijkstra’s algorithm show that HCTNav’s memory peak is nine times lower than Dijkstra’s in maps with more than 150,000 cells. This work has been partially supported by the Spanish “Ministerio de Ciencia e Innovación”, under project TEC2009-09871.

    A CAD/CAM concept for High Speed Cutting compatible rough machining in die, mould and pattern manufacturing

    Die, mould and pattern manufacturing plays a central role in the production of capital and consumer goods. Ever-shorter product life cycles and the expanding diversity of features require continued cuts in production lead times. Recently, these developments in the market, accompanied by a simultaneous demand for improved quality at a lower cost, are becoming clearly noticeable. Along with the streamlining of organizational structures and advanced technological developments, it is above all the introduction of CAD/CAM software that offers great potential for reducing lead times for components with free surfaces. The role of milling in the integrated process chain of die, mould and pattern manufacturing is steadily gaining importance. This is due to the ongoing further development of milling-machine technology, the cutting tools and their coatings, and of the CAD/CAM systems themselves. Generally speaking, the milling process is divided into the operations of roughing and finishing. For rough milling, efficient machining means high stock-removal rates together with close contour approximation and low tool wear. Rough milling is normally carried out layer by layer, i.e. in a 2.5D machining operation with constant depth per cut because the rate of material removal and process reliability are usually highest when this method is used. High-speed cutting (HSC), which has been the subject of extensive university research for far more than ten years, has meanwhile become established as a finishing process in many companies. However, the application of HSC demands the observance of geometric and, above all, technological constraints. A considerable degree of optimization can be achieved when these constraints are applied to rough milling. In the integrated process chain, the CAD/CAM system performs the task of calculating NC programs based on CAD data which meet the requirements posed by rough and finish machining operations. While general interest was focused on the development of CAM strategies for HSC finish machining, advanced development of technology-oriented CAM modules for upstream roughing operations was neglected. The paper at hand deals with the development of a CAM module for rough-machining complex components in die, mould and pattern manufacturing. It provides an insight into the process-technological demands made on HSC operations and their application in rough machining, from which guidelines and requirements on technologically oriented NC functions for CAM software were derived. These encompass both the complete development of an interactive, dialogue-based user guidance function and the algorithmic conversion of the calculation routines. The concept at hand was almost entirely implemented and integrated in the CAD/CAM system developed by Tebis AG, Germany, which was conceived especially for die, mould and pattern manufacturing and is scheduled for introduction to the free market starting in April 2001.