85 research outputs found

    On the Practice and Application of Context-Free Language Reachability

    Get PDF
    The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, The potential of CFL-R can not be met by state of the art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at-scale, via a case-study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to creating fully automatic methodologies. The culmination of our work is a general-purpose and high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques

    Parallelization for image processing algorithms based chain and mid-crack codes

    Get PDF
    Freeman chain code is a widely-used description for a contour image. Another mid-crack code algorithm was proposed as a more precise method for image representation. We have developed a coding algorithm which is suitable to generate either chain code description or mid-crack code description by switching between two different tables. Since there is a strong urge to use parallel processing in image related problems, a parallel coding algorithm is implemented. This algorithm is developed on a pyramid architecture and a N cube architecture. Using link-list data structure and neighbor identification, the algorithm gains efficiency because no sorting or neighborhood pairing is needed. In this dissertation, the local symmetry deficiency (LSD) computation to calculate the local k-symmetry is embedded in the coding algorithm. Therefore, we can finish the code extraction and the LSD computation in one pass. The embedding process is not limited to the k-symmetry algorithm and has the capability of parallelism. An adaptive quadtree to chain code conversion algorithm is also presented. This algorithm is designed for constructing the chain codes of the resulting quadtree from the boolean operation of two quadtrees by using the chain codes of the original one. The algorithm has the parallelism and is ready to be implemented on a pyramid architecture. Our parallel processing approach can be viewed as a parallelization paradigm - a template to embed image processing algorithms in the chain coding process and to implement them in a parallel approach

    Quality of alternative terrain models

    Get PDF

    Towards Automatic and Adaptive Optimizations of MPI Collective Operations

    Get PDF
    Message passing is one of the most commonly used paradigms of parallel programming. Message Passing Interface, MPI, is a standard used in scientific and high-performance computing. Collective operations are a subset of MPI standard that deals with processes synchronization, data exchange and computation among a group of processes. The collective operations are commonly used and can be application performance bottleneck. The performance of collective operations depends on many factors, some of which are the input parameters (e.g., communicator and message size); system characteristics (e.g., interconnect type); the application computation and communication pattern; and internal algorithm parameters (e.g., internal segment size). We refer to an algorithm and its internal parameters as a method. The goal of this dissertation is a performance improvement of MPI collective operations and applications that use them. In our framework, during a collective call, a system-specific decision function is invoked to select the most appropriate method for the particular collective instance. This dissertation focuses on automatic techniques for system-specific decision function generation. Our approach takes the following steps: first, we collect method performance information on the system of interest; second, we analyze this information using parallel communication models, graphical encoding methods, and decision trees; third, based on the previous step, we automatically generate the system-specific decision function to be used at run-time. In situation when a detailed performance measurement is not feasible, method performance models can be used to supplement the measured method performance information. We build and evaluate parallel communication models of 35 different collective algorithms. These models are built on top of the three commonly used point-to-point communication models, Hockney, LogGP, and PLogP.We use the method performance information on a system to build quadtrees and C4.5 decision trees of variable sizes and accuracies. The collective method selection functions are then generated automatically from these trees. Our experiments show that quadtrees of three or four levels are often enough to approximate experimentally optimal decision with a small mean performance penalty (less than 10%). The C4.5 decision trees are even more accurate (with mean performance penalty of less than 5%). The size and accuracy of C4.5 decision trees can be further improved with use of appropriate composite attributes (such as “total message size”, or “even communicator size”.) Finally, we apply these techniques to tune the collective operations on the Grig cluster at the University of Tennessee and to improve an application performance on the Cray XT4 system at Oak Ridge National Laboratory. The tuned collective is able to achieve more than 40% mean performance improvement over the native broadcast implementation. Using the platform-specific reduce on Cray XT4 lead to 10% improvement in the overall application performance. Our results show that the methods we explored are both applicable and effective for the system-specific optimizations of collective operations and are a right step toward automatically tunable, adaptive, MPI collectives

    A user-friendly authoring language and development environment for graphic adventure games

    Get PDF
    This thesis discusses the implementation of a DSL and web IDE to support users with little or no programming experience with the creation of web games focused on interactive storytelling. The system has been implemented to validate the reactive programming model proposed by the evReact framework and its ability to easily represent control flow as a data structure that can be generated by a system like ours. Moreover, we have been able to persist the current state of the workflow by serializing the networks underlying the evReact model

    A framework for developing finite element codes for multi- disciplinary applications

    Get PDF
    The world of computing simulation has experienced great progresses in recent years and requires more exigent multidisciplinary challenges to satisfy the new upcoming demands. Increasing the importance of solving multi-disciplinary problems makes developers put more attention to these problems and deal with difficulties involved in developing software in this area. Conventional finite element codes have several difficulties in dealing with multi-disciplinary problems. Many of these codes are designed and implemented for solving a certain type of problems, generally involving a single field. Extending these codes to deal with another field of analysis usually consists of several problems and large amounts of modifications and implementations. Some typical difficulties are: predefined set of degrees of freedom per node, data structure with fixed set of defined variables, global list of variables for all entities, domain based interfaces, IO restriction in reading new data and writing new results and algorithm definition inside the code. A common approach is to connect different solvers via a master program which implements the interaction algorithms and also transfers data from one solver to another. This approach has been used successfully in practice but results duplicated implementation and redundant overhead of data storing and transferring which may be significant depending to the solvers data structure. The objective of this work is to design and implement a framework for building multi-disciplinary finite element programs. Generality, reusability, extendibility, good performance and memory efficiency are considered to be the main points in design and implementation of this framework. Preparing the structure for team development is another objective because usually a team of experts in different fields are involved in the development of multi-disciplinary code. Kratos, the framework created in this work, provides several tools for easy implementation of finite element applications and also provides a common platform for natural interaction of its applications in different ways. This is done not only by a number of innovations but also by collecting and reusing several existing works. In this work an innovative variable base interface is designed and implemented which is used at different levels of abstraction and showed to be very clear and extendible. Another innovation is a very efficient and flexible data structure which can be used to store any type of data in a type-safe manner. An extendible IO is also created to overcome another bottleneck in dealing with multi-disciplinary problems. Collecting different concepts of existing works and adapting them to coupled problems is considered to be another innovation in this work. Examples are using an interpreter, different data organizations and variable number of dofs per node. The kernel and application approach is used to reduce the possible conflicts arising between developers of different fields and layers are designed to reflect the working space of different developers also considering their programming knowledge. Finally several technical details are applied in order to increase the performance and efficiency of Kratos which makes it practically usable. This work is completed by demonstrating the framework’s functionality in practice. First some classical single field applications like thermal, fluid and structural applications are implemented and used as benchmark to prove its performance. These applications are used to solve coupled problems in order to demonstrate the natural interaction facility provided by the framework. Finally some less classical coupled finite element algorithms are implemented to show its high flexibility and extendibility

    Scalable Graph Algorithms in a High-Level Language Using Primitives Inspired by Linear Algebra

    Get PDF
    This dissertation advances the state of the art for scalable high-performance graph analytics and data mining using the language of linear algebra. Many graph computations suffer poor scalability due to their irregular nature and low operational intensity. A small but powerful set of linear algebra primitives that specifically target graph and data mining applications can expose sufficient coarse-grained parallelism to scale to thousands of processors.In this dissertation we advance existing distributed memory approaches in two important ways. First, we observe that data scientists and domain experts know their analysis and mining problems well, but suffer from little HPC experience. We describe a system that presents the user with a clean API in a high-level language that scales from a laptop to a supercomputer with thousands of cores. We utilize a Domain-Specific Embedded Language with Selective Just-In-Time Specialization to ensure a negligible performance impact over the original distributed memory low-level code. The high-level language enables ease of use, rapid prototyping, and additional features such as on-the-fly filtering, runtime-defined objects, and exposure to a large set of third-party visualization packages.The second important advance is a new sparse matrix data structure and set of algorithms. We note that shared memory machines are dominant both in stand-alone form and as nodes in distributed memory clusters. This thesis offers the design of a new sparse-matrix data structure and set of parallel algorithms, a reusable implementation in shared memory, and a performance evaluation that shows significant speed and memory usage improvements over competing packages. Our method also offers features such as in-memory compression, a low-cost transpose, and chained primitives that do not materialize the entire intermediate result at any one time. We focus on a scalable, generalized, sparse matrix-matrix multiplication algorithm. This primitive is used extensively in many graph algorithms such as betweenness centrality, graph clustering, graph contraction, and subgraph extraction

    Data Structures & Algorithm Analysis in C++

    Get PDF
    This is the textbook for CSIS 215 at Liberty University.https://digitalcommons.liberty.edu/textbooks/1005/thumbnail.jp

    Solveur Parallèle pour l'Equation de Poisson sur Mailles Superposées et Hiérarchiques, dans le Cadre du Langage Python

    Get PDF
    Adaptive discretizations are important in compressible/incompressible flow problems since it is often necessary to resolve details on multiple levels, allowing large regions of space to be modeled using a reduced number of degrees of freedom (reducing the computational time). There are a wide variety of methods for adaptively discretizing space, but Cartesian grids have often outperformed them even at high resolutions due to their simple and accurate numerical stencils and their superior parallel performances. Such performance and simplicity are in general obtained applying a finite-difference scheme for the resolution of the problems involved, but this discretization approach does not present, by contrast, an easy adapting path. In a finite-volume scheme, instead, we can incorporate different types of grids, more suitable for adaptive refinements, increasing the complexity on the stencils and getting a greater flexibility. The Laplace operator is an essential building block of the Navier-Stokes equations, a model that governs fluid flows, but it occurs also in differential equations that describe many other physical phenomena, such as electric and gravitational potentials, and quantum mechanics. So, it is a very important differential operator, and all the studies carried out on it, prove its relevance. In this work will be presented 2D finite-difference and finite-volume approaches to solve the Laplacian operator, applying patches of overlapping grids where a more fined level is needed, leaving coarsermeshes in the rest of the computational domain. These overlapping grids will have generic quadrilateral shapes. Specifically, the topics covered will be: 1) introduction to the finite difference method, finite volume method, domain partitioning, solution approximation; 2)overview of different types of meshes to represent in a discrete way the geometry involved in a problem, with a focus on the octree data structure, presenting PABLO and PABLitO. The first one is an external library used to manage each single grid’s creation, load balancing andinternal communications, while the second one is the Python API of that library written ad hoc for the current project; 3) presentation of the algorithm used to communicate data between meshes (being all of them unaware of each other’s existence) using MPI inter-communicators and clarification of the monolithic approach applied building the final matrix for the system to solve, taking into account diagonal, restriction and prolongation blocks; 4) presentation of some results; conclusions, references. It is important to underline that everything is done under Python as programming framework, using Cython for the writing of PABLitO, MPI4Py for the communications between grids, PETSc4py for the assembling and resolution parts of the system of unknowns, NumPy for contiguous memory buffer objects. The choice of this programming language has been made because Python, easy to learn and understand, is today a significant contender for the numerical computing and HPC ecosystem, thanks to its clean style, its packages, its compilers and, why not, its specific architecture optimized versions.Les discrétisations adaptatives sont importantes dans les problèmes de flux compressible/incompressible puisqu'il est souvent nécessaire de résoudre des détails sur plusieurs niveaux, en permettant de modéliser de grandes régions d'espace en utilisant un nombre réduit de degrés de liberté (et en réduisant le temps de calcul). Il existe une grande variété de méthodes de discrétisation adaptative, mais les grilles cartésiennes sont les plus efficaces, grâce à leurs stencils numériques simples et précis et à leurs performances parallèles supérieures. Et telles performance et simplicité sont généralement obtenues en appliquant un schéma de différences finies pour la résolution des problèmes, mais cette approche de discrétisation ne présente pas, au contraire, un chemin facile d'adaptation.Dans un schéma de volumes finis, en revanche, nous pouvons incorporer différents types de maillages, plus appropriées aux raffinements adaptatifs, en augmentant la complexité sur les stencils et en obtenant une plus grande flexibilité. L'opérateur de Laplace est un élémentessentiel des équations de Navier-Stokes, un modèle qui gouverne les écoulements de fluides, mais il se produit également dans des équations différentielles qui décrivent de nombreux autres phénomènes physiques, tels que les potentiels électriques et gravitationnels. Il s'agit donc d'un opérateur différentiel très important, et toutes les études qui ont été effectuées sur celui-ci, prouvent sa pertinence. Dans ce travail seront présentés des approches de différences finies et de volumes finis 2D pour résoudre l'opérateur laplacien, en appliquant des patchs de grilles superposées où un niveau plus fin est nécessaire, en laissant des maillages plus grossiers dans le reste du domaine de calcul. Ces grilles superposées auront des formes quadrilatérales génériques. Plus précisément, les sujets abordés seront les suivants: 1) introduction à la méthode des différences finies, méthode des volumes finis, partitionnement des domaines, approximation de la solution; 2)récapitulatif des différents types de maillages pour représenter de façon discrète la géométrie impliquée dans un problème, avec un focus sur la structure de données octree, présentant PABLO et PABLitO. Le premier est une bibliothèque externe utilisée pour gérer la création de chaque grille, l'équilibrage de charge et les communications internes, tandis que la seconde est l'API Python de cette bibliothèque, écrite ad hoc pour le projet en cours; 3) la présentation de l'algorithme utilisé pour communiquer les données entre les maillages (en ignorant chacune l'existence de l'autre) en utilisant les intercommunicateurs MPI et la clarification de l'approche monolithique appliquée à la construction finale de la matrice pour résoudre le système, en tenant compte des blocs diagonaux, de restriction et de prolongement; 4) la présentation de certains résultats; conclusions, références. Il est important de souligner que tout est fait sous Python comme framework de programmation, en utilisant Cython pour l'écriture de PABLitO, MPI4Py pour les communications entre grilles, PETSc4py pour les parties assemblage et résolution du système d'inconnues, NumPy pour les objets à mémoire continue. Le choix de ce langage de programmation a été fait car Python, facile à apprendre et à comprendre, est aujourd'hui un concurrent significatif pour l'informatique numérique et l'écosystème HPC, grâce à son style épuré, ses packages, ses compilateurs et pourquoi pas ses versions optimisées pour des architectures spécifiques
    • …
    corecore