3,015 research outputs found

    Methodology for complex dataflow application development

    Get PDF
    This thesis addresses problems inherent to the development of complex applications for reconfig- urable systems. Many projects fail to complete or take much longer than originally estimated by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist. The first contribution of this thesis is a design methodology to facilitate systematic develop- ment of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identi- fied bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications and both achieve state-of-the-art performance. The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to hetero- geneous physical memories with special attention to die boundaries is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms. The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte-Carlo based simulation of dose accumu- lation in human tissue is accelerated using the proposed methodology to meet the real time requirements of adaptive radiotherapy.Open Acces

    GPU Usage for Parallel Functions and Contacts in Modelica

    Get PDF
    This thesis investigates two ways of incorporating GPUs in Modelica. The first by automatically generating GPU code for Modelica functions, and the second by using GPU accelerated external code for a contact handling package. Automatic parallelization of functions is desired, as it can potentially accelerate large simulations significantly. Special patterns of nested for-loops in Modelica code are recognized and translated into CUDA kernel functions. Inline integration allows a broader spectrum of models to take advantage of the parallelization, by reducing CPU-GPU transfers. The prototype has been tested and achieved a speed-up factor of up to five compared to the CPU. The contact handling package is capable of handling both complex contact behavior between arbitrarily shaped bodies and large DEM-like simulations, something which Modelica is currently lacking. Attempts to accelerate the package with GPUs were made, with partial success for the broad phase. The package uses Morton encoding for the broad phase, and the narrow phase is based on CSG intersection with BSP trees. Contact response is calculated using a volume dependent method, taking friction, damping and multiple contact points into account. The capability of the package was demonstrated by the fact that both complex contact behavior such as the inversion of the Tippe Top toy and tens of thousands of colliding spheres could be simulated.One of the key components in our modern society is the ability to simulate. By simulations, the industry can design new cars, phones, aircrafts etc., without having to go through prototype after prototype, allowing us the cheap and high-tech products most of us rely on in our day-to-day life. However, good as simulations are today there are still severe limitations on what can be simulated. Two of the largest limiting factors are how hard simulations are to design and how long they take to run. The first of these problems are tackled by the Modelica programming language, which is designed for easy set-up of simulations. We have attacked both these problems by using GPUs to speed-up Modelica simulations more than 5 times, and extending the capability of Modelica to handle collisions between objects

    New Approaches to Column Compatibility Checking and Column-Based Input/Output Encoding for Curtis Decompositions of Completely or Incompletely Specified Switching Functions

    Get PDF
    Cube calculus is an algebraic model used to process boolean functions. Cube calculus operations are widely used in logic optimization, logic synthesis, image processing and recognition, machine learning, and other applications which require massive logic operations. The cube calculus operations can be carried out on general-purpose computers. Since these operations can involve several levels of nested loops, this approach has poor performance. A cube calculus machine which has a special data path designed to speed up cube calculus operations is presented in this thesis. This c-qbe calculus machine can execute cube calculus operations 10 to 25 times faster than the software approach on a general-purpose computer. This thesis proposes a complete design of the Cube Calculus Machine Version II (CCM2). In this design, the CCM acts as a coprocessor of the host computer; it accepts a set of instructions that let the CCM carry out cube calculus operations. This design is mapped on a reconfigurable hardware DEC PeRLe-1 board

    Probabilistic structural mechanics research for parallel processing computers

    Get PDF
    Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical

    Detectors for the James Webb Space Telescope Near-Infrared Spectrograph I: Readout Mode, Noise Model, and Calibration Considerations

    Full text link
    We describe how the James Webb Space Telescope (JWST) Near-Infrared Spectrograph's (NIRSpec's) detectors will be read out, and present a model of how noise scales with the number of multiple non-destructive reads sampling-up-the-ramp. We believe that this noise model, which is validated using real and simulated test data, is applicable to most astronomical near-infrared instruments. We describe some non-ideal behaviors that have been observed in engineering grade NIRSpec detectors, and demonstrate that they are unlikely to affect NIRSpec sensitivity, operations, or calibration. These include a HAWAII-2RG reset anomaly and random telegraph noise (RTN). Using real test data, we show that the reset anomaly is: (1) very nearly noiseless and (2) can be easily calibrated out. Likewise, we show that large-amplitude RTN affects only a small and fixed population of pixels. It can therefore be tracked using standard pixel operability maps.Comment: 55 pages, 10 figure

    Aspects of algorithms and dynamics of cellular paradigms

    Get PDF
    Els paradigmes cel·lulars, com les xarxes neuronals cel·lulars (CNN, en anglès) i els autòmats cel·lulars (CA, en anglès), són una eina excel·lent de càlcul, al ser equivalents a una màquina universal de Turing. La introducció de la màquina universal CNN (CNN-UM, en anglès) ha permès desenvolupar hardware, el nucli computacional del qual funciona segons la filosofia cel·lular; aquest hardware ha trobat aplicació en diversos camps al llarg de la darrera dècada. Malgrat això, encara hi ha moltes preguntes a obertes sobre com definir els algoritmes d'una CNN-UM i com estudiar la dinàmica dels autòmats cel·lulars. En aquesta tesis es tracten els dos problemes: primer, es demostra que es possible acotar l'espai dels algoritmes per a la CNN-UM i explorar-lo gràcies a les tècniques genètiques; i segon, s'expliquen els fonaments de l'estudi dels CA per mitjà de la dinàmica no lineal (segons la definició de Chua) i s'il·lustra com aquesta tècnica ha permès trobar resultats innovadors.Los paradigmas celulares, como las redes neuronales celulares (CNN, eninglés) y los autómatas celulares (CA, en inglés), son una excelenteherramienta de cálculo, al ser equivalentes a una maquina universal deTuring. La introducción de la maquina universal CNN (CNN-UM, eninglés) ha permitido desarrollar hardware cuyo núcleo computacionalfunciona según la filosofía celular; dicho hardware ha encontradoaplicación en varios campos a lo largo de la ultima década. Sinembargo, hay aun muchas preguntas abiertas sobre como definir losalgoritmos de una CNN-UM y como estudiar la dinámica de los autómatascelular. En esta tesis se tratan ambos problemas: primero se demuestraque es posible acotar el espacio de los algoritmos para la CNN-UM yexplorarlo gracias a técnicas genéticas; segundo, se explican losfundamentos del estudio de los CA por medio de la dinámica no lineal(según la definición de Chua) y se ilustra como esta técnica hapermitido encontrar resultados novedosos.Cellular paradigms, like Cellular Neural Networks (CNNs) and Cellular Automata (CA) are an excellent tool to perform computation, since they are equivalent to a Universal Turing machine. The introduction of the Cellular Neural Network - Universal Machine (CNN-UM) allowed us to develop hardware whose computational core works according to the principles of cellular paradigms; such a hardware has found application in a number of fields throughout the last decade. Nevertheless, there are still many open questions about how to define algorithms for a CNN-UM, and how to study the dynamics of Cellular Automata. In this dissertation both problems are tackled: first, we prove that it is possible to bound the space of all algorithms of CNN-UM and explore it through genetic techniques; second, we explain the fundamentals of the nonlinear perspective of CA (according to Chua's definition), and we illustrate how this technique has allowed us to find novel results

    Doctor of Philosophy

    Get PDF
    dissertationPartial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical science and engineering applications, the solutions of PDEs require the tessellation of computational domains into unstructured meshes and entail computationally expensive and time-consuming processes. Therefore, efficient and fast PDE solving techniques on unstructured meshes are important in these applications. Relative to CPUs, the faster growth curves in the speed and greater power efficiency of the SIMD streaming processors, such as GPUs, have gained them an increasingly important role in the high-performance computing area. Combining suitable parallel algorithms and these streaming processors, we can develop very efficient numerical solvers of PDEs. The contributions of this dissertation are twofold: proposal of two general strategies to design efficient PDE solvers on GPUs and the specific applications of these strategies to solve different types of PDEs. Specifically, this dissertation consists of four parts. First, we describe the general strategies, the domain decomposition strategy and the hybrid gathering strategy. Next, we introduce a parallel algorithm for solving the eikonal equation on fully unstructured meshes efficiently. Third, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrowband scheme with domain decomposition for efficient levelset equation solving

    Parallel Multiscale Contact Dynamics for Rigid Non-spherical Bodies

    Get PDF
    The simulation of large numbers of rigid bodies of non-analytical shapes or vastly varying sizes which collide with each other is computationally challenging. The fundamental problem is the identification of all contact points between all particles at every time step. In the Discrete Element Method (DEM), this is particularly difficult for particles of arbitrary geometry that exhibit sharp features (e.g. rock granulates). While most codes avoid non-spherical or non-analytical shapes due to the computational complexity, we introduce an iterative-based contact detection method for triangulated geometries. The new method is an improvement over a naive brute force approach which checks all possible geometric constellations of contact and thus exhibits a lot of execution branching. Our iterative approach has limited branching and high floating point operations per processed byte. It thus is suitable for modern Single Instruction Multiple Data (SIMD) CPU hardware. As only the naive brute force approach is robust and always yields a correct solution, we propose a hybrid solution that combines the best of the two worlds to produce fast and robust contacts. In terms of the DEM workflow, we furthermore propose a multilevel tree-based data structure strategy that holds all particles in the domain on multiple scales in grids. Grids reduce the total computational complexity of the simulation. The data structure is combined with the DEM phases to form a single touch tree-based traversal that identifies both contact points between particle pairs and introduces concurrency to the system during particle comparisons in one multiscale grid sweep. Finally, a reluctant adaptivity variant is introduced which enables us to realise an improved time stepping scheme with larger time steps than standard adaptivity while we still minimise the grid administration overhead. Four different parallelisation strategies that exploit multicore architectures are discussed for the triad of methodological ingredients. Each parallelisation scheme exhibits unique behaviour depending on the grid and particle geometry at hand. The fusion of them into a task-based parallelisation workflow yields promising speedups. Our work shows that new computer architecture can push the boundary of DEM computability but this is only possible if the right data structures and algorithms are chosen

    Cellular Automata

    Get PDF
    Modelling and simulation are disciplines of major importance for science and engineering. There is no science without models, and simulation has nowadays become a very useful tool, sometimes unavoidable, for development of both science and engineering. The main attractive feature of cellular automata is that, in spite of their conceptual simplicity which allows an easiness of implementation for computer simulation, as a detailed and complete mathematical analysis in principle, they are able to exhibit a wide variety of amazingly complex behaviour. This feature of cellular automata has attracted the researchers' attention from a wide variety of divergent fields of the exact disciplines of science and engineering, but also of the social sciences, and sometimes beyond. The collective complex behaviour of numerous systems, which emerge from the interaction of a multitude of simple individuals, is being conveniently modelled and simulated with cellular automata for very different purposes. In this book, a number of innovative applications of cellular automata models in the fields of Quantum Computing, Materials Science, Cryptography and Coding, and Robotics and Image Processing are presented
    corecore