76 research outputs found

    Parallel Simulation for VLSI Power Grid

    Get PDF
    Due to the increasing complexity of VLSI circuits, power grid simulation has become more and more time-consuming. Hence, there is a need for fast and accurate power grid simulator. In order to perform power grid simulation in a timely manner, parallel algorithms have been developed to accelerate the simulation. In this dissertation, we present parallel algorithms and software for power grid simulation on CPU-GPU platforms. The power grid is divided into disjoint partitions. The partitions are enlarged using Breath First Search (BFS) method. In the partition enlarging process, a portion of edges are ignored to make the matrix factorization light-weight. Solving the enlarged partitions using a direct solver serves as a preconditioner for the Preconditioned Conjugate Gradient (PCG) method that is used to solve the power grid. This work combines the advantages of direct solvers and iterative solvers to obtain an efficient hybrid parallel solver. Two-tier parallelism is harnessed using MPI for partitions and CUDA within each partition. The experiments conducted on supercomputing clusters demonstrate significant speed improvements over a state-of-the-art direct solver in both static and transient analysis

    TURBULENT TRANSITION SIMULATION AND PARTICULATE CAPTURE MODELING WITH AN INCOMPRESSIBLE LATTICE BOLTZMANN METHOD

    Get PDF
    Derivation of an unambiguous incompressible form of the lattice Boltzmann equation is pursued in this dissertation. Further, parallelized implementation in developing application areas is researched. In order to achieve a unique incompressible form which clarifies the algorithm implementation, appropriate ansatzes are utilized. Through the Chapman-Enskog expansion, the exact incompressible Navier-Stokes equations are recovered. In initial studies, fundamental 2D and 3D canonical simulations are used to evaluate the validity and application, and test the required boundary condition modifications. Several unique advantages over the standard equation and alternative forms found in literature are found, including faster convergence, greater stability, and higher fidelity for relevant flows. Direct numerical simulation and large eddy simulation of transitional and chaotic flows are one application area explored with the derived incompressible form. A multiple relaxation time derivation is performed and implemented in a 2D cavity (direct simulation) and a 3D cavity (large eddy simulation). The Kolmogorov length scale, a function of Reynolds number, determines grid resolution in the 2D case. Comparison is made to the extensive literature on laminar flows and the Hopf bifurcation, and final transition to chaos is predicted. Steady and statistical properties in all cases are in good agreement with literature. In the 3D case the relatively new Vreman subgrid model provides eddy viscosity modeling. By comparing the center plane to the direct numerical simulation case, both steady and unsteady flows are found to be in good agreement, with a coarse grid, including prediction of the Hopf bifurcation. Multiphysics pore scale flow is the other main application researched here. In order to provide the substrate geometry, a straightforward algorithm is developed to generate random blockages producing realistic porosities and passages. Combined with advection-diffusion equations for conjugate heat transfer and soot particle transport, critical diesel particulate filtration phenomena are simulated. To introduce additional fidelity, a model is added which accounts for deposition caused by a variety of molecular and atomic forces. Detailed conclusions are presented to lay the groundwork for future extensions and improvements. Predominantly, higher lattice velocity large eddy simulation, improved parallelization, and filter regeneration

    On continuous maximum ow image segmentation algorithm

    Get PDF
    Ces dernières années avec les progrès matériels, les dimensions et le contenu des images acquises se sont complexifiés de manière notable. Egalement, le différentiel de performance entre les architectures classiques mono-processeur et parallèles est passé résolument en faveur de ces dernières. Pourtant, les manières de programmer sont restées largement les mêmes, instituant un manque criant de performance même sur ces architectures. Dans cette thèse, nous explorons en détails un algorithme particulier, les flots maximaux continus. Nous explicitons pourquoi cet algorithme est important et utile, et nous proposons plusieurs implémentations sur diverses architectures, du mono-processeur à l'architecture SMP et NUMA, ainsi que sur les architectures massivement parallèles des GPGPU. Nous explorons aussi des applications et nous évaluons ses performances sur des images de grande taille en science des matériaux et en biologie à l'échelle nanoIn recent years, with the advance of computing equipment and image acquisition techniques, the sizes, dimensions and content of acquired images have increased considerably. Unfortunately as time passes there is a steadily increasing gap between the classical and parallel programming paradigms and their actual performance on modern computer hardware. In this thesis we consider in depth one particular algorithm, the continuous maximum flow computation. We review in detail why this algorithm is useful and interesting, and we propose efficient and portable implementations on various architectures. We also examine how it performs in the terms of segmentation quality on some recent problems of materials science and nano-scale biologyPARIS-EST-Université (770839901) / SudocSudocFranceF

    Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes

    Full text link
    [EN] The main topic of the present thesis is the improvement of fabrication processes simulation by means of the Level Set (LS) method. The LS is a mathematical approach used for evolving fronts according to a motion defined by certain laws. The main advantage of this method is that the front is embedded inside a higher dimensional function such that updating this function instead of directly the front itself enables a trivial handling of complex situations like the splitting or coalescing of multiple fronts. In particular, this document is focused on wet and dry etching processes, which are widely used in the micromachining process of Micro-Electro-Mechanical Systems (MEMS). A MEMS is a system formed by mechanical elements, sensors, actuators, and electronics. These devices have gained a lot of popularity in last decades and are employed in several industry fields such as automotive security, motion sensors, and smartphones. Wet etching process consists in removing selectively substrate material (e.g. silicon or quartz) with a liquid solution in order to form a certain structure. This is a complex process since the result of a particular experiment depends on many factors, such as crystallographic structure of the material, etchant solution or its temperature. Similarly, dry etching processes are used for removing substrate material, however, gaseous substances are employed in the etching stage. In both cases, the usage of a simulator capable of predicting accurately the result of a certain experiment would imply a significant reduction of design time and costs. There exist a few LS-based wet etching simulators but they have many limitations and they have never been validated with real experiments. On the other hand, atomistic models are currently considered the most advanced simulators. Nevertheless, atomistic simulators present some drawbacks like the requirement of a prior calibration process in order to use the experimental data. Additionally, a lot of effort must be invested to create an atomistic model for simulating the etching process of substrate materials with different atomistic structures. Furthermore, the final result is always formed by unconnected atoms, which makes difficult a proper visualization and understanding of complex structures, thus, usually an additional visualization technique must be employed. For its part, dry etching simulators usually employ an explicit representation technique to evolve the surface being etched according to etching models. This strategy can produce unrealistic results, specially in complex situations like the interaction of multiple surfaces. Despite some models that use implicit representation have been published, they have never been directly compared with real experiments and computational performance of the implementations have not been properly analysed. The commented limitations are addressed in the various chapters of the present thesis, producing the following contributions: - An efficient LS implementation in order to improve the visual representation of atomistic wet etching simulators. This implementation produces continuous surfaces from atomistic results. - Definition of a new LS-based model which can directly use experimental data of many etchant solutions (such as KOH, TMAH, NH4HF2, and IPA and Triton additives) to simulate wet etching processes of various substrate materials (e.g. silicon and quartz). - Validation of the developed wet etching simulator by comparing it to experimental and atomistic simulator results. - Implementation of a LS-based tool which evolves the surface being etched according to dry etching models in order to enable the simulation of complex processes. This implementation is also validated experimentally. - Acceleration of the developed wet and dry etching simulators by using Graphics Processing Units (GPUs).[ES] El tema principal de la presente tesis consiste en mejorar la simulación de los procesos de fabricación utilizando el método Level Set (LS). El LS es una técnica matemática utilizada para la evolución de frentes según un movimiento definido por unas leyes. La principal ventaja de este método es que el frente está embebido dentro de una función definida en una dimensión superior. Actualizar dicha función en lugar del propio frente permite tratar de forma trivial situaciones complejas como la separación o la colisión de diversos frentes. En concreto, este documento se centra en los procesos de atacado húmedo y seco, los cuales son ampliamente utilizados en el proceso de fabricación de Sistemas Micro-Electro-Mecánicos (MEMS, de sus siglas en inglés). Un MEMS es un sistema formado por elementos mecánicos, sensores, actuadores y electrónica. Estos dispositivos hoy en día son utilizados en muchos campos de la industria como la seguridad automovilística, sensores de movimiento y teléfonos inteligentes. El proceso de atacado húmedo consiste en eliminar de forma selectiva el material del sustrato (por ejemplo, silicio o cuarzo) con una solución líquida con el fin de formar una estructura específica. Éste es un proceso complejo pues el resultado depende de muchos factores, tales como la estructura cristalográfica del material, la solución atacante o su temperatura. De forma similar, los procesos de atacado seco son utilizados para eliminar el material del sustrato, sin embargo, se utilizan sustancias gaseosas en la fase de atacado. En ambos casos, la utilización de un simulador capaz de predecir de forma precisa el resultado de un experimento concreto implicaría una reducción significativa del tiempo de diseño y de los costes. Existen unos pocos simuladores del proceso de atacado húmedo basados en el método LS, no obstante tienen muchas limitaciones y nunca han sido validados con experimentos reales. Por otro lado, los simuladores atomísticos son hoy en día considerados los simuladores más avanzados pero tienen algunos inconvenientes como la necesidad de un proceso de calibración previo para poder utilizar los datos experimentales. Además, debe invertirse mucho esfuerzo para crear un modelo atomístico para la simulación de materiales de sustrato con distintas estructuras atomísticas. Asimismo, el resultado final siempre está formado por átomos inconexos que dificultan una correcta visualización y un correcto entendimiento de aquellas estructuras complejas, por tanto, normalmente debe emplearse una técnica adicional para la visualización de dichos resultados. Por su parte, los simuladores del proceso de atacado seco normalmente utilizan técnicas de representación explícita para evolucionar, según los modelos de atacado, la superficie que está siendo atacada. Esta técnica puede producir resultados poco realistas, sobre todo en situaciones complejas como la interacción de múltiples superficies. A pesar de que unos pocos modelos son capaces de solventar estos problemas, nunca han sido comparados con experimentos reales ni el rendimiento computacional de las correspondientes implementaciones ha sido adecuadamente analizado. Las expuestas limitaciones son abordadas en la presente tesis y se han producido las siguientes contribuciones: - Implementación eficiente del método LS para mejorar la representación visual de los simuladores atomísticos del proceso de atacado húmedo. - Definición de un nuevo modelo basado en el LS que pueda usar directamente los datos experimentales de muchos atacantes para simular el proceso de atacado húmedo de diversos materiales de sustrato. - Validación del simulador comparándolo con resultados experimentales y con los de simuladores atomísticos. - Implementación de una herramienta basada en el método LS que evolucione la superficie que está siendo atacada según los modelos de atacado seco para habilitar la simulación de procesos comple[CA] El tema principal de la present tesi consisteix en millorar la simulació de processos de fabricació mitjançant el mètode Level Set (LS). El LS és una tècnica matemàtica utilitzada per a l'evolució de fronts segons un moviment definit per unes lleis en concret. El principal avantatge d'aquest mètode és que el front està embegut dins d'una funció definida en una dimensió superior. D'aquesta forma, actualitzar la dita funció en lloc del propi front, permet tractar de forma trivial situacions complexes com la separació o la col·lisió de diversos fronts. En concret, aquest document es centra en els processos d'atacat humit i sec, els quals són àmpliament utilitzats en el procés de fabricació de Sistemes Micro-Electro-Mecànics (MEMS, de les sigles en anglès). Un MEMS és un sistema format per elements mecànics, sensors, actuadors i electrònica. Aquests dispositius han guanyat molta popularitat en les últimes dècades i són utilitzats en molts camps de la indústria, com la seguretat automobilística, sensors de moviment i telèfons intel·ligents. El procés d'atacat humit consisteix en eliminar de forma selectiva el material del substrat (per exemple, silici o quars) amb una solució líquida, amb la finalitat de formar una estructura específica. Aquest és un procés complex ja que el resultat de un determinat experiment depèn de molts factors, com l'estructura cristal·logràfica del material, la solució atacant o la seva temperatura. De manera similar, els processos d'atacat sec son utilitzats per a eliminar el material del substrat, no obstant, s'utilitzen substàncies gasoses en la fase d'atacat. En ambdós casos, la utilització d'un simulador capaç de predir de forma precisa el resultat d'un experiment en concret implicaria una reducció significativa del temps de disseny i dels costos. Existeixen uns pocs simuladors del procés d'atacat humit basats en el mètode LS, no obstant tenen moltes limitacions i mai han sigut validats amb experiments reals. Per la seva part, els simuladors atomístics tenen alguns inconvenients com la necessitat d'un procés de calibratge previ per a poder utilitzar les dades experimentals. A més, deu invertir-se molt d'esforç per crear un model atomístic per a la simulació de materials de substrat amb diferents estructures atomístiques. Així mateix, el resultat final sempre està format per àtoms inconnexos que dificulten una correcta visualització i un correcte enteniment d'aquelles estructures complexes, per tant, normalment deu emprar-se una tècnica addicional per a la visualització d'aquests resultats. D'altra banda, els simuladors del procés d'atacat sec normalment utilitzen tècniques de representació explícita per evolucionar, segons els models d'atacat, la superfície que està sent atacada. Aquesta tècnica pot introduir resultats poc realistes, sobretot en situacions complexes com per exemple la interacció de múltiples superfícies. A pesar que uns pocs models son capaços de resoldre aquests problemes, mai han sigut comparats amb experiments reals ni tampoc el rendiment computacional de les corresponents implementacions ha sigut adequadament analitzat. Les exposades limitacions son abordades en els diferents capítols de la present tesi i s'han produït les següents contribucions: - Implementació eficient del mètode LS per millorar la representació visual dels simuladors atomístics del procés d'atacat humit. - Definició d'un nou model basat en el mètode LS que puga utilitzar directament les dades experimentals de molts atacants per a simular el procés d'atacat humit de diversos materials de substrat. - Validació del simulador d'atacat humit desenvolupat comparant-lo amb resultats experimentals i amb els de simuladors atomístics. - Implementació d'una ferramenta basada en el mètode LS que evolucione la superfície que està sent atacada segons els models d'atacat sec per, d'aquesta forma, habilitar la simulació de processoMontoliu Álvaro, C. (2015). Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/58609TESI

    GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications

    Get PDF
    A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applications is presented. The works objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control to (3) develop parallel CFD applications for Navier Stokes and Lattice Boltzmann analysis methods. Special consideration will be given to performing the linear algebra algorithms under certain matrix types (banded, dense, diagonal, sparse, symmetric and triangular). Benchmarks are performed for all analyses with baseline CPU times being determined to find speed-up factors and measure computational capability of the GPU accelerated algorithms. The GPU implemented algorithms used in this work along with the optimization techniques performed are measured against preexisting work and test matrices available in the NIST Matrix Market. CFD analysis looked to strengthen the assessment of this work by providing a direct engineering application to analysis that would benefit from matrix optimization techniques and accelerated algorithms. Overall, this work desired to develop optimization for selected linear algebra and matrix computations performed with modern GPU architectures and CUDA developer which were applied directly to mathematical and engineering applications through CFD analysis

    Brain fame:From FPGA to heterogeneous acceleration of brain simulations

    Get PDF
    Among the various methods in neuroscience for understanding brain function, in-silico simulations have been gaining popularity. Advances in neuroscience and engineering led to the creation of mathematical models of networks that do not simply mimic biological behaviour in an abstract fashion but emulate its in significant detail, even to the level of its biophysical properties. Such an example is the Spiking Neural Network (SNN) that can model a variety of additional behavioural features, like encoding data and adapting according to a spike train`s amplitude, frequency and general precise pattern of arrival of spiking events on a neuron. As a result, SNNs have higher explanatory power than their predecessors, thus brain simulations based on SNNs become an attractive topic to explore. In-silico simulations of SNNs can have beneficial results not only for neuroscience research but breakthroughs can also potentially benefit medical, computing and A.I. research. SNNs, though, computationally depending workloads that traditional computing might not be able to cover. Thus, the use of High Performance Computing (HPC) platforms in this application domain becomes desirable. This dissertation explores the topic of HPC-based in-silico brain simulations. Initially, the effort focuses on custom hardware accelerators, due to their potential in providing real-time performance alongside support for large-scale non-real-time experiments and specifically Field Programmable Gate Arrays (FPGAs). The nature of FPGA-based accelerators provides specific benefits against other similar paradigms like Application Specific Integrated Circuit (ASIC) designs.Firstly, we explore the general characteristics of typical SNNs model types to identify their computational requirements in relation to their explanatory strength. We also identify major design characteristics in model development that can directly affect its performance and behaviour when ported to an HPC platform. Subsequently, a detailed literature review is made on FPGA-based SNN implementations. The HPC porting effort begins with the implementation of an extended-Hodgkin-Huxley model of the Inferior-olivary nucleus featuring advanced connectivity. The model is quite demanding and complex enough to act as a realistic benchmark for HPC implementations, while also being scientifically relevant in its own right. FPGA development shows promising performance results not only when doing custom designs but also using High-level synthesis (HLS) toolflows that significantly reduce development time. FPGAs have proven suitable for small-scale embedded-HPC uses as well. The various efforts, though, reveal a very specific weakness of FPGA development that has less to do with the silicon itself and more with its programming environment. The FPGA tools are very inaccessible to non-experts, thus any acceleration effort would require the engineer (and the FPGA development time) to be in the critical path of the research process. An important question to be answered is how the FPGA platform would compare to other popular software-based HPC solutions such as GPU- and CPU-based platforms. A detailed comparison of the best FPGA implementation with GPU and manycore-CPU ports of the same benchmark is conducted. The comparison and evaluation shows that, when it comes to real-time performance, FPGAs have a clear advantage. But for non-real-time, large scale simulations, there is no single platform that can optimally support the complete range of experiments that could be conducted with the inferior olive model. The comparison makes a clear case for BrainFrame, a platform that supports heterogeneous HPC substrates. This dissertation, thus, concludes with the proposal of the BrainFrame system. The proof-of-concept design supports standard and extended Hodgkin-Huxley models, , such as the original inferior-olive model. The system integrates a GPU-, CPU- and FPGA-based HPC back-end while also using a standard neuroscientific language front-end (PyNN) that can score best-in-class performance, alleviate some of the development hurdles and make it far more user-friendly for the typical model developer. Additionally, the multi-node potential of the platform is being explored. BrainFrame provides both a powerful heterogeneous platform for acceleration and also a front-end familiar to the neuroscientist

    Providing Information by Resource- Constrained Data Analysis

    Get PDF
    The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Data Bases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms respecting the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school
    corecore