
    Proceedings, MSVSCC 2017

    Proceedings of the 11th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 20, 2017 at VMASC in Suffolk, Virginia. 211 pp.

    Simulation of networks of spiking neurons: A review of tools and strategies

    We review different aspects of the simulation of spiking neural networks. We start by reviewing the different types of simulation strategies and algorithms that are currently implemented. We next review the precision of those simulation strategies, in particular in cases where plasticity depends on the exact timing of the spikes. We then give an overview of the simulators and simulation environments presently available (restricted to those that are freely available, open source, and documented). For each simulation tool, its advantages and pitfalls are reviewed, with the aim of allowing the reader to identify which simulator is appropriate for a given task. Finally, we provide a series of benchmark simulations of different types of networks of spiking neurons, including Hodgkin-Huxley-type and integrate-and-fire models, interacting with current-based or conductance-based synapses, using clock-driven or event-driven integration strategies. The same set of models is implemented on the different simulators, and the codes are made available. The ultimate goal of this review is to provide a resource that facilitates identifying the appropriate integration strategy and simulation tool for a given modeling problem related to spiking neural networks. Comment: 49 pages, 24 figures, 1 table; review article, Journal of Computational Neuroscience, in press (2007).
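    A minimal sketch of the clock-driven strategy benchmarked above, assuming a single current-based leaky integrate-and-fire neuron with hypothetical parameter values (an illustration only, not code from any of the reviewed simulators; an event-driven simulator would instead advance the state only at spike times):

        # Clock-driven (fixed time step) leaky integrate-and-fire neuron,
        # driven by a constant current. All parameter values are illustrative.
        dt       = 0.1e-3    # time step (s)
        tau_m    = 20e-3     # membrane time constant (s)
        v_rest   = -70e-3    # resting potential (V)
        v_thresh = -50e-3    # spike threshold (V)
        v_reset  = -60e-3    # reset potential (V)
        r_m      = 1e7       # membrane resistance (ohm)
        i_ext    = 2.5e-9    # injected current (A)

        v = v_rest
        spike_times = []
        for step in range(int(0.5 / dt)):                  # simulate 0.5 s
            # forward-Euler step of tau_m * dv/dt = -(v - v_rest) + r_m * i_ext
            v += dt / tau_m * (-(v - v_rest) + r_m * i_ext)
            if v >= v_thresh:                              # threshold crossing: emit a spike
                spike_times.append(step * dt)
                v = v_reset
        print(len(spike_times), "spikes in 0.5 s")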

    Computational Modeling of Biological Neural Networks on GPUs: Strategies and Performance

    Simulating biological neural networks is an important task for computational neuroscientists attempting to model and analyze brain activity and function. As these networks become larger and more complex, the computational power required grows significantly, often requiring the use of supercomputers or compute clusters. An emerging low-cost, highly accessible alternative to many of these resources is the Graphics Processing Unit (GPU): specialized massively parallel graphics hardware that has seen increasing use as a general-purpose computational accelerator, thanks largely to NVIDIA's CUDA programming interface. We evaluated the relative benefits and limitations of GPU-based tools for large-scale neural network simulation and analysis, first by developing an agent-inspired spiking neural network simulator and then by adapting a neural signal decoding algorithm. Under certain network configurations, the simulator was able to outperform an equivalent MPI-based parallel implementation run on a dedicated compute cluster, while the decoding algorithm implementation consistently outperformed its serial counterpart. Additionally, the GPU-based simulator was able to readily visualize network spiking activity in real time due to its close integration with standard computer graphics APIs. The GPU was shown to provide significant performance benefits under certain circumstances while lagging behind in others. Given the complex nature of these research tasks, a hybrid strategy that combines GPU- and CPU-based approaches provides greater performance than either alone.
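    As a rough sketch of the data parallelism a GPU exposes for this kind of workload, the fragment below advances every neuron's membrane potential with whole-array updates; on a GPU each such update would map to a single kernel launch with one thread per neuron. The sizes and constants are hypothetical and unrelated to the simulator described above.

        import numpy as np

        n = 10_000                                   # number of neurons (arbitrary)
        dt, tau = 0.1e-3, 20e-3                      # time step and membrane time constant (s)
        v_rest, v_thresh, v_reset = -70e-3, -50e-3, -60e-3

        v = np.full(n, v_rest)                       # membrane potentials (V)
        drive = np.random.uniform(0.0, 30e-3, n)     # per-neuron input drive (V, for simplicity)

        spikes = 0
        for _ in range(2000):                        # 0.2 s of simulated time
            # data-parallel update of all neurons at once; on a GPU this whole-array
            # operation corresponds to one kernel launch with one thread per neuron
            v += dt / tau * (-(v - v_rest) + drive)
            fired = v >= v_thresh                    # boolean spike vector
            v[fired] = v_reset
            spikes += int(fired.sum())
        print(spikes, "spikes across", n, "neurons in 0.2 s")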

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state-of-the-art research around SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.

    OPTIMIZATION OF ALGORITHMS WITH THE OPAL FRAMEWORK

    RÉSUMÉ: The question of identifying good parameters has been studied for a long time, and a large number of research efforts have focused on this topic. Some of this research lacks generality and, above all, reusability. A first reason is that these projects target specific systems. Moreover, most of these projects do not focus on the fundamental questions of identifying good parameters. Finally, there has been no powerful tool capable of overcoming the difficulties in this domain. As a consequence, despite a large number of projects, users have little opportunity to apply previous results to their own problems. This thesis proposes the OPAL framework for identifying good algorithmic parameters, built from the essential, indispensable elements. The milestones in developing the framework, as well as the main results, are presented in three papers corresponding to Chapters 4, 5 and 6 of the thesis. The first paper introduces the framework through fundamental examples. Within this framework, the question of identifying good parameters is modeled as a nonsmooth optimization problem, which is then solved by a mesh adaptive direct search algorithm. This reduces the effort required of users to carry out the parameter tuning task. The second paper describes an extension aimed at improving the performance of the OPAL framework. Efficient use of computational resources in this framework is achieved by studying several strategies for exploiting parallelism and through a particular feature called interruption of unnecessary tasks. The third paper is a complete description of the framework and of its Python implementation. In addition to recalling the main features presented in earlier work, integrability is presented as a new feature through a demonstration of cooperation with a classification tool. More specifically, the work illustrates a cooperation between OPAL and a classification tool to solve a parameter optimization problem whose test problem set is very large and for which a single evaluation can take a day.----------ABSTRACT: The task of parameter tuning has been around for a long time, spans most domains, and there have been many attempts to address it. Research on this question often lacks generality and reusability. A first reason is that these projects aim at specific systems. Moreover, some approaches do not concentrate on the fundamental questions of parameter tuning. And finally, there was no powerful tool able to overcome the difficulties in this domain. As a result, the number of projects continues to grow, while users are not able to apply the previous achievements to their own problems. The present work systematically approaches parameter tuning by figuring out the fundamental issues and identifying the basic elements for a general system. This provides the basis for developing a general and flexible framework called OPAL, which stands for OPtimization of ALgorithms. The milestones in developing the framework as well as the main achievements are presented through three papers corresponding to Chapters 4, 5 and 6 of this thesis. The first paper introduces the framework by describing the crucial basic elements through some very simple examples. To this end, the paper considers three questions in constructing an automated parameter tuning framework. By answering these questions, we propose OPAL, consisting of the indispensable components of a parameter tuning framework. OPAL models the parameter tuning task as a blackbox optimization problem. This reduces the effort of users in launching a tuning session. The second paper shows one of the opportunities to extend the framework. To take advantage of situations where multiple processors are available, we study various ways of embedding parallelism and develop a feature called "interruption of unnecessary tasks" in order to improve the performance of the framework. The third paper is a full description of the framework and a release of its Python implementation. In addition to confirming the methodology and the main features presented in previous works, integrability is introduced as a new feature of this release through an example of cooperation with a classification tool. More specifically, the work illustrates a cooperation of OPAL and a classification tool to solve a parameter optimization problem whose test problem set is very large and for which a single assessment can take a day.
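    The blackbox view of parameter tuning described above can be sketched as follows. The target program, its parameters, and the test problems are hypothetical, and the naive grid sweep merely stands in for the mesh adaptive direct search that OPAL delegates to; the snippet only illustrates what one assessment of a parameter setting means.

        import itertools
        import subprocess
        import time

        def assess(params, problems):
            """Blackbox assessment: run the target solver on every test problem
            with the given parameters and return the total wall-clock time."""
            total = 0.0
            for prob in problems:
                start = time.perf_counter()
                subprocess.run(["./solver", prob,                 # hypothetical executable
                                "--tol", str(params["tol"]),
                                "--max-iter", str(params["max_iter"])],
                               check=True)
                total += time.perf_counter() - start
            return total

        problems = ["prob1.dat", "prob2.dat"]                     # hypothetical test set
        grid = {"tol": [1e-4, 1e-6], "max_iter": [100, 1000]}

        best = None
        for tol, max_iter in itertools.product(grid["tol"], grid["max_iter"]):
            cost = assess({"tol": tol, "max_iter": max_iter}, problems)
            if best is None or cost < best[0]:
                best = (cost, {"tol": tol, "max_iter": max_iter})
        print("best parameters:", best[1])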

    Proceedings, MSVSCC 2018

    Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp.

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there is a series of tools aiming at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data, and execution models (for which only informal, and often confusing, semantics is generally provided), all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as is often seen in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language that sits on top of a stack of layers forming a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit into each level. Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG composition of processing elements. This model is intended to give the user a unique interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.
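    To make the Pipeline notion concrete, here is a small Python sketch of composing processing stages over a data collection; it mimics only the idea of stages defined independently of batch or stream execution and is not PiCo's actual C++ API.

        from typing import Callable, Iterable, Iterator

        # A stage turns a stream of items into another stream of items.
        Stage = Callable[[Iterator], Iterator]

        def map_stage(f) -> Stage:
            return lambda items: (f(x) for x in items)

        def filter_stage(pred) -> Stage:
            return lambda items: (x for x in items if pred(x))

        class Pipeline:
            """Linear composition of stages; a full Dataflow model would also
            allow general DAGs with splits and merges."""
            def __init__(self, *stages: Stage):
                self.stages = list(stages)

            def then(self, stage: Stage) -> "Pipeline":
                return Pipeline(*self.stages, stage)

            def run(self, source: Iterable):
                items = iter(source)
                for stage in self.stages:
                    items = stage(items)
                return list(items)

        # The same pipeline definition could in principle be executed over a
        # bounded collection (batch) or an unbounded source (stream).
        p = Pipeline(map_stage(str.strip), filter_stage(bool)).then(map_stage(str.upper))
        print(p.run(["  spark ", "", " flink\n"]))   # ['SPARK', 'FLINK']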

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and the results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify the primary factors responsible for this relatively poor performance as insufficient available memory bandwidth, a low ratio of work to data size (good algorithmic efficiency), and the nonscaling cost of synchronization and gather/scatter operations (for fixed-problem-size scaling). This dissertation also illustrates how to reuse legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured-grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed-memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation-related) for a fixed-size problem when the number of processors grows to several thousand (the expected level of concurrency on terascale architectures). We also evaluate the hybrid (mixed distributed/shared) programming model from a performance standpoint.
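    A minimal sketch of the kind of memory-bandwidth-based performance model referred to above: for a bandwidth-bound kernel, the attainable flop rate is bounded by the sustained bandwidth times the kernel's ratio of flops to bytes moved. The numbers below are hypothetical, not measurements from PETSc-FUN3D.

        # Bandwidth-bound performance estimate for a sparse, PDE-style kernel.
        peak_flops  = 4.0e9    # per-core peak floating-point rate (flop/s), hypothetical
        bandwidth   = 10.0e9   # sustained memory bandwidth (byte/s), hypothetical
        flops       = 2.0      # flops per matrix nonzero (multiply + add)
        bytes_moved = 12.0     # bytes per nonzero: 8-byte value + 4-byte column index

        intensity  = flops / bytes_moved                    # flop/byte
        attainable = min(peak_flops, bandwidth * intensity)
        print(f"arithmetic intensity: {intensity:.3f} flop/byte")
        print(f"attainable rate: {attainable / 1e9:.2f} Gflop/s "
              f"({100 * attainable / peak_flops:.0f}% of peak)")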

    Heuristinen yhteistyöhaku ohjelmistoagenttien avulla (Heuristic cooperative search with software agents)

    Parallel algorithms extend the notion of sequential algorithms by permitting the simultaneous execution of independent computational steps. When the independence constraint is lifted and executions can freely interact and intertwine, parallel algorithms become concurrent and may behave in a nondeterministic way. Parallelism has over the years slowly risen to be a standard feature of high-performance computing, but concurrency, being even harder to reason about, is still considered somewhat notorious and undesirable. As such, the implicit randomness available in concurrency is rarely made use of in algorithms. This thesis explores concurrency as a means to facilitate algorithmic cooperation in a heuristic search setting. We use agents, cooperating software entities, to build a single-source shortest path (SSSP) search algorithm based on parallelized A∗, dubbed A!. We show how asynchronous information sharing gives rise to implicit randomness, which cooperating agents use in A! to maintain a collective secondary ranking heuristic and focus search space exploration. We experimentally show that A! consistently outperforms both vanilla A∗ and a noncooperative, explicitly randomized A∗ variant in the standard n-puzzle sliding tile problem context. The results indicate that A! performance increases with the addition of more agents, but that the returns are diminishing. A! is observed to be sensitive to heuristic improvement, but also constrained by search overhead from limited path diversity. A hybrid approach combining both implicit and explicit randomness is also evaluated and found not to be an improvement over A! alone. The studied A! implementation based on vanilla A∗ is not in itself competitive with state-of-the-art parallel A∗ algorithms, but rather a first step in applying concurrency to speed up heuristic SSSP search. The empirical results imply that concurrency and nondeterministic cooperation can successfully be harnessed in algorithm design, inviting further inquiry into algorithms of this kind.
    Parallel algorithms allow several independent program instructions to be executed simultaneously. When the independence constraint is removed and the order in which instructions execute is not controlled, parallel algorithms can behave nondeterministically because of the concurrency of instruction executions. Over the years, parallelism has risen to an important role in computing, while uncontrolled concurrency has generally come to be regarded as problematic and undesirable. The implicit randomness arising from concurrency is rarely exploited in algorithms. This work examines the exploitation of concurrent instruction execution as part of heuristic cooperative search. Using agents, cooperative software components, a new search algorithm called A! is implemented. A! is based on a parallel A∗ algorithm that solves the single-source shortest path problem. The work shows how asynchronous communication between agents leads to implicit randomness, which the A! agents collectively exploit to maintain a secondary ranking heuristic and, further, to focus the search. It is shown experimentally that A! outperforms both the standard and a randomized A∗ algorithm in solving the n-puzzle sliding tile problem. The results show that the performance of A! grows as agents are added, but also that the benefit of each addition is proportionally smaller. A! proves more sensitive than its counterparts with respect to exploiting the heuristic, but also modest in the diversity of its search paths. A simple hybrid algorithm combining explicit and implicit randomness is found not to bring additional performance compared to A! alone. The empirical experiments show that uncontrolled concurrency and nondeterministic cooperation can be successfully exploited in algorithm design, which encourages further research into algorithms that apply them.
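    For reference, a compact sketch of the vanilla A∗ baseline that A! parallelizes, written as a generic single-source shortest path search on an explicit graph; the cooperating agents and the shared secondary ranking heuristic of A! are not shown, and the toy graph and zero heuristic are purely illustrative.

        import heapq

        def a_star(start, goal, neighbors, h):
            """Vanilla A*: neighbors(n) yields (successor, edge_cost) pairs and
            h(n) is an admissible estimate of the remaining cost to the goal."""
            open_heap = [(h(start), 0.0, start)]          # entries are (f = g + h, g, node)
            best_g = {start: 0.0}
            while open_heap:
                f, g, node = heapq.heappop(open_heap)
                if node == goal:
                    return g                              # cost of a shortest path
                if g > best_g.get(node, float("inf")):
                    continue                              # stale queue entry
                for succ, cost in neighbors(node):
                    g2 = g + cost
                    if g2 < best_g.get(succ, float("inf")):
                        best_g[succ] = g2
                        heapq.heappush(open_heap, (g2 + h(succ), g2, succ))
            return None                                   # goal unreachable

        # Toy usage on a small weighted graph with a zero heuristic.
        graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": []}
        print(a_star("a", "c", lambda n: graph[n], lambda n: 0.0))   # -> 2.0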