
    Data-parallel concurrent constraint programming.

    by Bo-ming Tong. Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. Includes bibliographical references (leaves 104-[110]). Contents:
    Chapter 1: Introduction (p.1); 1.1 Concurrent Constraint Programming (p.2); 1.2 Finite Domain Constraints (p.3)
    Chapter 2: The Firebird Language (p.5); 2.1 Finite Domain Constraints (p.6); 2.2 The Firebird Computation Model (p.6); 2.3 Miscellaneous Features (p.7); 2.4 Clause-Based Nondeterminism (p.9); 2.5 Programming Examples (p.10); 2.5.1 Magic Series (p.10); 2.5.2 Weak Queens (p.14)
    Chapter 3: Operational Semantics (p.15); 3.1 The Firebird Computation Model (p.16); 3.2 The Firebird Commit Law (p.17); 3.3 Derivation (p.17); 3.4 Correctness of Firebird Computation Model (p.18)
    Chapter 4: Exploitation of Data-Parallelism in Firebird (p.24); 4.1 An Illustrative Example (p.25); 4.2 Mapping Partitions to Processor Elements (p.26); 4.3 Masks (p.27); 4.4 Control Strategy (p.27); 4.4.1 A Control Strategy Suitable for Linear Equations (p.28)
    Chapter 5: Data-Parallel Abstract Machine (p.30); 5.1 Basic DPAM (p.31); 5.1.1 Hardware Requirements (p.31); 5.1.2 Procedure Calling Convention and Process Creation (p.32); 5.1.3 Memory Model (p.34); 5.1.4 Registers (p.41); 5.1.5 Process Management (p.41); 5.1.6 Unification (p.49); 5.1.7 Variable Table (p.49); 5.2 DPAM with Backtracking (p.50); 5.2.1 Choice Point (p.52); 5.2.2 Trailing (p.52); 5.2.3 Recovering the Process Queues (p.57)
    Chapter 6: Implementation (p.58); 6.1 The DECmpp Massively Parallel Computer (p.58); 6.2 Implementation Overview (p.59); 6.3 Constraints (p.60); 6.3.1 Breaking Down Equality Constraints (p.61); 6.3.2 Processing the Constraint 'As Is' (p.62); 6.4 The Wide-Tag Architecture (p.63); 6.5 Register Window (p.64); 6.6 Dereferencing (p.65); 6.7 Output (p.66); 6.7.1 Collecting the Solutions (p.66); 6.7.2 Decoding the Solution (p.68)
    Chapter 7: Performance (p.69); 7.1 Uniprocessor Performance (p.71); 7.2 Solitary Mode (p.73); 7.3 Bit Vectors of Domain Variables (p.75); 7.4 Heap Consumption of the Heap Frame Scheme (p.77); 7.5 Eager Nondeterministic Derivation vs Lazy Nondeterministic Derivation (p.78); 7.6 Priority Scheduling (p.79); 7.7 Execution Profile (p.80); 7.8 Effect of the Number of Processor Elements on Performance (p.82); 7.9 Change of the Degree of Parallelism During Execution (p.84)
    Chapter 8: Related Work (p.88); 8.1 Vectorization of Prolog (p.89); 8.2 Parallel Clause Matching (p.90); 8.3 Parallel Interpreter (p.90); 8.4 Bounded Quantifications (p.91); 8.5 SIMD MultiLog (p.91)
    Chapter 9: Conclusion (p.93); 9.1 Limitations (p.94); 9.1.1 Data-Parallel Firebird is Specialized (p.94); 9.1.2 Limitations of the Implementation Scheme (p.95); 9.2 Future Work (p.95); 9.2.1 Extending Firebird (p.95); 9.2.2 Improvements Specific to DECmpp (p.99); 9.2.3 Labeling (p.100); 9.2.4 Parallel Domain Consistency (p.101); 9.2.5 Branch and Bound Algorithm (p.102); 9.2.6 Other Possible Future Work (p.102)
    Bibliography (p.104)
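
    The thesis lists Magic Series among its programming examples (Section 2.5.1). As a language-neutral illustration of that classic finite-domain problem, and not Firebird code, the following minimal Python sketch enumerates sequences s where s[i] equals the number of occurrences of the value i in s; the function name magic_series is ours.

        # Brute-force illustration of the magic series constraint problem:
        # find all sequences s of length n with s[i] == count of value i in s.
        # Illustrative sketch only; Firebird would express this with finite
        # domain constraints and search rather than exhaustive enumeration.
        from itertools import product

        def magic_series(n):
            """Return every magic series of length n (feasible only for small n)."""
            return [s for s in product(range(n), repeat=n)
                    if all(s[i] == sum(1 for x in s if x == i) for i in range(n))]

        if __name__ == "__main__":
            # For n = 4 the two magic series are (1, 2, 1, 0) and (2, 0, 2, 0).
            print(magic_series(4))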

    Simty: generalized SIMT execution on RISC-V

    We present Simty, a massively multi-threaded RISC-V processor core that acts as a proof of concept for dynamic inter-thread vectorization at the micro-architecture level. Simty runs groups of scalar threads executing SPMD code in lockstep, and assembles SIMD instructions dynamically across threads. Unlike existing SIMD or SIMT processors such as GPUs or vector processors, Simty vectorizes scalar general-purpose binaries; it does not involve any instruction set extension or compiler change. Simty is described in synthesizable RTL. An FPGA prototype validates its scaling up to 2048 threads per core with 32-wide SIMD units. Simty provides an open platform for research on GPU micro-architecture, on hybrid CPU-GPU micro-architecture, or on heterogeneous platforms with throughput-optimized cores.
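
    As a hedged, toy illustration of the general SIMT idea the abstract describes, and not of Simty's actual RTL or scheduling policy, the sketch below groups scalar threads by program counter and advances one group per step, which is the sense in which SIMD execution can be assembled dynamically from scalar SPMD threads; the program, the min-PC reconvergence policy, and all names are assumptions.

        # Toy SIMT model: scalar threads running the same SPMD program are
        # grouped by program counter (PC); each step, the group at the lowest
        # PC advances in lockstep, mimicking dynamic inter-thread vectorization.
        # Illustrative only; Simty's real micro-architecture differs.
        from collections import defaultdict

        PROGRAM = ["op_a", "op_b", "branch", "op_c", "op_d"]   # opcodes are just labels

        def step(thread_pcs):
            """Advance the group of threads sharing the minimum PC by one instruction."""
            groups = defaultdict(list)
            for tid, pc in thread_pcs.items():
                if pc < len(PROGRAM):
                    groups[pc].append(tid)
            if not groups:
                return False                      # every thread has finished
            pc = min(groups)                      # min-PC reconvergence heuristic
            lanes = groups[pc]
            print(f"PC={pc} op={PROGRAM[pc]:<6} lanes={lanes}")
            for tid in lanes:
                # Divergence example: odd-numbered threads skip the next instruction.
                thread_pcs[tid] = pc + 2 if PROGRAM[pc] == "branch" and tid % 2 else pc + 1
            return True

        if __name__ == "__main__":
            pcs = {tid: 0 for tid in range(4)}    # four scalar threads, same entry point
            while step(pcs):
                pass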

    Combiner approches statique et dynamique pour modéliser la performance de boucles HPC (Combining static and dynamic approaches to model the performance of HPC loops)

    The complexity of CPUs has increased considerably since their beginnings, introducing mechanisms such as register renaming, out-of-order execution, vectorization, prefetchers and multi-core environments to keep performance rising with each product generation. However, so has the difficulty in making proper use of all these mechanisms, or even in evaluating whether a program makes good use of a machine, whether users' needs match a CPU's design, or, for CPU architects, how each feature really affects customers. This thesis focuses on increasing the observability of potential bottlenecks in HPC computational loops and how they relate to each other in modern microarchitectures. We will first introduce a framework combining CQA and DECAN (respectively static and dynamic analysis tools) to get detailed performance metrics on small codelets in various execution scenarios. We will then present PAMDA, a performance analysis methodology leveraging elements obtained from codelet analysis to detect potential performance problems in HPC applications and help resolve them. A work extending the Cape linear model to better cover Sandy Bridge and give it more flexibility for HW/SW codesign purposes will also be described; it is directly used in VP3, a tool evaluating the performance gains that vectorizing loops could provide. Finally, we will describe UFS, an approach combining static analysis and cycle-accurate simulation to very quickly estimate a loop's execution time while accounting for out-of-order limitations in modern CPUs.
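
    As a hedged, much-simplified sketch of what a static loop model in the spirit of CQA estimates, and not the actual models or interfaces of CQA, DECAN or PAMDA, cycles per iteration can be bounded below by the busiest execution resource; the resource names, widths and micro-op counts below are made-up assumptions.

        # Minimal static throughput bound for one loop iteration: the loop can
        # go no faster than its most occupied resource.  Resource names, port
        # widths and micro-op counts are illustrative assumptions only.

        def throughput_bound(uops_per_resource, resource_width):
            """Return (lower bound in cycles/iteration, bottleneck resource)."""
            occupancy = {res: uops / resource_width[res]
                         for res, uops in uops_per_resource.items()}
            bottleneck = max(occupancy, key=occupancy.get)
            return occupancy[bottleneck], bottleneck

        if __name__ == "__main__":
            # Hypothetical loop body: 4 loads, 2 stores, 6 FP adds, 8 uops issued.
            uops  = {"load_ports": 4, "store_ports": 2, "fp_add_ports": 6, "front_end": 8}
            width = {"load_ports": 2, "store_ports": 1, "fp_add_ports": 2, "front_end": 4}
            cycles, limiter = throughput_bound(uops, width)
            print(f"lower bound: {cycles:.1f} cycles/iteration, limited by {limiter}")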

    ASCR/HEP Exascale Requirements Review Report

    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude greater than what is currently available -- and in some cases more. 2) The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources, the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, and e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems. Comment: 77 pages, 13 figures; draft report, subject to further revision

    Dynamic Dependency Collapsing

    In this dissertation, we explore the concept of dynamic dependency collapsing. When the clock speed is fixed, performance increases in computer architecture come from exploiting additional parallelism. We show that further improvements are possible even when the available parallelism in programs is exhausted. This improvement comes from executing in parallel instructions that would ordinarily have been serialized. We call this concept dependency collapsing. We explore existing techniques that exploit parallelism and show which of them fall under the umbrella of dependency collapsing. We then introduce two dependency collapsing techniques of our own. The first technique collapses data dependencies by fusing two normally dependent instructions and executing them together. We show that exploiting the additional parallelism generated by collapsing these dependencies results in a performance increase. Our second technique collapses resource dependencies to execute instructions that would normally have been serialized due to resource constraints in the processor. We show that it is possible to take advantage of larger in-processor structures while avoiding the power and area penalty this often implies.
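
    As a hedged, toy illustration of data-dependency collapsing, and not the dissertation's actual fusion hardware, the sketch below compares the length of a dependent instruction chain when each operation executes serially versus when adjacent dependent operations are fused and complete together; the chain contents and the fuse width are assumptions.

        # Toy model of dependency collapsing: fusing adjacent dependent
        # operations shortens the serialized chain.  Illustrative sketch only,
        # not the dissertation's hardware mechanism.

        def serial_steps(chain):
            """One step per operation when dependent ops execute one after another."""
            return len(chain)

        def collapsed_steps(chain, fuse_width=2):
            """Fuse up to fuse_width adjacent dependent ops into one execution step."""
            return -(-len(chain) // fuse_width)   # ceiling division

        if __name__ == "__main__":
            # Dependent chain: t1 = a + b; t2 = t1 << 2; t3 = t2 + c; t4 = t3 ^ d
            chain = ["add", "shl", "add", "xor"]
            print("serial   :", serial_steps(chain), "steps")
            print("collapsed:", collapsed_steps(chain), "steps")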

    High-level synthesis of dataflow programs for heterogeneous platforms: design flow tools and design space exploration

    The growing complexity of digital signal processing applications implemented in programmable logic and embedded processors makes a compelling case for the use of high-level methodologies for their design and implementation. Past research has shown that for complex systems, raising the level of abstraction does not necessarily come at a cost in terms of performance or resource requirements. As a matter of fact, high-level synthesis tools supporting such a high abstraction often rival, and on occasion improve on, low-level design. In spite of these successes, high-level synthesis still relies on programs being written with the target, and often the synthesis process, in mind. In other words, imperative languages such as C or C++, the languages most commonly used for high-level synthesis, are either modified or restricted to a subset in order to make parallelism explicit. In addition, a proper behavioral description that permits the unification of hardware and software design is still an elusive goal for heterogeneous platforms. A promising behavioral description capable of expressing both sequential and parallel applications is RVC-CAL. RVC-CAL is a dataflow programming language that permits design abstraction, modularity, and portability. The objective of this thesis is to provide a high-level synthesis solution for RVC-CAL dataflow programs and an RVC-CAL design flow for heterogeneous platforms. The main contributions of this thesis are: a high-level synthesis infrastructure that supports the full specification of RVC-CAL, an action selection strategy for supporting parallel reads and writes of lists of tokens in hardware synthesis, dynamic fine-grained profiling for synthesized dataflow programs, an iterative design space exploration framework that permits the performance estimation, analysis, and optimization of heterogeneous platforms, and finally a clock gating strategy that reduces dynamic power consumption. Experimental results at all stages of the provided design flow demonstrate the capabilities of the tools for high-level synthesis, software/hardware co-design, design space exploration, and power optimization for reconfigurable hardware. Consequently, this work proves the viability of complex system design and implementation using dataflow programming, not only for system-level simulation but also for real heterogeneous implementations.
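
    As a hedged sketch of the dataflow-actor model underlying RVC-CAL, written as plain Python rather than RVC-CAL syntax, an actor fires only when its input FIFOs hold enough tokens, consuming them and producing output tokens; the actor, its firing rule and the scheduler loop below are illustrative assumptions.

        # Minimal dataflow-actor sketch: the actor fires only when enough input
        # tokens are available, consuming two and producing their sum.  This
        # models the general actor/firing-rule idea, not RVC-CAL's syntax.
        from collections import deque

        class SumActor:
            """Consumes two tokens from its input FIFO, emits their sum."""
            def __init__(self):
                self.inbox = deque()
                self.outbox = deque()

            def can_fire(self):
                return len(self.inbox) >= 2       # firing condition: two tokens ready

            def fire(self):
                a, b = self.inbox.popleft(), self.inbox.popleft()
                self.outbox.append(a + b)

        if __name__ == "__main__":
            actor = SumActor()
            actor.inbox.extend([1, 2, 3, 4, 5])
            while actor.can_fire():               # trivial scheduler loop
                actor.fire()
            print(list(actor.outbox))             # [3, 7]; token 5 waits for a partner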

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could lead to different recovery rates for patients. Therefore, this study combines statistical methods and decision tree techniques to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since the collected data set is small, containing only 212 records, we use all of the data as training data. Instead of using the resulting tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, using the t-test, we verify these rules to discover useful descriptive rules. Experimental results show that our approach can find new and interesting knowledge about recurrent ovarian endometriomas under different conditions.
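
    As a hedged sketch of the two-step workflow the abstract describes (fitting a decision tree, reading each internal node's value as a cut point, and then t-testing the groups on either side of the cut), using scikit-learn and SciPy on synthetic data; the features, outcome and tree depth are assumptions, not the paper's clinical dataset or rules.

        # Sketch of the described approach: (1) fit a decision tree, (2) treat
        # each internal node's threshold as a candidate cut point, (3) t-test
        # whether the outcome differs across the cut.  Synthetic data only.
        import numpy as np
        from scipy.stats import ttest_ind
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(212, 3))                              # 212 records, 3 features
        y = (X[:, 0] + 0.5 * rng.normal(size=212) > 0).astype(int)  # synthetic outcome

        tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

        # Every internal node yields one candidate rule "feature <= threshold".
        for node in range(tree.tree_.node_count):
            feat, thr = tree.tree_.feature[node], tree.tree_.threshold[node]
            if feat < 0:
                continue                                           # leaf node: no cut point
            left, right = y[X[:, feat] <= thr], y[X[:, feat] > thr]
            if len(left) > 1 and len(right) > 1:
                stat, p = ttest_ind(left, right, equal_var=False)
                print(f"x[{feat}] <= {thr:.2f}: p-value = {p:.3g}")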

    Higher-order Voronoi diagrams of polygonal objects

    Higher-order Voronoi diagrams are fundamental geometric structures which encode k-nearest-neighbor information, and thus they aid computations that require proximity information beyond the nearest neighbor. They are related to various well-known structures in computational geometry and are a fascinating combinatorial problem to study. While higher-order Voronoi diagrams of points have been studied extensively, they have not been considered for other types of sites. Points lack dimensionality, which makes them unable to represent many real-life instances. Points are the simplest kind of geometric object, and therefore higher-order Voronoi diagrams of points can be considered the corner case of all higher-order Voronoi diagrams. The goal of this dissertation is to move away from the corner and bring the higher-order Voronoi diagram to more general geometric instances. We focus on certain polygonal objects as they provide flexibility and are able to represent real-life instances. Before this dissertation, Voronoi diagrams of polygonal objects had been studied only for the nearest-neighbor and farthest Voronoi diagrams. In this dissertation we investigate structural and combinatorial properties and discover that the dimensionality of geometric objects manifests itself in numerous ways which do not exist in the case of points. We prove that the structural complexity of the order-k Voronoi diagram of non-crossing line segments is O(k(n-k)), as in the case of points. We study disjoint line segments, intersecting line segments, and line segments forming a planar straight-line graph, and extend the results to the Lp metric, 1 ≤ p ≤ ∞. We also establish the connection between two mathematical abstractions: abstract Voronoi diagrams and the Clarkson-Shor framework. We design several construction algorithms that cover the case of non-point sites. While computational geometry provides several approaches to study structural complexity that give tight realizable bounds, developing an effective construction algorithm remains a challenging problem even for points. Most construction algorithms are designed to work with points, as they exploit their simplicity and their relation to data structures that work specifically for points. We extend the iterative and the sweepline approaches, which are quite efficient in constructing all order-i Voronoi diagrams for i ≤ k, and we also give three randomized construction algorithms for abstract higher-order Voronoi diagrams that deal specifically with the construction of the order-k Voronoi diagram.
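
    As a hedged, brute-force illustration of what an order-k Voronoi diagram encodes for point sites, and not of the dissertation's algorithms for line segments, the sketch below labels sampled locations by their set of k nearest sites; every distinct k-subset that appears corresponds to one order-k region. The site coordinates and the grid resolution are assumptions.

        # Brute-force order-k Voronoi partition for point sites: each sampled
        # location is labelled by the unordered set of its k nearest sites, and
        # an order-k region is a maximal area sharing one label.  Illustrative
        # sketch only; it does not handle line-segment sites.
        from itertools import product
        from math import dist

        def order_k_regions(sites, k, resolution=50):
            """Group unit-square sample points by their k-nearest-site set."""
            regions = {}
            for i, j in product(range(resolution), repeat=2):
                q = (i / resolution, j / resolution)
                nearest = sorted(range(len(sites)), key=lambda s: dist(q, sites[s]))[:k]
                regions.setdefault(frozenset(nearest), []).append(q)
            return regions

        if __name__ == "__main__":
            sites = [(0.2, 0.2), (0.8, 0.3), (0.5, 0.8), (0.1, 0.9)]
            regions = order_k_regions(sites, k=2)
            print(f"{len(regions)} non-empty order-2 regions found on the sample grid")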

    Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle (A contribution to infrastructure convergence between high-performance computing and large-scale data processing)

    The amount of produced data, either in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of compute-intensive workloads. However, the HPC community is also facing an increasing need to process large amounts of data derived from high-definition sensors and large physics apparatuses. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always integrated correctly, especially at the level of the file system and the Resource and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges for HPC infrastructures are, we have studied multiple aspects of the convergence. We initially provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement the BeBiDa real-condition experiments with simulations while enabling us to study file system dimensioning and trade-offs. All the experiments and analysis of this work have been done with reproducibility in mind. Based on this experience, we propose to integrate the development workflow and data analysis into the reproducibility mindset, and give feedback on our experiences with a list of best practices.
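
    As a hedged sketch of one way an HPC resource manager and a Big Data framework can collaborate through job prolog/epilog hooks (whether this matches BeBiDa's actual mechanism is an assumption, and the hook names, in-memory state and node names below are purely illustrative), the toy functions decommission a node's Big Data worker when an HPC job starts on it and recommission the worker when the job ends.

        # Toy prolog/epilog collaboration pattern: when an HPC job starts on a
        # node, that node's Big Data worker is stopped; when the job ends, the
        # worker is restarted so idle nodes serve Big Data workloads again.
        # Hook names, state handling and node names are illustrative assumptions.

        BIG_DATA_WORKERS = {}                     # node name -> "running" | "stopped"

        def prolog(nodes):
            """Invoked by the HPC resource manager just before a job starts on nodes."""
            for node in nodes:
                BIG_DATA_WORKERS[node] = "stopped"    # hand the node back to HPC

        def epilog(nodes):
            """Invoked by the HPC resource manager right after a job ends on nodes."""
            for node in nodes:
                BIG_DATA_WORKERS[node] = "running"    # node idle again: reuse it

        if __name__ == "__main__":
            for node in ("node1", "node2", "node3"):
                BIG_DATA_WORKERS[node] = "running"
            prolog(["node1", "node2"])            # an HPC job arrives on two nodes
            print(BIG_DATA_WORKERS)
            epilog(["node1", "node2"])            # the HPC job finishes
            print(BIG_DATA_WORKERS)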