9 research outputs found

    Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis

    Full text link
    In high performance computing environments, we observe an ongoing increase in the available numbers of cores. This development calls for re-emphasizing performance (scalability) analysis and speedup laws as suggested in the literature (e.g., Amdahl's law and Gustafson's law), with a focus on asymptotic performance. Understanding speedup and efficiency issues of algorithmic parallelism is useful for several purposes, including the optimization of system operations, temporal predictions on the execution of a program, and the analysis of asymptotic properties and the determination of speedup bounds. However, the literature is fragmented and shows a large diversity and heterogeneity of speedup models and laws. These phenomena make it challenging to obtain an overview of the models and their relationships, to identify the determinants of performance in a given algorithmic and computational context, and, finally, to determine the applicability of performance models and laws to a particular parallel computing setting. In this work, we provide a generic speedup (and thus also efficiency) model for homogeneous computing environments. Our approach generalizes many prominent models suggested in the literature and allows showing that they can be considered special cases of a unifying approach. The genericity of the unifying speedup model is achieved through parameterization. Considering combinations of parameter ranges, we identify six different asymptotic speedup cases and eight different asymptotic efficiency cases. Jointly applying these speedup and efficiency cases, we derive eleven scalability cases, from which we build a scalability typology. Researchers can draw upon our typology to classify their speedup model and to determine the asymptotic behavior when the number of parallel processing units increases. In addition, our results may be used to address various extensions of our setting

    Investigation into scalable energy and performance models for many-core systems

    Get PDF
    PhD ThesisIt is likely that many-core processor systems will continue to penetrate emerging embedded and high-performance applications. Scalable energy and performance models are two critical aspects that provide insights into the conflicting trade-offs between them with growing hardware and software complexity. Traditional performance models, such as Amdahl’s Law, Gustafson’s and Sun-Ni’s, have helped the research community and industry to better understand the system performance bounds with given processing resources, which is otherwise known as speedup. However, these models and their existing extensions have limited applicability for energy and/or performance-driven system optimization in practical systems. For instance, these are typically based on software characteristics, assuming ideal and homogeneous hardware platforms or limited forms of processor heterogeneity. In addition, the measurement of speedup and parallelization factors of an application running on a specific hardware platform require instrumenting the original software codes. Indeed, practical speedup and parallelizability models of application workloads running on modern heterogeneous hardware are critical for energy and performance models, as they can be used to inform design and control decisions with an aim to improve system throughput and energy efficiency. This thesis addresses the limitations by firstly developing novel and scalable speedup and energy consumption models based on a more general representation of heterogeneity, referred to as the normal form heterogeneity. A method is developed whereby standard performance counters found in modern many-core platforms can be used to derive speedup, and therefore the parallelizability of the software, without instrumenting applications. This extends the usability of the new models to scenarios where the parallelizability of software is unknown, leading to potentially Run-Time Management (RTM) speedup and/or energy efficiency optimization. The models and optimization methods presented in this thesis are validated through extensive experimentation, by running a number of different applications in wide-ranging concurrency scenarios on a number of different homogeneous and heterogeneous Multi/Many Core Processor (M/MCP) systems. These include homogeneous and heterogeneous architectures and viii range from existing off-the-shelf platforms to potential future system extensions. The practical use of these models and methods is demonstrated through real examples such as studying the effectiveness of the system load balancer. The models and methodologies proposed in this thesis provide guidance to a new opportunities for improving the energy efficiency of M/MCP systemsHigher Committee of Education Development (HCED) in Ira

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Get PDF
    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

    Many-core architectures with time predictable execution Support for hard real-time applications

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 183-193).Hybrid control systems are a growing domain of application. They are pervasive and their complexity is increasing rapidly. Distributed control systems for future "Intelligent Grid" and renewable energy generation systems are demanding high-performance, hard real-time computation, and more programmability. General-purpose computer systems are primarily designed to process data and not to interact with physical processes as required by these systems. Generic general-purpose architectures even with the use of real-time operating systems fail to meet the hard realtime constraints of hybrid system dynamics. ASIC, FPGA, or traditional embedded design approaches to these systems often result in expensive, complicated systems that are hard to program, reuse, or maintain. In this thesis, we propose a domain-specific architecture template targeting hybrid control system applications. Using power electronics control applications, we present new modeling techniques, synthesis methodologies, and a parameterizable computer architecture for these large distributed control systems. We propose a new system modeling approach, called Adaptive Hybrid Automaton, based on previous work in control system theory, that uses a mixed-model abstractions and lends itself well to digital processing. We develop a domain-specific architecture based on this modeling that uses heterogeneous processing units and predictable execution, called MARTHA. We develop a hard real-time aware router architecture to enable deterministic on-chip interconnect network communication. We present several algorithms for scheduling task-based applications onto these types of heterogeneous architectures. We create Heracles, an open-source, functional, parameterized, synthesizable many-core system design toolkit, that can be used to explore future multi/many-core processors with different topologies, routing schemes, processing elements or cores, and memory system organizations. Using the Heracles design tool we build a prototype of the proposed architecture using a state-of-the-art FPGA-based platform, and deploy and test it in actual physical power electronics systems. We develop and release an open-source, small representative set of power electronics system applications that can be used for hard real-time application benchmarking.by Michel A. Kinsy.Ph.D

    Parallel Triplet Finding for Particle Track Reconstruction. [Mit einer ausführlichen deutschen Zusammenfassung]

    Get PDF

    Efficient software implementation of elliptic curves and bilinear pairings

    Get PDF
    Orientador: Júlio César Lopez HernándezTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O advento da criptografia assimétrica ou de chave pública possibilitou a aplicação de criptografia em novos cenários, como assinaturas digitais e comércio eletrônico, tornando-a componente vital para o fornecimento de confidencialidade e autenticação em meios de comunicação. Dentre os métodos mais eficientes de criptografia assimétrica, a criptografia de curvas elípticas destaca-se pelos baixos requisitos de armazenamento para chaves e custo computacional para execução. A descoberta relativamente recente da criptografia baseada em emparelhamentos bilineares sobre curvas elípticas permitiu ainda sua flexibilização e a construção de sistemas criptográficos com propriedades inovadoras, como sistemas baseados em identidades e suas variantes. Porém, o custo computacional de criptossistemas baseados em emparelhamentos ainda permanece significativamente maior do que os assimétricos tradicionais, representando um obstáculo para sua adoção, especialmente em dispositivos com recursos limitados. As contribuições deste trabalho objetivam aprimorar o desempenho de criptossistemas baseados em curvas elípticas e emparelhamentos bilineares e consistem em: (i) implementação eficiente de corpos binários em arquiteturas embutidas de 8 bits (microcontroladores presentes em sensores sem fio); (ii) formulação eficiente de aritmética em corpos binários para conjuntos vetoriais de arquiteturas de 64 bits e famílias mais recentes de processadores desktop dotadas de suporte nativo à multiplicação em corpos binários; (iii) técnicas para implementação serial e paralela de curvas elípticas binárias e emparelhamentos bilineares simétricos e assimétricos definidos sobre corpos primos ou binários. Estas contribuições permitiram obter significativos ganhos de desempenho e, conseqüentemente, uma série de recordes de velocidade para o cálculo de diversos algoritmos criptográficos relevantes em arquiteturas modernas que vão de sistemas embarcados de 8 bits a processadores com 8 coresAbstract: The development of asymmetric or public key cryptography made possible new applications of cryptography such as digital signatures and electronic commerce. Cryptography is now a vital component for providing confidentiality and authentication in communication infra-structures. Elliptic Curve Cryptography is among the most efficient public-key methods because of its low storage and computational requirements. The relatively recent advent of Pairing-Based Cryptography allowed the further construction of flexible and innovative cryptographic solutions like Identity-Based Cryptography and variants. However, the computational cost of pairing-based cryptosystems remains significantly higher than traditional public key cryptosystems and thus an important obstacle for adoption, specially in resource-constrained devices. The main contributions of this work aim to improve the performance of curve-based cryptosystems, consisting of: (i) efficient implementation of binary fields in 8-bit microcontrollers embedded in sensor network nodes; (ii) efficient formulation of binary field arithmetic in terms of vector instructions present in 64-bit architectures, and on the recently-introduced native support for binary field multiplication in the latest Intel microarchitecture families; (iii) techniques for serial and parallel implementation of binary elliptic curves and symmetric and asymmetric pairings defined over prime and binary fields. These contributions produced important performance improvements and, consequently, several speed records for computing relevant cryptographic algorithms in modern computer architectures ranging from embedded 8-bit microcontrollers to 8-core processorsDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã

    Instruction-set architecture synthesis for VLIW processors

    Get PDF

    Computational Methods for Medical and Cyber Security

    Get PDF
    Over the past decade, computational methods, including machine learning (ML) and deep learning (DL), have been exponentially growing in their development of solutions in various domains, especially medicine, cybersecurity, finance, and education. While these applications of machine learning algorithms have been proven beneficial in various fields, many shortcomings have also been highlighted, such as the lack of benchmark datasets, the inability to learn from small datasets, the cost of architecture, adversarial attacks, and imbalanced datasets. On the other hand, new and emerging algorithms, such as deep learning, one-shot learning, continuous learning, and generative adversarial networks, have successfully solved various tasks in these fields. Therefore, applying these new methods to life-critical missions is crucial, as is measuring these less-traditional algorithms' success when used in these fields

    Globally Optimal Catalysts: Computerbasierte Optimierung von abstrakten katalytischen Einbettungen für beliebige chemische Reaktionen

    Get PDF
    In the context of inverse design of molecules with desired optimal properties, the long-term goal of this Thesis is to develop a general framework which tackles the design of molecular systems for an optimal catalytic effect onto arbitrary chemical reactions. For any given reaction, an arrangement of an additional molecular framework around this reaction center is sought such that the energetic reaction barrier is lowered as much as possible. As necessary abstraction layer, the so-called globally optimal catalyst (GOCAT) model is introduced, and, furthermore, evolutionary algorithms (EAs) are harnessed as implemented in our global optimization suite for chemical problems, ogolem, which was highly extended to allow for these catalysis optimizations. Starting with a maximally reductionistic approach for studying the non-bonding interactions, electrostatic GOCATs are introduced that consist of arbitrary numbers, distributions and strengths of partial point charges around reacting molecules, mostly surrounding these on a common exposed surface. In the end, two reactions are studied in detail within the general topic of electrostatic catalysis. Some of the initially present model approximations are already sufficiently lifted, still-existing ones are critically assessed and further future extensions to the framework are discussed. Moreover, many method development matters are addressed: They range from optimal shared-memory parallelization, exemplified for global parameter optimization of the reactive force field, ReaxFF, via diversity control parameters for the EAs, applied to a cluster structure optimization problem, to EA operator benchmarks and optimizations of abstract electrostatics.Im Kontext von inversem Design von Molekülen mit optimalen Eigenschaften versucht die vorliegende Arbeit als Langzeitziel eine passende Plattform zu entwickeln, welche das generelle Design molekularer Systeme für einen optimalen Katalyseeffekt auf beliebige chemische Reaktionen projektiert. Für eine gegeben Reaktion soll eine hinzukommende chemische Umgebung komponiert werden, welche die Reaktionsenergiebarriere so weit wie möglich vermindert. Als notwendige Abstraktionsschicht wird das sogenannte Modell des globally optimal catalyst (GOCAT) eingeführt und außerdem kommen Evolutionäre Algorithmen (EAs) zur Anwendung, wie sie bereits in unserem Programmpaket zur Lösung allgemeiner globaler Optimierungsprobleme der Chemie, ogolem, bereitgestellt werden, welches jedoch deutlich für diese Katalyseoptimierungen ergänzt wurde. Angefangen in einem maximal-reduktionistischen Ansatz werden elektrostatische GOCATs erarbeitet, die aus einer beliebigen Anzahl, Verteilung und Stärke von Partialladungen bestehen und rund um die reagierenden Moleküle drapiert werden, meist auf einer gemeinsamen exponierten Oberfläche. Insgesamt werden zwei Reaktionen detailliert untersucht im generellen Kontext von elektrostatischer Katalyse. Einige eingangs vorhandene Modellannahmen werden bereits systematisch verbessert, noch vorhandene kritisch beleuchtet und künftige Erweiterungen auseinandergesetzt. Weiterhin werden unterschiedliche Methodenentwicklungsaspekte angesprochen: Diese reichen von verbesserter Parallelisierung in Mehrprozessorarchitekturen, beispielhaft gezeigt anhand einer globalen Parameteroptimierung des reaktiven Kraftfeldes ReaxFF, über Diversitätskontrollparameter des EAs, illustriert mittels eines Clusterstrukturoptimierungsproblems, bis hin zu EA-Operator-Testevaluationen und allgemeinen abstrakten Elektrostatikoptimierungen
    corecore