
11 research outputs found

Formalized Derivation of Communication Operations for Parallel Grained Algorithms

Author: A. Tolstsikau A.
N. Likhoded A.
А. Толстиков А.
Н. Лиходед А.
Publication venue: 'Publishing House Belorusskaya Nauka'
Publication date: 05/04/2018

Algorithms designed for parallel computers with distributed memory consist of computational macro operations (computation grains) and communication operations that explicitly specify the exchange of data arrays between computing nodes. The main difficulty is usually organizing this data exchange efficiently. Solving the problem requires first identifying the information dependences between macro operations and then generating the communication operations that these dependences induce. Automating and simplifying code generation requires a formalization of how communication operations are obtained. Such a formalization is known for the case of homogeneous information dependences; it represents the dependences between computation grains by vectors of global dependences. There are also results that make it possible to obtain the data arrays to be exchanged, but they require tools for working with polyhedra and do not formalize the communication operations. This article presents a method for formalizing communication operations (receiving and sending data arrays) and including them in the algorithm structure for a parallel algorithm with affine dependences. Using functions that determine the dependences between macro operations made it possible to obtain explicit representations of the communication operations in the algorithm that specifies the parallel computational processes. The work generalizes the formalization of data-sending operations in a parallel algorithm whose operations are not divided into macro operations, as well as some aspects of the method for obtaining communication operations determined by homogeneous information dependences.

Proceedings of the National Academy of Sciences of Belarus. Series of Physical-Mathematical Sciences / Известия Национальной академии наук Беларуси. Серия физико-математических наук

Parallel Versions of the Implementation of Multidimensional Loops

Author: А. Толстиков А.
Н. Лиходед А.
Publication venue: UIIP NASB
Publication date: 01/11/2018

Conditions are formulated and proved under which parallel versions of algorithms defined by nested loops can be obtained by a minor modification of the original sequential algorithm. Processor load and the problem of choosing the computation grain are also studied.

Informatics (E-Journal) / Информатика

Adding Parallelism to Sequential Programs – a Combined Method

Author: Czejdo Denny Bogdan
Daszczuk Wiktor Bohdan
Grześkowiak Wojciech
Publication venue: Electronics and Telecommunications Committee
Publication date: 15/04/2024

The article outlines a contemporary method for creating software for multi-processor computers. It describes the identification of parallelizable sequential code structures. Three structures were found and then carefully examined. The algorithms used to determine whether or not certain parts of code may be parallelized result from static analysis. The techniques demonstrate how, if possible, existing sequential structures might be transformed into parallel-running programs. A dynamic evaluation is also a part of our process, and it can be used to assess the efficiency of the parallel programs that are developed. As a tool for sequential programs, the algorithms have been implemented in C#. All proposed methods were discussed using a common benchmark

International Journal of Electronics and Telecommunications (Warsaw University of Technology)

Automatic mapping of nested loops to FPGAs

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007

Crossref


Evaluating the Scalability of SDF Single-chip Multiprocessor Architecture Using Automatically Parallelizing Code

Author: Zhang Yuhua
Publication venue: 'University of North Texas Libraries'
Publication date: 01/12/2004

Advances in integrated circuit technology continue to provide more and more transistors on a chip. Computer architects face the challenge of finding the best way to translate these resources into high performance. The challenge in designing the next generation of CPUs (central processing units) lies not in using up the silicon area, but in finding smart ways to exploit the wealth of transistors now available. In addition, a next-generation architecture should offer high throughput, scalability, modularity, and low energy consumption, rather than suit only one class of applications or users or emphasize only a faster clock rate. A program exhibits different types of parallelism: instruction-level parallelism (ILP), thread-level parallelism (TLP), or data-level parallelism (DLP). Likewise, architectures can be designed to exploit one or more of these types. It is generally not possible to design an architecture that takes advantage of all three without very complex hardware structures and complex compiler optimizations. We present the state-of-the-art SDF (scheduled dataflow) architecture, which exploits as much TLP as the application supplies. We implement an SDF single-chip multiprocessor constructed from simpler processors and execute automatically parallelized applications on it. SDF has many desirable features, such as high throughput, scalability, and low power consumption, which meet the requirements of next-generation CPU design. Compared with superscalar, VLIW (very long instruction word), and SMT (simultaneous multithreading) architectures, the experimental results show that SDF is comparable to the other architectures for applications with very little parallelism and outperforms them for applications with large amounts of parallelism.

UNT Digital Library

A step towards unifying schedule and storage optimization

Author: Cohen A.
Darte A.
Feautrier P.
Frédéric Vivien
Saman Amarasinghe
Sheldon J. W.
William Thies
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date

Crossref

Techniques d'exploration architecturale de design à usage spécifique pour l'accélération de boucles (Design Space Exploration Techniques for Application-Specific Architectures for Loop Acceleration)

Author: Mbaye Mame Maria
Publication venue
Publication date: 01/08/2010

Time to market is a very important concern in industry, which is why industry always looks for new CAD tools that help reduce design time. Application-specific instruction-set processors (ASIPs) provide flexibility and can reach good performance if they are well designed. One trend that is gaining more and more success is C-based design, in which the system behavior is specified in a high-level language such as C or SystemC. The C-based specification is used during the partitioning phase to determine the software and hardware components of the system. Since automatic processor generators are now mature, designers have a new type of tool they can rely on during architecture design. In the hardware world, high-level synthesis has long been and remains a hot research topic. Advances in ESL have led to commercial high-level synthesis tools such as Catapult C and Forte's Cynthesizer. Designers have more tools in their toolbox, but they also have more solutions to explore, so these tools can have the reverse effect: the design time can increase instead of decreasing. Our doctoral research tackles this issue by proposing new methodologies for design space exploration of application-specific architectures for loop acceleration, in order to reduce the design time while reaching targeted performance. Our thesis starts with the exploration of ASIP design. We propose a method that targets loop acceleration with highly coupled specialized instructions that execute loop operations. Loops are good candidates for acceleration when they offer parallelization opportunities and those opportunities are well exploited. Hardware components such as specialized instructions can leverage parallelization opportunities at a low level. Thus, we propose to extract the parallelization opportunities of a loop and to execute more concurrent operations in specialized instructions. The method also relies on data optimization in the processor architecture. Its main contribution is a new approach to specialized-instruction (SI) design based on loop acceleration, in which loop optimization and transformation are done in the SIs directly instead of in the software code. Another contribution is the design of tightly coupled specialized instructions associated with loops, based on a 5-pattern representation.

PolyPublie