9 research outputs found

    Reducing scientific computing algorithms to distributed computing frameworks

    Scientific computing uses computers and algorithms to solve problems in various sciences such as genetics, biology, and chemistry. Often the goal is to model and simulate natural phenomena that would be very difficult to study in a real environment. For example, it is possible to create a model of a solar storm or a meteor impact and run computer simulations to assess the disaster's impact on the environment. The more sophisticated and accurate such simulations are, the more computing power they require. It is often necessary to use a large number of computers, all working simultaneously on a single problem; such computations are called parallel or distributed computing. However, creating distributed computing programs is complicated and requires much more time and resources, because the work done simultaneously on different computers must be synchronised. A number of software frameworks have been created to simplify this process by automating part of the distributed programming. The goal of this research was to assess the suitability of such distributed computing frameworks for complex scientific computing algorithms. The results showed that existing frameworks differ greatly from one another and that none of them is suitable for all types of algorithms. Some frameworks suit only simple algorithms; others are unsuitable when the data does not fit into the computers' memory. Choosing the most appropriate distributed computing framework for an algorithm can therefore be a very complex task, because it requires studying and applying the existing frameworks. To address this problem, a Dynamic Algorithms Modelling Application (DAMA) was created, which can simulate the implementation of an algorithm in different distributed computing frameworks. DAMA helps to estimate which distributed framework is the most appropriate for a given algorithm, without actually implementing the algorithm in any of the available frameworks. The main contribution of this study is to simplify the adoption of distributed computing frameworks for researchers who are not yet familiar with them, saving significant time and resources because each available framework no longer has to be studied and applied in detail.
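    As a purely hypothetical illustration of this idea, the OCaml sketch below estimates an algorithm's runtime under simple per-framework cost models and picks the cheapest, mirroring how a simulator can rank frameworks without any real implementation. The framework parameters, field names, and cost formula are invented for the sketch and are not DAMA's actual model.

```ocaml
(* Hypothetical sketch: rank distributed frameworks for an algorithm by
   estimated runtime instead of implementing it in each one. All
   parameters and the cost formula are illustrative assumptions. *)

type framework = {
  name : string;
  startup_s : float;            (* per-job scheduling overhead, seconds *)
  per_iter_overhead_s : float;  (* synchronisation cost per iteration *)
  in_memory : bool;             (* must the working set fit in RAM? *)
}

type algorithm = {
  iterations : int;             (* number of synchronised passes *)
  compute_per_iter_s : float;   (* useful work per pass, seconds *)
  data_gb : float;              (* working-set size *)
}

(* Estimated total time; an in-memory framework whose RAM is exceeded
   is treated as unusable. *)
let estimate ~ram_gb fw alg =
  if fw.in_memory && alg.data_gb > ram_gb then infinity
  else
    fw.startup_s
    +. float_of_int alg.iterations
       *. (alg.compute_per_iter_s +. fw.per_iter_overhead_s)

(* Pick the framework with the lowest estimate. *)
let best ~ram_gb fws alg =
  List.fold_left
    (fun (bn, bt) fw ->
       let t = estimate ~ram_gb fw alg in
       if t < bt then (fw.name, t) else (bn, bt))
    ("none", infinity) fws

let () =
  let frameworks = [
    { name = "batch-mapreduce"; startup_s = 30.;
      per_iter_overhead_s = 20.; in_memory = false };
    { name = "in-memory-dataflow"; startup_s = 5.;
      per_iter_overhead_s = 0.5; in_memory = true };
  ] in
  let alg = { iterations = 100; compute_per_iter_s = 2.; data_gb = 40. } in
  let name, t = best ~ram_gb:64. frameworks alg in
  Printf.printf "best framework: %s (estimated %.0f s)\n" name t
```

    For an iterative algorithm like this one, the per-iteration synchronisation overhead dominates the ranking, which is exactly the kind of trade-off the abstract says makes manual framework selection hard.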

    Parallelizing Machine Learning - Functionally: A Framework and Abstractions for Parallel Graph Processing

    Implementing machine learning algorithms for large data, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce, which hide much of the associated complexity. We present a framework for implementing parallel and distributed machine learning algorithms on large graphs, flexibly, through the use of functional programming abstractions. Our aim is a system that allows researchers and practitioners to quickly and easily implement (and experiment with) their algorithms in a parallel or distributed setting. We introduce functional combinators for the flexible composition of parallel, aggregation, and sequential steps. To the best of our knowledge, our system is the first to avoid inversion of control in a (bulk) synchronous parallel model.
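    To make the combinator style concrete, here is a minimal self-contained OCaml sketch of composable parallel, aggregation, and sequential steps. The names (vmap, aggregate, >>>) and types are illustrative assumptions, not the paper's actual API, and the steps run sequentially here, standing in for a parallel or distributed backend.

```ocaml
(* A step transforms the whole vertex-value array at once; the caller
   composes steps instead of handing callbacks to a framework, which is
   what "avoiding inversion of control" means here. *)
type 'v step = 'v array -> 'v array

(* Apply f to every vertex value (the "parallel" combinator). *)
let vmap (f : 'v -> 'v) : 'v step = Array.map f

(* Fold all vertex values to one global value, then let every vertex
   react to it (the "aggregation" combinator). *)
let aggregate (fold : 'v array -> 'a) (react : 'a -> 'v -> 'v) : 'v step =
  fun vs -> let g = fold vs in Array.map (react g) vs

(* Sequential composition of steps. *)
let ( >>> ) (s1 : 'v step) (s2 : 'v step) : 'v step = fun vs -> s2 (s1 vs)

(* Run a step for a fixed number of supersteps. *)
let iterate n (s : 'v step) : 'v step =
  fun vs ->
    let r = ref vs in
    for _ = 1 to n do r := s !r done;
    !r

let () =
  (* Toy algorithm: halve every value, then normalise by the global sum. *)
  let step =
    vmap (fun v -> v /. 2.)
    >>> aggregate (Array.fold_left ( +. ) 0.) (fun total v -> v /. total)
  in
  let result = iterate 3 step [| 1.; 2.; 3. |] in
  Array.iter (Printf.printf "%.3f ") result;
  print_newline ()
```

    A real backend would evaluate each vmap in parallel and each aggregate as a synchronised reduction, but the program text, and hence the control flow, would stay exactly as written.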

    Categorization And Visualization Of Parallel Programming Systems

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005. Parallel computing, also called high-performance computing, refers to solving problems faster by using multiple processors simultaneously. Nowadays, almost every computationally intensive problem imaginable is being implemented in parallel, for example simulating river flows, physics and chemistry problems, and astronomical simulations. This thesis discusses high-performance software for scientific or engineering applications. The term parallel programming systems here means libraries, languages, compilers, compiler directives, or other means through which a programmer can express a parallel algorithm. To design high-performance programs, the programmer holds two keys: understanding the problem and proposing a suitable parallel solution, and deciding on the right system for the implementation, which requires good knowledge of the existing parallel programming systems; sometimes several software and hardware systems must even be combined. After understanding the problem, the programmer has to choose between many systems, some of which are closely related while others differ greatly. This thesis describes and classifies existing parallel programming systems, bringing existing surveys up to date, with particular emphasis on algorithmic skeletons and functional parallel programming. It also describes a wiki-based web portal, developed as part of the thesis, for collecting information about the most recent systems. In addition, a special syntax and a visualization tool were developed: combined with the webdot dynamic graph-drawing tool, they let users define their own categorization scheme and view the systems graphically, and the syntax is easy to learn and use. Finally, the thesis compares the two major programming styles, message passing and shared memory, on two different algorithms in order to show the performance differences between these styles. The algorithms are implemented in OpenMP and MPI, and the performance of both programs is measured on the SMP cluster of Aachen University, Germany, and on the Beowulf cluster of Ulakbim, Ankara.
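    The stylistic gap the thesis measures can be sketched compactly. The OCaml 5 program below (assuming the multicore runtime, not OpenMP or MPI) computes the same sum twice: once in shared-memory style, where all workers update one atomic accumulator, and once in message-passing style, where each worker computes a private partial sum that the main program collects.

```ocaml
let n_workers = 4
let chunk = 250_000

(* Shared-memory style: all domains update one shared atomic cell,
   the analogue of an OpenMP atomic update. *)
let shared_sum () =
  let acc = Atomic.make 0 in
  let worker w () =
    for i = (w * chunk) + 1 to (w + 1) * chunk do
      ignore (Atomic.fetch_and_add acc i)
    done
  in
  let ds = List.init n_workers (fun w -> Domain.spawn (worker w)) in
  List.iter Domain.join ds;
  Atomic.get acc

(* Message-passing style: no shared state; each worker computes a
   private partial sum, returned to the collector like an MPI message. *)
let message_sum () =
  let worker w () =
    let s = ref 0 in
    for i = (w * chunk) + 1 to (w + 1) * chunk do s := !s + i done;
    !s
  in
  let ds = List.init n_workers (fun w -> Domain.spawn (worker w)) in
  List.fold_left (fun a d -> a + Domain.join d) 0 ds

let () =
  Printf.printf "shared:  %d\nmessage: %d\n" (shared_sum ()) (message_sum ())
```

    Contention on the single shared cell versus independent partial results is, in miniature, the performance difference the thesis benchmarks on its SMP and Beowulf clusters.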

    Toward optimised skeletons for heterogeneous parallel architecture with performance cost model

    High performance architectures are increasingly heterogeneous with shared and distributed memory components, and accelerators like GPUs. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This thesis explores the potential for algorithmic skeletons integrating a dynamically parametrised static cost model to deliver portable performance for mostly regular data parallel programs on heterogeneous architectures. The first contribution of this thesis is to address the challenges of programming heterogeneous architectures by providing two skeleton-based programming libraries: HWSkel for heterogeneous multicore clusters and GPU-HWSkel, which enables GPUs to be exploited as general purpose multi-processor devices. Both libraries provide heterogeneous data parallel algorithmic skeletons including hMap, hMapAll, hReduce, hMapReduce, and hMapReduceAll. The second contribution is the development of cost models for workload distribution. First, we construct an architectural cost model (CM1) to optimise overall processing time for HWSkel heterogeneous skeletons on a heterogeneous system composed of networks of arbitrary numbers of nodes, each with an arbitrary number of cores sharing arbitrary amounts of memory. The cost model characterises the components of the architecture by the number of cores, clock speed, and crucially the size of the L2 cache. Second, we extend the HWSkel cost model (CM1) to account for GPU performance. The extended cost model (CM2) is used in the GPU-HWSkel library to automatically find a good distribution for both a single heterogeneous multicore/GPU node and clusters of heterogeneous multicore/GPU nodes. Experiments are carried out on three heterogeneous multicore clusters, four heterogeneous multicore/GPU clusters, and three single heterogeneous multicore/GPU nodes. The results of experimental evaluations for four data parallel benchmarks, i.e. sumEuler, Image matching, Fibonacci, and Matrix Multiplication, show that our combined heterogeneous skeletons and cost models can make good use of resources in heterogeneous systems. Moreover, using cores together with a GPU in the same host can deliver good performance on either single-node or multi-node architectures.
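    As a hedged reconstruction of the workload-distribution idea, the OCaml sketch below scores each node with an invented power formula over the three ingredients CM1 is said to use (core count, clock speed, L2 cache size) and splits work proportionally. HWSkel itself is a C library; the formula, field names, and distribute function here are assumptions for illustration, not the actual CM1.

```ocaml
type node = { name : string; cores : int; clock_ghz : float; l2_mb : float }

(* Hypothetical power score combining CM1's three ingredients; the
   weights are made up for the sketch. *)
let power n =
  float_of_int n.cores *. n.clock_ghz *. (1. +. (0.05 *. n.l2_mb))

(* Split [total] work items proportionally to node power; the last
   node absorbs the rounding remainder. *)
let distribute total nodes =
  let tp = List.fold_left (fun a n -> a +. power n) 0. nodes in
  let rec go rem = function
    | [] -> []
    | [ n ] -> [ (n.name, rem) ]
    | n :: rest ->
        let share = int_of_float (float_of_int total *. power n /. tp) in
        (n.name, share) :: go (rem - share) rest
  in
  go total nodes

let () =
  let nodes = [
    { name = "nodeA"; cores = 8; clock_ghz = 3.0; l2_mb = 8. };
    { name = "nodeB"; cores = 4; clock_ghz = 2.4; l2_mb = 4. };
  ] in
  List.iter (fun (nm, c) -> Printf.printf "%s: %d items\n" nm c)
    (distribute 1_000_000 nodes)
```

    Extending such a score with a GPU term is, in spirit, what the step from CM1 to CM2 does.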

    A BSPlib-style API for Bulk Synchronous Parallel ML

    Bulk synchronous parallelism (BSP) offers an abstract and simple model of parallelism, yet allows the communication costs of parallel algorithms to be taken into account realistically. BSP has been used in many application domains. BSPlib and its variants are programming libraries for the C language that support the BSP style. Bulk Synchronous Parallel ML (BSML) is a library for BSP programming in the functional language OCaml. It offers parallel operations on a data structure called the parallel vector. BSML provides a global view of programs, i.e. BSML programs can be seen as sequential programs working on a parallel data structure (seq of par), while a BSPlib program is written in the SPMD style and understood as a parallel composition of communicating sequential programs (par of seq). The communication styles of BSML and BSPlib are also quite different. The contribution of this paper is a BSPlib-style communication API implemented on top of BSML. It has been designed without extending BSML, using only the imperative features of the underlying functional language OCaml. Programs implemented using this API are very close to programs implemented using a BSPlib library for the C language, which shows that BSML is universal for the BSP model.
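    The seq-of-par versus par-of-seq distinction can be made concrete with a self-contained OCaml sketch that simulates a parallel vector with a plain array. mkpar, apply, and proj mirror BSML's documented primitives, but this is an assumption-laden stand-in that runs without the BSML library, not BSML itself.

```ocaml
let bsp_p = 4                        (* number of simulated processors *)

(* One value per processor, simulated as an array. *)
type 'a par = 'a array

let mkpar (f : int -> 'a) : 'a par = Array.init bsp_p f

let apply (fs : ('a -> 'b) par) (xs : 'a par) : 'b par =
  Array.init bsp_p (fun i -> fs.(i) xs.(i))

let proj (v : 'a par) : int -> 'a = fun i -> v.(i)

(* Global view ("seq of par"): one sequential program manipulating the
   parallel vector as a whole. *)
let () =
  let data = mkpar (fun pid -> pid * 10) in
  let doubled = apply (mkpar (fun _ -> fun x -> x * 2)) data in
  let total = ref 0 in
  for i = 0 to bsp_p - 1 do total := !total + proj doubled i done;
  Printf.printf "global-view total: %d\n" !total

(* SPMD view ("par of seq"): the same computation written as the text
   each processor would run, parameterised by its pid. *)
let spmd_body pid = pid * 10 * 2

let () =
  let total = List.fold_left ( + ) 0 (List.init bsp_p spmd_body) in
  Printf.printf "SPMD total:        %d\n" total
```

    The paper's API, in effect, lets the second style be written while the first semantics does the work underneath.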

    A self-mobile skeleton in the presence of external loads

    Multicore clusters provide cost-effective platforms for running CPU-intensive and data-intensive parallel applications. To utilise these platforms effectively, their resources must be shared amongst applications rather than dedicated to a single one. When such computational platforms are shared, user applications compete at runtime for the same resources, so demand is irregular and the load is changeable and unpredictable. This thesis explores a mechanism to exploit shared multicore clusters while taking the external load into account. The mechanism seeks to reduce runtime by finding the best computing locations to serve the running computations. We propose a generic algorithmic data-parallel skeleton that is aware of its computations and of the load state of the computing environment. The skeleton is structured using the Master/Worker pattern, with the master and workers distributed over the nodes of the cluster. It divides the problem into computations that are all initiated by the master and coordinated by the distributed workers. Moreover, the skeleton has built-in mobility to implicitly move the parallel computations between two workers; this is data mobility controlled by the application, i.e. the skeleton itself. The skeleton is not problem-specific and can therefore execute different kinds of problems. Our experiments suggest that it can efficiently compensate for unpredictable load variations. We also propose a performance cost model that estimates the continuation time of the running computations both locally and remotely. The model takes the network delay, the data size, and the load state as inputs to estimate the transfer time of a potential movement. Our experiments demonstrate that this model makes accurate decisions under different load patterns, reducing the total execution time. The model is problem-independent because it considers the progress of all current computations; it is based on measurements, so it does not depend on the programming language; and it takes into account the load state of the nodes on which the computations run, including their characteristics, making it architecture-independent. Because scheduling has a direct impact on system performance, we support the skeleton with a cost-informed scheduler that uses a hybrid scheduling policy to improve the skeleton's dynamicity and adaptivity. The scheduler has agents distributed over the participating workers to keep the load information up to date, trigger the estimations, and facilitate the mobility operations. At runtime, the skeleton co-schedules its computations over the computational resources without interfering with the native operating system scheduler. We demonstrate that with this hybrid approach the system makes mobility decisions that improve performance and scalability over a large number of computational resources. Our experiments suggest that the skeleton's adaptivity in a shared environment improves performance and reduces resource contention on heavily loaded nodes, allowing other applications to acquire more resources. Finally, our experiments show that the load scheduler incurs a low overhead, not exceeding 0.6% of the total execution time.
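    The mobility decision described above reduces to a comparison: move a computation when its estimated remote continuation time plus the transfer time beats its local continuation time. The OCaml sketch below encodes that rule; the record fields and formulas are illustrative assumptions, not the thesis's actual cost model.

```ocaml
type worker = {
  load : float;   (* external load: 0.0 = idle, approaching 1.0 = saturated *)
  speed : float;  (* work units per second when idle *)
}

(* Estimated seconds to finish [remaining] work units on [w]; a fully
   loaded worker yields infinity, i.e. never a destination. *)
let continuation remaining w =
  remaining /. (w.speed *. (1. -. w.load))

(* Transfer estimate from network delay plus data size over bandwidth. *)
let transfer ~delay_s ~data_mb ~mb_per_s = delay_s +. (data_mb /. mb_per_s)

(* The mobility rule: move only if the remote finish time, including
   the cost of shipping the data, beats staying put. *)
let should_move ~remaining ~local ~remote ~delay_s ~data_mb ~mb_per_s =
  continuation remaining remote +. transfer ~delay_s ~data_mb ~mb_per_s
  < continuation remaining local

let () =
  let local = { load = 0.9; speed = 100. } in   (* heavily loaded node *)
  let remote = { load = 0.1; speed = 100. } in  (* mostly idle node *)
  let move =
    should_move ~remaining:5000. ~local ~remote
      ~delay_s:0.05 ~data_mb:64. ~mb_per_s:100.
  in
  Printf.printf "move computation: %b\n" move
```

    Because both sides of the comparison come from runtime measurements rather than static analysis, the rule stays problem- and architecture-independent, as the abstract claims for the real model.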

    Anales del XIII Congreso Argentino de Ciencias de la Computación (CACIC)

    Contents: computer architectures; embedded systems; service-oriented architectures (SOA); communication networks; heterogeneous networks; advanced networks; wireless networks; mobile networks; active networks; administration and monitoring of networks and services; quality of service (QoS, SLAs); information security, authentication, and privacy; infrastructure for digital signatures and digital certificates; vulnerability analysis and detection; operating systems; P2P systems; middleware; grid infrastructure; integration services (Web Services or .Net). Red de Universidades con Carreras en Informática (RedUNCI).
