179 research outputs found

    Fuzzy logic based energy and throughput aware design space exploration for MPSoCs

    Multicore architectures were introduced to mitigate the growth of power dissipation with clock frequency. In single-core systems, deeper pipelines, speculative threading and similar techniques brought little performance gain relative to their power overhead. For multicore architectures, however, scaling performance with the number of cores has always been a challenge. Amdahl's law shows that the theoretical maximum speedup of a multicore architecture can fall far short of the number of cores. When only a small share of the code runs in parallel, adding cores may simply increase power dissipation without bringing a performance advantage. There is therefore a need for an adaptive multicore architecture that can be tailored to the application in use for higher energy efficiency. This paper presents a fuzzy-logic-based design space exploration technique that optimizes a multicore architecture according to the workload requirements in order to achieve an optimum balance between the throughput and the energy of the system.
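
    The Amdahl's law argument in this abstract can be made concrete with a few lines of Python. This is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction of the work and n the number of cores. The function name `amdahl_speedup` and the sample values are illustrative assumptions, not taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of cores the parallel share is spread over (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # With half of the code parallel, the speedup can never reach 2,
    # no matter how many cores are added.
    for cores in (1, 2, 4, 8, 16, 64):
        print(cores, round(amdahl_speedup(0.5, cores), 3))
```

    With a parallel fraction of 0.5, the speedup stays below 2 however many cores are added, which is the situation the abstract describes: extra cores add power dissipation without a matching performance gain.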

    Efficient CFD code implementation for the ARM-based Mont-Blanc architecture

    Since 2011, the European project Mont-Blanc has focused on enabling ARM-based technology for HPC, developing both hardware platforms and system software. The latest Mont-Blanc prototypes use system-on-chip (SoC) devices that combine a CPU and a GPU sharing a common main memory. Exploiting the potential of such a heterogeneous architecture efficiently requires dedicated parallel software and well-suited implementation approaches. This paper is devoted to the optimizations carried out in the TermoFluids CFD code to run it efficiently on the Mont-Blanc system. The underlying numerical method is based on an unstructured finite-volume discretization of the Navier–Stokes equations for the numerical simulation of incompressible turbulent flows. It is implemented using a portable and modular operational approach based on a minimal set of linear algebra operations. An architecture-specific heterogeneous multilevel MPI+OpenMP+OpenCL implementation of these kernels is proposed. It includes optimizations of the storage formats, dynamic load balancing between the CPU and GPU devices, and hiding of communication overheads by overlapping computations and data transfers. A detailed performance study shows the reduction in kernel execution time obtained with the new heterogeneous implementation, its scalability on up to 128 Mont-Blanc nodes, and the energy savings achieved with the Mont-Blanc system versus the high-end hybrid supercomputer MinoTauro.
    The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007–2013] and Horizon 2020 under the Mont-Blanc Project (www.montblanc-project.eu), grant agreements no. 288777, 610402 and 671697. The work has been financially supported by the Ministerio de Ciencia e Innovación, Spain (ENE-2014-60577-R), the Russian Science Foundation, project 15-11-30039, CONICYT Becas Chile Doctorado 2012, the Juan de la Cierva postdoctoral grant (IJCI-2014-21034), and the Initial Training Network SEDITRANS (GA number: 607394), implemented within the 7th Framework Programme of the European Commission under call FP7-PEOPLE-2013-ITN. Our calculations have been performed on the resources of the Barcelona Supercomputing Center. The authors thankfully acknowledge these institutions.
    Peer Reviewed. Postprint (published version).

    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein-folding problem has been studied extensively over the last fifty years. Understanding the dynamics of a protein's global shape and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Researchers have developed various computational approaches to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes it necessary to search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. This review presents past and recent trends in protein folding simulations from both the hardware and the software perspective. Of particular interest are the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used to run these Soft Computing techniques.
    This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grants TEC2012-37945-C02-02 and TIN2012-31345, and by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.
    Engineering, Industry and Construction

    METADOCK: A parallel metaheuristic schema for virtual screening methods

    Virtual screening through molecular docking can be cast as an optimization problem, which can be tackled with metaheuristic methods. The interaction between two chemical compounds (typically a protein, enzyme or receptor, and a small molecule, or ligand) is calculated with computationally demanding scoring functions that are evaluated at several binding spots located throughout the protein surface. This paper introduces METADOCK, a novel molecular docking methodology based on parameterized and parallel metaheuristics and designed to leverage computers with heterogeneous architectures. The application selects the optimization technique at run time from a configuration schema. The proposed solution achieves a good workload balance by dynamically assigning jobs to heterogeneous resources, which perform independent metaheuristic executions when computing the different molecular interactions required by the scoring functions in use. Cooperative scheduling of jobs optimizes both the quality of the solution and the overall performance of the simulation, opening a new path for further development of virtual screening methods on contemporary high-performance heterogeneous platforms.
    Engineering, Industry and Construction

    Constraint programming on a heterogeneous multicore architecture

    Constraint programming libraries are useful when building applications in mainstream programming languages: they do not require developers to learn a new language, providing instead declarative programming tools for use within conventional systems. Some approaches to constraint programming favour completeness, such as propagation-based systems. Others aim to reach a good solution quickly, regardless of whether all solutions can be found; this is the approach used in local search systems. Designing hybrid approaches (propagation + local search) seems promising, since the advantages of both can be combined in a single approach. Parallel architectures are becoming more commonplace, partly due to the large-scale availability of individual systems but also because of the trend towards generalizing the use of multicore microprocessors. This thesis proposes an architecture for mixed constraint solvers, relying on both propagation and local search, that is designed to function effectively on a heterogeneous multicore architecture.

    Cross-Layer Rapid Prototyping and Synthesis of Application-Specific and Reconfigurable Many-accelerator Platforms

    Technological advances of recent years have laid the foundations for consolidating the informatisation of society, with impact on its economic, political, cultural and social dimensions. At the peak of this development, more and more everyday devices are now connected to the web, giving rise to the term "Internet of Things". The future holds the full connection and interaction of IT and communication systems with the natural world, marking the transition to cyber-physical systems and offering meta-services in the physical world, such as personalized medical care, autonomous transportation and smart energy cities. To meet the needs of this dynamically evolving market, computer engineers are required to implement computing platforms that both incorporate increased systemic complexity and cover a wide range of meta-characteristics, such as cost and design time, reliability and reuse, which are prescribed by a conflicting set of functional, technical and construction constraints. This thesis aims to address these design challenges by developing methodologies and hardware/software co-design tools that enable the rapid implementation and efficient synthesis of architectural solutions that specify the operating meta-features required by the modern market. Specifically, the thesis presents a) methodologies to accelerate the design flow for both reconfigurable and application-specific architectures, b) coarse-grain heterogeneous architectural templates for processing and communication acceleration and c) efficient multi-objective synthesis techniques at both the high abstraction level of programming and the physical silicon level.
    Regarding the acceleration of the design flow, the proposed methodology employs virtual platforms in order to hide architectural details and drastically reduce simulation time. An extension of this framework introduces systemic co-simulation, using reconfigurable acceleration platforms as intermediate co-emulation platforms. The development cycle of a hardware/software product is thus accelerated by moving from a vertical serial flow to a circular interactive loop. Moreover, the simulation capabilities are enriched with efficient techniques for detecting and correcting design errors, as well as methods for checking the system's performance metrics against the desired specifications during all phases of system development. Orthogonally to this methodological framework, a new architectural template is proposed that aims to bridge the gap between design complexity and technological productivity by using specialized hardware accelerators in heterogeneous system-on-chip and network-on-chip platforms. A novel co-design methodology is presented for the hardware accelerators and their respective programming software, including the allocation of tasks to the available resources of the system/network. The framework provides implementation techniques for the accelerators that use either conventional programming flows with a hardware description language or abstract programming model flows based on high-level synthesis. In either case, the designer has the option of optimizing systemic metrics such as processing speed, throughput, reliability, power consumption and silicon area. Finally, to address the increased complexity of design tools for reconfigurable systems, novel multi-objective evolutionary optimization algorithms are proposed. They exploit modern multicore processors and the coarse-grain nature of multithreaded programming environments (e.g. OpenMP) to reduce placement time, while grouping applications by their intrinsic characteristics so as to explore the design space more effectively.
    The efficiency of the proposed architectural templates, design tools and methodology flows is evaluated against existing state-of-the-art solutions with applications from typical computing domains, such as digital signal processing, multimedia and arithmetically complex problems, as well as from systemic heterogeneous environments, such as a computer vision system for autonomous robotic space navigation and many-accelerator systems for HPC and workstations/datacenters. The results strengthen the author's belief that this thesis provides competitive expertise to address complex modern, and projected future, design challenges.

    Exploring Task Mappings on Heterogeneous MPSoCs using a Bias-Elitist Genetic Algorithm

    Exploration of task mappings plays a crucial role in achieving high performance on heterogeneous multi-processor system-on-chip (MPSoC) platforms. The problem of optimally mapping a set of tasks onto a given set of heterogeneous processors for maximal throughput is known, in general, to be NP-complete. The problem is further exacerbated when multiple applications (i.e., bigger task sets) and the communication between tasks are also considered. Previous research has shown that genetic algorithms (GAs) are typically a good choice for this problem when the solution space is relatively small. However, as the problem space grows, classic genetic algorithms suffer from long evolution times. To address this problem, this paper proposes a novel bias-elitist genetic algorithm that is guided by domain-specific heuristics to speed up the evolution process. Experimental results reveal that our proposed algorithm is able to handle large-scale task mapping problems and produces high-quality mapping solutions in a short time.
    Comment: 9 pages, 11 figures, uses algorithm2e.sty
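
    To illustrate the kind of search this abstract refers to, the sketch below runs a plain elitist genetic algorithm over task-to-processor mappings. It is not the paper's bias-elitist algorithm, whose heuristics the abstract does not describe. The cost matrix, the makespan objective (used here as a simple stand-in for throughput), the absence of communication costs, and all names and parameter values are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def makespan(mapping: Sequence[int], cost: Sequence[Sequence[float]], n_procs: int) -> float:
    """Finish time of the busiest processor; cost[t][p] is task t's time on processor p."""
    load = [0.0] * n_procs
    for task, proc in enumerate(mapping):
        load[proc] += cost[task][proc]
    return max(load)


def elitist_ga(cost: Sequence[Sequence[float]], n_procs: int, pop_size: int = 40,
               generations: int = 60, elite: int = 2, mutation_rate: float = 0.1,
               seed: int = 0) -> Tuple[List[int], List[float]]:
    """Return the best mapping found and the best makespan recorded per generation."""
    rng = random.Random(seed)
    n_tasks = len(cost)

    def fitness(m: Sequence[int]) -> float:
        return makespan(m, cost, n_procs)

    # Random initial population: each individual assigns every task to a processor.
    population = [[rng.randrange(n_procs) for _ in range(n_tasks)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        # Elitism: the best individuals survive unchanged into the next generation.
        next_gen = [m[:] for m in population[:elite]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(population, 3), key=fitness)
            b = min(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by per-gene mutation.
            cut = rng.randrange(1, n_tasks) if n_tasks > 1 else 0
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n_procs) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

    Because the elite individuals are copied unchanged, the best makespan recorded in `history` never gets worse from one generation to the next. A heuristic-guided variant would typically bias the initial population or the mutation step, but how the paper does so is not stated here.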

    A Three-Level Parallelisation Scheme and Application to the Nelder-Mead Algorithm

    We consider a three-level parallelisation scheme. The second and third levels define a classical two-level parallelisation scheme, in which a load balancing algorithm distributes tasks among processes. It is well known that, for many applications, the efficiency of parallel algorithms on the second and third levels starts to drop once a critical parallelisation degree is reached. This weakness of the two-level template is addressed by introducing one additional parallelisation level. On this level, new or modified algorithms are considered as alternatives to the basic solver. The idea of the proposed methodology is to increase the parallelisation degree by using algorithms that are less efficient than the basic solver. As an example, we investigate two modified Nelder-Mead methods. For the selected application, a few partial differential equations are solved numerically on the second level, and on the third level Wang's parallel algorithm is used to solve systems of linear equations with tridiagonal matrices. A greedy workload balancing heuristic is proposed that targets the case of a large number of available processors. The complexity estimates of the computational tasks are model-based, i.e. they use empirical computational data.
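
    The abstract does not spell out its greedy workload balancing heuristic, so the sketch below shows a classical one instead: longest-processing-time-first, which assigns each task, in order of decreasing estimated cost, to the currently least-loaded worker. The function name `greedy_balance`, the task names and the cost values are illustrative assumptions. In the setting described above, the costs would come from the model-based complexity estimates.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_balance(task_costs: Dict[str, float], n_workers: int) -> Tuple[List[List[str]], List[float]]:
    """Assign tasks to workers, largest estimated cost first, always to the lightest worker.

    Returns the task names per worker and the total estimated load per worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap of (current load, worker index): the lightest worker is always on top.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_workers)]
    # Sort by decreasing cost; ties are broken by name to keep the result deterministic.
    for name, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + cost, worker))
    loads = [0.0] * n_workers
    for load, worker in heap:
        loads[worker] = load
    return assignment, loads


if __name__ == "__main__":
    estimated = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}  # hypothetical cost estimates
    print(greedy_balance(estimated, 2))
```

    Placing the most expensive tasks first leaves the cheap ones to even out the remaining imbalance, which is why this ordering usually beats assigning tasks in arbitrary order.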