12 research outputs found

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming increasingly crucial. The unprecedented amount of data available in the digital era, combined with recent advances in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is crucial for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications that allows topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level, user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each contribution is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executed on large-scale mixed cloud-HPC infrastructures.
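    As a rough illustration of the separation of concerns described in this abstract, the following sketch is purely hypothetical (the class names, locations, and greedy placement policy are assumptions for illustration, not the dissertation's actual implementation): workflow steps declare only their data dependencies, while a topology-aware scheduler decides where each step runs based on where its inputs already reside.

# Hypothetical sketch: steps keep business logic and data dependencies separate
# from execution environments; a toy scheduler places each step on the location
# that already holds most of its input data (topology-aware placement).
from dataclasses import dataclass, field

@dataclass
class Location:
    name: str                      # e.g. a cloud cluster or an HPC facility
    datasets: set = field(default_factory=set)

@dataclass
class Step:
    name: str
    inputs: list                   # names of datasets this step consumes
    outputs: list                  # names of datasets this step produces

def place(step, locations):
    """Pick the location that already holds the most input data for a step."""
    return max(locations, key=lambda loc: len(loc.datasets & set(step.inputs)))

cloud = Location("cloud-k8s", {"raw_images"})
hpc = Location("hpc-cluster", {"reference_db"})

workflow = [
    Step("preprocess", inputs=["raw_images"], outputs=["tensors"]),
    Step("train", inputs=["tensors", "reference_db"], outputs=["model"]),
]

for step in workflow:
    target = place(step, [cloud, hpc])
    target.datasets.update(step.outputs)   # outputs stay where they were produced
    print(f"{step.name} -> {target.name}")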

    Scalable storage systems in eXascale environments

    Large-scale scientific computing is extremely demanding and therefore requires vast amounts of computational power. Parallel computing and parallel file systems are recognised as the only feasible solution to problems of this kind, while input/output (I/O) operations constitute the most significant bottleneck for application performance. The main factors affecting I/O performance are the number of parallel processes participating in data transfers, the size of each transfer, and the various I/O access patterns. Shared file systems face significant limitations when applied to large-scale systems, both because their bandwidth does not scale economically and because I/O traffic on the network infrastructure and the storage nodes can be affected by other, unrelated processes and applications. To address these limitations, the IKAROS framework was developed as a mechanism that allows the I/O architecture to be coordinated dynamically using specific input parameters. IKAROS provides coordinated parallel data transfers across the overall flow (local and remote access), reducing contention for resources between storage and network media. It dynamically creates dedicated or semi-dedicated groups of storage devices per process, improving I/O performance by 33% while using one third of the available hard disks.
    High performance computing (HPC) has crossed the Petaflop mark and is reaching the Exaflop range quickly. The exascale system is projected to have millions of nodes, with thousands of cores for each node. At such an extreme scale, the substantial amount of concurrency can cause a critical contention issue for the I/O system. This study proposes a dynamically coordinated I/O architecture for addressing some of the limitations that current parallel file systems and storage architectures face on very large-scale systems. The fundamental idea is to coordinate I/O accesses according to the topology/profile of the infrastructure, the load metrics, and the I/O demands of each application. The measurements have shown that, by using the IKAROS approach, we can fully utilize the provided I/O and network resources, minimize disk and network contention, and achieve better performance.
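    The coordination idea can be pictured with a small, purely illustrative sketch (the node names, bandwidth figure, and greedy policy below are assumptions for illustration, not the IKAROS implementation): a coordinator assigns each application a semi-dedicated subset of storage nodes chosen by current load, so that concurrent I/O streams contend less for the same disks and network links.

# Illustrative sketch of dynamically coordinated I/O: pick a semi-dedicated
# subset of storage nodes per application, based on a simple load profile.
import heapq

def assign_storage(app_demand_mb_s, nodes, per_node_mb_s=100):
    """Greedily pick the least-loaded nodes until the demand is covered.
    `nodes` maps node name -> current load in MB/s (hypothetical metric)."""
    needed = max(1, -(-app_demand_mb_s // per_node_mb_s))  # ceiling division
    chosen = heapq.nsmallest(needed, nodes, key=nodes.get)
    for n in chosen:
        nodes[n] += app_demand_mb_s / needed   # update the load profile
    return chosen

storage_load = {"osd1": 20, "osd2": 80, "osd3": 10, "osd4": 50}
print(assign_storage(250, storage_load))       # e.g. ['osd3', 'osd1', 'osd4']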

    Applications Development for the Computational Grid


    Generic Metadata Handling in Scientific Data Life Cycles

    Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they serve become more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories with data sources, data and metadata management, computing and workflow management, security, data sinks, and methods for enabling utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable seamless use of resources with respect to usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort. Metadata management is essential for letting scientists save time by avoiding the need to keep track of data manually, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support only highly specific metadata management, or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management: automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
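    The generic idea can be sketched in a few lines (the file format, metadata keys, and parser below are illustrative assumptions, not the MoSGrid implementation): metadata is extracted automatically from result files, stored as key-value annotations, and indexed so that data can later be found by content rather than by file path.

# Hypothetical sketch: automated metadata extraction, annotation, and indexing,
# plus a simple search over the resulting index.
import os, re, tempfile
from collections import defaultdict

def extract_metadata(path):
    """Naive extractor: pull 'key = value' pairs out of a text result file."""
    meta = {"filename": os.path.basename(path)}
    with open(path) as f:
        for line in f:
            m = re.match(r"\s*(\w+)\s*=\s*(\S+)", line)
            if m:
                meta[m.group(1).lower()] = m.group(2)
    return meta

index = defaultdict(set)              # (key, value) -> set of file paths

def annotate(path):
    """Extract metadata and register the file in the search index."""
    for key, value in extract_metadata(path).items():
        index[(key, value)].add(path)

def search(**criteria):
    """Return the files matching all given key=value criteria."""
    hits = [index.get((k, str(v)), set()) for k, v in criteria.items()]
    return set.intersection(*hits) if hits else set()

# Toy usage: index one generated result file and query it back by content.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("method = DFT\nbasis = 6-31G\n")
annotate(f.name)
print(search(method="DFT"))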

    A grid and cloud-based framework for high throughput bioinformatics

    Recent advances in genome sequencing technologies have unleashed a flood of new data. As a result, the computational analysis of bioinformatics data sets has been rapidly moving from a lab-based desktop computer environment to exhaustive analyses performed on large dedicated computing resources. Traditionally, large computational problems have been performed on dedicated clusters of high-performance machines that are typically local to, and owned by, a particular institution. The current trend in Grid computing has seen institutions pooling their computational resources in order to offload excess computational work to remote locations during busy periods. In the last year or so, commercial Cloud computing initiatives have matured enough to offer a viable remote source of reliable computational power. Collections of idle desktop computers have also been used as a source of computational power in the form of ‘volunteer Grids’. The field of bioinformatics is highly dynamic, with new or updated versions of software tools and databases continually being developed. Several different tools and datasets must often be combined into a coherent, automated workflow or pipeline. While existing solutions are available for constructing workflows, there is a clear need for long-lived analyses consisting of many interconnected steps to be able to migrate among Grid and cloud computational resources dynamically. This project involved research into the principles underlying the design and architecture of flexible, high-throughput bioinformatics processes. Following extensive requirements gathering, a novel Grid-based platform, Microbase, has been implemented, based on service-oriented architectures and peer-to-peer data transfer technology. This platform has been shown to be able to utilise a wide range of hardware, from commodity desktop computers to high-performance cloud infrastructure. The system has been shown to drastically reduce the bandwidth requirements of bioinformatics data distribution, and therefore reduces both the financial and computational costs associated with cloud computing. The system is inherently modular, comprising a service-based notification system, a data storage system, a scheduler, and a job manager. In keeping with e-Science principles, each module can operate in physical isolation from the others, distributed across an intranet or the Internet. Moreover, since each module is loosely coupled via Web services, modules have the potential to be used in combination with external service-oriented components or in isolation as part of another system. In order to demonstrate the utility of such an open source system to the bioinformatics community, a pipeline of interconnected bioinformatics applications was developed using the Microbase system to form a high-throughput application for the comparative and visual analysis of microbial genomes. This application, the Automated Genome Analyser (AGA), has been developed to operate without user interaction. AGA exposes its results via Web services, which can be consumed by further analytical stages within Microbase or by external computational resources via a Web service interface, or queried by users through an interactive genome browser. In addition to providing the necessary infrastructure for scalable Grid applications, a modular development framework has been provided, which simplifies the process of writing Grid applications. Microbase has been adopted by a number of projects ranging from comparative genomics to synthetic biology simulations.
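    The loose coupling described above can be illustrated with a toy, in-process sketch (the topic names and the broker are assumptions for illustration; Microbase itself uses Web services and peer-to-peer transfer rather than this toy queue): pipeline stages communicate only through notification messages, so each module can run on a desktop, a Grid node, or a cloud VM without direct knowledge of the others.

# Illustrative sketch of notification-driven pipeline stages.
from collections import defaultdict

class Broker:
    """Minimal publish/subscribe hub standing in for a notification service."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)
    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()

def annotate_genome(msg):
    # Placeholder analysis step; a real module would launch compute jobs here.
    broker.publish("genome.annotated", {"genome": msg["genome"], "genes": ["geneA", "geneB"]})

def compare_genomes(msg):
    print("comparing", msg["genome"], "against the reference set")

broker.subscribe("genome.sequenced", annotate_genome)
broker.subscribe("genome.annotated", compare_genomes)
broker.publish("genome.sequenced", {"genome": "E. coli K-12"})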

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and the results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    International VLBI Service for Geodesy and Astrometry 2011 Annual Report

    This volume of reports is the 2011 Annual Report of the International VLBI Service for Geodesy and Astrometry (IVS). The individual reports were contributed by VLBI groups in the international geodetic and astrometric community who constitute the components of IVS. The 2011 Annual Report documents the work of these IVS components over the period January 1, 2011 through December 31, 2011. The reports document changes, activities, and progress of the IVS. The entire contents of this Annual Report also appear on the IVS Web site at http://ivscc.gsfc.nasa.gov/publications/ar2011.

    International VLBI Service for Geodesy and Astrometry 2012 Annual Report

    This volume of reports is the 2012 Annual Report of the International VLBI Service for Geodesy and Astrometry (IVS). The individual reports were contributed by VLBI groups in the international geodetic and astrometric community who constitute the permanent components of IVS. The IVS 2012 Annual Report documents the work of the IVS components for the calendar year 2012, our fourteenth year of existence. The reports describe changes, activities, and progress of the IVS. Many thanks to all IVS components who contributed to this Annual Report. With the exception of the first section and parts of the last section (described below), the contents of this Annual Report also appear on the IVS Web site at http://ivscc.gsfc.nasa.gov/publications/ar2012.

    GSI Scientific Report 2009 [GSI Report 2010-1]

    The displacement design response spectrum is an essential component of the currently-developing displacement-based seismic design and assessment procedures. This paper proposes a new and simple method for constructing displacement design response spectra on soft soil sites. The method takes into account modifications of the seismic waves by the soil layers, giving due consideration to factors such as the level of bedrock shaking, material non-linearity, the seismic impedance contrast at the interface between soil and bedrock, and the plasticity of the soil layers. The model is particularly suited to applications in regions with a paucity of recorded strong ground motion data, from which empirical models cannot be reliably developed.