12 research outputs found

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming increasingly crucial. The unprecedented amount of data available in the digital era, combined with recent advances in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is crucial for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications that allows topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level, user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each contribution is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executed on large-scale mixed cloud-HPC infrastructures.
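    As a rough illustration of the separation of concerns described in this abstract, the following sketch is purely hypothetical (the class names, locations, and greedy placement policy are assumptions for illustration, not the dissertation's actual implementation): workflow steps declare only their data dependencies, while a topology-aware scheduler decides where each step runs based on where its inputs already reside.

# Hypothetical sketch: steps keep business logic and data dependencies separate
# from execution environments; a toy scheduler places each step on the location
# that already holds most of its input data (topology-aware placement).
from dataclasses import dataclass, field

@dataclass
class Location:
    name: str                      # e.g. a cloud cluster or an HPC facility
    datasets: set = field(default_factory=set)

@dataclass
class Step:
    name: str
    inputs: list                   # names of datasets this step consumes
    outputs: list                  # names of datasets this step produces

def place(step, locations):
    """Pick the location that already holds the most input data for a step."""
    return max(locations, key=lambda loc: len(loc.datasets & set(step.inputs)))

cloud = Location("cloud-k8s", {"raw_images"})
hpc = Location("hpc-cluster", {"reference_db"})

workflow = [
    Step("preprocess", inputs=["raw_images"], outputs=["tensors"]),
    Step("train", inputs=["tensors", "reference_db"], outputs=["model"]),
]

for step in workflow:
    target = place(step, [cloud, hpc])
    target.datasets.update(step.outputs)   # outputs stay where they were produced
    print(f"{step.name} -> {target.name}")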

    Scalable storage systems in eXascale environments

    Large-scale scientific computing is extremely demanding and therefore requires vast amounts of computational power. Parallel computing and parallel file systems are recognised as the only feasible solution to problems of this kind, while input/output (I/O) operations constitute the most significant bottleneck for application performance. The main factors affecting I/O performance are the number of parallel processes participating in data transfers, the size of each transfer, and the various I/O access patterns. Shared file systems face significant limitations when applied to large-scale systems, both because their bandwidth does not scale economically and because I/O traffic on the network infrastructure and the storage nodes can be affected by other, unrelated processes and applications. To address these limitations, the IKAROS framework was developed as a mechanism that allows the I/O architecture to be coordinated dynamically using specific input parameters. IKAROS provides coordinated parallel data transfers across the overall flow (local and remote access), reducing contention for resources between storage and network media. It dynamically creates dedicated or semi-dedicated groups of storage devices per process, improving I/O performance by 33% while using one third of the available hard disks.
    High performance computing (HPC) has crossed the Petaflop mark and is reaching the Exaflop range quickly. The exascale system is projected to have millions of nodes, with thousands of cores for each node. At such an extreme scale, the substantial amount of concurrency can cause a critical contention issue for the I/O system. This study proposes a dynamically coordinated I/O architecture for addressing some of the limitations that current parallel file systems and storage architectures face on very large-scale systems. The fundamental idea is to coordinate I/O accesses according to the topology/profile of the infrastructure, the load metrics, and the I/O demands of each application. The measurements have shown that, by using the IKAROS approach, we can fully utilize the provided I/O and network resources, minimize disk and network contention, and achieve better performance.
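    The coordination idea can be pictured with a small, purely illustrative sketch (the node names, bandwidth figure, and greedy policy below are assumptions for illustration, not the IKAROS implementation): a coordinator assigns each application a semi-dedicated subset of storage nodes chosen by current load, so that concurrent I/O streams contend less for the same disks and network links.

# Illustrative sketch of dynamically coordinated I/O: pick a semi-dedicated
# subset of storage nodes per application, based on a simple load profile.
import heapq

def assign_storage(app_demand_mb_s, nodes, per_node_mb_s=100):
    """Greedily pick the least-loaded nodes until the demand is covered.
    `nodes` maps node name -> current load in MB/s (hypothetical metric)."""
    needed = max(1, -(-app_demand_mb_s // per_node_mb_s))  # ceiling division
    chosen = heapq.nsmallest(needed, nodes, key=nodes.get)
    for n in chosen:
        nodes[n] += app_demand_mb_s / needed   # update the load profile
    return chosen

storage_load = {"osd1": 20, "osd2": 80, "osd3": 10, "osd4": 50}
print(assign_storage(250, storage_load))       # e.g. ['osd3', 'osd1', 'osd4']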

    Applications Development for the Computational Grid


    Generic Metadata Handling in Scientific Data Life Cycles

    Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they serve become more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories with data sources, data and metadata management, computing and workflow management, security, data sinks, and methods for enabling utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable seamless use of resources with respect to usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort. Metadata management is essential for letting scientists save time by avoiding the need to keep track of data manually, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support only highly specific metadata management, or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management: automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
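    The generic idea can be sketched in a few lines (the file format, metadata keys, and parser below are illustrative assumptions, not the MoSGrid implementation): metadata is extracted automatically from result files, stored as key-value annotations, and indexed so that data can later be found by content rather than by file path.

# Hypothetical sketch: automated metadata extraction, annotation, and indexing,
# plus a simple search over the resulting index.
import os, re, tempfile
from collections import defaultdict

def extract_metadata(path):
    """Naive extractor: pull 'key = value' pairs out of a text result file."""
    meta = {"filename": os.path.basename(path)}
    with open(path) as f:
        for line in f:
            m = re.match(r"\s*(\w+)\s*=\s*(\S+)", line)
            if m:
                meta[m.group(1).lower()] = m.group(2)
    return meta

index = defaultdict(set)              # (key, value) -> set of file paths

def annotate(path):
    """Extract metadata and register the file in the search index."""
    for key, value in extract_metadata(path).items():
        index[(key, value)].add(path)

def search(**criteria):
    """Return the files matching all given key=value criteria."""
    hits = [index.get((k, str(v)), set()) for k, v in criteria.items()]
    return set.intersection(*hits) if hits else set()

# Toy usage: index one generated result file and query it back by content.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("method = DFT\nbasis = 6-31G\n")
annotate(f.name)
print(search(method="DFT"))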

    A grid and cloud-based framework for high throughput bioinformatics

    Recent advances in genome sequencing technologies have unleashed a flood of new data. As a result, the computational analysis of bioinformatics data sets has been rapidly moving from a lab-based desktop computer environment to exhaustive analyses performed on large dedicated computing resources. Traditionally, large computational problems have been performed on dedicated clusters of high-performance machines that are typically local to, and owned by, a particular institution. The current trend in Grid computing has seen institutions pooling their computational resources in order to offload excess computational work to remote locations during busy periods. In the last year or so, commercial Cloud computing initiatives have matured enough to offer a viable remote source of reliable computational power. Collections of idle desktop computers have also been used as a source of computational power in the form of ‘volunteer Grids’. The field of bioinformatics is highly dynamic, with new or updated versions of software tools and databases continually being developed. Several different tools and datasets must often be combined into a coherent, automated workflow or pipeline. While existing solutions are available for constructing workflows, there is a clear need for long-lived analyses consisting of many interconnected steps to be able to migrate among Grid and cloud computational resources dynamically. This project involved research into the principles underlying the design and architecture of flexible, high-throughput bioinformatics processes. Following extensive requirements gathering, a novel Grid-based platform, Microbase, has been implemented, based on service-oriented architectures and peer-to-peer data transfer technology. This platform has been shown to be able to utilise a wide range of hardware, from commodity desktop computers to high-performance cloud infrastructure. The system has been shown to drastically reduce the bandwidth requirements of bioinformatics data distribution, and therefore reduces both the financial and computational costs associated with cloud computing. The system is inherently modular, comprising a service-based notification system, a data storage system, a scheduler, and a job manager. In keeping with e-Science principles, each module can operate in physical isolation from the others, distributed across an intranet or the Internet. Moreover, since each module is loosely coupled via Web services, modules have the potential to be used in combination with external service-oriented components or in isolation as part of another system. In order to demonstrate the utility of such an open source system to the bioinformatics community, a pipeline of interconnected bioinformatics applications was developed using the Microbase system to form a high-throughput application for the comparative and visual analysis of microbial genomes. This application, the Automated Genome Analyser (AGA), has been developed to operate without user interaction. AGA exposes its results via Web services, which can be consumed by further analytical stages within Microbase or by external computational resources via a Web service interface, or queried by users through an interactive genome browser. In addition to providing the necessary infrastructure for scalable Grid applications, a modular development framework has been provided, which simplifies the process of writing Grid applications. Microbase has been adopted by a number of projects ranging from comparative genomics to synthetic biology simulations.
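    The loose coupling described above can be illustrated with a toy, in-process sketch (the topic names and the broker are assumptions for illustration; Microbase itself uses Web services and peer-to-peer transfer rather than this toy queue): pipeline stages communicate only through notification messages, so each module can run on a desktop, a Grid node, or a cloud VM without direct knowledge of the others.

# Illustrative sketch of notification-driven pipeline stages.
from collections import defaultdict

class Broker:
    """Minimal publish/subscribe hub standing in for a notification service."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)
    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()

def annotate_genome(msg):
    # Placeholder analysis step; a real module would launch compute jobs here.
    broker.publish("genome.annotated", {"genome": msg["genome"], "genes": ["geneA", "geneB"]})

def compare_genomes(msg):
    print("comparing", msg["genome"], "against the reference set")

broker.subscribe("genome.sequenced", annotate_genome)
broker.subscribe("genome.annotated", compare_genomes)
broker.publish("genome.sequenced", {"genome": "E. coli K-12"})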

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and the results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    International VLBI Service for Geodesy and Astrometry 2011 Annual Report

    This volume of reports is the 2011 Annual Report of the International VLBI Service for Geodesy and Astrometry (IVS). The individual reports were contributed by VLBI groups in the international geodetic and astrometric community who constitute the components of IVS. The 2011 Annual Report documents the work of these IVS components over the period January 1, 2011 through December 31, 2011. The reports document changes, activities, and progress of the IVS. The entire contents of this Annual Report also appear on the IVS Web site at http://ivscc.gsfc.nasa.gov/publications/ar2011.

    International VLBI Service for Geodesy and Astrometry 2012 Annual Report

    This volume of reports is the 2012 Annual Report of the International VLBI Service for Geodesy and Astrometry (IVS). The individual reports were contributed by VLBI groups in the international geodetic and astrometric community who constitute the permanent components of IVS. The IVS 2012 Annual Report documents the work of the IVS components for the calendar year 2012, our fourteenth year of existence. The reports describe changes, activities, and progress of the IVS. Many thanks to all IVS components who contributed to this Annual Report. With the exception of the first section and parts of the last section (described below), the contents of this Annual Report also appear on the IVS Web site at http://ivscc.gsfc.nasa.gov/publications/ar2012.

    GSI Scientific Report 2009 [GSI Report 2010-1]

    The displacement design response spectrum is an essential component of the currently-developing displacement-based seismic design and assessment procedures. This paper proposes a new and simple method for constructing displacement design response spectra on soft soil sites. The method takes into account modifications of the seismic waves by the soil layers, giving due consideration to factors such as the level of bedrock shaking, material non-linearity, the seismic impedance contrast at the interface between soil and bedrock, and the plasticity of the soil layers. The model is particularly suited to applications in regions with a paucity of recorded strong ground motion data, from which empirical models cannot be reliably developed.