
    Multi-level Meta-workflows: New Concept for Regularly Occurring Tasks in Quantum Chemistry

    Background: In Quantum Chemistry, many tasks recur frequently, e.g. geometry optimizations or benchmarking series. Here, workflows can help reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. Designing, implementing, and testing such workflows requires significant effort and specific expertise. Significance: Many of these workflows are complex, monolithic entities usable only for particular scientific experiments. Hence, their modification is not straightforward, and sharing them is almost impossible. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. An atomic workflow delivers a single, well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing within and among scientific communities. We formally specify atomic and meta-workflows in order to define the data structures repositories use for uploading and sharing them. Additionally, we present a formal description of the orchestration of atomic workflows into meta-workflows. Conclusions: We identified the operations that represent basic functionalities in Quantum Chemistry, developed the corresponding atomic workflows, and combined them into meta-workflows. Based on these workflows we defined the structure of the Quantum Chemistry workflow library and uploaded the workflows to the SHIWA Workflow Repository.
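
    The atomic/meta-workflow split described above lends itself to a simple repository data structure. Below is a minimal Python sketch of one possible representation; the class names, fields, and the WS-PGRADE engine name are illustrative assumptions, not the paper's formal specification.

```python
from dataclasses import dataclass, field

@dataclass
class AtomicWorkflow:
    """A self-contained workflow delivering one domain-specific function."""
    name: str                 # e.g. "geometry_optimization" (hypothetical)
    workflow_system: str      # engine the workflow runs on, e.g. "WS-PGRADE" (assumed)
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

@dataclass
class MetaWorkflow:
    """Orchestrates atomic workflows; edges wire outputs to inputs."""
    name: str
    steps: list[AtomicWorkflow] = field(default_factory=list)
    # (producer step, output name, consumer step, input name)
    edges: list[tuple[str, str, str, str]] = field(default_factory=list)

# Example: a benchmarking series built from a reusable optimization step.
opt = AtomicWorkflow("geometry_optimization", "WS-PGRADE",
                     inputs=["molecule.xyz"], outputs=["optimized.xyz"])
bench = MetaWorkflow("benchmark_series", steps=[opt])
```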

    Generic Metadata Handling in Scientific Data Life Cycles

    Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they serve grow more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background comprises multiple abstraction categories: data sources, data and metadata management, computing and workflow management, security, data sinks, and methods for enabling utilization. The challenges in this context are manifold. One is to hide the complexity from the user and to make resource usage seamless, improving both usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort. Metadata management is essential for scientists to save time by avoiding the need to keep track of data manually, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. The solution is therefore to employ metadata management so that data can be organized based on information about it. Previously, use cases tended to support only highly specific metadata management, or none at all. Now, a generic metadata management concept is available that can be used to integrate metadata capabilities with use cases efficiently. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation provides easy-to-use and effective metadata management: automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
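
    A minimal Python sketch of the extract-annotate-index-search pattern the abstract describes, assuming a plain in-memory index and generic file-level fields; the MoSGrid implementation itself is not shown here.

```python
from pathlib import Path

def extract_metadata(path: Path) -> dict:
    """Derive basic metadata from a result file (illustrative fields only)."""
    return {
        "name": path.name,
        "location": str(path.resolve()),
        "size_bytes": path.stat().st_size,
    }

def build_index(root: str) -> list[dict]:
    """Walk a data directory and annotate every file with its metadata."""
    return [extract_metadata(p) for p in Path(root).rglob("*") if p.is_file()]

def search(index: list[dict], key: str, value) -> list[dict]:
    """Query by metadata field instead of tracking files by hand."""
    return [m for m in index if m.get(key) == value]

# Usage: index a results tree once, then locate files by name for a new run.
index = build_index(".")
hits = search(index, "name", "optimized.xyz")
```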

    A Formal Approach to Support Interoperability in Scientific Meta-workflows

    Scientific workflows orchestrate the execution of complex experiments, frequently using distributed computing platforms. Meta-workflows are an emerging type of such workflows which aim to reuse existing workflows, potentially from different workflow systems, to achieve more complex experiments while minimizing workflow design and testing efforts. Workflow interoperability plays a profound role in achieving this objective. This paper focuses on fostering interoperability across meta-workflows that combine workflows of different workflow systems from diverse scientific domains. This is achieved by formalizing the definitions of a meta-workflow and its different types, in order to standardize the data structures used to describe workflows to be published and shared via public repositories. The paper also includes a thorough formalization of two workflow interoperability approaches based on this formal description: the coarse-grained and the fine-grained workflow interoperability approach. The paper presents a case study from Astrophysics which successfully demonstrates the use of the concepts of meta-workflows and workflow interoperability within a scientific simulation platform.
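
    The two interoperability approaches can be illustrated with a short sketch: coarse-grained interoperability treats a foreign workflow as a black box delegated to its native engine, while fine-grained interoperability translates its description into the host system's own representation. All names, commands, and schemas below are hypothetical assumptions, not the paper's formalism.

```python
import subprocess

class CoarseGrainedStep:
    """Coarse-grained interoperability: execute a foreign workflow as a
    black box by delegating to its native engine (command is assumed)."""
    def __init__(self, engine_cmd: str, workflow_file: str):
        self.engine_cmd = engine_cmd
        self.workflow_file = workflow_file

    def execute(self, inputs: dict) -> int:
        args = [self.engine_cmd, self.workflow_file]
        args += [f"{k}={v}" for k, v in inputs.items()]
        return subprocess.call(args)  # exit code of the foreign engine

def translate_fine_grained(foreign_wf: dict) -> dict:
    """Fine-grained interoperability: map the foreign workflow's task graph
    into the host system's schema (both schemas are assumptions)."""
    return {"tasks": foreign_wf["nodes"], "dependencies": foreign_wf["edges"]}
```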

    Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)

    The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012 to 2016. Four Helmholtz centres, six universities, and another research institution in Germany joined forces to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life Cycle Labs, data experts performed joint R&D together with the scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.

    Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

    The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions were drawn as to the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out with the H2O framework for deep learning algorithms designed for CPU and GPU platforms, in single-threaded and multi-threaded modes of operation. We also present the results of testing neural network architectures on the H2O platform for various activation functions, stopping metrics, and other parameters of the machine learning algorithm. For the use case of the MNIST database of handwritten digits in single-threaded mode, it was demonstrated that blind selection of these parameters can hugely increase the runtime (by 2-3 orders of magnitude) without a significant increase in precision. This result can have a crucial influence on the optimization of available and new machine learning methods, especially for image recognition problems. Comment: 15 pages, 11 figures, 4 tables; this paper summarizes activities which were started recently and described briefly in the previous conference presentations arXiv:1706.02248 and arXiv:1707.04940; it is accepted for the Springer book series "Advances in Intelligent Systems and Computing".
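
    As an illustration of the kind of parameters benchmarked (network architecture, activation function, stopping metric), here is a hedged sketch of a deep learning run using the H2O Python API; the file name and hyperparameter values are placeholders, not the settings used in the paper.

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()  # starts or connects to a local H2O backend

# Placeholder file: any CSV with pixel columns and a 'label' column works here.
train = h2o.import_file("mnist_train.csv")
train["label"] = train["label"].asfactor()  # treat digits as classes

model = H2ODeepLearningEstimator(
    hidden=[128, 128],                    # illustrative architecture
    activation="Rectifier",               # one of the activation functions compared
    epochs=10,
    stopping_metric="misclassification",  # one of the stopping metrics compared
    stopping_rounds=3,
)
features = [c for c in train.columns if c != "label"]
model.train(x=features, y="label", training_frame=train)
print(model.logloss())
```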

    Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes

    The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions were drawn as to the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST data set were carried out with the H2O framework for deep learning algorithms designed for CPU and GPU platforms, in single-threaded and multi-threaded modes of operation. Comment: 4 pages, 6 figures, 4 tables; XIIth International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT 2017), Lviv, Ukraine.
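
    Since the comparison hinges on single-threaded versus multi-threaded operation, the sketch below shows how the thread count is pinned when initializing H2O's Python client; the surrounding benchmark harness is assumed, not taken from the paper.

```python
import h2o

# Single-threaded run: pin the local H2O backend to one thread.
h2o.init(nthreads=1)
# ... train and time the model here, then shut down before switching modes ...
h2o.cluster().shutdown()

# Multi-threaded run: nthreads=-1 (the default) uses all available cores.
h2o.init(nthreads=-1)
```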

    ZIH-Info

    - Ticket system update - Handling spam e-mails - Improved Cloudstore performance - Announcement from the Medienzentrum - Announcement from Dezernat 6 - ZIH publications - Events

    An EGI Standards-Based GRID System for Large-Scale Computations Using an Original Accelerated Quantum Chemistry Method

    Based on an analysis of modern tools for creating GRID-type information systems that are part of the European EGI "standard", the UMD repository (including new versions of the Globus Toolkit, ARC, dCache, etc.), the application of GRID systems to computational chemistry is briefly discussed. The GRID system created by the authors combines two clusters running Linux CentOS 7 and is based on software from UMD-4. The relevance and effectiveness of batch processing systems (we use Torque 4.2.10) for quantum chemical calculations increases for mass calculations of docking complexes (including for drug modeling problems), for which an improved semiempirical method with more efficient approximations was proposed and implemented in the Fortran-95 software package LSSDOCK. For such calculations, new approximation methods have been developed, including for DFT functionals, and their software implementation is being carried out. Converters of LSSDOCK calculation results into CML version 3, an XML-based format natural for GRID, have been developed. Using the CML format and based on dCache software, a single tree of a virtual GRID file system, distributed between heterogeneous nodes, is used to store the results of the LSSDOCK calculations.
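
    As an illustration of the kind of LSSDOCK-to-CML conversion described, here is a hedged Python sketch that emits a CML version 3 molecule from a list of atom coordinates; the authors' actual converters are not shown, and the element mapping here is a deliberate simplification.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # CML version 3 schema namespace

def docking_result_to_cml(mol_id: str, atoms: list[tuple]) -> ET.Element:
    """Convert (element, x, y, z) tuples, as a docking result might
    provide, into a CML <molecule> with an <atomArray>."""
    root = ET.Element("cml", xmlns=CML_NS)
    molecule = ET.SubElement(root, "molecule", id=mol_id)
    atom_array = ET.SubElement(molecule, "atomArray")
    for i, (elem, x, y, z) in enumerate(atoms):
        ET.SubElement(atom_array, "atom", id=f"a{i + 1}", elementType=elem,
                      x3=str(x), y3=str(y), z3=str(z))
    return root

# Example: write a two-atom fragment to a file destined for the
# dCache-backed virtual GRID file system tree.
tree = ET.ElementTree(docking_result_to_cml(
    "complex1", [("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0)]))
tree.write("complex1.cml", encoding="utf-8", xml_declaration=True)
```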