26 research outputs found

    Methodological Guidelines for Laboratory Work in the Course "Modern Technologies for Developing Internet Applications"

    The guidelines contain theoretical material and assignments for the laboratory work in the course "Modern Technologies for Developing Internet Applications" for students of the specialty "Applied and Computational Linguistics". This educational and methodological publication is intended to provide the methodological support needed to carry out the laboratory work. The material covers a wide range of topics related to the use of the server-side language PHP and MySQL technology for developing web applications

    Enabling Scalable Data Processing and Management through Standards-based Job Execution and the Global Federated File System

    Emerging challenges for scientific communities are to efficiently process big data obtained by experimentation and computational simulations. Supercomputing architectures are available to support scalable and high-performance processing environments, but many of the existing algorithm implementations are still unable to cope with their architectural complexity. One approach is to provide innovative technologies that effectively use these resources and also deal with geographically dispersed large datasets. Those technologies should be accessible in a way that data scientists who run data-intensive computations do not have to deal with the technical intricacies of the underlying execution system. Our work primarily focuses on providing data scientists with transparent access to these resources in order to easily analyze data. We describe the impact of our work by showing how we enabled access to multiple high performance computing resources through an open standards-based middleware that takes advantage of the unified data management provided by the Global Federated File System. Our architectural design and its associated implementation are validated by a use case that requires massively parallel DBSCAN outlier detection on a 3D point cloud dataset
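
    A minimal, serial sketch of the DBSCAN outlier-detection step named in this use case, using scikit-learn; the input array and the eps and min_samples values are placeholder assumptions, and the paper's own implementation is a massively parallel HPC code rather than this single-node version.

        # Serial illustration of DBSCAN-based outlier detection on a 3D point cloud.
        import numpy as np
        from sklearn.cluster import DBSCAN

        # Placeholder input: an (N, 3) array of x/y/z coordinates loaded elsewhere.
        points = np.random.rand(10_000, 3)

        # eps and min_samples are illustrative; real values depend on point density.
        labels = DBSCAN(eps=0.05, min_samples=10).fit_predict(points)

        # DBSCAN labels points that belong to no cluster with -1, i.e. the outliers.
        outliers = points[labels == -1]
        print(f"{len(outliers)} of {len(points)} points flagged as outliers")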

    Enabling scientific workflow and gateways using the standards-based XSEDE architecture

    The XSEDE project seeks to provide “a single virtual system that scientists can use to interactively share computing resources, data and experience.” The potential compute resources in XSEDE are diverse in many dimensions: node architectures, interconnects, memory, local queue management systems, and authentication policies, to name a few. The diversity is particularly rich when one considers the NSF-funded service providers and the many campuses that wish to participate via campus bridging activities. Resource diversity presents challenges to both application developers and application platform developers (e.g., developers of gateways, portals, and workflow engines). The XSEDE Execution Management Services (EMS) architecture is an instance of the Open Grid Services Architecture (OGSA) EMS and is used by higher-level services such as gateways and workflow engines to provide end users with execution services that meet their needs. The contribution of this paper is to provide a concise explanation and concrete examples of how the EMS works, how it can be used to support scientific gateways and workflow engines, and how the XSEDE EMS and other OGSA EMS architectures can be used by application developers to securely access heterogeneous distributed computing and data resources
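
    A hypothetical sketch of the pattern described above: a gateway builds a resource-agnostic job description once and hands it to an execution management service, which maps it onto whichever resource is selected. All class and method names here are illustrative assumptions, not the actual XSEDE EMS interfaces.

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class JobDescription:
            """Resource-agnostic description a gateway or workflow engine could build."""
            executable: str
            arguments: List[str] = field(default_factory=list)
            total_cores: int = 1
            wall_time_minutes: int = 60
            stage_in: List[str] = field(default_factory=list)   # input file URIs
            stage_out: List[str] = field(default_factory=list)  # output file URIs

        class ExecutionManagementService:
            """Stand-in for an OGSA-EMS-style service: accepts the description and
            returns a handle for monitoring; a real EMS would select a resource and
            translate the description to the local batch system."""
            def submit(self, job: JobDescription) -> str:
                return "activity-0001"

        gateway_job = JobDescription(executable="/bin/simulate",
                                     arguments=["--input", "dataset.h5"],
                                     total_cores=256)
        print(ExecutionManagementService().submit(gateway_job))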

    Standards-based Models and Architectures to Automate Scalable and Distributed Data Processing and Analysis (doctoral dissertation; Icelandic title: "Stöðluð módel og högun til að sjálfvirknivæða stigfrjálsa dreifða gagnavinnslu og greiningu")

    Scientific communities engaging in big data analysis face numerous challenges in managing complex computations and the related data on emerging and distributed computing infrastructures. Large-scale data analysis requires applications with simplified access to multiple resource management systems. Several generic or domain-specific technologies have been developed to exploit diversified computing environments, but due to the heterogeneity of computing and data architectures they are not capable of enabling real science cases. Scientific gateways and workflows are one such example, requiring the management of jobs on multiple kinds of batch systems, on heterogeneous supercomputing architectures, and with access to advanced distributed file systems. To support these requirements, a unified architectural framework is presented in this dissertation that coalesces the right combination of standards with an adequate middleware realisation. This framework manages concurrent access for diversified user communities through consistent and robust computing and data interfaces oriented to current application and infrastructure demands. The investigations reported in this dissertation were mainly motivated by physical and machine-learning models, represented by two scientific case studies: biophysics and Earth sciences. In the field of biophysics, the UltraScan scientific gateway is enhanced to enable the processing of domain-specific data through standards-based job and data management interfaces in HPC environments. The second domain deals with Earth sciences and automates the processing of machine-learning algorithms (e.g. classification of remote sensing images) using scalable and parallel implementations. As proof of concept, both case studies are supported through open source implementations, in the form of middleware realisations, client APIs, and their integration with state-of-the-art science gateway frameworks

    A Reliable Grid Information Service Using a Unified Information Model

    An information system is an essential part of every Grid system since it provides information about the entities (services, resources, users, etc.) a Grid consists of. The information provided by such a system may serve a variety of needs. It can, for example, be used for brokering purposes, to schedule workflows, to orchestrate services, or to predict the performance of a Grid. Such use cases very much determine the information model and the design of the information service. In this thesis, I present an open-standard, platform-agnostic, web-services-based information service. Since a Grid information service consolidates disseminated information about diversified Grid entities, the modeling of this information has a large impact on the Grid information service while executing its business functions. In this thesis, I formalize a resource schema which models the elementary concepts required to represent the significant Grid entities. The question of reliability and fault tolerance is often not properly answered when it comes to the realization of distributed systems, although the deployment of any system in a distributed environment shows that those factors are of utmost importance. As the Grid information service developed during this diploma thesis will be deployed in a highly distributed Grid, and is one of the essential services for the Grid to operate properly, it is expected to satisfy certain reliability criteria. In this thesis, I therefore address the reliability of the Grid information service through a multi-tier replication infrastructure, focusing on its architectural and implementation details
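
    A hedged sketch of what such a unified resource schema could look like, expressed as Python dataclasses; the field names and entity kinds are illustrative assumptions, not the schema actually formalized in the thesis.

        from dataclasses import dataclass, field
        from typing import Dict, List

        @dataclass
        class GridEntity:
            """Attributes assumed to be common to all modelled Grid entities."""
            id: str                    # globally unique identifier
            kind: str                  # e.g. "service", "compute", "storage"
            owner_vo: str              # virtual organisation owning the entity
            attributes: Dict[str, str] = field(default_factory=dict)

        @dataclass
        class ComputeResource(GridEntity):
            """Specialisation published to the information service for brokering."""
            total_cores: int = 0
            free_cores: int = 0
            queues: List[str] = field(default_factory=list)

        # A broker querying the information service could then filter entities:
        def has_free_capacity(r: ComputeResource, cores_needed: int) -> bool:
            return r.free_cores >= cores_needed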

    Standards-based Models and Architectures to Automate Scalable and Distributed Data Processing and Analysis

    Scientific communities engaging in big data analysis face numerous challenges in managing complex computations and the related data on emerging and distributed computing infrastructures. Large-scale data analysis requires applications with simplified access to multiple resource management systems. Several generic or domain-specific technologies have been developed to exploit diversified computing environments, but due to the heterogeneity of computing and data architectures they are not capable of enabling real science cases. Scientific gateways and workflows are one such example, requiring the management of jobs on multiple kinds of batch systems, on heterogeneous supercomputing architectures, and with access to advanced distributed file systems. To support these requirements, a unified architectural framework is presented in this dissertation that coalesces the right combination of standards with an adequate middleware realisation. This framework manages concurrent access for diversified user communities through consistent and robust computing and data interfaces oriented to current application and infrastructure demands. The investigations reported in this dissertation were mainly motivated by physical and machine-learning models, represented by two scientific case studies: biophysics and Earth sciences. In the field of biophysics, the UltraScan scientific gateway is enhanced to enable the processing of domain-specific data through standards-based job and data management interfaces in HPC environments. The second domain deals with Earth sciences and automates the processing of machine-learning algorithms (e.g. classification of remote sensing images) using scalable and parallel implementations. As proof of concept, both case studies are supported through open source implementations, in the form of middleware realisations, client APIs, and their integration with state-of-the-art science gateway frameworks

    High productivity data processing analytics methods with applications

    The term 'big data analytics' emerged in order to engage with the ever-increasing amount of scientific and engineering data using general analytics techniques that support the often more domain-specific data analysis process. It is recognized that the big data challenge can only be adequately addressed when knowledge from various fields, such as data mining, machine learning algorithms, parallel processing, and data management practices, is effectively combined. This paper therefore describes some of the 'smart data analytics methods' that enable high-productivity processing of large quantities of scientific data in order to enhance data analysis efficiency. The paper aims to provide new insights into how these various fields can be successfully combined. Contributions of this paper include the concretization of the cross-industry standard process for data mining (CRISP-DM) process model in scientific environments using concrete machine learning algorithms (e.g. support vector machines that enable data classification) or data mining mechanisms (e.g. outlier detection in measurements). Serial and parallel approaches to specific data analysis challenges are discussed in the context of concrete Earth science application data sets. Solutions also include various data visualizations that enable better insight into the corresponding data analytics and analysis process
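
    A brief sketch of the two analytics methods named above, using scikit-learn: an SVM classifier for data classification and an outlier-detection step on the measurements. The dataset, the choice of Isolation Forest as the outlier detector, and all parameter values are placeholder assumptions rather than the paper's concrete setup.

        from sklearn.datasets import load_iris
        from sklearn.ensemble import IsolationForest
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = load_iris(return_X_y=True)      # stand-in for a scientific dataset
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # Data classification with a support vector machine (CRISP-DM modeling phase).
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
        print("classification accuracy:", clf.score(X_test, y_test))

        # Outlier detection in the measurements; points labelled -1 are flagged.
        mask = IsolationForest(random_state=0).fit_predict(X) == -1
        print("flagged measurements:", int(mask.sum()))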

    Scalable Machine Learning with High Performance and Cloud Computing

    Deep Learning is emerging as the leading AI technique owing to the current convergence of scalable computing capability (i.e., HPC and cloud computing), easy access to large volumes of data, and the emergence of new algorithms enabling robust training of large-scale deep neural networks. The tutorial aims at providing a complete overview for an audience that is not familiar with these topics.
    Lecture 1: Introduction
    - Jülich Supercomputing Centre - Forschungszentrum Jülich
    - Machine learning and deep learning in remote sensing
    - Deep learning and supercomputing
    Lecture 2: Levels of Parallelism and High Performance Computing
    - The free lunch is over
    - Hardware levels of parallelism
    - High Performance Computing (HPC)
    - Jupyter-JSC
    Lecture 3: Distributed Deep Learning
    - Distributed training
    - Horovod
    - DeepSpeed
    Lecture 4: Hands-on Distributed Deep Learning (see the sketch after this list)
    - Become familiar with Horovod, a data-distributed training framework
    - Understand how to modify existing code to enable parallelism
    - Understand the importance of distributing data beforehand
    - Understand what Horovod does by looking at the lines of code to be added
    - Create a job script to execute Python code on the GPUs
    - Play around with the model architecture, optimizer, and learning rate
    Lecture 5: Big Data Analytics using Apache Spark
    - Apache Spark basics
    - Developing on Spark and clouds
    - Machine learning on Spark
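
    A minimal sketch of the data-parallel pattern covered in Lectures 3 and 4, using Horovod with PyTorch; the model, data, and hyperparameters are placeholders, and the actual hands-on material may differ in detail. It would typically be launched with one process per GPU, for example via horovodrun or a batch job script.

        import torch
        import horovod.torch as hvd

        hvd.init()                                  # one process per GPU
        torch.cuda.set_device(hvd.local_rank())     # pin each process to its GPU

        model = torch.nn.Linear(128, 10).cuda()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

        # Average gradients across all workers and start from identical parameters.
        optimizer = hvd.DistributedOptimizer(optimizer,
                                             named_parameters=model.named_parameters())
        hvd.broadcast_parameters(model.state_dict(), root_rank=0)

        for step in range(100):
            x = torch.randn(32, 128).cuda()         # each worker sees its own data shard
            y = torch.randint(0, 10, (32,)).cuda()
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()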

    Facilitating efficient data analysis of remotely sensed images using standards-based parameter sweep models

    Classification of remote sensing images often uses Support Vector Machines (SVMs), which require an n-fold cross-validation phase for model selection. This phase is characterized by sweeping through a wide set of combinations of the SVM kernel and cost parameters. As a consequence, the process is computationally expensive, but it represents a principled way of tuning a model for better accuracy and of preventing overfitting, together with the regularization that is inherently solved in the SVM optimization. Since this cross-validation technique is performed in a principled way, also known as 'grid search', we aim at supporting remote sensing scientists in two ways. Firstly, by reducing the time-to-solution of the cross-validation through state-of-the-art parallel processing methods, because the parameter sweep and the cross-validation runs themselves can be nicely parallelized. Secondly, by reducing manual labour through automation of the parallel submission process, since manually performing cross-validation is very time-consuming, unintuitive, and error-prone, especially in large-scale cluster or supercomputing environments (e.g., batch job scripts, node/core/task parameters, etc.)
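
    A small sketch of the parameter sweep described above: an n-fold cross-validated grid search over the SVM cost (C) and RBF kernel width (gamma), parallelized over local cores with n_jobs. The dataset and parameter ranges are placeholders; the paper's approach distributes these independent combinations as batch jobs on a cluster or supercomputer instead.

        from sklearn.datasets import load_digits
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        X, y = load_digits(return_X_y=True)     # stand-in for remote sensing samples

        param_grid = {
            "C": [0.1, 1, 10, 100, 1000],
            "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
        }

        # Every (C, gamma) pair is evaluated with 5-fold cross-validation; the pairs
        # are independent, which is what makes the sweep easy to parallelize.
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
        search.fit(X, y)
        print("best parameters:", search.best_params_)
        print("best cross-validation accuracy:", search.best_score_)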

    Interoperable job execution and data access through UNICORE and the Global Federated File System

    Computing middlewares play a vital role in abstracting the complexities of backend resources by providing seamless access to heterogeneous execution management services. Scientific communities are taking advantage of such technologies to focus on science rather than dealing with the technical intricacies of accessing resources. Multi-disciplinary communities often bring dynamic requirements which are not trivial to realize, specifically attaining massively parallel data processing on supercomputing resources that requires access to large data sets from widely distributed and dynamic sources located across organizational boundaries. In order to support this scenario, we present a combination that integrates the UNICORE middleware and the Global Federated File System. Furthermore, the paper gives an architectural and implementation perspective on the UNICORE extension and its interaction with the Global Federated File System space through computing, data and security standards
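
    A hypothetical sketch of submitting a job through a UNICORE-style REST endpoint with the requests library; the base URL, endpoint path, job description keys, and bearer-token authentication are illustrative assumptions made here, not a verified description of the UNICORE interface or of the extension discussed in the paper.

        import requests

        BASE = "https://unicore.example.org:8080/SITE/rest/core"   # placeholder URL
        TOKEN = "changeme"                                         # placeholder credential

        job = {
            "Executable": "/usr/bin/env",
            "Arguments": ["python3", "analysis.py"],
            # Input data would be staged in from a federated file system path here.
        }

        resp = requests.post(f"{BASE}/jobs", json=job,
                             headers={"Authorization": f"Bearer {TOKEN}"},
                             timeout=30)
        resp.raise_for_status()
        print("job resource created at:", resp.headers.get("Location"))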