Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures.
High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trend-line, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEP-FCE) initiated a roadmap planning activity with two key
overlapping drivers -- 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEP-FCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material. (Comment: 72 pages)
Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated into the information technology industry in the
last few years, and it is emerging nowadays in scientific environments. Science
user communities are demanding a broad range of computing power to satisfy the
needs of high-performance applications, such as local clusters,
high-performance computing systems, and computing grids. Different workloads
are needed from different computational models, and the cloud is already
considered as a promising paradigm. The scheduling and allocation of resources
is always a challenging matter in any form of computation and clouds are not an
exception. Science applications have unique features that differentiate their
workloads; hence, their requirements have to be taken into consideration when
building a Science Cloud. This paper discusses the main scheduling and
resource allocation challenges for any Infrastructure as a Service provider
supporting scientific applications.
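As a toy illustration of the resource allocation problem the paper raises, a first-fit placement of VM requests onto hosts can be sketched as follows (hosts, capacities, and request sizes are invented for illustration; real science-cloud schedulers weigh far more constraints, such as data locality and fairness):

```python
# Hosts with free CPU cores and VM requests with core demands (illustrative).
hosts = {"host-a": 8, "host-b": 4}
requests = [("vm1", 4), ("vm2", 4), ("vm3", 4), ("vm4", 4)]

def first_fit(hosts, requests):
    placement = {}
    free = dict(hosts)
    for vm, cores in requests:
        for host, cap in free.items():
            if cap >= cores:               # first host with enough free cores
                placement[vm] = host
                free[host] = cap - cores
                break
        else:
            placement[vm] = None           # rejected: no host can fit it
    return placement

print(first_fit(hosts, requests))
# {'vm1': 'host-a', 'vm2': 'host-a', 'vm3': 'host-b', 'vm4': None}
```

Even this greedy rule exhibits the core tension the paper describes: capacity fragmentation leaves vm4 unschedulable although 4 cores remain in total.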
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June, 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude -- and in
some cases greater -- than that available currently. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) a path to
transition codes to the next-generation HPC platforms that will be available
at ASCR facilities, and e) a program to build up and train a workforce capable
of developing and using simulations and analysis to support HEP scientific
research on next-generation systems. (Comment: 77 pages, 13 figures; draft
report, subject to further revision)
Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus
This paper reports on a multi-fold approach to building user models based on the identification of navigation patterns in a virtual campus, allowing the campus's usability to be adapted to actual learners' needs and thus greatly stimulating the learning experience. However, user modeling in this context implies constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data, typically stored in server log files. Because of the large or very large size of the log files generated daily, massive processing is a foremost step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated on Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the log file processing in terms of the chunk size of log data to be processed at slave nodes, as well as the processing bottleneck in truly geographically distributed infrastructures due to the overhead caused by communication between the master and slave nodes. Then, an application of the massive processing approach resulting in log data processed and stored in a well-structured format is presented. We show how to extract knowledge from the log data analysis by using the WEKA framework for data mining purposes, showing its usefulness in effectively building user models in terms of identifying interesting navigation patterns of on-line learners. The study is motivated and conducted in the context of the actual data logs of the Virtual Campus of the Open University of Catalonia.
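The master-slave chunking scheme described above can be sketched roughly as follows (a minimal illustration; the record format, counting task, and chunk size are assumptions, not the paper's implementation, and the "slaves" run sequentially here where the real system ships chunks to remote nodes):

```python
# Hypothetical record format: "user,page" per log line (an assumption).
def slave_process(chunk):
    # Each slave counts page visits per user in its chunk of the log.
    counts = {}
    for line in chunk:
        user, _page = line.strip().split(",", 1)
        counts[user] = counts.get(user, 0) + 1
    return counts

def master(log_lines, chunk_size):
    # The master splits the log into chunks; chunk_size is the tuning knob
    # discussed above (communication overhead vs. slave utilization).
    chunks = [log_lines[i:i + chunk_size]
              for i in range(0, len(log_lines), chunk_size)]
    # In the real setting each chunk is shipped to a remote slave node;
    # here the "slaves" are plain function calls for illustration.
    partials = [slave_process(c) for c in chunks]
    total = {}
    for part in partials:                  # master merges partial counts
        for user, n in part.items():
            total[user] = total.get(user, 0) + n
    return total

log = ["u1,home", "u2,forum", "u1,quiz", "u1,home", "u2,home"]
print(master(log, chunk_size=2))  # {'u1': 3, 'u2': 2}
```

The merge step is where the geographic-distribution bottleneck noted in the abstract appears: every partial result must travel back to the master before aggregation can finish.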
Fog Computing
Everything that is not a computer, in the traditional sense, is being connected to the Internet. These devices, also referred to as the Internet of Things, are putting pressure on the current network infrastructure. Not all devices are intensive data producers, and some of them can be used beyond their original intent by sharing their computational resources. The combination of these two factors can be exploited either to gain insight over the data closer to where it originates or to extend into new services by making computational resources available at, but not exclusively at, the edge of the network. Fog computing is a new computational paradigm that provides these devices with a new form of cloud at a closer distance, where IoT and other devices with connectivity capabilities can offload computation. In this dissertation, we explore the fog computing paradigm and compare it with other paradigms, namely cloud and edge computing. We then propose a novel architecture that can be used to form or be part of this new paradigm. The implementation was tested on two types of applications: the first had the main objective of demonstrating the correctness of the implementation, while the other had the goal of validating the characteristics of fog computing.
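The offloading decision at the heart of fog computing can be illustrated with a toy rule (entirely hypothetical; it is not the dissertation's architecture): pick the execution site, device, fog node, or cloud, with the lowest estimated completion time.

```python
# Toy latency model: completion time = network round-trip + compute time.
# All numbers and node names are illustrative assumptions.
def completion_time(task_ops, node):
    return node["rtt_ms"] + task_ops / node["ops_per_ms"]

def choose_site(task_ops, nodes):
    # Pick the node that would finish the task first.
    return min(nodes, key=lambda n: completion_time(task_ops, n))

nodes = [
    {"name": "device", "rtt_ms": 0,  "ops_per_ms": 10},    # no network cost, slow CPU
    {"name": "fog",    "rtt_ms": 5,  "ops_per_ms": 100},   # nearby, moderate CPU
    {"name": "cloud",  "rtt_ms": 80, "ops_per_ms": 1000},  # far away, fast CPU
]

print(choose_site(500, nodes)["name"])      # small task -> "fog"
print(choose_site(500_000, nodes)["name"])  # huge task  -> "cloud"
```

The crossover behavior is the point: nearby fog nodes win for small, latency-sensitive tasks, while the distant but powerful cloud wins once compute time dominates the round trip.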
Real-Time Localization Using Software Defined Radio
Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users through their carried mobile devices (MDs) to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and on network-based services, where (i) and (ii) are done inside the serving network. For a better understanding of the parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we first developed a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while remaining invisible. A set of optimizations was made to the receiver components to support wideband capturing of the GSM spectrum while operating in real time. Processing the wide GSM spectrum is made possible using a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices' radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate devices' locations in real environments. Given the challenging effects of indoor environments on radio signals, such as NLOS reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before they are fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid approach that uses both WiFi and GSM interfaces to localize users.
For network-based services, we used a software implementation of an LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, prior knowledge of the indoor layout, or offline calibration of the system.
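Proximity-based localization of the kind described can be sketched as a weighted centroid over anchor positions, where stronger received signals pull the estimate closer (a generic textbook illustration, not the dissertation's algorithms; the anchor coordinates and the RSSI-to-weight mapping are assumptions):

```python
# Anchors: known (x, y) positions of receivers, paired with the RSSI (dBm)
# each measured from the target device. All values are illustrative.
anchors = [
    ((0.0, 0.0), -40),   # strong signal -> target is near this anchor
    ((10.0, 0.0), -70),
    ((0.0, 10.0), -70),
]

def weighted_centroid(readings):
    # Convert dBm to linear power so nearer anchors dominate the average.
    weights = [10 ** (rssi / 10.0) for _, rssi in readings]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(readings, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(readings, weights)) / total
    return x, y

x, y = weighted_centroid(anchors)
print(round(x, 2), round(y, 2))  # estimate pulled toward the strongest anchor
```

The NLOS filtering step mentioned in the abstract would run before this: a reading contaminated by multipath reports a misleading RSSI, so removing it from `readings` prevents it from dragging the centroid away from the true position.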
A Study on Earth Science Data Generation through Numerical Modeling and Machine Learning in a Cloud Computing Environment
Thesis (Ph.D.) -- Seoul National University, College of Natural Sciences, School of Earth and Environmental Sciences, 2022. 8. To investigate changes and phenomena on Earth, many scientists use high-resolution model results based on numerical models or develop and utilize machine learning-based prediction models with observed data. As information technology advances, there is a need for a practical methodology for generating local and global high-resolution numerical modeling and machine learning-based earth science data.
This study recommends data generation and processing using high-resolution numerical models of earth science and machine learning-based prediction models in a cloud environment.
To verify the reproducibility and portability of high-resolution numerical ocean model implementation on cloud computing, I simulated and analyzed the performance of a numerical ocean model at various resolutions in the model domain, including the Northwest Pacific Ocean, the East Sea, and the Yellow Sea. With the containerization method, it was possible to respond to changes in various infrastructure environments and achieve computational reproducibility effectively.
Data augmentation of subsurface temperature data was performed using generative models to prepare large datasets for training a model that predicts the vertical temperature distribution in the ocean. To train the prediction model, data augmentation with a generative model was applied to the observed data, which are relatively scarce compared to the satellite dataset.
In addition to observation data, HYCOM datasets were used for performance comparison, and the data distribution of augmented data was similar to the input data distribution. The ensemble method, which combines stand-alone predictive models, improved the performance of the predictive model compared to that of the model based on the existing observed data. Large amounts of computational resources were required for data synthesis, and the synthesis was performed in a cloud-based graphics processing unit environment.
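The ensemble idea mentioned above, combining stand-alone predictive models, can be sketched generically as follows (the toy member models and the unweighted averaging are assumptions for illustration, not the thesis's actual predictors):

```python
# Toy stand-alone predictors of subsurface temperature (degC) at a given
# depth. Each function is a stand-in for a separately trained model.
def model_a(depth_m):
    return 20.0 - 0.05 * depth_m          # linear lapse fit

def model_b(depth_m):
    return 21.0 - 0.06 * depth_m          # a slightly different fit

def ensemble(models, depth_m):
    # Simple unweighted average of member predictions.
    preds = [m(depth_m) for m in models]
    return sum(preds) / len(preds)

print(ensemble([model_a, model_b], 100.0))  # averages the member predictions
```

Averaging tends to cancel the uncorrelated errors of individual members, which is one plausible reason the combined model outperformed the single observation-based model reported above.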
High-resolution numerical ocean model simulation, predictive model development, and the data generation method can improve predictive capabilities in the field of ocean science. The numerical modeling and generative models based on cloud computing used in this study can be broadly applied to various fields of earth science.
1. General Introduction
2. Performance of numerical ocean modeling on cloud computing
2.1. Introduction
2.2. Cloud Computing
2.2.1. Cloud computing overview
2.2.2. Commercial cloud computing services
2.3. Numerical model for performance analysis of commercial clouds
2.3.1. High Performance Linpack Benchmark
2.3.2. Benchmark Sustainable Memory Bandwidth and Memory Latency
2.3.3. Numerical Ocean Model
2.3.4. Deployment of Numerical Ocean Model and Benchmark Packages on Cloud Clusters
2.4. Simulation results
2.4.1. Benchmark simulation
2.4.2. Ocean model simulation
2.5. Analysis of ROMS performance on commercial clouds
2.5.1. Performance of ROMS according to H/W resources
2.5.2. Performance of ROMS according to grid size
2.6. Summary
3. Reproducibility of numerical ocean model on cloud computing
3.1. Introduction
3.2. Containerization of numerical ocean model
3.2.1. Container virtualization
3.2.2. Container-based architecture for HPC
3.2.3. Container-based architecture for hybrid cloud
3.3. Materials and Methods
3.3.1. Comparison of traditional and container-based HPC cluster workflows
3.3.2. Model domain and datasets for numerical simulation
3.3.3. Building the container image and registration in the repository
3.3.4. Configuring a numerical model execution cluster
3.4. Results and Discussion
3.4.1. Reproducibility
3.4.2. Portability and Performance
3.5. Conclusions
4. Generative models for the prediction of ocean temperature profile
4.1. Introduction
4.2. Materials and Methods
4.2.1. Model domain and datasets for predicting the subsurface temperature
4.2.2. Model architecture for predicting the subsurface temperature
4.2.3. Neural network generative models
4.2.4. Prediction Models
4.2.5. Accuracy
4.3. Results and Discussion
4.3.1. Data Generation
4.3.2. Ensemble Prediction
4.3.3. Limitations of this study and future works
4.4. Conclusion
5. Summary and conclusion
6. References
7. Abstract (in Korean)