2,834 research outputs found

    Survey and Analysis of Production Distributed Computing Infrastructures

    Full text link
    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available, and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the applications by representing the infrastructures.

    High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

    Full text link
    Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document and are presented along with introductory material.

    Resource provisioning in Science Clouds: Requirements and challenges

    Full text link
    Cloud computing has permeated the information technology industry in the last few years, and it is now emerging in scientific environments. Science user communities are demanding a broad range of computing power to satisfy the needs of high-performance applications, such as local clusters, high-performance computing systems, and computing grids. Different computational models lead to different workloads, and the cloud is already considered a promising paradigm. The scheduling and allocation of resources is always a challenging matter in any form of computation, and clouds are not an exception. Science applications have unique features that differentiate their workloads; hence, their requirements have to be taken into consideration when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service provider supporting scientific applications.
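    To make the allocation problem concrete, here is a minimal sketch of the kind of placement decision an IaaS provider faces when scheduling scientific jobs onto nodes. All names (Job, Node, schedule) and the first-fit policy are illustrative assumptions, not the paper's method.

```python
# Hypothetical first-fit placement of scientific jobs onto IaaS nodes.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Job:
    name: str
    cores: int      # CPU cores requested
    mem_gb: float   # memory requested

@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: float

def schedule(job: Job, nodes: List[Node]) -> Optional[Node]:
    """First-fit: place the job on the first node with enough capacity."""
    for node in nodes:
        if node.free_cores >= job.cores and node.free_mem_gb >= job.mem_gb:
            node.free_cores -= job.cores
            node.free_mem_gb -= job.mem_gb
            return node
    return None  # no capacity: a real provider would queue or scale out

nodes = [Node("n1", 8, 32.0), Node("n2", 16, 64.0)]
placed = schedule(Job("mc-simulation", 12, 48.0), nodes)
print(placed.name if placed else "queued")
```

    Real science-cloud schedulers must also weigh data locality, fairness across communities, and deadline constraints, which is what makes the problem the paper describes challenging.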

    ASCR/HEP Exascale Requirements Review Report

    Full text link
    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The demand expected on the 2025 timescale is at least two orders of magnitude greater than what is currently available, and in some cases more. 2) The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources, the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, and e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.

    Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus

    Get PDF
    This paper reports on a multi-fold approach for building user models based on the identification of navigation patterns in a virtual campus, allowing the campus' usability to be adapted to actual learners' needs and thus greatly stimulating the learning experience. However, user modeling in this context implies constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data stored typically in server log files. Due to the large or very large size of the log files generated daily, massive processing is a foremost step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated using Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the log file processing in terms of the chunk size of log data to be processed at slave nodes, as well as the processing bottleneck in truly geographically distributed infrastructures caused by the communication overhead between the master and slave nodes. Then, an application of the massive processing approach, resulting in log data processed and stored in a well-structured format, is presented. We show how to extract knowledge from the log data analysis by using the WEKA framework for data mining purposes, demonstrating its usefulness for effectively building user models in terms of identifying interesting navigation patterns of online learners. The study is motivated and conducted in the context of the actual data logs of the Virtual Campus of the Open University of Catalonia.
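    A minimal single-machine analogue of the master-slave pattern the paper evaluates: a "master" splits a daily log into fixed-size chunks and "slave" worker processes parse each chunk, with chunk size as the tuning knob the study highlights. The file name, record format, and per-user counting are illustrative assumptions.

```python
# Sketch of chunked master-slave log processing (single-host analogue).
from multiprocessing import Pool

CHUNK_SIZE = 10_000  # lines per chunk; tune to balance work vs. overhead

def parse_chunk(lines):
    """Count events per user in one chunk (toy stand-in for log parsing)."""
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        user = fields[0]  # assume the user id is the first field
        counts[user] = counts.get(user, 0) + 1
    return counts

def chunks(path, size):
    """Master side: stream the log file as lists of `size` lines."""
    with open(path) as f:
        buf = []
        for line in f:
            buf.append(line)
            if len(buf) == size:
                yield buf
                buf = []
        if buf:
            yield buf

if __name__ == "__main__":
    total = {}
    with Pool() as pool:  # workers play the role of slave nodes
        for partial in pool.imap_unordered(parse_chunk, chunks("campus.log", CHUNK_SIZE)):
            for user, n in partial.items():  # master merges slave results
                total[user] = total.get(user, 0) + n
```

    In the geographically distributed setting the paper studies, the merge step crosses the network, which is why communication time between master and slaves becomes the bottleneck.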

    Fog Computing

    Get PDF
    Everything that is not a computer, in the traditional sense, is being connected to the Internet. These devices are also referred to as the Internet of Things (IoT), and they are putting pressure on the current network infrastructure. Not all devices are intensive data producers, and some of them can be used beyond their original intent by sharing their computational resources. The combination of these two factors can be used either to gain insight from data closer to where it originates or to extend into new services by making computational resources available, primarily but not exclusively, at the edge of the network. Fog computing is a new computational paradigm that provides these devices with a new form of cloud at a closer distance, to which IoT and other devices with connectivity capabilities can offload computation. In this dissertation, we explore the fog computing paradigm and compare it with other paradigms, namely cloud and edge computing. We then propose a novel architecture that can be used to form or be part of this new paradigm. The implementation was tested on two types of applications. The first application had the main objective of demonstrating the correctness of the implementation, while the other had the goal of validating the characteristics of fog computing.
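    A hedged sketch of the offloading choice at the heart of fog computing: send a task to a nearby fog node when it has capacity and lower round-trip latency, otherwise fall back to the cloud. The node fields and thresholds are illustrative assumptions, not the dissertation's actual architecture.

```python
# Sketch: choosing between a fog node and the cloud for offloading.
from dataclasses import dataclass

@dataclass
class ComputeSite:
    name: str
    rtt_ms: float   # measured round-trip time from the device
    load: float     # 0.0 (idle) .. 1.0 (saturated)

def pick_site(fog: ComputeSite, cloud: ComputeSite, max_load: float = 0.8) -> ComputeSite:
    """Prefer the fog node while it is responsive and not saturated."""
    if fog.load < max_load and fog.rtt_ms < cloud.rtt_ms:
        return fog
    return cloud

site = pick_site(ComputeSite("fog-gw", 5.0, 0.4), ComputeSite("cloud", 60.0, 0.2))
print(f"offload to {site.name}")
```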

    Real-Time Localization Using Software Defined Radio

    Get PDF
    Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users via their carried mobile devices (MDs) to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and on network-based services, where (i) and (ii) are done inside the serving network. For a better understanding of the parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we first developed a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while remaining invisible. A set of optimizations was made to the receiver components to support wideband capturing of the GSM spectrum while operating in real time. Processing the wide GSM spectrum is made possible by a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices' radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate devices' locations in real environments. Given the challenges that indoor environments pose to radio signals, such as non-line-of-sight (NLOS) reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before they are fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid approach that uses both WiFi and GSM interfaces to localize users. For network-based services, we used a software implementation of an LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, any prior knowledge of the indoor layout, or any offline calibration of the system.
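    To illustrate the proximity-based family of estimators the thesis builds on, here is a minimal weighted-centroid sketch: each anchor (e.g., a WiFi AP or GSM receiver) is weighted by its received signal strength, so nearer anchors dominate the position estimate. The anchor positions, RSSI values, and dBm-to-weight mapping are assumptions for the sketch, not the thesis's actual algorithms.

```python
# Sketch: RSSI-weighted centroid as a proximity-based location estimate.
def weighted_centroid(anchors):
    """anchors: list of (x, y, rssi_dbm); stronger signal -> larger weight."""
    # Convert dBm to linear power so nearer anchors dominate the estimate.
    weighted = [(x, y, 10 ** (rssi / 10.0)) for x, y, rssi in anchors]
    total = sum(w for _, _, w in weighted)
    x = sum(xi * w for xi, _, w in weighted) / total
    y = sum(yi * w for _, yi, w in weighted) / total
    return x, y

# Three anchors at known positions with measured RSSI values (dBm).
print(weighted_centroid([(0.0, 0.0, -40.0), (10.0, 0.0, -60.0), (0.0, 10.0, -55.0)]))
```

    NLOS reception and multipath corrupt the RSSI inputs to such estimators, which is why the thesis filters contaminated signals before localization.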

    ํด๋ผ์šฐ๋“œ ์ปดํ“จํŒ… ํ™˜๊ฒฝ๊ธฐ๋ฐ˜์—์„œ ์ˆ˜์น˜ ๋ชจ๋ธ๋ง๊ณผ ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ†ตํ•œ ์ง€๊ตฌ๊ณผํ•™ ์ž๋ฃŒ์ƒ์„ฑ์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Get PDF
    Doctoral dissertation -- Seoul National University Graduate School: College of Natural Sciences, School of Earth and Environmental Sciences, August 2022. Advisor: ์กฐ์–‘๊ธฐ. To investigate changes and phenomena on Earth, many scientists use high-resolution model results based on numerical models or develop and utilize machine learning-based prediction models with observed data. As information technology advances, there is a need for a practical methodology for generating local and global high-resolution numerical modeling and machine learning-based earth science data. This study recommends data generation and processing using high-resolution numerical models of earth science and machine learning-based prediction models in a cloud environment. To verify the reproducibility and portability of a high-resolution numerical ocean model implementation on cloud computing, I simulated and analyzed the performance of a numerical ocean model at various resolutions in the model domain, including the Northwest Pacific Ocean, the East Sea, and the Yellow Sea. With the containerization method, it was possible to respond to changes in various infrastructure environments and achieve computational reproducibility effectively. Data augmentation of subsurface temperature data was performed using generative models to prepare large datasets for training a model that predicts the vertical temperature distribution in the ocean; the observed data, which are relatively scarce compared to satellite datasets, were augmented with a generative model. In addition to observation data, HYCOM datasets were used for performance comparison, and the distribution of the augmented data was similar to that of the input data. The ensemble method, which combines stand-alone predictive models, improved performance compared to models based on the existing observed data alone. Large amounts of computational resources were required for data synthesis, which was performed in a cloud-based graphics processing unit environment. High-resolution numerical ocean model simulation, predictive model development, and the data generation method can improve predictive capabilities in the field of ocean science. The numerical modeling and generative models based on cloud computing used in this study can be broadly applied to various fields of earth science.
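    A minimal sketch of the ensemble step described above: combine the temperature-profile predictions of several stand-alone models by averaging them depth by depth. The model functions and depth grid are illustrative; the study's actual members were trained on observed, augmented, and HYCOM data.

```python
# Sketch: depth-wise ensemble averaging of vertical temperature profiles.
import numpy as np

def ensemble_predict(models, features):
    """Average the vertical temperature profiles predicted by each member."""
    profiles = np.stack([m(features) for m in models])  # (n_models, n_depths)
    return profiles.mean(axis=0)

# Toy members standing in for trained predictors (each returns a profile).
model_a = lambda f: np.linspace(20.0, 4.0, 50) + 0.3   # warm-biased member
model_b = lambda f: np.linspace(20.0, 4.0, 50) - 0.2   # cool-biased member

profile = ensemble_predict([model_a, model_b], features=None)
print(profile[:3])  # surface-most depth levels
```

    Averaging members with independent biases tends to cancel their errors, which is consistent with the study's finding that the ensemble outperformed models trained on the observed data alone.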