Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures.
High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trend-line, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEP-FCE) initiated a roadmap planning activity with two key
overlapping drivers -- 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEP-FCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material. (Comment: 72 pages)
Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated into the information technology industry in the
last few years, and it is emerging nowadays in scientific environments. Science
user communities are demanding a broad range of computing power to satisfy the
needs of high-performance applications, such as local clusters,
high-performance computing systems, and computing grids. Different workloads
are needed from different computational models, and the cloud is already
considered as a promising paradigm. The scheduling and allocation of resources
is always a challenging matter in any form of computation and clouds are not an
exception. Science applications have unique features that differentiate their
workloads; hence, their requirements have to be taken into consideration when
building a Science Cloud. This paper discusses the main scheduling and
resource allocation challenges for any Infrastructure as a Service provider
supporting scientific applications.
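As a toy illustration of the resource allocation problem the paper raises, a first-fit placement of VM requests onto hosts can be sketched as follows (hosts, capacities, and request sizes are invented for illustration; real science-cloud schedulers weigh far more constraints, such as data locality and fairness):

```python
# Hosts with free CPU cores and VM requests with core demands (illustrative).
hosts = {"host-a": 8, "host-b": 4}
requests = [("vm1", 4), ("vm2", 4), ("vm3", 4), ("vm4", 4)]

def first_fit(hosts, requests):
    placement = {}
    free = dict(hosts)
    for vm, cores in requests:
        for host, cap in free.items():
            if cap >= cores:               # first host with enough free cores
                placement[vm] = host
                free[host] = cap - cores
                break
        else:
            placement[vm] = None           # rejected: no host can fit it
    return placement

print(first_fit(hosts, requests))
# {'vm1': 'host-a', 'vm2': 'host-a', 'vm3': 'host-b', 'vm4': None}
```

Even this greedy rule exhibits the core tension the paper describes: capacity fragmentation leaves vm4 unschedulable although 4 cores remain in total.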
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June, 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude -- and in
some cases greater -- than that available currently. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) a path to
transition codes to the next-generation HPC platforms that will be available
at ASCR facilities, and e) a program to build up and train a workforce capable
of developing and using simulations and analysis to support HEP scientific
research on next-generation systems. (Comment: 77 pages, 13 figures; draft
report, subject to further revision)
Distributed-based massive processing of activity logs for efficient user modeling in a Virtual Campus
This paper reports on a multi-fold approach to building user models based on the identification of navigation patterns in a virtual campus, allowing the campus's usability to be adapted to actual learners' needs and thus greatly stimulating the learning experience. However, user modeling in this context implies constant processing and analysis of user interaction data during long-term learning activities, which produces huge amounts of valuable data, typically stored in server log files. Because of the large or very large size of the log files generated daily, massive processing is a foremost step in extracting useful information. To this end, this work first studies the viability of processing large log data files of a real Virtual Campus using different distributed infrastructures. More precisely, we study the time performance of massive processing of daily log files implemented following the master-slave paradigm and evaluated on Cluster Computing and PlanetLab platforms. The study reveals the complexity and challenges of massive processing in the big data era, such as the need to carefully tune the log file processing in terms of the chunk size of log data to be processed at slave nodes, as well as the processing bottleneck in truly geographically distributed infrastructures due to the overhead caused by communication between the master and slave nodes. Then, an application of the massive processing approach resulting in log data processed and stored in a well-structured format is presented. We show how to extract knowledge from the log data analysis by using the WEKA framework for data mining purposes, showing its usefulness in effectively building user models in terms of identifying interesting navigation patterns of on-line learners. The study is motivated and conducted in the context of the actual data logs of the Virtual Campus of the Open University of Catalonia.
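The master-slave chunking scheme described above can be sketched roughly as follows (a minimal illustration; the record format, counting task, and chunk size are assumptions, not the paper's implementation, and the "slaves" run sequentially here where the real system ships chunks to remote nodes):

```python
# Hypothetical record format: "user,page" per log line (an assumption).
def slave_process(chunk):
    # Each slave counts page visits per user in its chunk of the log.
    counts = {}
    for line in chunk:
        user, _page = line.strip().split(",", 1)
        counts[user] = counts.get(user, 0) + 1
    return counts

def master(log_lines, chunk_size):
    # The master splits the log into chunks; chunk_size is the tuning knob
    # discussed above (communication overhead vs. slave utilization).
    chunks = [log_lines[i:i + chunk_size]
              for i in range(0, len(log_lines), chunk_size)]
    # In the real setting each chunk is shipped to a remote slave node;
    # here the "slaves" are plain function calls for illustration.
    partials = [slave_process(c) for c in chunks]
    total = {}
    for part in partials:                  # master merges partial counts
        for user, n in part.items():
            total[user] = total.get(user, 0) + n
    return total

log = ["u1,home", "u2,forum", "u1,quiz", "u1,home", "u2,home"]
print(master(log, chunk_size=2))  # {'u1': 3, 'u2': 2}
```

The merge step is where the geographic-distribution bottleneck noted in the abstract appears: every partial result must travel back to the master before aggregation can finish.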
Fog Computing
Everything that is not a computer, in the traditional sense, is being connected to the Internet. These devices, also referred to as the Internet of Things, are putting pressure on the current network infrastructure. Not all devices are intensive data producers, and some of them can be used beyond their original intent by sharing their computational resources. The combination of these two factors can be exploited either to gain insight over the data closer to where it originates or to extend into new services by making computational resources available at, but not exclusively at, the edge of the network. Fog computing is a new computational paradigm that provides these devices with a new form of cloud at a closer distance, where IoT and other devices with connectivity capabilities can offload computation. In this dissertation, we explore the fog computing paradigm and compare it with other paradigms, namely cloud and edge computing. We then propose a novel architecture that can be used to form or be part of this new paradigm. The implementation was tested on two types of applications: the first had the main objective of demonstrating the correctness of the implementation, while the other had the goal of validating the characteristics of fog computing.
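The offloading decision at the heart of fog computing can be illustrated with a toy rule (entirely hypothetical; it is not the dissertation's architecture): pick the execution site, device, fog node, or cloud, with the lowest estimated completion time.

```python
# Toy latency model: completion time = network round-trip + compute time.
# All numbers and node names are illustrative assumptions.
def completion_time(task_ops, node):
    return node["rtt_ms"] + task_ops / node["ops_per_ms"]

def choose_site(task_ops, nodes):
    # Pick the node that would finish the task first.
    return min(nodes, key=lambda n: completion_time(task_ops, n))

nodes = [
    {"name": "device", "rtt_ms": 0,  "ops_per_ms": 10},    # no network cost, slow CPU
    {"name": "fog",    "rtt_ms": 5,  "ops_per_ms": 100},   # nearby, moderate CPU
    {"name": "cloud",  "rtt_ms": 80, "ops_per_ms": 1000},  # far away, fast CPU
]

print(choose_site(500, nodes)["name"])      # small task -> "fog"
print(choose_site(500_000, nodes)["name"])  # huge task  -> "cloud"
```

The crossover behavior is the point: nearby fog nodes win for small, latency-sensitive tasks, while the distant but powerful cloud wins once compute time dominates the round trip.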
Real-Time Localization Using Software Defined Radio
Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users through their carried mobile devices (MDs) to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and on network-based services, where (i) and (ii) are done inside the serving network. For a better understanding of the parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we first developed a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while remaining invisible. A set of optimizations was made to the receiver components to support wideband capturing of the GSM spectrum while operating in real time. Processing the wide GSM spectrum is made possible using a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices' radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate devices' locations in real environments. Given the challenging effects of indoor environments on radio signals, such as NLOS reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before they are fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid approach that uses both WiFi and GSM interfaces to localize users.
For network-based services, we used a software implementation of an LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, prior knowledge of the indoor layout, or offline calibration of the system.
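Proximity-based localization of the kind described can be sketched as a weighted centroid over anchor positions, where stronger received signals pull the estimate closer (a generic textbook illustration, not the dissertation's algorithms; the anchor coordinates and the RSSI-to-weight mapping are assumptions):

```python
# Anchors: known (x, y) positions of receivers, paired with the RSSI (dBm)
# each measured from the target device. All values are illustrative.
anchors = [
    ((0.0, 0.0), -40),   # strong signal -> target is near this anchor
    ((10.0, 0.0), -70),
    ((0.0, 10.0), -70),
]

def weighted_centroid(readings):
    # Convert dBm to linear power so nearer anchors dominate the average.
    weights = [10 ** (rssi / 10.0) for _, rssi in readings]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(readings, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(readings, weights)) / total
    return x, y

x, y = weighted_centroid(anchors)
print(round(x, 2), round(y, 2))  # estimate pulled toward the strongest anchor
```

The NLOS filtering step mentioned in the abstract would run before this: a reading contaminated by multipath reports a misleading RSSI, so removing it from `readings` prevents it from dragging the centroid away from the true position.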
A Study on Earth Science Data Generation through Numerical Modeling and Machine Learning in a Cloud Computing Environment
Thesis (Ph.D.) -- Seoul National University, College of Natural Sciences, School of Earth and Environmental Sciences, 2022. 8. To investigate changes and phenomena on Earth, many scientists use high-resolution model results based on numerical models or develop and utilize machine learning-based prediction models with observed data. As information technology advances, there is a need for a practical methodology for generating local and global high-resolution numerical modeling and machine learning-based earth science data.
This study recommends data generation and processing using high-resolution numerical models of earth science and machine learning-based prediction models in a cloud environment.
To verify the reproducibility and portability of high-resolution numerical ocean model implementation on cloud computing, I simulated and analyzed the performance of a numerical ocean model at various resolutions in the model domain, including the Northwest Pacific Ocean, the East Sea, and the Yellow Sea. With the containerization method, it was possible to respond to changes in various infrastructure environments and achieve computational reproducibility effectively.
Data augmentation of subsurface temperature data was performed using generative models to prepare large datasets for training a model that predicts the vertical temperature distribution in the ocean. To train the prediction model, data augmentation with a generative model was applied to the observed data, which are relatively scarce compared to the satellite dataset.
In addition to observation data, HYCOM datasets were used for performance comparison, and the data distribution of augmented data was similar to the input data distribution. The ensemble method, which combines stand-alone predictive models, improved the performance of the predictive model compared to that of the model based on the existing observed data. Large amounts of computational resources were required for data synthesis, and the synthesis was performed in a cloud-based graphics processing unit environment.
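The ensemble idea mentioned above, combining stand-alone predictive models, can be sketched generically as follows (the toy member models and the unweighted averaging are assumptions for illustration, not the thesis's actual predictors):

```python
# Toy stand-alone predictors of subsurface temperature (degC) at a given
# depth. Each function is a stand-in for a separately trained model.
def model_a(depth_m):
    return 20.0 - 0.05 * depth_m          # linear lapse fit

def model_b(depth_m):
    return 21.0 - 0.06 * depth_m          # a slightly different fit

def ensemble(models, depth_m):
    # Simple unweighted average of member predictions.
    preds = [m(depth_m) for m in models]
    return sum(preds) / len(preds)

print(ensemble([model_a, model_b], 100.0))  # averages the member predictions
```

Averaging tends to cancel the uncorrelated errors of individual members, which is one plausible reason the combined model outperformed the single observation-based model reported above.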
High-resolution numerical ocean model simulation, predictive model development, and the data generation method can improve predictive capabilities in the field of ocean science. The numerical modeling and generative models based on cloud computing used in this study can be broadly applied to various fields of earth science.
1. General Introduction
2. Performance of numerical ocean modeling on cloud computing
2.1. Introduction
2.2. Cloud Computing
2.2.1. Cloud computing overview
2.2.2. Commercial cloud computing services
2.3. Numerical model for performance analysis of commercial clouds
2.3.1. High Performance Linpack Benchmark
2.3.2. Benchmark Sustainable Memory Bandwidth and Memory Latency
2.3.3. Numerical Ocean Model
2.3.4. Deployment of Numerical Ocean Model and Benchmark Packages on Cloud Clusters
2.4. Simulation results
2.4.1. Benchmark simulation
2.4.2. Ocean model simulation
2.5. Analysis of ROMS performance on commercial clouds
2.5.1. Performance of ROMS according to H/W resources
2.5.2. Performance of ROMS according to grid size
2.6. Summary
3. Reproducibility of numerical ocean model on cloud computing
3.1. Introduction
3.2. Containerization of numerical ocean model
3.2.1. Container virtualization
3.2.2. Container-based architecture for HPC
3.2.3. Container-based architecture for hybrid cloud
3.3. Materials and Methods
3.3.1. Comparison of traditional and container-based HPC cluster workflows
3.3.2. Model domain and datasets for numerical simulation
3.3.3. Building the container image and registration in the repository
3.3.4. Configuring a numerical model execution cluster
3.4. Results and Discussion
3.4.1. Reproducibility
3.4.2. Portability and Performance
3.5. Conclusions
4. Generative models for the prediction of ocean temperature profile
4.1. Introduction
4.2. Materials and Methods
4.2.1. Model domain and datasets for predicting the subsurface temperature
4.2.2. Model architecture for predicting the subsurface temperature
4.2.3. Neural network generative models
4.2.4. Prediction Models
4.2.5. Accuracy
4.3. Results and Discussion
4.3.1. Data Generation
4.3.2. Ensemble Prediction
4.3.3. Limitations of this study and future works
4.4. Conclusion
5. Summary and conclusion
6. References
7. Abstract (in Korean)