170 research outputs found

    Achieving Reproducibility in Cloud Benchmarking: A Focus on FaaS Services

    The cloud computing industry has witnessed rapid growth in recent years, providing businesses with an opportunity to scale their operations dynamically. With the emergence of multiple cloud providers, it has become increasingly challenging to determine which provider offers the most scalable services for a particular workload. This master thesis aims to compare the scalability of three major cloud providers: Amazon Web Services (AWS), Google Cloud, and Microsoft Azure. The study focuses on benchmarking the scalability of their compute, storage, and database services. To achieve this, a set of well-defined benchmarks will be used to evaluate the performance of each provider. The benchmarks will be designed to simulate a range of workloads, from small to large scale, to assess how each provider's services perform under different load conditions. The results will be analyzed and compared to identify the strengths and weaknesses of each provider's services. This study will provide valuable insights into which cloud provider offers the most scalable services, and will help businesses make informed decisions when choosing a cloud provider for their specific needs. The findings of this study will contribute to the ongoing discussion on the performance of cloud services, and will offer guidance to businesses on selecting the most appropriate cloud provider to meet their scalability requirements.
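
    To make the benchmarking idea concrete, the sketch below sweeps a hypothetical HTTP-triggered function endpoint across increasing concurrency levels and reports latency statistics. It only illustrates the load-sweep approach described above, not the thesis's actual harness; the ENDPOINT URL, concurrency levels, and request counts are placeholders.

        # Minimal load-sweep sketch (illustrative, not the thesis's benchmark suite).
        # ENDPOINT is a placeholder; point it at an HTTP-triggered function to test.
        import time, statistics, urllib.request
        from concurrent.futures import ThreadPoolExecutor

        ENDPOINT = "https://example-function.invalid/bench"  # hypothetical URL

        def invoke(_):
            start = time.perf_counter()
            try:
                urllib.request.urlopen(ENDPOINT, timeout=30).read()
            except Exception:
                pass  # a real harness would record failures separately
            return time.perf_counter() - start

        for concurrency in (1, 8, 64, 256):  # small- to large-scale load levels
            with ThreadPoolExecutor(max_workers=concurrency) as pool:
                latencies = list(pool.map(invoke, range(concurrency * 4)))
            print(concurrency,
                  round(statistics.median(latencies), 3),
                  round(max(latencies), 3))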

    Metaheuristics “In the Large”

    Many people have generously given their time to the various activities of the MitL initiative. Particular gratitude is due to Adam Barwell, John A. Clark, Patrick De Causmaecker, Emma Hart, Zoltan A. Kocsis, Ben Kovitz, Krzysztof Krawiec, John McCall, Nelishia Pillay, Kevin Sim, Jim Smith, Thomas Stützle, Eric Taillard and Stefan Wagner. J. Swan acknowledges the support of UK EPSRC grant EP/J017515/1 and the EU H2020 SAFIRE Factories project. P. García-Sánchez and J. J. Merelo acknowledge the support of grant TIN2017-85727-C4-2-P from the Spanish Ministry of Economy and Competitiveness. M. Wagner acknowledges the support of the Australian Research Council grants DE160100850 and DP200102364.
    Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and comparison of new approaches. To this end, we present the vision and progress of the Metaheuristics “In the Large” project. The conceptual underpinnings of the project are: truly extensible algorithm templates that support reuse without modification, white box problem descriptions that provide generic support for the injection of domain specific knowledge, and remotely accessible frameworks, components and problems that will enhance reproducibility and accelerate the field’s progress. We argue that, via such principled choice of infrastructure support, the field can pursue a higher level of scientific enquiry. We describe our vision and report on progress, showing how the adoption of common protocols for all metaheuristics can help liberate the potential of the field, easing the exploration of the design space of metaheuristics.
    Funding: UK Research & Innovation (UKRI) / Engineering & Physical Sciences Research Council (EPSRC) EP/J017515/1; EU H2020 SAFIRE Factories project; Spanish Ministry of Economy and Competitiveness TIN2017-85727-C4-2-P; Australian Research Council DE160100850 and DP200102364.
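
    The "extensible algorithm template" idea can be illustrated in a few lines: the search logic is written once and reused without modification, while the problem-specific parts (solution representation, neighbourhood operator and objective) are injected. This is only a generic sketch of the principle, not the MitL project's actual framework or API.

        # Generic hill-climbing template: reusable search logic with injected
        # problem-specific components (illustrative only, not the MitL framework).
        import random
        from typing import Callable, TypeVar

        S = TypeVar("S")

        def hill_climb(initial: S,
                       perturb: Callable[[S], S],
                       quality: Callable[[S], float],
                       iterations: int = 1000) -> S:
            best = initial
            for _ in range(iterations):
                candidate = perturb(best)
                if quality(candidate) >= quality(best):
                    best = candidate
            return best

        # Domain injection example: maximise the number of ones in a bit string.
        def flip_one(s):                     # neighbourhood operator for this domain
            s = list(s)
            s[random.randrange(len(s))] ^= 1
            return s

        print(sum(hill_climb([0] * 32, flip_one, sum)))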

    Experimenting on Architectures for High Performance Computing

    Overview of HPC architectures, of the challenges of reproducible research, and of the Grid'5000 testbed.

    Improving the accuracy of spoofed traffic inference in inter-domain traffic

    Ascertaining that a network will forward spoofed traffic usually requires an active probing vantage point in that network, effectively preventing a comprehensive view of this global Internet vulnerability. We argue that broader visibility into the spoofing problem may lie in the capability to infer lack of Source Address Validation (SAV) compliance from large, heavily aggregated Internet traffic data, such as traffic observable at Internet Exchange Points (IXPs). The key idea is to use IXPs as observatories to detect spoofed packets, by leveraging Autonomous System (AS) topology knowledge extracted from Border Gateway Protocol (BGP) data to infer which source addresses should legitimately appear across parts of the IXP switch fabric. In this thesis, we demonstrate that the existing literature does not capture several fundamental challenges to this approach, including noise in BGP data sources, heuristic AS relationship inference, and idiosyncrasies in IXP interconnectivity fabrics. We propose Spoofer-IX, a novel methodology to navigate these challenges, leveraging Customer Cone semantics of AS relationships to guide precise classification of inter-domain traffic as In-cone, Out-of-cone (spoofed), Unverifiable, Bogon, and Unassigned. We apply our methodology in an extensive analysis of real traffic data from two distinct IXPs in Brazil, a mid-size and a large-size infrastructure. In the mid-size IXP, with more than 200 members, we find an upper bound volume of Out-of-cone traffic that is more than an order of magnitude less than the previous method inferred on the same data, revealing the practical importance of Customer Cone semantics in such analysis. We also found no significant improvement in deployment of SAV in networks using the mid-size IXP between 2017 and 2019. In the hope that our methods and tools generalize to use by other IXPs that want to avoid use of their infrastructure for launching spoofed-source DoS attacks, we explore the feasibility of scaling the system to larger and more diverse IXP infrastructures. To promote this goal, and broad replicability of our results, we make the source code of Spoofer-IX publicly available. This thesis illustrates the subtleties of scientific assessments of operational Internet infrastructure, and the need for a community focus on reproducing and repeating previous methods.
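
    The classification logic sketched below illustrates the Customer Cone idea described above: a packet's source address is acceptable at an IXP port only if its origin AS falls inside the sending member's customer cone. The bogon list, cone sets and prefix-to-AS mapping are illustrative stand-ins, and the code is not the Spoofer-IX implementation (which also handles the Unverifiable class and uses longest-prefix matching).

        # Toy classifier for the In-cone / Out-of-cone / Bogon / Unassigned labels
        # (illustrative only; not the Spoofer-IX code, data, or full rule set).
        import ipaddress

        BOGONS = [ipaddress.ip_network("10.0.0.0/8"),
                  ipaddress.ip_network("192.168.0.0/16")]     # abbreviated list

        def classify(src_ip, member_cone_ases, prefix_to_as):
            addr = ipaddress.ip_address(src_ip)
            if any(addr in net for net in BOGONS):
                return "Bogon"
            origin = prefix_to_as.get(src_ip)   # real code uses longest-prefix match
            if origin is None:
                return "Unassigned"
            if origin in member_cone_ases:
                return "In-cone"
            return "Out-of-cone"                # candidate spoofed traffic

        print(classify("203.0.113.7", {64500, 64501}, {"203.0.113.7": 64999}))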

    The Creation, Validation, and Application of Synthetic Power Grids

    Public test cases representing large electric power systems at a high level of fidelity and quality are few to non-existent, despite the potential value such cases would have to the power systems research community. Legitimate concern for the security of large, high-voltage power grids has led to tight restrictions on accessing actual critical infrastructure data. To encourage and support innovation, synthetic electric grids are fictional, designed systems that mimic the complexity of actual electric grids but contain no confidential information. Synthetic grid design is driven by the requirement to match a wide variety of metrics derived from statistics of actual grids. The creation approach presented here is a four-stage process which mimics actual power system planning. First, substations are geo-located and internally configured from seed public data on generators and population. The substation placement uses a modified hierarchical clustering to match a realistic distribution of load and generation substations, and the same technique is also used to assign nominal voltage levels to the substations. With buses and transformers built, the next stage constructs a network of transmission lines at each nominal voltage level to connect the synthetic substations with a transmission grid. The transmission planning stage uses a heuristic inspired by simulated annealing to balance the objectives associated with both geographic constraints and contingency reliability, using a linearized dc power flow sensitivity. In order to scale these systems to tens of thousands of buses, robust reactive power planning is needed as a third stage, accounting for power flow convergence issues. The iterative algorithm presented here supplements a synthetic transmission network that has been validated by a dc power flow with a realistic set of voltage control devices to meet a specified voltage profile, even with the constraints of difficult power flow convergence for large systems. Validation of the created synthetic grids is crucial to establishing their legitimacy for engineering research. The statistical analysis presented in this dissertation is based on actual grid data obtained from the three major North American interconnects. Metrics are defined and examined for system proportions and structure, element parameters, and complex network graph theory properties. Several example synthetic grids, up to 100,000 buses, are shown in this dissertation; these datasets are available online. The final part of this dissertation discusses these specific grid examples and extensions associated with synthetic grids, applying them to geomagnetic disturbances, visualization, and engineering education.
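
    The first stage (clustering load points into substation sites) can be pictured with a small script: group synthetic (x, y) load locations hierarchically and treat each cluster centroid as a candidate substation. This is only a schematic of the idea, not the dissertation's algorithm or data; it assumes numpy and scipy are available, and the point count and cluster count are arbitrary.

        # Schematic substation placement via hierarchical clustering
        # (illustrative; the dissertation uses a modified procedure on real seed data).
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        load_points = rng.uniform(0, 100, size=(500, 2))    # synthetic (x, y) load sites

        tree = linkage(load_points, method="ward")           # agglomerative clustering
        labels = fcluster(tree, t=40, criterion="maxclust")  # target ~40 substations

        for k in range(1, labels.max() + 1):
            centroid = load_points[labels == k].mean(axis=0)
            print(k, centroid.round(1))   # candidate substation location
        # a full procedure would also assign voltage levels and internal layouts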

    A flexible approach to the estimation of water budgets and its connection to the travel time theory.

    The increasing impacts of climate change on water-related sectors are drawing scientists' attention to the development of comprehensive models that allow better descriptions of water and solute transport processes. "Getting the right answers for the right reasons", in terms of hydrological response, is one of the main goals of much of the recent literature. Semi-distributed hydrological models, based on the partition of basins into hydrological response units (HRUs) which are then connected to describe a whole catchment, have proved robust in reproducing observed catchment dynamics. 'Embedded reservoirs' are often used for each HRU to allow a consistent representation of the processes. In this work, a new semi-distributed model for runoff and evapotranspiration is presented: five different reservoirs are interconnected in order to capture the dynamics of the snow, canopy, surface flow, root-zone and groundwater compartments. Knowledge of the mass of water and solute stored and released through different outputs (e.g. discharge, evapotranspiration) allows the analysis of hydrological travel times and solute transport in catchments. The latter have been studied extensively, with some benchmark contributions in the last decade. However, the literature remains obscured by different terminologies and notations, and model assumptions are often not fully explained. The thesis presents a detailed description of a new theoretical approach that reworks the theory from the point of view of the hydrological storages and fluxes involved. Major aspects of the new theory are the 'age-ranked' definition of the hydrological variables, the explicit treatment of evaporative fluxes and of their influence on transport, the analysis of the outflow partitioning coefficients and the explicit formulation of the 'age-ranked' equations for solutes. Moreover, the work presents the concepts in a new, systematic and clarified way, helping the application of the theory. To give substance to the theory, a small catchment in the prealpine area was chosen as an example and the results are illustrated. The rainfall-runoff model and the travel time theory were implemented and integrated in the semi-distributed hydrological system JGrass-NewAge. Thanks to the environmental modelling framework OMS3, each part of the hydrological cycle is implemented as a component that can be selected, adopted, and connected at run-time to obtain a user-customized hydrological model. The system is flexible, expandable and applicable in a variety of modelling solutions. In this work, the model code underwent an extensive revision: new components were added (coupled-storages water budget, travel time components); old components were enhanced (Kriging, shortwave, longwave, evapotranspiration, rain-snow separation, SWE and melting components); and documentation was standardized and deployed. Since the thesis broadly concerns the building of a collaborative system, a discussion of some general-purpose tools that were implemented or improved to support the present research is also presented. They include the description and verification of a software component dealing with the long-wave radiation budget and another component implementing some Kriging procedures.
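
    A toy version of the embedded-reservoir water budget is sketched below: two coupled linear reservoirs (root zone and groundwater) exchange water through recharge and produce discharge and evapotranspiration. The coefficients, units and structure are arbitrary illustrations of the reservoir-coupling idea, not the JGrass-NewAge components described in the thesis.

        # Toy coupled-reservoir water-budget step (illustrative, arbitrary units).
        def step(storages, rain, et_demand, dt=1.0,
                 k_rz=0.1, k_gw=0.01, recharge_frac=0.3):
            rz, gw = storages
            et = min(et_demand, rz / dt)        # evapotranspiration limited by storage
            q_rz = k_rz * rz                    # fast outflow from the root zone
            recharge = recharge_frac * q_rz     # fraction percolating to groundwater
            q_gw = k_gw * gw                    # baseflow from groundwater
            rz += dt * (rain - et - q_rz)
            gw += dt * (recharge - q_gw)
            discharge = (1 - recharge_frac) * q_rz + q_gw
            return (rz, gw), discharge, et

        state = (50.0, 200.0)
        for rain in [0.0, 12.0, 3.0, 0.0, 0.0]:
            state, q, et = step(state, rain, et_demand=2.0)
            print(round(q, 2), round(et, 2))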

    RepeatFS: A File System Providing Reproducibility Through Provenance and Automation

    Reproducibility is of central importance to the scientific process. The difficulty of consistently replicating and verifying experimental results is magnified in the era of big data, in which computational analysis often involves complex multi-application pipelines operating on terabytes of data. These processes result in thousands of possible permutations of data preparation steps, software versions, and command-line arguments. Existing reproducibility frameworks are cumbersome and involve redesigning computational methods. To address these issues, we developed two conceptual models and implemented them through RepeatFS, a file system that records, replicates, and verifies computational workflows with no alteration to the original methods. RepeatFS also provides provenance visualization and task automation. We used RepeatFS to successfully visualize and replicate a variety of bioinformatics tasks consisting of over a million operations. RepeatFS correctly identified all software inconsistencies that resulted in replication differences.
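
    RepeatFS itself captures provenance transparently at the file-system level, but the kind of record it keeps can be illustrated with a simple wrapper that hashes a command's inputs and outputs and appends the result to a log. The function, file names and log format below are hypothetical illustrations, not RepeatFS's interface.

        # Conceptual provenance record around a command (not the RepeatFS API).
        import hashlib, json, subprocess, time

        def sha256(path):
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            return h.hexdigest()

        def run_with_provenance(cmd, inputs, outputs, log="provenance.jsonl"):
            record = {"cmd": cmd,
                      "started": time.time(),
                      "inputs": {p: sha256(p) for p in inputs}}
            subprocess.run(cmd, check=True)
            record["outputs"] = {p: sha256(p) for p in outputs}
            with open(log, "a") as f:
                f.write(json.dumps(record) + "\n")

        # Example (hypothetical files):
        # run_with_provenance(["gzip", "-k", "reads.fastq"],
        #                     ["reads.fastq"], ["reads.fastq.gz"])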

    Rethinking the Delivery Architecture of Data-Intensive Visualization

    The web has transformed the way people create and consume information. However, data-intensive science applications have rarely been able to take full advantage of the web ecosystem so far. Analysis and visualization have remained close to large datasets, on large servers and desktops, because of the vast resources that data-intensive applications require. This hampers the accessibility and on-demand availability of data-intensive science. In this work, I propose a novel architecture for the delivery of interactive, data-intensive visualization to the web ecosystem. The proposed architecture, codenamed Fabric, follows the idea of keeping the server side oblivious of application logic, structuring it as a set of scalable microservices that 1) manage data and 2) compute data products. Disconnected from application logic, the services allow interactive data-intensive visualization to be simultaneously accessible to many users. Meanwhile, the client side of this architecture treats visualization applications as an interaction-in, image-out black box whose sole responsibility is keeping track of application state and mapping interactions into well-defined and structured visualization requests. Fabric essentially provides a separation of concerns that decouples the otherwise tightly coupled client and server seen in traditional data applications. Initial results show that, as a result, Fabric enables high scalability of audience and scientific reproducibility, and improves control and protection of data products.
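
    The "interaction in, image out" contract can be sketched as follows: the client owns all application state, maps each interaction onto a structured rendering request, and receives an image from a stateless service. The endpoint URL and request schema below are hypothetical placeholders, not Fabric's actual protocol.

        # Client-side sketch of the interaction-in, image-out contract
        # (hypothetical endpoint and schema; not Fabric's actual API).
        import json, urllib.request

        state = {"dataset": "example_volume", "camera": [0.0, 0.0, 5.0], "iso": 0.35}

        def render_request(s):
            req = urllib.request.Request(
                "https://viz-service.invalid/render",   # placeholder microservice URL
                data=json.dumps(s).encode(),
                headers={"Content-Type": "application/json"})
            return urllib.request.urlopen(req).read()   # image bytes to display

        def on_drag(dx, dy):
            state["camera"][0] += 0.01 * dx             # client-side state update only
            state["camera"][1] += 0.01 * dy
            return render_request(state)                # server stays stateless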