170 research outputs found

    Achieving Reproducibility in Cloud Benchmarking: A Focus on FaaS Services

    The cloud computing industry has witnessed rapid growth in recent years, providing businesses with an opportunity to scale their operations dynamically. With the emergence of multiple cloud providers, it has become increasingly challenging to determine which provider offers the most scalable services for a particular workload. This master thesis aims to compare the scalability of three major cloud providers: Amazon Web Services (AWS), Google Cloud, and Microsoft Azure. The study focuses on benchmarking the scalability of their compute, storage, and database services. To achieve this, a set of well-defined benchmarks will be used to evaluate the performance of each provider. The benchmarks will be designed to simulate a range of workloads, from small to large scale, to assess how each provider's services perform under different load conditions. The results will be analyzed and compared to identify the strengths and weaknesses of each provider's services. This study will provide valuable insights into which cloud provider offers the most scalable services, and will help businesses make informed decisions when choosing a cloud provider for their specific needs. The findings of this study will contribute to the ongoing discussion on the performance of cloud services, and will offer guidance to businesses on selecting the most appropriate cloud provider to meet their scalability requirements.
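
    To make the benchmarking idea concrete, the sketch below sweeps a hypothetical HTTP-triggered function endpoint across increasing concurrency levels and reports latency statistics. It only illustrates the load-sweep approach described above, not the thesis's actual harness; the ENDPOINT URL, concurrency levels, and request counts are placeholders.

        # Minimal load-sweep sketch (illustrative, not the thesis's benchmark suite).
        # ENDPOINT is a placeholder; point it at an HTTP-triggered function to test.
        import time, statistics, urllib.request
        from concurrent.futures import ThreadPoolExecutor

        ENDPOINT = "https://example-function.invalid/bench"  # hypothetical URL

        def invoke(_):
            start = time.perf_counter()
            try:
                urllib.request.urlopen(ENDPOINT, timeout=30).read()
            except Exception:
                pass  # a real harness would record failures separately
            return time.perf_counter() - start

        for concurrency in (1, 8, 64, 256):  # small- to large-scale load levels
            with ThreadPoolExecutor(max_workers=concurrency) as pool:
                latencies = list(pool.map(invoke, range(concurrency * 4)))
            print(concurrency,
                  round(statistics.median(latencies), 3),
                  round(max(latencies), 3))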

    Metaheuristics “In the Large”

    Many people have generously given their time to the various activities of the MitL initiative. Particular gratitude is due to Adam Barwell, John A. Clark, Patrick De Causmaecker, Emma Hart, Zoltan A. Kocsis, Ben Kovitz, Krzysztof Krawiec, John McCall, Nelishia Pillay, Kevin Sim, Jim Smith, Thomas Stützle, Eric Taillard and Stefan Wagner. J. Swan acknowledges the support of UK EPSRC grant EP/J017515/1 and the EU H2020 SAFIRE Factories project. P. García-Sánchez and J. J. Merelo acknowledge the support of grant TIN2017-85727-C4-2-P from the Spanish Ministry of Economy and Competitiveness. M. Wagner acknowledges the support of the Australian Research Council grants DE160100850 and DP200102364.
    Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and comparison of new approaches. To this end, we present the vision and progress of the Metaheuristics “In the Large” project. The conceptual underpinnings of the project are: truly extensible algorithm templates that support reuse without modification, white box problem descriptions that provide generic support for the injection of domain specific knowledge, and remotely accessible frameworks, components and problems that will enhance reproducibility and accelerate the field’s progress. We argue that, via such principled choice of infrastructure support, the field can pursue a higher level of scientific enquiry. We describe our vision and report on progress, showing how the adoption of common protocols for all metaheuristics can help liberate the potential of the field, easing the exploration of the design space of metaheuristics.
    Funding: UK Research & Innovation (UKRI) / Engineering & Physical Sciences Research Council (EPSRC) EP/J017515/1; EU H2020 SAFIRE Factories project; Spanish Ministry of Economy and Competitiveness TIN2017-85727-C4-2-P; Australian Research Council DE160100850 and DP200102364.
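
    The "extensible algorithm template" idea can be illustrated in a few lines: the search logic is written once and reused without modification, while the problem-specific parts (solution representation, neighbourhood operator and objective) are injected. This is only a generic sketch of the principle, not the MitL project's actual framework or API.

        # Generic hill-climbing template: reusable search logic with injected
        # problem-specific components (illustrative only, not the MitL framework).
        import random
        from typing import Callable, TypeVar

        S = TypeVar("S")

        def hill_climb(initial: S,
                       perturb: Callable[[S], S],
                       quality: Callable[[S], float],
                       iterations: int = 1000) -> S:
            best = initial
            for _ in range(iterations):
                candidate = perturb(best)
                if quality(candidate) >= quality(best):
                    best = candidate
            return best

        # Domain injection example: maximise the number of ones in a bit string.
        def flip_one(s):                     # neighbourhood operator for this domain
            s = list(s)
            s[random.randrange(len(s))] ^= 1
            return s

        print(sum(hill_climb([0] * 32, flip_one, sum)))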

    Experimenting on Architectures for High Performance Computing

    Overview of HPC architectures, of the challenges of reproducible research, and of the Grid'5000 testbed.

    Improving the accuracy of spoofed traffic inference in inter-domain traffic

    Ascertaining that a network will forward spoofed traffic usually requires an active probing vantage point in that network, effectively preventing a comprehensive view of this global Internet vulnerability. We argue that broader visibility into the spoofing problem may lie in the capability to infer lack of Source Address Validation (SAV) compliance from large, heavily aggregated Internet traffic data, such as traffic observable at Internet Exchange Points (IXPs). The key idea is to use IXPs as observatories to detect spoofed packets, by leveraging Autonomous System (AS) topology knowledge extracted from Border Gateway Protocol (BGP) data to infer which source addresses should legitimately appear across parts of the IXP switch fabric. In this thesis, we demonstrate that the existing literature does not capture several fundamental challenges to this approach, including noise in BGP data sources, heuristic AS relationship inference, and idiosyncrasies in IXP interconnectivity fabrics. We propose Spoofer-IX, a novel methodology to navigate these challenges, leveraging Customer Cone semantics of AS relationships to guide precise classification of inter-domain traffic as In-cone, Out-of-cone (spoofed), Unverifiable, Bogon, and Unassigned. We apply our methodology in an extensive analysis of real traffic data from two distinct IXPs in Brazil, a mid-size and a large-size infrastructure. In the mid-size IXP, with more than 200 members, we find an upper bound volume of Out-of-cone traffic that is more than an order of magnitude less than the previous method inferred on the same data, revealing the practical importance of Customer Cone semantics in such analysis. We also found no significant improvement in deployment of SAV in networks using the mid-size IXP between 2017 and 2019. In the hope that our methods and tools generalize to use by other IXPs that want to avoid use of their infrastructure for launching spoofed-source DoS attacks, we explore the feasibility of scaling the system to larger and more diverse IXP infrastructures. To promote this goal, and broad replicability of our results, we make the source code of Spoofer-IX publicly available. This thesis illustrates the subtleties of scientific assessments of operational Internet infrastructure, and the need for a community focus on reproducing and repeating previous methods.
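
    The classification logic sketched below illustrates the Customer Cone idea described above: a packet's source address is acceptable at an IXP port only if its origin AS falls inside the sending member's customer cone. The bogon list, cone sets and prefix-to-AS mapping are illustrative stand-ins, and the code is not the Spoofer-IX implementation (which also handles the Unverifiable class and uses longest-prefix matching).

        # Toy classifier for the In-cone / Out-of-cone / Bogon / Unassigned labels
        # (illustrative only; not the Spoofer-IX code, data, or full rule set).
        import ipaddress

        BOGONS = [ipaddress.ip_network("10.0.0.0/8"),
                  ipaddress.ip_network("192.168.0.0/16")]     # abbreviated list

        def classify(src_ip, member_cone_ases, prefix_to_as):
            addr = ipaddress.ip_address(src_ip)
            if any(addr in net for net in BOGONS):
                return "Bogon"
            origin = prefix_to_as.get(src_ip)   # real code uses longest-prefix match
            if origin is None:
                return "Unassigned"
            if origin in member_cone_ases:
                return "In-cone"
            return "Out-of-cone"                # candidate spoofed traffic

        print(classify("203.0.113.7", {64500, 64501}, {"203.0.113.7": 64999}))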

    The Creation, Validation, and Application of Synthetic Power Grids

    Public test cases representing large electric power systems at a high level of fidelity and quality are few to non-existent, despite the potential value such cases would have to the power systems research community. Legitimate concern for the security of large, high-voltage power grids has led to tight restrictions on accessing actual critical infrastructure data. To encourage and support innovation, synthetic electric grids are fictional, designed systems that mimic the complexity of actual electric grids but contain no confidential information. Synthetic grid design is driven by the requirement to match a wide variety of metrics derived from statistics of actual grids. The creation approach presented here is a four-stage process which mimics actual power system planning. First, substations are geo-located and internally configured from seed public data on generators and population. The substation placement uses a modified hierarchical clustering to match a realistic distribution of load and generation substations, and the same technique is also used to assign nominal voltage levels to the substations. With buses and transformers built, the next stage constructs a network of transmission lines at each nominal voltage level to connect the synthetic substations with a transmission grid. The transmission planning stage uses a heuristic inspired by simulated annealing to balance the objectives associated with both geographic constraints and contingency reliability, using a linearized dc power flow sensitivity. In order to scale these systems to tens of thousands of buses, robust reactive power planning is needed as a third stage, accounting for power flow convergence issues. The iterative algorithm presented here supplements a synthetic transmission network that has been validated by a dc power flow with a realistic set of voltage control devices to meet a specified voltage profile, even with the constraints of difficult power flow convergence for large systems. Validation of the created synthetic grids is crucial to establishing their legitimacy for engineering research. The statistical analysis presented in this dissertation is based on actual grid data obtained from the three major North American interconnects. Metrics are defined and examined for system proportions and structure, element parameters, and complex network graph theory properties. Several example synthetic grids, up to 100,000 buses, are shown in this dissertation; these datasets are available online. The final part of this dissertation discusses these specific grid examples and extensions associated with synthetic grids, applying them to geomagnetic disturbances, visualization, and engineering education.
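
    The first stage (clustering load points into substation sites) can be pictured with a small script: group synthetic (x, y) load locations hierarchically and treat each cluster centroid as a candidate substation. This is only a schematic of the idea, not the dissertation's algorithm or data; it assumes numpy and scipy are available, and the point count and cluster count are arbitrary.

        # Schematic substation placement via hierarchical clustering
        # (illustrative; the dissertation uses a modified procedure on real seed data).
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        load_points = rng.uniform(0, 100, size=(500, 2))    # synthetic (x, y) load sites

        tree = linkage(load_points, method="ward")           # agglomerative clustering
        labels = fcluster(tree, t=40, criterion="maxclust")  # target ~40 substations

        for k in range(1, labels.max() + 1):
            centroid = load_points[labels == k].mean(axis=0)
            print(k, centroid.round(1))   # candidate substation location
        # a full procedure would also assign voltage levels and internal layouts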

    A flexible approach to the estimation of water budgets and its connection to the travel time theory.

    The increasing impacts of climate change on water-related sectors are drawing scientists' attention to the development of comprehensive models that allow better descriptions of water and solute transport processes. "Getting the right answers for the right reasons", in terms of hydrological response, is one of the main goals of much of the recent literature. Semi-distributed hydrological models, based on the partition of basins into hydrological response units (HRUs) which are then connected to describe a whole catchment, have proved robust in reproducing observed catchment dynamics. 'Embedded reservoirs' are often used for each HRU to allow a consistent representation of the processes. In this work, a new semi-distributed model for runoff and evapotranspiration is presented: five different reservoirs are interconnected in order to capture the dynamics of the snow, canopy, surface flow, root-zone and groundwater compartments. Knowledge of the mass of water and solute stored and released through different outputs (e.g. discharge, evapotranspiration) allows the analysis of hydrological travel times and solute transport in catchments. The latter have been studied extensively, with some benchmark contributions in the last decade. However, the literature remains obscured by different terminologies and notations, and model assumptions are often not fully explained. The thesis presents a detailed description of a new theoretical approach that reworks the theory from the point of view of the hydrological storages and fluxes involved. Major aspects of the new theory are the 'age-ranked' definition of the hydrological variables, the explicit treatment of evaporative fluxes and of their influence on transport, the analysis of the outflow partitioning coefficients and the explicit formulation of the 'age-ranked' equations for solutes. Moreover, the work presents the concepts in a new, systematic and clarified way, helping the application of the theory. To give substance to the theory, a small catchment in the prealpine area was chosen as an example and the results are illustrated. The rainfall-runoff model and the travel time theory were implemented and integrated in the semi-distributed hydrological system JGrass-NewAge. Thanks to the environmental modelling framework OMS3, each part of the hydrological cycle is implemented as a component that can be selected, adopted, and connected at run-time to obtain a user-customized hydrological model. The system is flexible, expandable and applicable in a variety of modelling solutions. In this work, the model code underwent an extensive revision: new components were added (coupled-storages water budget, travel time components); old components were enhanced (Kriging, shortwave, longwave, evapotranspiration, rain-snow separation, SWE and melting components); and documentation was standardized and deployed. Since the thesis broadly concerns the building of a collaborative system, a discussion of some general-purpose tools that were implemented or improved to support the present research is also presented. They include the description and verification of a software component dealing with the long-wave radiation budget and another component implementing some Kriging procedures.
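
    A toy version of the embedded-reservoir water budget is sketched below: two coupled linear reservoirs (root zone and groundwater) exchange water through recharge and produce discharge and evapotranspiration. The coefficients, units and structure are arbitrary illustrations of the reservoir-coupling idea, not the JGrass-NewAge components described in the thesis.

        # Toy coupled-reservoir water-budget step (illustrative, arbitrary units).
        def step(storages, rain, et_demand, dt=1.0,
                 k_rz=0.1, k_gw=0.01, recharge_frac=0.3):
            rz, gw = storages
            et = min(et_demand, rz / dt)        # evapotranspiration limited by storage
            q_rz = k_rz * rz                    # fast outflow from the root zone
            recharge = recharge_frac * q_rz     # fraction percolating to groundwater
            q_gw = k_gw * gw                    # baseflow from groundwater
            rz += dt * (rain - et - q_rz)
            gw += dt * (recharge - q_gw)
            discharge = (1 - recharge_frac) * q_rz + q_gw
            return (rz, gw), discharge, et

        state = (50.0, 200.0)
        for rain in [0.0, 12.0, 3.0, 0.0, 0.0]:
            state, q, et = step(state, rain, et_demand=2.0)
            print(round(q, 2), round(et, 2))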

    RepeatFS: A File System Providing Reproducibility Through Provenance and Automation

    Reproducibility is of central importance to the scientific process. The difficulty of consistently replicating and verifying experimental results is magnified in the era of big data, in which computational analysis often involves complex multi-application pipelines operating on terabytes of data. These processes result in thousands of possible permutations of data preparation steps, software versions, and command-line arguments. Existing reproducibility frameworks are cumbersome and involve redesigning computational methods. To address these issues, we developed two conceptual models and implemented them through RepeatFS, a file system that records, replicates, and verifies computational workflows with no alteration to the original methods. RepeatFS also provides provenance visualization and task automation. We used RepeatFS to successfully visualize and replicate a variety of bioinformatics tasks consisting of over a million operations. RepeatFS correctly identified all software inconsistencies that resulted in replication differences.
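
    RepeatFS itself captures provenance transparently at the file-system level, but the kind of record it keeps can be illustrated with a simple wrapper that hashes a command's inputs and outputs and appends the result to a log. The function, file names and log format below are hypothetical illustrations, not RepeatFS's interface.

        # Conceptual provenance record around a command (not the RepeatFS API).
        import hashlib, json, subprocess, time

        def sha256(path):
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            return h.hexdigest()

        def run_with_provenance(cmd, inputs, outputs, log="provenance.jsonl"):
            record = {"cmd": cmd,
                      "started": time.time(),
                      "inputs": {p: sha256(p) for p in inputs}}
            subprocess.run(cmd, check=True)
            record["outputs"] = {p: sha256(p) for p in outputs}
            with open(log, "a") as f:
                f.write(json.dumps(record) + "\n")

        # Example (hypothetical files):
        # run_with_provenance(["gzip", "-k", "reads.fastq"],
        #                     ["reads.fastq"], ["reads.fastq.gz"])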

    Rethinking the Delivery Architecture of Data-Intensive Visualization

    The web has transformed the way people create and consume information. However, data-intensive science applications have rarely been able to take full advantage of the web ecosystem so far. Analysis and visualization have remained close to large datasets, on large servers and desktops, because of the vast resources that data-intensive applications require. This hampers the accessibility and on-demand availability of data-intensive science. In this work, I propose a novel architecture for the delivery of interactive, data-intensive visualization to the web ecosystem. The proposed architecture, codenamed Fabric, follows the idea of keeping the server side oblivious of application logic, structuring it as a set of scalable microservices that 1) manage data and 2) compute data products. Disconnected from application logic, the services allow interactive data-intensive visualization to be simultaneously accessible to many users. Meanwhile, the client side of this architecture treats visualization applications as an interaction-in, image-out black box whose sole responsibility is keeping track of application state and mapping interactions into well-defined and structured visualization requests. Fabric essentially provides a separation of concerns that decouples the otherwise tightly coupled client and server seen in traditional data applications. Initial results show that, as a result, Fabric enables high scalability of audience and scientific reproducibility, and improves control and protection of data products.
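
    The "interaction in, image out" contract can be sketched as follows: the client owns all application state, maps each interaction onto a structured rendering request, and receives an image from a stateless service. The endpoint URL and request schema below are hypothetical placeholders, not Fabric's actual protocol.

        # Client-side sketch of the interaction-in, image-out contract
        # (hypothetical endpoint and schema; not Fabric's actual API).
        import json, urllib.request

        state = {"dataset": "example_volume", "camera": [0.0, 0.0, 5.0], "iso": 0.35}

        def render_request(s):
            req = urllib.request.Request(
                "https://viz-service.invalid/render",   # placeholder microservice URL
                data=json.dumps(s).encode(),
                headers={"Content-Type": "application/json"})
            return urllib.request.urlopen(req).read()   # image bytes to display

        def on_drag(dx, dy):
            state["camera"][0] += 0.01 * dx             # client-side state update only
            state["camera"][1] += 0.01 * dy
            return render_request(state)                # server stays stateless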