1,209 research outputs found

    Generating realistic scaled complex networks

    Get PDF
    Research on generative models is a central project in the emerging field of network science, and it studies how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks, and for verification and simulation studies. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica, (c) use ReCoN to produce networks much larger than the original exemplar, and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.Comment: 26 pages, 13 figures, extended version, a preliminary version of the paper was presented at the 5th International Workshop on Complex Networks and their Application

    Deploying AI Frameworks on Secure HPC Systems with Containers

    Full text link
    The increasing interest in the usage of Artificial Intelligence techniques (AI) from the research community and industry to tackle "real world" problems, requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC environments. They usually develop their applications with high-level scripting languages or frameworks such as TensorFlow and the installation process often requires connection to external systems to download open source software during the build. HPC environments, on the other hand, are often based on closed source applications that incorporate parallel and distributed computing API's such as MPI and OpenMP, while users have restricted administrator privileges, and face security restrictions such as not allowing access to external systems. In this paper we discuss the issues associated with the deployment of AI frameworks in a secure HPC environment and how we successfully deploy AI frameworks on SuperMUC-NG with Charliecloud.Comment: 6 pages, 2 figures, 2019 IEEE High Performance Extreme Computing Conferenc

    Middleware-based Database Replication: The Gaps between Theory and Practice

    Get PDF
    The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Large-scale Data Analysis and Deep Learning Using Distributed Cyberinfrastructures and High Performance Computing

    Get PDF
    Data in many research fields continues to grow in both size and complexity. For instance, recent technological advances have caused an increased throughput in data in various biological-related endeavors, such as DNA sequencing, molecular simulations, and medical imaging. In addition, the variance in the types of data (textual, signal, image, etc.) adds an additional complexity in analyzing the data. As such, there is a need for uniquely developed applications that cater towards the type of data. Several considerations must be made when attempting to create a tool for a particular dataset. First, we must consider the type of algorithm required for analyzing the data. Next, since the size and complexity of the data imposes high computation and memory requirements, it is important to select a proper hardware environment on which to build the application. By carefully both developing the algorithm and selecting the hardware, we can provide an effective environment in which to analyze huge amounts of highly complex data in a large-scale manner. In this dissertation, I go into detail regarding my applications using big data and deep learning techniques to analyze complex and large data. I investigate how big data frameworks, such as Hadoop, can be applied to problems such as large-scale molecular dynamics simulations. Following this, many popular deep learning frameworks are evaluated and compared to find those that suit certain hardware setups and deep learning models. Then, we explore an application of deep learning to a biomedical problem, namely ADHD diagnosis from fMRI data. Lastly, I demonstrate a framework for real-time and fine-grained vehicle detection and classification. With each of these works in this dissertation, a unique large-scale analysis algorithm or deep learning model is implemented that caters towards the problem and leverages specialized computing resources

    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Get PDF
    Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult due to the dynamics imposed by, e.g., resource contentions or changing arrival rate of users, and the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics.In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the \emph{system size}, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion. It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations only has a minor effect in certain cases

    Control Strategies for Improving Cloud Service Robustness

    Get PDF
    This thesis addresses challenges in increasing the robustness of cloud-deployed applications and services to unexpected events and dynamic workloads. Without precautions, hardware failures and unpredictable large traffic variations can quickly degrade the performance of an application due to mismatch between provisioned resources and capacity needs. Similarly, disasters, such as power outages and fire, are unexpected events on larger scale that threatens the integrity of the underlying infrastructure on which an application is deployed.First, the self-adaptive software concept of brownout is extended to replicated cloud applications. By monitoring the performance of each application replica, brownout is able to counteract temporary overload situations by reducing the computational complexity of jobs entering the system. To avoid existing load balancers interfering with the brownout functionality, brownout-aware load balancers are introduced. Simulation experiments show that the proposed load balancers outperform existing load balancers in providing a high quality of service to as many end users as possible. Experiments in a testbed environment further show how a replicated brownout-enabled application is able to maintain high performance during overloads as compared to its non-brownout equivalent.Next, a feedback controller for cloud autoscaling is introduced. Using a novel way of modeling the dynamics of typical cloud application, a mechanism similar to the classical Smith predictor to compensate for delays in reconfiguring resource provisioning is presented. Simulation experiments show that the feedback controller is able to achieve faster control of the response times of a cloud application as compared to a threshold-based controller.Finally, a solution for handling the trade-off between performance and disaster tolerance for geo-replicated cloud applications is introduced. An automated mechanism for differentiating application traffic and replication traffic, and dynamically managing their bandwidth allocations using an MPC controller is presented and evaluated in simulation. Comparisons with commonly used static approaches reveal that the proposed solution in overload situations provides increased flexibility in managing the trade-off between performance and data consistency

    Explorar kubernetes e devOps num contexto de IoT

    Get PDF
    Containerized solutions and container orchestration technologies have recently been of great interest to organizations as a way of accelerating both software development and delivery processes. However, adopting these is a rather complex shift that may impact an organization and teams that were already established. This is where development cultures such as DevOps emerge to ease such shift amongst teams, promoting collaboration and automation of development and deployment processes throughout. The purpose of the current dissertation is to illustrate the path that led to the use of DevOps and containerization as means to support the development and deployment of a proof of concept system, Firefighter Sync – an Internet of Things based solution applied to a firefighting monitoring scenario. The goal, besides implementing Firefighter Sync, was to propose and deploy a development and operations ecosystem based on DevOps practices to achieve a full automation pipeline for both the development and operations processes. Firefighter Sync enabled the exploration of such state-of-the-art solutions such as Kubernetes to support container-based deployment and Jenkins for a fully automated CI/CD pipeline. Firefighter Sync clearly illustrates that addressing the development of a system from a DevOps perspective from the very beginning, although it requires an accentuated learning curve due to the large range of concepts and technologies addressed throughout, has illustrated to effectively impact the development process as well as ease the solution for future evolution. A good example is the automation process pipeline, that whilst allowing an easy integration of new features within a DevOps process – implies addressing the development and operations as a whole – it abstracts specific technological concerns turning these transversals to the traditional stages from development to deployment.Soluções de contentores e orquestração de contentores têm vindo a tornar-se de grande interesse para as organizações como uma forma de acelerar os processos de desenvolvimento e entrega de software. No entanto, adotá-las é uma mudança bastante complexa que pode impactar uma organização e equipas já estabelecidas. É aqui que surgem culturas como o DevOps para facilitar essa mudança, promovendo a colaboração e a automação dos processos de desenvolvimento e deployment entre equipas. O objetivo desta dissertação é ilustrar o caminho que levou ao uso de DevOps e à conteinerização de modo a apoiar o desenvolvimento e o deployment de um sistema como prova de conceito, o Firefighter Sync – uma solução baseada na Internet das Coisas aplicada a um cenário de monitorização de combate a incêndios. Além de implementar o Firefighter Sync, o objetivo era também propor e implementar um ecossistema de desenvolvimento e operações com base nas práticas de DevOps para alcançar uma pipeline de automação completa para os processos de desenvolvimento e operações. O Firefighter Sync permitiu explorar soluções que constituem o estado da arte neste contexto, como o Kubernetes para apoiar o deployment baseado em contentores e o Jenkins para suportar a pipeline de CI/CD totalmente automatizada. O Firefighter Sync ilustra claramente que abordar o desenvolvimento de um sistema a partir da perspectiva de DevOps, embora exija uma curva de aprendizagem acentuada devido à grande variedade de conceitos e tecnologias inerentes ao longo do processo, demonstrou tornar mais eficiente o processo de desenvolvimento, bem como facilitar evolução futura. Um exemplo é a pipeline de automação, que permite uma fácil integração de novos recursos dentro de um processo de DevOps – que implica abordar o desenvolvimento e as operações como um todo – abstraindo assim preocupações tecnológicas específicas, transformando essas transversais nas fases tradicionais do desenvolvimento ao deployment.Mestrado em Engenharia Informátic
    corecore