Cloud engineering is search based software engineering too
Many of the problems posed by the migration of computation to cloud platforms can be formulated and solved using techniques associated with Search Based Software Engineering (SBSE). Much of cloud software engineering involves problems of optimisation: performance, allocation, assignment and the dynamic balancing of resources to achieve pragmatic trade-offs between many competing technical and business objectives. SBSE is concerned with the application of computational search and optimisation to solve precisely these kinds of software engineering challenges. Interest in both cloud computing and SBSE has grown rapidly in the past five years, yet there has been little work on SBSE as a means of addressing cloud computing challenges. Like many computationally demanding activities, SBSE has the potential to benefit from the cloud: ‘SBSE in the cloud’. However, this paper focuses, instead, on the ways in which SBSE can benefit cloud computing. It thus develops the theme of ‘SBSE for the cloud’, formulating cloud computing challenges in ways that can be addressed using SBSE.
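As a toy illustration of the ‘SBSE for the cloud’ theme, the sketch below applies a simple hill-climbing search to a cloud resource-allocation problem: choose how many VMs of each type to rent so that a target load is covered at low price. The VM catalogue, prices, and penalty function are invented for illustration and are not taken from the paper.

```python
import random

random.seed(0)  # deterministic for the example

# Hypothetical VM catalogue: type -> (hourly price, capacity units).
VM_TYPES = {"small": (0.05, 10), "medium": (0.10, 25), "large": (0.20, 60)}
TARGET_LOAD = 100  # capacity units the allocation must cover

def cost(alloc):
    """Penalised objective: total price plus a heavy penalty per unit of
    uncovered load, which steers the search towards feasible allocations."""
    price = sum(VM_TYPES[t][0] * n for t, n in alloc.items())
    capacity = sum(VM_TYPES[t][1] * n for t, n in alloc.items())
    return price + max(0, TARGET_LOAD - capacity)

def neighbour(alloc):
    """Mutate the allocation by adding or removing one VM of a random type."""
    new = dict(alloc)
    t = random.choice(sorted(VM_TYPES))
    new[t] = max(0, new[t] + random.choice([-1, 1]))
    return new

def hill_climb(steps=5000):
    """Accept any neighbouring allocation that is no worse than the current one."""
    current = {t: 0 for t in VM_TYPES}
    for _ in range(steps):
        candidate = neighbour(current)
        if cost(candidate) <= cost(current):
            current = candidate
    return current

best = hill_climb()
capacity = sum(VM_TYPES[t][1] * n for t, n in best.items())
```

Real SBSE formulations typically use richer metaheuristics (genetic algorithms, multi-objective search) and real objective models, but the structure -- representation, fitness function, neighbourhood move -- is the same.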
A Black-box Monitoring Approach to Measure Microservices Runtime Performance
Microservices changed cloud computing by moving applications' complexity from a single monolithic executable to thousands of network interactions between small components. Given the increasing deployment sizes, the challenges of exploiting the architecture, and the impact on data-centre power consumption, we need to track this complexity efficiently. In this article, we propose a black-box monitoring approach to track microservices at scale, focusing on architectural metrics, power consumption, application performance, and network performance. The proposed approach is transparent to the monitored applications, generates less overhead than the black-box approaches available in the state of the art, and provides fine-grained, accurate metrics.
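As a rough, single-host analogue of black-box monitoring, the sketch below observes a program purely from the outside: it launches it as a subprocess and reads the wall time, CPU time, and peak memory that the OS accounted to it, without instrumenting the program itself. The helper name and metric set are our own, and the snippet assumes a Unix-like system (the `resource` module); the paper's system additionally tracks network and power metrics at data-centre scale.

```python
import resource
import subprocess
import time

def run_and_measure(cmd):
    """Run a command as an opaque black box and report coarse runtime
    metrics gathered from the OS, not from the application itself."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "cpu_user_s": after.ru_utime - before.ru_utime,
        "cpu_sys_s": after.ru_stime - before.ru_stime,
        "max_rss": after.ru_maxrss,  # KiB on Linux, bytes on macOS
    }

metrics = run_and_measure(["python3", "-c", "sum(range(10**6))"])
```

The transparency property the abstract claims corresponds to the monitored command needing no code changes: all figures come from OS accounting.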
Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges
If the last decade viewed computational services as a utility, then surely this decade has transformed computation into a commodity. Computation is now progressively integrated into physical networks in a seamless way that enables cyber-physical systems (CPS) and the Internet of Things (IoT) to meet their latency requirements. Similar to the concepts of ‘platform as a service’ or ‘software as a service’, both cloudlets and fog computing have found their own use cases. Edge devices (which we call end or user devices for disambiguation) play the role of personal computers, dedicated to a user and to a set of correlated applications. In this new scenario, the boundaries between the network node, the sensor, and the actuator are blurring, driven primarily by the computation power of IoT nodes such as single-board computers and smartphones. The larger volumes of data generated in this type of network need clever, scalable, and possibly decentralized computing solutions that can scale independently as required. Any node can be seen as part of a graph, with the capacity to serve as a computing node, a network router node, or both. Complex applications can be distributed over this graph or network of nodes to improve overall performance, such as the amount of data processed over time. In this paper, we identify this new computing paradigm, which we call Social Dispersed Computing, analyzing key themes in it, including a new outlook on its relation to agent-based applications. We architect this new paradigm by providing supportive application examples, including next-generation electrical energy distribution networks, next-generation mobility services for transportation, and applications for distributed analysis and identification of non-recurring traffic congestion in cities. The paper analyzes the existing computing paradigms (e.g., cloud, fog, edge, mobile edge, social, etc.), resolving the ambiguity of their definitions, and analyzes and discusses the relevant foundational software technologies, the remaining challenges, and research opportunities.
Garcia Valls, MS.; Dubey, A.; Botti, V. (2018). Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges. Journal of Systems Architecture. 91:83-102. https://doi.org/10.1016/j.sysarc.2018.05.007
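To make the graph-of-nodes idea concrete, here is a deliberately small sketch in which heterogeneous nodes advertise compute capacity and application tasks are placed greedily on the node with the most spare room. The node names, capacities, and greedy policy are our own illustration, not part of the paper; real dispersed-computing placement would also account for network routing, latency, and node churn.

```python
# Hypothetical capacity units advertised by each node in the graph.
nodes = {"gateway": 2, "phone": 1, "sbc": 3, "cloudlet": 8}

def place(tasks):
    """Greedily assign each (task, demand) pair to the node with the most
    spare capacity that can still fit it. Routing costs are ignored here."""
    spare = dict(nodes)
    placement = {}
    for task, demand in tasks:
        candidates = [n for n, s in spare.items() if s >= demand]
        if not candidates:
            raise RuntimeError(f"no node can host {task}")
        best = max(candidates, key=lambda n: spare[n])
        spare[best] -= demand
        placement[task] = best
    return placement

plan = place([("analytics", 4), ("sensing", 1), ("control", 2)])
```

Even this trivial policy shows the paradigm's key property: any node, from a smartphone to a cloudlet, can host computation when it has capacity to spare.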
lLTZVisor: a lightweight TrustZone-assisted hypervisor for low-end ARM devices
Master's dissertation in Industrial Electronics and Computer Engineering.
Virtualization is a well-established technology in the server and desktop space and has recently been spreading across different embedded industries. Facing multiple challenges brought by the advent of the Internet of Things (IoT) era, these industries are driven by a growing interest in consolidating and isolating multiple environments with mixed-criticality features, to address the complex IoT application landscape. While this holds for the majority of mid- to high-end embedded applications, few solutions have so far been proposed for low-end systems.
TrustZone technology, designed by ARM to improve security on its processors, has been well received in the embedded market. As such, the research community became active in exploring other TrustZone capabilities for isolation, such as its use as an alternative form of system virtualization. The lightweight TrustZone-assisted hypervisor (LTZVisor), which mainly targets the consolidation of mixed-criticality systems on the same hardware platform, is one design example that takes advantage of TrustZone technology on ARM application processors. With the recent introduction of this technology to the new generation of ARM microcontrollers, an opportunity arose to expand this breakthrough form of virtualization to low-end devices.
This work proposes the development of the lLTZVisor hypervisor, a refactored version of LTZVisor that aims to provide strong isolation on resource-constrained devices while achieving a low memory footprint, determinism and high efficiency. The key to this is the implementation of a minimal, reliable, secure and predictable virtualization layer, supported by the TrustZone technology present on the newest generation of ARM microcontrollers (Cortex-M23/33).
Infrastructural Security for Virtualized Grid Computing
The goal of the grid computing paradigm is to make computer power as easy to access as an electrical power grid. Unlike the power grid, the computer grid uses remote resources located at a service provider. Malicious users can abuse the provided resources, which not only affects their own systems but also those of the provider and others.
Resources are utilized in an environment where sensitive programs and data from competitors are processed on shared resources, creating again the potential for misuse. This is one of the main security issues, since in a business environment competitors distrust each other, and the fear of industrial espionage is always present. Currently, human trust is the strategy used to deal with these threats. The relationship between grid users and resource providers ranges from highly trusted to highly untrusted. This wide trust range exists because grid computing itself changed from a research topic with few users to a widely deployed product with early commercial adoption. Traditional open research communities have very low security requirements, while in contrast, business customers often operate on sensitive data that represents intellectual property; thus, their security demands are very high. In traditional grid computing, most users share the same resources concurrently. Consequently, information regarding other users and their jobs can usually be acquired quite easily: for example, a user can see which processes are running on another user's system. For business users this is unacceptable, since even the meta-data of their jobs is classified. As a consequence, most commercial customers are not convinced that their intellectual property in the form of software and data is protected in the grid.
This thesis proposes a novel infrastructural security solution that advances the concept of virtualized grid computing. The work started back in 2007 and led to the development of the XGE, a virtual grid management software. The XGE itself uses operating system virtualization to provide a virtualized landscape. Users' jobs are no longer executed in a shared manner; they are executed within special sandboxed environments. To satisfy the requirements of a traditional grid setup, the solution can be coupled with an installed scheduler and grid middleware on the grid head node. To protect the prominent grid head node, a novel dual-laned demilitarized zone is introduced to make attacks more difficult. In a traditional grid setup, the head node and the computing nodes are installed in the same network, so a successful attack could also endanger the user's software and data. While the zone complicates attacks, it is, like all security solutions, not perfect. Therefore, a network intrusion detection system is enhanced with grid-specific signatures. A novel software called Fence is introduced that supports end-to-end encryption, meaning that all data remains encrypted until it reaches its final destination. It transfers data securely between the user's computer, the head node and the nodes within the shielded, internal network. A lightweight kernel rootkit detection system ensures that only trusted kernel modules can be loaded; it is no longer possible to load untrusted modules such as kernel rootkits. Furthermore, a malware scanner for virtualized grids scans for signs of malware in all running virtual machines. Using virtual machine introspection, the scanner remains invisible to most types of malware and has full access to all system calls on the monitored system. To speed up detection, the load is distributed to multiple detection engines simultaneously.
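In the spirit of the kernel module allowlisting described above, the following sketch shows the core idea in miniature: a module image may load only if its cryptographic hash appears on a trusted list. The module contents and the API are fabricated for illustration; the thesis's detector operates at the kernel level, not in user-space Python.

```python
import hashlib

TRUSTED_HASHES = set()

def fingerprint(image: bytes) -> str:
    """Identify a module image by its SHA-256 digest."""
    return hashlib.sha256(image).hexdigest()

def trust(image: bytes) -> None:
    """Register a known-good module image on the allowlist."""
    TRUSTED_HASHES.add(fingerprint(image))

def may_load(image: bytes) -> bool:
    """A module may load only if its hash is on the allowlist; a rootkit
    that tampers with even one byte produces a different digest."""
    return fingerprint(image) in TRUSTED_HASHES

good_module = b"\x7fELF...known-good module image"
rootkit = b"\x7fELF...tampered module image"
trust(good_module)
```

The design choice is a default-deny policy: anything not explicitly trusted, including a previously trusted module that has been modified, is rejected.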
To enable multi-site service-oriented grid applications, the novel concept of public virtual nodes is presented. This is a virtualized grid node with a public IP address shielded by a set of dynamic firewalls. It is possible to create a set of connected, public nodes, either present on one or more remote grid sites. A special web service allows users to modify their own rule set in both directions and in a controlled manner.
The main contribution of this thesis is the presentation of solutions that improve the security of grid computing infrastructures. This includes the XGE, a software that transforms a traditional grid into a virtualized grid. Design and implementation details, including experimental evaluations, are given for all approaches. Nearly all parts of the software are available as open source software. A summary of the contributions and an outlook on future work conclude this thesis.
An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments
This paper proposes cloud-based machine learning tools for enhanced Big Data applications, where the main idea is that of predicting the ‘next’ workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of different well-known classifiers in order to enhance the overall accuracy of the final classification, which is highly relevant in the specific context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method proposes deploying the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent classical ‘commodity’ settings for Cloud-based big data applications. Given a number of known reference workloads and an unknown workload, in this paper we deal with the problem of finding the reference workload which is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We name this problem coarse-grained workload classification because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU-, memory-, disk-, or network-intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given applicative environment. In particular, we focus our attention on the classification problem described above in the special case represented by virtualized environments. Today, Virtual Machines (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms.
In virtualization frameworks, workload classification is useful for accounting, security, and user profiling; hence our research makes particular sense in such environments, and it turns out to be very useful in the emerging context of Cloud Computing. In this respect, our approach consists of running several machine learning-based classifiers over different workload models, and then deriving the best classifier produced by Dempster-Shafer Fusion, in order to magnify the accuracy of the final classification. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs which produce the unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification: different types of metrics are fused together using the Dempster-Shafer theory of evidence combination, giving a high classification accuracy. The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Various classical algorithms have been used to classify the workloads, and the results are compared.
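The Dempster-Shafer combination step can be sketched as follows: each classifier contributes a mass function over sets of candidate reference workloads, and Dempster's rule fuses them, renormalising away the conflicting mass. The two example mass functions below are invented for illustration; the paper's actual classifiers, metrics, and workload categories differ.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets of hypotheses."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:  # compatible evidence: mass flows to the intersection
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:      # disjoint focal elements: conflicting mass
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are irreconcilable")
    # renormalise so the surviving masses again sum to one
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# Two hypothetical classifiers voting over reference workloads {cpu, io}:
cpu, io = frozenset({"cpu"}), frozenset({"io"})
both = cpu | io          # 'both' expresses ignorance between the two
m_a = {cpu: 0.6, both: 0.4}
m_b = {cpu: 0.5, io: 0.3, both: 0.2}
fused = dempster_combine(m_a, m_b)
```

Because both sources lean towards the CPU-bound workload, the fused mass on `cpu` exceeds either source's individual belief, which is exactly the accuracy-magnifying effect the abstract describes.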
Mesure et analyse de latences dans les systèmes parallèles en temps réel
Today's server infrastructures are more and more organized around the cloud and virtualization technologies, and the services that run on these infrastructures tend to heavily use parallelisation to scale up to the demand. With this type of distributed systems, the performance analyses are becoming increasingly complex. Indeed, the work required to answer a single request can be divided among multiple servers, and a problem with any of the nodes can slow down the whole request. Finding the exact source of an abnormal latency in this kind of configuration can be really difficult and requires a lot of time. The problems we encounter are not new, they are similar to the ones faced by real-time systems. The biggest issue is to automatically detect in real-time these problems in production, and to have a scalable way to collect the context information required to understand and solve the problems.
In this thesis, we propose the \texttt{latency-tracker} as a solution to efficiently measure and analyse latency problems in real-time, and to combine it with the \texttt{LTTng} tracer to gather and extract traces locally and on the network. The main objective is to make these complex analyses efficient enough to run on production machines, either servers in data-centers, or embedded platforms dedicated to real-time tasks. This approach to detect and explain latency issues in the order of tens of microseconds in the Linux kernel is new and there is no equivalent solution today.
By individually measuring the impact of all the components added in the critical path of the applications, we demonstrate that it is possible to use this approach in very demanding environments. We measure the impact on the usage of resources, down to the impact on cache lines, but we also study the scalability of our approach on highly concurrent and distributed applications.
The main contribution of this research is the set of algorithms developed to accurately measure latencies with minimal impact, and to collect and extract enough context information to understand the latency causes. This low impact enables the use of these methodologies in production, under real loads, which would be impossible with today's existing tools without risking modification of the execution conditions. We specialize and optimize the current techniques related to event aggregation, and combine them with tracing to create the new domain of stateful tracing.
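A toy user-space analogue of this stateful-tracing idea: pair entry and exit events per key, keep only aggregated state (a power-of-two latency histogram), and flag outliers the moment they complete rather than post-processing a full trace. The class name, threshold, and bucketing scheme are our own; the actual latency-tracker works inside the Linux kernel at microsecond scales.

```python
import time
from collections import defaultdict

class LatencyTracker:
    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.pending = {}                  # key -> entry timestamp
        self.histogram = defaultdict(int)  # log2(latency in us) -> count
        self.outliers = []                 # (key, latency) above threshold

    def entry(self, key):
        self.pending[key] = time.monotonic()

    def exit(self, key):
        start = self.pending.pop(key, None)
        if start is None:
            return  # exit without a matching entry: ignore
        latency = time.monotonic() - start
        # aggregate into a compact power-of-two histogram, not a full trace
        self.histogram[max(int(latency * 1e6), 1).bit_length()] += 1
        if latency > self.threshold_s:
            self.outliers.append((key, latency))  # flag immediately

tracker = LatencyTracker(threshold_s=0.01)
for i in range(1000):        # fast operations stay out of the outlier list
    tracker.entry(i)
    tracker.exit(i)
tracker.entry("slow")
time.sleep(0.05)             # one deliberately slow operation
tracker.exit("slow")
```

Keeping only the histogram and the outlier list is what makes the approach cheap enough for production: memory use is bounded regardless of event rate, and detection happens in real time.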
Provenance-based computing
Relying on computing systems that become increasingly complex is difficult:
with many factors potentially affecting the result of a computation or its
properties, understanding where problems appear and fixing them is a
challenging proposition. Typically, the process of finding solutions is driven
by trial and error or by experience-based insights.
In this dissertation, I examine the idea of using provenance metadata (the set
of elements that have contributed to the existence of a piece of data, together
with their relationships) instead. I show that considering provenance a
primitive of computation enables the exploration of system behaviour, targeting
both retrospective analysis (root cause analysis, performance tuning) and
hypothetical scenarios (what-if questions). In this context, provenance can be
used as part of feedback loops, with a double purpose: building software that
is able to adapt for meeting certain quality and performance targets
(semi-automated tuning) and enabling human operators to exert high-level
runtime control with limited previous knowledge of a system's internal architecture.
My contributions towards this goal are threefold: providing low-level
mechanisms for meaningful provenance collection considering OS-level resource
multiplexing, proving that such provenance data can be used in inferences about
application behaviour and generalising this to a set of primitives necessary for
fine-grained provenance disclosure in a wider context.
To derive such primitives in a bottom-up manner, I first present Resourceful, a
framework that enables capturing OS-level measurements in the context of
application activities. It is the contextualisation that allows tying the
measurements to provenance in a meaningful way, and I look at a number of
use-cases in understanding application performance. This also provides a good
setup for evaluating the impact and overheads of fine-grained provenance
collection.
I then show that the collected data enables new ways of understanding
performance variation by attributing it to specific components within a
system. The resulting set of tools, Soroban, gives developers and operation
engineers a principled way of examining the impact of various configuration, OS and virtualization parameters on application behaviour.
Finally, I consider how this supports the idea that provenance should be
disclosed at application level and discuss why such disclosure is necessary for
enabling the use of collected metadata efficiently and at a granularity which
is meaningful in relation to application semantics.
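A minimal sketch of what treating provenance as a primitive of computation can mean: every derived value records the inputs that produced it, so the lineage of a result can be walked back after the fact. The `Tracked`/`combine` API is hypothetical and far simpler than OS-level provenance capture, but it shows the metadata structure that retrospective analysis relies on.

```python
class Tracked:
    """A value annotated with the Tracked inputs it was derived from."""
    def __init__(self, value, sources=()):
        self.value = value
        self.sources = tuple(sources)

def combine(fn, *inputs):
    """Apply fn to the plain values, recording the inputs as provenance."""
    return Tracked(fn(*(i.value for i in inputs)), sources=inputs)

def lineage(t):
    """Walk provenance edges back to the original (source-less) inputs."""
    if not t.sources:
        return {id(t)}
    out = set()
    for s in t.sources:
        out |= lineage(s)
    return out

a, b, c = Tracked(2), Tracked(3), Tracked(10)
total = combine(lambda x, y: x + y, a, b)       # 2 + 3
scaled = combine(lambda x, y: x * y, total, c)  # 5 * 10
```

Here a root-cause question ("which inputs does `scaled` depend on?") is answered by traversing recorded metadata rather than by re-running the computation, which is the essence of the retrospective analyses described above.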
Observing the clouds : a survey and taxonomy of cloud monitoring
This research was supported by a Royal Society Industry Fellowship and an Amazon Web Services (AWS) grant. Date of Acceptance: 10/12/2014.
Monitoring is an important aspect of designing and maintaining large-scale systems. Cloud computing presents a unique set of challenges to monitoring, including on-demand infrastructure, unprecedented scalability, rapid elasticity and performance uncertainty. There is a wide range of monitoring tools originating from cluster and high-performance computing, grid computing and enterprise computing, as well as a series of newer bespoke tools designed exclusively for cloud monitoring. These tools exhibit a number of common elements and designs, which address the demands of cloud monitoring to various degrees. This paper performs an exhaustive survey of contemporary monitoring tools, from which we derive a taxonomy that examines how effectively existing tools and designs meet the challenges of cloud monitoring. We conclude by examining the socio-technical aspects of monitoring, and investigate the engineering challenges and practices behind implementing monitoring strategies for cloud computing.