98 research outputs found

    Managing Dynamic Enterprise and Urgent Workloads on Clouds Using Layered Queuing and Historical Performance Models

    The automatic allocation of enterprise workload to resources can be enhanced by making what-if response time predictions while different allocations are being considered. We experimentally investigate a historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using these models, we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: (i) comparatively evaluate the layered queuing and historical techniques; (ii) evaluate the effectiveness of the management algorithm in different operating scenarios; and (iii) provide guidance on using prediction-based workload and resource management.
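The idea of a what-if response time prediction can be illustrated with a much simpler model than the ones this abstract evaluates. The sketch below assumes a single M/M/1 queue per server and an even split of load across servers; it is not a layered queuing model and not the authors' algorithm, and the function names `mm1_response_time` and `smallest_allocation` are hypothetical.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        # Saturated server: the queue grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def smallest_allocation(arrival_rate, per_server_rate, max_servers, target):
    """Try candidate allocations and return the first one that meets the target.

    Assumption: requests are split evenly, so each of `servers` machines
    sees arrival_rate / servers requests per second.
    """
    for servers in range(1, max_servers + 1):
        predicted = mm1_response_time(arrival_rate / servers, per_server_rate)
        if predicted <= target:
            return servers, predicted
    return None  # no candidate allocation meets the target


if __name__ == "__main__":
    # 90 requests/s, each server handles 50 requests/s, target 0.1 s.
    print(smallest_allocation(90.0, 50.0, 5, 0.1))
```

Each candidate allocation is scored before any resources are moved, which is the essence of a what-if prediction; a historical model would replace the formula with a lookup over past measurements.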

    Towards effective dynamic resource allocation for enterprise applications

    The growing use of online services requires substantial supporting infrastructure. The efficient deployment of applications relies on the cost effectiveness of commercial hosting providers, who deliver an agreed quality of service, as governed by a service level agreement, for a fee. The priorities of the commercial hosting provider are to maximise revenue, by delivering agreed service levels, and to minimise costs, through high resource utilisation. In order to deliver high service levels and resource utilisation, it may be necessary to reorganise resources during periods of high demand. This reorganisation process may be manual or controlled by an autonomous process governed by a dynamic resource allocation algorithm. Dynamic resource allocation has been shown to improve service levels and utilisation and hence profitability. In this thesis several facets of dynamic resource allocation are examined to assess its suitability for the modern data centre. Firstly, three theoretically derived policies are implemented as middleware for a modern multi-tier Web application and their performance is examined under a range of workloads in a real-world test bed. The scalability of state-of-the-art resource allocation policies is explored in two dimensions, namely the number of applications and the number of servers under the control of the resource allocation policy. The results demonstrate that current policies presented in the literature scale poorly in one or both of these dimensions. A new policy is proposed which has significantly improved scalability characteristics, and the new policy is demonstrated at scale through simulation. The placement of applications across a data centre makes them susceptible to failures in shared infrastructure. To address this issue an application placement mechanism is developed to augment any dynamic resource allocation policy. The results of this placement mechanism demonstrate a significant improvement in the worst case when compared to a random allocation mechanism. A model for the reallocation of resources in a dynamic resource allocation system is also devised. The model demonstrates that the assumption of a constant resource reallocation cost is invalid under both physical reallocation and migration of virtualised resources.

    Fault-Tolerant, Scalable and Interoperable IoT Platform

    Master’s thesis, Informatics Engineering (Software Engineering), Universidade de Lisboa, Faculdade de Ciências, 2020. The growth of Internet usage is now highly visible. Every day the number of devices connected to the Internet increases, and almost anything may be a smart device capable of interacting with the Internet: smartphones, smartwatches, refrigerators and much more. All of these devices are called things in the Internet of Things (IoT). Many of them are constrained devices because of their size: they are usually very small, with low capacities such as memory and/or processing power. These kinds of devices need to be very efficient in all of their activities. For example, battery lifetime should be maximized so that the need to change each device’s battery is minimized. There are many technologies that allow communication between devices. Besides these technologies, protocols may be involved in the communication between the devices in an IoT system. Communication protocols define the behaviour that things follow when communicating with each other. For example, in some protocols acknowledgments must be used to ensure data arrival, while in others this feature is not enforced. There are many communication protocols available in the literature. The use of communication protocols and communication models brings many benefits to IoT systems, but these systems may also benefit from using the cloud. One of the biggest struggles in IoT is that things are very constrained in terms of resources (CPU and RAM). With the cloud this would no longer be an issue. In addition, the cloud is capable of providing device management, scalability, storage and real-time transmission. The characteristics of the communication protocols were studied and an innovative system architecture based on micro-services, Kubernetes and Kafka is proposed in this thesis.
This proposal tries to address issues such as scalability, interoperability, fault tolerance, resiliency, availability and simple management of large IoT systems. The proposed architecture is supported by Kubernetes, an open-source technology that allows micro-services to be extensible, configurable and automatically managed with fault tolerance, and by Kafka, a distributed event log that uses the publish-subscribe pattern. With this support, the architecture is able to deal with a high number of devices producing and consuming data at the same time. The proposed Fault-Tolerant and Interoperable IoT Architecture is a cluster composed of many components (micro-services) that were implemented using Docker containers. The current implementation of the system supports the MQTT, CoAP and REST protocols for incoming data and the same protocols plus WebSockets for outgoing data. Since the system is based on micro-services, more protocols may be added in a simple way (only a new micro-service must be added). The system is able to convert any protocol into another protocol; e.g., if a message arrives at the system through the MQTT protocol, it can be consumed using the CoAP or REST protocol. When messages are sent to the system, the payload is stored in Kafka independently of the protocol, and when clients request it, it is consumed from Kafka and encapsulated in the client’s protocol to be sent to the client. In order to evaluate and demonstrate the capabilities of our proposal, a set of experiments was carried out to collect information about the performance of the communication protocols, the system as a whole, Kubernetes and Kafka. From the experiments we were able to conclude that message size is not very important, since the system is able to deal with messages from 39 bytes to 2000 bytes. Since we are designing the system for IoT applications, we considered messages of 2000 bytes to be big messages.
Also, it was observed that the system is able to recover from crashed nodes and to respond well in terms of average delay and packet loss when low and high throughput are compared. In this situation, there is a significant impact on RAM usage, but the system still works without problems. In terms of scalability, the evaluation of the system through its underlying cluster platform (Kubernetes) allowed us to understand that there is no direct relation between the time spent to [...] constant. However, the same conclusion does not hold for the number of instances that are needed at the higher layer (application layer). Here, the time spent to increase the number of instances of a specific application is directly proportional to the number of instances that are already running. With respect to data redundancy and persistence, the experiments showed that the average delay and packet loss of a message sent from a Producer to a Receiver are approximately the same regardless of the number of Kafka instances being used. Additionally, using a high number of partitions has a negative impact on the system’s behaviour.
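The protocol conversion described in this abstract (store the raw payload once, wrap it in the consumer's protocol on the way out) can be sketched as follows. This is only an illustration under stated assumptions: `PayloadLog` is an in-memory stand-in for a Kafka topic and does not use the Kafka client API, and the encoders `encode_rest` and `encode_mqtt` are hypothetical simplifications of the real wire formats.

```python
import json
from collections import defaultdict


class PayloadLog:
    """In-memory stand-in for a Kafka topic log (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._topics = defaultdict(list)

    def append(self, topic: str, payload: bytes) -> int:
        """Store the raw payload, whatever protocol delivered it; return its offset."""
        self._topics[topic].append(payload)
        return len(self._topics[topic]) - 1

    def read(self, topic: str, offset: int) -> bytes:
        return self._topics[topic][offset]


def encode_rest(topic: str, payload: bytes) -> str:
    """Hypothetical REST encoding: a JSON response body."""
    return json.dumps({"topic": topic, "data": payload.decode("utf-8")})


def encode_mqtt(topic: str, payload: bytes) -> dict:
    """Hypothetical MQTT encoding: a dict standing in for a PUBLISH packet."""
    return {"type": "PUBLISH", "topic": topic, "payload": payload}


ENCODERS = {"rest": encode_rest, "mqtt": encode_mqtt}


def consume(log: PayloadLog, topic: str, offset: int, protocol: str):
    """Read the stored payload and wrap it in the requesting client's protocol."""
    return ENCODERS[protocol](topic, log.read(topic, offset))


if __name__ == "__main__":
    log = PayloadLog()
    offset = log.append("sensors/temp", b"21.5")  # e.g. arrived over MQTT
    print(consume(log, "sensors/temp", offset, "rest"))  # read back over REST
```

Because the log holds only the payload, supporting another protocol in this sketch means adding one entry to `ENCODERS`, which mirrors the abstract's point that a new protocol requires only a new micro-service.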

    Investigating into Cloud Resource Management Mechanisms

    Driven by the rapid growth in demand for efficient and economical computational power, cloud computing has led the world into a new era. It delivers computing resources as services, whereby shared resources are provided to cloud users over the network in order to offer dynamic, flexible resource provisioning for reliable and guaranteed services using a pay-as-you-use pricing model. Since multiple cloud users can request cloud resources simultaneously, cloud resource management mechanisms must operate efficiently to satisfy the demand of cloud users. Therefore, investigating cloud resource management mechanisms to achieve cloud resource efficiency is one of the key elements that benefit both cloud providers and users. In this thesis, we present cloud resource management mechanisms for two different cloud infrastructures, i.e. virtual machine-based (VM-based) and application-based infrastructure. The VM-based infrastructure provides multi-tenancy for cloud users at VM level, i.e. each cloud user directly controls their VMs in the cloud environment. The application-based infrastructure provides multi-tenancy at application level; in other words, each cloud user directly controls their applications in the cloud environment. For the VM-based infrastructure, we introduce two heuristic metrics to capture multi-dimensional characteristics of logical machines. By using a multivariate probabilistic model, we develop an algorithm to improve resource utilisation for the VM-based infrastructure. We then design and implement an application-based infrastructure called the Elastic Application Container system (EAC system) to support multi-tenant cloud use. Based on the characteristics of the application-based and the VM-based infrastructure, we develop auto-scaling algorithms that can automatically scale cloud resources in the EAC system.
In general, the cloud resource management mechanisms proposed in this thesis aim to investigate resource management for cloud resource utilisation in the VM-based infrastructure and to provide suitable cloud resource provisioning mechanisms for the application-based infrastructure.

    Service level agreement specification for IoT application workflow activity deployment, configuration and monitoring

    PhD Thesis. Currently, we see the use of the Internet of Things (IoT) within various domains such as healthcare, smart homes, smart cars, smart-x applications, and smart cities. The number of applications based on IoT and cloud computing is projected to increase rapidly over the next few years. IoT-based services must meet guaranteed levels of quality of service (QoS) to match users’ expectations. Ensuring QoS by specifying the QoS constraints using service level agreements (SLAs) is crucial. Also, because of the potentially highly complex nature of multi-layered IoT applications, lifecycle management (deployment, dynamic reconfiguration, and monitoring) needs to be automated. To achieve this it is essential to be able to specify SLAs in a machine-readable format. Currently available SLA specification languages are unable to accommodate the unique characteristics (the interdependency of its multiple layers) of the IoT domain. Therefore, in this research, we propose a grammar for the syntactical structure of an SLA specification for IoT. The grammar is based on a proposed conceptual model that considers the main concepts that can be used to express the requirements for the most common hardware and software components of an IoT application on an end-to-end basis. We follow the Goal Question Metric (GQM) approach to evaluate the generality and expressiveness of the proposed grammar by reviewing its concepts and their predefined lists of vocabularies against two use-cases with a number of participants whose research interests are mainly related to IoT. The results of the analysis show that the proposed grammar achieved 91.70% of its generality goal and 93.43% of its expressiveness goal. To enhance the process of specifying SLA terms, we then developed a toolkit for creating SLA specifications for IoT applications. The toolkit is used to simplify the process of capturing the requirements of IoT applications.
We demonstrate the effectiveness of the toolkit using a remote health monitoring service (RHMS) use-case and evaluate the tool with a user experience measure based on a questionnaire-oriented approach. We discuss the applicability of our tool by including it as a core component of two different applications: 1) a context-aware recommender system for IoT configuration across layers; and 2) a tool for automatically translating an SLA from JSON to a smart contract, deploying it on different peer nodes that represent the contractual parties. The smart contract is able to monitor the created SLA using Blockchain technology. These two applications are utilized within our proposed SLA management framework for IoT. Furthermore, we propose a greedy heuristic algorithm to decentralize workflow activities of an IoT application across Edge and Cloud resources to improve response time, cost, energy consumption and network usage. We evaluated the efficiency of our proposed approach using the iFogSim simulator. The performance analysis shows that the proposed algorithm reduced cost, execution time, network usage, and Cloud energy consumption compared to Cloud-only and edge-ward placement approaches.

    Cost-effective resource management for distributed computing

    Current distributed computing and resource management infrastructures (e.g., Cluster and Grid) suffer from a wide variety of problems related to resource management, which include scalability bottleneck, resource allocation delay, limited quality-of-service (QoS) support, and lack of cost-aware and service level agreement (SLA) mechanisms. This thesis addresses these issues by presenting a cost-effective resource management solution which introduces the possibility of managing geographically distributed resources in resource units that are under the control of a Virtual Authority (VA). A VA is a collection of resources controlled, but not necessarily owned, by a group of users or an authority representing a group of users. It leverages the fact that different resources in disparate locations will have varying usage levels. By creating smaller divisions of resources called VAs, users would be given the opportunity to choose between a variety of cost models, and each VA could rent resources from resource providers when necessary, or could potentially rent out its own resources when underloaded. The resource management is simplified since the user and owner of a resource recognize only the VA because all permissions and charges are associated directly with the VA. The VA is controlled by a ’rental’ policy which is supported by a pool of resources that the system may rent from external resource providers. As far as scheduling is concerned, the VA is independent from competitors and can instead concentrate on managing its own resources. As a result, the VA offers scalable resource management with minimal infrastructure and operating costs. We demonstrate the feasibility of the VA through both a practical implementation of the prototype system and an illustration of its quantitative advantages through the use of extensive simulations. First, the VA concept is demonstrated through a practical implementation of the prototype system. 
Further, we perform a cost-benefit analysis of current distributed resource infrastructures to demonstrate the potential cost benefit of such a VA system. We then propose a costing model for evaluating the cost effectiveness of the VA approach by using an economic approach that captures revenues generated from applications and expenses incurred from renting resources. Based on our costing methodology, we present rental policies that can potentially offer effective mechanisms for running distributed and parallel applications without a heavy upfront investment and without the cost of maintaining idle resources. By using real workload trace data, we test the effectiveness of our proposed rental approaches. Finally, we propose an extension to the VA framework that promotes long-term negotiations and rentals based on service level agreements or long-term contracts. Based on the extended framework, we present new SLA-aware policies and evaluate them using real workload traces to demonstrate their effectiveness in improving rental decisions

    Security and trust in cloud computing and IoT through applying obfuscation, diversification, and trusted computing technologies

    Cloud computing and Internet of Things (IoT) are very widely spread and commonly used technologies nowadays. The advanced services offered by cloud computing have made it a highly demanded technology. Enterprises and businesses are more and more relying on the cloud to deliver services to their customers. The prevalent use of cloud means that more data is stored outside the organization’s premises, which raises concerns about the security and privacy of the stored and processed data. This highlights the significance of effective security practices to secure the cloud infrastructure. The number of IoT devices is growing rapidly and the technology is being employed in a wide range of sectors including smart healthcare, industry automation, and smart environments. These devices collect and exchange a great deal of information, some of which may contain critical and personal data of the users of the device. Hence, it is highly significant to protect the collected and shared data over the network; notwithstanding, the studies signify that attacks on these devices are increasing, while a high percentage of IoT devices lack proper security measures to protect the devices, the data, and the privacy of the users. In this dissertation, we study the security of cloud computing and IoT and propose software-based security approaches supported by the hardware-based technologies to provide robust measures for enhancing the security of these environments. To achieve this goal, we use obfuscation and diversification as the potential software security techniques. Code obfuscation protects the software from malicious reverse engineering and diversification mitigates the risk of large-scale exploits. We study trusted computing and Trusted Execution Environments (TEE) as the hardware-based security solutions. Trusted Platform Module (TPM) provides security and trust through a hardware root of trust, and assures the integrity of a platform. 
We also study Intel SGX, a TEE solution that guarantees the integrity and confidentiality of the code and data loaded into its protected container, the enclave. More precisely, through obfuscation and diversification of the operating systems and APIs of the IoT devices, we secure them at the application level, and by obfuscation and diversification of the communication protocols, we protect the communication of data between them at the network level. To secure cloud computing, we employ obfuscation and diversification techniques on the client-side cloud computing software. For an enhanced level of security, we employ hardware-based security solutions, TPM and SGX. These solutions, in addition to security, ensure layered trust across layers from the hardware to the application. As the result of this PhD research, this dissertation addresses a number of security risks targeting IoT and cloud computing through the delivered publications and presents a brief outlook on future research directions.

    Artificial intelligence driven anomaly detection for big data systems

    The main goal of this thesis is to contribute to the research on automated performance anomaly detection and interference prediction by implementing Artificial Intelligence (AI) solutions for complex distributed systems, especially for Big Data platforms within cloud computing environments. The late detection and manual resolutions of performance anomalies and system interference in Big Data systems may lead to performance violations and financial penalties. Motivated by this issue, we propose AI-based methodologies for anomaly detection and interference prediction tailored to Big Data and containerized batch platforms to better analyze system performance and effectively utilize computing resources within cloud environments. Therefore, new precise and efficient performance management methods are the key to handling performance anomalies and interference impacts to improve the efficiency of data center resources. The first part of this thesis contributes to performance anomaly detection for in-memory Big Data platforms. We examine the performance of Big Data platforms and justify our choice of selecting the in-memory Apache Spark platform. An artificial neural network-driven methodology is proposed to detect and classify performance anomalies for batch workloads based on the RDD characteristics and operating system monitoring metrics. Our method is evaluated against other popular machine learning algorithms (ML), as well as against four different monitoring datasets. The results prove that our proposed method outperforms other ML methods, typically achieving 98–99% F-scores. Moreover, we prove that a random start instant, a random duration, and overlapped anomalies do not significantly impact the performance of our proposed methodology. The second contribution addresses the challenge of anomaly identification within an in-memory streaming Big Data platform by investigating agile hybrid learning techniques. 
We develop TRACK (neural neTwoRk Anomaly deteCtion in sparK) and TRACK-Plus, two methods to efficiently train a class of machine learning models for performance anomaly detection using a fixed number of experiments. Our model revolves around using artificial neural networks with Bayesian Optimization (BO) to find the optimal training dataset size and configuration parameters, so that the anomaly detection model is trained efficiently and achieves high accuracy. The objective is to accelerate the search for the size of the training dataset, optimize neural network configurations, and improve the performance of anomaly classification. A validation based on several datasets from a real Apache Spark Streaming system is performed, demonstrating that the proposed methodology can efficiently identify performance anomalies, near-optimal configuration parameters, and a near-optimal training dataset size while reducing the number of experiments by up to 75% compared with naïve anomaly detection training. The last contribution overcomes the challenges of predicting the completion time of containerized batch jobs and proactively avoiding performance interference by introducing an automated prediction solution to estimate interference among colocated batch jobs within the same computing environment. An AI-driven model is implemented to predict the interference among batch jobs before it occurs within the system. Our interference detection model can estimate, and so help alleviate, the task slowdown caused by interference. This model assists system operators in making accurate decisions to optimize job placement. Our model is agnostic to the business logic internal to each job. Instead, it is learned from system performance data by applying artificial neural networks to predict the completion time of batch jobs within cloud environments.
We compare our model with three other baseline models (queueing-theoretic model, operational analysis, and an empirical method) on historical measurements of job completion time and CPU run-queue size (i.e., the number of active threads in the system). The proposed model captures multithreading, operating system scheduling, sleeping time, and job priorities. A validation of 4500 experiments based on the DaCapo benchmarking suite was carried out, confirming the predictive efficiency and capabilities of the proposed model, which achieved up to 10% MAPE compared with the other models.
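For readers unfamiliar with the MAPE figure quoted above, the metric is the mean absolute percentage error between measured and predicted values. The following is a minimal sketch of the standard definition; the function name `mape` and the sample completion times are illustrative assumptions, not data from the thesis.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    MAPE = (100 / n) * sum(|(a_i - p_i) / a_i|)
    Every actual value must be non-zero, otherwise the ratio is undefined.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Illustrative job completion times in seconds (made-up numbers).
    measured = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(mape(measured, forecast))
```

A lower MAPE means the predicted completion times track the measured ones more closely.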