
    End-to-End Trust Fulfillment of Big Data Workflow Provisioning over Competing Clouds

    Cloud Computing has emerged as a promising and powerful paradigm for delivering data-intensive, high-performance computation, applications, and services over the Internet. Cloud Computing has enabled the implementation and success of Big Data, a relatively recent phenomenon consisting of the generation and analysis of abundant data from various sources. Accordingly, to satisfy the growing demands of Big Data storage, processing, and analytics, a large market has emerged for Cloud Service Providers, offering a myriad of resources, platforms, and infrastructures. The proliferation of these services often makes it difficult for consumers to select the most suitable and trustworthy provider to fulfill the requirements of building complex workflows and applications in a relatively short time. In this thesis, we first propose a quality specification model to support dual pre- and post-cloud workflow provisioning, consisting of service provider selection and workflow quality enforcement and adaptation. This model captures key quality properties at different stages of the Big Data value chain, enabling standardized quality specification, monitoring, and adaptation. Subsequently, we propose a two-dimensional trust-enabled framework to facilitate end-to-end Quality of Service (QoS) enforcement that: 1) automates cloud service provider selection for Big Data workflow processing, and 2) maintains the required QoS levels of Big Data workflows during runtime through dynamic orchestration using multi-model architecture-driven workflow monitoring, prediction, and adaptation. The trust-based automatic service provider selection scheme we propose in this thesis is comprehensive and adaptive, as it relies on a dynamic trust model to evaluate the QoS of a cloud provider before taking any selection decision. It is a multi-dimensional trust model for Big Data workflows over competing clouds that assesses the trustworthiness of cloud providers based on three trust levels: (1) the presence of the most up-to-date verified cloud resource capabilities, (2) reputational evidence measured by neighboring users, and (3) a recorded personal history of experiences with the cloud provider. The trust-based workflow orchestration scheme we propose aims to avoid performance degradation or cloud service interruption. Our workflow orchestration approach is based not only on automatic adaptation and reconfiguration supported by monitoring, but also on predicting cloud resource shortages, thus preventing performance degradation. We formalize the cloud resource orchestration process using a state machine that efficiently captures the different dynamic properties of the cloud execution environment. In addition, we use a model checker to validate our monitoring model in terms of reachability, liveness, and safety properties. We evaluate both our automated service provider selection scheme and our cloud workflow orchestration, monitoring, and adaptation schemes on a workflow-enabled Big Data application. A set of scenarios was carefully chosen to evaluate the performance of the service provider selection, workflow monitoring, and adaptation schemes we have implemented. The results demonstrate that our service selection scheme outperforms other selection strategies and ensures trustworthy service provider selection. The results of evaluating automated workflow orchestration further show that our model is self-adapting and self-configuring, reacting efficiently to changes and adapting accordingly while enforcing the QoS of workflows.
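    The abstract does not spell out how the three trust levels are combined, so a minimal sketch may help fix ideas. Assuming each level is a normalised score and the aggregation is a simple weighted blend (the weights, field names, and the linear form are illustrative assumptions, not the thesis's dynamic trust model), provider selection reduces to picking the candidate with the highest aggregate trust:

```python
# Minimal sketch: combine the three trust levels named in the abstract.
# The linear aggregation and its weights are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class ProviderEvidence:
    capability_score: float  # verified, up-to-date resource capabilities (0..1)
    reputation_score: float  # evidence measured by neighboring users (0..1)
    history_score: float     # our own recorded history with the provider (0..1)

def trust(e: ProviderEvidence, w=(0.4, 0.3, 0.3)) -> float:
    """Weighted blend of the three trust levels (weights assumed)."""
    return w[0] * e.capability_score + w[1] * e.reputation_score + w[2] * e.history_score

def select_provider(candidates: dict) -> str:
    """Pick the provider with the highest aggregate trust score."""
    return max(candidates, key=lambda name: trust(candidates[name]))

providers = {
    "cloud-a": ProviderEvidence(0.9, 0.7, 0.8),  # trust = 0.81
    "cloud-b": ProviderEvidence(0.8, 0.9, 0.6),  # trust = 0.77
}
print(select_provider(providers))  # -> cloud-a
```

    A static blend like this is only the simplest instance of the idea; the model described above is dynamic, re-evaluating the evidence before each selection decision.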

    A transfer learning-aided decision support system for multi-cloud brokers

    Decision-making in a cloud environment is a formidable task due to the proliferation of service offerings, pricing models, and technology standards. A customer entering the diverse cloud market is likely to be overwhelmed with a host of difficult choices in terms of service selection. This applies to all levels of service, but the Infrastructure as a Service (IaaS) level is particularly important for the end user, given that IaaS provides more choices and control for application developers. In the IaaS domain, however, there is no straightforward method to compare virtual machine performance and, more generally, cost/performance trade-offs, within or across cloud providers. A wrong decision can result in financial loss as well as reduced application performance. A cloud broker can help resolve such issues by acting as an intermediary between the cloud provider and the cloud consumer – hence serving as a decision support system that assists the customer in the decision process. In this thesis, we exploit machine learning to build an intelligent decision support system which assists customers in making application-driven decisions in a multi-cloud environment. The thesis examines a representative set of inference- and prediction-based learning techniques that are essential for capturing application behaviour on different deployment setups, such as Polynomial Regression and Support Vector Regression (SVR). In addition, the thesis examines the efficiency of the learning techniques, recognising that machine learning can impose significant training overhead, and introduces a novel transfer learning-aided technique that leads to a substantial reduction in this overhead. By definition, transfer learning aims to solve a new problem faster, or with a better solution, by using previously learned knowledge. Quantitatively, we observed a reduction of approximately 60% in learning time and cost by transferring existing knowledge about an application and cloud platform in order to learn a new prediction model for another application or cloud provider. Intensive experimentation has been performed in this study for the learning and evaluation of the proposed decision support system. Specifically, we have used three different representative applications over two cloud providers, namely Amazon and Google. Our proposed decision support system, enriched with transfer learning methods, is capable of generating decisions that are viable across different applications in a multi-cloud environment. Finally, we also discuss lessons learned in terms of architectural principles and techniques for intelligent multi-cloud brokerage.
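    The abstract reports roughly a 60% reduction in learning time and cost from transfer without detailing the mechanism. One simple scheme consistent with that description is residual transfer: keep the SVR model trained on the source provider and fit only a small correction from a handful of samples on the target provider. The residual approach, synthetic data, and hyperparameters below are assumptions, not the thesis's exact method:

```python
# Sketch of residual transfer for cross-cloud performance prediction.
# SVR matches the techniques the abstract names; the rest is assumed.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Source provider: plenty of (vCPU count -> runtime) measurements.
X_src = rng.uniform(1, 16, size=(200, 1))
y_src = 100 / X_src.ravel() + rng.normal(0, 0.5, 200)
source_model = SVR(kernel="rbf", C=10).fit(X_src, y_src)

# Target provider: only a few measurements; runtimes shifted by ~20%.
X_tgt = rng.uniform(1, 16, size=(10, 1))
y_tgt = 1.2 * (100 / X_tgt.ravel()) + rng.normal(0, 0.5, 10)

# Learn only the residual between the source model and the target data,
# instead of training a full model from scratch on the new provider.
residual_model = SVR(kernel="rbf", C=10).fit(
    X_tgt, y_tgt - source_model.predict(X_tgt))

def predict_target(X):
    return source_model.predict(X) + residual_model.predict(X)

print(predict_target(np.array([[8.0]])))  # runtime estimate on the new provider
```

    The saving comes from needing only the ten target samples rather than another two hundred; whatever transfer mechanism the thesis actually uses, the economics are of this kind.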

    Security in Cloud Computing: Evaluation and Integration

    Over the past decade, the Cloud Computing paradigm has revolutionized the way we envision IT services. It has provided an opportunity to respond to the ever-increasing computing needs of users by introducing the notion of service and data outsourcing. Cloud consumers usually have online, on-demand access to a large and distributed IT infrastructure providing a plethora of services. They can dynamically configure and scale Cloud resources according to the requirements of their applications without becoming part of the Cloud infrastructure, which allows them to reduce their IT investment cost and achieve optimal resource utilization. However, the migration of services to the Cloud increases the vulnerability to existing IT security threats and creates new ones that are intrinsic to the Cloud Computing architecture, hence the need for a thorough assessment of Cloud security risks during the process of service selection and deployment. Recently, the impact of effective management of service security satisfaction has been taken with greater seriousness by Cloud Service Providers (CSPs) and stakeholders. Nevertheless, the successful integration of the security element into Cloud resource management operations requires not only methodical research but also meticulous modeling of Cloud security requirements. To this end, we address throughout this thesis the challenges of security evaluation and integration in independent and interconnected Cloud Computing environments. We are interested in providing Cloud consumers with a set of methods that allow them to optimize the security of their services, and CSPs with a set of strategies that enable them to provide security-aware Cloud-based service hosting. The originality of this thesis lies in two aspects: 1) the innovative description of Cloud applications' security requirements, which paved the way for an effective quantification and evaluation of the security of Cloud infrastructures; and 2) the design of rigorous mathematical models that integrate the security factor into the traditional problems of application deployment, resource provisioning, and workload management within current Cloud Computing infrastructures. The work in this thesis is carried out in three phases.
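    As a rough illustration of what "integrating the security factor" into a traditional placement problem can look like, the toy model below gives each host an advertised security level, gives each application a minimum requirement, and minimises cost among the hosts that qualify. This greedy sketch is an assumption for illustration, not the thesis's actual mathematical formulation:

```python
# Toy security-aware placement: choose the cheapest host whose security
# level satisfies each application's requirement. Illustrative only.
def place(apps, hosts):
    """apps: {name: min_security}; hosts: {name: (security, cost)}."""
    placement = {}
    for app, required in apps.items():
        feasible = [(cost, h) for h, (sec, cost) in hosts.items() if sec >= required]
        if not feasible:
            raise ValueError(f"no host satisfies the security requirement of {app}")
        placement[app] = min(feasible)[1]  # cheapest sufficiently secure host
    return placement

apps = {"payroll": 0.9, "web-frontend": 0.5}
hosts = {"h1": (0.95, 3.0), "h2": (0.6, 1.0), "h3": (0.4, 0.5)}
print(place(apps, hosts))  # {'payroll': 'h1', 'web-frontend': 'h2'}
```

    The rigorous models referred to above would typically cast this as a constrained optimization over many more dimensions (capacity, workload, interconnection), with security as one factor among the constraints and objectives.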

    An Approach to Guide Users Towards Less Revealing Internet Browsers

    When browsing the Internet, HTTP headers enable both clients and servers to send extra data in their requests or responses, such as the User-Agent string. This string contains information related to the sender's device, browser, and operating system. Previous research has shown that numerous privacy and security risks result from exposing sensitive information in the User-Agent string; for example, it enables device and browser fingerprinting as well as user tracking and identification. Our large-scale analysis of thousands of User-Agent strings shows that browsers differ tremendously in the amount of information they include in their User-Agent strings. As such, our work aims at guiding users towards less revealing browsers. In doing so, we propose to assign an exposure score to browsers based on the information they expose and on vulnerability records. Our contributions in this work are as follows: first, we provide a full implementation that is ready to be deployed and used; second, we conduct a user study to identify the effectiveness and limitations of our proposed approach. Our implementation is based on more than 52 thousand unique browsers. Our performance and validation analyses show that our solution is accurate and efficient. The source code and data set are publicly available, and the solution has been deployed.
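    The paper's scoring model is not given here, so the sketch below only illustrates the idea: count the identifying fields a User-Agent string reveals and weight in recorded vulnerabilities. The field patterns, weights, and vulnerability counts are hypothetical placeholders, not the published scoring function or data:

```python
# Hypothetical exposure score for a User-Agent string. The patterns,
# weights, and vulnerability counts below are illustrative placeholders.
import re

VULN_RECORDS = {"Chrome": 120, "Firefox": 95, "Safari": 60}  # assumed counts

REVEALING_PATTERNS = {
    "os_version": re.compile(r"Windows NT [\d.]+|Mac OS X [\d_]+|Android [\d.]+"),
    "browser_version": re.compile(r"(Chrome|Firefox|Safari)/[\d.]+"),
    "architecture": re.compile(r"Win64|x64|arm64|aarch64"),
}

def exposure_score(user_agent: str) -> float:
    # Fraction of revealing fields present in the string...
    revealed = sum(1 for p in REVEALING_PATTERNS.values() if p.search(user_agent))
    # ...plus a small penalty scaled by the browser's vulnerability record.
    m = re.search(r"Chrome|Firefox|Safari", user_agent)
    vulns = VULN_RECORDS.get(m.group(0), 0) if m else 0
    return revealed / len(REVEALING_PATTERNS) + 0.001 * vulns

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
print(round(exposure_score(ua), 3))  # higher means more revealing
```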

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to afford better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. Seamless interaction between High Performance Computing and Modelling and Simulation is therefore arguably required to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest to these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Multimodal Approach for Big Data Analytics and Applications

    The thesis presents multimodal conceptual frameworks and their applications in improving the robustness and the performance of big data analytics through cross-modal interaction or integration. A joint interpretation of several knowledge renderings such as stream, batch, linguistics, visuals, and metadata creates a unified view that can provide a more accurate and holistic approach to data analytics compared to a single standalone knowledge base. Novel approaches in the thesis involve integrating the multimodal framework with state-of-the-art computational models for big data, cloud computing, natural language processing, image processing, video processing, and contextual metadata. The integration of these disparate fields has the potential to improve computational tools and techniques dramatically. Thus, the contributions place multimodality at the forefront of big data analytics; the research aims at mapping and understanding multimodal correspondence between different modalities. The primary contribution of the thesis is the Multimodal Analytics Framework (MAF), a collaborative ensemble framework for stream and batch processing along with cues from multiple input modalities like language, visuals, and metadata, combining the benefits of both low latency and high throughput. The framework is a five-step process:
    1. Data ingestion. As a first step towards Big Data analytics, a high-velocity, fault-tolerant streaming data acquisition pipeline is proposed through a distributed big data setup, followed by mining and searching patterns in the data while it is still in transit. The data ingestion methods are demonstrated using Hadoop ecosystem tools like Kafka and Flume as sample implementations.
    2. Decision making on the ingested data to use the best-fit tools and methods. In Big Data analytics, the primary challenges often remain in processing heterogeneous data pools with a one-method-fits-all approach. The research introduces a decision-making system to select the best-fit solutions for the incoming data stream; this is the second step towards building the data processing pipeline presented in the thesis. The decision-making system introduces a Fuzzy Graph-based method to provide real-time and offline decision-making.
    3. Lifelong incremental machine learning. In the third step, the thesis describes a Lifelong Learning model at the processing layer of the analytical pipeline, following the data acquisition and decision making of the previous steps. Lifelong learning iteratively increments the training model using a proposed Multi-agent Lambda Architecture (MALA), a collaborative ensemble architecture between stream and batch data. As part of the proposed MAF, MALA is one of the primary contributions of the research. The work introduces a general-purpose and comprehensive approach to hybrid learning over batch and stream processing to achieve lifelong learning objectives.
    4. Improving machine learning results through ensemble learning. As an extension of the Lifelong Learning model, the thesis proposes a boosting-based ensemble method as the fourth step of the framework, improving lifelong learning results by reducing the learning error in each iteration of a streaming window. The strategy is to incrementally boost the learning accuracy on each iterated mini-batch, enabling the model to accumulate knowledge faster; the base learners adapt more quickly in the smaller intervals of a sliding window, improving the machine learning accuracy rate by countering concept drift (see the sketch after this list).
    5. Cross-modal integration between text, image, video, and metadata for more comprehensive data coverage than a text-only dataset. The final contribution of this thesis is a new multimodal method where three different modalities – text, visuals (image and video), and metadata – are intertwined along with real-time and batch data for more comprehensive input data coverage than text-only data. The model is validated through a detailed case study on the contemporary and relevant topic of the COVID-19 pandemic. While the remainder of the thesis deals with text-only input, the COVID-19 dataset analyzes both textual and visual information in integration.
    Following the completion of this research work, and as an extension to the current framework, multimodal machine learning is investigated as a future research direction.
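    To make the fourth step concrete, here is a minimal sketch of boosting over streaming mini-batches: each arriving batch fits a weak learner on the current ensemble's residual error, and a sliding window of recent learners is kept so the ensemble forgets stale concepts. The window size, learning rate, and base learner are assumptions, not the thesis's exact configuration:

```python
# Sketch of boosting on streaming mini-batches with a sliding window of
# learners to counter concept drift. Parameters are illustrative.
from collections import deque

import numpy as np
from sklearn.tree import DecisionTreeRegressor

class StreamingBoostedEnsemble:
    def __init__(self, window=5, lr=0.5):
        self.learners = deque(maxlen=window)  # only recent learners are kept
        self.lr = lr

    def predict(self, X):
        pred = np.zeros(len(X))
        for model in self.learners:
            pred += self.lr * model.predict(X)
        return pred

    def update(self, X, y):
        """Boost on one mini-batch: fit a weak learner to the residual error."""
        residual = y - self.predict(X)
        self.learners.append(DecisionTreeRegressor(max_depth=3).fit(X, residual))

# Simulated stream with concept drift: the target flips sign mid-stream.
rng = np.random.default_rng(1)
ens = StreamingBoostedEnsemble()
for t in range(20):
    X = rng.uniform(-1, 1, size=(64, 1))
    y = (2.0 if t < 10 else -2.0) * X.ravel() + rng.normal(0, 0.1, 64)
    ens.update(X, y)
print(ens.predict(np.array([[0.5]])))  # tracks the post-drift concept
```

    Because the deque evicts the oldest learner on every update, the ensemble's effective memory is exactly one window; that is the mechanism by which smaller intervals let the base learners adapt faster after a drift.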