137 research outputs found

    DATA-DRIVEN PRODUCT RETURNS PREDICTION: A CLOUD-BASED ENSEMBLE SELECTION APPROACH

    Product returns are a considerable cost factor in e-commerce, especially in the apparel sector. Advanced information technologies and predictive analytics, which make it possible to capture and analyze massive amounts of user data, pave the way for more efficient management of product returns and reverse logistics. However, we identify a lack of data-driven approaches in this area, especially regarding product returns prediction. In this paper, we present an ensemble selection approach for predicting product returns in the apparel sector. Computational experiments indicate that our approach produces satisfactory results in terms of prediction quality. We further explore the relationship between sample size and computation time and demonstrate that the run-time increases exponentially as more data records are used. To address the heavy run-time overhead resulting from the high processing and memory requirements of classifiers, we present a framework for embedding ensemble selection processes into a highly scalable cloud environment. The framework explains the provisioning of cloud resources and the parallelization of tasks according to ensemble selection processes. It also lays a basis for considering data streams, data splitting, and dynamic adaptation to changing customer behavior over time, which related work has not considered so far. The envisioned forecasting support system helps retailers reduce product returns and increase profit margins.
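The abstract does not say which ensemble selection procedure is used. As a rough illustration only, the sketch below assumes greedy forward selection with replacement over a library of already-trained classifiers, scored by accuracy on a validation set. The function names, the 0.5 decision threshold, and the stopping rule are all assumptions made for this example, not details from the paper.

```python
def accuracy(probs, labels):
    """Share of validation records classified correctly at a 0.5 threshold."""
    correct = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)


def greedy_ensemble_selection(model_probs, labels, max_rounds=10):
    """Greedy forward selection with replacement (illustrative assumption).

    model_probs maps a model name to its predicted return probabilities
    on the validation set; labels holds the true outcomes (1 = returned).
    Returns the list of selected model names and the ensemble accuracy.
    """
    selected = []
    prob_sum = [0.0] * len(labels)  # running sum of selected models' outputs
    current_score = None

    for _ in range(max_rounds):
        best_name, best_score = None, None
        for name, probs in model_probs.items():
            # Score the ensemble average if this model were added.
            averaged = [(s + p) / (len(selected) + 1)
                        for s, p in zip(prob_sum, probs)]
            score = accuracy(averaged, labels)
            if best_score is None or score > best_score:
                best_name, best_score = name, score
        # Stop as soon as adding another model no longer helps.
        if current_score is not None and best_score <= current_score:
            break
        selected.append(best_name)
        prob_sum = [s + p for s, p in zip(prob_sum, model_probs[best_name])]
        current_score = best_score

    return selected, current_score
```

Each candidate evaluation inside a round is independent of the others, which is the kind of work that could be spread over parallel cloud workers.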

    Security in Cloud Computing: Evaluation and Integration

    Over the past decade, the Cloud Computing paradigm has revolutionized the way we envision IT services. It has provided an opportunity to respond to the ever-increasing computing needs of users by introducing the notion of service and data outsourcing. Cloud consumers usually have online, on-demand access to a large, distributed IT infrastructure providing a plethora of services. They can dynamically configure and scale Cloud resources according to the requirements of their applications without becoming part of the Cloud infrastructure, which allows them to reduce their IT investment cost and achieve optimal resource utilization. However, the migration of services to the Cloud increases the vulnerability to existing IT security threats and creates new ones that are intrinsic to the Cloud Computing architecture; hence the need for a thorough assessment of Cloud security risks during the process of service selection and deployment. Recently, Cloud Service Providers (CSPs) and stakeholders have taken the impact of effectively managing service security satisfaction more seriously.
Nevertheless, the successful integration of the security element into Cloud resource management operations requires not only methodical research but also meticulous modeling of Cloud security requirements. To this end, we address throughout this thesis the challenges of security evaluation and integration in independent and interconnected Cloud Computing environments. We are interested in providing Cloud consumers with a set of methods that allow them to optimize the security of their services, and CSPs with a set of strategies that enable them to provide security-aware Cloud-based service hosting. The originality of this thesis lies in two aspects: 1) the innovative description of the security requirements of Cloud applications, which paved the way for an effective quantification and evaluation of the security of Cloud infrastructures; and 2) the design of rigorous mathematical models that integrate the security factor into the traditional problems of application deployment, resource provisioning, and workload management within current Cloud Computing infrastructures. The work in this thesis is carried out in three phases.

    Simulation of a workflow execution as a real Cloud by adding noise

    Cloud computing provides a cheap and elastic platform for executing large scientific workflow applications, but it raises two challenges in predicting the makespan (total execution time): the performance instability of Cloud instances and the variable scheduling decisions of dynamic schedulers. IT managers need to estimate the makespan in order to calculate the cost of execution, and they can use Cloud simulators to do so. However, an ideal simulated environment produces the same output for the same workflow schedule and input parameters and thus cannot reproduce the variable behavior of the Cloud. In this paper, we define a model and a methodology for adding noise to the simulation so that its behavior matches that of the Cloud. We propose several metrics that model the fluctuating behavior of a Cloud; once they are injected into the simulator, it behaves much like the real Cloud. Instead of naively using a normal distribution based on the mean and standard deviation of the workflow tasks' runtime, we inject two kinds of noise into the tasks' runtime: the noisiness of tasks within a workflow (defined as the average runtime deviation) and the noisiness provoked by the environment over the whole workflow (defined as the average environmental deviation). To measure the quality of the simulation, we introduce the parameter inaccuracy, which quantifies the relative difference between the simulated and measured values. A series of experiments with different workflows and Cloud resources was conducted to evaluate our model and methodology. The results show that the inaccuracy of the makespan's mean value was reduced by a factor of up to 59 compared to naively using the normal distribution. Additionally, we analyse the impact of particular workflow and Cloud parameters, which shows that Cloud performance instability is simulated more accurately for the small instance type (inaccuracy of up to 11.5%) than for the medium one (inaccuracy of up to 35%), regardless of the workflow. 
Since our approach requires collecting data by executing the workflow in the Cloud in order to learn its behavior, we conduct a comprehensive sensitivity analysis. We determine the minimum amount of data that needs to be collected, or the minimum number of test cases that need to be repeated for each experiment, to get less than 12% inaccuracy for our noising parameter. The sensitivity analysis, which also aims to reduce the number of experiments and to determine how our model depends on Cloud resource and workflow parameters, shows that the correctness of our model is independent of the size of the workflow's parallel section. It further shows that we can reduce the inaccuracy of the naive approach with only 40% of the total number of executions per experiment in the learning phase: in our case, 20 executions per experiment instead of 50. Using only half of all experiments as well brings this down to 20% of the test cases, i.e. 120 instead of 600.
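The abstract describes inaccuracy only as the relative difference between simulated and measured values, and it gives the reduction in test cases as plain arithmetic. The sketch below assumes the simplest reading of both; the function names and the exact form of the metric are assumptions for illustration, not the paper's definitions.

```python
def inaccuracy(simulated, measured):
    """Relative difference between a simulated and a measured value.

    Returned as a fraction (0.1 means 10%). This form is an assumption;
    the abstract does not spell out the formula.
    """
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(simulated - measured) / abs(measured)


def reduced_test_cases(total_cases, runs_kept, runs_total, experiment_share):
    """Test cases left after keeping fewer runs and fewer experiments."""
    return total_cases * runs_kept / runs_total * experiment_share


# Figures quoted in the abstract: 20 of 50 executions per experiment
# (40%), and half of all experiments, out of 600 test cases in total.
remaining = reduced_test_cases(600, runs_kept=20, runs_total=50,
                               experiment_share=0.5)
print(remaining, remaining / 600)
```

With the abstract's figures, 600 × (20 / 50) × 0.5 gives the 120 test cases, or 20% of the original 600.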

    Cloud Workload Allocation Approaches for Quality of Service Guarantee and Cybersecurity Risk Management

    Adopting cloud computing has become a dominant trend in industry, thanks to its unique advantages in flexibility, scalability, elasticity, and cost efficiency, for providing online cloud services over the Internet using large-scale data centers. In the meantime, the relentless increase in demand for affordable, high-quality cloud-based services from individuals and businesses has led to tremendously high power consumption and operating expenses, and has thus posed pressing challenges for cloud service providers in finding efficient resource allocation policies. Allowing several services or Virtual Machines (VMs) to share the cloud's infrastructure enables cloud providers to optimize resource usage, power consumption, and operating expenses. However, sharing servers among users and VMs causes performance degradation and introduces cybersecurity risks. Consequently, developing efficient and effective resource management policies that optimize the trade-offs among resource usage, service quality, and cybersecurity loss plays a vital role in the sustainable future of cloud computing. In this dissertation, we focus on cloud workload allocation problems for resource optimization subject to Quality of Service (QoS) guarantees and cybersecurity risk constraints. To facilitate our research, we first develop a cloud computing prototype that we use to empirically validate the performance of the proposed cloud resource management schemes in an environment that is close to practical but also isolated and well controlled. We then focus our research on resource management policies for real-time cloud services with QoS guarantees. 
Based on a queuing model with reneging, we establish and formally prove a series of fundamental principles relating service timing characteristics to their resource demands, and on this basis we develop several novel resource management algorithms that statically guarantee the QoS requirements of cloud users. We then study the problem of mitigating cybersecurity risk and loss in cloud data centers via cloud resource management. We employ game theory to model the VM-to-VM interdependent cybersecurity risks in cloud clusters. We then conduct a thorough analysis based on our game-theoretic model and develop several algorithms for cybersecurity risk management. Specifically, we start our cybersecurity research from a simple case with only two types of VMs and then extend it to a more general case with an arbitrary number of VM types. Our extensive numerical and experimental results show that our proposed algorithms can significantly outperform existing methodologies for large-scale cloud data centers in terms of resource usage, cybersecurity loss, and computational effectiveness.

    A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

    We adopt a systematic approach to investigate the efficiency of near-optimal deployment of large-scale, CPU-intensive Bag-of-Tasks (BoT) applications running on cloud resources with non-proportional cost-to-performance ratios. Our analytical solutions work whether the running time of the given application is known or unknown. They try to optimize the user's utility by choosing the most desirable trade-off between the makespan and the total incurred expense. We propose a schema that provides a near-optimal deployment of a BoT application according to the user's preferences. Our approach is to provide the user with a set of Pareto-optimal solutions, from which she may select one of the possible scheduling points based on her internal utility function. Our framework can also cope with uncertainty in the tasks' execution time, using two methods. First, we present an estimation method based on Monte Carlo sampling, called the AA algorithm, which uses the minimum possible number of samples to predict the average task running time. Second, assuming that we have access to code analyzers, code profilers, or estimation tools, we present a hybrid method that evaluates the accuracy of each estimation tool at certain time intervals in order to improve resource allocation decisions. We propose approximate deployment strategies that run on a hybrid cloud. In essence, the proposed strategies first determine either an estimated or an exact optimal schema based on the information provided by the user and on environmental parameters. Then, we use dynamic methods to assign tasks to resources so as to get as close as possible to the optimal schema. We offer two such methods: a fast yet simple method based on the First Fit Decreasing algorithm, and a more complex approach based on an approximate solution of the problem after transforming it into a subset sum problem. Extensive experimental results obtained on a hybrid cloud platform confirm that our framework can deliver a near-optimal solution that respects the user's utility function.
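For readers unfamiliar with First Fit Decreasing, here is a minimal sketch of the textbook heuristic. How the paper maps tasks and cloud resources onto items and bins is not stated in the abstract, so the version below assumes one hypothetical mapping: each task has a known running time, and each resource has the same time budget (`capacity`).

```python
def first_fit_decreasing(task_times, capacity):
    """Pack tasks into as few equal-capacity resources as the heuristic finds.

    task_times: running time of each task (assumed known or estimated).
    capacity:   time budget per resource (hypothetical parameter).
    Returns a list of resources, each a list of the task times assigned to it.
    """
    assignments = []  # tasks placed on each resource
    loads = []        # total time already used on each resource

    # Consider the longest tasks first.
    for time in sorted(task_times, reverse=True):
        if time > capacity:
            raise ValueError(f"task of length {time} exceeds capacity {capacity}")
        # Place the task on the first resource that still has room.
        for i, load in enumerate(loads):
            if load + time <= capacity:
                assignments[i].append(time)
                loads[i] += time
                break
        else:
            # No existing resource fits: open a new one.
            assignments.append([time])
            loads.append(time)

    return assignments


print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
# prints [[8, 2], [4, 4, 1, 1]]
```

The heuristic is fast (a sort plus a linear scan per task) but not guaranteed to be optimal, which fits the abstract's description of it as the simpler of the two methods.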

    Advances in Grid Computing

    This book approaches grid computing from the perspective of the latest achievements in the field, providing insight into current research trends and advances and presenting a wide range of innovative research papers. The topics covered include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithms, as well as quantum encryption, are considered in order to explain two main aspects of grid computing: resource management and data management. The book also addresses aspects of grid computing related to architecture and development, and includes a diverse range of applications for grid computing, including a possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning, and complex water systems.

    Task scheduling for application integration: A strategy for large volumes of data

    Enterprise Application Integration is the research field that provides methodologies, techniques, and tools for modelling and implementing integration processes. An integration process orchestrates a set of applications to keep them synchronised or to allow the creation of new features. It can be represented by a workflow composed of tasks and communication channels. Integration platforms are tools for the design and execution of integration processes, in which the runtime system is the component responsible for the execution time of the tasks and for the allocation of the computational resources that perform them. Processing a large volume of data, which corresponds to executing millions of tasks, can cause overload situations, characterised by the accumulation of tasks in internal queues awaiting computational resources in the runtime system, resulting in unacceptable response times for external applications and users. Our research hypothesis is that the runtime systems of integration platforms use simplistic heuristics for scheduling tasks, which do not allow them to maintain acceptable levels of performance in overload situations. In this research work, we developed (i) a representation for integration processes, (ii) a characterisation of their task schedules, (iii) a heuristic to deal with overload situations, (iv) a mathematical model for a performance metric of the execution of integration processes, and (v) a simulation tool for task scheduling heuristics. Our research results indicate that, in overload situations, our heuristic promotes a balanced workload distribution and an increase in the performance of the execution of the integration processes.

    Real-Time Virtualization and Cloud Computing

    In recent years, we have observed three major trends in the development of complex real-time embedded systems. First, to reduce cost and enhance flexibility, multiple systems are sharing common computing platforms via virtualization technology, instead of being deployed separately on physically isolated hosts. Second, multi-core processors are increasingly being used in real-time systems. Third, developers are exploring the possibilities of deploying real-time applications as virtual machines in a public cloud. The integration of real-time systems as virtual machines (VMs) atop common multi-core platforms in a public cloud raises significant new research challenges in meeting the real-time latency requirements of applications. To address the challenges of running real-time VMs in the cloud, we first present RT-Xen, a novel real-time scheduling framework within the popular Xen hypervisor. We start with single-core scheduling in RT-Xen and present the first work that empirically studies and compares different real-time scheduling schemes on the same platform. We then introduce RT-Xen 2.0, which focuses on multi-core scheduling and spans multiple design spaces, including priority schemes, server schemes, and scheduling policies. Experimental results demonstrate that, when combined with compositional scheduling theory, RT-Xen can deliver real-time performance to an application running in a VM, while the default credit scheduler cannot. After that, we present RT-OpenStack, a cloud management system designed to support co-hosting real-time and non-real-time VMs in a cloud. RT-OpenStack studies the problem of running real-time VMs together with non-real-time VMs in a public cloud. 
Leveraging the resource interface and real-time scheduling provided by RT-Xen, RT-OpenStack provides real-time performance guarantees to real-time VMs, while achieving high resource utilization by allowing non-real-time VMs to share the remaining CPU resources through a novel VM-to-host mapping scheme. Finally, we present RTCA, a real-time communication architecture for VMs sharing the same host, which maintains low latency for high-priority inter-domain communication (IDC) traffic in the face of low-priority IDC traffic.