
    Effective and Economical Content Delivery and Storage Strategies for Cloud Systems

    Cloud computing has proved to be an effective infrastructure for hosting applications and providing reliable, stable services. Content delivery and storage are two of the cloud's main services. A high-performance cloud reduces costs for both cloud providers and customers while delivering high application performance to clients. The performance of such cloud-based services hinges on three issues. First, when delivering contents from the cloud to users, it is important to reduce payment costs and transmission time. Second, when transferring contents between cloud datacenters, it is important to reduce the payment costs to internet service providers (ISPs). Third, when storing contents in the datacenters, it is crucial to reduce file read latency and datacenter power consumption. In this dissertation, we study how to effectively deliver and store contents on the cloud, with a focus on cloud gaming and video streaming services. In particular, we address three problems. i) Cost-efficient cloud computing to support thin-client Massively Multiplayer Online Games (MMOGs): how to achieve high Quality of Service (QoS) in cloud gaming while reducing cloud bandwidth consumption; ii) cost-efficient inter-datacenter video scheduling: how to reduce bandwidth payment costs by fully utilizing link bandwidth when cloud providers transfer videos between datacenters; iii) energy-efficient adaptive file replication: how to adapt to time-varying file popularities to strike a good tradeoff between data availability and efficiency while reducing datacenter power consumption. We propose methods for each of these challenges, yielding a cloud system with a cost-efficient architecture to support cloud clients, an inter-datacenter video scheduling algorithm for video transmission, and an adaptive file replication algorithm for cloud storage. The resulting system benefits cloud providers by reducing cloud costs and benefits customers by reducing their payment costs and improving application performance (i.e., user experience). Finally, we conducted extensive experiments on many testbeds, including PeerSim, PlanetLab, EC2, and a real-world cluster, which demonstrate the efficiency and effectiveness of the proposed methods. In future work, we will study how to further improve user experience in receiving contents and reduce the cost of content transfer.
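
    A minimal sketch of the kind of popularity-driven replication policy the third problem describes may help make it concrete. Everything here (the linear popularity-to-replica mapping, the thresholds, the function names) is an illustrative assumption, not the dissertation's actual algorithm:

```python
# Hypothetical sketch: choose a file's replica count from its recent
# popularity, trading availability (more replicas) against the power
# cost of keeping extra disks active. All thresholds are illustrative.

def target_replicas(reads_last_hour: int,
                    min_replicas: int = 1,
                    max_replicas: int = 5,
                    reads_per_replica: int = 100) -> int:
    """More reads -> more replicas, clamped to [min, max]."""
    wanted = 1 + reads_last_hour // reads_per_replica
    return max(min_replicas, min(max_replicas, wanted))

def rebalance(reads: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return the replica delta for each file given recent read counts."""
    return {name: target_replicas(count) - current.get(name, 1)
            for name, count in reads.items()}

# Example: a file that went viral gains replicas; a cold one sheds them.
deltas = rebalance({"intro.mp4": 950, "old.mp4": 3},
                   {"intro.mp4": 2, "old.mp4": 3})
print(deltas)  # {'intro.mp4': 3, 'old.mp4': -2}
```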

    An auto-scaling framework for analyzing big data in the cloud environment

    Processing big data on traditional computing infrastructure is challenging because the volume of data, and hence the computational complexity, is high. Apache Hadoop has emerged as a distributed computing infrastructure for dealing with big data, but making a Hadoop cluster dynamically adjust its computing resources to the real-time workload is itself a demanding task, so clusters are conventionally pre-configured with enough resources to handle the peak data load. This can waste considerable computing resources whenever usage falls well below the preset peak. In consideration of this, this paper investigates an auto-scaling framework for the cloud environment that aims to minimise the cost of resource use by automatically adjusting the number of virtual nodes to the real-time data load. A cost-effective auto-scaling (CEAS) framework is first proposed for the Amazon Web Services (AWS) cloud environment. The proposed CEAS framework scales the computing resources of a Hadoop cluster so as to either reduce resource use when the workload is low or scale up to speed data processing and analysis within an adequate time. To validate the effectiveness of the framework, a case study performs real-time sentiment analysis on universities' tweets, analysing the reviews/tweets people post on social media. Such a dynamic scaling method offers a reference for making Twitter data analysis more cost-effective and flexible.
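
    The core of any such framework is the mapping from live workload to node count. The sketch below shows the general shape of that decision with made-up throughput figures and a stubbed-out resize call; it is not the paper's CEAS logic or its actual AWS integration:

```python
# Hypothetical scaling decision in the spirit of CEAS: size the Hadoop
# cluster to the live workload instead of the peak load.
# TASKS_PER_NODE and the node bounds are illustrative assumptions.
import math

TASKS_PER_NODE = 50           # assumed per-node throughput per interval
MIN_NODES, MAX_NODES = 2, 20  # floor keeps the cluster alive, cap limits cost

def desired_nodes(pending_tasks: int) -> int:
    """Map the current task backlog to a node count within bounds."""
    need = math.ceil(pending_tasks / TASKS_PER_NODE)
    return max(MIN_NODES, min(MAX_NODES, need))

def resize_cluster(count: int) -> None:
    # Stub: a real deployment would call the cloud provider's resize
    # API here (for AWS EMR, an instance-group modification).
    print(f"scaling cluster to {count} worker nodes")

current_nodes = 10
target = desired_nodes(pending_tasks=120)  # 120 / 50 -> 3 nodes
if target != current_nodes:
    resize_cluster(target)  # scales the lightly loaded cluster down to 3
```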

    The State-of-the-Art in Air Pollution Monitoring and Forecasting Systems using IoT, Big Data, and Machine Learning

    The quality of air is closely linked to the quality of life of humans, plant life, and wildlife, and it needs to be monitored and preserved continuously. Transportation, industry, construction sites, generators, fireworks, and waste burning account for a major share of air quality degradation, so these sources must be used in a safe and controlled manner. Traditional laboratory analysis, or installing bulky and expensive monitoring stations every few miles, is no longer efficient; smart devices are needed for collecting and analyzing air data. Air quality depends on various factors, including location, traffic, and time. Recent research applies machine learning algorithms, big data technologies, and the Internet of Things to propose stable and efficient models for this purpose. This review paper focuses on studying and compiling recent research in the field, with emphasis on data sources and on monitoring and forecasting models. Its main objective is to give insight into the ongoing research improving the various aspects of air pollution models, and it further casts light on open research issues and challenges.
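
    As a toy illustration of the machine-learning forecasting approach such surveys cover, the sketch below predicts the next hour's PM2.5 from a few context features using scikit-learn. The data is synthetic and the feature set, model choice, and coefficients are assumptions; real systems would draw on IoT sensor streams and far richer features (weather, location, emissions):

```python
# Toy forecasting example: predict next-hour PM2.5 from hour of day,
# traffic level, and the current reading. Synthetic data throughout.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
hour = rng.integers(0, 24, n)
traffic = rng.uniform(0.0, 1.0, n)
pm25_now = rng.uniform(5.0, 150.0, n)
# Synthetic target: pollution persists, worsens with traffic and rush hour.
pm25_next = (pm25_now * 0.8 + 40.0 * traffic
             + 5.0 * np.isin(hour, [8, 9, 17, 18])
             + rng.normal(0.0, 5.0, n))

X = np.column_stack([hour, traffic, pm25_now])
X_train, X_test, y_train, y_test = train_test_split(X, pm25_next,
                                                    random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out readings:", round(model.score(X_test, y_test), 3))
```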

    RAIDX: RAID Extended for Heterogeneous Arrays

    The computer hard drive market has diversified with the establishment of solid state disks (SSDs) as an alternative to magnetic hard disks (HDDs). Each technology has its advantages: SSDs are faster than HDDs, but HDDs are cheaper. Our goal is to construct a parallel storage system from HDDs and SSDs that is as fast as the SSDs alone. Achieving this goal is challenging because the slow HDDs store more data and become bottlenecks while the SSDs remain idle. RAIDX is a parallel storage system designed for disks of different speeds, capacities, and technologies. The RAIDX hardware consists of an array of disks; the RAIDX software consists of data structures and algorithms that present the disks as a single storage unit whose capacity equals the sum of the capacities of its disks, whose failure rate is lower than that of its individual disks, and whose speed approaches that of its faster disks. RAIDX achieves these performance goals with a novel parallel data organization technique that allows storage data to be moved on the fly without impacting the upper-level file system. We show that storage data accesses satisfy the locality-of-reference principle, whereby only a small fraction of storage data is accessed frequently. RAIDX has a monitoring program that identifies frequently accessed blocks and a migration program that moves them to the faster disks, which act as caches storing the sole copy of frequently accessed data. Experimental evaluation has shown that an HDD+SSD RAIDX array is as fast as an all-SSD array when the workload exhibits locality of reference.
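
    The monitor-then-migrate idea can be sketched in a few lines. This is a hypothetical illustration of the general hot-block promotion pattern, assuming a simple access counter and a fixed observation window; RAIDX's actual data structures and parameters are not described here:

```python
# Hypothetical sketch of hot-block migration: count block accesses,
# then promote the hottest blocks to the fast (SSD) tier so the slow
# HDDs stop bottlenecking reads. Window and capacity are illustrative.
from collections import Counter

class TieredArray:
    def __init__(self, ssd_capacity_blocks: int):
        self.capacity = ssd_capacity_blocks
        self.on_ssd: set[int] = set()
        self.hits: Counter = Counter()

    def record_access(self, block: int) -> None:
        self.hits[block] += 1

    def migrate(self) -> None:
        """Promote the most frequently accessed blocks to the SSD tier."""
        hot = {b for b, _ in self.hits.most_common(self.capacity)}
        promote, demote = hot - self.on_ssd, self.on_ssd - hot
        self.on_ssd = hot
        print(f"promote {sorted(promote)}, demote {sorted(demote)}")
        self.hits.clear()  # start a fresh observation window

array = TieredArray(ssd_capacity_blocks=2)
for b in [7, 7, 7, 3, 3, 9]:   # locality: a few blocks dominate accesses
    array.record_access(b)
array.migrate()                 # promote [3, 7], demote []
```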

    RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

    The RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in the 1990s, the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding; cloud storage providers store multiple copies of data, obviating the need for further redundancy. Variations of RAID based on local recovery codes and partial MDS codes reduce recovery cost. NAND flash solid state disks (SSDs) have low latency and high bandwidth, are more reliable, consume less power, and have a lower TCO than hard disk drives, making SSDs more viable for hyperscalers.
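
    The dependability-via-redundancy point can be made concrete with the textbook RAID-5 parity calculation: a stripe's parity block is the XOR of its data blocks, so any single lost block is recoverable. This is the classic scheme in miniature, not code from the tutorial:

```python
# RAID-5 in miniature: parity = XOR of the data blocks, so any one
# lost block can be rebuilt from the survivors. Real arrays rotate
# parity across disks and work on much larger stripes.

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

stripe = [b"disk0data", b"disk1data", b"disk2data"]
parity = xor_blocks(stripe)

# Disk 1 fails: rebuild its block from the other disks plus parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
print("reconstructed:", rebuilt)
```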

    Inferring latent user attributes in streams on multimodal social data using spark

    The principal goal of this work can be expressed in two simple words: Apache Spark. This framework helps developers deal with big data. Our scope is to understand how Spark operates and to use it on big data, employing the APIs it offers to implement different classifiers.
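
    A minimal PySpark example of that pattern follows. The toy feature vectors stand in for the multimodal social data, and logistic regression is one arbitrary pick among the "different classifiers" mentioned; none of this is the thesis's actual pipeline:

```python
# Minimal Spark ML classifier: train logistic regression on toy
# (features, label) pairs, where the label stands in for some latent
# user attribute. Data and model choice are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("latent-attributes").getOrCreate()

train = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.1]), 0.0),
     (Vectors.dense([2.0, 1.0]), 1.0),
     (Vectors.dense([2.2, 1.3]), 1.0),
     (Vectors.dense([0.1, 1.2]), 0.0)],
    ["features", "label"])

model = LogisticRegression(maxIter=10).fit(train)
print(model.coefficients)  # learned weights for the two toy features

spark.stop()
```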

    System Architecture and Web Development for Healthcare Big Data Driven Application

    With the current increase in the volume, variety, and complexity of data, Big Data is increasingly becoming a needed paradigm in every sector of activity. Given this, the need grows for computer systems capable of responding to these data, especially in their processing, storage, and presentation. All of these points are fundamental to working with the data in a way that makes it possible to extract value and knowledge from it, whether to intensify productivity on an assembly line, increase a business's revenue, or improve the quality of life of a given population. The question then arises of how we can develop such computer systems in the context of Big Data applied to the healthcare sector. To respond to the challenges imposed by this scenario, it is necessary to integrate multiple data sources and to process and present them to the end user in an understandable and timely manner so that their use is viable. As a solution proposal, a system architecture based on microservices is presented, in which the presentation of data uses the latest Web development tools. Such an architecture uses a Cloud infrastructure to take advantage of its inherent benefits, such as scalability, security, and flexibility. From the analysis of data from different sources, covering varied clinical practices that add volume on which to infer, it is expected that advanced data processing techniques will support the development of new treatment methodologies, support current methods, or even create fertile ground for new practices that could improve the quality of life of oncological patients.
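
    One microservice in such an architecture might look like the sketch below: a small, independently deployable API exposing one slice of clinical data to the Web front end. FastAPI, the endpoint path, and every field name here are assumptions for illustration, not the dissertation's actual stack:

```python
# Illustrative single microservice: a tiny API that aggregates one
# slice of patient data for a Web front end. FastAPI and all names
# here are assumptions, not the dissertation's stack.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="patient-summary-service")

# Stand-in for a real data store fed by the integration pipeline.
FAKE_DB = {"p001": {"name": "Jane Doe", "active_treatments": 2}}

@app.get("/patients/{patient_id}/summary")
def patient_summary(patient_id: str) -> dict:
    record = FAKE_DB.get(patient_id)
    if record is None:
        raise HTTPException(status_code=404, detail="unknown patient")
    return {"id": patient_id, **record}

# Run with: uvicorn service:app --reload  (assuming this file is service.py)
```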