696 research outputs found

    SDN-enabled Resource Provisioning Framework for Geo-Distributed Streaming Analytics

    Get PDF
    Geographically distributed (geo-distributed) datacenters for stream data processing typically comprise multiple edges and core datacenters connected through Wide-Area Network (WAN) with a master node responsible for allocating tasks to worker nodes. Since WAN links significantly impact the performance of distributed task execution, the existing task assignment approach is unsuitable for distributed stream data processing with low latency and high throughput demand. In this paper, we propose SAFA, a resource provisioning framework using the Software-Defined Networking (SDN) concept with an SDN controller responsible for monitoring the WAN, selecting an appropriate subset of worker nodes, and assigning tasks to the designated worker nodes. We implemented the data plane of the framework in P4 and the control plane components in Python. We tested the performance of the proposed system on Apache Spark, Apache Storm, and Apache Flink using the Yahoo! streaming benchmark on a set of custom topologies. The results of the experiments validate that the proposed approach is viable for distributed stream processing and confirm that it can improve at least 1.64× the processing time of incoming events of the current stream processing systems.</p

    A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing

    Full text link
    Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different datacenters from the cloud computing environment, resulting in serious data transmission delays. Edge computing reduces the data transmission delays and supports the fixed storing manner for scientific workflow private datasets, but there is a bottleneck in its storage capacity. It is a challenge to combine the advantages of both edge computing and cloud computing to rationalize the data placement of scientific workflow, and optimize the data transmission time across different datacenters. Traditional data placement strategies maintain load balancing with a given number of datacenters, which results in a large data transmission time. In this study, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize the data transmission time when placing data for a scientific workflow. This approach considered the characteristics of data placement combining edge computing and cloud computing. In addition, it considered the impact factors impacting transmission delay, such as the band-width between datacenters, the number of edge datacenters, and the storage capacity of edge datacenters. The crossover operator and mutation operator of the genetic algorithm were adopted to avoid the premature convergence of the traditional particle swarm optimization algorithm, which enhanced the diversity of population evolution and effectively reduced the data transmission time. The experimental results show that the data placement strategy based on GA-DPSO can effectively reduce the data transmission time during workflow execution combining edge computing and cloud computing

    Optimized Contract-based Model for Resource Allocation in Federated Geo-distributed Clouds

    Get PDF
    In the era of Big Data, with data growing massively in scale and velocity, cloud computing and its pay-as-you-go modelcontinues to provide significant cost benefits and a seamless service delivery model for cloud consumers. The evolution of small-scaleand large-scale geo-distributed datacenters operated and managed by individual Cloud Service Providers (CSPs) raises newchallenges in terms of effective global resource sharing and management of autonomously-controlled individual datacenter resourcestowards a globally efficient resource allocation model. Earlier solutions for geo-distributed clouds have focused primarily on achievingglobal efficiency in resource sharing, that although tries to maximize the global resource allocation, results in significant inefficiencies inlocal resource allocation for individual datacenters and individual cloud provi ders leading to unfairness in their revenue and profitearned. In this paper, we propose a new contracts-based resource sharing model for federated geo-distributed clouds that allows CSPsto establish resource sharing contracts with individual datacentersapriorifor defined time intervals during a 24 hour time period. Based on the established contracts, individual CSPs employ a contracts cost and duration aware job scheduling and provisioning algorithm that enables jobs to complete and meet their response time requirements while achieving both global resource allocation efficiency and local fairness in the profit earned. The proposed techniques are evaluated through extensive experiments using realistic workloads generated using the SHARCNET cluster trace. The experiments demonstrate the effectiveness, scalability and resource sharing fairness of the proposed model

    Low Latency Geo-distributed Data Analytics

    Full text link
    Low latency analytics on geographically distributed dat-asets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single data-center significantly inflates the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distri-buted analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement op-timization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries ’ arrivals, and places the tasks to reduce network bottlenecks during the query’s ex-ecution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 re-gions using production queries show that Iridium speeds up queries by 3 × − 19 × and lowers WAN usage by 15% − 64 % compared to existing baselines
    • …
    corecore