    Low Latency Geo-distributed Data Analytics

    Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly delays the analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries' arrivals, and places the tasks to reduce network bottlenecks during the query's execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3×–19× and lowers WAN usage by 15%–64% compared to existing baselines.