    LHView: Location Aware Hybrid Partial View

    The rise of the Cloud creates enormous business opportunities for companies to provide global services. This requires the applications that support those services to scale while minimizing maintenance costs, whether these stem from unnecessary allocation of resources or from excessive human supervision and administration. Solutions designed to support such systems have tackled fundamental challenges ranging from individual component failures to transient network partitions. A fundamental aspect that all scalable large systems have to deal with is the membership of the system, i.e., tracking the active components that compose the system. Most systems rely on membership management protocols that operate at the application level, often exposing the interface of a logical overlay network, which should guarantee high scalability, efficiency, and robustness. Although these protocols are capable of repairing the overlay in the face of large numbers of individual component faults, when scaling to global settings (i.e., geo-distributed scenarios) this robustness is a double-edged sword, because it is extremely complex for a node in a system to distinguish between a set of simultaneous node failures and a (transient) network partition. Thus, the occurrence of a network partition creates isolated subsets of nodes incapable of reconnecting even after the recovery from the partition. This work addresses these challenges by proposing a novel datacenter-aware membership protocol that tolerates network partitions by applying existing overlay management techniques and classification techniques that may allow the system to cope efficiently with such events without compromising the remaining properties of the overlay network. Furthermore, we strive to achieve these goals with a solution that requires minimal human intervention.

    Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

    The widespread use of GPS-enabled smartphones, along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads to millions of users. The number of users is typically very high and they are continuously moving, and the ads change frequently as well. Hence, sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units that fairly distribute the workload and, furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado adapts to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms the non-spatio-textually aware approaches by up to two orders of magnitude in terms of overall system throughput.

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems. This project is supported by the IBM/BSC Technology Center for Supercomputing collaboration agreement. It has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). It is also partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P and Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493). We thank our IBM Research colleagues Alaa Youssef and Asser Tantawi for the valuable discussions. We also thank SC17 committee member Blair Bethwaite of Monash University for his constructive feedback on the earlier drafts of this paper. Peer Reviewed. Postprint (published version)