
    Towards Optimizing Storage Costs on the Cloud

    We study the problem of optimizing data storage and access costs on the cloud while ensuring that the desired performance or latency is unaffected. First, we propose an optimizer that chooses the data placement tier (on the cloud) and the compression scheme to apply, for given data partitions with temporal access predictions. Second, we propose a model that learns the compression performance of multiple algorithms across data partitions in different formats and generates compression performance predictions on the fly, as inputs to the optimizer. Third, we approach the data partitioning problem in a fundamentally different way from the current default in most data lakes, where partitions correspond to ingestion batches. We propose access-pattern-aware data partitioning and formulate an optimization problem that optimizes the size and reading costs of partitions subject to access patterns. We study the various optimization problems theoretically as well as empirically, and provide theoretical bounds as well as hardness results. We propose a unified cost-minimization pipeline, called SCOPe, that combines the different modules. We extensively compare the performance of our methods with related baselines from the literature on TPC-H data as well as enterprise datasets (ranging from GB to PB in volume) and show that SCOPe substantially improves over the baselines. We show significant cost savings compared to platform baselines, of the order of 50% to 83%, on enterprise Data Lake datasets that range from terabytes to petabytes in volume.
    Comment: The first two authors contributed equally. 12 pages, Accepted to the International Conference on Data Engineering (ICDE) 202
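
The tier-and-compression choice described in this abstract can be illustrated with a small brute-force sketch. This is not the SCOPe optimizer: the tier names, prices, latencies, compression ratios, and the functions `monthly_cost` and `best_placement` are all hypothetical, chosen only to show a cost minimization under a latency constraint.

```python
from itertools import product

# Hypothetical tiers: storage price per GB-month, read price per GB, base latency.
TIERS = {
    "hot": {"storage": 0.020, "read": 0.000, "latency_ms": 10},
    "cool": {"storage": 0.010, "read": 0.010, "latency_ms": 50},
    "archive": {"storage": 0.002, "read": 0.050, "latency_ms": 5000},
}

# Hypothetical compression schemes: compressed/original size ratio and
# extra decompression latency added to every read.
COMPRESSION = {
    "none": {"ratio": 1.0, "decompress_ms": 0},
    "fast": {"ratio": 0.6, "decompress_ms": 5},
    "dense": {"ratio": 0.4, "decompress_ms": 30},
}


def monthly_cost(size_gb, reads_per_month, tier, scheme):
    """Storage cost plus read cost for one partition in one month."""
    stored_gb = size_gb * COMPRESSION[scheme]["ratio"]
    prices = TIERS[tier]
    return stored_gb * prices["storage"] + stored_gb * reads_per_month * prices["read"]


def best_placement(size_gb, reads_per_month, max_latency_ms):
    """Return (cost, tier, scheme) with minimum cost, or None if nothing is feasible."""
    feasible = []
    for tier, scheme in product(TIERS, COMPRESSION):
        latency = TIERS[tier]["latency_ms"] + COMPRESSION[scheme]["decompress_ms"]
        if latency <= max_latency_ms:
            cost = monthly_cost(size_gb, reads_per_month, tier, scheme)
            feasible.append((cost, tier, scheme))
    return min(feasible) if feasible else None
```

With these made-up numbers, a partition that is never read goes to the cheapest tier with the densest compression, while a frequently read partition with a tight latency bound stays in the hot tier with light compression.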

    Tunable Security for Deployable Data Outsourcing

    Security mechanisms like encryption negatively affect other software quality characteristics like efficiency. To cope with such trade-offs, it is preferable to build approaches that allow the trade-offs to be tuned after the implementation and design phase. This book introduces a methodology that can be used to build such tunable approaches. The book shows how the proposed methodology can be applied in the domains of database outsourcing, identity management, and credential management.

    Privacy In Multi-Agent And Dynamical Systems

    The use of private data is pivotal for numerous services, including location-based ones, collaborative recommender systems, and social networks. Despite the utility these services provide, the usage of private data raises privacy concerns for its owners. Noise-injecting techniques, such as differential privacy, address these concerns by adding artificial noise such that an adversary with access to the published response cannot confidently infer the private data. Particularly in multi-agent and dynamical environments, privacy-preserving techniques need to be expressive enough to capture time-varying privacy needs, multiple data owners, and multiple data users. Current work in differential privacy assumes that a single response gets published and a single predefined privacy guarantee is provided. This work relaxes these assumptions by providing several problem formulations and their approaches. In the setting of a social network, a data owner has different privacy needs against different users. We design a coalition-free privacy-preserving mechanism that allows a data owner to diffuse their private data over a network. We also formulate the problem of multiple data owners that provide their data to multiple data users. Also, for time-varying privacy needs, we prove that, for a class of existing privacy-preserving mechanisms, it is possible to effectively relax privacy constraints gradually. Additionally, we provide a privacy-aware mechanism for time-varying private data, where we wish to protect only its current value. Finally, in the context of location-based services, we provide a mechanism where the strength of the privacy guarantees varies with the local population density. These contributions increase the applicability of differential privacy and set future directions for more flexible and expressive privacy guarantees.
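
As background for the noise-injecting techniques mentioned in this abstract, the following is a minimal sketch of the standard Laplace mechanism for a counting query. It is not one of the mechanisms proposed in the thesis; the function names `laplace_noise` and `private_count`, and the default sensitivity of 1, are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A smaller epsilon means a stronger guarantee and therefore more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

A caller might pass `random.Random()` as `rng` and pick `epsilon` per query; a setting such as the density-dependent guarantee described above would correspond to varying `epsilon` with context, which this sketch leaves to the caller.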

    A mobile cloud computing framework integrating multilevel encoding for performance monitoring in telerehabilitation

    Recent years have witnessed a surge in telerehabilitation and remote healthcare systems, enabled by emerging low-cost wearable devices that monitor biological and biokinematic aspects of human beings. Although such telerehabilitation systems utilise cloud computing features and provide automatic biofeedback and performance evaluation, there are demands for overall optimisation to enable these systems to operate with low battery consumption and low computational power, and even with weak or no network connections. This paper proposes a novel multilevel data encoding scheme satisfying these requirements in mobile cloud computing applications, particularly in the field of telerehabilitation. We introduce an architecture for a telerehabilitation platform utilising the proposed encoding scheme integrated with various types of sensors. The platform is usable not only for patients to experience telerehabilitation services but also for therapists to acquire essential support from an analysis-oriented decision support system (AODSS) for more thorough analysis and for making further decisions on treatment.

    Optimizing Resource Management in Cloud Analytics Services

    The fundamental challenge in the cloud today is how to build and optimize machine learning and data analytical services. Machine learning and data analytical platforms are changing computing infrastructure from expensive private data centers to easily accessible online services. These services pack user requests as jobs and run them on thousands of machines in parallel in geo-distributed clusters. The scale and the complexity of emerging jobs lead to increasing challenges for the clusters at all levels, from power infrastructure to system architecture and corresponding software framework design. These challenges come in many forms. Today's clusters are built on commodity hardware, and hardware failures are unavoidable. Resource competition, network congestion, and mixed generations of hardware make the hardware environment complex and hard to model and predict. Such heterogeneity becomes a crucial roadblock for efficient parallelization at both the task level and the job level. Another challenge comes from the increasing complexity of the applications. For example, machine learning services run jobs made up of multiple tasks with complex dependency structures. This complexity leads to difficulties in framework design. The scale, especially when services span geo-distributed clusters, leads to another important hurdle for cluster design. Challenges also come from the power infrastructure. Power infrastructure is very expensive and accounts for more than 20% of the total cost of building a cluster. Power sharing optimization to maximize facility utilization and smooth peak-hour usage is another roadblock for cluster design. In this thesis, we focus on solutions for these challenges at the task level, at the job level, in geo-distributed data cloud design, and in power management in colocation data centers.
    At the task level, a crucial hurdle to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. Speculative execution has been widely adopted to mitigate the impact of stragglers in simple workloads. We apply straggler mitigation to approximation jobs for the first time. We present GRASS, which carefully uses speculation to mitigate the impact of stragglers in approximation jobs. GRASS's design is based on the analysis of a model we develop to capture the optimal speculation levels for approximation jobs. Evaluations with production workloads from Facebook and Microsoft Bing in an EC2 cluster of 200 nodes show that GRASS increases the accuracy of deadline-bound jobs by 47% and speeds up error-bound jobs by 38%. Moving from the task level to the job level, task-level speculation mechanisms are designed and operated independently of job scheduling when, in fact, scheduling a speculative copy of a task has a direct impact on the resources available for other jobs. Thus, we present Hopper, a job-level speculation-aware scheduler that integrates the tradeoffs associated with speculation into job scheduling decisions, based on a model generalized from the task-level speculation model. We implement both centralized and decentralized prototypes of the Hopper scheduler and show that improvements of 50% (66%) over state-of-the-art centralized (decentralized) schedulers and speculation strategies can be achieved through the coordination of scheduling and speculation. As computing resources move from local clusters to geo-distributed cloud services, we expect the same transformation for data storage. We study two crucial pieces of a geo-distributed data cloud system: data acquisition and data placement. Starting from the optimal algorithm for the case of a data cloud made up of a single data center, we propose a near-optimal, polynomial-time algorithm for a geo-distributed data cloud in general.
    We show, via a case study, that the resulting design, Datum, is near-optimal (within 1.6%) in practical settings. Efficient power management is a fundamental challenge for data centers when providing reliable services. Power oversubscription in data centers is very common and may occasionally trigger an emergency when the aggregate power demand exceeds the capacity. We study power capping solutions for handling such emergencies in a colocation data center, where the operator supplies power to multiple tenants. We propose a novel market mechanism based on supply function bidding, called COOP, to financially incentivize and coordinate tenants' power reduction, minimizing total performance loss while satisfying multiple power capping constraints. We demonstrate that COOP is "win-win", increasing the operator's profit (through oversubscription) and reducing tenants' costs (through financial compensation for their power reduction during emergencies).