1,533 research outputs found
Proactive cloud management for highly heterogeneous multi-cloud infrastructures
Various literature studies demonstrated that the cloud computing paradigm can help to improve availability and performance of applications subject to the problem of software anomalies. Indeed, the cloud resource provisioning model enables users to rapidly access new processing resources, even distributed over different geographical regions, that can be promptly used in the case of, e.g., crashes or hangs of running machines, as well as to balance the load in the case of overloaded machines. Nevertheless, managing a complex geographically-distributed cloud deploy could be a complex and time-consuming task. Autonomic Cloud Manager (ACM) Framework is an autonomic framework for supporting proactive management of applications deployed over multiple cloud regions. It uses machine learning models to predict failures of virtual machines and to proactively redirect the load to healthy machines/cloud regions. In this paper, we study different policies to perform efficient proactive load balancing across cloud regions in order to mitigate the effect of software anomalies. These policies use predictions about the mean time to failure of virtual machines. We consider the case of heterogeneous cloud regions, i.e regions with different amount of resources, and we provide an experimental assessment of these policies in the context of ACM Framework
Health data in cloud environments
The process of provisioning healthcare involves massive healthcare data which exists in different forms on disparate data sources and in different formats. Consequently, health information systems encounter interoperability problems at many levels. Integrating these disparate systems requires the support at all levels of a very expensive infrastructures. Cloud computing dramatically reduces the expense and complexity of managing IT systems. Business customers do not need to invest in their own costly IT infrastructure, but can delegate and deploy their services effectively to Cloud vendors and service providers. It is inevitable that electronic health records (EHRs) and healthcare-related services will be deployed on cloud platforms to reduce the cost and complexity of handling and integrating medical records while improving efficiency and accuracy. The paper presents a review of EHR including definitions, EHR file formats, structures leading to the discussion of interoperability and security issues. The paper also presents challenges that have to be addressed for realizing Cloudbased healthcare systems: data protection and big health data management. Finally, the paper presents an active data model for housing and protecting EHRs in a Cloud environment
Cache Serializability: Reducing Inconsistency in Edge Transactions
Read-only caches are widely used in cloud infrastructures to reduce access
latency and load on backend databases. Operators view coherent caches as
impractical at genuinely large scale and many client-facing caches are updated
in an asynchronous manner with best-effort pipelines. Existing solutions that
support cache consistency are inapplicable to this scenario since they require
a round trip to the database on every cache transaction.
Existing incoherent cache technologies are oblivious to transactional data
access, even if the backend database supports transactions. We propose T-Cache,
a novel caching policy for read-only transactions in which inconsistency is
tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache
improves cache consistency despite asynchronous and unreliable communication
between the cache and the database. We define cache-serializability, a variant
of serializability that is suitable for incoherent caches, and prove that with
unbounded resources T-Cache implements this new specification. With limited
resources, T-Cache allows the system manager to choose a trade-off between
performance and consistency.
Our evaluation shows that T-Cache detects many inconsistencies with only
nominal overhead. We use synthetic workloads to demonstrate the efficacy of
T-Cache when data accesses are clustered and its adaptive reaction to workload
changes. With workloads based on the real-world topologies, T-Cache detects
43-70% of the inconsistencies and increases the rate of consistent transactions
by 33-58%.Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability:
Reducing Inconsistency in Edge Transactions," Distributed Computing Systems
(ICDCS), IEEE 35th International Conference on, June~29 2015--July~2 201
The future of social is personal: the potential of the personal data store
This chapter argues that technical architectures that facilitate the longitudinal, decentralised and individual-centric personal collection and curation of data will be an important, but partial, response to the pressing problem of the autonomy of the data subject, and the asymmetry of power between the subject and large scale service providers/data consumers. Towards framing the scope and role of such Personal Data Stores (PDSes), the legalistic notion of personal data is examined, and it is argued that a more inclusive, intuitive notion expresses more accurately what individuals require in order to preserve their autonomy in a data-driven world of large aggregators. Six challenges towards realising the PDS vision are set out: the requirement to store data for long periods; the difficulties of managing data for individuals; the need to reconsider the regulatory basis for third-party access to data; the need to comply with international data handling standards; the need to integrate privacy-enhancing technologies; and the need to future-proof data gathering against the evolution of social norms. The open experimental PDS platform INDX is introduced and described, as a means of beginning to address at least some of these six challenges
On the Fly Orchestration of Unikernels: Tuning and Performance Evaluation of Virtual Infrastructure Managers
Network operators are facing significant challenges meeting the demand for
more bandwidth, agile infrastructures, innovative services, while keeping costs
low. Network Functions Virtualization (NFV) and Cloud Computing are emerging as
key trends of 5G network architectures, providing flexibility, fast
instantiation times, support of Commercial Off The Shelf hardware and
significant cost savings. NFV leverages Cloud Computing principles to move the
data-plane network functions from expensive, closed and proprietary hardware to
the so-called Virtual Network Functions (VNFs). In this paper we deal with the
management of virtual computing resources (Unikernels) for the execution of
VNFs. This functionality is performed by the Virtual Infrastructure Manager
(VIM) in the NFV MANagement and Orchestration (MANO) reference architecture. We
discuss the instantiation process of virtual resources and propose a generic
reference model, starting from the analysis of three open source VIMs, namely
OpenStack, Nomad and OpenVIM. We improve the aforementioned VIMs introducing
the support for special-purpose Unikernels and aiming at reducing the duration
of the instantiation process. We evaluate some performance aspects of the VIMs,
considering both stock and tuned versions. The VIM extensions and performance
evaluation tools are available under a liberal open source licence
Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach
Many algorithms in workflow scheduling and resource provisioning rely on the
performance estimation of tasks to produce a scheduling plan. A profiler that
is capable of modeling the execution of tasks and predicting their runtime
accurately, therefore, becomes an essential part of any Workflow Management
System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS)
platforms that use clouds for deploying scientific workflows, task runtime
prediction becomes more challenging because it requires the processing of a
significant amount of data in a near real-time scenario while dealing with the
performance variability of cloud resources. Hence, relying on methods such as
profiling tasks' execution data using basic statistical description (e.g.,
mean, standard deviation) or batch offline regression techniques to estimate
the runtime may not be suitable for such environments. In this paper, we
propose an online incremental learning approach to predict the runtime of tasks
in scientific workflows in clouds. To improve the performance of the
predictions, we harness fine-grained resources monitoring data in the form of
time-series records of CPU utilization, memory usage, and I/O activities that
are reflecting the unique characteristics of a task's execution. We compare our
solution to a state-of-the-art approach that exploits the resources monitoring
data based on regression machine learning technique. From our experiments, the
proposed strategy improves the performance, in terms of the error, up to
29.89%, compared to the state-of-the-art solutions.Comment: Accepted for presentation at main conference track of 11th IEEE/ACM
International Conference on Utility and Cloud Computin
Attending to the Cultures of Data Science Work
This essay reflects on the shifting attention to the “social” and the “cultural” in data science communities. While recently the “social” and the “cultural” have been prioritized in data science discourse, social and cultural concerns that get raised in data science are almost always outwardly focused – applying to the communities that data scientists seek to support more so than more computationally-focused data science communities. I argue that data science communities have a responsibility to attend not only to the cultures that orient the work of domain communities, but also to the cultures that orient their own work. I describe how ethnographic frameworks such as thick description can be enlisted to encourage more reflexive data science work, and I conclude with recommendations for documenting the cultural provenance of data policy and infrastructure
- …