Distributed Computing in a Cloud of Mobile Phones
For the past few years we have seen exponential growth in the number of mobile
devices and in their computation, storage, and communication capabilities. We have also seen an increase in the amount of data generated by mobile devices while performing common tasks. Additionally, the ubiquity of these mobile devices makes it reasonable to consider a different use for them, in which they act as an important part of the computation of more demanding applications, rather than relying exclusively on external servers.
It is also possible to observe an increase in the number of resource-demanding applications, which typically resort to services offered by infrastructure Clouds.
However, the use of these Cloud services raises several problems: considerable energy and bandwidth consumption, high latency, and unavailability of connectivity infrastructure due to congestion or its absence. Considering all of the above, for some applications it makes sense to perform part or all of the computation locally, on the mobile devices themselves.
We propose a distributed computing framework, able to process a batch or a stream
of data generated by a cloud composed of mobile devices, that does not require
Internet services. Differently from the current state of the art, where both
computation and data are offloaded to mobile devices, our system moves the
computation to where the data is, significantly reducing the amount of data
exchanged between mobile devices.
Based on the evaluation performed, in both real and simulated environments, our
framework proved to be scalable: it benefits significantly from using several
devices to handle computation, and it supports multiple devices submitting computation requests without a significant increase in request latency. It also proved able to deal with churn without being heavily penalized by it.
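The core idea of moving computation to where the data is can be illustrated with a minimal sketch. The device registry, block identifiers, and the `schedule` helper below are illustrative assumptions, not the paper's actual API:

```python
# Hypothetical sketch: "move computation to the data" dispatch.
# A task is sent to a device that already holds the input block,
# so only the (small) result travels over the network.
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    blocks: set = field(default_factory=set)  # data blocks stored locally

    def run(self, fn, block_id):
        # Computation executes on this device, against its local copy.
        return fn(f"data@{self.name}:{block_id}")


def schedule(devices, fn, block_id):
    # Prefer a device that holds the block; shipping the data to
    # another device would be the costly alternative this design avoids.
    for d in devices:
        if block_id in d.blocks:
            return d.run(fn, block_id)
    raise LookupError(f"no device holds block {block_id}")


devices = [Device("phone-a", {"b1"}), Device("phone-b", {"b2"})]
result = schedule(devices, lambda payload: payload.upper(), "b2")
print(result)  # DATA@PHONE-B:B2
```

In a real deployment the function would be serialized and executed on the remote device; here the dispatch logic alone shows why data exchange between devices shrinks.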
MapReduce analysis for cloud-archived data
Public storage clouds have become a popular choice for archiving certain classes of enterprise data, for example, application and infrastructure logs. These logs contain sensitive information, such as IP addresses or user logins, so regulatory and security requirements often require the data to be encrypted before it is moved to the cloud. To leverage such data for any business value, analytics systems (e.g., Hadoop/MapReduce) first download the data from these public clouds, decrypt it, and then process it at the secure enterprise site. We propose VNcache: an efficient solution for MapReduce analysis of such cloud-archived log data that does not require an a priori data transfer and load into the local Hadoop cluster. VNcache dynamically integrates cloud-archived data into a virtual namespace at the enterprise Hadoop cluster. Through a seamless data streaming and prefetching model, Hadoop jobs can begin execution as soon as they are launched, without any a priori downloading. With VNcache's accurate prefetching and caching, jobs often run on a local cached copy of a data block, significantly improving performance. When no longer needed, data is safely evicted from the enterprise cluster, reducing the total storage footprint. Uniquely, VNcache is implemented with no changes to the Hadoop application stack. © 2014 IEEE
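The virtual-namespace idea can be sketched in a few lines: blocks are visible locally right away, but bytes are fetched (and decrypted) only on first read, cached, and later evicted. All class and method names below are illustrative assumptions, not VNcache's actual implementation:

```python
# Minimal sketch of a virtual namespace with on-demand fetch and eviction.
# remote_fetch stands in for "download from the storage cloud + decrypt".
class VirtualNamespace:
    def __init__(self, remote_fetch):
        self.remote_fetch = remote_fetch
        self.cache = {}       # block_id -> locally cached bytes
        self.listing = set()  # blocks visible in the namespace

    def register(self, block_id):
        # The block appears immediately; no up-front transfer is needed,
        # so a job can start as soon as it is launched.
        self.listing.add(block_id)

    def read(self, block_id):
        if block_id not in self.listing:
            raise FileNotFoundError(block_id)
        if block_id not in self.cache:  # cache miss: stream the block in
            self.cache[block_id] = self.remote_fetch(block_id)
        return self.cache[block_id]

    def evict(self, block_id):
        # Free local storage once the job no longer needs the block.
        self.cache.pop(block_id, None)


ns = VirtualNamespace(remote_fetch=lambda b: f"plaintext-of-{b}".encode())
ns.register("log-2014-01-01")
data = ns.read("log-2014-01-01")  # first read triggers the fetch
ns.evict("log-2014-01-01")        # storage footprint shrinks afterwards
```

A prefetcher, as described in the abstract, would simply call `read` ahead of the job's access pattern so that most reads hit the local cache.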
How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface
Scientific workflow management systems (SWMSs) and resource managers together
ensure that tasks are scheduled on provisioned resources so that all
dependencies are obeyed, and some optimization goal, such as makespan
minimization, is fulfilled. In practice, however, there is no clear separation
of scheduling responsibilities between an SWMS and a resource manager because
there exists no agreed-upon separation of concerns between their different
components. This has two consequences. First, the lack of a standardized API
for exchanging scheduling information between SWMSs and resource managers
hinders portability and incurs costly adaptations whenever one component is
replaced by another (e.g., one SWMS with another SWMS on the same resource manager).
Second, due to overlapping functionalities, current installations often
actually have two schedulers, both making partial scheduling decisions under
incomplete information, leading to suboptimal workflow scheduling.
In this paper, we propose a simple REST interface between SWMSs and resource
managers, which allows any SWMS to pass dynamic workflow information to a
resource manager, enabling maximally informed scheduling decisions. We provide
an exemplary implementation of this API for Nextflow as an SWMS and Kubernetes
as a resource manager. Our experiments with nine real-world workflows show that
this strategy reduces makespan by up to 25.1% and 10.8% on average compared to
the standard Nextflow/Kubernetes configuration. Furthermore, a more widespread
implementation of this API would enable leaner code bases, a simpler exchange
of components of workflow systems, and a unified place to implement new
scheduling algorithms.
Comment: Paper accepted in: 2023 23rd IEEE International Symposium on Cluster,
Cloud and Internet Computing (CCGrid).
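To make the proposal concrete, the kind of dynamic workflow information an SWMS might pass over such a REST interface can be sketched as a JSON payload. The route and field names below are assumptions for illustration, not the paper's actual API:

```python
# Illustrative payload a workflow engine could POST to a common
# scheduling endpoint, carrying the information a resource manager
# needs for informed decisions: task dependencies and resource requests.
import json


def build_task_registration(task_id, deps, resources):
    # Field names ("task", "dependencies", "requests") are hypothetical.
    return {
        "task": task_id,
        "dependencies": deps,
        "requests": resources,
    }


payload = build_task_registration(
    "align_reads",
    deps=["fastqc"],
    resources={"cpu": 4, "memory_mb": 8192},
)
# e.g. POST /v1/workflows/{wf_id}/tasks  (hypothetical route)
body = json.dumps(payload)
print(body)
```

With dependencies and per-task requests visible in one place, the resource manager can make global scheduling decisions instead of the two partially informed schedulers described above.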