    Profit-oriented task scheduling algorithm in Hadoop cluster

    Distributed Computing in a Cloud of Mobile Phones

    In recent years, the number of mobile devices has grown exponentially, as have their computation, storage, and communication capabilities, along with the amount of data they generate while performing everyday tasks. Given their ubiquity, it is reasonable to consider a different role for these devices, in which they take an active part in the computation of demanding applications rather than relying exclusively on external servers. At the same time, resource-demanding applications increasingly resort to services offered by infrastructure clouds, which raises several problems: considerable energy and bandwidth consumption, high latency, and unavailability of connectivity due to congestion or the absence of infrastructure. For some applications, it therefore makes sense to perform part or all of the computation locally on the mobile devices themselves. We propose a distributed computing framework, able to process batches or streams of data generated by a cloud of mobile devices, that does not require Internet services. Unlike the current state of the art, where both computation and data are offloaded to mobile devices, our system moves the computation to where the data is, significantly reducing the amount of data exchanged between devices. Evaluation in both real and simulated environments shows that the framework scales well: it benefits significantly from using several devices to handle computation, and it supports multiple devices submitting computation requests without a significant increase in request latency. It also handles churn without being heavily penalized by it.
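    A minimal sketch, not from the paper, of the data-locality idea the abstract describes: instead of shipping data partitions between phones, a scheduler tracks which device already holds which partition and ships the task there. All names below (Device, DataLocalScheduler, the partition keys) are hypothetical illustrations.

```python
# Hypothetical sketch: dispatch a task to the mobile device that already
# holds the relevant data partition, instead of moving the data itself.

from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: str
    data_keys: set = field(default_factory=set)  # partitions stored locally

    def run(self, task, key):
        # In a real system this would execute remotely on the device;
        # here we simply apply the task to the locally held partition.
        return task(key)


class DataLocalScheduler:
    """Keeps a key -> device index and sends computation to the data."""

    def __init__(self, devices):
        self.index = {}
        for dev in devices:
            for key in dev.data_keys:
                self.index[key] = dev

    def submit(self, task, key):
        device = self.index.get(key)
        if device is None:
            raise KeyError(f"no device currently holds partition {key!r}")
        return device.run(task, key)


if __name__ == "__main__":
    phones = [Device("phone-a", {"sensor-logs-0"}),
              Device("phone-b", {"sensor-logs-1"})]
    scheduler = DataLocalScheduler(phones)
    # The task travels to phone-b; the partition does not move.
    print(scheduler.submit(lambda key: f"processed {key}", "sensor-logs-1"))
```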

    MapReduce analysis for cloud-archived data

    Public storage clouds have become a popular choice for archiving certain classes of enterprise data, for example application and infrastructure logs. These logs contain sensitive information, such as IP addresses or user logins, so regulatory and security requirements often demand that the data be encrypted before it is moved to the cloud. To derive any business value from such data, analytics systems (e.g., Hadoop/MapReduce) first download it from the public cloud, decrypt it, and then process it at the secure enterprise site. We propose VNCache, an efficient solution for MapReduce analysis of cloud-archived log data that does not require an a priori data transfer and load into the local Hadoop cluster. VNCache dynamically integrates cloud-archived data into a virtual namespace at the enterprise Hadoop cluster. Through a seamless data streaming and prefetching model, Hadoop jobs can begin execution as soon as they are launched, without any a priori downloading. With VNCache's accurate prefetching and caching, jobs often run on a locally cached copy of the data block, significantly improving performance. When no longer needed, data is safely evicted from the enterprise cluster, reducing the total storage footprint. Uniquely, VNCache is implemented with no changes to the Hadoop application stack. © 2014 IEEE
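    A minimal sketch of the caching idea described above, not VNCache's actual implementation: blocks archived in a storage cloud are fetched and decrypted on first access, served from a local cache afterward, and evicted when no longer needed. The class names, the toy XOR "cipher", and the cache layout are assumptions for illustration only.

```python
# Hypothetical sketch: serve cloud-archived blocks through a local cache so
# a job can read them as if they were local, fetching on demand.

import os


class CloudBlockStore:
    """Stands in for the public storage cloud holding encrypted blocks."""

    def __init__(self, encrypted_blocks, key):
        self.encrypted_blocks = encrypted_blocks
        self.key = key

    def fetch(self, block_id):
        ciphertext = self.encrypted_blocks[block_id]
        # Toy XOR "decryption" as a placeholder for a real cipher.
        return bytes(b ^ self.key for b in ciphertext)


class VirtualNamespaceCache:
    """Serves blocks from a local cache directory, fetching on a miss."""

    def __init__(self, cloud, cache_dir):
        self.cloud = cloud
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def read_block(self, block_id):
        path = os.path.join(self.cache_dir, block_id)
        if not os.path.exists(path):          # cache miss: stream from the cloud
            with open(path, "wb") as f:
                f.write(self.cloud.fetch(block_id))
        with open(path, "rb") as f:           # cache hit: local read
            return f.read()

    def evict(self, block_id):
        # Remove the local copy once the job no longer needs it.
        os.remove(os.path.join(self.cache_dir, block_id))


if __name__ == "__main__":
    cloud = CloudBlockStore(
        {"blk_0": bytes(b ^ 0x2A for b in b"2014-01-01 GET /index")}, key=0x2A)
    cache = VirtualNamespaceCache(cloud, "/tmp/vncache-demo")
    print(cache.read_block("blk_0"))   # first read streams from the "cloud"
    print(cache.read_block("blk_0"))   # second read is served locally
    cache.evict("blk_0")
```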

    How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface

    Scientific workflow management systems (SWMSs) and resource managers together ensure that tasks are scheduled on provisioned resources so that all dependencies are obeyed and some optimization goal, such as makespan minimization, is fulfilled. In practice, however, there is no clear separation of scheduling responsibilities between an SWMS and a resource manager, because there exists no agreed-upon separation of concerns between their different components. This has two consequences. First, the lack of a standardized API for exchanging scheduling information between SWMSs and resource managers hinders portability and incurs costly adaptations when one component is replaced by another (e.g., an SWMS with another SWMS on the same resource manager). Second, due to overlapping functionalities, current installations often run two schedulers, both making partial scheduling decisions under incomplete information, which leads to suboptimal workflow scheduling. In this paper, we propose a simple REST interface between SWMSs and resource managers that allows any SWMS to pass dynamic workflow information to a resource manager, enabling maximally informed scheduling decisions. We provide an exemplary implementation of this API for Nextflow as an SWMS and Kubernetes as a resource manager. Our experiments with nine real-world workflows show that this strategy reduces makespan by up to 25.1%, and by 10.8% on average, compared to the standard Nextflow/Kubernetes configuration. Furthermore, a more widespread implementation of this API would enable leaner code bases, a simpler exchange of workflow system components, and a unified place to implement new scheduling algorithms.
    Comment: Paper accepted at the 2023 23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid).
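    A minimal sketch of what such a REST exchange could look like; the endpoint URL, payload fields, and function name are assumptions for illustration, not the API actually proposed in the paper. The point is that the workflow engine pushes the whole task graph, with dependencies and resource requests, to the resource manager's scheduler, which can then decide placement with complete information.

```python
# Hypothetical sketch: a workflow engine registers its task graph with a
# resource-manager scheduler over REST (endpoint and schema are assumed).

import json
import urllib.request

SCHEDULER_URL = "http://scheduler.example:8080/v1/workflows"  # assumed endpoint


def register_workflow(workflow_id, tasks):
    """Send the workflow's task graph (dependencies, resource requests) once,
    so the scheduler sees the whole DAG instead of isolated task submissions."""
    payload = {
        "workflow": workflow_id,
        "tasks": [
            {
                "name": t["name"],
                "depends_on": t.get("depends_on", []),
                "cpus": t.get("cpus", 1),
                "memory_mb": t.get("memory_mb", 1024),
            }
            for t in tasks
        ],
    }
    req = urllib.request.Request(
        SCHEDULER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # e.g. node assignments chosen by the scheduler


if __name__ == "__main__":
    register_workflow(
        "rnaseq-run-42",
        [
            {"name": "fastqc", "cpus": 2},
            {"name": "align", "depends_on": ["fastqc"], "cpus": 8, "memory_mb": 16384},
        ],
    )
```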

    Privacy-preserving Platforms for Computation on Hybrid Clouds

    Ph.D. thesis (Doctor of Philosophy)