3,580 research outputs found

    Transparent and scalable client-side server selection using netlets

    Get PDF
    Replication of web content in the Internet has been found to improve service response time, performance and reliability offered by web services. When working with such distributed server systems, the location of servers with respect to client nodes is found to affect service response time perceived by clients in addition to server load conditions. This is due to the characteristics of the network path segments through which client requests get routed. Hence, a number of researchers have advocated making server selection decisions at the client-side of the network. In this paper, we present a transparent approach for client-side server selection in the Internet using Netlet services. Netlets are autonomous, nomadic mobile software components which persist and roam in the network independently, providing predefined network services. In this application, Netlet based services embedded with intelligence to support server selection are deployed by servers close to potential client communities to setup dynamic service decision points within the network. An anycast address is used to identify available distributed decision points in the network. Each service decision point transparently directs client requests to the best performing server based on its in-built intelligence supported by real-time measurements from probes sent by the Netlet to each server. It is shown that the resulting system provides a client-side server selection solution which is server-customisable, scalable and fault transparent

    Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

    Full text link
    Designing fast and scalable algorithm for mining frequent itemsets is always being a most eminent and promising problem of data mining. Apriori is one of the most broadly used and popular algorithm of frequent itemset mining. Designing efficient algorithms on MapReduce framework to process and analyze big datasets is contemporary research nowadays. In this paper, we have focused on the performance of MapReduce based Apriori on homogeneous as well as on heterogeneous Hadoop cluster. We have investigated a number of factors that significantly affects the execution time of MapReduce based Apriori running on homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both algorithmic and non-algorithmic improvements. Considered factors specific to algorithmic improvements are filtered transactions and data structures. Experimental results show that how an appropriate data structure and filtered transactions technique drastically reduce the execution time. The non-algorithmic factors include speculative execution, nodes with poor performance, data locality & distribution of data blocks, and parallelism control with input split size. We have applied strategies against these factors and fine tuned the relevant parameters in our particular application. Experimental results show that if cluster specific parameters are taken care of then there is a significant reduction in execution time. Also we have discussed the issues regarding MapReduce implementation of Apriori which may significantly influence the performance.Comment: 8 pages, 8 figures, International Conference on Computing, Communication and Automation (ICCCA2016

    Two-Layer Load Balancing for Onedata System

    Get PDF
    The recent years have significantly changed the perception of web services and data storages, as clouds became a big part of IT market. New challenges appear in the field of scalable web systems, which become bigger and more complex. One of them is designing load balancing algorithms that could allow for optimal utilization of servers' resources in large, distributed systems. This paper presents an algorithm called Two-Level Load Balancing, which has been implemented and evaluated in onedata - a global data access system. A study of onedata architecture, request types and use cases has been performed to determine the requirements of load balancing set by similar, highly scalable distributed systems. The algorithm was designed to match these requirements, and it was achieved by using a synergy of DNS and internal dispatcher load balancing. Test results show that the algorithm does not introduce considerable overheads and maintains the performance of the system on high level, even in cases when its servers are not equally loaded

    Application of Linear Programming in Scheduling

    Get PDF
    Distributed computing virtually combines the scattered interconnected computing resources and satisfies the demand of compute-bound and data-hungry applications. The paper highlights various Distributed Computing Environments (DCE), scheduling techniques and need of incorporation of Dynamic Load Balancing (DLB) in scheduling. The paper also opens a new area of research by introducing Linear Programming as a scheduling technique in DCE

    DEPAS: A Decentralized Probabilistic Algorithm for Auto-Scaling

    Full text link
    The dynamic provisioning of virtualized resources offered by cloud computing infrastructures allows applications deployed in a cloud environment to automatically increase and decrease the amount of used resources. This capability is called auto-scaling and its main purpose is to automatically adjust the scale of the system that is running the application to satisfy the varying workload with minimum resource utilization. The need for auto-scaling is particularly important during workload peaks, in which applications may need to scale up to extremely large-scale systems. Both the research community and the main cloud providers have already developed auto-scaling solutions. However, most research solutions are centralized and not suitable for managing large-scale systems, moreover cloud providers' solutions are bound to the limitations of a specific provider in terms of resource prices, availability, reliability, and connectivity. In this paper we propose DEPAS, a decentralized probabilistic auto-scaling algorithm integrated into a P2P architecture that is cloud provider independent, thus allowing the auto-scaling of services over multiple cloud infrastructures at the same time. Our simulations, which are based on real service traces, show that our approach is capable of: (i) keeping the overall utilization of all the instantiated cloud resources in a target range, (ii) maintaining service response times close to the ones obtained using optimal centralized auto-scaling approaches.Comment: Submitted to Springer Computin
    corecore