
    Distributed Online Big Data Classification Using Context Information

    Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework where data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner to process incurs additional costs such as delays, and hence this is beneficial only if the benefit obtained from a better classification exceeds the cost. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.

    Communication network analysis of the enterprise grid systems

    This paper addresses the problem of performance analysis based on communication modelling of large-scale heterogeneous distributed systems, with emphasis on enterprise grid computing systems. The study of communication layers is important because the overall performance of a distributed system often hinges critically on the effectiveness of this part. The model considers processor as well as network heterogeneity of the target system. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system sizes and under different working conditions. The proposed model is then used to analyze the performance of typical systems.

    CAMP: A Common API for Measuring Performance

    Accurate performance testing of heterogeneous distributed systems, such as those created using GRID technology, requires a consistent method for retrieving system performance data from multiple platforms. This paper presents CAMP: a low-level, platform-independent performance data API designed for use with distributed testing frameworks. CAMP is not necessarily tied to the distributed testing task: it provides a simple, low-level interface into operating system performance data that can be used to build complex performance measurement applications. This paper discusses CAMP's functionality and implementation in detail. It also contains a detailed analysis of the API's correctness, performance, and overhead.

    Analysis of multi-cluster computing systems with processor heterogeneity

    This paper addresses the problem of performance modeling of heterogeneous multi-cluster computing systems. We present an analytical model that can be employed to explore the effectiveness of different design approaches, so that one can make an informed choice during the design and evaluation of a cost-effective large-scale heterogeneous distributed computing system. The proposed model considers stochastic quantities as well as processor heterogeneity of the target system. The analysis is based on a parametric fat-tree network, the m-port n-tree, and a deterministic routing algorithm. The correctness of the proposed model is validated through comprehensive simulation of different types of clusters.

    Maximizing Service Reliability in Distributed Computing Systems with Random Node Failures: Theory and Implementation

    In distributed computing systems (DCSs) where server nodes can fail permanently with nonzero probability, the system performance can be assessed by means of the service reliability, defined as the probability of serving all the tasks queued in the DCS before all the nodes fail. This paper presents a rigorous probabilistic framework to analytically characterize the service reliability of a DCS in the presence of communication uncertainties and stochastic topological changes due to node deletions. The framework considers a system composed of heterogeneous nodes with stochastic service and failure times and a communication network imposing random tangible delays. The framework also permits arbitrarily specified, distributed load-balancing actions to be taken by the individual nodes in order to improve the service reliability. The presented analysis is based upon a novel use of the concept of stochastic regeneration, which is exploited to derive a system of difference-differential equations characterizing the service reliability. The theory is further utilized to optimize certain load-balancing policies for maximal service reliability; the optimization is carried out by means of an algorithm that scales linearly with the number of nodes in the system. The analytical model is validated using both Monte Carlo simulations and experimental data collected from a DCS testbed.

    Opportunistic Splitting Algorithms for Wireless Networks with Fairness Constraints

    In wireless networks, it is well established that the throughput can be increased by opportunistically scheduling transmissions to users that have good channel conditions. Several “opportunistic” medium access control protocols have been developed, which enable distributed users to opportunistically transmit without requiring a centralized scheduler. In this paper, we consider opportunistic splitting algorithms, where a sequence of mini-slots is used to determine the appropriate user to schedule at each time. In prior work, this type of algorithm has been developed for homogeneous systems in which all users have independent and identically distributed (i.i.d.) channel statistics. Here, we specify new splitting algorithms for a heterogeneous environment that may also include fairness constraints. The performance of the splitting algorithms is characterized via analysis and simulations. In particular, we show that in certain cases, a heterogeneous algorithm will perform at least as well as the homogeneous algorithm in a system with the same total number of users.