55,686 research outputs found

    Optimal Principal Component Analysis in Distributed and Streaming Models

    We study the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix $A \in \mathbb{R}^{m \times n}$, a rank parameter $k < \mathrm{rank}(A)$, and an accuracy parameter $0 < \epsilon < 1$, we want to output an $m \times k$ orthonormal matrix $U$ for which $\|A - U U^T A\|_F^2 \le (1 + \epsilon) \cdot \|A - A_k\|_F^2$, where $A_k \in \mathbb{R}^{m \times n}$ is the best rank-$k$ approximation to $A$. This paper provides improved algorithms for distributed PCA and streaming PCA. Comment: STOC 2016 full version
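For intuition about the benchmark on the right-hand side of the guarantee, the error of the best rank-$k$ approximation can be written in terms of the singular values of $A$. This is the standard Eckart-Young-Mirsky fact, not a result of the paper; the symbols $\sigma_i$ (singular values of $A$ in non-increasing order) and $r = \mathrm{rank}(A)$ are introduced here only for illustration.

```latex
\[
\|A - A_k\|_F^2 \;=\; \sum_{i=k+1}^{r} \sigma_i^2,
\qquad\text{so the guarantee reads}\qquad
\|A - U U^{T} A\|_F^2 \;\le\; (1+\epsilon) \sum_{i=k+1}^{r} \sigma_i^2 .
\]
```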

    Making recommendations bandwidth aware

    This paper asks how much we can gain in terms of bandwidth and user satisfaction if recommender systems became bandwidth aware and took into account not only the user preferences, but also the fact that they may need to serve these users under bandwidth constraints, as is the case over wireless networks. We formulate this as a new problem in the context of index coding: we relax the index coding requirements to capture scenarios where each client has preferences associated with messages. The client is satisfied to receive any message she does not already have, with a satisfaction proportional to her preference for that message. We consistently find, over a number of scenarios we sample, that although the optimization problems are in general NP-hard, significant bandwidth savings are possible even when restricted to polynomial-time algorithms.
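As background for the index coding setting mentioned above, here is a minimal sketch of the classical (unrelaxed) idea: one coded broadcast can serve two clients that each already hold the message the other wants. It does not implement the paper's preference-based relaxation; the message contents and the helper `xor_bytes` are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length messages."""
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical messages of equal length.
x1 = b"news"
x2 = b"song"

# Client A holds x1 and wants x2; client B holds x2 and wants x1.
# The server sends a single coded transmission instead of two plain ones.
broadcast = xor_bytes(x1, x2)

decoded_by_a = xor_bytes(broadcast, x1)  # A cancels x1 and recovers x2
decoded_by_b = xor_bytes(broadcast, x2)  # B cancels x2 and recovers x1
```

In this toy case, one transmission replaces two, which is the kind of bandwidth saving the abstract refers to.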

    Large-scale Join-Idle-Queue system with general service times

    A parallel server system with $n$ identical servers is considered. The service time distribution has a finite mean $1/\mu$, but otherwise is arbitrary. Arriving customers are routed to one of the servers immediately upon arrival. The Join-Idle-Queue routing algorithm is studied, under which an arriving customer is sent to an idle server, if one is available, and to a server chosen uniformly at random otherwise. We consider the asymptotic regime where $n \to \infty$ and the customer input flow rate is $\lambda n$. Under the condition $\lambda/\mu < 1/2$, we prove that, as $n \to \infty$, the sequence of (appropriately scaled) stationary distributions concentrates at the natural equilibrium point, with the fraction of occupied servers being constant and equal to $\lambda/\mu$. In particular, this implies that the steady-state probability of an arriving customer waiting for service vanishes. Comment: Revision. 11 pages
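The routing rule described in this abstract can be sketched as a single function. This is a minimal illustration, not the paper's model: the name `jiq_route` and the representation of server state as a list of queue lengths are assumptions, and picking uniformly among idle servers is one possible tie-breaking choice that the abstract does not specify.

```python
import random


def jiq_route(queue_lengths, rng):
    """Return the index of the server that receives the arriving customer.

    queue_lengths[i] is the number of customers at server i (0 means idle).
    """
    idle = [i for i, q in enumerate(queue_lengths) if q == 0]
    if idle:
        # An idle server exists: send the customer there.
        # Assumption: choose uniformly among the idle servers.
        return rng.choice(idle)
    # No idle server: choose any server uniformly at random.
    return rng.randrange(len(queue_lengths))


rng = random.Random(0)
target = jiq_route([2, 0, 3], rng)  # server 1 is the only idle server
```

A customer waits only when the second branch is taken, which is the event whose steady-state probability the abstract says vanishes as the number of servers grows.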

    Online Distributed Sensor Selection

    A key problem in sensor networks is to decide which sensors to query when, in order to obtain the most useful information (e.g., for performing accurate prediction), subject to constraints (e.g., on power and bandwidth). In many applications the utility function is not known a priori, must be learned from data, and can even change over time. Furthermore, for large sensor networks, solving a centralized optimization problem to select sensors is not feasible, and thus we seek a fully distributed solution. In this paper, we present Distributed Online Greedy (DOG), an efficient, distributed algorithm for repeatedly selecting sensors online, only receiving feedback about the utility of the selected sensors. We prove very strong theoretical no-regret guarantees that apply whenever the (unknown) utility function satisfies a natural diminishing returns property called submodularity. Our algorithm has extremely low communication requirements, and scales well to large sensor deployments. We extend DOG to allow observation-dependent sensor selection. We empirically demonstrate the effectiveness of our algorithm on several real-world sensing tasks.
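To make the diminishing returns property concrete, the sketch below uses a coverage utility (the number of distinct regions observed by the selected sensors), which is a standard example of a submodular function. It runs the classical centralized, offline greedy rule on a known utility; it is not the DOG algorithm, which is online and distributed. The sensor names, regions, and function names are hypothetical.

```python
def coverage(selected, regions):
    """Utility of a sensor set: number of distinct regions it observes."""
    covered = set()
    for sensor in selected:
        covered |= regions[sensor]
    return len(covered)


def greedy_select(regions, k):
    """Pick up to k sensors, each time adding the one with the largest gain."""
    chosen = []
    for _ in range(k):
        candidates = [s for s in regions if s not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda s: coverage(chosen + [s], regions))
        chosen.append(best)
    return chosen


# Hypothetical deployment: each sensor observes a set of region ids.
regions = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}

chosen = greedy_select(regions, 2)

# Diminishing returns: adding "b" helps less once "c" is already selected.
gain_alone = coverage(["b"], regions) - coverage([], regions)
gain_after_c = coverage(["c", "b"], regions) - coverage(["c"], regions)
```

Here `gain_alone` is 2 and `gain_after_c` is 1, because region 4 is already observed by sensor "c".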

    Redundancy Scheduling with Locally Stable Compatibility Graphs

    Redundancy scheduling is a popular concept to improve performance in parallel-server systems. In the baseline scenario any job can be handled equally well by any server, and is replicated to a fixed number of servers selected uniformly at random. Quite often, however, there may be heterogeneity in job characteristics or server capabilities, and jobs can only be replicated to specific servers because of affinity relations or compatibility constraints. In order to capture such situations, we consider a scenario where jobs of various types are replicated to different subsets of servers as prescribed by a general compatibility graph. We exploit a product-form stationary distribution and weak local stability conditions to establish a state space collapse in heavy traffic. In this limiting regime, the parallel-server system with graph-based redundancy scheduling operates as a multi-class single-server system, achieving full resource pooling and exhibiting strong insensitivity to the underlying compatibility constraints. Comment: 28 pages, 4 figures

    Fundamental limits of failure identifiability by Boolean Network Tomography

    Boolean network tomography is a powerful tool to infer the state (working/failed) of individual nodes from path-level measurements obtained by edge nodes. We consider the problem of optimizing the capability of identifying network failures through the design of monitoring schemes. Finding an optimal solution is NP-hard, and a large body of work has been devoted to heuristic approaches providing lower bounds. Unlike previous works, we provide upper bounds on the maximum number of identifiable nodes, given the number of monitoring paths and different constraints on the network topology, the routing scheme, and the maximum path length. The proposed upper bounds represent a fundamental limit on the identifiability of failures via Boolean network tomography. This analysis provides insights on how to design topologies and related monitoring schemes to achieve the maximum identifiability under various network settings. Through analysis and experiments we demonstrate the tightness of the bounds and the efficacy of the design insights for engineered as well as real networks.
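A small sketch may help show what identifiability means here. It assumes the usual Boolean measurement model (a path reports failure if at least one node on it has failed) and at most one failed node at a time, in which case a node can be pinpointed exactly when the set of paths crossing it differs from that of every other node. The abstract does not state these definitions, so treat them, along with the function names and the toy paths, as assumptions.

```python
def path_fails(path, failed_nodes):
    """Boolean measurement: a path fails if any node on it has failed."""
    return any(node in failed_nodes for node in path)


def identifiable_nodes(paths):
    """Nodes whose set of traversing paths is unique (single-failure case)."""
    signature = {}
    for index, path in enumerate(paths):
        for node in path:
            signature.setdefault(node, set()).add(index)

    # Count how many nodes share each path signature.
    counts = {}
    for sig in signature.values():
        key = frozenset(sig)
        counts[key] = counts.get(key, 0) + 1

    return sorted(
        node for node, sig in signature.items() if counts[frozenset(sig)] == 1
    )


# Hypothetical monitoring paths over nodes a..e.
paths = [["a", "b"], ["b", "c"], ["c", "d", "e"]]
print(identifiable_nodes(paths))  # ['a', 'b', 'c']
```

Nodes "d" and "e" lie on exactly the same path, so a failure of either produces the same measurements and the two cannot be told apart.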

    Age-Optimal Updates of Multiple Information Flows

    In this paper, we study an age of information minimization problem, where multiple flows of update packets are sent over multiple servers to their destinations. Two online scheduling policies are proposed. When the packet generation and arrival times are synchronized across the flows, the proposed policies are shown to be (near) optimal for minimizing any time-dependent, symmetric, and non-decreasing penalty function of the ages of the flows over time in a stochastic ordering sense.
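The quantity being minimized can be illustrated with the standard definition of age of information: at time t, the age of a flow is t minus the generation time of the freshest packet delivered so far. The abstract does not spell this out, so the definition, the function name `age_at`, and the sample delivery log are assumptions made for illustration; the paper's scheduling policies are not reproduced here.

```python
def age_at(t, deliveries, initial_generation=0.0):
    """Age of one flow at time t.

    deliveries is a list of (delivered_at, generated_at) pairs.
    Assumption: before any delivery, the destination holds a packet
    generated at initial_generation.
    """
    freshest = initial_generation
    for delivered_at, generated_at in deliveries:
        if delivered_at <= t:
            freshest = max(freshest, generated_at)
    return t - freshest


# Hypothetical log: a packet generated at time 1 arrives at time 2,
# and a packet generated at time 4 arrives at time 5.
log = [(2, 1), (5, 4)]

age_before = age_at(1, log)  # nothing delivered yet
age_mid = age_at(3, log)     # freshest delivered packet was generated at 1
age_late = age_at(5, log)    # freshest delivered packet was generated at 4
```

The age grows linearly between deliveries and drops whenever a fresher packet arrives, which is why the order in which packets are served affects the penalty.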