8 research outputs found

    Interval Selection in the Streaming Model

    Full text link
    A set of intervals is independent when the intervals are pairwise disjoint. In the interval selection problem we are given a set $\mathbb{I}$ of intervals and we want to find an independent subset of intervals of largest cardinality. Let $\alpha(\mathbb{I})$ denote the cardinality of an optimal solution. We discuss the estimation of $\alpha(\mathbb{I})$ in the streaming model, where we only have one-time, sequential access to the input intervals, the endpoints of the intervals lie in $\{1,\dots,n\}$, and the amount of memory is constrained. For intervals of different sizes, we provide an algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac{1}{2}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. For same-length intervals, we provide another algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac{2}{3}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. The space used by our algorithms is bounded by a polynomial in $\varepsilon^{-1}$ and $\log n$. We also show that no better estimations can be achieved using $o(n)$ bits of storage. We also develop new, approximate solutions to the interval selection problem, where we want to report a feasible solution, that use $O(\alpha(\mathbb{I}))$ space. Our algorithms for the interval selection problem match the optimal results by Emek, Halldórsson and Rosén [Space-Constrained Interval Selection, ICALP 2012], but are much simpler. [Comment: Minor correction]
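    For concreteness, the quantity $\alpha(\mathbb{I})$ that these streaming algorithms estimate can be computed offline by the textbook greedy sweep over right endpoints. The sketch below is only that offline reference (it stores all intervals), not the paper's space-constrained algorithm.

```python
# Minimal offline reference for interval selection: scan intervals by right
# endpoint and keep every interval disjoint from the last one kept.  This
# computes alpha(I) exactly but needs all intervals in memory, unlike the
# streaming estimators described in the abstract above.

def alpha(intervals):
    """Return the size of a maximum independent (pairwise-disjoint) subset.

    `intervals` is an iterable of (left, right) pairs with left <= right;
    intervals sharing an endpoint are treated as overlapping (closed intervals).
    """
    best = 0
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:      # disjoint from everything kept so far
            best += 1
            last_right = right
    return best

if __name__ == "__main__":
    I = [(1, 4), (2, 3), (5, 8), (6, 7), (9, 10)]
    print(alpha(I))  # -> 3, e.g. {(2, 3), (6, 7), (9, 10)}
```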

    Recognizing End-User Transactions in Performance Management

    No full text
    Providing good quality of service (e.g., low response times) in distributed computer systems requires measuring end-user perceptions of performance. Unfortunately, in practice such measures are often expensive or impossible to obtain. Herein, we propose a machine learning approach to recognizing end-user transactions consisting of sequences of remote procedure calls (RPCs) received at a server. Two problems are addressed. The first is labeling previously segmented transaction instances with the correct transaction type. This is akin to work done in document classification. The second problem is segmenting RPC sequences into transaction instances. This is a more difficult problem, but it is similar to segmenting sounds into words as in speech understanding. Using Naive Bayes, we tackle the labeling problem with four combinations of feature vectors and probability distributions: RPC occurrences with the Bernoulli distribution and RPC counts with the multinomial, geometric, and shifted geometric distributions.
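    As a rough illustration of the labeling step, the sketch below trains a multinomial Naive Bayes classifier on RPC-count feature vectors; the transaction types and RPC vocabulary are invented for the example, and the Bernoulli, geometric, and shifted-geometric variants mentioned above would only change the per-class likelihood model.

```python
import numpy as np

# Sketch of the labeling step: multinomial Naive Bayes over RPC-count vectors.
# Transaction types and RPC names below are hypothetical.

def train_multinomial_nb(X, y, alpha=1.0):
    """X: (n_samples, n_rpcs) count matrix; y: integer class labels."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Laplace-smoothed per-class RPC probabilities.
    counts = np.array([X[y == c].sum(axis=0) + alpha for c in classes])
    log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_lik

def predict(model, X):
    classes, log_prior, log_lik = model
    # log P(class) + sum_over_rpcs count * log P(rpc | class)
    scores = X @ log_lik.T + log_prior
    return classes[np.argmax(scores, axis=1)]

if __name__ == "__main__":
    # Toy data: rows are transaction instances, columns count 4 RPC types.
    X = np.array([[3, 0, 1, 0], [2, 1, 0, 0], [0, 4, 0, 2], [0, 3, 1, 2]])
    y = np.array([0, 0, 1, 1])   # 0 = "open_mail", 1 = "refresh_folder" (made up)
    model = train_multinomial_nb(X, y)
    print(predict(model, np.array([[1, 0, 2, 0], [0, 2, 0, 3]])))  # -> [0 1]
```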

    Estimating Statistical Aggregates on Probabilistic Data Streams

    No full text
    The probabilistic stream model was introduced by Jayram, Kale, and Vee [2007]. It is a generalization of the data stream model that is suited to handling “probabilistic” data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially exponential number of classical “deterministic” streams, where each item is deterministically one of the domain values. We present algorithms for computing commonly used aggregates on a probabilistic stream. We present the first one-pass streaming algorithms for estimating the expected mean of a probabilistic stream. Next, we consider the problem of estimating frequency moments for probabilistic data. We propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, we extend a classical data stream algorithm to obtain a one-pass algorithm for estimating $F_2$, the second frequency moment. We present the first known streaming algorithms for estimating $F_0$, the number of distinct items, on probabilistic streams. Our work also gives an efficient one-pass algorithm for estimating the median and a two-pass algorithm for estimating the range.
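    The toy sketch below illustrates the probabilistic-stream setting with a single sequential pass: each item is a small discrete distribution, with a None outcome meaning “the item does not appear”. It tracks the expected SUM and COUNT exactly; the ratio E[SUM]/E[COUNT] it returns is only a naive proxy for the expected mean E[SUM/COUNT] that the paper's estimator targets.

```python
# Toy illustration of a probabilistic stream: each item is a list of
# (value, probability) pairs, where value None means "does not appear".
# One pass tracks E[SUM] and E[COUNT]; their ratio is a naive proxy for
# the expected mean, not the paper's estimator.

def naive_expected_mean(prob_stream):
    exp_sum = 0.0
    exp_count = 0.0
    for item in prob_stream:                 # one sequential pass
        for value, p in item:
            if value is not None:
                exp_sum += p * value
                exp_count += p
    return exp_sum / exp_count if exp_count else 0.0

if __name__ == "__main__":
    stream = [
        [(4, 0.5), (None, 0.5)],             # appears with prob 1/2, value 4
        [(1, 0.9), (3, 0.1)],                # always appears, value 1 or 3
        [(10, 0.2), (None, 0.8)],
    ]
    print(naive_expected_mean(stream))       # (2 + 1.2 + 2) / (0.5 + 1 + 0.2) ≈ 3.06
```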

    Improved linear embeddings via Lagrange duality

    No full text
    Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a non-convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm based on the theory of convex optimization that provably solves the dual relaxation. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion.
    by Kshiteej Sheth, Dinesh Garg and Anirban Dasgupta
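    The maximum-distortion objective discussed above can be evaluated directly for any candidate orthogonal embedding. The sketch below does this for a plain PCA projection, which serves only as a stand-in baseline; the paper's construction obtains the embedding from a Lagrange-dual relaxation instead.

```python
import numpy as np

# Sketch of the objective above: maximum distortion of a linear embedding
# V (d x k, orthonormal columns) over all pairwise differences of a point set.
# The PCA projection below is only a stand-in baseline, not the paper's
# dual-based construction.

def max_distortion(X, V):
    """Largest relative change in pairwise squared distances under x -> V^T x."""
    diffs = X[:, None, :] - X[None, :, :]            # (n, n, d) pairwise differences
    orig = (diffs ** 2).sum(-1)
    proj = ((diffs @ V) ** 2).sum(-1)
    mask = orig > 0                                  # skip identical points
    return np.abs(proj[mask] / orig[mask] - 1.0).max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 20))
    # Top-k principal directions as a simple orthogonal embedding to k = 5 dims.
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    V = Vt[:5].T
    print(max_distortion(X, V))                      # worst-case relative distortion of the PCA baseline
```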