8 research outputs found

    Interval Selection in the Streaming Model

    Full text link
    A set of intervals is independent when the intervals are pairwise disjoint. In the interval selection problem we are given a set $\mathbb{I}$ of intervals and we want to find an independent subset of intervals of largest cardinality. Let $\alpha(\mathbb{I})$ denote the cardinality of an optimal solution. We discuss the estimation of $\alpha(\mathbb{I})$ in the streaming model, where we only have one-time, sequential access to the input intervals, the endpoints of the intervals lie in $\{1,\dots,n\}$, and the amount of memory is constrained. For intervals of different sizes, we provide an algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac{1}{2}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. For same-length intervals, we provide another algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac{2}{3}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. The space used by our algorithms is bounded by a polynomial in $\varepsilon^{-1}$ and $\log n$. We also show that no better estimations can be achieved using $o(n)$ bits of storage. We also develop new, approximate solutions to the interval selection problem, where we want to report a feasible solution, that use $O(\alpha(\mathbb{I}))$ space. Our algorithms for the interval selection problem match the optimal results by Emek, Halldórsson and Rosén [Space-Constrained Interval Selection, ICALP 2012], but are much simpler. [Comment: Minor correction]
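    For concreteness, the quantity $\alpha(\mathbb{I})$ that these streaming algorithms estimate can be computed offline by the textbook greedy sweep over right endpoints. The sketch below is only that offline reference (it stores all intervals), not the paper's space-constrained algorithm.

```python
# Minimal offline reference for interval selection: scan intervals by right
# endpoint and keep every interval disjoint from the last one kept.  This
# computes alpha(I) exactly but needs all intervals in memory, unlike the
# streaming estimators described in the abstract above.

def alpha(intervals):
    """Return the size of a maximum independent (pairwise-disjoint) subset.

    `intervals` is an iterable of (left, right) pairs with left <= right;
    intervals sharing an endpoint are treated as overlapping (closed intervals).
    """
    best = 0
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:      # disjoint from everything kept so far
            best += 1
            last_right = right
    return best

if __name__ == "__main__":
    I = [(1, 4), (2, 3), (5, 8), (6, 7), (9, 10)]
    print(alpha(I))  # -> 3, e.g. {(2, 3), (6, 7), (9, 10)}
```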

    Recognizing End-User Transactions in Performance Management

    No full text
    Providing good quality of service (e.g., low response times) in distributed computer systems requires measuring end-user perceptions of performance. Unfortunately, in practice such measures are often expensive or impossible to obtain. Herein, we propose a machine learning approach to recognizing end-user transactions consisting of sequences of remote procedure calls (RPCs) received at a server. Two problems are addressed. The first is labeling previously segmented transaction instances with the correct transaction type. This is akin to work done in document classification. The second problem is segmenting RPC sequences into transaction instances. This is a more difficult problem, but it is similar to segmenting sounds into words as in speech understanding. Using Naive Bayes, we tackle the labeling problem with four combinations of feature vectors and probability distributions: RPC occurrences with the Bernoulli distribution and RPC counts with the multinomial, geometric, and shifted geometric distributions.
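    As a rough illustration of the labeling step, the sketch below trains a multinomial Naive Bayes classifier on RPC-count feature vectors; the transaction types and RPC vocabulary are invented for the example, and the Bernoulli, geometric, and shifted-geometric variants mentioned above would only change the per-class likelihood model.

```python
import numpy as np

# Sketch of the labeling step: multinomial Naive Bayes over RPC-count vectors.
# Transaction types and RPC names below are hypothetical.

def train_multinomial_nb(X, y, alpha=1.0):
    """X: (n_samples, n_rpcs) count matrix; y: integer class labels."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Laplace-smoothed per-class RPC probabilities.
    counts = np.array([X[y == c].sum(axis=0) + alpha for c in classes])
    log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_lik

def predict(model, X):
    classes, log_prior, log_lik = model
    # log P(class) + sum_over_rpcs count * log P(rpc | class)
    scores = X @ log_lik.T + log_prior
    return classes[np.argmax(scores, axis=1)]

if __name__ == "__main__":
    # Toy data: rows are transaction instances, columns count 4 RPC types.
    X = np.array([[3, 0, 1, 0], [2, 1, 0, 0], [0, 4, 0, 2], [0, 3, 1, 2]])
    y = np.array([0, 0, 1, 1])   # 0 = "open_mail", 1 = "refresh_folder" (made up)
    model = train_multinomial_nb(X, y)
    print(predict(model, np.array([[1, 0, 2, 0], [0, 2, 0, 3]])))  # -> [0 1]
```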

    Estimating Statistical Aggregates on Probabilistic Data Streams

    No full text
    The probabilistic stream model was introduced by Jayram, Kale, and Vee [2007]. It is a generalization of the data stream model that is suited to handling “probabilistic” data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially exponential number of classical “deterministic” streams, where each item is deterministically one of the domain values. We present algorithms for computing commonly used aggregates on a probabilistic stream. We present the first one-pass streaming algorithms for estimating the expected mean of a probabilistic stream. Next, we consider the problem of estimating frequency moments for probabilistic data. We propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, we extend a classical data stream algorithm to obtain a one-pass algorithm for estimating $F_2$, the second frequency moment. We present the first known streaming algorithms for estimating $F_0$, the number of distinct items, on probabilistic streams. Our work also gives an efficient one-pass algorithm for estimating the median and a two-pass algorithm for estimating the range.
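    The toy sketch below illustrates the probabilistic-stream setting with a single sequential pass: each item is a small discrete distribution, with a None outcome meaning “the item does not appear”. It tracks the expected SUM and COUNT exactly; the ratio E[SUM]/E[COUNT] it returns is only a naive proxy for the expected mean E[SUM/COUNT] that the paper's estimator targets.

```python
# Toy illustration of a probabilistic stream: each item is a list of
# (value, probability) pairs, where value None means "does not appear".
# One pass tracks E[SUM] and E[COUNT]; their ratio is a naive proxy for
# the expected mean, not the paper's estimator.

def naive_expected_mean(prob_stream):
    exp_sum = 0.0
    exp_count = 0.0
    for item in prob_stream:                 # one sequential pass
        for value, p in item:
            if value is not None:
                exp_sum += p * value
                exp_count += p
    return exp_sum / exp_count if exp_count else 0.0

if __name__ == "__main__":
    stream = [
        [(4, 0.5), (None, 0.5)],             # appears with prob 1/2, value 4
        [(1, 0.9), (3, 0.1)],                # always appears, value 1 or 3
        [(10, 0.2), (None, 0.8)],
    ]
    print(naive_expected_mean(stream))       # (2 + 1.2 + 2) / (0.5 + 1 + 0.2) ≈ 3.06
```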

    Improved linear embeddings via Lagrange duality

    No full text
    Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a non-convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm based on the theory of convex optimization that provably solves the dual relaxation. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion.
    by Kshiteej Sheth, Dinesh Garg and Anirban Dasgupta
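    The maximum-distortion objective discussed above can be evaluated directly for any candidate orthogonal embedding. The sketch below does this for a plain PCA projection, which serves only as a stand-in baseline; the paper's construction obtains the embedding from a Lagrange-dual relaxation instead.

```python
import numpy as np

# Sketch of the objective above: maximum distortion of a linear embedding
# V (d x k, orthonormal columns) over all pairwise differences of a point set.
# The PCA projection below is only a stand-in baseline, not the paper's
# dual-based construction.

def max_distortion(X, V):
    """Largest relative change in pairwise squared distances under x -> V^T x."""
    diffs = X[:, None, :] - X[None, :, :]            # (n, n, d) pairwise differences
    orig = (diffs ** 2).sum(-1)
    proj = ((diffs @ V) ** 2).sum(-1)
    mask = orig > 0                                  # skip identical points
    return np.abs(proj[mask] / orig[mask] - 1.0).max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 20))
    # Top-k principal directions as a simple orthogonal embedding to k = 5 dims.
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    V = Vt[:5].T
    print(max_distortion(X, V))                      # worst-case relative distortion of the PCA baseline
```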