112 research outputs found

    Online Row Sampling

    Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves interpretability, saves space when $A$ is sparse, and preserves row structure, which is especially important, for example, when $A$ represents a graph. However, correctly sampling rows from $A$ can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of $A$ one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses $O(d^2)$ memory but only requires $O(d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
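The online setting described in this abstract (read one row, decide at once, never retract) can be illustrated with a minimal sketch. This is not the authors' algorithm as published: the ridge term `delta / eps`, the oversampling factor `c`, the function name `online_row_sample`, and the Sherman-Morrison bookkeeping are all assumptions made for illustration. Each incoming row is scored by an approximate ridge leverage score computed only from the rows kept so far, and it is kept (rescaled) with probability proportional to that score.

```python
import math
import random


def online_row_sample(rows, d, eps=0.5, delta=1e-3, seed=0):
    """Keep or discard each row as it arrives; decisions are never revisited."""
    rng = random.Random(seed)
    c = 8.0 * math.log(max(d, 2)) / eps ** 2   # assumed oversampling factor
    lam = delta / eps                           # assumed ridge (regularization) term
    # inv holds (S^T S + lam * I)^{-1}, where S is the matrix of kept, rescaled rows.
    inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    kept = []
    for a in rows:
        u = [sum(inv[i][j] * a[j] for j in range(d)) for i in range(d)]  # inv @ a
        quad = sum(a[i] * u[i] for i in range(d))                         # a^T inv a
        score = min((1.0 + eps) * quad, 1.0)   # approximate online leverage score
        p = min(c * score, 1.0)                # sampling probability
        if rng.random() < p:
            s = 1.0 / math.sqrt(p)             # rescale so the sample is unbiased
            kept.append([s * x for x in a])
            # Sherman-Morrison rank-one update of inv for the new row s * a.
            denom = 1.0 + quad / p
            for i in range(d):
                for j in range(d):
                    inv[i][j] -= (u[i] * u[j] / p) / denom
    return kept
```

Maintaining the inverse explicitly costs $O(d^2)$ memory, so the sketch is closer in spirit to the second, higher-memory variant mentioned in the abstract; the constants are placeholders and carry no guarantee.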

    Emisariusz „Kultury” in spe. Korespondencja Leopolda Tyrmanda i Jerzego Giedroycia

    Among the extensive archival holdings at Maisons-Laffitte is an interesting set of letters from Leopold Tyrmand; its counterpart is held by the Hoover Institution at Stanford University in Palo Alto. These letters document the development and dramatic end of Jerzy Giedroyc's friendship with Leopold Tyrmand. There are many indications that the two held radically different views on the very topics that were, in theory, supposed to bring them together: emigration, attitudes towards the communist authorities, and literary matters. Reality proved more complicated, and neither was willing to abandon his convictions or to compromise. Extensive in volume and spirited in content, the correspondence is also an excellent witness to an era in which the epistolary art had aesthetic value in addition to its utilitarian function.

    Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams

    In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in \{0,1\}^n$ with the promise that $x \neq y$. The last player to receive a message must output an index $i$ such that $x_i \neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n, \log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ for failure probability $\delta$. Our lower bound holds even if promised $\mathop{support}(y) \subset \mathop{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0 \le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if promised $x \in \{0,1\}^n$ at all points in the stream. We give two different proofs of our main result. The first proof demonstrates that any algorithm $\mathcal{A}$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information theoretic minimum. Our encoder makes adaptive queries to $\mathcal{A}$ throughout its execution, but done carefully so as to not violate correctness. This is accomplished by injecting random noise into the encoder's interactions with $\mathcal{A}$, which is loosely motivated by techniques in differential privacy. Our second proof is via a novel randomized reduction from Augmented Indexing [MNSW98] which needs to interact with $\mathcal{A}$ adaptively. To handle the adaptivity we identify certain likely interaction patterns and union bound over them to guarantee correct interaction on all of them. To guarantee correctness, it is important that the interaction hides some of its randomness from $\mathcal{A}$ in the reduction. Comment: merge of arXiv:1703.08139 and of work of Kapralov, Woodruff, and Yahyazade
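To make the problem statement concrete, here is a minimal sketch of the trivial one-way protocol for the universal relation, in which Alice sends all of $x$ and Bob returns the first index where the strings differ. It costs $n$ bits, which corresponds to the $n$ term inside the $\min$ of the bound; it is not the protocol or the lower-bound construction from the paper. The function names `alice_message` and `bob_output` are hypothetical.

```python
def alice_message(x):
    """Trivial protocol: Alice sends her whole n-bit input."""
    return list(x)


def bob_output(message, y):
    """Bob returns an index i with x_i != y_i, relying on the promise x != y."""
    for i, (xi, yi) in enumerate(zip(message, y)):
        if xi != yi:
            return i
    raise ValueError("promise violated: x equals y")
```

For example, with `x = [0, 1, 1, 0]` and `y = [0, 1, 0, 0]`, Bob outputs index 2 (0-based), the only position where the two strings differ.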

    PCN28 SHOULD FOTEMUSTINE BE USED AS THE FIRST LINE TREATMENT
