
    Space-Optimal Profile Estimation in Data Streams with Applications to Symmetric Functions

    We revisit the problem of estimating the profile (also known as the rarity) in the data stream model. Given a sequence of $m$ elements from a universe of size $n$, its profile is a vector $\phi$ whose $i$-th entry $\phi_i$ represents the number of distinct elements that appear in the stream exactly $i$ times. A classic paper by Datar and Muthukrishnan from 2002 gave an algorithm which estimates any entry $\phi_i$ up to an additive error of $\pm \epsilon D$ using $O(1/\epsilon^2 (\log n + \log m))$ bits of space, where $D$ is the number of distinct elements in the stream. In this paper, we considerably improve on this result by designing an algorithm which simultaneously estimates many coordinates of the profile vector $\phi$ up to small overall error. We give an algorithm which, with constant probability, produces an estimated profile $\hat\phi$ with the following guarantees in terms of space and estimation error: (i) for any constant $\tau$, with $O(1/\epsilon^2 + \log n)$ bits of space, $\sum_{i=1}^\tau |\phi_i - \hat\phi_i| \leq \epsilon D$; (ii) with $O(1/\epsilon^2 \log(1/\epsilon) + \log n + \log \log m)$ bits of space, $\sum_{i=1}^m |\phi_i - \hat\phi_i| \leq \epsilon m$. In addition to bounding the error across multiple coordinates, our space bounds separate the terms that depend on $1/\epsilon$ from those that depend on $n$ and $m$. We prove matching lower bounds on space in both regimes. Application of our profile estimation algorithm gives estimates within error $\pm \epsilon D$ of several symmetric functions of frequencies in $O(1/\epsilon^2 + \log n)$ bits. This generalizes space-optimal algorithms for the distinct elements problem to other problems, including estimating the Huber and Tukey losses as well as frequency cap statistics. Comment: To appear in ITCS 202
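To make the definition of the profile concrete, the following minimal sketch computes it exactly by storing every frequency. This is the memory-hungry baseline that a streaming algorithm avoids, not the algorithm from the paper. The function names `exact_profile` and `l1_profile_error` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def exact_profile(stream):
    """Return {i: phi_i}, where phi_i is the number of distinct
    elements that appear in the stream exactly i times."""
    frequencies = Counter(stream)               # element -> number of occurrences
    return dict(Counter(frequencies.values()))  # occurrence count -> number of elements


def l1_profile_error(true_profile, estimated_profile):
    """Sum over i of |phi_i - hat_phi_i|, the error measure bounded in the abstract."""
    indices = set(true_profile) | set(estimated_profile)
    return sum(
        abs(true_profile.get(i, 0) - estimated_profile.get(i, 0)) for i in indices
    )


stream = ["a", "b", "a", "c", "b", "a", "d"]
profile = exact_profile(stream)

# D (distinct elements) and m (stream length) can both be read off the profile.
distinct = sum(profile.values())
length = sum(i * count for i, count in profile.items())
```

Here `a` occurs three times, `b` twice, and `c` and `d` once each, so $\phi_1 = 2$, $\phi_2 = 1$, $\phi_3 = 1$, with $D = 4$ and $m = 7$.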

    Improved Frequency Estimation Algorithms with and without Predictions

    Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. Comment: NeurIPS 202
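For readers unfamiliar with the baseline mentioned above, here is a minimal sketch of a standard CountMin sketch. It is not the algorithm proposed in the paper and uses no predictions. The class name and parameters are illustrative, and Python's built-in `hash` with random salts stands in for the pairwise-independent hash functions assumed by the formal analysis.

```python
import random


class CountMinSketch:
    """Textbook CountMin: `depth` rows of `width` counters each."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row; an assumption standing in for
        # independently drawn pairwise-independent hash functions.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        """Process `count` occurrences of `item` (count must be non-negative)."""
        for row in range(len(self.table)):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        """Collisions only inflate counters, so the minimum never underestimates."""
        return min(
            self.table[row][self._index(row, item)]
            for row in range(len(self.table))
        )


sketch = CountMinSketch(width=64, depth=4)
true_counts = {}
for item in [1, 2, 1, 3, 1, 2, 7, 7, 7, 7]:
    sketch.add(item)
    true_counts[item] = true_counts.get(item, 0) + 1
```

Each estimate is at least the true frequency; widening the table reduces the expected overestimate, and adding rows reduces the probability of a large one.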

    Neither dust nor black carbon causing apparent albedo decline in Greenland's dry snow zone: Implications for MODIS C5 surface reflectance

    Remote sensing observations suggest Greenland ice sheet (GrIS) albedo has declined since 2001, even in the dry snow zone. We seek to explain the apparent dry snow albedo decline. We analyze samples representing 2012–2014 snowfall across NW Greenland for black carbon and dust light-absorbing impurities (LAI) and model their impacts on snow albedo. Albedo reductions due to LAI are small, averaging 0.003, with episodic enhancements resulting in reductions of 0.01–0.02. No significant increase in black carbon or dust concentrations relative to recent decades is found. Enhanced deposition of LAI is not, therefore, causing significant dry snow albedo reduction or driving melt events. Analysis of Collection 5 Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance data indicates that the decline and spectral shift in dry snow albedo contain important contributions from uncorrected Terra sensor degradation. Though discrepancies are mostly below the stated accuracy of MODIS products, they will require revisiting some prior conclusions with C6 data.

    Increased Expression of M1 and M2 Phenotypic Markers in Isolated Microglia After Four-Day Binge Alcohol Exposure in Male Rats

    Microglia activation and neuroinflammation are common features of neurodegenerative conditions, including alcohol use disorders (AUDs). When activated, microglia span a continuum of diverse phenotypes ranging from classically activated, pro-inflammatory (M1) microglia/macrophages to alternatively activated, growth-promoting (M2) microglia/macrophages. Identifying microglia phenotypes is critical for understanding the role of microglia in the pathogenesis of AUDs. Therefore, male rats were gavaged with 25% (w/v) ethanol or isocaloric control diet every 8 h for 4 days and sacrificed at 0, 2, 4, and 7 days after alcohol exposure (e.g., T0, T2, etc.). Microglia were isolated from hippocampus and entorhinal cortices by Percoll density gradient centrifugation. Cells were labeled with microglia surface antigens and analyzed by flow cytometry. Consistent with prior studies, isolated cells yielded a highly enriched population of brain macrophages/microglia (> 95% pure), evidenced by staining for the macrophage/microglia antigen CD11b. Polarization states of CD11b+CD45low microglia were evaluated by expression of the M1 surface markers major histocompatibility complex (MHC) II, CD32, and CD86, and the M2 surface marker CD206 (mannose receptor). Ethanol-treated animals begin to show increased expression of M1 and M2 markers at T0 (p = n.s.), with significant changes at the T2 time point. At T2, expression of the M1 markers MHC-II, CD86, and CD32 was increased (p < 0.05) in hippocampus and entorhinal cortices, while the M2 marker CD206 was increased significantly only in entorhinal cortices (p < 0.05). All effects resolved to control levels by T4. In summary, four-day binge alcohol exposure produces a transient increase in both M1 (MHC-II, CD32, and CD86) and M2 (CD206) populations of microglia isolated from the entorhinal cortex and hippocampus. Thus, these findings that both pro-inflammatory and potentially beneficial, recovery-promoting microglia phenotypes can be observed after a damaging exposure of alcohol are critically important to our understanding of the role of microglia in the pathogenesis of AUDs.

    The Renormalization Group and Singular Perturbations: Multiple-Scales, Boundary Layers and Reductive Perturbation Theory

    Perturbative renormalization group theory is developed as a unified tool for global asymptotic analysis. With numerous examples, we illustrate its application to ordinary differential equation problems involving multiple scales, boundary layers with technically difficult asymptotic matching, and WKB analysis. In contrast to conventional methods, the renormalization group approach requires neither ad hoc assumptions about the structure of perturbation series nor the use of asymptotic matching. Our renormalization group approach provides approximate solutions which are practically superior to those obtained conventionally, although the latter can be reproduced, if desired, by appropriate expansion of the renormalization group approximant. We show that the renormalization group equation may be interpreted as an amplitude equation, and from this point of view develop reductive perturbation theory for partial differential equations describing spatially-extended systems near bifurcation points, deriving both amplitude equations and the center manifold. Comment: 44 pages, 2 Postscript figures, macro \uiucmac.tex available at macro archives or at ftp://gijoe.mrl.uiuc.edu/pu

    Renormalization Group Theory for Global Asymptotic Analysis

    We show with several examples that renormalization group (RG) theory can be used to understand singular and reductive perturbation methods in a unified fashion. Amplitude equations describing slow motion dynamics in nonequilibrium phenomena are RG equations. The renormalized perturbation approach may be simpler to use than other approaches, because it does not require the use of asymptotic matching, and yields practically superior approximations. Comment: 13 pages, plain tex + uiucmac.tex (available from babbage.sissa.it), one PostScript figure appended at end. Or (easier) get compressed postscript file by anon ftp from gijoe.mrl.uiuc.edu (128.174.119.153), file /pub/rg_sing_prl.ps.

    Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

    Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, they show that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the 'combine' function of size polynomial or even exponential in the number of graph nodes $n$, as well as feature vectors of length linear in $n$. We present an improved simulation of the WL test on GNNs with exponentially lower complexity. In particular, the neural network implementing the combine function in each node has only a polylogarithmic number of parameters in $n$, and the feature vectors exchanged by the nodes of the GNN consist of only $O(\log n)$ bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near)-optimality of our construction. Comment: 22 pages, 5 figures, accepted at NeurIPS 202
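As background for the abstract above, the classical 1-dimensional WL test (color refinement) can be sketched as follows. This is the combinatorial test itself, not the GNN simulation described in the paper. The function names and the adjacency-list input format are assumptions made for illustration.

```python
from collections import Counter


def wl_colors(adjacency, rounds):
    """Run `rounds` of color refinement on a graph given as {node: [neighbors]}."""
    colors = {v: 0 for v in adjacency}  # every node starts with the same color
    for _ in range(rounds):
        # A node's signature is its own color plus the multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
            for v in adjacency
        }
        # Relabel distinct signatures with small integers.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adjacency}
    return colors


def wl_distinguishes(graph_a, graph_b, rounds):
    """Refine both graphs jointly (so color IDs are shared) and compare histograms."""
    union = {(0, v): [(0, u) for u in nbrs] for v, nbrs in graph_a.items()}
    union.update({(1, v): [(1, u) for u in nbrs] for v, nbrs in graph_b.items()})
    colors = wl_colors(union, rounds)
    hist_a = Counter(c for (side, _), c in colors.items() if side == 0)
    hist_b = Counter(c for (side, _), c in colors.items() if side == 1)
    return hist_a != hist_b


path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
```

The test separates the path from the triangle after one round, but it cannot separate a 6-cycle from two disjoint triangles, because every node in both graphs has degree 2 and so keeps the same color in every round.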