73 research outputs found

    Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors

    Get PDF
    We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extends to the entire space $\mathbb{R}^d$ using the techniques from [Andoni, Razenshteyn 2015]). We also show tight, unconditional cell-probe lower bounds for one and two probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than for one probe. To show the result for two probes, we establish and exploit a connection to locally-decodable codes. (Comment: 47 pages, 2 figures; v2: substantially revised introduction, many small corrections; subsumed by arXiv:1608.03580 [cs.DS], along with arXiv:1511.07527 [cs.DS].)
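    The "hashing-based approaches" captured by the restricted model include classical locality-sensitive hashing. As a concrete illustration of that class (my own sketch, not the paper's construction), here is minimal bit-sampling LSH for Hamming space in the style of Indyk and Motwani; num_tables and bits_per_key are illustrative parameter choices:

        import random

        class BitSamplingLSH:
            """Minimal bit-sampling LSH for binary vectors under Hamming distance.
            Each table keys a point by its bits at a random coordinate subset, so
            near points collide in some table with higher probability than far ones.
            Assumes bits_per_key <= dim."""
            def __init__(self, dim, num_tables=8, bits_per_key=16, seed=0):
                rng = random.Random(seed)
                self.projections = [rng.sample(range(dim), bits_per_key)
                                    for _ in range(num_tables)]
                self.tables = [{} for _ in range(num_tables)]

            def _key(self, point, proj):
                return tuple(point[i] for i in proj)

            def insert(self, point):
                for proj, table in zip(self.projections, self.tables):
                    table.setdefault(self._key(point, proj), []).append(point)

            def query(self, point):
                # Collect every stored point that collides with the query somewhere.
                candidates = []
                for proj, table in zip(self.projections, self.tables):
                    candidates.extend(table.get(self._key(point, proj), []))
                return candidates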

    Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors

    Full text link
    [See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $\rho_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes. (Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. New version contains more elaborate proofs and fixes some typos.)
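    To make the trade-off concrete, the constraint can be solved for the query exponent given the space exponent. A small sketch (my own illustration, not code from the paper): for c = 2, the near-linear-space endpoint rho_u = 0 gives rho_q = (2c^2 - 1)/c^4 = 7/16, and the balanced point is rho_u = rho_q = 1/(2c^2 - 1) = 1/7.

        import math

        def query_exponent(c, rho_u):
            """Query exponent rho_q satisfying
            c^2 sqrt(rho_q) + (c^2 - 1) sqrt(rho_u) = sqrt(2 c^2 - 1),
            for approximation factor c > 1 and space exponent rho_u >= 0."""
            lhs = math.sqrt(2 * c**2 - 1) - (c**2 - 1) * math.sqrt(rho_u)
            return (max(lhs, 0.0) / c**2) ** 2

        print(query_exponent(2.0, 0.0))      # 0.4375 = 7/16 (near-linear space)
        print(query_exponent(2.0, 1.0 / 7))  # 0.142857... = 1/7 (balanced)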

    Approximate Near Neighbors for General Symmetric Norms

    Full text link
    We show that every symmetric normed space admits an efficient nearest neighbor search data structure with doubly-logarithmic approximation. Specifically, for every $n$, $d = n^{o(1)}$, and every $d$-dimensional symmetric norm $\|\cdot\|$, there exists a data structure for $\mathrm{poly}(\log \log n)$-approximate nearest neighbor search over $\|\cdot\|$ for $n$-point datasets achieving $n^{o(1)}$ query time and $n^{1+o(1)}$ space. The main technical ingredient of the algorithm is a low-distortion embedding of a symmetric norm into a low-dimensional iterated product of top-$k$ norms. We also show that our techniques cannot be extended to general norms. (Comment: 27 pages, 1 figure.)
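    The top-$k$ norm of a vector is the sum of its $k$ largest coordinates in absolute value, so $k = 1$ recovers $\ell_\infty$ and $k = d$ recovers $\ell_1$. A quick sketch of this building block as a plain definition (the paper's contribution is the embedding into iterated products of these norms, which is not reproduced here):

        def top_k_norm(x, k):
            """Sum of the k largest absolute values of x.
            k = 1 gives the l_inf norm; k = len(x) gives the l_1 norm."""
            return sum(sorted((abs(v) for v in x), reverse=True)[:k])

        v = [3.0, -5.0, 1.0, 4.0]
        print(top_k_norm(v, 1))  # 5.0  (l_inf)
        print(top_k_norm(v, 2))  # 9.0  (5 + 4)
        print(top_k_norm(v, 4))  # 13.0 (l_1)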

    Forecasting Models for Integration of Large-Scale Renewable Energy Generation to Electric Power Systems

    Get PDF
    Amid growing concerns about climate change and the depletion of non-renewable energy sources, variable renewable energy sources (VRESs) are considered a feasible substitute for conventional, environment-polluting fossil fuel-based power plants. Furthermore, the transition towards clean power systems requires additional transmission capacity. Dynamic thermal line rating (DTLR) is being considered as a potential solution to enhance current transmission line capacity and omit or postpone transmission system expansion planning, although DTLR is highly dependent on weather variations. With the increasing accommodation of VRESs and the application of DTLR, the resulting fluctuations and variations impose severe and unprecedented challenges on power system operation. Therefore, short-term forecasting of large-scale VRESs and DTLR plays a crucial role in electric power system operation problems. To this end, this thesis is devoted to developing forecasting models for two large-scale VRES types (i.e., wind and tidal) and for DTLR.

    Deterministic prediction can be employed for a variety of power system operation problems solved by deterministic optimization. The outcomes of deterministic prediction can also be employed for conditional probabilistic prediction, which can be used for modeling uncertainty in power system operation problems with robust optimization, chance-constrained optimization, etc. By virtue of the importance of deterministic prediction, deterministic prediction models are developed. Prevalently, time-frequency decomposition approaches are adopted to decompose the wind power time series (TS) into several less non-stationary and non-linear components, which can be predicted more precisely. However, in addition to non-stationarity and nonlinearity, wind power TS demonstrates chaotic characteristics, which reduce its predictability. In this regard, a wind power generation prediction model that accounts for the chaotic nature of the wind power generation TS is proposed. The model consists of a novel TS decomposition approach, named multi-scale singular spectrum analysis (MSSSA), and least squares support vector machines (LSSVMs). Furthermore, a deterministic tidal TS prediction model is developed, employing a variant of empirical mode decomposition (EMD) that alleviates the issues associated with EMD. To further improve prediction accuracy, the impact of the different components of the wind power TS, at different frequencies (scales), on the spatiotemporal modeling of the wind farm is assessed. Consequently, a multiscale spatiotemporal wind power prediction model is developed, using information theory-based feature selection, wavelet decomposition, and LSSVM.

    Power system operation problems with robust optimization and interval optimization require prediction intervals (PIs) to model the uncertainty of renewables. Advanced PI models are mainly based on non-differentiable and non-convex cost functions, which make the use of heuristic optimization for tuning the large number of unknown parameters of the prediction models inevitable. However, heuristic optimization suffers from several issues (e.g., being trapped in local optima, irreproducibility, etc.). To this end, a new wind power PI (WPPI) model, based on a bi-level optimization structure, is put forward. In the proposed WPPI model, the main unknown parameters of the prediction model are globally tuned by optimizing a convex and differentiable cost function.
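    The MSSSA decomposition mentioned above is a multi-scale variant of singular spectrum analysis (SSA). For background, here is a minimal sketch of plain SSA, not the thesis's MSSSA: embed the series in a Hankel trajectory matrix, take an SVD, and Hankelize each rank-one term back into a component series; the components sum to the original series.

        import numpy as np

        def ssa_decompose(x, window):
            """Plain singular spectrum analysis of a 1-D series x."""
            n, k = len(x), len(x) - window + 1
            # Trajectory (Hankel) matrix: each column is a length-`window` slice.
            X = np.column_stack([x[i:i + window] for i in range(k)])
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            components = []
            for i in range(len(s)):
                Xi = s[i] * np.outer(U[:, i], Vt[i])
                # Diagonal averaging (Hankelization) back to a time series.
                comp = np.array([Xi[::-1].diagonal(t - window + 1).mean()
                                 for t in range(n)])
                components.append(comp)
            return components  # np.sum(components, axis=0) reconstructs x

        rng = np.random.default_rng(1)
        series = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
        parts = ssa_decompose(series, window=40)
        print(np.allclose(np.sum(parts, axis=0), series))  # True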
    In line with addressing the non-differentiability and non-convexity of the PI formulation, an asymmetrically adaptive quantile regression (AAQR), which benefits from a linear formulation, is proposed for tidal uncertainty modeling. In prevalent QR-based PI models, for a specified reliability level, the probabilities of the quantiles are selected symmetrically with respect to the median probability. However, it is found that asymmetrical and adaptive selection of quantiles with respect to the median can provide more efficient PIs. To make the formulation of AAQR linear, an extreme learning machine (ELM) is adopted as the prediction engine. Prevalently, the parameters of the activation functions in ELMs are selected randomly, although different sets of random values may result in dissimilar prediction accuracy. To this end, a heuristic optimization is devised to tune the parameters of the activation functions. Also, to enhance the accuracy of probabilistic DTLR, the consideration of latent variables in DTLR prediction is assessed. It is observed that the convective cooling rate can provide informative features for DTLR prediction. To address the high-dimensional feature space in DTLR, a DTLR prediction model based on deep learning and consideration of latent variables is put forward. The numerical results of this thesis are based on realistic data. The simulations confirm the superiority of the proposed models in comparison to traditional benchmark models, as well as state-of-the-art models.
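    Quantile-regression-based prediction intervals such as those discussed above rest on the pinball (quantile) loss, which is piecewise-linear in the residual and hence compatible with linear programming. A generic sketch of the loss and of an interval built from two quantiles (standard QR machinery with my own illustrative data, not the thesis's AAQR formulation):

        import numpy as np

        def pinball_loss(y_true, y_pred, tau):
            """Average quantile (pinball) loss at level tau in (0, 1); minimizing
            it drives y_pred toward the tau-quantile of y_true."""
            r = np.asarray(y_true) - np.asarray(y_pred)
            return float(np.mean(np.maximum(tau * r, (tau - 1) * r)))

        # A 90% central prediction interval pairs the 0.05- and 0.95-quantiles.
        y = np.array([0.42, 0.55, 0.31])
        lower = np.array([0.30, 0.40, 0.20])  # hypothetical 0.05-quantile forecasts
        upper = np.array([0.60, 0.70, 0.45])  # hypothetical 0.95-quantile forecasts
        print(pinball_loss(y, lower, 0.05), pinball_loss(y, upper, 0.95))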

    Distributed PCP Theorems for Hardness of Approximation in P

    Get PDF
    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
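    For reference, Bichromatic Maximum Inner Product asks, given vector sets A and B, for the maximum of <a, b> over all pairs (a, b) in A x B; the result above says that under SETH even approximating this maximum has no truly-subquadratic algorithm. A naive quadratic-time baseline, included only to pin down the problem statement:

        import numpy as np

        def max_inner_product(A, B):
            """Exact Bichromatic Maximum Inner Product over 0/1 vectors.
            Evaluates all |A| * |B| inner products, i.e. Theta(n^2 d) time; under
            SETH, no truly-subquadratic algorithm approximates the answer well."""
            return int((np.asarray(A) @ np.asarray(B).T).max())

        A = [[1, 0, 1, 1], [0, 1, 1, 0]]
        B = [[1, 1, 1, 0], [1, 0, 1, 1]]
        print(max_inner_product(A, B))  # 3, attained by A[0] and B[1]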

    Hardness of Approximate Nearest Neighbor Search

    Full text link
    We prove conditional near-quadratic running time lower bounds for approximate Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance. Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false, for every $\delta > 0$ there exists a constant $\epsilon > 0$ such that computing a $(1+\epsilon)$-approximation to the Bichromatic Closest Pair requires $n^{2-\delta}$ time. In particular, this implies a near-linear query time lower bound for Approximate Nearest Neighbor search with polynomial preprocessing time. Our reduction uses the Distributed PCP framework of [ARW'17], but obtains improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but our construction is the first to yield new hardness results.
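    In the Hamming case the hard problem is simple to state: given red and blue sets of bit-vectors, find the minimum Hamming distance over all red-blue pairs. A brute-force baseline (illustrative only); the theorem says that, under SETH, even a (1+epsilon)-approximation needs essentially this much time:

        def bichromatic_closest_pair_hamming(reds, blues):
            """Exact minimum Hamming distance over red-blue pairs, in O(n^2 d) time."""
            best = None
            for r in reds:
                for b in blues:
                    d = sum(x != y for x, y in zip(r, b))
                    if best is None or d < best:
                        best = d
            return best

        reds = [(1, 0, 1, 0), (0, 0, 1, 1)]
        blues = [(1, 1, 1, 1), (0, 1, 0, 0)]
        print(bichromatic_closest_pair_hamming(reds, blues))  # 2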

    Lower bound techniques for data structures

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 135-143).

    We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences:

    * the first $\Omega(\lg n)$ lower bound for any dynamic problem, improving on a bound that had been standing since 1989;
    * for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show $\Omega(\lg n / \lg \lg n)$ bounds when the space is $O(n \cdot \mathrm{polylog}\, n)$.

    Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following:

    * the partial-sums problem (a fundamental application of augmented binary search trees);
    * the predecessor problem (which is equivalent to IP lookup in Internet routers);
    * dynamic trees and dynamic connectivity;
    * orthogonal range stabbing;
    * orthogonal range counting, and orthogonal range reporting;
    * the partial match problem (searching with wild-cards);
    * $(1+\epsilon)$-approximate near neighbor on the hypercube;
    * approximate nearest neighbor in the $\ell_\infty$ metric.

    Our new techniques lead to surprisingly non-technical proofs. For several problems, we obtain simpler proofs for bounds that were already known.

    by Mihai Pătrașcu, Ph.D.
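    The partial-sums problem above maintains an array under point updates and prefix-sum queries; the thesis's $\Omega(\lg n)$ dynamic lower bound matches the classical O(lg n) upper bound achieved, for example, by a Fenwick (binary indexed) tree. A minimal sketch of that standard upper bound, for context only:

        class FenwickTree:
            """Partial sums with point update and prefix-sum query, O(lg n) each,
            matching the Omega(lg n) dynamic lower bound."""
            def __init__(self, n):
                self.tree = [0] * (n + 1)

            def update(self, i, delta):
                # Add delta to element i (1-indexed).
                while i < len(self.tree):
                    self.tree[i] += delta
                    i += i & (-i)

            def prefix_sum(self, i):
                # Sum of elements 1..i.
                s = 0
                while i > 0:
                    s += self.tree[i]
                    i -= i & (-i)
                return s

        ft = FenwickTree(8)
        ft.update(3, 5)
        ft.update(6, 2)
        print(ft.prefix_sum(5), ft.prefix_sum(8))  # 5 7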

    WIND POWER PROBABILISTIC PREDICTION AND UNCERTAINTY MODELING FOR OPERATION OF LARGE-SCALE POWER SYSTEMS

    Get PDF
    Over the last decade, large-scale renewable energy generation has been integrated into power systems, and wind power is among the most widely used kinds of renewable generation around the world. However, the high uncertainty of wind power generation leads to unavoidable error in the wind power prediction process, which makes the optimal operation and control of power systems very challenging. Since wind power prediction error cannot be entirely removed, accurate models of wind power uncertainty can assist power system operators in mitigating its negative effects on decision making. There are two efficient ways to represent wind power uncertainty: (i) accurate modeling of the wind power prediction error probability distribution in the form of probability density functions, and (ii) construction of reliable and sharp prediction intervals. Constructing accurate probability density functions and high-quality prediction intervals is difficult because wind power time series are non-stationary. In addition, incorporating probability density functions and prediction intervals into power systems' decision-making problems is challenging. In this thesis, the goal is to propose comprehensive frameworks for wind power uncertainty modeling, in the form of both probability density functions and prediction intervals, and to incorporate each model into power systems' decision-making problems such as look-ahead economic dispatch.

    To accurately quantify the uncertainty of wind power generation, different approaches are studied, and a comprehensive framework is then proposed to construct the probability density functions using a mixture of beta kernels. The framework outperforms benchmarks because it can validly capture the actual features of the wind power probability density function, such as the main mass, boundaries, high skewness, and fat tails, from the wind power sample moments. Using the proposed framework, a generic convex model is also proposed for chance-constrained look-ahead economic dispatch problems. It allows power system operators to use piecewise linearization techniques to convert the problem into a mixed-integer linear programming problem. Numerical simulations using the IEEE 118-bus test system show that, compared with widely used sequential linear programming approaches, the proposed mixed-integer linear programming model leads to a lower total system cost.

    A framework based on the concept of bandwidth selection for a new and flexible kernel density estimator is proposed for the construction of prediction intervals. Unlike previous related works, the proposed framework uses neither a cost-function-based optimization problem nor point prediction results; rather, a diffusion-based kernel density estimator is utilized to achieve high-quality prediction intervals for non-stationary wind power time series. The proposed prediction interval construction framework also builds on a parallel computing procedure to promote computational efficiency for practical applications in power systems. Simulation results demonstrate the high performance of the proposed framework compared to well-known conventional benchmarks such as bootstrap extreme learning machine, lower upper bound estimation, quantile regression, auto-regressive integrated moving average, and linear programming-based quantile regression.
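    Beta kernels suit wind power because normalized output lives on [0, 1] and the kernel shape adapts near the boundaries. As background for the mixture-of-beta-kernels framework (this is a standard Chen-style beta-kernel density estimator, not the thesis's method; the bandwidth h is an illustrative choice):

        import numpy as np
        from scipy.stats import beta

        def beta_kernel_pdf(samples, grid, h=0.05):
            """Chen-style beta-kernel density estimate on [0, 1]: each grid point
            x gets its own Beta(x/h + 1, (1 - x)/h + 1) kernel, so the estimate
            behaves sensibly at the boundaries 0 and 1."""
            samples = np.asarray(samples, dtype=float)
            return np.array([beta.pdf(samples, x / h + 1, (1 - x) / h + 1).mean()
                             for x in grid])

        wind = np.random.default_rng(0).beta(2, 5, size=500)  # synthetic data
        print(beta_kernel_pdf(wind, np.linspace(0, 1, 11)).round(2))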
    Finally, a new adjustable robust optimization approach is used to incorporate the prediction intervals constructed by the proposed fuzzy and adaptive diffusion estimator-based framework. To accurately model the correlation and dependence structure of wind farms, especially in high-dimensional cases, C-Vine copula models are used for prediction interval construction. The simulation results show that uncertainty modeling using C-Vine copulas gives system operators a more realistic sense of the level of overall uncertainty in the system, and consequently more conservative results for energy and reserve scheduling are obtained.