
    Knowing when you're wrong: Building fast and reliable approximate query processing systems

    Modern data analytics applications typically process massive amounts of data on clusters of tens, hundreds, or thousands of machines to support near-real-time decisions. The quantity of data and limitations of disk and memory bandwidth often make it infeasible to deliver answers at interactive speeds. However, it has been widely observed that many applications can tolerate some degree of inaccuracy. This is especially true for exploratory queries on data, where users are satisfied with "close-enough" answers if they arrive quickly. A popular technique for speeding up queries at the cost of accuracy is to execute each query on a sample of the data rather than the whole dataset. To ensure that the returned result is not too inaccurate, past work on approximate query processing has used statistical techniques to estimate "error bars" on returned results. However, existing work in the sampling-based approximate query processing (S-AQP) community has not validated whether these techniques actually generate accurate error bars for real query workloads. In fact, we find that error bar estimation often fails on real-world production workloads. Fortunately, it is possible to quickly and accurately diagnose the failure of error estimation for a query. In this paper, we show that it is possible to implement a query approximation pipeline that produces approximate answers and reliable error bars at interactive speeds.
    Funding: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); Lawrence Berkeley National Laboratory (Award 7076018); United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331); Amazon.com (Firm); Google (Firm); SAP Corporation; Thomas and Stacey Siebel Foundation; Apple Computer, Inc.; Cisco Systems, Inc.; Cloudera, Inc.; EMC Corporation; Ericsson, Inc.; Facebook (Firm)
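    The core idea described above, answering a query on a small uniform sample and attaching bootstrap-based error bars to the sampled answer, can be illustrated with a minimal Python sketch. This is a toy illustration of the general technique only, not the paper's pipeline or its diagnostic; the function name approx_avg_with_error_bars and the parameters sample_frac and n_boot are hypothetical choices made for the example.
```python
# Illustrative sketch: approximate "SELECT AVG(value)" on a 1% uniform sample
# and attach a 95% bootstrap confidence interval to the sampled answer.
import numpy as np

def approx_avg_with_error_bars(values, sample_frac=0.01, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    sample = rng.choice(values, size=max(1, int(len(values) * sample_frac)),
                        replace=False)
    estimate = sample.mean()
    # Bootstrap the sample itself to estimate the error of the sampled answer.
    boot = np.array([rng.choice(sample, size=len(sample), replace=True).mean()
                     for _ in range(n_boot)])
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return estimate, (lo, hi)

if __name__ == "__main__":
    data = np.random.default_rng(1).exponential(scale=10.0, size=1_000_000)
    est, (lo, hi) = approx_avg_with_error_bars(data)
    print(f"approx AVG = {est:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
    print(f"exact  AVG = {data.mean():.3f}")
```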

    A scalable bootstrap for massive data.

    Get PDF
    Summary. The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets, which are increasingly prevalent, the calculation of bootstrap-based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to the specification of tuning parameters (such as the number of subsampled data points), and they often require knowledge of the estimator's convergence rate, in contrast with the bootstrap. As an alternative, we introduce the 'bag of little bootstraps' (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. The BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate the BLB's favourable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing the BLB with the bootstrap, the m out of n bootstrap and subsampling. In addition, we present results from a large-scale distributed implementation of the BLB demonstrating its computational superiority on massive data, a method for adaptively selecting the BLB's tuning parameters, an empirical study applying the BLB to several real data sets and an extension of the BLB to time series data.
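    A minimal sketch of the BLB procedure for one concrete case, the standard error of a sample mean, is given below. The subset size b = n**gamma, the numbers of subsets and resamples, and the function name blb_std_error are illustrative assumptions for this example, not the authors' recommended defaults or reference implementation.
```python
# Sketch of the Bag of Little Bootstraps (BLB) for the std. error of a mean:
# draw small subsets of size b, resample each to full size n via multinomial
# counts, and average the per-subset error assessments.
import numpy as np

def blb_std_error(data, gamma=0.7, n_subsets=10, n_resamples=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)                      # size of each "little" subset
    subset_errors = []
    for _ in range(n_subsets):
        subset = rng.choice(data, size=b, replace=False)
        estimates = []
        for _ in range(n_resamples):
            # Resample n points from the b-point subset via multinomial
            # counts, so each Monte Carlo resample touches only b values.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates.append(np.average(subset, weights=counts))
        subset_errors.append(np.std(estimates, ddof=1))
    # Average the per-subset quality assessments, as in BLB.
    return float(np.mean(subset_errors))

if __name__ == "__main__":
    x = np.random.default_rng(1).normal(loc=0.0, scale=2.0, size=100_000)
    print("BLB std error :", blb_std_error(x))
    print("theoretical   :", 2.0 / np.sqrt(len(x)))
```
    Because each resample is represented by counts over only b distinct points, the per-subset work scales with b rather than n, which is what makes the procedure attractive on large data and easy to parallelize across subsets.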

    Voters, Non-Voters, and the Implications of Election Timing for Public Policy


    Randomized Algorithms for Scalable Machine Learning

    Many existing procedures in machine learning and statistics are computationally intractable in the setting of large-scale data. As a result, the advent of rapidly increasing dataset sizes, which should be a boon yielding improved statistical performance, instead severely blunts the usefulness of a variety of existing inferential methods. In this work, we use randomness to ameliorate this lack of scalability by reducing complex, computationally difficult inferential problems to larger sets of significantly smaller and more tractable subproblems. This approach allows us to devise algorithms which are both more efficient and more amenable to the use of parallel and distributed computation. We propose novel randomized algorithms for two broad classes of problems that arise in machine learning and statistics: estimator quality assessment and semidefinite programming. For the former, we present the Bag of Little Bootstraps (BLB), a procedure which incorporates features of both the bootstrap and subsampling to obtain substantial computational gains while retaining the bootstrap's accuracy and automation; we also present a novel diagnostic procedure which leverages increasing dataset sizes combined with increasingly powerful computational resources to render existing estimator quality assessment methodology more automatically usable. For semidefinite programming, we present Random Conic Pursuit, a procedure that solves semidefinite programs via repeated optimization over randomly selected two-dimensional subcones of the positive semidefinite cone. As we demonstrate via both theoretical and empirical analyses, these algorithms are scalable, readily benefit from the use of parallel and distributed computing resources, are generically applicable and easily implemented, and have favorable theoretical properties.
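    BLB and Random Conic Pursuit are sketched alongside their own abstracts in this listing. One plausible reading of the diagnostic idea, checking whether an error estimator's output is consistent with the observed behaviour of estimates on subsamples of increasing size, is sketched below. This is an assumption-laden illustration rather than the thesis's actual procedure; the function names, subsample sizes, and tolerance are all hypothetical.
```python
# Illustrative sanity check for an error estimator: if bootstrap standard
# errors are trustworthy, the deviations of subsample estimates from the
# full-data answer should stay on the scale those errors predict as the
# subsample size grows.
import numpy as np

def bootstrap_std_error(sample, n_boot=100, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    boots = [rng.choice(sample, size=len(sample), replace=True).mean()
             for _ in range(n_boot)]
    return float(np.std(boots, ddof=1))

def diagnose_error_estimates(data, sizes=(1_000, 4_000, 16_000), trials=20,
                             tolerance=2.0, seed=0):
    rng = np.random.default_rng(seed)
    truth = data.mean()
    for n in sizes:
        ratios = []
        for _ in range(trials):
            sample = rng.choice(data, size=n, replace=False)
            est_err = bootstrap_std_error(sample, rng=rng)
            ratios.append(abs(sample.mean() - truth) / est_err)
        # Reliable error estimates keep this ratio near 1 at every size.
        med = np.median(ratios)
        print(f"n={n}: median |error| / estimated error = {med:.2f}")
        if med > tolerance:
            print("  -> error estimates look unreliable at this size")

if __name__ == "__main__":
    data = np.random.default_rng(1).lognormal(mean=1.0, sigma=1.5, size=500_000)
    diagnose_error_estimates(data)
```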

    Random Conic Pursuit for Semidefinite Programming

    We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the positive semidefinite (PSD) cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of the ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm.
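    A small sketch of the two-dimensional subcone idea on a toy SDP, minimize <C, X> subject to trace(X) = 1 and X PSD (whose optimum is the smallest eigenvalue of C), is given below. Handling the trace constraint with a quadratic penalty, the iteration count, and the function names are assumptions made for illustration; this is not the paper's exact formulation or implementation.
```python
# Sketch of Random Conic Pursuit on a penalized toy SDP: at each step, draw a
# random rank-one PSD direction Y and minimize the objective over the
# two-dimensional subcone {a*Y + b*X_prev : a, b >= 0}.
import numpy as np
from scipy.optimize import minimize

def random_conic_pursuit(C, n_iters=2000, penalty=50.0, seed=0):
    rng = np.random.default_rng(seed)
    d = C.shape[0]
    X = np.eye(d) / d                          # feasible PSD starting point

    def objective(A):
        # Quadratic penalty keeps trace(A) close to 1.
        return np.trace(C @ A) + penalty * (np.trace(A) - 1.0) ** 2

    for _ in range(n_iters):
        v = rng.normal(size=d)
        Y = np.outer(v, v) / (v @ v)           # random rank-one PSD direction
        X_prev = X

        def two_dim_obj(ab):
            a, b = ab
            return objective(a * Y + b * X_prev)

        res = minimize(two_dim_obj, x0=np.array([0.0, 1.0]),
                       bounds=[(0.0, None), (0.0, None)], method="L-BFGS-B")
        a, b = res.x
        X = a * Y + b * X_prev                 # conic combination stays PSD
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.normal(size=(5, 5))
    C = (M + M.T) / 2
    X = random_conic_pursuit(C)
    print("RCP objective :", np.trace(C @ X))
    print("min eigenvalue:", np.linalg.eigvalsh(C).min())
```
    Each iteration only solves a two-variable bound-constrained problem, so the per-step cost is dominated by forming the d-by-d candidate matrix, which is what keeps the scheme cheap relative to general-purpose interior-point SDP solvers; the price, as noted in the abstract, is that solutions are approximate rather than highly exact.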