We propose the following general method for scaling learning algorithms to arbitrarily large data sets. Consider the model $M_{\vec{n}}$ learned by the algorithm using $n_i$ examples in step $i$, where $\vec{n} = (n_1, \ldots, n_m)$, and the model $M_\infty$ that would be learned using infinite examples. Upper-bound the loss $L(M_{\vec{n}}, M_\infty)$ between them as a function of $\vec{n}$, and then minimize the algorithm's time complexity $f(\vec{n})$ subject to the constraint that $L(M_{\vec{n}}, M_\infty)$ be at most $\epsilon$ with probability at least $1 - \delta$. We apply this method to the EM algorithm for mixtures of Gaussians. Preliminary experiments on a series of large data sets provide evidence of the potential of this approach.
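As a rough illustration of the idea of choosing the smallest sample sizes that still satisfy a probabilistic loss constraint, the sketch below uses the standard two-sided Hoeffding bound for the mean of a bounded quantity. This is an assumption made for illustration only, not the bound derived for EM in this work. The function names `hoeffding_sample_size` and `step_sample_sizes`, the per-step tolerances, and the even split of the failure probability across steps (a union bound) are all hypothetical.

```python
import math
from typing import List


def hoeffding_sample_size(epsilon: float, delta: float, value_range: float = 1.0) -> int:
    """Smallest n for which the Hoeffding bound guarantees that the mean of
    n i.i.d. samples (each spanning at most `value_range`) lies within
    `epsilon` of the true mean with probability at least 1 - delta.

    Illustrative assumption: a generic Hoeffding bound, not the paper's bound.
    """
    if epsilon <= 0 or value_range <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need epsilon > 0, value_range > 0, and 0 < delta < 1")
    # Solve 2 * exp(-2 * n * epsilon^2 / R^2) <= delta for n.
    n = value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


def step_sample_sizes(epsilons: List[float], delta: float) -> List[int]:
    """Pick one sample size per step (hypothetical per-step tolerances).

    The total failure probability `delta` is split evenly across the steps,
    so by the union bound all per-step guarantees hold simultaneously with
    probability at least 1 - delta.
    """
    if not epsilons:
        raise ValueError("need at least one step")
    per_step_delta = delta / len(epsilons)
    return [hoeffding_sample_size(eps, per_step_delta) for eps in epsilons]


if __name__ == "__main__":
    # Two steps, each allowed an error of 0.1, overall failure probability 0.05.
    print(step_sample_sizes([0.1, 0.1], 0.05))
```

Under these assumptions, each step uses only as many examples as its own tolerance requires, so the total cost grows with the tightness of the tolerances rather than with the size of the data set.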