We present a new method for nonlinear prediction of discrete random sequences
under minimal structural assumptions. We give a mathematical construction for
optimal predictors of such processes, in the form of hidden Markov models. We
then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which
approximates the ideal predictor from data. We discuss the reliability of CSSR,
its data requirements, and its performance in simulations. Finally, we compare
our approach to existing methods using variable-length Markov models and
cross-validated hidden Markov models, and show theoretically and experimentally
that our method delivers results superior to the former and at least comparable
to the latter.

Comment: 8 pages, 4 figures