The first quest for optimal estimation, by Fisher, Cramér, Rao, and others, dates back over half a century, and the field has changed remarkably little since. The covariance of the estimated parameters was taken as the quality measure of estimators, for which the main result, the Cramér-Rao inequality, sets a lower bound. The bound is reached by the maximum likelihood (ML) estimator for a restricted subclass of models, and asymptotically for a wider class. The covariance, however, is just one property of models and too weak a measure to permit extension to estimating the number of parameters, which is handled by various ad hoc criteria too numerous to list here.

Soon after I had studied Shannon's formal definition of information in random variables and his other remarkable performance bounds for communication, I wanted to apply them to other fields, in particular to estimation and statistics in general. After all, the central problem in statistics is to extract information from data. Having worked on data compression and introduced arithmetic coding, it seemed evident to me that estimation and data compression share a common goal: in data compression the shortest code length cannot be achieved without taking advantage of the regular features in the data, while in estimation it is precisely these regular features, the underlying mechanism, that we want to learn. This led me to introduce the MDL, or Minimum Description Length, principle, and I thought the job was done.

However, when I started to prepare this lecture I found the task difficult, for I could not connect the several results, each meaningful in itself, into a coherent picture. It was like a jigsaw puzzle in which the pieces almost fit, but not quite, and, moreover, vital pieces were missing. After considerable struggle I was able to make the pieces fit, but to do so I had to alter them all and to set aside the means and concepts introduced by the masters mentioned above.
The result was a separation of estimation from data compression, and optimality can now be defined for all amounts of data, not just asymptotically.