Notes on a New Philosophy of Empirical Science
This book presents a methodology and philosophy of empirical science based on
large-scale lossless data compression. In this view, a theory is scientific if
it can be used to build a data compression program, and it is valuable if it
can compress a standard benchmark database to a small size, taking into account
the length of the compressor itself. This methodology therefore includes an
Occam principle as well as a solution to the problem of demarcation. Because of
the fundamental difficulty of lossless compression, this type of research must
be empirical in nature: compression can only be achieved by discovering and
characterizing empirical regularities in the data. Because of this, the
philosophy provides a way to reformulate fields such as computer vision and
computational linguistics as empirical sciences: the former by attempting to
compress databases of natural images, the latter by attempting to compress
large text databases. The book argues that the rigor and objectivity of the
compression principle should set the stage for systematic progress in these
fields. The argument is especially strong in the context of computer vision,
which is plagued by chronic problems of evaluation.
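
The scoring rule described above (compressed size of the benchmark database plus the length of the compressor itself) can be sketched in a few lines. This is only an illustration under stated assumptions: the toy database, the stand-in compressors from the Python standard library, and the compressor sizes are all hypothetical and are not taken from the book.

```python
import bz2
import zlib


def total_codelength(compressed: bytes, compressor_size: int) -> int:
    """Score in bytes: compressed database plus the length of the compressor."""
    return len(compressed) + compressor_size


# Toy stand-in for a benchmark database (assumption: real benchmarks are far larger).
database = b"the cat sat on the mat. " * 20000

# Each candidate "theory" is a (compress function, compressor size in bytes) pair.
# The sizes are hypothetical placeholders, not measurements of the real libraries.
candidates = {
    "identity": (lambda data: data, 0),
    "zlib": (lambda data: zlib.compress(data, 9), 50_000),
    "bz2": (lambda data: bz2.compress(data, 9), 80_000),
}

# Lower is better: a theory is valuable if its total codelength is small.
scores = {
    name: total_codelength(compress(database), size)
    for name, (compress, size) in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>8}: {score} bytes")
```

Charging each candidate for its own length is what supplies the Occam principle: a compressor that merely memorizes the database gains nothing, because the memorized data is counted as part of the compressor.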
The book also considers the field of machine learning. Here the traditional
approach requires that the models proposed to solve learning problems be
extremely simple, in order to avoid overfitting. However, the world may contain
intrinsically complex phenomena, which would require complex models to
understand. The compression philosophy can justify complex models because of
the large quantity of data being modeled (if the target database is 100 GB, it
is easy to justify a 10 MB model). The complex models and abstractions learned
on the basis of the raw data (images, language, etc.) can then be reused to
solve any specific learning problem, such as face recognition or machine
translation.
Comment: Draft Versio
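
The arithmetic behind the model-size argument can be made explicit with a short sketch. It reads the abstract's figures as a 100 gigabyte database and a 10 megabyte model, which is an assumption about the units; the function names are hypothetical and chosen only for illustration.

```python
MB = 10**6  # decimal megabyte, assumed unit
GB = 10**9  # decimal gigabyte, assumed unit


def model_overhead(model_bytes: int, database_bytes: int) -> float:
    """Fraction of the raw database size that the model itself occupies."""
    return model_bytes / database_bytes


def pays_for_itself(model_bytes: int, bytes_saved: int) -> bool:
    """A model is justified when it saves more bytes than it costs to store."""
    return bytes_saved > model_bytes


overhead = model_overhead(10 * MB, 100 * GB)
print(f"model overhead: {overhead:.4%}")  # model overhead: 0.0100%

# Hypothetical case: the model shrinks the compressed output by 1 GB.
print(pays_for_itself(10 * MB, 1 * GB))  # True
```

Under these assumptions the model costs one ten-thousandth of the database size, so even a very small improvement in compression rate is enough to justify it.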