Kolmogorov Complexity in perspective. Part II: Classification, Information Processing and Duality
We survey diverse approaches to the notion of information, from Shannon
entropy to Kolmogorov complexity, and present two of the main applications of
Kolmogorov complexity: randomness and classification. The survey is divided
into two parts published in the same volume. Part II is dedicated to the
relation between logic and information systems, within the scope of
Kolmogorov's algorithmic information theory. We present a recent application
of Kolmogorov complexity: classification using compression, an idea with
provocative implementations by authors such as Bennett, Vitanyi and
Cilibrasi. This stresses how Kolmogorov complexity, besides being a
foundation of randomness, is also related to classification. Another approach
to classification is also considered: the so-called "Google classification".
It uses another original and attractive idea which is connected to
classification using compression and to Kolmogorov complexity from a
conceptual point of view. We present and unify these different approaches to
classification in terms of Bottom-Up versus Top-Down operational modes,
pointing out their fundamental principles and the underlying duality. We look
at the way these two dual modes are used in different approaches to
information systems, particularly the relational model for databases
introduced by Codd in the 1970s. This allows us to point out diverse forms of
a fundamental duality. These operational modes are also reinterpreted in the
context of the comprehension schema of axiomatic set theory ZF. This leads us
to develop how Kolmogorov complexity is linked to intensionality,
abstraction, classification and information systems.
Comment: 43 pages
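The compression-based classification mentioned above rests on the Normalized Compression Distance (NCD) of Cilibrasi and Vitanyi: two objects are close if a real compressor (standing in for the uncomputable Kolmogorov complexity) compresses their concatenation almost as well as the more compressible of the two alone. A minimal sketch, using zlib as the stand-in compressor (the sample strings are illustrative, not from the paper):

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, approximating Kolmogorov
    complexity C(.) with the length of the zlib-compressed string."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


# Similar texts compress well together, giving a smaller distance;
# unrelated texts add roughly their full compressed size.
a = b"the quick brown fox jumps over the lazy dog " * 20
b2 = b"the quick brown fox jumps over the lazy dog " * 19 + b"a lazy cat "
c = b"0123456789abcdef" * 50

print(ncd(a, b2) < ncd(a, c))  # the similar pair is closer
```

Classification then proceeds by clustering or nearest-neighbour assignment on the resulting distance matrix; the choice of compressor only matters up to how well it captures the regularities shared by the objects.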
AQUA: an ontology driven question answering system
This paper describes AQUA, our question answering system over the Web. AQUA was designed to work over heterogeneous sources: it is equipped for closed-domain as well as open-domain question answering. As a first step, AQUA tries to answer a question using a knowledge base. If the query cannot be satisfied over the knowledge base/database, AQUA then tries to find an answer on web pages (i.e. it uses the Web as its corpus). Our system uses NLP (Natural Language Processing), first-order logic and Information Extraction technologies. AQUA has been tested using an ontology which describes academic life. Keywords: Ontologies, Information Extraction, Machine Learning
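The KB-first, web-fallback flow the abstract describes can be sketched as follows. This is only an illustration of the control flow, not AQUA's actual API; the function names and the toy knowledge base are hypothetical:

```python
from typing import Callable, Optional


def answer(question: str,
           query_kb: Callable[[str], Optional[str]],
           search_web: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try the knowledge base first; fall back to the Web corpus
    only when the KB query cannot be satisfied."""
    kb_answer = query_kb(question)
    if kb_answer is not None:
        return kb_answer
    return search_web(question)


# Hypothetical stand-ins for the knowledge base and web search:
kb = {"Who leads the lab?": "Dr. Example"}
hit = answer("Who leads the lab?", kb.get, lambda q: None)
miss = answer("When was it founded?", kb.get, lambda q: "found on a web page")
```

The same structure generalises to any tiered-source design: each tier is a callable returning `None` on failure, and the dispatcher walks the tiers in order of trust.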
BigExcel: A Web-Based Framework for Exploring Big Data in Social Sciences
This paper argues that there are three fundamental challenges that need to be
overcome in order to foster the adoption of big data technologies in
disciplines outside computer science: making such technologies accessible to
non-computer scientists, supporting the ad hoc exploration of large data sets
with minimal effort, and providing lightweight web-based frameworks for quick
and easy analytics. In this paper, we address these three challenges through
the development of 'BigExcel', a three-tier web-based framework for exploring
big data that facilitates the management of user interactions with large data
sets, the construction of queries to explore the data set, and the management
of the infrastructure. The feasibility of BigExcel is demonstrated through
two Yahoo Sandbox data sets. The first is the Yahoo Buzz Score data set,
which we use for quantitatively predicting trending technologies, and the
second is the Yahoo n-gram corpus, which we use for qualitatively inferring
the coverage of important events. A demonstration of the BigExcel framework
and source code is available at
http://bigdata.cs.st-andrews.ac.uk/projects/bigexcel-exploring-big-data-for-social-sciences/.
Comment: 8 pages