17 research outputs found

    Algorithmic Complexity Bounds on Future Prediction Errors

    Get PDF
    We bound the future loss when predicting any (computably) stochastic sequence online. Solomonoff finitely bounded the total deviation of his universal predictor $M$ from the true distribution $\mu$ by the algorithmic complexity of $\mu$. Here we assume we are at a time $t>1$ and already observed $x = x_1 \ldots x_t$. We bound the future prediction performance on $x_{t+1} x_{t+2} \ldots$ by a new variant of algorithmic complexity of $\mu$ given $x$, plus the complexity of the randomness deficiency of $x$. The new complexity is monotone in its condition in the sense that this complexity can only decrease if the condition is prolonged. We also briefly discuss potential generalizations to Bayesian model classes and to classification problems.
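    For context, the earlier result of Solomonoff that this paper extends is usually stated as follows (a textbook form of the bound, not quoted from this abstract):

```latex
% Solomonoff's total prediction error bound (textbook form): the cumulative
% expected squared deviation of the universal predictor M from the true
% computable measure \mu is finite and bounded by the complexity of \mu.
\sum_{t=1}^{\infty} \mathbb{E}_{x_{<t} \sim \mu}
  \sum_{x_t} \bigl( M(x_t \mid x_{<t}) - \mu(x_t \mid x_{<t}) \bigr)^2
  \;\le\; \tfrac{\ln 2}{2}\, K(\mu)
```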

    Universal Intelligence: A Definition of Machine Intelligence

    Get PDF
    A fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different from humans. In this paper we approach this problem in the following way: we take a number of well-known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this equation formally captures the concept of machine intelligence in the broadest reasonable sense. We then show how this formal definition is related to the theory of universal optimal learning agents. Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines.
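    The resulting measure is usually written as a complexity-weighted sum of the agent's expected value over the class of computable environments (the standard statement of the Legg–Hutter measure; notation may differ slightly from the paper):

```latex
% Universal intelligence of agent \pi: the expected value V_\mu^\pi achieved
% in each computable environment \mu, weighted by the environment's
% simplicity 2^{-K(\mu)}, summed over the class E of such environments.
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```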

    The teaching size: computable teachers and learners for universal languages

    Full text link
    [EN] The theoretical hardness of machine teaching has usually been analyzed for a range of concept languages under several variants of the teaching dimension: the minimum number of examples that a teacher needs to figure out so that the learner identifies the concept. However, for languages where concepts have structure (and hence size), such as Turing-complete languages, a low teaching dimension can be achieved at the cost of using very large examples, which are hard for the learner to process. In this paper we introduce the teaching size, a more intuitive way of assessing the theoretical feasibility of teaching concepts for structured languages. In the most general case of universal languages, we show that, by focusing on the total size of a witness set rather than its cardinality, we can teach all total functions that are computable within some fixed time bound. We complement the theoretical results with a range of experimental results on a simple Turing-complete language, showing how teaching dimension and teaching size differ in practice. Quite remarkably, we found that witness sets are usually smaller than the programs they identify, which is an illuminating justification of why machine teaching from examples makes sense at all.
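    Schematically, the two quantities contrasted above can be written as follows (notation ours, with $W \vdash c$ meaning that $W$ is a witness set identifying concept $c$, and $\ell(w)$ the encoding size of example $w$):

```latex
% Teaching dimension: the fewest examples in any witness set for c.
TD(c) \;=\; \min_{W \,\vdash\, c} |W|
% Teaching size: the smallest total encoding size of any witness set for c.
TS(c) \;=\; \min_{W \,\vdash\, c} \sum_{w \in W} \ell(w)
```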

    Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

    Full text link
    In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
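    A minimal sketch of the spurious-dimension manipulation described above, under our own assumptions (a toy modular-addition dataset and numpy); the helper only performs the data padding, not any of the paper's training runs:

```python
import numpy as np

def add_spurious_dimensions(X, n_spurious, seed=0):
    """Append n_spurious columns of pure noise to the feature matrix X.

    The abstract reports that padding algorithmic datasets with such
    uninformative dimensions can induce grokking; this helper only
    builds the padded dataset that a model would then be trained on.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(X.shape[0], n_spurious))
    return np.concatenate([X, noise], axis=1)

# Toy modular-addition dataset (a common grokking benchmark),
# padded with 100 spurious dimensions before being handed to a model.
p = 7
a, b = np.meshgrid(np.arange(p), np.arange(p))
X = np.stack([a.ravel(), b.ravel()], axis=1).astype(float)
y = (X[:, 0] + X[:, 1]) % p
X_padded = add_spurious_dimensions(X, n_spurious=100)
print(X.shape, X_padded.shape)  # (49, 2) (49, 102)
```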

    Causal loops: logically consistent correlations, time travel, and computation

    Get PDF
    Causal loops are loops in cause-effect chains: An effect can be the cause of that effect's cause. We show that causal loops can be unproblematic, and explore them from different points of view. This thesis is motivated by quantum theory, general relativity, and quantum gravity. By accepting all of quantum theory one can ask whether the possibility to take superpositions extends to causal structures. Then again, quantum theory comes with conceptual problems: Can we overcome these problems by dropping causality? General relativity is consistent with space-time geometries that allow for time travel: What happens to systems traveling along closed time-like curves, and are there reasons to rule out the existence of closed time-like curves in nature? Finally, a candidate for a theory of quantum gravity is quantum theory with a different, relaxed space-time geometry. Motivated by these questions, we explore the classical world of the non-causal. This world is non-empty, and what can happen in such a world is sometimes weird, but not too crazy. What is weird is that in these worlds, a party (or event) can be in the future and in the past of some other party (time travel). What is not too crazy is that this theoretical possibility does not lead to any contradiction. Moreover, one can identify logical consistency with the existence of a unique fixed point in a cause-effect chain. This can be understood as follows: No fixed point is the same as having a contradiction (too stiff); multiple fixed points, then again, are the same as having an unspecified system (too loose). This leads to a series of results in that field: Characterization of classical non-causal correlations, closed time-like curves that do not restrict the actions of experimenters, and a self-referential model of computation. We study the computational power of this model and use it to upper bound the computational power of closed time-like curves. Time travel has ever since been termed weird; what we show here, however, is that time travel is not too crazy: It is not possible to solve hard problems by traveling through time. Finally, we apply our results on causal loops to other fields: an analysis with Kolmogorov complexity, local and classical simulation of PR-box correlations with closed time-like curves, and a short note on self-referentiality in language.
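    A toy illustration of the fixed-point criterion described above (our own hypothetical example, not taken from the thesis): modelling one trip around a causal loop as a function f on a finite state space, logical consistency amounts to f having exactly one fixed point.

```python
def fixed_points(f, states):
    """Return the states s with f(s) == s for a finite state space."""
    return [s for s in states if f(s) == s]

states = [0, 1]

# Consistent loop: exactly one fixed point, i.e. a unique self-consistent
# history around the loop.
consistent = lambda s: 1
# Paradoxical loop (too stiff): no fixed point, e.g. the bit gets negated.
paradox = lambda s: 1 - s
# Underdetermined loop (too loose): multiple fixed points, e.g. identity.
loose = lambda s: s

for name, f in [("consistent", consistent), ("paradox", paradox), ("loose", loose)]:
    print(name, fixed_points(f, states))
# consistent [1] / paradox [] / loose [0, 1]
```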

    Evaluating Point Cloud Quality via Transformational Complexity

    Full text link
    Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds with available references. Merging research from cognitive science with intuition about the human visual system (HVS), the difference between the expected perceptual result and the actual perceptual reproduction in the visual center of the cerebral cortex indicates the subjective quality degradation. Therefore, in this paper, we try to derive the point cloud quality by measuring the complexity of transforming the distorted point cloud back to its reference, which in practice can be approximated by the code length of one point cloud when the other is given. For this purpose, we first segment the reference and the distorted point cloud into a series of local patch pairs based on a 3D Voronoi diagram. Next, motivated by predictive coding theory, we utilize a space-aware vector autoregressive (SA-VAR) model to encode the geometry and color channels of each reference patch, in cases with and without the distorted patch, respectively. Specifically, supposing that the residual errors follow multivariate Gaussian distributions, we calculate the self-complexity of the reference and the transformational complexity between the reference and the distorted sample via covariance matrices. Besides the complexity terms, the prediction terms generated by SA-VAR are introduced as an auxiliary feature to promote the final quality prediction. Extensive experiments on five public point cloud quality databases demonstrate that the transformational complexity based distortion metric (TCDM) produces state-of-the-art (SOTA) results, and ablation studies have further shown that our metric can be generalized to various scenarios with consistent performance by examining its key modules and parameters.
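    As a rough sketch of the Gaussian code-length idea above: under the multivariate-Gaussian assumption, the code length of the residual vectors is governed by the log-determinant of their covariance. The function below is our own simplification (differential entropy as a proxy for code length); the paper's actual metric also involves patch segmentation, geometry and color channels, and prediction terms that are not modelled here.

```python
import numpy as np

def gaussian_code_length(residuals, eps=1e-9):
    """Approximate per-sample code length (in nats) of residual vectors
    assumed to follow a zero-mean multivariate Gaussian, via the
    differential entropy 0.5 * log det(2*pi*e*Sigma)."""
    d = residuals.shape[1]
    sigma = np.cov(residuals, rowvar=False) + eps * np.eye(d)
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * sigma)
    return 0.5 * logdet

# Toy illustration: residuals of "predicting" a distorted patch from its
# reference are cheaper to encode when the distortion is mild.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 3))
mild = reference + 0.05 * rng.normal(size=(500, 3))
severe = reference + 0.5 * rng.normal(size=(500, 3))
print(gaussian_code_length(mild - reference))    # smaller code length
print(gaussian_code_length(severe - reference))  # larger code length
```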

    Synthetic Kolmogorov Complexity in Coq

    Get PDF
    We present a generalised, constructive, and machine-checked approach to Kolmogorov complexity in the constructive type theory underlying the Coq proof assistant. By proving that nonrandom numbers form a simple predicate, we obtain elegant proofs of undecidability for random and nonrandom numbers and a proof of uncomputability of Kolmogorov complexity. We use a general and abstract definition of Kolmogorov complexity and subsequently instantiate it to several definitions frequently found in the literature. Whereas textbook treatments of Kolmogorov complexity usually rely heavily on classical logic and the axiom of choice, we put emphasis on the constructiveness of all our arguments, however without blurring their essence. We first give a high-level proof idea using classical logic, which can be formalised with Markov's principle via folklore techniques we subsequently explain. Lastly, we show a strategy for eliminating Markov's principle from a certain class of computability proofs, rendering all our results fully constructive. All our results are machine-checked by the Coq proof assistant, which is enabled by using a synthetic approach to computability: rather than formalising a model of computation, which is well-known to introduce a considerable overhead, we abstractly assume a universal function, allowing the proofs to focus on the mathematical essence.
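    For reference, the abstract definition being instantiated can be written in textbook notation, relative to a fixed universal partial function U (this is the standard formulation, not the paper's Coq code):

```latex
% Kolmogorov complexity of x relative to a fixed universal partial
% function U: the length of a shortest program p that U maps to x.
K_U(x) \;=\; \min \{\, |p| \;:\; U(p) = x \,\}
% x is "nonrandom" (compressible) when some such program p is shorter
% than a canonical description of x itself.
```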

    Probabilistic, Information-Theoretic Models for Etymological Alignment

    Get PDF
    This thesis starts out by reviewing Bayesian reasoning and Bayesian network models. We present results related to discriminative learning of Bayesian network parameters. Along the way, we explicitly identify a number of problems arising in Bayesian model class selection. This leads us to information theory and, more specifically, the minimum description length (MDL) principle. We look at its theoretic foundations and practical implications. The MDL approach provides elegant solutions for the problem of model class selection and enables us to objectively compare any set of models, regardless of their parametric structure. Finally, we apply these methods to problems arising in computational etymology. We develop model families for the task of sound-by-sound alignment across kindred languages. Fed with linguistic data in the form of cognate sets, our methods provide information about the correspondence of sounds, as well as the history and ancestral structure of a language family. As a running example we take the family of Uralic languages.

    [FI, in English translation] This dissertation covers three topics. The first is probability theory and Bayesian inference. This approach is useful in many cases where we want to describe some data in a generalizing way, in order to isolate its properties or to predict some unobserved part of it. In cases where the model class, that is, the way of describing the data, is not known in advance, a selection criterion is needed with which a suitable class can be found. Defining such a criterion in an objective way is often demanding. Bayesian inference offers some tools for that task, but it is often preferable to choose a different approach. The minimum description length (MDL) principle regards maximizing probability as a problem equivalent to minimizing description length, that is, to compressing the data as efficiently as possible. The way the data is described, and the file size achieved through it, is usually easy to define in a manner suited to the nature of the problem. It is also easy to compare different ways of describing the data, which can thus be read as model classes, objectively in terms of compression efficiency. In the third part of the work, these results from the world of information theory are applied to problems arising in etymology, the science of the historical origin of words. Computational models are developed that describe the relations of kindred languages to one another. With them we study the rules by which sounds correspond to one another and how they have changed, and in which context each rule applies. A mechanism is presented that automatically guesses missing word forms and builds a family tree for the language family. The Finno-Ugric languages are used as an example.
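    The model-selection criterion referred to above is commonly summarised by the two-part MDL code length (a generic textbook form, not a formula quoted from the thesis):

```latex
% Two-part MDL code: choose, from the candidate class \mathcal{M}, the model
% minimising the bits needed to describe the model plus the bits needed to
% describe the data given that model.
L(D) \;=\; \min_{M \in \mathcal{M}} \bigl( L(M) + L(D \mid M) \bigr)
```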

    Fast whole-genome phylogeny of the COVID-19 virus SARS-CoV-2 by compression

    Get PDF
    We analyze the whole genome phylogeny and taxonomy of the SARS-CoV-2 virus using compression. This is a new fast alignment-free method called the "normalized compression distance" (NCD) method. It discovers all effective similarities based on Kolmogorov complexity. The latter being incomputable, we approximate it by a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to the two bat SARS-like coronaviruses bat-SL-CoVZXC21 and bat-SL-CoVZC4. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We treat the question of whether pangolins are involved in the SARS-CoV-2 virus. The compression method is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny.
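    The NCD itself has a standard closed form. Below is a minimal sketch that uses Python's built-in zlib as a stand-in for the much stronger zpaq compressor mentioned in the abstract, and toy strings in place of real genomes, so the absolute distances will not match the paper's.

```python
import zlib

def c(data: bytes) -> int:
    """Compressed length of data; zlib stands in for a stronger
    compressor such as zpaq, which the paper actually uses."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy illustration with short genome-like strings; real use would feed in
# whole viral genomes read from FASTA files.
a = b"ATGCTAGCTAGGATCCGATCG" * 50
b_variant = a.replace(b"GGA", b"GCA")   # a close variant of a
c_other = bytes(reversed(a))            # constructed to be more distant
print(ncd(a, b_variant), ncd(a, c_other))
```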

    A sequence-length sensitive approach to learning biological grammars using inductive logic programming.

    Get PDF
    This thesis aims to investigate whether the ideas behind compression principles, such as the Minimum Description Length, can help us to improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Unlike in most traditional ILP learning problems, biological sequences often vary greatly in length. This variation in length is an important feature of biological sequences which should not be ignored by ILP systems. However, we have identified that some ILP systems do not take into account the length of examples when evaluating their proposed hypotheses. During the learning process, many ILP systems use clause evaluation functions to assign a score to induced hypotheses, estimating their quality and effectively influencing the search. Traditionally, clause evaluation functions do not take into account the length of the examples which are covered by the clause. We propose L-modification, a way of modifying existing clause evaluation functions so that they take into account the length of the examples which they learn from. An empirical study was undertaken to investigate whether significant improvements can be achieved by applying L-modification to a standard clause evaluation function. Furthermore, we investigated more generally how ILP systems cope with the length of examples in training data. We show that our L-modified clause evaluation function outperforms our benchmark function in every experiment we conducted, and thus we demonstrate that L-modification is a useful concept. We also show that the length of the examples in the training data used by ILP systems does have an undeniable impact on the results.
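    Purely as a hypothetical illustration of the idea (the thesis's actual L-modification is defined over specific ILP clause evaluation functions and need not match this sketch), the difference between a count-based score and a length-aware one can be seen on toy sequences:

```python
def coverage_score(pos_covered, neg_covered):
    """A simple clause evaluation function: positives covered minus
    negatives covered, counting examples irrespective of their length."""
    return len(pos_covered) - len(neg_covered)

def length_weighted_score(pos_covered, neg_covered):
    """A hypothetical length-aware variant: each covered example
    contributes in proportion to its sequence length, so long and short
    sequences are no longer treated identically."""
    return sum(len(s) for s in pos_covered) - sum(len(s) for s in neg_covered)

pos = ["MKTAYIAKQR", "MKL"]        # covered positive protein sequences
neg = ["MSTNPKPQRK" * 5]           # one long covered negative sequence
print(coverage_score(pos, neg))          # 1: the clause looks acceptable
print(length_weighted_score(pos, neg))   # negative: the long negative dominates
```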