11 research outputs found

    How incomputable is Kolmogorov complexity?

    Kolmogorov complexity is the length of the ultimately compressed version of a file (that is, of anything that can be put into a computer). Formally, it is the length of a shortest program from which the file can be reconstructed. We discuss the incomputability of Kolmogorov complexity, the formal loopholes this leaves us, and recent approaches to compute or approximate Kolmogorov complexity, distinguishing the approaches that are problematic from those that are viable.
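
    Since the true Kolmogorov complexity K(x) is incomputable, practical work can only upper-bound it, for example by the output size of a concrete compressor. A minimal illustrative sketch of that idea (not code from the paper), using Python's zlib:

```python
import os
import zlib

def kolmogorov_upper_bound(data: bytes) -> int:
    """Computable upper bound on K(data): the size of a zlib-compressed copy.
    The true Kolmogorov complexity is incomputable; any real compressor only
    gives an upper bound (plus the constant-size decompressor, omitted here)."""
    return len(zlib.compress(data, level=9))

highly_regular = b"ab" * 5000        # strong regularity -> small upper bound
incompressible = os.urandom(10000)   # random bytes -> bound close to the raw size
print(kolmogorov_upper_bound(highly_regular))
print(kolmogorov_upper_bound(incompressible))
```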

    Consistent Quantification of Complex Dynamics via a Novel Statistical Complexity Measure

    Natural systems often show complex dynamics. Quantifying such complex dynamics is an important step in, e.g., characterizing and classifying different systems or in investigating the effect of an external perturbation on the dynamics. Promising routes were followed in the past using concepts based on (Shannon's) entropy. Here, we propose a new, conceptually sound measure that can be computed pragmatically, in contrast to purely theoretical concepts based on, e.g., Kolmogorov complexity. We illustrate its applicability using a toy example with a control parameter and then turn to the molecular evolution of the HIV-1 protease, for which drug treatment can be regarded as an external perturbation that changes the complexity of its molecular evolutionary dynamics. In fact, through their noticeable signal, our method identifies exactly those residues that are known to bind the drug molecules. We furthermore apply our method in a completely different domain, namely foreign exchange rates, and find convincing results there as well.
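
    As a purely illustrative sketch of the entropy-based ingredient mentioned above (not the authors' proposed measure), the variability of a symbol series, e.g. the amino acids observed at one protease residue over time, can be quantified with Shannon entropy:

```python
import math
from collections import Counter

def shannon_entropy(symbols) -> float:
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical per-residue series: a perturbation (e.g. drug treatment) that
# changes the evolutionary dynamics would shift such entropies.
conserved_position = "LLLLLLLLML"   # nearly conserved -> low entropy
variable_position  = "LMIVLMIVLM"   # more variable    -> higher entropy
print(shannon_entropy(conserved_position))
print(shannon_entropy(variable_position))
```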

    Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

    In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been reached on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression, and linear regression. We also uncover a mechanism by which grokking can be induced on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or to weight-norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and on further trends we see in the training trajectories of a Bayesian neural network (BNN) and a GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
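
    The concrete construction named above, appending spurious dimensions to an algorithmic dataset, might look like the following sketch (an assumed illustration in the spirit of the abstract, not the authors' exact setup), here for modular addition:

```python
import numpy as np

def modular_addition_dataset(p: int = 7, n_spurious: int = 20, seed: int = 0):
    """Toy algorithmic dataset (a + b mod p) with appended spurious features.

    The informative part is the one-hot encoding of (a, b); the extra columns
    are pure noise, the kind of spurious information that, per the abstract,
    can be used to induce grokking. Illustrative only."""
    rng = np.random.default_rng(seed)
    pairs = [(a, b) for a in range(p) for b in range(p)]
    X_informative = np.zeros((len(pairs), 2 * p))
    for i, (a, b) in enumerate(pairs):
        X_informative[i, a] = 1.0
        X_informative[i, p + b] = 1.0
    X_spurious = rng.normal(size=(len(pairs), n_spurious))
    X = np.hstack([X_informative, X_spurious])
    y = np.array([(a + b) % p for a, b in pairs])
    return X, y

X, y = modular_addition_dataset()
print(X.shape, y.shape)   # (49, 34) (49,)
```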

    Evaluating Point Cloud Quality via Transformational Complexity

    Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds when references are available. Drawing on cognitive science and intuitions about the human visual system (HVS), subjective quality degradation can be regarded as the difference between the expected perceptual result and the actual perceptual reproduction in the visual center of the cerebral cortex. Therefore, in this paper, we derive point cloud quality by measuring the complexity of transforming the distorted point cloud back to its reference, which in practice can be approximated by the code length of one point cloud when the other is given. For this purpose, we first segment the reference and the distorted point cloud into a series of local patch pairs based on a 3D Voronoi diagram. Next, motivated by predictive coding theory, we use a space-aware vector autoregressive (SA-VAR) model to encode the geometry and color channels of each reference patch, with and without the distorted patch, respectively. Assuming that the residual errors follow multivariate Gaussian distributions, we calculate the self-complexity of the reference and the transformational complexity between the reference and the distorted sample via covariance matrices. Besides the complexity terms, the prediction terms generated by SA-VAR are introduced as an auxiliary feature to promote the final quality prediction. Extensive experiments on five public point cloud quality databases demonstrate that the transformational-complexity-based distortion metric (TCDM) produces state-of-the-art (SOTA) results, and ablation studies on its key modules and parameters further show that the metric generalizes to various scenarios with consistent performance.
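
    Under the Gaussian assumption mentioned above, a code length can be tied to the log-determinant of the residual covariance (the differential entropy of a multivariate Gaussian). The sketch below is a generic illustration of that single step, not the TCDM implementation:

```python
import numpy as np

def gaussian_code_length(residuals: np.ndarray) -> float:
    """Per-sample code-length proxy (bits) for zero-mean residuals under a
    multivariate Gaussian model: 0.5 * log2 det(2*pi*e*Sigma)."""
    _, d = residuals.shape
    sigma = np.cov(residuals, rowvar=False) + 1e-9 * np.eye(d)   # regularised covariance
    _, logdet = np.linalg.slogdet(2 * np.pi * np.e * sigma)      # natural-log determinant
    return 0.5 * logdet / np.log(2)                              # convert nats -> bits

rng = np.random.default_rng(0)
small_residuals = rng.normal(scale=0.01, size=(500, 3))   # mild distortion
large_residuals = rng.normal(scale=1.00, size=(500, 3))   # severe distortion
print(gaussian_code_length(small_residuals) < gaussian_code_length(large_residuals))  # True
```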

    Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery

    In the quest to unveil novel categories at test time, we confront the inherent limitations of traditional supervised recognition models, which are restricted to a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a category? In this paper, we conceptualize a category through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient, and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum-length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing a proof of its optimality. Our code is available at https://github.com/SarahRastegar/InfoSieve.
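
    The phrase "minimum-length category codes" above invokes a classic coding idea: a minimum expected-length prefix code assigns shorter codes to frequent (coarse) categories and longer codes to rare (fine-grained) ones, and its merge tree induces a hierarchy. The following is a generic Huffman sketch of that idea, not the InfoSieve procedure:

```python
import heapq
from collections import Counter

def huffman_code_lengths(labels):
    """Code lengths of a minimum expected-length prefix code for the
    empirical label distribution (generic illustration, not InfoSieve)."""
    freq = Counter(labels)
    # heap entries: (frequency, tie-breaker, {label: current code length})
    heap = [(f, i, {label: 0}) for i, (label, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {k: v + 1 for k, v in {**c1, **c2}.items()}  # one level deeper
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

labels = ["dog"] * 50 + ["cat"] * 30 + ["sparrow"] * 15 + ["finch"] * 5
print(huffman_code_lengths(labels))   # dog -> 1 bit, cat -> 2, sparrow and finch -> 3
```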

    Ladderpath Approach: How Tinkering and Reuse Increase Complexity and Information

    The notions of information and complexity are important concepts in many scientific fields, such as molecular biology, evolutionary theory, and exobiology. Many measures of these quantities are either difficult to compute, rely on the statistical notion of information, or can only be applied to strings. Based on assembly theory, we propose the notion of a ladderpath, which describes how an object can be decomposed into hierarchical structures using repetitive elements. From the ladderpath, two measures naturally emerge: the ladderpath-index and the order-index, which represent two axes of complexity. We show how the ladderpath approach can be applied to both strings and spatial patterns, and we argue that all systems that undergo evolution can be described as ladderpaths. Further, we discuss possible applications to human language and the origin of life. The ladderpath approach provides an alternative characterization of the information contained in a single object (or a system) and could aid our understanding of evolving systems, and of the origin of life in particular.
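
    The reuse idea can be made concrete with a toy decomposition: repeatedly factor out the longest block that occurs at least twice. This greedy sketch only conveys the spirit of tinkering and reuse; it is not the authors' ladderpath algorithm, and the two indices above are not computed here:

```python
def greedy_reuse_decomposition(s: str):
    """Greedy toy decomposition: repeatedly replace the longest substring
    occurring at least twice (non-overlapping) with a fresh placeholder
    symbol, recording the reused blocks. Not the ladderpath algorithm."""
    rules = {}
    next_symbol = 0
    while True:
        best = ""
        for length in range(len(s) // 2, 1, -1):   # longest candidates first
            for i in range(len(s) - length + 1):
                sub = s[i:i + length]
                if s.count(sub) >= 2:              # reused at least twice
                    best = sub
                    break
            if best:
                break
        if not best:
            return s, rules
        token = chr(0x2460 + next_symbol)          # placeholder symbols ①, ②, ...
        next_symbol += 1
        rules[token] = best
        s = s.replace(best, token)

core, rules = greedy_reuse_decomposition("ABCABCABCXYZXYZ")
print(core)    # residue written with placeholder symbols, e.g. ①①①②②
print(rules)   # which blocks were factored out and reused
```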

    Synthetic Kolmogorov Complexity in Coq

    We present a generalised, constructive, and machine-checked approach to Kolmogorov complexity in the constructive type theory underlying the Coq proof assistant. By proving that nonrandom numbers form a simple predicate, we obtain elegant proofs of undecidability for random and nonrandom numbers and a proof of uncomputability of Kolmogorov complexity. We use a general and abstract definition of Kolmogorov complexity and subsequently instantiate it to several definitions frequently found in the literature. Whereas textbook treatments of Kolmogorov complexity usually rely heavily on classical logic and the axiom of choice, we put emphasis on the constructiveness of all our arguments, without blurring their essence. We first give a high-level proof idea using classical logic, which can be formalised with Markov's principle via folklore techniques we subsequently explain. Lastly, we show a strategy for eliminating Markov's principle from a certain class of computability proofs, rendering all our results fully constructive. All our results are machine-checked by the Coq proof assistant, which is enabled by a synthetic approach to computability: rather than formalising a model of computation, which is well known to introduce considerable overhead, we abstractly assume a universal function, allowing the proofs to focus on the mathematical essence.

    Efficient compression of biological sequences using a neural network

    Background: The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of biosequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for biosequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA and amino acid models. For this purpose, we created GeCo3 and AC2, two new biosequence compressors. Both use a neural network for mixing the opinions of multiple specific models. Findings: We benchmark GeCo3 as a reference-free DNA compressor on five datasets, including a balanced and comprehensive dataset of DNA sequences, the Y-chromosome and human mitogenome, two compilations of archaeal and virus genomes, four whole genomes, and two collections of FASTQ data of a human virome and ancient DNA. GeCo3 achieves a solid improvement in compression over the previous version (GeCo2) of 2.4%, 7.1%, 6.1%, 5.8%, and 6.0%, respectively. As a reference-based DNA compressor, we benchmark GeCo3 on four datasets constituted by the pairwise compression of the chromosomes of the genomes of several primates. GeCo3 improves the compression by 12.4%, 11.7%, 10.8%, and 10.1% over the state-of-the-art. The cost of this compression improvement is some additional computational time (1.7× to 3.0× slower than GeCo2). RAM usage is constant, and the tool scales efficiently, independently of the sequence size. Overall, these values outperform the state-of-the-art. For AC2, the improvements and costs over AC are similar, which allows the tool to also outperform the state-of-the-art. Conclusions: GeCo3 and AC2 are biosequence compressors with a neural-network mixing approach that provides additional gains over top specific biocompressors. The proposed mixing method is portable, requiring only the probabilities of the models as inputs, providing easy adaptation to other data compressors or compression-based data analysis tools. GeCo3 and AC2 are released under GPLv3 and are available for free download at https://github.com/cobilab/geco3 and https://github.com/cobilab/ac2.
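
    The abstract's central mechanism, a neural network that mixes the probability estimates of several specific models, can be illustrated with a deliberately simplified sketch: a single softmax-weighted mixture fitted by gradient descent and scored by the resulting code length in bits. This is an assumed stand-in for illustration, not the GeCo3/AC2 network:

```python
import numpy as np

def mix_and_code_length(expert_probs: np.ndarray, lr: float = 0.1, epochs: int = 200):
    """Blend per-symbol probabilities from several expert models with softmax
    weights fitted by gradient descent; return the total code length in bits
    under the mixture and the learned weights. Generic sketch of probability
    mixing for compression, not the GeCo3/AC2 neural network.

    expert_probs[i, j] is the probability expert j assigned to the symbol
    that actually occurred at position i."""
    _, k = expert_probs.shape
    logits = np.zeros(k)
    for _ in range(epochs):
        w = np.exp(logits) / np.exp(logits).sum()                # softmax mixing weights
        mixed = expert_probs @ w                                 # mixture prob of each symbol
        grad_w = -(expert_probs / mixed[:, None]).mean(axis=0)   # d(mean NLL)/dw
        logits -= lr * w * (grad_w - grad_w @ w)                 # chain rule through softmax
    w = np.exp(logits) / np.exp(logits).sum()
    return -np.log2(expert_probs @ w).sum(), w

# Hypothetical experts: one well calibrated, one close to uninformative.
rng = np.random.default_rng(1)
good_expert = np.clip(rng.beta(8, 2, size=1000), 1e-6, 1 - 1e-6)
weak_expert = np.full(1000, 0.25)
bits, weights = mix_and_code_length(np.stack([good_expert, weak_expert], axis=1))
print(round(bits), weights)   # the mixture leans toward the better expert
```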