567 research outputs found

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    Get PDF
    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing shows that the accuracy of inferred languages for components of DNA are consistently accurate. By using the proposed algorithm languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly

    On the relevance of the neurobiological analogue of the finite-state architecture

    Get PDF
    We present two simple arguments for the potential relevance of a neurobiological analogue of the finite-state architecture. The first assumes the classical cognitive framework, is well-known, and is based on the assumption that the brain is finite with respect to its memory organization. The second is formulated within a general dynamical systems framework and is based on the assumption that the brain sustains some level of noise and/or does not utilize infinite precision processing. We briefly review the classical cognitive framework based on Church-Turing computability and non-classical approaches based on analog processing in dynamical systems. We conclude that the dynamical neurobiological analogue of the finite-state architecture appears to be relevant, at least at an implementational level, for cognitive brain systems

    An exploration of language identification techniques for the Dutch folktale database

    Get PDF
    The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination of languages including (Middle and 17th century) Dutch, Frisian and a number of Dutch dialects. In this work we compare a number of approaches to automatic language identification for this collection. We show that in comparison to typical language identification tasks, classification performance for highly similar languages with little training data is low. The studied dataset consisting of over 39,000 documents in 16 languages and dialects is available on request for followup research

    Statistical Learning of Arbitrary Computable Classifiers

    Get PDF
    Statistical learning theory chiefly studies restricted hypothesis classes, particularly those with finite Vapnik-Chervonenkis (VC) dimension. The fundamental quantity of interest is the sample complexity: the number of samples required to learn to a specified level of accuracy. Here we consider learning over the set of all computable labeling functions. Since the VC-dimension is infinite and a priori (uniform) bounds on the number of samples are impossible, we let the learning algorithm decide when it has seen sufficient samples to have learned. We first show that learning in this setting is indeed possible, and develop a learning algorithm. We then show, however, that bounding sample complexity independently of the distribution is impossible. Notably, this impossibility is entirely due to the requirement that the learning algorithm be computable, and not due to the statistical nature of the problem.Comment: Expanded the section on prior work and added reference

    PAC Learning, VC Dimension, and the Arithmetic Hierarchy

    Get PDF
    We compute that the index set of PAC-learnable concept classes is mm-complete Ī£30\Sigma^0_3 within the set of indices for all concept classes of a reasonable form. All concept classes considered are computable enumerations of computable Ī 10\Pi^0_1 classes, in a sense made precise here. This family of concept classes is sufficient to cover all standard examples, and also has the property that PAC learnability is equivalent to finite VC dimension
    • ā€¦
    corecore