
567 research outputs found

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Author: Cook Cory
Publication venue: SJSU ScholarWorks
Publication date: 14/06/2016

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses grammatical inference from positive samples, together with statistical information, to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing also shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and significantly improve the specificity of the model.
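The abstract does not say which basis algorithm is used, so the following is only a minimal sketch of one common starting point for inferring a regular language from positive samples: a prefix tree acceptor (PTA), which accepts exactly the training strings. The function names and the toy sequences are assumptions made for illustration and are not taken from the thesis.

```python
def build_pta(samples):
    """Build a prefix tree acceptor: one state per distinct prefix of the samples."""
    transitions = [{}]   # transitions[state][symbol] -> next state; state 0 is the root
    accepting = set()
    for s in samples:
        state = 0
        for ch in s:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)  # the state reached by a full sample is accepting
    return transitions, accepting


def accepts(pta, s):
    """Return True if the acceptor recognizes string s."""
    transitions, accepting = pta
    state = 0
    for ch in s:
        nxt = transitions[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in accepting


# Hypothetical positive samples (not real data from the thesis).
positive_samples = ["ATGGCC", "ATGGTA", "ATGTAA"]
pta = build_pta(positive_samples)
```

State-merging inference methods typically generalize from such an acceptor by merging compatible states; choosing which input sequences to include, as the abstract describes, would change the acceptor that is built.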

SJSU ScholarWorks

On the relevance of the neurobiological analogue of the finite-state architecture

Author: Chomsky
Davis
Elman
Friederici
Gold
Gomez
Hauser
Jackendoff
Karl Magnus Petersson
Koch
Marr
McCulloch
Minsky
Nowak
Petersson
Savage
Shinohara
Siegelmann
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

We present two simple arguments for the potential relevance of a neurobiological analogue of the finite-state architecture. The first assumes the classical cognitive framework, is well known, and is based on the assumption that the brain is finite with respect to its memory organization. The second is formulated within a general dynamical systems framework and is based on the assumption that the brain sustains some level of noise and/or does not utilize infinite-precision processing. We briefly review the classical cognitive framework based on Church-Turing computability and non-classical approaches based on analog processing in dynamical systems. We conclude that the dynamical neurobiological analogue of the finite-state architecture appears to be relevant, at least at an implementational level, for cognitive brain systems.

Crossref

Sapientia

MPG.PuRe

An exploration of language identification techniques for the Dutch folktale database

Author: Hiemstra Djoerd
Jong Franciska de
Meder Theo
Theune Mariët
Trieschnigg Dolf
Publication venue: LREC organization
Publication date: 01/01/2012

The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination of languages, including (Middle and 17th-century) Dutch, Frisian, and a number of Dutch dialects. In this work we compare a number of approaches to automatic language identification for this collection. We show that, in comparison to typical language identification tasks, classification performance for highly similar languages with little training data is low. The studied dataset, consisting of over 39,000 documents in 16 languages and dialects, is available on request for follow-up research.
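The abstract does not name the approaches that were compared. As a rough illustration of what a language identifier can look like, here is a minimal sketch of one common baseline: character trigram profiles compared by cosine similarity. The two training strings are invented toy examples, not data from the Dutch Folktale Database, and the function names are assumptions.

```python
from collections import Counter
from math import sqrt


def trigrams(text):
    """Count overlapping character trigrams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    query = trigrams(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy training data, invented for this sketch.
training = {"nl": "het huis is groot", "en": "the house is big"}
profiles = {lang: trigrams(sample) for lang, sample in training.items()}
```

With profiles this small the classifier only separates clearly different languages; closely related dialects share most of their trigrams, which is consistent with the low performance the abstract reports for highly similar languages with little training data.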

CiteSeerX

Radboud Repository

University of Twente Research Information

Statistical Learning of Arbitrary Computable Classifiers

Author: Soloveichik David
Publication date: 22/06/2008

Statistical learning theory chiefly studies restricted hypothesis classes, particularly those with finite Vapnik-Chervonenkis (VC) dimension. The fundamental quantity of interest is the sample complexity: the number of samples required to learn to a specified level of accuracy. Here we consider learning over the set of all computable labeling functions. Since the VC-dimension is infinite and a priori (uniform) bounds on the number of samples are impossible, we let the learning algorithm decide when it has seen sufficient samples to have learned. We first show that learning in this setting is indeed possible, and develop a learning algorithm. We then show, however, that bounding sample complexity independently of the distribution is impossible. Notably, this impossibility is entirely due to the requirement that the learning algorithm be computable, and not due to the statistical nature of the problem.

arXiv.org e-Print Archive

Caltech Authors

PAC Learning, VC Dimension, and the Arithmetic Hierarchy

Author: Calvert Wesley
Publication date: 04/06/2014

We compute that the index set of PAC-learnable concept classes is $m$-complete $\Sigma^0_3$ within the set of indices for all concept classes of a reasonable form. All concept classes considered are computable enumerations of computable $\Pi^0_1$ classes, in a sense made precise here. This family of concept classes is sufficient to cover all standard examples, and also has the property that PAC learnability is equivalent to finite VC dimension.
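To make the notion of VC dimension concrete, the following is a minimal sketch that computes it by brute force for a finite concept class over a finite domain. This is a toy setting chosen for illustration: the concept classes in the paper are infinite, and the function names and example classes below are assumptions, not taken from it.

```python
from itertools import combinations


def shatters(concepts, points):
    """True if every True/False labeling of `points` is realized by some concept."""
    labelings = {tuple(x in c for x in points) for c in concepts}
    return len(labelings) == 2 ** len(points)


def vc_dimension(concepts, domain):
    """Size of the largest subset of the domain shattered by the concepts."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(concepts, pts) for pts in combinations(domain, k)):
            d = k
        else:
            break  # subsets of shattered sets are shattered, so no larger set can be
    return d


domain = range(5)
# Threshold concepts {x : x >= t} and interval concepts {x : a <= x < b}.
thresholds = [frozenset(x for x in domain if x >= t) for t in range(6)]
intervals = [frozenset(range(a, b)) for a in range(5) for b in range(a, 6)]
```

Thresholds cannot label a smaller point positive and a larger one negative, so they shatter only single points; intervals shatter pairs but cannot realize the labeling positive, negative, positive on three points.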

arXiv.org e-Print Archive

CiteSeerX

Crossref

OpenSIUC