Minimal Forbidden Factors of Circular Words
Minimal forbidden factors are a useful tool for investigating properties of
words and languages. Two factorial languages are distinct if and only if they
have different (antifactorial) sets of minimal forbidden factors. There exist
algorithms for computing the minimal forbidden factors of a word, as well as of
a regular factorial language. Conversely, Crochemore et al. [IPL, 1998] gave an
algorithm that, given the trie recognizing a finite antifactorial language,
computes a DFA recognizing the language whose set of minimal forbidden factors
is exactly that antifactorial language. In the same paper, they showed that the obtained DFA is minimal if the
input trie recognizes the minimal forbidden factors of a single word. We
generalize this result to the case of a circular word. We discuss several
combinatorial properties of the minimal forbidden factors of a circular word.
As a byproduct, we obtain a formal definition of the factor automaton of a
circular word. Finally, we investigate the case of minimal forbidden factors of
the circular Fibonacci words. Comment: To appear in Theoretical Computer Science
Linear-time Computation of Minimal Absent Words Using Suffix Array
An absent word of a word y of length n is a word that does not occur in y. It
is a minimal absent word if all its proper factors occur in y. Minimal absent
words have been computed in genomes of organisms from all domains of life;
their computation provides a fast alternative for measuring approximation in
sequence comparison. There exists an O(n)-time and O(n)-space algorithm for
computing all minimal absent words on a fixed-sized alphabet based on the
construction of suffix automata (Crochemore et al., 1998). No implementation of
this algorithm is publicly available. There also exists an O(n^2)-time and
O(n)-space algorithm for the same problem based on the construction of suffix
arrays (Pinho et al., 2009). An implementation of this algorithm was also
provided by the authors and is currently the fastest available. In this
article, we bridge this unpleasant gap by presenting an O(n)-time and
O(n)-space algorithm for computing all minimal absent words based on the
construction of suffix arrays. Experimental results using real and synthetic
data show that the respective implementation outperforms the one by Pinho et
al.
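
The definition above (a word is a minimal absent word if it does not occur in y while all of its proper factors do) can be illustrated with a brute-force sketch. This is not the linear-time suffix-array algorithm the abstract describes; it only enumerates factors directly, so it is suitable for short words. The function names `factors` and `minimal_absent_words` are hypothetical, and the alphabet is assumed to be passed in explicitly.

```python
def factors(y):
    """Return the set of all factors (substrings) of y, including the empty word."""
    return {y[i:j] for i in range(len(y) + 1) for j in range(i, len(y) + 1)}


def minimal_absent_words(y, alphabet):
    """Brute-force minimal absent words of y over the given alphabet.

    A word a+u+b (a, b letters) is a minimal absent word exactly when it is
    absent from y while both a+u and u+b occur in y, since every proper
    factor of a+u+b is a factor of a+u or of u+b.
    """
    occurring = factors(y)
    result = set()

    # Length-1 case: a letter that never occurs in y. Its only proper
    # factor is the empty word, which always occurs.
    for a in alphabet:
        if a not in occurring:
            result.add(a)

    # Length >= 2 case: extend each occurring factor u on both sides.
    for u in occurring:
        for a in alphabet:
            if a + u not in occurring:
                continue
            for b in alphabet:
                if u + b in occurring and a + u + b not in occurring:
                    result.add(a + u + b)
    return result


if __name__ == "__main__":
    print(sorted(minimal_absent_words("abaab", "ab")))
```

Because the set of factors can have quadratic size, this sketch is far slower than the O(n)-time methods discussed in the abstract; it is meant only to make the definition concrete.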
Minimal Absent Words in Rooted and Unrooted Trees
We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet. We bound the cardinality of the set of minimal absent words of a rooted (resp. unrooted) tree T with n nodes, and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time, assuming an integer alphabet of size polynomial in n.
Decoding genomic information
Our work here outlines and follows some trends of research that analyze and interpret (i.e., decode) genomic information by treating the genome as a book encrypted in an unknown language. This analysis is performed with alignment-free sequence methods, based on information-theoretic concepts, in order to convert the genomic information into a comprehensible mathematical form and understand its complexity.
Entropy-based parametric estimation of spike train statistics
We consider the evolution of a network of neurons, focusing on the asymptotic
behavior of spike dynamics instead of membrane potential dynamics. The spike
response is not sought as a deterministic response in this context, but as a
conditional probability: "Reading out the code" consists of inferring such a
probability. This probability is computed from empirical raster plots, by using
the framework of thermodynamic formalism in ergodic theory. This gives us a
parametric statistical model where the probability has the form of a Gibbs
distribution. In this respect, this approach generalizes the seminal and
profound work of Schneidman and collaborators. A minimal presentation of the
formalism is reviewed here, while a general algorithmic estimation method is
proposed yielding fast convergent implementations. It is also made explicit how
several spike observables (entropy, rate, synchronizations, correlations) are
given in closed-form from the parametric estimation. This paradigm does not
only allow us to estimate the spike statistics, given a design choice, but also
to compare different models, thus answering comparative questions about the
neural code such as: "are correlations (or time synchrony or a given set of
spike patterns, ...) significant with respect to rate coding only?" A numerical
validation of the method is proposed and the perspectives regarding spike-train
code analysis are also discussed. Comment: 37 pages, 8 figures, submitted
Theoretical analysis of interhemispheric transfer costs in visual word recognition
It is becoming increasingly clear that interhemispheric transfer is an important factor in visual word recognition. One of the two computational models of visual word recognition that include this aspect, the SERIOL model, is tested on the basis of recently obtained behavioural word naming data. Optimal viewing position (OVP) data were collected from participants with left hemisphere language dominance, right hemisphere language dominance, and bilateral language representation (as determined by fMRI). We employ a mathematical model, based on some of the underlying assumptions of SERIOL, to investigate the model's ability to predict our results. We show that this mathematical model, which makes use of the original parameters, is able to perfectly predict the differences in the OVP curves observed in the three groups of participants.