9 research outputs found
Degree of Sequentiality of Weighted Automata
Weighted automata (WA) are an important formalism to describe quantitative properties. Obtaining equivalent deterministic machines is a longstanding research problem. In this paper we consider WA with a set semantics, meaning that the semantics is given by the set of weights of accepting runs. We focus on multi-sequential WA that are defined as finite unions of sequential WA. The problem we address is to minimize the size of this union. We call this minimum the degree of sequentiality of (the relation realized by) the WA.
For a given positive integer k, we provide multiple characterizations of relations realized by a union of k sequential WA over an infinitary finitely generated group: a Lipschitz-like machine independent property, a pattern on the automaton (a new twinning property) and a subclass of cost register automata. When possible, we effectively translate a WA into an equivalent union of k sequential WA. We also provide a decision procedure for our twinning property for commutative computable groups thus allowing to compute the degree of sequentiality. Last, we show that these results also hold for word transducers and that the associated decision problem is PSPACE
-complete
Algebraic synchronization criterion and computing reset words
We refine a uniform algebraic approach for deriving upper bounds on reset
thresholds of synchronizing automata. We express the condition that an
automaton is synchronizing in terms of linear algebra, and obtain upper bounds
for the reset thresholds of automata with a short word of a small rank. The
results are applied to make several improvements in the area.
We improve the best general upper bound for reset thresholds of finite prefix
codes (Huffman codes): we show that an -state synchronizing decoder has a
reset word of length at most . In addition to that, we prove
that the expected reset threshold of a uniformly random synchronizing binary
-state decoder is at most . We also show that for any non-unary
alphabet there exist decoders whose reset threshold is in .
We prove the \v{C}ern\'{y} conjecture for -state automata with a letter of
rank at most . In another corollary, based on the recent
results of Nicaud, we show that the probability that the \v{C}ern\'y conjecture
does not hold for a random synchronizing binary automaton is exponentially
small in terms of the number of states, and also that the expected value of the
reset threshold of an -state random synchronizing binary automaton is at
most .
Moreover, reset words of lengths within all of our bounds are computable in
polynomial time. We present suitable algorithms for this task for various
classes of automata, such as (quasi-)one-cluster and (quasi-)Eulerian automata,
for which our results can be applied.Comment: 18 pages, 2 figure
Implementation of Code Properties via Transducers
The FAdo system is a symbolic manipulator of formal language objects, implemented in Python. In this work, we extend its capabilities by implementing methods to manipulate transducers and we go one level higher than existing formal language systems and implement methods to manipulate objects representing classes of independent languages (widely known as code properties). Our methods allow users to define their own code properties and combine them between themselves or with fixed properties such as prefix codes, suffix codes, error detecting codes, etc. The satisfaction and maximality decision questions are solvable for any of the definable properties. The new online system LaSer allows one to query about a code property and obtain the answer in a batch mode. Our work is founded on independence theory as well as the theory of rational relations and transducers, and contributes with improved algorithms on these objects
Bidirectional best hit r-window gene clusters
<p>Abstract</p> <p>Background</p> <p><it>Conserved gene clusters </it>are groups of genes that are located close to one another in the genomes of several species. They tend to code for proteins that have a functional interaction. The identification of conserved gene clusters is an important step towards understanding genome evolution and predicting gene function.</p> <p>Results</p> <p>In this paper, we propose a novel pairwise gene cluster model that combines the notion of bidirectional best hits with the <it>r</it>-window model introduced in 2003 by Durand and Sankoff. The bidirectional best hit (BBH) constraint removes the need to specify the minimum number of shared genes in the <it>r</it>-window model and improves the relevance of the results. We design a subquadratic time algorithm to compute the set of BBH <it>r</it>-window gene clusters efficiently.</p> <p>Conclusion</p> <p>We apply our cluster model to the comparative analysis of <it>E. coli </it>K-12 and <it>B. subtilis </it>and perform an extensive comparison between our new model and the gene teams model developed by Bergeron <it>et al</it>. As compared to the gene teams model, our new cluster model has a slightly lower recall but a higher precision at all levels of recall when the results were ranked using statistical tests. An analysis of the most significant BBH <it>r</it>-window gene cluster show that they correspond to known operons.</p
Statistics for approximate gene clusters
Jahn K, Winter S, Stoye J, Böcker S. Statistics for approximate gene clusters. BMC Bioinformatics. 2013;14(Suppl 15: Proc. of RECOMB-CG 2013): S14.Background
Genes occurring co-localized in multiple genomes can be strong indicators for either functional constraints on the genome organization or remnant ancestral gene order. The computational detection of these patterns, which are usually referred to as gene clusters, has become increasingly sensitive over the past decade. The most powerful approaches allow for various types of imperfect cluster conservation: Cluster locations may be internally rearranged. The individual cluster locations may contain only a subset of the cluster genes and may be disrupted by uninvolved genes. Moreover cluster locations may not at all occur in some or even most of the studied genomes. The detection of such low quality clusters increases the risk of mistaking faint patterns that occur merely by chance for genuine findings. Therefore, it is crucial to estimate the significance of computational gene cluster predictions and discriminate between true conservation and coincidental clustering.
Results
In this paper, we present an efficient and accurate approach to estimate the significance of gene cluster predictions under the approximate common intervals model. Given a single gene cluster prediction, we calculate the probability to observe it with the same or a higher degree of conservation under the null hypothesis of random gene order, and add a correction factor to account for multiple testing. Our approach considers all parameters that define the quality of gene cluster conservation: the number of genomes in which the cluster occurs, the number of involved genes, the degree of conservation in the different genomes, as well as the frequency of the clustered genes within each genome. We apply our approach to evaluate gene cluster predictions in a large set of well annotated genomes