551 research outputs found
Quickest Online Selection of an Increasing Subsequence of Specified Size
Given a sequence of independent random variables with a common continuous
distribution, we consider the online decision problem where one seeks to
minimize the expected value of the time that is needed to complete the
selection of a monotone increasing subsequence of a prespecified length .
This problem is dual to some online decision problems that have been considered
earlier, and this dual problem has some notable advantages. In particular, the
recursions and equations of optimality lead with relative ease to asymptotic
formulas for mean and variance of the minimal selection time.Comment: 17 page
What Makes the Arc-Preserving Subsequence Problem Hard?
International audienceGiven two arc-annotated sequences (S, P ) and (T, Q) representing RNA structures, the Arc-Preserving Subsequence (APS) problem asks whether (T, Q) can be obtained from (S, P ) by deleting some of its bases (together with their incident arcs, if any). In previous studies [3, 6], this problem has been naturally divided into subproblems reflecting intrinsic complexity of arc structures. We show that APS(Crossing, Plain) is NP-complete, thereby answering an open problem [6]. Furthermore, to get more insight into where actual border of APS hardness is, we refine APS classical subproblems in much the same way as in [11] and give a complete categorization among various restrictions of APS problem complexity
Novel Techniques For Model-Code Synchronization
The orientation of the current software development practice requires efficient model-based iterative solutions. The high costs of maintenance and evolution during the life cycle of the software can be reduced by using tool-aided iterative development. This paper presents how model-based iterative software development can be supported through efficient model-code change propagation. The presented approach facilitates bi-directional synchronization between the modified source code and the refined initial models. The backgrounds of the synchronization technique are three-way abstract syntax tree (AST) differencing and merging. The AST-based solution enables syntactically correct merge operations. OMG's Model-Driven Architecture describes a proposal for platform-specific model creation and source code generation. We extend this vision with the synchronization feature to assist the iterative development. Furthermore, a case study is also provided
Analysis of the Relationships among Longest Common Subsequences, Shortest Common Supersequences and Patterns and its application on Pattern Discovery in Biological Sequences
For a set of mulitple sequences, their patterns,Longest Common Subsequences
(LCS) and Shortest Common Supersequences (SCS) represent different aspects of
these sequences profile, and they can all be used for biological sequence
comparisons and analysis. Revealing the relationship between the patterns and
LCS,SCS might provide us with a deeper view of the patterns of biological
sequences, in turn leading to better understanding of them. However, There is
no careful examinaton about the relationship between patterns, LCS and SCS. In
this paper, we have analyzed their relation, and given some lemmas. Based on
their relations, a set of algorithms called the PALS (PAtterns by Lcs and Scs)
algorithms are propsoed to discover patterns in a set of biological sequences.
These algorithms first generate the results for LCS and SCS of sequences by
heuristic, and consequently derive patterns from these results. Experiments
show that the PALS algorithms perform well (both in efficiency and in accuracy)
on a variety of sequences. The PALS approach also provides us with a solution
for transforming between the heuristic results of SCS and LCS.Comment: Extended version of paper presented in IEEE BIBE 2006 submitted to
journal for revie
A genetic algorithm with expansion and exploration operators for the maximum satisfiability problem
There are many problems that standard genetic algorithms fail to solve. Refinements of standard genetic algorithms that can be used to solve hard problems has caused considerable interest. In this paper, we consider genetic algorithms withexpansion and exploration operators for the maximum satisfiability problem
A hybrid algorithm for the longest common transposition-invariant subsequence problem
The longest common transposition-invariant subsequence (LCTS) problem is a music information retrieval oriented variation of the classic LCS problem. There are basically only two known efficient approaches to calculate the length of the LCTS, one based on sparse dynamic programming and the other on bit-parallelism. In this work, we propose a hybrid algorithm picking the better of the two algorithms for individual subproblems. Experiments on music (MIDI), with 32-bit and 64-bit implementations, show that the proposed algorithm outperforms the faster of the two component algorithms by a factor of 1.4–2.0, depending on sequence lengths. Similar, if not better, improvements can be observed for random data with Gaussian distribution. Also for uniformly random data, the hybrid algorithm is the winner if the alphabet is neither too small (at least 32 symbols) nor too large (up to 128 symbols). Part of the success of our scheme is attributed to a quite robust component selection heuristic
Recommended from our members
Minimally supervised induction of morphology through bitexts
textA knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have been consequently many attempts to reduce this cost in the development of morphological systems through the development of unsupervised or minimally supervised algorithms and learning methods for acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language–the source language–to another language–the target. This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typlogical properties of German. The two main tasks, that of clustering and segmentation, are approached as sequential tasks with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, it attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.Linguistic
Beyond Word N-Grams
We describe, analyze, and evaluate experimentally a new probabilistic model
for word-sequence prediction in natural language based on prediction suffix
trees (PSTs). By using efficient data structures, we extend the notion of PST
to unbounded vocabularies. We also show how to use a Bayesian approach based on
recursive priors over all possible PSTs to efficiently maintain tree mixtures.
These mixtures have provably and practically better performance than almost any
single model. We evaluate the model on several corpora. The low perplexity
achieved by relatively small PST mixture models suggests that they may be an
advantageous alternative, both theoretically and practically, to the widely
used n-gram models.Comment: 15 pages, one PostScript figure, uses psfig.sty and fullname.sty.
Revised version of a paper in the Proceedings of the Third Workshop on Very
Large Corpora, MIT, 199
On the least exponential growth admitting uncountably many closed permutation classes
We show that the least exponential growth of counting functions which admits
uncountably many closed permutation classes lies between 2^n and
(2.33529...)^n.Comment: 13 page
- …