Regular expression constrained sequence alignment revisited
Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. The Regular Expression Constrained Sequence Alignment Problem was introduced with this motivation, together with an O(n^2 t^4) time and O(n^2 t^2) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n^2 t^3) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to O(n^2 t^3 / log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.
Efficient edit distance with duplications and contractions
We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O(|Σ| MP(n)) and O(|Σ| n^2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP(n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n) = O(n^3 log^3 log n / log^2 n) due to an algorithm by Chan). For integer cost functions, the running time is further improved to O(|Σ| n^3 / log^2 n). In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant 1/log_{|D|} n < λ < 1, the algorithm preprocesses an n × n matrix in O(n^(2+λ) / |D|) time and O(n^(2+λ) / (|D| λ^2 log_{|D|}^2 n)) space. Then, it may multiply the matrix with any given n-length vector in O(n^2 / (λ^2 log_{|D|}^2 n)) time.
Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest O(n^3 / log^2 n) time algorithm for single-strand RNA folding with discrete cost functions. Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O(|Σ| n MP(ñ) / ñ) time and O(|Σ| n ñ) space, where ñ is the length of the run-length encoding of the strings.
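For readers unfamiliar with the min-plus product that the abstract above builds on, the following is a minimal sketch of its textbook definition only: entry (i, j) of the product is the minimum over k of A[i][k] + B[k][j]. It is the naive cubic-time version, not the accelerated algorithm of the paper, and the function names `min_plus_matvec` and `min_plus_matmul` are chosen here purely for illustration.

```python
def min_plus_matvec(matrix, vector):
    """Min-plus matrix-vector product: y[i] = min over k of matrix[i][k] + vector[k]."""
    return [min(a + v for a, v in zip(row, vector)) for row in matrix]


def min_plus_matmul(a, b):
    """Naive min-plus matrix product: c[i][j] = min over k of a[i][k] + b[k][j]."""
    inner = len(b)
    cols = len(b[0])
    return [
        [min(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


if __name__ == "__main__":
    a = [[0, 3], [2, 0]]
    b = [[0, 1], [4, 0]]
    print(min_plus_matmul(a, b))
    print(min_plus_matvec(a, [5, 1]))
```

The product replaces the usual (multiply, add) pair with (add, min), which is why it appears naturally in dynamic programs that minimize a sum of costs.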
Algorithms for path-constrained sequence alignment
We define a novel variation on the constrained sequence alignment problem in which the constraint is given in the form of a regular expression. Given two sequences, an alphabet describing pairwise sequence alignment operations, and a regular expression over that alphabet, the problem is to compute the highest-scoring alignment of the given sequences that satisfies the regular expression. Two algorithms are given for solving this problem. The first, basic algorithm is general; its time and space bounds depend on the lengths of the two sequences and on the size of the NFA for the regular expression. The second algorithm is restricted to patterns that do not contain the Kleene-closure star, and exploits this constraint to reduce the NFA size factor in the time complexity to a smaller factor. The pattern is compacted by supporting alignment patterns extended by meta-characters, including general insertion, deletion and match operations, as well as some cases of substitutions. For a given regular expression, the resulting time bound depends on the meta-characters it uses. An additional result obtained along the way is an extension of the algorithm of Fischer and Paterson for String Matching with Wildcards. Our extension allows the input strings to include "negation symbols" (that match all letters but a specific one) while retaining the original time complexity. We implemented both algorithms and applied them to data-mine new miRNA seeding patterns in C. elegans CLIP-seq experimental data.
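The idea of constraining an alignment by a pattern over alignment operations can be illustrated with a small dynamic program over (position in first sequence, position in second sequence, automaton state). The sketch below is a simplified illustration under stated assumptions, not the algorithm of the paper: it uses a deterministic automaton instead of an NFA, a hypothetical three-letter operation alphabet ('M' for match or mismatch, 'D' for deletion, 'I' for insertion), and arbitrary unit scores. The function name and all parameters are invented for this example.

```python
NEG_INF = float("-inf")


def constrained_alignment_score(s, t, transitions, start, accepting,
                                match=1, mismatch=-1, gap=-1):
    """Best score of a global alignment whose operation string the DFA accepts.

    'M' aligns s[i] with t[j], 'D' consumes s[i] only, 'I' consumes t[j] only.
    transitions maps (state, op) -> state; a missing entry forbids that move.
    Returns -inf when no accepted alignment exists.
    """
    n, m = len(s), len(t)
    states = ({start} | set(accepting)
              | {q for q, _ in transitions} | set(transitions.values()))
    # best[i][j][q]: best score aligning s[:i] with t[:j] and ending in state q
    best = [[{q: NEG_INF for q in states} for _ in range(m + 1)]
            for _ in range(n + 1)]
    best[0][0][start] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            for q, score in best[i][j].items():
                if score == NEG_INF:
                    continue
                moves = []
                if i < n and j < m:
                    delta = match if s[i] == t[j] else mismatch
                    moves.append(("M", i + 1, j + 1, delta))
                if i < n:
                    moves.append(("D", i + 1, j, gap))
                if j < m:
                    moves.append(("I", i, j + 1, gap))
                for op, ni, nj, delta in moves:
                    nq = transitions.get((q, op))
                    if nq is not None and score + delta > best[ni][nj][nq]:
                        best[ni][nj][nq] = score + delta
    return max((best[n][m][q] for q in accepting), default=NEG_INF)


if __name__ == "__main__":
    # Hypothetical constraint: only 'M' operations are allowed (no gaps).
    no_gaps = {(0, "M"): 0}
    print(constrained_alignment_score("abc", "abd", no_gaps, 0, {0}))
```

With a deterministic automaton of t states, this sketch performs a constant number of transitions per (i, j, state) triple, so it runs in time proportional to the product of the two sequence lengths and t.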
Good Health Practices and Well-Being among Adolescents with Type-1 Diabetes: A Cross-Sectional Study Examining the Role of Satisfaction and Frustration of Basic Psychological Needs
Type 1 diabetes (T1D) is a chronic disease requiring medical adherence. However, among adolescents, non-adherence rates may reach up to 75%. Satisfaction or frustration with psychological needs is a crucial factor in the motivation and management of health-related behaviors. This study aimed to examine the differences in good health practices and psychological and physical well-being among adolescents with and without T1D and the mediating role of satisfaction and frustration of psychological needs on the association between good health practices and well-being in this population. A total of 94 adolescents (42 with T1D, 52 healthy controls, mean age 14.83 ± 1.82 years) completed questionnaires assessing good health practices, satisfaction or frustration of psychological needs, and well-being. Adolescents with T1D reported lower levels of physical well-being compared to healthy controls. Satisfaction or frustration of psychological needs had an effect on good health practices and psychological and physical well-being among healthy controls. Among adolescents with T1D, satisfaction or frustration of psychological needs was related to psychological well-being and partially related to physical well-being, but not to good health practices. The results demonstrate that the satisfaction or frustration of psychological needs has a unique effect on health behaviors and well-being among adolescents with T1D. This calls for further examination of the underlying mechanisms involved in health-related behaviors and well-being among adolescents with T1D.