    Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version

    Although computational sequence alignment is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To identify regions that are truly structurally equivalent or homologous, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, particularly around gaps. Here we focus on the re-evaluation of score-based alignments with affine gap penalty costs. We exploit their relationship with pair hidden Markov models and develop efficient algorithms that identify gaps which are significant in terms of length and multiplicity. We evaluate our statistics against the well-established structural alignments from SABmark and find that indel reliability increases substantially with significance, particularly in worst-case twilight-zone alignments. This suggests that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions. Comment: 17 pages, 7 figures
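To illustrate why affine gap penalties and pair HMMs are linked: in a simplified pair HMM where a gap opens with probability delta, extends with probability epsilon and then returns to the match state, the negative log-probability of a gap of length L is affine in L, and gap lengths are geometric, so P(L >= l) = epsilon^(l-1). The sketch below assumes that simplified model (no end states, hypothetical parameter names `delta` and `epsilon`) and only shows this correspondence; it is not the significance statistic developed in the paper.

```python
import math

def gap_log_prob(length: int, delta: float, epsilon: float) -> float:
    """Log-probability of one gap of the given length in a simplified pair HMM:
    open with prob. delta, extend (length - 1) times with prob. epsilon,
    then return to the match state with prob. (1 - epsilon)."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return math.log(delta) + (length - 1) * math.log(epsilon) + math.log(1 - epsilon)

def affine_costs(delta: float, epsilon: float) -> tuple[float, float]:
    """Read off affine costs so that -gap_log_prob(L) == open_cost + (L - 1) * extend_cost."""
    open_cost = -math.log(delta * (1 - epsilon))
    extend_cost = -math.log(epsilon)
    return open_cost, extend_cost

def gap_length_tail(length: int, epsilon: float) -> float:
    """P(L >= length) for a geometric gap length with extension probability epsilon."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return epsilon ** (length - 1)

if __name__ == "__main__":
    delta, epsilon = 0.05, 0.6  # arbitrary illustrative values
    d, e = affine_costs(delta, epsilon)
    for L in (1, 5, 10):
        # negative log-probability matches the affine cost d + (L - 1) * e
        print(L, -gap_log_prob(L, delta, epsilon), d + (L - 1) * e, gap_length_tail(L, epsilon))
```

Under this model a long gap is "surprising" when its tail probability is small, which is the intuition behind length-based gap significance.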

    Note on Ward-Horadam H(x)-binomials' recurrences and related interpretations, II

    We deliver here a second new recurrence formula for H(x)-binomials, where the H(x)-binomials' array is determined by a Ward-Horadam sequence of functions which, in the predominantly considered cases, were chosen to be polynomials. Secondly, we supply a review of selected related combinatorial interpretations of generalized binomial coefficients. We then also propose a kind of transfer of the interpretation of p,q-binomial coefficients onto interpretations of q-binomial coefficients, thus bringing us back to the relevant investigations of György Pólya and Donald Ervin Knuth decades ago. Comment: 57 pages, 8 figures
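For orientation, the standard way to build generalized binomial coefficients from a sequence F_1, F_2, ... of nonzero numbers is n_F! = F_1 F_2 ... F_n and (n choose k)_F = n_F! / (k_F! (n-k)_F!); F_n = n gives ordinary binomials and F_n = 1 + q + ... + q^(n-1) gives Gaussian q-binomials. The sketch below (hypothetical function names) computes q-binomials as polynomials via the q-Pascal rule [n, k] = [n-1, k-1] + q^k [n-1, k], and evaluates F-type coefficients for a given integer sequence such as the Fibonacci numbers, a special case of the Horadam recurrence. It illustrates only this standard construction, not the new H(x)-binomial recurrence of the note.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def q_binomial(n: int, k: int) -> tuple[int, ...]:
    """Coefficients (constant term first) of the Gaussian q-binomial [n choose k]_q,
    built from the q-Pascal rule [n, k] = [n-1, k-1] + q^k [n-1, k]."""
    if k < 0 or k > n:
        return (0,)
    if k == 0 or k == n:
        return (1,)
    left = q_binomial(n - 1, k - 1)
    right = (0,) * k + q_binomial(n - 1, k)  # multiplying by q^k shifts coefficients
    size = max(len(left), len(right))
    left = left + (0,) * (size - len(left))
    right = right + (0,) * (size - len(right))
    return tuple(a + b for a, b in zip(left, right))

def f_nomial(n: int, k: int, seq: list[int]) -> Fraction:
    """Generalized coefficient n_F! / (k_F! (n-k)_F!), where seq[i - 1] = F_i."""
    def f_fact(m: int) -> Fraction:
        result = Fraction(1)
        for i in range(1, m + 1):
            result *= seq[i - 1]
        return result
    return f_fact(n) / (f_fact(k) * f_fact(n - k))

if __name__ == "__main__":
    print(q_binomial(4, 2))  # (1, 1, 2, 1, 1), i.e. 1 + q + 2q^2 + q^3 + q^4
    fib = [1, 1, 2, 3, 5, 8, 13, 21]  # F_1 .. F_8
    print(f_nomial(5, 2, fib))  # 15
```

Setting q = 1 (summing the coefficients) recovers the ordinary binomial coefficient, which is a quick sanity check on the recurrence.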

    Wilson loops in supersymmetric Chern-Simons-matter theories and duality

    We study the algebra of BPS Wilson loops in 3d gauge theories with N=2 supersymmetry and Chern-Simons terms. We argue that new relations appear at the quantum level, and that in many cases this makes the algebra finite-dimensional. We use our results to propose the mapping of Wilson loops under Seiberg-like dualities and verify that the proposed map agrees with the exact results for expectation values of circular Wilson loops. In some cases we also relate the algebra of Wilson loops to the equivariant quantum K-ring of certain quasi-projective varieties. This generalizes the connection between the Verlinde algebra and the quantum cohomology of the Grassmannian found by Witten.
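As a much simpler point of comparison for the finite-dimensionality statement: in pure SU(2) Chern-Simons theory at level k (no matter fields, no supersymmetry), Wilson loops are labelled by the k + 1 integrable representations with Dynkin label a = 0, ..., k, and their algebra is the Verlinde fusion ring, V_a x V_b = sum of V_c with c running from |a - b| to min(a + b, 2k - a - b) in steps of 2. The sketch below implements these standard SU(2)_k fusion rules (function names are hypothetical) and checks that products close on the finite label set; it does not model the N=2 theories or dualities studied in the paper.

```python
from collections import Counter
from itertools import product

def fuse(a: int, b: int, k: int) -> Counter:
    """SU(2)_k fusion V_a x V_b for Dynkin labels 0 <= a, b <= k (a = 2 * spin)."""
    if not (0 <= a <= k and 0 <= b <= k):
        raise ValueError("labels must lie in 0..k")
    upper = min(a + b, 2 * k - a - b)  # truncation that keeps the ring finite
    return Counter(range(abs(a - b), upper + 1, 2))

def multiply(x: Counter, y: Counter, k: int) -> Counter:
    """Extend fusion bilinearly to formal sums of Wilson loops (label -> multiplicity)."""
    result = Counter()
    for (a, m), (b, n) in product(x.items(), y.items()):
        for c, mult in fuse(a, b, k).items():
            result[c] += m * n * mult
    return result

if __name__ == "__main__":
    k = 2
    print(fuse(1, 1, k))  # V_1 x V_1 = V_0 + V_2
    print(fuse(2, 2, k))  # V_2 x V_2 = V_0
```

The truncation min(a + b, 2k - a - b) is what distinguishes the level-k ring from the infinite-dimensional representation ring of SU(2).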

    On Special k-Spectra, k-Locality, and Collapsing Prefix Normal Words

    The domain of Combinatorics on Words, first introduced by Axel Thue in 1906, now covers many subdomains. In this work we investigate scattered factors as a representation of incomplete information, as well as two measures for words, namely the locality of a word and prefix normality, both of which have applications in pattern matching. In the first part of the thesis we investigate scattered factors: a word u is a scattered factor of w if u can be obtained from w by deleting some of its letters. That is, there exist (potentially empty) words u_1, u_2, …, u_n and v_0, v_1, …, v_n such that u = u_1 u_2 ⋯ u_n and w = v_0 u_1 v_1 u_2 v_2 ⋯ u_n v_n. First, we consider the set of length-k scattered factors of a given word w, called the k-spectrum of w and denoted by ScatFact_k(w). We prove a series of properties of the sets ScatFact_k(w) for binary weakly-0-balanced and, respectively, weakly-c-balanced words w, i.e., words over a two-letter alphabet in which both letters occur equally often or, respectively, one letter has c more occurrences than the other. In particular, we consider which cardinalities n = |ScatFact_k(w)| are obtainable, for a positive integer k, when w is either a weakly-0-balanced binary word of length 2k or a weakly-c-balanced binary word of length 2k − c. Second, we investigate k-spectra that contain all possible words of length k, i.e., k-spectra of so-called k-universal words. We present an algorithm that decides, in optimal time, whether the k-spectra of two words are equal for a given k. Moreover, we present several results regarding k-universal words and extend this notion to circular universality, which helps in determining the universality of repetitions of a given word.
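The k-spectrum and k-universality can be made concrete with a brute-force sketch (hypothetical function names, exponential, for small examples only). It enumerates ScatFact_k(w) directly and compares universality against the arch factorization, i.e. the greedy split of w into consecutive factors that each contain every letter of the alphabet; the fact that w is k-universal exactly when it has at least k arches is a known result from the scattered-factor literature, used here as background. The optimal-time spectrum-equality algorithm of the thesis is not reproduced.

```python
from itertools import combinations, product

def k_spectrum(w: str, k: int) -> set[str]:
    """ScatFact_k(w): all length-k scattered factors (subsequences) of w, by brute force."""
    return {"".join(w[i] for i in idx) for idx in combinations(range(len(w)), k)}

def universality_index(w: str, alphabet: str) -> int:
    """Number of arches: greedily cut w into factors that each contain every letter."""
    arches, seen = 0, set()
    for ch in w:
        seen.add(ch)
        if seen >= set(alphabet):  # current factor now contains the whole alphabet
            arches += 1
            seen = set()
    return arches

def is_k_universal_brute(w: str, k: int, alphabet: str) -> bool:
    """True if every word of length k over the alphabet is a scattered factor of w."""
    all_words = {"".join(t) for t in product(alphabet, repeat=k)}
    return k_spectrum(w, k) == all_words

if __name__ == "__main__":
    print(sorted(k_spectrum("abba", 2)))        # ['aa', 'ab', 'ba', 'bb']
    print(universality_index("abba", "ab"))     # 2 arches: ab | ba
    print(is_k_universal_brute("abba", 3, "ab"))  # False: 'aaa' is missing
```

Comparing the two functions on all short binary words is a cheap way to see the arch characterization at work.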
We conclude the part about scattered factors with results on the reconstruction problem, which asks for the minimal information, such as multisets of scattered factors of a given length or the numbers of occurrences of scattered factors from a given set, necessary to determine a word uniquely. We show that a word w ∈ {a, b}* can be reconstructed from the numbers of occurrences of at most min(|w|_a, |w|_b) + 1 scattered factors of the form a^i b, where |w|_a is the number of occurrences of the letter a in w. Moreover, we generalise the result to alphabets of the form {1, …, q} by showing that at most ∑_{i=1}^{q−1} |w|_i (q − i + 1) scattered factors suffice to reconstruct w. Both results improve on the upper bounds known so far. Time-complexity bounds for reconstruction algorithms are also considered. In the second part we consider patterns, i.e., words consisting not only of letters but also of variables, and in particular their locality. A pattern is called k-local if its variables can be marked one after another, in some order, such that at no point more than k marked blocks occur. We start by proving that determining the minimal k for which a given pattern is k-local is NP-complete. Afterwards we present results on the behaviour of the locality of repetitions and palindromes. We end this part by proving that the matching problem also becomes NP-hard if we consider not a regular pattern, for which the matching problem is efficiently solvable, but repetitions of regular patterns. In the last part we investigate prefix normal words, which are binary words in which each prefix has at least as many 1s as any factor of the same length. Prefix normal words were introduced in 2011 by Fici and Lipták, and the problem of determining the index (the number of equivalence classes for a given word length) of the prefix normal equivalence relation is still open.
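The locality of a small pattern can be computed by trying every marking order, which takes time exponential in the number of variables, in line with the NP-completeness of the general problem. The sketch assumes, for illustration only, that a pattern is a string in which every character is a variable; marking a variable marks all of its occurrences, and the locality is the minimum over all orders of the largest number of maximal marked blocks seen along the way. Function names are hypothetical.

```python
from itertools import permutations

def count_blocks(marked: list[bool]) -> int:
    """Number of maximal runs of marked positions."""
    return sum(1 for i, m in enumerate(marked) if m and (i == 0 or not marked[i - 1]))

def locality(pattern: str) -> int:
    """Smallest k such that the pattern is k-local, by trying every marking order."""
    variables = sorted(set(pattern))
    best = len(pattern)
    for order in permutations(variables):
        marked = [False] * len(pattern)
        worst = 0  # largest number of marked blocks seen for this order
        for var in order:
            for i, ch in enumerate(pattern):
                if ch == var:
                    marked[i] = True
            worst = max(worst, count_blocks(marked))
        best = min(best, worst)
    return best

if __name__ == "__main__":
    for p in ("xx", "xyx", "xyxy"):
        print(p, locality(p))  # xx 1, xyx 1 (mark y first), xyxy 2
```

For example, in xyx marking y first yields a single block, so the pattern is 1-local, whereas in xyxy either first choice produces two separate blocks.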
In this part, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (which extend the notion of critical words). We prove characterizations of both the palindromes and the collapsing words and show how they are connected. Based on this, we show that still-open problems regarding prefix normal words can be split into certain subproblems.
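Prefix normality can be tested directly from its definition: for every length l, the prefix of length l must contain at least as many 1s as any factor of length l. The naive quadratic-time sketch below (hypothetical function names) uses prefix sums for that test and enumerates prefix normal words and prefix normal palindromes of small lengths by brute force, the kind of small-case experiment that can be used to explore these classes; it is not one of the thesis's characterizations.

```python
from itertools import product

def is_prefix_normal(w: str) -> bool:
    """A binary word w is prefix normal if, for every length l, its length-l prefix
    has at least as many 1s as every factor (contiguous substring) of length l."""
    n = len(w)
    ones = [0] * (n + 1)  # ones[i] = number of 1s in w[:i]
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    for length in range(1, n + 1):
        max_factor = max(ones[i + length] - ones[i] for i in range(n - length + 1))
        if ones[length] < max_factor:
            return False
    return True

def prefix_normal_words(n: int) -> list[str]:
    """All prefix normal binary words of length n, by brute force."""
    return [w for w in ("".join(t) for t in product("01", repeat=n)) if is_prefix_normal(w)]

def prefix_normal_palindromes(n: int) -> list[str]:
    """Prefix normal words of length n that are also palindromes."""
    return [w for w in prefix_normal_words(n) if w == w[::-1]]

if __name__ == "__main__":
    print(is_prefix_normal("1101"), is_prefix_normal("1011"))  # True False
    print([len(prefix_normal_words(n)) for n in range(1, 5)])  # [2, 3, 5, 8]
    print(prefix_normal_palindromes(4))  # ['0000', '1001', '1111']
```

The word 1011 fails because its length-2 prefix 10 has fewer 1s than the factor 11.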

    Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Aligning multiple biological sequences, such as protein or DNA/RNA sequences, is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to them. Unfortunately, computing an optimal multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for multiple sequence alignment. Our scoring method encapsulates the stereochemical properties of sequence residues and their substitution probabilities in a tree-structured scoring scheme. This technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm that is used in our three new multiple sequence alignment algorithms. One of these algorithms uses a dynamically weighted guide tree to perform multiple sequence alignment in a progressive fashion; the dynamic weighting allows errors made in early alignment stages to be corrected in subsequent stages. The other two algorithms use sequence knowledge bases and sequence consistency to produce biologically meaningful alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. According to our analysis, this parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
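For context, the sum-of-pairs (SP) score is a widely used MSA objective; it is not the tree-structured scheme proposed in this dissertation. As a minimal sketch, assuming a simple match/mismatch/gap scheme whose values are arbitrary placeholders (not taken from the dissertation), the SP score sums a pairwise score over every pair of rows in every column, with gap-gap pairs scored as 0. Function names are hypothetical.

```python
from itertools import combinations

GAP = "-"

def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one aligned pair of symbols; a gap facing a gap contributes nothing."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch

def sp_score(alignment: list[str], **scores: int) -> int:
    """Sum-of-pairs score: add pair_score over all row pairs in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y, **scores)
    return total

if __name__ == "__main__":
    # columns score 3, -3 and -3 with the default placeholder values
    print(sp_score(["AC-", "ACG", "A-G"]))  # -3
```

Because every column costs O(m^2) pair comparisons for m sequences, SP scoring is simple but grows quadratically with the number of sequences, one reason alternative scoring schemes are explored.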

    LIPIcs, Volume 244, ESA 2022, Complete Volume
