
    IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences

    Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes, and they have been linked with numerous possible functions. Many international consortia provide a comprehensive description of common genetic variation, making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for the efficient identification of inverted repeats in IUPAC-encoded DNA sequences that also allows for mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that it does so orders of magnitude faster.
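    The pairing rule behind an IUPAC-aware inverted-repeat search can be illustrated with a brute-force sketch. It assumes (our assumption, not necessarily IUPACpal's exact semantics) that two IUPAC symbols can pair when some base represented by one is the complement of some base represented by the other. The names `can_pair` and `inverted_repeats` and their parameters are hypothetical. This is not IUPACpal's algorithm and makes no attempt at its speed: it simply extends both arms outwards from every possible gap, tolerating up to `max_mismatches` non-pairing positions.

```python
from typing import List, Set, Tuple

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x: str, y: str) -> bool:
    """Assumed rule: x and y pair if some base of x complements some base of y."""
    complements_of_y: Set[str] = {BASE_COMPLEMENT[b] for b in IUPAC[y]}
    return bool(set(IUPAC[x]) & complements_of_y)


def inverted_repeats(seq: str, min_len: int, max_gap: int,
                     max_mismatches: int) -> List[Tuple[int, int, int, int]]:
    """Hypothetical brute-force search.

    Returns (left_arm_start, arm_length, gap, mismatches) tuples.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for gap in range(max_gap + 1):
        for left_end in range(n):
            right_start = left_end + gap + 1  # first symbol of the right arm
            if right_start >= n:
                break
            length = mismatches = 0
            best_len = best_mm = 0
            # Extend both arms outwards from the gap, one symbol pair at a time.
            while left_end - length >= 0 and right_start + length < n:
                if can_pair(seq[left_end - length], seq[right_start + length]):
                    length += 1
                    # Only record arms that end on a pairing position.
                    best_len, best_mm = length, mismatches
                elif mismatches < max_mismatches:
                    mismatches += 1
                    length += 1
                else:
                    break
            if best_len >= min_len:
                hits.append((left_end - best_len + 1, best_len, gap, best_mm))
    return hits
```

    For example, `inverted_repeats("RAATTY", 3, 0, 0)` reports a single three-symbol inverted repeat starting at position 0, because R (A/G) can pair with Y (C/T) under the assumed rule. The sketch may report overlapping hits for different gap sizes; a real tool would also filter for maximality.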

    Comparing Degenerate Strings

    Uncertain sequences are compact representations of sets of similar strings. They highlight common segments by collapsing them, and explicitly represent varying segments by listing all possible options. A generalized degenerate string (GD string) is a type of uncertain sequence. Formally, a GD string $S$ is a sequence of $n$ sets of strings of total size $N$, where the $i$th set contains strings of the same length $k_i$, but this length can vary between different sets. We denote by $W$ the sum of these lengths $k_0, k_1, \ldots, k_{n-1}$. Our main result is an $\mathcal{O}(N + M)$-time algorithm for deciding whether two GD strings of total sizes $N$ and $M$, respectively, over an integer alphabet, have a non-empty intersection. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in linear space. We then apply our string comparison tool to devise a simple algorithm for computing all palindromes in $S$ in $\mathcal{O}(\min\{W, n^2\}N)$ time. We complement this upper bound by showing a similar conditional lower bound for computing maximal palindromes in $S$. We also show that a result essentially the same as our linear-time string comparison algorithm can be obtained by employing an automata-based approach.
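    Deciding whether two GD strings share a string can be illustrated with a simple left-to-right simulation. This is a sketch under stated assumptions, not the authors' $\mathcal{O}(N + M)$ algorithm, and it makes no claim to that bound. A GD string is modelled (hypothetically) as a Python list of non-empty sets, each holding strings of one common length $k_i \ge 1$. The hypothetical `gd_intersect` never lists whole intersection strings: it keeps only the unmatched suffixes of whichever GD string currently extends further, which is enough because the future of the comparison depends only on those suffixes.

```python
from typing import List, Set

# Hypothetical representation: one set of equal-length strings per position.
GDString = List[Set[str]]


def set_length(block: Set[str]) -> int:
    """Common length k_i of the strings in one (non-empty) GD-string set."""
    return len(next(iter(block)))


def gd_intersect(s: GDString, t: GDString) -> bool:
    """Return True if some string is spelled by both GD strings s and t.

    Assumes every set is non-empty and every k_i >= 1.
    """
    if any(not block for block in s) or any(not block for block in t):
        return False  # an empty set spells no strings at all
    seqs = (s, t)
    idx = [0, 0]            # next unread set in s and in t
    ahead: Set[str] = {""}  # unmatched suffixes of the side that is ahead
    side = 0                # which GD string owns `ahead`
    while True:
        if len(next(iter(ahead))) == 0:
            # Both sides end at the same position: start a fresh set from s.
            done_s, done_t = idx[0] == len(s), idx[1] == len(t)
            if done_s and done_t:
                return True
            if done_s or done_t:
                return False  # total lengths W differ
            ahead, side = set(s[idx[0]]), 0
            idx[0] += 1
            continue
        other = 1 - side
        if idx[other] == len(seqs[other]):
            return False  # the other GD string ran out while this one is ahead
        block = seqs[other][idx[other]]
        idx[other] += 1
        ahead_len, m = len(next(iter(ahead))), set_length(block)
        if m <= ahead_len:
            # The new set ends first: keep the leftover tails of `ahead`.
            ahead = {a[m:] for a in ahead if a[:m] in block}
        else:
            # `ahead` ends first: tails of the new set become the pending suffixes.
            ahead = {b[ahead_len:] for b in block if b[:ahead_len] in ahead}
            side = other
        if not ahead:
            return False
```

    For instance, `[{"AB", "CD"}, {"E"}]` spells ABE and CDE, while `[{"A"}, {"BE", "DE"}]` spells ABE and ADE, so `gd_intersect` returns `True` for this pair.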

    New genes for resistance to downy mildew (Peronospora parasitica) in oilseed rape (Brassica napus ssp. oleifera)

    The response of cotyledons of oilseed rape (Brassica napus ssp. oleifera) accessions to infection by isolates of Peronospora parasitica under controlled conditions was assessed on a 0-7 scale (disease reaction). In interactions scored 0-3, 3-5 and 6-7, the host was considered resistant, partially resistant and susceptible, respectively. Accession RES-26, selected from the spring oilseed rape cultivar Janetzkis, was partially resistant to isolate R1 and resistant to isolate P003 of P. parasitica, which distinguishes it from three previously described differential response groups ('A', 'B' and 'C') of accessions in B. napus. The resistance of RES-26 to isolate P003 seemed to be conditioned by a single, partially dominant gene, and the resistance of RES-02, which belongs to group 'A' (resistant to R1 and P003), by two independent partially dominant genes. The gene for resistance to P003 in RES-26 is either closely linked, allelic or identical to one of the two genes for resistance in RES-02. Resistance of RES-02 to R1 is conditioned by a single, incompletely dominant gene. The genes for resistance to isolates R1 and P003 in RES-02 are either closely linked, allelic or identical. The cotyledonary leaves of each seedling responded independently when inoculated simultaneously, each with a different isolate of the pathogen.

    Efficient Identification of k-Closed Strings

    A closed string contains a proper factor occurring as both a prefix and a suffix but not elsewhere in the string. Closed strings were introduced by Fici (WORDS 2011) as objects of combinatorial interest. This paper addresses a new problem by extending the closed string problem to the $k$-closed string problem, in which approximation is permitted up to $k$ Hamming-distance errors (mismatches). We address the problem of deciding whether or not a given string of length $n$ over an integer alphabet is $k$-closed and, additionally, of specifying the border that makes the string $k$-closed. Specifically, we present an $\mathcal{O}(kn)$-time and $\mathcal{O}(n)$-space algorithm to achieve this, along with the pseudocode of an implementation and proof-of-concept experimental results.
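    The closed-string definition above can be checked directly by brute force. This sketch covers only the exact case ($k = 0$); it does not reproduce the paper's $\mathcal{O}(kn)$ algorithm or its handling of Hamming-distance errors, and the helper names `count_occurrences` and `is_closed` are hypothetical. A proper factor that is both a prefix and a suffix is a border, so it suffices to test every non-empty proper border for exactly two occurrences (the prefix and the suffix).

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


def is_closed(w: str) -> bool:
    """Brute-force test of the exact (k = 0) closed-string definition.

    Literal reading of the definition: strings shorter than 2 have no
    non-empty proper border and therefore return False here.
    """
    n = len(w)
    for b in range(1, n):
        border = w[:b]
        # The prefix and suffix occurrences are distinct because b < n, so
        # exactly two occurrences means the border appears nowhere else.
        if w.endswith(border) and count_occurrences(w, border) == 2:
            return True
    return False
```

    For example, `is_closed("abaab")` holds because the border `ab` occurs only as the prefix and the suffix, whereas in `aaba` the only border, `a`, also occurs in the middle, so `is_closed("aaba")` is false. The cubic worst case is acceptable for a sketch but far from the linear-time bounds discussed above.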