
26 research outputs found

Illustration of the proof of Theorem 3.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

Shown are two aligned copies of the section of the reference sequence, where black represents the sequence between two equivalent deletions (green and magenta). The filled polygon illustrates the sequence identity that holds if and only if the two deletions are equivalent. Following the grey arrows up and down and from left to right, it can be seen that the sequence section consists of repeats of the deletion sequence until the last copy overlaps with the deletion sequence.

FigShare

Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)
Publication date: 01/01/2013

There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel based on sequence alignment alone. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different database entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions, such as exons, introns, or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications, of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels, which is uniquely characterized by the indel itself and a given interval of the reference sequence.
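The notion of equivalence used in the abstract (two deletions are equivalent when they produce the same alternative sequence) can be illustrated with a minimal Python sketch. The function names, the 0-based coordinates, and the toy reference string are assumptions made for illustration; they are not taken from the paper or from Ensembl.

```python
def apply_deletion(reference: str, start: int, length: int) -> str:
    """Return the alternative sequence after deleting `length` bases
    beginning at 0-based position `start` (hypothetical helper)."""
    if start < 0 or length <= 0 or start + length > len(reference):
        raise ValueError("deletion lies outside the reference")
    return reference[:start] + reference[start + length:]


def deletions_equivalent(reference: str, first: tuple, second: tuple) -> bool:
    """Two deletions, each given as (start, length), are treated as
    equivalent if they yield the same alternative sequence."""
    return apply_deletion(reference, *first) == apply_deletion(reference, *second)


# Toy reference with a short CA repeat (illustrative, not real genomic data).
reference = "GGCACACATT"

# Deleting "CA" at position 2 and "AC" at position 3 differ in allele
# and position but leave the same alternative sequence.
same = deletions_equivalent(reference, (2, 2), (3, 2))

# Deleting "AT" at position 7 produces a different alternative sequence.
different = deletions_equivalent(reference, (2, 2), (7, 2))
```

Comparing whole alternative sequences is the most direct reading of the definition, but it would be far too slow for chromosome-scale references; it is shown here only to make the definition concrete.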

Directory of Open Access Journals

PubMed Central

FigShare

Illustration of Theorem 1.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

The first line is the reference sequence. The second and third lines contain two deletions. are the nucleotides. If , then the two deletions are equivalent.

FigShare

Code 2: Pseudocode for the identification of the region of ambiguity for all indels.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

The variable is the start position of the deletion, is the most downstream start position of equivalent indels and is the most upstream start position. Compared to the code of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0062803#pone-0062803-t002" target="_blank">Table 2</a>, the index now cyclically provides the nucleotide indel[] directly from the indel allele.
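The following Python sketch shows one way a cyclic index into the indel allele could be used to find the most upstream and most downstream equivalent start positions. It is not the authors' pseudocode: the function name `ambiguity_region`, the 0-based coordinates, and the convention that an insertion is placed immediately before `start` are all assumptions.

```python
def ambiguity_region(reference: str, start: int, allele: str, is_deletion: bool):
    """Return (most_upstream_start, most_downstream_start) for an indel.

    `start` is the 0-based start of a deletion, or the position an
    insertion is placed in front of (assumed conventions).
    """
    n = len(allele)
    if n == 0:
        raise ValueError("allele must not be empty")
    # First reference base downstream of the indel.
    right = start + n if is_deletion else start

    # Shift downstream while the next reference base matches the allele,
    # read cyclically from its first nucleotide.
    down = 0
    while right + down < len(reference) and reference[right + down] == allele[down % n]:
        down += 1

    # Shift upstream while the preceding reference base matches the allele,
    # read cyclically from its last nucleotide.
    up = 0
    while start - 1 - up >= 0 and reference[start - 1 - up] == allele[(n - 1 - up) % n]:
        up += 1

    return start - up, start + down


reference = "GGCACACATT"  # toy sequence, not real genomic data
deletion_region = ambiguity_region(reference, 4, "CA", is_deletion=True)
insertion_region = ambiguity_region(reference, 4, "CA", is_deletion=False)
```

Because the comparison always reads nucleotides from the allele rather than from the reference, the same loop serves deletions and insertions; for an insertion the allele is not present in the reference at all, so it cannot be read from there.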

FigShare

Different variations lead to the same alternative sequence and, therefore, the variations are equivalent.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

The first variation is a deletion at position 15, the second variation is a deletion at position 14, and the third variation is a deletion at position 5. This deletion is annotated in Ensembl as lying in the start codon of transcript HRNR-001 and therefore leads to the loss of the start codon. The equivalent indel has no effect on the protein. (Sequences are all shown as reverse complementary, because the transcript is located on the reverse strand.) Regular characters denote the upstream region and bold, italic characters the coding sequence.

FigShare

Number of observed insertions (a) and deletions (b) versus length.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

There is a strong correlation between length (x-axis) and frequency (y-axis) of the indels.

FigShare

Example of the prefix in a sequence.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

In this example, the missing sequence is = CATT which is repeated times and the prefix of the repeated sequence is CA, which is shown in bold italic.
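The structure described in this caption, whole copies of the missing sequence followed by a prefix of it, can be checked with a short Python sketch. The function name `repeats_and_prefix` is hypothetical, and the repeat count of three in the example string is an assumption chosen for illustration, since the count is not legible in the caption.

```python
def repeats_and_prefix(region: str, unit: str):
    """Split `region` into whole copies of `unit` followed by a proper
    prefix of `unit`; return (copy_count, prefix)."""
    if not unit:
        raise ValueError("unit must not be empty")
    count = 0
    # Count consecutive whole copies of the unit from the left.
    while region.startswith(unit, count * len(unit)):
        count += 1
    remainder = region[count * len(unit):]
    # Whatever is left must be a prefix of the unit.
    if not unit.startswith(remainder):
        raise ValueError("region is not repeats of the unit plus a prefix")
    return count, remainder


# Illustrative region: CATT three times (assumed count) followed by CA.
copies, prefix = repeats_and_prefix("CATTCATTCATTCA", "CATT")
```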

FigShare

An example of equivalent deletions in a non-repetitive sequence on human chromosome .

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

The deletions are bold and italic. All four variations are equivalent although they are located in a non-repetitive sequence. All four variations are annotated in dbSNP. The alternative sequence is in each case ACGTGGG.

FigShare

An example of the algorithm code 1: A deletion sequence is shifted downstream to detect equivalent deletions.

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

The deletion sequence is printed in bold italic. The nucleotide following the deletion sequence is compared with the first nucleotide of the deletion. If both are equal, the variation is shifted 1 bp downstream, otherwise the algorithm terminates. In our example the algorithm terminates after the 6th alignment.
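The shifting step described in this caption can be sketched in Python as follows. This is a minimal illustration under assumed conventions (0-based coordinates, a hypothetical function name, and a toy reference string), not a reproduction of the paper's code 1.

```python
def most_downstream_start(reference: str, start: int, length: int) -> int:
    """Shift a deletion of `length` bases downstream one base at a time
    and return the last start position that is still equivalent."""
    if start < 0 or length <= 0 or start + length > len(reference):
        raise ValueError("deletion lies outside the reference")
    pos = start
    # Compare the nucleotide following the deletion with the first
    # nucleotide of the (currently shifted) deletion.
    while pos + length < len(reference) and reference[pos + length] == reference[pos]:
        pos += 1  # equal: shift the deletion 1 bp downstream
    return pos    # unequal or end of sequence: terminate


reference = "GGCACACATT"  # toy sequence, not the sequence from the figure
last_start = most_downstream_start(reference, 2, 2)
```

Each successful comparison rotates the deleted sequence by one nucleotide, so the alternative sequence stays the same while the annotated position and allele change.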

FigShare

Different functional classes of insertion rs55710688 on human chromosome at position .

Author: Armin O. Schmitt (409461)
Gudrun A. Brockmann (190888)
Jens Assmus (409460)
Jürgen Kleffe (61042)

The coding region is shown in lower case. The insertion rs55710688 is bold italic. Lines 1 and 2 represent the alignment as annotated in dbSNP: the insertion lies inside the coding region and causes the loss of the start codon (start lost). Lines 3 and 4 represent an alternative alignment: the insertion lies outside the coding region and does not affect the start codon ATG.

FigShare