17 research outputs found

    Nullomers and High Order Nullomers in Genomic Sequences

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers: whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied several aspects of them: comparison with nullomers of random sequences, CpG distribution, and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, the opposite result was found when considering a random DNA sequence that preserves dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built for eleven species based on both the similarities between dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features.
    The obtained results show that nullomers are a consequence of the peculiar structure of DNA (including biased CpG frequency and CpG islands), so that the hypermutability model, even when it takes CpG islands into account, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications.
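
    The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the function names are hypothetical, the input is assumed to be an uppercase string over A, C, G and T, only the given strand is scanned, and "first order" is assumed here to mean that every single-base substitution of a nullomer is also a nullomer. Enumerating all 4^k words is only practical for small k.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: uppercase DNA, no ambiguity codes such as N


def present_kmers(sequence, k):
    """Return the set of all length-k words that occur in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def nullomers(sequence, k):
    """Return all length-k words over ALPHABET that are absent from `sequence`."""
    every_word = {"".join(w) for w in product(ALPHABET, repeat=k)}
    return every_word - present_kmers(sequence, k)


def single_mutants(word):
    """Yield every word obtained from `word` by one base substitution."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def first_order_nullomers(sequence, k):
    """Return nullomers whose single-substitution mutants are all nullomers too.

    This reading of "first order" is an assumption made for illustration.
    """
    absent = nullomers(sequence, k)
    return {w for w in absent if all(m in absent for m in single_mutants(w))}
```

    For example, `nullomers("ACGT", 2)` contains every dinucleotide except `AC`, `CG` and `GT`; `AA` is a nullomer there but not a first order one under this assumption, because its mutant `AC` is present.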

    Number of first order nullomers (black filled circles, ⚫) compared with the expected number of first order nullomers (red empty circles, ⚪) of size 14, as a function of the number of CpGs occurring in the sequences.

    The expected number of nullomers is computed from random sequences that have the same length as the human genome and preserve its dinucleotide frequencies.

    Phylogenetic trees of 11 species obtained by (first row) DC distance for nullomers (T1—on the left) and first order nullomers (T2—on the right); (second row) DJ distance for nullomers (T3—on the left) and first order nullomers (T4—on the right).


    Distribution of average rise values (black line) for (panel a), (panel b) and (panel c). Average rise values for present sequences (green plot) are also reported in the three panels.


    Size and mean gene distances on chromosomes.

    BP size = size in base pairs = number of nucleotides.

    The bp-distance between genes and is denoted as .

    The ppi-distance is the shortest path between nodes of the corresponding PPI network indicated by . In this example the internode distance is equal to one so that is greater than .
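
    As a rough illustration of a shortest-path distance on an interaction network, the sketch below runs a breadth-first search over an undirected, unweighted graph. The function name `ppi_distance`, the edge-list input format and the protein labels are assumptions made for this example; the source does not specify how the distance was computed.

```python
from collections import deque


def ppi_distance(edges, source, target):
    """Return the number of edges on a shortest path between two proteins.

    `edges` is an iterable of (protein_a, protein_b) pairs, treated as
    undirected and unweighted (an assumption). Returns None if no path exists.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    if source == target:
        return 0

    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

    With the hypothetical edge list `[("A", "B"), ("B", "C")]`, proteins `A` and `B` interact directly and are at distance one, while `A` and `C` are at distance two.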

    Percentage of gene couples distances for the whole network (red plot) and for the same chromosome (green plot) for both networks considered: BIOGRID (lower panel) and MINT (upper panel).


    Distributions of shortest path distances for chromosomes 1–16 (from top left to bottom right) for both networks considered: BIOGRID (blue lines) and MINT (red lines).
