17 research outputs found
Nullomers and High Order Nullomers in Genomic Sequences
<div><p>A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is currently debated in the literature. Here, we investigated the nature of nullomers: whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied several aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, opposite results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built for eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. 
The obtained results show that nullomers are a consequence of the peculiar structure of DNA (including biased CpG frequency and CpG islands), so that the hypermutability model, even when taking CpG islands into account, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications.</p></div>
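<p>The definitions in the abstract can be illustrated with a minimal sketch. It assumes that "mutated" means a single-base substitution and that a first order nullomer is a nullomer all of whose single-substitution variants are also absent; the abstract does not spell this out, so the function names and this reading are assumptions rather than the authors' implementation. Only one strand is scanned, and the brute-force enumeration is practical only for small k.</p>

```python
from itertools import product

ALPHABET = "ACGT"


def kmers_present(seq, k):
    """Return the set of all length-k substrings of seq (one strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(seq, k):
    """Return every length-k word over ACGT that does not occur in seq."""
    all_words = {"".join(p) for p in product(ALPHABET, repeat=k)}
    return all_words - kmers_present(seq, k)


def single_mutants(word):
    """Yield every word that differs from `word` by exactly one substitution."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def first_order_nullomers(seq, k):
    """Nullomers whose single-substitution variants are all nullomers too.

    Assumption: this is one plausible reading of "high order nullomers";
    higher orders would allow more than one substitution.
    """
    absent = nullomers(seq, k)
    return {w for w in absent if all(m in absent for m in single_mutants(w))}
```

<p>For example, in the toy sequence <code>"AAAA"</code> with k = 2 only <code>AA</code> is present, so the first order nullomers under this reading are exactly the words that share no position with <code>AA</code>.</p>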
CpG frequencies for each dinucleotide position for first order nullomers () in eleven different species: panel <i>a</i>) Human, Chimpanzee and Gorilla (yellow, dark yellow and light orange, respectively), panel <i>b</i>) Rat and Mouse (orange and light red, respectively), panel <i>c</i>) Opossum, Bovine, Goat and Lemur (red, dark red, light brown and brown, respectively), panel <i>d</i>) Chicken and Rabbit (very dark brown and black, respectively).
<p>In panel <i>e</i> all the species are reported together.</p>
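<p>A position-wise CpG frequency of the kind plotted in this figure could be computed roughly as follows. This is a sketch that assumes "dinucleotide position" means the index of each overlapping dinucleotide within a word and that all words have the same length; the function name is hypothetical.</p>

```python
def cpg_position_frequencies(words):
    """Fraction of words that carry the dinucleotide CG at each position.

    Position i refers to the overlapping dinucleotide word[i:i+2], so a
    word of length k has k - 1 dinucleotide positions.
    """
    words = list(words)
    if not words:
        return []
    k = len(words[0])
    counts = [0] * (k - 1)
    for word in words:
        for i in range(k - 1):
            if word[i:i + 2] == "CG":
                counts[i] += 1
    return [count / len(words) for count in counts]
```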
Number of first order nullomers (black filled circles, ⚫) compared with the expected number of first order nullomers (red empty circles, ⚪) of size 14, as a function of the number of CpGs occurring in the sequences.
<p>The expected number of nullomers is computed from random sequences of the same length as the human genome that preserve dinucleotide frequencies.</p>
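<p>The caption does not say how the random sequences were generated. One common way to approximate a dinucleotide-preserving null model is to sample from a first-order Markov chain whose transition counts are estimated from the reference sequence; the sketch below takes that approach as an assumption, and it preserves dinucleotide frequencies only in expectation, not exactly. The function name and the <code>seed</code> parameter are illustrative.</p>

```python
import random
from collections import Counter, defaultdict


def dinucleotide_markov_sequence(seq, length, seed=0):
    """Sample a sequence of `length` >= 1 bases from a first-order Markov
    chain fitted to the adjacent-base counts of `seq`.

    Assumption: this is a stand-in for the paper's null model, which the
    caption does not describe.
    """
    rng = random.Random(seed)
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1

    current = seq[0]
    out = [current]
    while len(out) < length:
        following = transitions[current]
        if not following:
            # Dead end: this base was only seen as the last base of seq,
            # so restart from a base drawn by its overall frequency.
            current = rng.choice(seq)
        else:
            bases, weights = zip(*following.items())
            current = rng.choices(bases, weights=weights)[0]
        out.append(current)
    return "".join(out)
```

<p>Counting the nullomers of many such samples would give an empirical estimate of the expected number to compare against the observed one.</p>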
Phylogenetic trees of 11 species obtained by (first row) DC distance for nullomers (T1—on the left) and first order nullomers (T2—on the right); (second row) DJ distance for nullomers (T3—on the left) and first order nullomers (T4—on the right).
Distribution of average rise values (black line) for (panel a), (panel b) and (panel c). Average rise values for present sequences (green plot) are also reported in the three panels.
Z-score values for each chromosome of the MINT network, computed by equation 5, reported as a function of the threshold
<p>.</p>
Size and mean gene distances on chromosomes.
<p>BP size = size in base pairs = number of nucleotides.</p>
The bp-distance between genes and is denoted as .
<p>The ppi-distance is the length of the shortest path between nodes of the corresponding PPI network indicated by . In this example the internode distance is equal to one so that is greater than .</p>
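<p>The ppi-distance described here can be computed with a breadth-first search. The sketch below assumes an undirected, unweighted PPI network given as a list of interacting gene pairs; the function name and input format are illustrative and not taken from the source.</p>

```python
from collections import deque


def ppi_distance(edges, source, target):
    """Number of edges on a shortest path between two nodes, or None if
    either node is missing from the network or no path connects them."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    if source not in graph or target not in graph:
        return None

    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None
```

<p>Two genes that interact directly have a ppi-distance of one, as in the example of the caption.</p>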
Percentage of gene-pair distances for the whole network (red plot) and for the same chromosome (green plot) for both networks considered: BIOGRID (lower panel) and MINT (upper panel).
Distributions of shortest path distances for chromosomes 1–16 (from top left to bottom right) for both networks considered: BIOGRID (blue lines) and MINT (red lines).