5 research outputs found

    Revealing the Symmetry of Conifer Transcriptomes through Triplet Statistics

    A novel, powerful technique is used to study the combinatorial and statistical properties of transcriptome sequences. The approach is based on the distribution of nucleotide triplet frequency dictionaries obtained by converting transcriptome sequences. The distribution is revealed using principal component analysis (PCA) and the elastic map technique. The transcriptomic data of Siberian larch (Larix sibirica Ledeb.) and Siberian pine (Pinus sibirica Du Tour) were studied. The transcriptomes exhibit unusual symmetries: an octahedral structure with rotational symmetry in the transcriptome contig distribution was found for L. sibirica, while mirror symmetry was found for P. sibirica. The octahedral structure seems to be universal for plants.
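A minimal sketch of the first two steps described above, under stated assumptions: each sequence becomes a 64-component triplet frequency dictionary (overlapping triplets, step 1, is an assumption here), and the dictionaries are projected onto their leading principal components via SVD. The elastic map step is not shown, and the toy contig strings and the helper names `triplet_frequencies` and `pca_project` are hypothetical.

```python
from itertools import product

import numpy as np

# All 64 triplets in a fixed lexicographic order: AAA, AAC, ..., TTT
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(seq: str) -> np.ndarray:
    """Frequency dictionary of overlapping triplets (assumed step 1) as a 64-vector."""
    counts = np.zeros(64)
    seq = seq.upper()
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip triplets containing N or other symbols
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca_project(X: np.ndarray, k: int = 3) -> np.ndarray:
    """Project the rows of X onto their first k principal components (SVD of centred data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


# Hypothetical toy "contigs"; real input would be assembled transcriptome contigs
contigs = ["ATGGCTAGCTAGGCTTAA", "GGGCCCGGGATATCGCGA",
           "ATATATATGCGCGCATAT", "TTGACCATGGTACCAGTT"]
X = np.vstack([triplet_frequencies(c) for c in contigs])
Y = pca_project(X, k=3)
print(Y.shape)  # (4, 3)
```

Each row of `Y` is one contig in the space of the first three principal components, which is where a visual structure such as an octahedron would be inspected.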

    Non-Coding Regions of Chloroplast Genomes Exhibit a Structuredness of Five Types

    We studied the statistical properties of the non-coding regions of the chloroplast genomes of 391 plants. To do so, each non-coding region was tiled with a set of overlapping fragments of the same length, and those fragments were transformed into triplet frequency dictionaries. The dictionaries were clustered in 64-dimensional Euclidean space. Five types of distribution were identified: ball, ball with tail, ball with two tails, lens with tail, and lens with two tails. In addition, the multigenome distribution was studied: ten species form an isolated and distant cluster; surprisingly, there is no immediate and simple relation in the taxonomic composition of these clusters.
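The tiling and clustering procedure above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the window length and step are hypothetical, the region is random rather than a real chloroplast sequence, and plain Lloyd's k-means is swapped in as a generic Euclidean clustering method because the abstract does not name the algorithm used.

```python
from itertools import product

import numpy as np

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def tile(seq, window, step):
    """Overlapping fragments of equal length `window`, starting every `step` nucleotides."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]


def triplet_frequencies(fragment):
    """64-component triplet frequency dictionary of one fragment (overlapping triplets)."""
    counts = np.zeros(64)
    for i in range(len(fragment) - 2):
        idx = INDEX.get(fragment[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    return counts / max(counts.sum(), 1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance (a stand-in method)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical random "non-coding region" of 600 nt, tiled with 60-nt windows every 20 nt
rng = np.random.default_rng(1)
region = "".join(rng.choice(list("ACGT"), size=600))
fragments = tile(region, window=60, step=20)
X = np.vstack([triplet_frequencies(f) for f in fragments])
labels, centers = kmeans(X, k=2)
```

The shape categories named in the abstract (ball, lens, tails) describe the geometry of the point cloud `X`; a partitioning method like k-means only assigns labels and would need to be complemented by visual or density-based inspection.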

    Triplet Frequencies Implementation in Total Transcriptome Analysis

    We studied the structuredness of the total transcriptome of Siberian larch. To do so, the contigs of the total transcriptome were labeled with the reads comprising the tissue-specific transcriptomes, and the distribution of the total-transcriptome contigs was constructed with respect to the mutual entropy of the frequencies of occurrence of reads from the tissue-specific transcriptomes. A number of contigs were found to contain comparable amounts of reads from different tissues, which suggests that chimeric transcripts are extremely abundant. Conversely, the transcripts with high tissue specificity do not yield a reliable clustering that reveals tissue specificity. This makes the use of a total transcriptome for differential expression analysis questionable.
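A minimal sketch of a per-contig mutual (relative) entropy score, assuming it takes the form S = Σ p_j ln(p_j / q_j), where p_j is the share of a contig's reads that come from tissue j and q_j is a reference share. Using each tissue library's share of all reads as q_j is an assumption, as are the read counts and library sizes below. Under this reading, S near 0 means the contig draws reads from all tissues in roughly background proportions, and a large S means it is dominated by one tissue.

```python
import numpy as np


def mutual_entropy(counts, reference):
    """Relative entropy sum_j p_j * ln(p_j / q_j) of a contig's tissue read profile.

    counts: reads from each tissue-specific transcriptome mapped to the contig.
    reference: expected tissue proportions (assumed: each library's share of all reads).
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    q = np.asarray(reference, dtype=float)
    q = q / q.sum()
    mask = p > 0  # terms with p_j = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


# Hypothetical read counts for three contigs across four tissues
contig_reads = {
    "contig_1": [120, 110, 130, 115],  # comparable amounts from every tissue
    "contig_2": [0, 3, 250, 1],        # dominated by one tissue
    "contig_3": [40, 45, 0, 0],        # shared by two tissues
}
library_sizes = [1_000_000, 1_000_000, 1_000_000, 1_000_000]

for name, reads in contig_reads.items():
    print(name, round(mutual_entropy(reads, library_sizes), 3))
```

With equal library sizes the reference is uniform, so the score is bounded above by ln 4 for four tissues; contigs scoring near 0 are the ones holding comparable amounts of reads from different tissues.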

    Chloroplast Genomes Exhibit Eight-Cluster Structuredness and Mirror Symmetry

    Chloroplast genomes exhibit eight-cluster structuredness in triplet frequency space. The elements to be clustered are small fragments of a genome, each converted into a triplet frequency dictionary. The typical structure consists of eight clusters: six of them correspond to the three positions of a reading frame shifted by 0, 1 and 2 nucleotides (on each of the two opposing strands), the seventh cluster corresponds to the junk regions of a genome, and the eighth cluster comprises fragments with excessive GC content that bear specific RNA genes. The structure exhibits a specific symmetry.
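The six phase-related clusters can be illustrated with a small sketch. The abstract does not say how triplets are counted; this sketch assumes non-overlapping triplets read with step 3 from a given shift, since overlapping counting would give nearly the same dictionary for every shift and could not separate the phases. It computes dictionaries for shifts 0, 1 and 2 on a fragment and on its reverse complement, plus the GC content that would separate the eighth cluster. The toy fragment and all helper names are hypothetical.

```python
from itertools import product

import numpy as np

TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Opposing strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def phased_triplets(fragment, shift):
    """Non-overlapping triplet frequencies read from position `shift` (0, 1 or 2) with step 3."""
    counts = np.zeros(64)
    for i in range(shift, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    return counts / max(counts.sum(), 1)


def gc_content(fragment):
    """Fraction of G and C nucleotides in the fragment."""
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


def six_phase_profiles(fragment):
    """Six 64-vectors: shifts 0, 1, 2 on the given strand and on its reverse complement."""
    rc = reverse_complement(fragment)
    return {(strand, s): phased_triplets(seq, s)
            for strand, seq in (("+", fragment), ("-", rc))
            for s in (0, 1, 2)}


fragment = "ATGGCTGCTGCTGCTTAA"  # hypothetical coding-like toy fragment
profiles = six_phase_profiles(fragment)
print(round(profiles[("+", 0)][INDEX["GCT"]], 3), gc_content(fragment))
```

In frame 0 the toy fragment reads ATG GCT GCT GCT GCT TAA, so GCT dominates that dictionary, while shifts 1 and 2 produce different triplet mixtures; fragments from real genes would fall into different clusters according to such phase differences.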