68 research outputs found

    Sequence complexity.

    <p>The categories 0, 1 to 5, 6 to 10, and 11 or more indicate read frequencies. The numbers outside and inside parentheses are the average and the standard deviation of the sequence complexities, respectively. The complexities were calculated from the sequence data pools.</p>

    Classification accuracies of <i>K</i>-nearest neighbor classifier and pattern effects.

    <p>0 vs. 1 or more, 0 vs. 6 or more, and 0 vs. 11 or more denote the problems of classifying positions with no read against positions with 1 or more reads, 6 or more reads, and 11 or more reads, respectively. Mono, multi, and all denote the data sets composed of features extracted from the distributions of mono-nucleotides, multi-nucleotides (di-, tri-, and tetra-nucleotides), and all nucleotides (mono- and multi-nucleotides), respectively. PEI was calculated from the classification accuracy for the feature set of all nucleotides and averaged over the three classification problems.</p>

    Classification accuracies of <i>K</i>-nearest neighbor classifier on the sequence complexity data.


    Global and local nucleotide distributions.

    <p>The x-axis shows mono- and dinucleotides, and the y-axis shows the normalized nucleotide count. The categories 0, 1 to 5, 6 to 10, and 11 or more indicate read frequencies.</p>
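
A minimal sketch of how normalized mono- and dinucleotide counts like those on the y-axis could be computed. The caption does not state the normalization, so dividing each count by the total number of overlapping k-mers is an assumption, and `normalized_kmer_counts` is a hypothetical helper name, not part of the authors' software.

```python
from collections import Counter
from itertools import product


def normalized_kmer_counts(seq, k):
    """Return the relative frequency of every k-mer over A, C, G, T in seq.

    Assumption: overlapping windows, normalized by the total number of
    valid k-mers; windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Toy sequence, made up for illustration only.
mono = normalized_kmer_counts("ACGTAC", 1)  # 4 mononucleotide frequencies
di = normalized_kmer_counts("ACGTAC", 2)    # 16 dinucleotide frequencies
```

Returning every possible k-mer, including those with zero count, keeps the vectors for different positions the same length so they can be compared directly.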

    Euclidean distances between the distribution of all local nucleotides and the global nucleotide distribution.

    <p>The categories 0, 1 to 5, 6 to 10, and 11 or more indicate read frequencies. The number in parentheses is the p-value of a t-test whose alternative hypothesis is that the nucleotide distribution of the positions with no read is closer to the global distribution than the distribution of the positions with reads.</p>
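
The Euclidean distance between a local and a global nucleotide distribution can be sketched as follows. The two distributions here are made-up values for illustration, not data from the study, and `euclidean_distance` is a hypothetical helper name; the t-test mentioned in the caption is not reproduced.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two distributions given as dicts.

    Keys missing from one distribution are treated as frequency 0.
    """
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


# Illustrative values only (assumed, not taken from the paper).
global_dist = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
local_dist = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}

d = euclidean_distance(local_dist, global_dist)
```

A smaller distance means the local composition around a position is closer to the genome-wide composition.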

    Classification accuracies of indicated classifiers.

    <p>Accuracies were averaged over four organisms. The number in parentheses is the standard deviation. 0 vs. 1 or more, 0 vs. 6 or more, and 0 vs. 11 or more denote the problems of classifying positions with no read against positions with 1 or more reads, 6 or more reads, and 11 or more reads, respectively. Random shuffling was performed 10 times for each data set.</p>

    Generation of sequence data pool.

    <p>Gray circles on the reference genome indicate random positions, and black bars above the reference genome indicate reads that start at a given random position. The character “F” in the data pool denotes a feature.</p>

    Data types and sources.


    Pattern effect index for chromosome 1 of <i>Arabidopsis thaliana</i>.

    <p>Original and corrected data denote uncorrected and GC-corrected reads, respectively. 0 vs. 1 or more, 0 vs. 6 or more, and 0 vs. 11 or more denote the problems of classifying positions with no read against positions with 1 or more reads, 6 or more reads, and 11 or more reads, respectively. PEIs were measured 5 times with the companion software and averaged. Each value is the average PEI, with the standard deviation in parentheses.</p>