In this paper, we aim to give a tutorial for undergraduate students studying
statistical methods and/or bioinformatics. The students will learn how data
visualization can help in genomic sequence analysis. Students start with a
fragment of the genetic text of a bacterial genome and analyze its structure. By
means of principal component analysis they "discover" that the information in
the genome is encoded by non-overlapping triplets. Next, they learn how to find
gene positions. This exercise in PCA and K-means clustering supports active
study of basic bioinformatics notions. Appendix 1 contains program listings
that accompany this exercise. Appendix 2 includes 2D PCA plots of triplet
usage in a moving frame for a series of bacterial genomes, from GC-poor to
GC-rich. Animated 3D PCA plots are attached as separate GIF files. The topology
(cluster structure) and geometry (mutual positions of clusters) of these plots
clearly depend on GC content.

Comment: 18 pages, with program listings for MATLAB, PCA analysis of genomes,
and additional animated 3D PCA plots.
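A minimal Python/NumPy sketch of the moving-frame pipeline summarized above may help readers who do not use MATLAB. It is not a translation of the listings in Appendix 1. The window length (300 letters), the window step (10 letters, chosen not to be a multiple of 3 so that successive windows fall into different reading frames), the column standardization before PCA, plain K-means with random initialization, and all function names are assumptions made for illustration. Each window is read from its first letter in non-overlapping triplets and summarized by a 64-component frequency vector. PCA (computed here via SVD) projects these vectors to 2D, and K-means assigns each window a cluster label.

```python
import itertools

import numpy as np

# All 64 DNA triplets in a fixed order; INDEX maps a triplet to its column.
TRIPLETS = ["".join(t) for t in itertools.product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(seq, window=300, step=10):
    """Hypothetical helper: one 64-component frequency vector per window.

    Each window is read from its first letter in non-overlapping triplets.
    step=10 (an assumption) is not a multiple of 3, so successive windows
    see the sequence in different reading frames. Triplets containing
    letters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        frag = seq[start:start + window]
        counts = np.zeros(len(TRIPLETS))
        for k in range(0, window - 2, 3):
            i = INDEX.get(frag[k:k + 3])
            if i is not None:
                counts[i] += 1
        rows.append(counts / max(counts.sum(), 1.0))
    return np.array(rows)


def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD.

    Columns are centered and scaled to unit variance (an assumption);
    zero-variance columns are left unscaled to avoid division by zero.
    """
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    Xc = Xc / np.where(sd > 0, sd, 1.0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) per point."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means with random initial centers drawn from X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its points; keep empty ones fixed.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _assign(X, centers), centers


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T letters of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / max(acgt, 1)
```

In use, `seq` would be a bacterial genome fragment (for example, read from a FASTA file). Plotting `Y[:, 0]` against `Y[:, 1]` for `Y = pca(triplet_frequencies(seq))` gives a 2D PCA plot of triplet usage, and the sequence of K-means `labels` along the window positions can be inspected to see where the cluster assignment changes. The number of clusters `k` is left to the user here.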