Despite significant advances in high-throughput DNA sequencing, many important
species remain understudied at the genome level. In this study we addressed the
question of what can be predicted about the genome-wide characteristics of less
studied species from the genomic data of completely sequenced species.
Using NCBI databases, we performed a comparative genome-wide analysis of
characteristics such as alternative splicing and the numbers of genes, gene
products, and exons in 36 completely sequenced model species. We created statistical
regression models to fit these data and applied them to loblolly pine
(Pinus taeda L.), an example of an important species whose
genome has not been completely sequenced yet. Using these models, the
genome-wide characteristics, such as the total numbers of genes and exons, can be
roughly predicted from parameters, e.g. exon length and exon/gene ratio, that are
estimated from the limited genomic data available.
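
As a rough illustration of the general approach (fit a regression on sequenced species, then apply it to a parameter estimated for an unsequenced one), the following minimal sketch fits an ordinary least-squares line with a single predictor. The functional form, the function names `fit_line` and `predict`, and all numeric values are assumptions made for illustration only; they are toy numbers, not data or models from this study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def predict(a, b, x):
    """Apply the fitted line to a new predictor value."""
    return a + b * x


# Toy, made-up values standing in for sequenced species (NOT study data):
# predictor = exon/gene ratio, response = total exons (in thousands).
toy_exon_gene_ratio = [2.0, 4.0, 6.0, 8.0]
toy_total_exons_k = [50.0, 90.0, 130.0, 170.0]

a, b = fit_line(toy_exon_gene_ratio, toy_total_exons_k)

# Hypothetical ratio estimated from limited data for an unsequenced species.
estimated_ratio = 5.0
predicted_exons_k = predict(a, b, estimated_ratio)
print(f"intercept={a:.1f}, slope={b:.1f}, prediction={predicted_exons_k:.1f}")
```

A prediction obtained this way is only as reliable as the fitted relationship and the parameter estimate fed into it, which is why such predictions are described as rough.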