Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public
databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences
because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic
genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification
of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete
and mispredicted protein sequences from 19 metazoan species, identified as erroneous by MisPred quality control tools in
the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are
automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a
simple web interface coupled to a powerful query engine and a standard web service. The content can be downloaded, in full or
in part, in a variety of formats.