Gene annotation databases (compendiums maintained by the scientific community
that describe the biological functions performed by individual genes) are
commonly used to evaluate the functional properties of experimentally derived
gene sets. Overlap statistics, such as Fisher's Exact Test (FET), are often
employed to assess these associations, but they do not account for non-uniformity in
the number of genes annotated to individual functions or the number of
functions associated with individual genes. We find that FET is strongly biased
toward overestimating overlap significance when a gene set has an unusually high
number of annotations. To correct for these biases, we develop Annotation
Enrichment Analysis (AEA), which properly accounts for the non-uniformity of
annotations. We show that AEA is able to identify biologically meaningful
functional enrichments that are obscured by numerous false-positive enrichment
scores in FET, and we therefore suggest it be used to more accurately assess
the biological properties of gene sets.
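
For readers unfamiliar with the overlap statistic discussed above, the following is a minimal sketch of a one-sided Fisher's Exact Test computed as a hypergeometric tail probability. It is an illustration only, not the authors' code and not an implementation of AEA. The function name, parameter names, and the example counts are hypothetical. The null model here treats every gene as equally likely to be drawn, which is the uniformity assumption the text identifies as the source of bias.

```python
from math import comb


def fet_overlap_pvalue(n_total, n_annotated, n_set, n_overlap):
    """One-sided Fisher's Exact Test p-value for gene set overlap.

    Hypothetical helper for illustration. Returns the probability of
    observing at least `n_overlap` annotated genes when `n_set` genes are
    drawn uniformly at random, without replacement, from `n_total` genes,
    of which `n_annotated` carry the annotation of interest.
    """
    max_overlap = min(n_annotated, n_set)
    # Number of ways to choose any gene set of this size.
    denom = comb(n_total, n_set)
    # Sum the hypergeometric terms for overlaps at least as large as observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_set - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denom


# Hypothetical counts: 20,000 genes in total, 150 annotated to a function,
# an experimental gene set of 300 genes, 12 of which carry the annotation.
p_value = fet_overlap_pvalue(20000, 150, 300, 12)
print(f"one-sided FET p-value: {p_value:.3g}")
```

Because every gene enters this calculation with equal weight, a gene set made up of heavily annotated genes will tend to overlap many functional categories by chance alone, which this null model does not anticipate.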