# Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data

## Abstract

Using a measure of how differentially expressed a gene is between two biochemically or phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the fall-off of this measure (the normalized maximum likelihood of a classification model such as logistic regression) as a function of rank typically follows a power law. In other ranked plots, such power-law behavior is known as Zipf's law and is observed in many natural and social phenomena. The presence of this power law means there is no intrinsic cutoff point between "important" genes and "irrelevant" genes. We have shown that similar power laws are also present in permuted datasets, and we provide an explanation based on the well-known $\chi^2$ distribution of likelihood ratios. We discuss the implications of this Zipf's law for gene selection in microarray data analysis, as well as other characterizations of the ranked likelihood plots, such as the rate at which the likelihood falls off.
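
The null-model argument above can be sketched in standard-library Python. This is a minimal illustration under our own assumptions, not the authors' procedure: expression values are simulated standard normals, the class labels are randomly permuted so that every gene is null by construction, each gene gets its own one-predictor logistic regression, and the score is the likelihood-ratio statistic $G = 2(\hat\ell_1 - \hat\ell_0)$ rather than the paper's normalized maximum likelihood, whose exact normalization is not specified here. Under the null, $G$ is approximately $\chi^2_1$ distributed, and a least-squares fit of $\log G$ against $\log(\text{rank})$ gives a Zipf-style exponent. All function names and parameters (`fit_logistic`, `lr_statistic`, `fit_power_law`, `n_genes`, `n_samples`, `seed`) are hypothetical.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def loglik(a, b, x, y):
    """Log-likelihood of a one-predictor logistic regression."""
    return sum(yi * (a + b * xi) - log1pexp(a + b * xi) for xi, yi in zip(x, y))


def fit_logistic(x, y, iters=50, tol=1e-10):
    """Newton-Raphson MLE for P(y=1) = 1 / (1 + exp(-(a + b*x)))."""
    p0 = sum(y) / len(y)
    a, b = math.log(p0 / (1 - p0)), 0.0  # start at the intercept-only MLE
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1 - p)
            g0 += yi - p           # gradient w.r.t. a
            g1 += (yi - p) * xi    # gradient w.r.t. b
            h00 += w               # entries of the observed information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        # Newton step: theta += I^{-1} g, with I the observed information
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def lr_statistic(x, y):
    """G = 2 * (max log-lik with the gene - max log-lik of intercept-only model)."""
    n = len(y)
    n1 = sum(y)
    n0 = n - n1
    ll0 = n1 * math.log(n1 / n) + n0 * math.log(n0 / n)
    a, b = fit_logistic(x, y)
    return max(0.0, 2.0 * (loglik(a, b, x, y) - ll0))


def fit_power_law(values):
    """OLS fit of log(value) = c + slope * log(rank), values ranked high to low."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def main(n_genes=500, n_samples=60, seed=1):
    rng = random.Random(seed)
    labels = [0] * (n_samples // 2) + [1] * (n_samples - n_samples // 2)
    rng.shuffle(labels)  # permuted labels: no gene is truly differential
    expr = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_genes)]
    stats = [lr_statistic(gene, labels) for gene in expr]
    slope, intercept = fit_power_law(stats)
    return stats, slope, intercept


stats, slope, intercept = main()
print(f"mean LR statistic: {sum(stats) / len(stats):.3f} (chi-square_1 mean is 1)")
print(f"fitted Zipf exponent alpha = {-slope:.3f}")
```

Because the ranked values are sorted in decreasing order, the fitted slope is always negative; whether a straight line on the log-log plot describes the fall-off well over a given rank range is the empirical question the abstract addresses, so in practice one would inspect the plot and might restrict the fit to the top-ranked genes.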

This paper was published in CiteSeerX.