We consider the problem of collaborative filtering from a channel coding
perspective. We model the underlying rating matrix as a finite alphabet matrix
with block constant structure. The observations are obtained from this
underlying matrix through a discrete memoryless channel with a noisy part
representing noisy user behavior and an erasure part representing missing data.
Moreover, the clusters over which the underlying matrix is constant are {\it
unknown}. We establish a sharp threshold result for this model: if the largest
cluster size is smaller than $C_1 \log(mn)$ (where the rating matrix is of size
$m \times n$), then the underlying matrix cannot be recovered with any
estimator, but if the smallest cluster size is larger than $C_2 \log(mn)$, then
we exhibit a polynomial-time estimator whose probability of error vanishes. In
the case of uniform cluster size, not only the order of the threshold, but also
the constant is identified.

Comment: 32 pages, 1 figure, Submitted to IEEE Transactions on Information
Theor
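
The observation model described in the abstract can be illustrated with a minimal simulation sketch. This is not the paper's estimator or its exact channel: the function names, the symmetric noise (a corrupted symbol is replaced by a different symbol chosen uniformly), the independent erasures, and the `ERASED` marker are all assumptions made only for illustration.

```python
import numpy as np

ERASED = -1  # hypothetical marker for a missing (erased) rating


def block_constant_matrix(row_labels, col_labels, block_values):
    """Build a matrix that is constant on each (row cluster, column cluster) block."""
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    block_values = np.asarray(block_values)
    # Entry (i, j) takes the value assigned to block (row_labels[i], col_labels[j]).
    return block_values[np.ix_(row_labels, col_labels)]


def observe(matrix, alphabet_size, flip_prob, erasure_prob, rng):
    """Pass each entry independently through a noisy channel with erasures.

    Assumed channel: with probability flip_prob the symbol is replaced by a
    different symbol chosen uniformly; then, with probability erasure_prob,
    the entry is erased regardless of its value.
    """
    noisy = matrix.copy()
    flip = rng.random(matrix.shape) < flip_prob
    # A nonzero offset modulo the alphabet size guarantees the symbol changes.
    offsets = rng.integers(1, alphabet_size, size=matrix.shape)
    noisy[flip] = (matrix[flip] + offsets[flip]) % alphabet_size
    erased = rng.random(matrix.shape) < erasure_prob
    noisy[erased] = ERASED
    return noisy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two row clusters and two column clusters over a binary alphabet.
    # The cluster labels are known here only because we generate the data;
    # in the model they are hidden from the estimator.
    rows = rng.permutation(np.repeat([0, 1], 4))
    cols = rng.permutation(np.repeat([0, 1], 5))
    X = block_constant_matrix(rows, cols, [[0, 1], [1, 0]])
    Y = observe(X, alphabet_size=2, flip_prob=0.1, erasure_prob=0.5, rng=rng)
    print(Y)
```

An estimator in this setting would see only the output of `observe` and would have to recover both the hidden clusters and the block values.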