Student evaluations of teaching (SET) have been the principal instrument for eliciting students' opinions in higher education institutions. Many decisions, including high-stakes ones, are made on the basis of SET scores reported by students. The reliability of SET scores is therefore of considerable importance. This paper argues that there are problems in how reliability indices are chosen and used in the SET context. Three hypotheses were tested: (i)
using internal consistency measures is misleading in the SET context, since the variability is mainly due to disagreement among students' ratings, which calls for inter-rater reliability coefficients; (ii) the minimum number of student responses is not reached in most classes, resulting in unreliable decisions; and (iii) computing a reliability coefficient under the assumption of a common factor structure across all classes is misleading, because a common model may not be tenable for all of them. Results showed that relying on internal consistency alone to assess the reliability of SET scores may lead to wrong decisions.
decisions. Considerable large numbers of missing feedbacks were observed to achieve acceptable reliability levels.
Findings also indicated that factorial model differed across several groups
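The distinction at the heart of hypothesis (i) — that internal consistency can be high even when raters disagree — can be illustrated with a small simulation. The sketch below is not the paper's data or method; it generates hypothetical SET ratings in which each student's item responses are internally consistent but students disagree with one another, then computes Cronbach's alpha (internal consistency across items) and a one-way random-effects ICC(1,1) (inter-rater agreement across classes). The variance parameters and design (30 classes, 20 students, 8 items) are illustrative assumptions.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def icc1(ratings):
    """One-way random-effects ICC(1,1) for a (targets x raters) matrix,
    from the standard ANOVA decomposition: (MSB - MSW) / (MSB + (k-1)*MSW)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    msb = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    msw = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: ratings = class effect + student effect + item noise.
# The student effect dominates, i.e. students disagree about the same class.
rng = np.random.default_rng(0)
n_classes, n_students, n_items = 30, 20, 8
class_eff = rng.normal(0.0, 0.3, size=(n_classes, 1, 1))
student_eff = rng.normal(0.0, 1.0, size=(n_classes, n_students, 1))
noise = rng.normal(0.0, 0.2, size=(n_classes, n_students, n_items))
ratings = 3.0 + class_eff + student_eff + noise  # (classes, students, items)

# Internal consistency per class: items within a student track each other closely.
alphas = [cronbach_alpha(ratings[c]) for c in range(n_classes)]

# Inter-rater agreement: treat classes as targets, students' overall ratings as raters.
student_means = ratings.mean(axis=2)  # (classes, students)
icc = icc1(student_means)

print(f"mean Cronbach's alpha: {np.mean(alphas):.2f}")
print(f"ICC(1,1): {icc:.2f}")
```

Under these parameters the average alpha is close to 1 while the ICC is low, showing how an internal consistency index can certify as "reliable" a set of scores on which raters largely disagree — the scenario the paper argues against.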