Inter-rater reliability (IRR) has been the prevailing measure of quality and
precision for ratings from multiple raters. However, applicant selection
procedures based on ratings from multiple raters usually result in a binary
outcome. This final outcome is not considered in IRR, which instead focuses on
the ratings of the individual subjects or objects. In this work, we outline how
to transform the selection procedures into a binary classification framework
and develop a quantile approximation which connects a measurement model for the
ratings with the binary classification framework. The quantile approximation
allows us to estimate the probability of correctly selecting the best
applicants and assess error probabilities when evaluating the quality of
selection procedures using ratings from multiple raters. We draw connections
between IRR and binary classification metrics, showing that the binary
classification metrics depend solely on the IRR coefficient and the
proportion of selected applicants. We assess the performance of the
quantile approximation in a simulation study and apply it in an example
comparing the reliability of multiple grant peer review selection procedures.
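
To make the link between IRR, the proportion selected, and classification
metrics concrete, the following is a minimal Monte Carlo sketch. It is not the
quantile approximation developed in this work. It assumes a simple measurement
model in which each applicant's observed rating is a normally distributed true
score plus independent normal error, and it treats IRR as the share of
observed-rating variance due to the true score. The function name
simulate_selection and all of its parameters are hypothetical, chosen only for
illustration.

```python
import numpy as np


def simulate_selection(irr, prop_selected, n_applicants=100_000,
                       true_sd=1.0, seed=0):
    """Monte Carlo estimate of selection accuracy (illustrative helper only).

    Assumed model: observed = true_score + error, both normal, with
    irr = var(true_score) / var(observed).
    """
    if not 0.0 < irr <= 1.0:
        raise ValueError("irr must lie in (0, 1]")
    n_selected = int(round(prop_selected * n_applicants))
    if not 0 < n_selected < n_applicants:
        raise ValueError("prop_selected must select some but not all applicants")

    rng = np.random.default_rng(seed)
    # Error SD chosen so that the variance ratio equals the requested IRR.
    error_sd = true_sd * np.sqrt((1.0 - irr) / irr)
    true_score = rng.normal(0.0, true_sd, n_applicants)
    observed = true_score + rng.normal(0.0, error_sd, n_applicants)

    def top_mask(scores):
        # Boolean mask marking the n_selected highest scores.
        mask = np.zeros(n_applicants, dtype=bool)
        mask[np.argpartition(-scores, n_selected - 1)[:n_selected]] = True
        return mask

    truly_best = top_mask(true_score)   # who should be selected
    selected = top_mask(observed)       # who is selected from the ratings
    true_pos = int(np.sum(truly_best & selected))
    false_pos = n_selected - true_pos
    return {
        "sensitivity": true_pos / n_selected,
        "false_positive_rate": false_pos / (n_applicants - n_selected),
    }


if __name__ == "__main__":
    for irr in (0.5, 0.7, 0.9):
        print(irr, simulate_selection(irr, prop_selected=0.2))
```

Under these assumptions, changing true_sd rescales true scores and errors
together and leaves the rankings, and hence the estimated metrics, essentially
unchanged, while changing irr or prop_selected moves them. This is consistent
with the dependence on only the IRR coefficient and the proportion selected
described above.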