Purpose
This study measured observer variation in radiographic rating of elbow arthrosis.

Methods
Thirty-seven independent orthopedic surgeons graded the extent of elbow arthrosis in 20 consecutive sets of plain radiographs according to the Broberg and Morrey rating system (grade 0, normal joint; grade 1, slight joint-space narrowing with minimum osteophyte formation; grade 2, moderate joint-space narrowing with moderate osteophyte formation; and grade 3, severe degenerative change with gross destruction of the joint). The multirater kappa statistic (kappa) was used to estimate interobserver reliability, with 0 indicating no agreement above chance and 1 indicating perfect agreement.

Results
There was fair agreement in arthrosis ratings between surgeons. Surgeons with more than 10 years of experience had greater agreement than surgeons with less experience, and surgeons who treated more than 10 elbow fractures per year had better agreement than those treating fewer fractures. In post hoc analyses, 2 simplified binary rating systems (eg, "none or mild" vs "moderate or severe" arthrosis) resulted in moderate agreement among observers.

Conclusions
The 4 grades of the Broberg and Morrey classification system have only fair interobserver reliability, which is influenced by subspecialty and experience. Binary rating systems might be more reliable.

(J Hand Surg 2012;37A:755-759. Copyright © 2012 by the American Society for Surgery of the Hand. All rights reserved.)
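The multirater kappa named in the Methods is commonly computed as Fleiss' kappa: per-subject observed agreement is compared against the agreement expected by chance from the marginal category proportions. A minimal sketch, assuming Fleiss' formulation with an equal number of raters per subject (the study's actual kappa variant is not specified in the abstract, and the toy data below are illustrative, not the study's ratings):

```python
def fleiss_kappa(counts):
    """Fleiss' multirater kappa.

    counts[i][j] = number of raters assigning subject i to category j.
    Assumes every subject is rated by the same number of raters.
    Returns a value where 0 = no agreement above chance, 1 = perfect agreement.
    """
    N = len(counts)        # number of subjects (eg, 20 radiograph sets)
    n = sum(counts[0])     # raters per subject (eg, 37 surgeons)
    k = len(counts[0])     # number of categories (eg, 4 grades)

    # Observed agreement: mean per-subject pairwise agreement P_i
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N

    # Chance agreement P_e from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)


# Illustrative only: 2 subjects, 3 raters, 2 categories, perfect agreement
print(fleiss_kappa([[3, 0], [0, 3]]))  # → 1.0
```

For the study's design, `counts` would be a 20 x 4 matrix with each row summing to 37; collapsing the 4 columns to 2 would model the binary systems tested post hoc.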