Treatment integrity instruments are critical for evaluating evidence-based programs in early classroom contexts. However, in the early childhood field, guidelines for collecting treatment integrity data are underdeveloped. Consequently, most treatment integrity instruments employed in the field assess only adherence, vary in design features, and have little psychometric evidence supporting their use. This gap might slow efforts to implement evidence-based programs. The current study examines the score reliability and validity of an observational treatment integrity instrument, the BEST in CLASS Adherence and Competence Scale (BiCACS; Sutherland et al., 2014). The BiCACS is designed to assess teachers' adherence to, and competence in delivering, the practices found in the BEST in CLASS program, a teacher-delivered evidence-based program for children at risk for emotional and behavioral disorders. Data were drawn from observations of 179 teachers who were randomized to BEST in CLASS (n = 89) or business-as-usual (n = 90) and 416 children at risk for emotional and behavioral disorders (n = 211 in the BEST in CLASS condition; n = 205 in the business-as-usual condition). Based on double-coded observations (25% of the sample), the mean single-measure intraclass correlation coefficient, ICC(2,1), was .74 (SD = 0.06) for the Adherence items and .46 (SD = 0.14) for the Competence items. The ICC(2,1) values for the Adherence and Competence subscales were .81 and .43, respectively. Findings also provided initial evidence of convergent and discriminant validity at the BiCACS item and subscale levels. The magnitude of correlations among the BiCACS items suggests that the Adherence and Competence items overlap most with items within the same subscale but also measure distinct BEST in CLASS practices.
At the subscale level, scores on the Adherence and Competence subscales are more strongly related to each other than to scores on measures of child responsiveness, child engagement, and closeness and conflict in student-teacher relationships. Validity evidence at the subscale level also suggests that the BiCACS can distinguish between intervention groups and detect change over time. The reliability and validity findings support the use of the BiCACS as a program evaluation instrument, although future research is needed to replicate these findings and to test the construct validity of the BiCACS against other instruments that assess adherence and competence. Still, the results provide valuable information about the psychometric properties of a treatment integrity instrument used in early classroom contexts and add to the growing knowledge in this area.
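For readers less familiar with the reliability statistic reported above, the following is a minimal sketch of how a single-measure ICC(2,1) can be computed from double-coded observations. It assumes the conventional Shrout and Fleiss (1979) two-way random-effects, absolute-agreement definition; the abstract does not state which software or exact computation was used. The function name `icc_2_1` is hypothetical, and the ratings are the illustrative example from Shrout and Fleiss, not BiCACS data.

```python
def icc_2_1(ratings):
    """Single-measure ICC(2,1) for an n-by-k table.

    Rows are rated targets (e.g., observed sessions) and columns are
    coders. Assumes a complete table with no missing ratings.
    """
    n = len(ratings)          # number of targets
    k = len(ratings[0])       # number of coders
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-targets mean square
    jms = ss_cols / (k - 1)               # between-coders mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Illustrative ratings only (6 targets, 4 judges), not data from this study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # approximately 0.29
```

Because ICC(2,1) treats coders as a random sample and penalizes systematic differences between them, a coder who rates consistently higher or lower than a colleague lowers the coefficient even when the two rank sessions identically.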