High-Stakes For Whom? Understanding Principal Behavior in Rating Teacher Performance

Blog Post
Aug. 8, 2017
While the last five years have brought significant changes* to the design of K-12 schools’ teacher evaluation systems, we have not witnessed a corresponding increase in differentiation among teacher performance ratings. Evaluation systems still typically rely most heavily on observations of classroom practice—which are usually conducted by school principals—and these observation ratings tend to be confined to the top rating categories. Why is it that principals consistently assign high ratings to most of their teachers? Is it because principals really perceive most of their teachers to be high performers, or are principals not reflecting their true perceptions in these ratings?

Last month, Education Week published an article with a title that appeared to confidently answer these questions: “Want Principals to Rate Teachers Honestly? Take Away the Stakes.” While the headline certainly grabs the reader’s attention, critics of stakes in teacher evaluation systems should look a bit more closely at the research before becoming too gleeful: the reality is that the research the article highlights doesn’t quite support that assertion.

The Education Week article covers a recent study by academic researchers Jason Grissom and Susanna Loeb that compares 100 principals’ summative teacher evaluation ratings in a “high-stakes” environment to their evaluations of those teachers in a “low-stakes” environment. The researchers simulated a low-stakes environment by conducting confidential one-on-one interviews with principals where they were asked to rate some of their teachers. The study concluded that principals assign mostly positive ratings in both situations, but there is far more variation in the low-stakes ratings than in the high-stakes ones (although both were predictive of teachers’ value-added ratings).

These findings should be interpreted with the caveat that there are some limitations to the study. The instruments that were used in the low-stakes and high-stakes evaluations were different. The high-stakes evaluation instrument asked principals to rate teachers on seven high-level standards (such as knowledge of learners, communication, and instructional planning), while the low-stakes evaluation instrument asked principals to rate teachers on 10 specific items (such as high test performance, improving critical thinking, and helping with school leadership). Also, the rating scales principals used for the two assessments were different; the low-stakes assessment used a more nuanced rating scale than the high-stakes one (a six-point scale vs. a four-point scale, respectively), which may in turn have made principals more willing to give a below-average rating.

In addition to the differences in instruments used, there are limitations to the real-world applicability of the “low-stakes” scenario, which the research paper’s authors readily admit. The simulated low-stakes scenario created a confidential interview rating process, which does not reflect the standard evaluation process where teachers are informed of their rating and provided associated feedback, regardless of the existence of “stakes.”

There are a variety of reasons why teacher evaluations would likely still result in overly positive ratings from principals even without high stakes for teachers, and many of them are actually related to the stakes for principals in issuing low ratings. Grissom explained to Education Week’s Liana Loewus that principals conducting teacher evaluations “are capable of differentiating, but they also face really strong incentives to not fully differentiate when they know there are potential job consequences for their teachers or consequences for their own relationships with their teachers” (emphasis added). Grissom and Loeb are not the first to explore how concerns about creating tense relationships or damaging school culture can lead principals to inflate their assigned ratings of teachers. Research by Kraft & Gilmour identified “personal discomfort” as well as several other explanations for why principals often rate teachers highly, even if their true perceptions do not match that rating, including:

  • Time constraints: Principals have to observe a teacher and collect evidence to back up a low rating. Principals then have to provide feedback and create improvement plans to support low-performing teachers. This kind of increased workload may cause principals to use low-performing ratings sparingly.

  • Teachers’ potential and motivation: Rather than risk losing a teacher they see as having potential, principals may assign teachers, particularly newer ones, a slightly higher rating in order to keep them motivated and receptive to feedback.

  • The challenge of removing and replacing teachers: Principals may not assign a low rating in order to avoid the time and financial burden associated with what is often a lengthy dismissal process. They may also want to avoid dismissing a teacher for fear of being forced to fill that vacancy with a lower-quality replacement from an excess pool.

In order to address some of these challenges, policymakers should consider changes to principals’ assigned responsibilities, as well as to their professional development and evaluation processes. States and districts could reassess whether principals should be solely responsible for observing and providing feedback to teachers or if those, or other, responsibilities could be distributed to others with sufficient expertise. Doing so—along with encouragement and resources to provide more frequent informal observations, feedback, and opportunities for meaningful coaching by principals or other school staff—could create cultures that enable more honest, trusting performance conversations. States and districts could invest in more training for principals around how to have difficult conversations with staff about developing their practice, as well as providing principals with clear guidance on how to collect evidence and artifacts to support performance ratings. Education system leaders can also ensure that principal supervisors have the skills and capacity to evaluate principals’ execution of teacher evaluation and support systems in these key areas.

Most of policymakers’ attention has been placed on whether, and to what degree, test scores should be used in teacher evaluations. But observation of teacher practice typically makes up half or more of a teacher evaluation, making the observer’s role in evaluation even more critical. While high stakes associated with evaluation ratings may not encourage principals to submit honest ratings of their teachers, eliminating stakes is not going to fix the lack of differentiation in teacher evaluation. If we want teacher evaluation systems to better differentiate performance and promote professional growth, school systems should lower the stakes for principals by providing them sufficient time and support in accurately evaluating teacher practice, providing honest feedback, and following up with meaningful ways to address any areas identified for improvement.


* As a quick recap, over the last five years, many states began requiring new state- or district-developed evaluation systems that strive to be more rigorous and more objective than historical systems. They do this by factoring in multiple measures (which at a minimum include observations of classroom practice and a measure of impact on student learning growth, such as value-added models) and incorporating at least three performance rating categories. Some systems, such as District of Columbia Public Schools, also require or encourage “high-stakes” consequences, such as dismissal for the lowest performers or bonuses for the highest performers.
