Disappointing Trends—and a Potential Divergence—in States’ Teacher Evaluation Findings

This is the first in a series of posts examining states’ initial forays into implementing new PreK-12 teacher evaluation systems. This series will analyze recent state data on teacher evaluation implementation to highlight trends, lessons learned, and promising practices in using new evaluation systems to identify and improve educator quality.

Our first post provides an overview of the initial results from states’ new teacher evaluation systems and highlights one state—New Jersey—that may be bucking the trends.

As 2014 kicks off, new teacher evaluation systems* continue to be a hot and divisive topic across the nation. But since the first few states released data from their new systems showing that nearly all teachers were still rated in the top performance categories (based primarily on consistently high classroom practice/observation ratings), little national attention has been paid to subsequent releases of state evaluation results. A new wave of state data releases at the end of 2013 should spark renewed interest in these results and, more importantly, provide insight into how to maximize new evaluation systems' potential to identify and improve teacher quality.

A now year-old Education Week article captures the explanations offered for why states weren't seeing as much variation in results from their new teacher evaluation systems as many had expected. One explanation, from detractors of the new systems, holds that uniformly high ratings are evidence that teachers are uniformly strong performers. Another, from supporters, suggests there has been inadvertent noncompliance with the new systems' classroom observation rating guidelines: that is, it will take time to change school cultures in which, historically, the expectation was that all teachers would earn the highest rating. (Unfortunately, there are also instances of district leaders deliberately and unapologetically choosing not to comply with evaluation guidelines and rating all of their teachers in the top performance categories.)

The truth, as in many cases, probably lies somewhere in between: while the majority of teachers are doing a good job (and some are doing an excellent job), a sizable portion are struggling but continue to be told they are performing just as well as everyone else. And while there is no “right” distribution of evaluation ratings, the data we do have indicate that many new systems continue to understate the proportion of teachers who are less proficient. For instance, one powerful finding from Tennessee's recent Year 2 report on its evaluation implementation is that teachers with the lowest student learning growth received an average observation score only about half a point lower, on a five-point scale, than that of teachers with the highest student learning growth. And at Lacoochee Elementary School in Pasco County, Florida, 100% of teachers were rated effective or highly effective, even though the school had received a “D” grade from the state three years in a row.

Exacerbating the problem of rating all teachers as high performing is that, in most evaluation system designs, teachers receive targeted assistance or interventions only if they are identified in one of the bottom performance categories. All teachers can benefit from honest, targeted feedback about their strengths and areas for improvement. But such feedback is especially important for teachers who are less effective in helping their students learn (particularly since some research shows that these teachers tend to be disproportionately assigned to historically disadvantaged students).

These kinds of findings could understandably cause some handwringing among those of us who believe that, if implemented well, more rigorous evaluation systems can help bolster educator quality on behalf of our nation's students. But New Jersey's recent report detailing its educator evaluation pilot results offers a glimmer of hope, and a possible example for other states and districts undertaking this work. Here's one illustration: the distribution of teacher practice (i.e., observation) ratings in the New Jersey districts that piloted the new evaluation system was much more varied than Florida's overall Year 2 evaluation ratings (where 98% of teachers were rated in the top two of four performance categories). That is notable because one would expect overall evaluation ratings, which include a student growth measure that is less subject to “inadvertent noncompliance,” to have the more varied distribution.

While New Jersey's evaluation results are encouraging, the thoughtful process the state is using to draw lessons from its pilot and turn them into practical advice for full implementation may be just as promising. Come back to Ed Central over the next several months, when we will dig deeper into the teacher evaluation findings from New Jersey and other leading states and share our take on what we can learn from them.

* As a quick recap, many states are requiring new state- or district-developed evaluation systems that strive to be more rigorous and objective than historical systems. They do this by factoring in multiple measures (which, at a minimum, include observations of classroom practice and a measure of impact on student learning growth) and by incorporating at least three performance categories. Read New America's most recent papers on teacher evaluation here and here.


Melissa Tooley is the director of Educator Quality with New America's Education Policy program. She is a member of the PreK-12 team, where she provides research and analysis on PreK-12 policies and practices that impact teaching quality and school leadership.