The Early Grades are Different: A Look at Classroom Observations

This is part two of a four-part blog series on teacher evaluation in the early grades.
Sept. 28, 2016

Picture yourself as an observer conducting a teacher evaluation, tasked with deciding where teachers fall on a scale of “ineffective” to “highly effective,” potentially affecting their pay or job security. You walk into a science lesson on the conservation of mass in a fifth-grade classroom. The desks are in rows and the students are listening to their teacher at the front of the room. She asks them to predict whether the ice cube on her desk will maintain its mass when it melts. She then asks them to explain their predictions in their journals, using what they learned in the previous night’s reading. After 10 minutes, she asks the students to share with the person next to them and then selects one student to read his answer aloud. By this time, the ice has melted and the students can see that the mass has remained the same.

Next, you observe a science lesson in a kindergarten classroom, where the students are learning about the properties of different materials. With the children seated in a circle on the floor, the teacher reads Captain Kidd's Crew Experiments with Sinking and Floating. Afterwards, she asks the students why they think some objects sink and others float, and she writes their ideas down on the whiteboard. Then she pulls a few everyday items out of a bag and asks the class to predict whether they will sink or float. After guessing together, the students return to their tables, which have been equipped with similar items and tubs of water. Working in small groups, they test their ideas for themselves. Using a graphic organizer, they draw pictures of the items that sink under the "Sink" heading and the items that float under the "Float" heading. The teacher walks around checking in with each small group, asking probing questions.

Instruction in these two classrooms looks very different. As the observer, do you know whether it was good practice to have fifth graders write in journals rather than share with the whole class? Or whether there was the right balance between whole-group instruction and student-centered learning in the kindergarten classroom?

Classroom observation allows principals or external observers to see teachers in action and offer feedback that can help them improve their practice. But high-quality teaching should look different from one grade to the next, especially in the early years. Notice how the lesson plans, classroom environments, and role of the teacher differ in these grades. To effectively promote high-quality teaching across all grade levels, evaluators need a keen understanding of these differences.

As teacher evaluation systems have changed in recent years, many states and districts have updated their frameworks for observing teachers. Many states, however, continue to use one general framework across all grades. Observation tools are often created with a certain age group in mind, and using them to evaluate teachers instructing different grades can be confusing or even unfair. For instance, some rubrics used for observing teachers in K–12 might be inconsistent with best practices in the early grades, or fail to clarify how to identify certain measures in classrooms where instruction looks different.

Lisa Guernsey and Susan Ochshorn’s 2011 paper, Watching Teachers Work: Using Observation Tools to Promote Effective Teaching in the Early Years and Early Grades, examines the importance of classroom observation as a tool to identify, promote and reward good teaching. While observations are increasingly likely to inform personnel decisions, they should also play a prominent role in helping teachers understand the parts of their practice that are most beneficial to children, and the parts that they can change to be more effective. As Guernsey and Ochshorn explain, “professional development and formal evaluations will need to go hand-in-hand, with data from observations bridging the two.”

In the early grades, a high-quality observation tool should emphasize the importance of certain types of interactions and teaching strategies that help students to gain academic skills in areas like language, literacy, and math, and to develop social-emotional skills. Teaching in these years should be hands-on and young children should be engaged. Teachers should be responsive and encourage children to build on their interests, and adults in the classroom should demonstrate an understanding of child development and learning.

Observation tools designed specifically for pre-K classrooms usually acknowledge this. For instance, Head Start and many other pre-K programs use tools like the Classroom Assessment Scoring System (CLASS) to observe teachers. CLASS measures interactions related to emotional climate, classroom organization, and instructional support. Most state Quality Rating and Improvement Systems require the use of an observation tool like CLASS. However, in pre-K and child care centers, especially those outside of the public school system, these tools are usually used to measure overall program quality rather than to formally evaluate teachers. These types of tools are appropriate for measuring quality teaching, but they rarely meet state requirements for teacher evaluation. As pre-K is increasingly folded into the public school system, states and school districts need to ensure that their observation tools can accurately evaluate quality instruction in pre-K and early grade classrooms.

Several states and districts, such as Illinois and Washington, DC, recognize that using one classroom observation model for all grades and subject areas may be an ineffective or unfair way to evaluate teachers of younger children, specifically those in kindergarten through third grade, and pre-K when included. As such, they have developed separate rubrics, guidelines, or methods to better evaluate early educators. The lessons they have learned may help states that have not yet acknowledged the differences between evaluating teachers of young children and older students.

Illinois

Illinois has taken significant steps to ensure that early education teachers are evaluated on the practices that are best for young children. The state encourages districts to select one evaluation rubric for all staff, but acknowledges that teaching and learning in the early grades may require a different kind of tool.

The state is one of many that have approved the use of the Charlotte Danielson Framework for Teaching for teacher observations across grade levels, including pre-K. This framework, like many others, was created for use beginning in the upper grades of elementary school, raising concerns about how well the tool adapts to the early grades. To find out, researchers at the Center for the Study of Education Policy at Illinois State University (CSEP) conducted a validation study of the Danielson framework to determine whether it is valid and reliable in the early grades.

CSEP spent the first year taking an in-depth look at the content of the Danielson framework to determine whether it was aligned with what research says is important for children in pre-K through third grade. Comparing it to NAEYC’s Standards for Professional Preparation Programs, CLASS, and the Head Start standards, the researchers found that, overall, it aligns with developmentally appropriate practice. Unsurprisingly, Danielson is more academic than the early childhood-specific frameworks and places less emphasis on family engagement. According to Lisa Hood, director of the study, “this doesn’t mean Danielson can’t be used to evaluate social-emotional interactions and family engagement, it just needs to be more intentional.”

Twenty-six teachers (14 pre-K teachers and 12 K–3rd grade teachers) in seven districts with a total of 620 students (50 percent in pre-K) participated in CSEP’s validation study. To test the framework’s inter-rater reliability, the researchers paired internal observers (principals/center directors) with trained external observers and compared their classroom observation ratings on 17 components. The comparison showed an average inter-rater reliability of 67 percent, with agreement between the internal and external observers ranging from as low as 42 percent on one component to as high as 92 percent on another. Internal observers tended to rate teachers higher than external observers on several components.
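
For readers curious about the arithmetic behind numbers like these, here is a minimal Python sketch of one simple way to measure inter-rater reliability: percent agreement, the share of teachers to whom both observers gave exactly the same rating on a component. The study's precise method is not described in this post, so exact-match agreement is an assumption. The component names, the 1-to-4 coding of the performance levels, and every rating below are hypothetical and made up purely for illustration.

```python
from statistics import mean

# Hypothetical coding of the four performance levels (an assumption):
# 1 = unsatisfactory, 2 = basic, 3 = proficient, 4 = distinguished.
# Each list holds one rating per observed teacher for a single component.
# All names and numbers here are invented for illustration only.
ratings = {
    "component_a": {"internal": [3, 3, 4, 3], "external": [3, 2, 3, 3]},
    "component_b": {"internal": [3, 2, 3, 4], "external": [3, 2, 3, 3]},
    "component_c": {"internal": [2, 3, 3, 3], "external": [2, 3, 3, 3]},
}


def percent_agreement(internal, external):
    """Percentage of teachers given the same rating by both observers."""
    if len(internal) != len(external) or not internal:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(internal, external))
    return 100 * matches / len(internal)


# Agreement for each component, then the average across components.
agreement_by_component = {
    name: percent_agreement(r["internal"], r["external"])
    for name, r in ratings.items()
}
overall_agreement = mean(agreement_by_component.values())

# Average gap (internal minus external); a positive value means the
# internal observer tended to rate higher on that component.
mean_gap_by_component = {
    name: mean(a - b for a, b in zip(r["internal"], r["external"]))
    for name, r in ratings.items()
}

for name, pct in agreement_by_component.items():
    print(f"{name}: {pct:.0f}% agreement, "
          f"mean gap {mean_gap_by_component[name]:+.2f}")
print(f"average agreement: {overall_agreement:.0f}%")
```

Percent agreement is easy to read, but it does not correct for agreement that would happen by chance, which is one reason formal studies often report additional statistics alongside it.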

Based on these findings, CSEP is developing resources for the areas of the framework where inter-rater reliability was weakest, such as using assessment, setting instructional outcomes, and more abstract concepts, like “developing respect and rapport” or creating a “culture of learning.” In June, its team embarked on a three-year project to develop videos showing best practices for how pre-K and kindergarten teachers and their principals can navigate the observation tool and evaluation process.

Early grade teachers and evaluators have access to extensive documents created by a group of early childhood stakeholders that outline multiple examples of what each component of the Danielson framework might look like in the early years. Hood says CSEP has received positive anecdotal feedback about the examples, but it has not collected systematic feedback on whether principals actually go back to their schools and use the tool. Principals also have access to trainings provided by the Illinois Principals Association and guidance created by the Illinois State Board of Education’s Performance Evaluation Advisory Council (PEAC) around PreK–3rd grade evaluation. 

While CSEP found that most teachers earn a “proficient” rating (the performance levels are unsatisfactory, basic, proficient, and distinguished) in Danielson, it is possible for the tool to differentiate early childhood educator performance. According to Hood, most of the challenges with Danielson in the early grades are “user-oriented issues, instead of with the framework itself. When used well and when people have strong understanding of early childhood practice, Danielson works well. When they don’t have this background, that’s when there’s an issue,” she says.

District of Columbia Public Schools

District of Columbia Public Schools (DCPS), an often controversial pioneer in teacher evaluation reform, uses IMPACT, a teacher evaluation system it created itself. IMPACT has been around since 2009, but a separate rubric to evaluate pre-K and kindergarten teachers that more accurately reflects developmentally informed practice was created in 2011. The preK–K rubric was updated this year through a collaborative process that involved content area experts weighing in to ensure that it is appropriate for the youngest learners. The rubric includes the same broad practices as those of the older grades, but differs in the way that it describes their implementation.

The rubric for grades 1–12 focuses on what observers should see students doing, whereas the preK–K rubric is more focused on whether the teacher is creating the conditions to make learning possible. Accordingly, the early childhood rubric looks more at teacher actions than at independent student actions. As depicted in the examples below, the preK–K rubric evaluates teachers based on how well they encourage students to take certain actions or behave in certain ways, whereas the 1–12 rubric rates teachers for how well students take certain actions independently. The early childhood rubric mentions the importance of learning environments and how teachers can encourage meaningful work and play, rather than just work. The grade 1–12 rubric makes no mention of learning environments or play. Furthermore, the preK–K rubric gives specific guidance that observers should “consider students’ developmental age when assessing” certain practices.

According to Stephanie Shultz, who works on IMPACT, the preK–K rubric aligns with the “context and structures you are most likely to see with young learners, such as station-driven learning, play, morning meeting, etc.” It also emphasizes language development, which is a crucial component of learning at this age.

The stakes are high for DCPS teachers: the observations make up a majority of evaluation scores for all preK–12 teachers. This year, for the first time, school principals will be the only ones using the rubrics to evaluate teachers; in the past, external observers (“master educators”) have played a prominent role. Hiring principals who are instructional leaders is a priority for the district, and all principals receive extensive training and support from the IMPACT team to become familiar with the tool. It’s important that this training enables principals to distinguish high-quality instruction in a kindergarten class from high-quality instruction in a fourth-grade class.

Shultz says DCPS has “received a lot of appreciation from early childhood teachers, who say they ‘see themselves in the rubric’ and appreciate the distinction between this rubric and the one used with other students.” The district should consider extending the preK–K rubric into the first through third grades to reflect the full continuum of early childhood.

Teacher evaluation systems should reward good teaching and promote improvements in practice. Teaching young children requires different skills and strategies than those for older children, and the best observation tools acknowledge those differences. It can be difficult for a single tool to meet the needs of a teacher who is reading stories about floating boats and another who is teaching the law of conservation of mass, but specific guidance for teachers and observers on how standards and rubrics can be tailored for the early grades is one way to help ensure that evaluations accurately capture the quality of teaching.