Evaluating the Design of Quality Rating and Improvement Systems

Blog Post
May 14, 2014

State and federal policymakers know that simply providing children with access to preschool programs isn’t enough to prepare them for kindergarten. The quality of those programs matters too. In an effort to monitor and improve the quality of their child care and early education programs, 49 states and the District of Columbia have implemented quality rating and improvement systems (QRISs), or are planning such systems, on a statewide, regional, or pilot scale. (In Missouri, legislative action is required to implement a QRIS.)

QRISs evaluate child care centers and preschool programs against a set of indicators, such as staff education or teacher-child ratios, and assign programs a quality or “star” rating, typically on a four-point scale, based on their performance against those measures. In theory, a higher rating level indicates a better quality program, but only if the different rating tiers reflect meaningful differences in program quality.

Comparing the effectiveness and outcomes of different QRISs is challenging, though, because the content and structure of these rating systems vary by state.

A new study from the Office of Planning, Research, and Evaluation (OPRE) at the U.S. Department of Health and Human Services offers some insight into the ways structural differences affect the function and effectiveness of different QRIS models. Specifically, researchers determined that the structure of a QRIS affects the distribution of programs across the system’s different rating levels and also influences how rating levels correlate with differences in observed program quality.

The report, Implications of QRIS Design for the Distribution of Program Ratings and Linkages between Ratings and Observed Quality, focuses on three types of QRIS structures:

  • Block structure specifies a set of standards for each quality level. To progress to a higher tier, a program must satisfy all of the standards of that new level and the ones below it.
  • Points structure assigns points to each quality indicator and programs receive a total quality score. This structure assigns a range of scores to each rating tier.
  • Hybrid structure combines elements of the block and points structures. While approaches in this model vary, this system typically uses a block structure for the lowest rating tiers and assigns points for the upper tiers.
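The difference between the three structures can be made concrete with a short sketch. The indicator names, thresholds, and point values below are purely illustrative assumptions, not the actual standards of any state's QRIS or of the OPRE study; the sketch only shows how the same program can land in different tiers depending on the scoring rules.

```python
# One program's performance on four hypothetical quality indicators,
# each scored 0-3. (Indicators and scores are invented for illustration.)
program = {
    "staff_education": 3,
    "teacher_child_ratio": 1,
    "family_partnerships": 1,
    "health_and_safety": 3,
}

def block_rating(scores):
    """Block: a program must meet ALL standards at a level (and every
    level below it), so its rating is capped by its weakest indicator."""
    return min(scores.values())

def points_rating(scores):
    """Points: sum points across indicators, then map the total score
    to a tier using illustrative cutoffs (max possible total here: 12)."""
    total = sum(scores.values())
    for tier, cutoff in [(4, 10), (3, 7), (2, 4), (1, 2)]:
        if total >= cutoff:
            return tier
    return 0

def hybrid_rating(scores):
    """Hybrid: block rules gate the lowest tiers; points decide the rest."""
    if min(scores.values()) < 1:       # fails a baseline block standard
        return min(scores.values())    # stays in the bottom tiers
    return max(2, points_rating(scores))

print(block_rating(program))   # -> 1: the weakest indicator caps the rating
print(points_rating(program))  # -> 3: strong indicators offset weak ones
print(hybrid_rating(program))  # -> 3: past the block gate, points take over
```

The same program rates a 1 under the block structure but a 3 under the points and hybrid structures, which mirrors the report's core finding that ratings depend heavily on structure.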

In the OPRE study, researchers developed a uniform set of quality indicators and created a hypothetical QRIS model for each of the three structures using a subset of data from the Early Childhood Longitudinal Study-Birth cohort (ECLS-B). ECLS-B is a nationally representative sample of about 14,000 children tracked from their birth in 2001 through kindergarten. The hypothetical models allowed the researchers to evaluate the same preschool programs against the same quality indicators across the three different QRIS structures and isolate the effects of structure alone.

“Overall, programs received lower ratings in the block structure, higher ratings in the points structure, and middle to high ratings in the hybrid structure,” according to the report. In the block structure, 83 percent of programs received a 0, 1, or 2, the lowest possible ratings, while under the points structure, 81 percent of programs received a 3 or 4, the highest possible ratings. Essentially, the results demonstrate that a preschool program could receive a drastically different rating depending on the QRIS used to evaluate it.

Part of the variation stems from how the quality indicators responded to the rating systems: different quality indicators demonstrated different scoring patterns depending on the structure of the QRIS. For instance, programs generally received high scores in areas of health and safety, assessment, and accreditation and low scores on family partnerships regardless of the QRIS structure. Program scores for teacher and director qualifications, however, varied depending on the structure of the QRIS. Under the block structure, most programs received high scores in those areas across all rating levels, despite low overall program ratings. The points and hybrid structures, meanwhile, reflected more incremental increases in the scores for teacher and director qualifications between the rating tiers. One would expect this pattern “based on the scoring rules for points and hybrid structures,” as the report explains, “but it also highlights how a low overall rating in the block structure can ‘mask’ high scores in individual quality categories.” That means a program with a low overall rating still could have highly skilled teachers, but the rating system might not reflect that.

Finally, the researchers looked at the correlation between observed classroom quality, as measured by the Early Childhood Environment Rating Scale-Revised (ECERS-R), and rating levels to determine whether different rating levels truly reflect significant differences in program quality. The ECERS-R assesses various classroom components, from the physical environment to basic child care to interactions between staff, children, and parents, and is used widely to measure program quality.

Although the researchers found that higher ECERS-R scores correlated with higher program ratings regardless of QRIS structure, “the points structure was the only structure in which observed quality was significantly different between each level,” the report states. Furthermore, the points structure was the only structure that demonstrated significant quality differences between the top two tiers. These findings have meaningful implications for states, particularly those involved in the Race to the Top-Early Learning Challenge (RTT-ELC). RTT-ELC requires participating states to validate their QRISs to show that their rating tiers reflect meaningful differences in program quality.

The question remains whether existing indicators, like ECERS-R scores, are even the most reliable measure of program quality. RTT-ELC also requires states to show that differences in their quality ratings relate to progress in children’s learning and development. A recent study by Terri Sabol of the Institute for Policy Research at Northwestern University and Robert Pianta of the Curry School of Education at the University of Virginia found little evidence that ECERS-R program scores relate to children’s development. In fact, Sabol and Pianta found no significant connection between ECERS-R scores and children’s academic, language, and social-emotional development at kindergarten.

As states continue to refine their quality rating systems, they must ensure that the components they measure relate to children’s learning and development and that the established rating tiers represent genuine differences in those indicators. After all, if the ratings do not reflect significant differences in program quality, then the QRIS is meaningless.