The ‘Race to the Top’ Winners: Evaluating Quality Ratings Systems

Laura Bornfreund

Feb. 8, 2012

This is the second post in a series on winners of the Race to the Top – Early Learning Challenge (RTT-ELC), the Obama Administration’s competition to spur improvements in early learning for children up to age 5. Earlier this week, we wrote about states’ plans to use quality rating and improvement systems (QRIS). In this post, we’ll look at how states plan to evaluate those systems. Later posts will explore plans to improve early learning standards, develop the early childhood workforce and implement kindergarten entry assessments.

State and federal policymakers seem to have bought into the idea that rating systems are the best mechanism to encourage early learning programs to improve their quality and assist parents in selecting centers that will best prepare their children to learn.

The weight given to Quality Rating and Improvement Systems (QRIS) in the Obama Administration’s Race to the Top – Early Learning Challenge is a prime example. Applicants had to show that they were using or developing “tiered” systems that showcased varying levels of quality, with exemplar centers earning, say, 5 stars, and low-quality centers not earning any stars at all.

But we actually don’t know much about how successful QRISs are at doing those things. Up to this point, there has been limited research on the effectiveness of quality rating systems as a strategy for improving childcare center quality and even less research on improving learning outcomes for children, especially based on school readiness measures.

So, to have the most impact on early learning, are the investments in QRIS justified? In a recent article for The New Republic, Sara Mead shares some concerns:

“Unfortunately, there’s not much evidence that creating QRIS will produce any significant improvements in children’s readiness to learn. Because many of these programs are relatively new, there is little research on their effectiveness. The research that does exist is not encouraging: A study of Colorado’s acclaimed Qualistar QRIS by researchers at the RAND Corporation found little to no evidence of a relationship between childcare programs’ star ratings and child outcomes.

“Further, increased use of QRIS could have a number of unintended negative consequences. Since the proposed rating system places a heavy emphasis on costly inputs like classroom furnishings and teacher education, QRIS could drive up costs at a time when many families are already struggling to afford child care and cash-strapped states are ill-equipped to make large new investments.”

The Early Learning Challenge does provide the opportunity for more rigorous research on the effectiveness of QRIS to improve quality and learning outcomes for children, including high-need children, but at the same time requires states to put a significant amount of resources into planning, developing, implementing and sustaining their QRIS. Hopefully the evaluations will provide evidence of improved program quality and children’s learning outcomes and not that the costs of the system outweigh the benefits.

Evaluation of tiered QRISs was one of five components under the “core area” in the RTT-ELC application titled “High Quality, Accountable Programs.” Specifically, states were asked to:

Validate the effectiveness of their QRIS in at least two ways:
- Whether the tiers accurately reflect differential levels of program quality; and
- The extent to which changes in quality ratings are related to progress in children’s learning, development and school readiness.

Including student outcomes has not been required of QRIS in the past. Most states’ systems have focused primarily on inputs such as teacher qualifications, child-to-staff ratios, curricula, program administration, physical environment and the like. Some or all of these inputs likely have a positive effect on student learning, but policymakers need evidence to back up that theory.

Reviewers of states’ applications gave especially high marks to three states:

A Core Area of the RTT-ELC Application: High Quality, Accountable Programs
Sub-section	Point Value	High-scoring States
Validate the tiered QRIS	15	Michigan* (15), North Carolina (15), Pennsylvania* (14.2)
Key: * = non-winners

Note that Michigan, a non-winning state, earned the full 15 points for its plan too. Michigan came in ninth place for the “High Quality, Accountable Programs” core area, but did not make it into the winners’ circle overall. The other winners ranged from California with 9.8 points to Delaware with 13.8 points. There were several other non-winning states that fell within this wide range, including Illinois, Oregon, New Jersey, Colorado and Nebraska. Those five tied with or scored just below Delaware.

Among the winners, North Carolina was the only state to earn a perfect score for its validation plan. Remarking on the state’s plan, one peer reviewer said, “NC has a strong plan to build on previous validation studies of the NC TQRIS to demonstrate that the tiers reflect meaningful differences in quality.” Another reviewer noted, “Their process evolves from prior validation study of their TQRIS and builds upon it to gather additional and important information to link quality early learning and development programs to great child outcomes.”

Sampling of past research in winning states on QRISs

As noted, the evidence base for the impact of QRIS is thin so far. But there have been some studies of rating systems that include information on children’s performance. Here are summaries of the research that states included in their RTT-ELC applications. (We did not review the complete studies.)

A three-year independent evaluation was conducted on Minnesota’s “Parent Aware” QRIS pilot. One of the areas researchers measured was children’s progress over time. Evaluators recruited 4-year-old children from rated programs and assessed them in the fall and spring using multiple assessment tools that included the same domains as Minnesota’s kindergarten readiness assessment. Researchers reported that children, overall (the results were not linked to individual children), made significant gains from the fall to the spring on measures of expressive and receptive vocabulary, print knowledge, phonological awareness, reduced anxiety/withdrawal and persistence, but no gains were observed in early mathematics. Researchers, however, did not compare children’s progress in rated programs with children’s progress in unrated programs. Minnesota also analyzed the tiers of its QRIS to determine whether they accurately differentiated quality levels. While fully rated 4-star programs outscored programs at other levels on several observation-based measures of the learning environment such as the CLASS and ECERS-R, in some cases there was not great variation between 3- and 4-star programs.

North Carolina has conducted two separate validation studies of its QRIS, the first in 2001 and the second in 2010. For the 2010 study, the University of North Carolina-Greensboro collected data on toddler and preschool classrooms from childcare centers at all five star levels. Researchers found that classrooms in 4- and 5-star programs received significantly higher scores on observation-based assessments than classrooms in 1- through 3-star programs, but found no significant differences between 4- and 5-star programs. This finding is the driver of North Carolina’s RTT-ELC validation study, which focuses on differentiations of quality in its upper tiers.

UNC-Greensboro also looked at relationships among star ratings, quality measures and children’s social and cognitive skills and emotional experiences. Researchers found that, in classrooms that received high scores on CLASS and ECERS-E (two observation-based assessments of the learning environment), preschool children demonstrated more ability to distinguish what they know from what they see. Children also exhibited more flexible thinking when their outdoor environments were rated at a higher level. Toddlers were perceived to have fewer behavior problems in high-scoring classrooms. In its application, North Carolina said its next steps would be to examine progress over time in a broader range of skills and further differentiate children’s development, learning and readiness for school at each star level.

Ohio has had three independent evaluations of its QRIS, “Step Up to Quality.” The most recent, in 2011, found that after controlling for family characteristics and children’s demographics, children in tier 3 programs (Ohio’s highest tier at the time) performed better on measures of literacy- and math-based standardized tools than children attending programs in lower tiers. Additionally, when matched with a sample of 12 non-rated programs, rated programs scored much higher than non-rated programs on many of the teacher quality and child outcome measures used. This may be some of the best data out there showing an impact on school-readiness measures, but the study was relatively small, including two teachers and five children from each classroom in 36 programs. During this study, evaluators asked parents’ permission to link individual child data collected during the study to the child’s results on Ohio’s current kindergarten readiness assessment, which focuses on literacy. These results weren’t in as of the submission of Ohio’s application. And for the RTT-ELC grant, Ohio is expanding the scope of its kindergarten assessment.

A sampling of what winners plan to study and how they plan to do it

For its evaluation, North Carolina plans to focus on the upper tiers of its QRIS. The study will also look at family child care (excluded from previous studies) and include infants and toddlers. And the state plans to document program features most closely associated with differences in outcomes among children with high needs and identify quality features that distinguish programs of the highest quality from the rest.

Massachusetts intends to validate the “self-assessment” that must be completed by programs participating in its QRIS. The state plans to “externally validate” whether the results of the self-assessments are good indicators of differences between programs at Levels 3 and 4. Massachusetts plans to audit programs in the study sample to determine whether the self-assessments are accurate, identify inaccuracies and explore the reasons for inaccuracies with the respective program directors.

Maryland plans to conduct two studies. First, for all programs participating in Maryland’s system, called EXCELS, evaluators will collect program, staff and child-level data on quality indicators, professional development and children’s learning progressions at multiple times during the pre-K years and at the start of kindergarten. These data will be used to provide a snapshot of Maryland’s programs, as well as look at the relationship between quality indicators and children’s learning.

The second study will take a deeper look at a randomly selected sample of state-funded pre-K, preschool centers and home-based programs. It will study two separate cohorts of children that begin participating in EXCELS centers in 2013 and 2014. Evaluators will collect data from classroom observations and parent and teacher surveys and use it to identify the associations between teacher-child interactions, measured using CLASS, and children’s learning. According to Maryland’s application, using two different cohorts will allow researchers to capture and evaluate change in program, classroom and child-level data longitudinally as programs begin their second year of participation in EXCELS. Additionally, by following a cohort of classrooms longitudinally using Maryland’s Early Childhood Data Warehouse and Longitudinal State Data System, researchers will examine the impact of EXCELS indicators and CLASS scores on Maryland’s kindergarten readiness assessment and the Maryland State Assessment that children take in later grades.

Once again California is an outlier. Because it won’t have a statewide QRIS, California officials don’t see evaluation of every region’s QRIS as feasible. So the state will select a subset of consortia for validation by an independent evaluator. Peer reviewers expressed concerns with this approach, noting “The consortia that will be evaluated may not represent the diversity from all the consortia.” Additionally, the structure and tiers of each region’s QRIS could end up vastly different. Without evaluating each system, there’s little way to tell if they have comparable results for improving program quality.

The results from states’ evaluations will surely benefit the field. They will provide more information on whether these rating systems are an accurate measure of quality, particularly in those states examining which quality indicators seem to matter most. The studies could also provide important insight into how program quality affects children’s learning and development outcomes and kindergarten readiness.

Look for our next post in this series, which will explore states’ plans to improve early learning standards.

For more on how QRISs work, see our 2009 issue brief as well as resources from the National Association for the Education of Young Children, the Early Learning Challenge Collaborative and the Center for Law and Social Policy.

Also be sure to visit our special page on the Race to the Top – Early Learning Challenge for continuing coverage.
