We Need to Assess Assessments

Fredrik deBoer

March 24, 2016

Recently, I’ve found myself defending the value of quantitative assessment to humanists. This is perhaps not unusual—I am sure that many an English professor has scoffed at political scientists’ data-driven research—except that I myself am a humanist.

I began my doctoral education in Purdue University’s Rhetoric & Composition Program in August 2011. Rhetoric & composition is a subfield of English that concerns argument, persuasion, and the teaching of writing. The field tends toward qualitative work such as ethnography or narrative research. Quantitative work is relatively rare. And the humanities writ large have often been skeptical of assessment, thanks to fears that the values and less utilitarian intellectual abilities the humanities are intended to inculcate are not easy to assess. At the same time, I am an advocate for a robust engagement with assessment by scholars in the humanities. At a time when our programs and funding are at risk, we have to demonstrate our value to our institutions and communities, including in terms that may not always come naturally to us. We also have a responsibility, given the cost and importance of a college education, to ensure that we are delivering on our responsibilities when it comes to student learning. For this reason, I have long tried to engage with members of my disciplinary community about the importance of careful assessment of our programs. To put it very simply: When it comes to higher education, measurement matters.

But because measurement matters, when I’m not speaking to fellow humanists, such as when I work and speak with those in educational testing or experimental psychology, I often make the opposite case, calling for greater humility in our assessment efforts and urging caution as we proceed. Assessment is a touchy subject for a reason: In a very direct and real way, assessment asks what we value as educators and as institutions. To undertake the assessment process without care risks making decisions that are unfair to students and teachers alike. Any new systems of higher education assessment must remain aware of a set of limitations and problems that can influence our analysis. And our current system of assessment is indeed limited and problematic.

For example, consider the use of broad tests of critical thinking or general learning to assess departments, majors, or programs, such as those discussed in my recent paper for New America, “Standardized Assessments of College Learning: Past and Future.” These tests, typically administered by nonprofit and for-profit testing companies that also develop tests for K-12 education, tend to forgo specific disciplinary knowledge in order to make them applicable to entire student bodies; you can hardly blame a nursing student (or a nursing department) if he or she hasn’t learned much about agricultural economics. Tests like those described in my paper are therefore designed to measure cognitive abilities and skills that are not subject-matter based but contribute to more general competencies, such as abstract reasoning, quantitative literacy, the ability to persuade, or reading for deep understanding. Ideally, this broader focus allows such tests to give us useful information about the mental flexibility and adaptability of college students, which they will surely need in a constantly evolving workplace, in whatever field that may be.

The trouble is that this same diffuse nature makes it very difficult to ascribe a causal relationship between particular classes and particular abilities. Though a student might gain useful knowledge about computer science outside of the classroom, we can feel confident that a coding test delivers meaningful information about how much the student has gained in their computer science courses. The broad nature of general instruments like those discussed in my paper, however, makes this kind of analysis difficult or impossible. College students, after all, learn in many different contexts. They develop intellectually in their major classes, in their minors, in their electives, in their extracurricular activities, on study abroad, in their time at the library, and in their pleasure reading in the dorms. Indeed, even students of college age who don’t attend school at all can be reliably expected to see some cognitive development simply through the maturation of their brains and the general advantages of age and experience. While we might reasonably decide that certain students show an inadequate amount of learning from freshman to senior year in total, we generally cannot say with fairness or accuracy whether a given major or department is doing an adequate job of teaching via such tests. We will instead have to use a concert of methods, including disciplinary assessment and post-graduation surveys, to make such determinations.

Consider the impact of differences in prerequisite ability between student bodies on an institutional level. The competitive landscape of college admissions, after all, is defined by the great lengths exclusive colleges go to in order to secure profoundly dissimilar incoming student bodies. We should be clear in pointing out that attending an exclusive college is quite rare; of the approximately 3,000 accredited 4-year institutions of higher education in the United States, perhaps 125 reject more students than they accept. A large majority accept almost every student who applies. Still, selective colleges attract the lion’s share of attention in our educational debates, and these schools are among the schools most likely to be “cross shopped” by students and parents.

The evidence tells us that the admissions process clearly has a large impact on the incoming ability of a given college’s student body, which in turn has serious consequences for how we evaluate performance on standardized tests. The developers of two of the tests discussed in my paper, the Council for Aid to Education (which develops the CLA+) and the Educational Testing Service (which develops the Proficiency Profile), have each released data that demonstrate a strong relationship between an institution’s average incoming SAT scores and that institution’s performance on these tests. We can explain 75 percent of the total variation in institutional averages on the CLA+ through reference to that institution’s average incoming SAT score, and this holds mostly true for both freshmen and seniors. ETS data on the MAPP (the old name for the Proficiency Profile) showed an even closer relationship. In both cases, how well schools performed on the test of college learning was strongly predictable from how well their students performed on a high school test.
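To put the 75 percent figure in more familiar terms, here is a rough worked conversion. It assumes (my assumption, not something the test developers' reports are quoted as saying here) that the figure is the R-squared from a simple linear regression of institutional CLA+ averages on average incoming SAT scores, with SAT as the only predictor. Under that assumption, the implied correlation and the share of variation left unexplained are:

```latex
R^2 = 0.75
\quad\Longrightarrow\quad
|r| = \sqrt{R^2} = \sqrt{0.75} \approx 0.87,
\qquad
1 - R^2 = 0.25
```

Read this way, roughly a quarter of the variation between institutions is left for everything else, including instruction, to account for.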

It’s important to note what this does and does not mean. This does not mean, crucially, that students are not learning. In both sets of data, robust gains were observed from freshman to senior administration. It simply means that the relative performance of schools remained rather static between freshman and senior year; schools with lower incoming SAT scores tended to have lower CLA+ and MAPP scores at both the freshman and senior levels. Institutional averages, meanwhile, will tend to be more static in this regard, while individual student results will naturally tend towards more variability. We should reasonably expect there to be a fairly strong relationship between these scores; after all, a given student’s own ability should necessarily be one of the most important determiners of their performance on any given test.

But we must also take care to understand that this difference makes comparisons between different institutions tricky. It would be unfair and invalid, given the determinative power of incoming SAT scores, to make a simple comparison between a more selective college’s scores and those of a less selective college and call this an indicator of their relative quality. We have ways to ameliorate these problems; for example, several of these tests utilize test-retest or value-added measures in an attempt to show growth over time. But as before, the key is to be judicious and skeptical in understanding our data and careful in making decisions based on that data.
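One common way to frame a value-added measure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method any particular test developer uses: it fits a straight line predicting each school's senior average from its incoming SAT average and treats the leftover (the residual) as "value added." The school names, the scores, and the helper functions `fit_line` and `value_added` are all hypothetical, invented purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def value_added(sat_means, senior_means):
    """Each school's senior average minus the average predicted from its SAT."""
    slope, intercept = fit_line(sat_means, senior_means)
    return [y - (slope * x + intercept) for x, y in zip(sat_means, senior_means)]


# Hypothetical institutional averages, made up for this example only.
schools = ["A", "B", "C", "D"]
sat = [900.0, 1000.0, 1100.0, 1200.0]
senior = [1010.0, 1090.0, 1150.0, 1270.0]

residuals = value_added(sat, senior)
for name, r in zip(schools, residuals):
    # A positive number means the school scored above what its SAT average predicts.
    print(f"School {name}: {r:+.1f} points relative to expectation")
```

In this made-up data, the school with the lowest raw senior average lands above its SAT-based expectation, while a school with a higher raw average lands below its own. That is the kind of reversal a simple comparison of raw scores would hide, though a residual like this is still only as trustworthy as the model and data behind it.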

None of this is intended to be nihilistic about assessment. As I said above, I have spent the past half-decade learning and researching assessment. Instead, I mean only to remind readers that objections to various aspects of assessment should be listened to with care and good faith, even as we remain committed to the overall project of understanding how much our students are learning. Too often, faculty concerns about assessment are seen as inherently political or self-interested in nature, derived from a self-protective instinct. We should see these concerns for what they usually are: an understandable and constructive call for fairness and equity as we try to navigate the considerable complexity inherent to the task of educational assessment.

New America Weekly
