School Reform: Testing and Data — Does the Tail Wag the Dog? Part A

Summary: Testing and the data collection it produces can be a useful tool in school reform, but serious questions persist about whether, in this forest, it is too easy to lose perspective and end up magnifying test results beyond their legitimate utility. The first of two posts exploring this theme.

Third grade room, state test time, elementary school USA.

Charlene, a teacher with nineteen years of experience at various grade levels, has arrived at school early to move the desks in her room from the study clusters in which they are normally arranged into rows that maximize the distance between students. She places dictionaries at strategic points around the room. On each desk she arranges materials obviously related to the test (a pencil, an eraser, a protractor) but also a water bottle, a granola bar (chosen because it can be opened quietly and provides energy), a “smencil” (a pencil that smells of peppermint, which is known to enhance focus), and a stick of gum, also known to enhance focus. The latter trio are not your father’s test materials.

Charlene then spreads white butcher paper over the materials on the classroom walls that have been used to teach critical items on the day’s test, and fastens the covering with staples and masking tape. The visual effect is as though the room has been sanitized.

On the classroom door she hangs a sign, “Testing in Progress – Do Not Disturb,” though it would be hard for anyone approaching her classroom, if allowed in the building at all, not to know that the room should be entered quietly or not at all.

Finally, she pulls the test booklets and answer sheets from the box in which they came, and makes sure each student has a test with their name and identification number on it.

Boys and girls begin to spill into the classroom. Charlene uses the mass movement as camouflage and reminds a few students that they are to go to another room. For some, a smaller environment with fewer distractions will help them perform to their capacity on the test.

The bell rings. After some initial confusion the students settle down as they figure out the altered seating arrangements. They buzz at the gum, the granola bar — anxious noise. Charlene raises her hand, and the room becomes quiet.

The testing session begins.

In the progression from poor national test scores to national conversation to politics to elections to policies, the pulses of culture land in real classrooms, seeking data with which to measure progress and guide change. They have made scenes like the one just described ubiquitous across schools of all types, replicated frequently throughout the school year with varying intensity.

Though voices are raised about the cost of such obsession, to my ear they have smacked too much of romance and are too little backed by articulate research. They lack the intellectual rigor that the testing itself, in fits and jerks, promises to produce, and that leaves testing the less vulnerable to challenge.

However. Questions of creativity, of art and music, of critical thinking, and of patriotic, civic, and historical instruction are valid, though they are marginalized in the current troubled educational environment, which is driven by a testing culture that emphasizes basic skills and sometimes seems to have orphaned the more elusive liberal arts.

Then there are the haunting voices from Finland suggesting we have gotten it all backwards, anyway.

Given the combined concert of quality choirs such as these, it is prudent to question the singular dominance of testing, to restrain our thrall at the march of data, and to seek a balanced perspective. Best not to have a tool wag the question.

As often, David Brooks offers a perspective that I lean on somewhat in this caution. In his column “Driven to distraction by big data,” as published in the February 20 Seattle Times, Mr. Brooks takes several philosophical steps back to gain perspective. While acknowledging variously the utility of data (and I might add particularly as computation power grows), he offers caveats about the limits of its use.

In Brooks’ analysis, data “excels at measuring the quantity of social interactions, but not the quality.” Data gathered through classroom observation might yield the number of times a teacher interacted directly with individual students. In the hands of an astute recorder, it might even be sorted into categories that crudely reflect the quality of those interactions and that could be used in evaluation.

But compare such data with the perspective of an evaluating principal, who experiences the personality of the teacher in many settings, informal and formal: in conversation, in interactions with other teachers, in the classroom, and so forth, day in and day out. The depth of perspective of a socially adept principal, experienced in assessing a teacher’s ability to communicate with colleagues and students alike, will surpass the relatively thin ability of data to assess the same skill.

Brooks again: “Data struggles with context.” “People are really good at telling stories that weave together multiple causes and multiple contexts.” For example, again the principal might note that a given teacher has a positive effect on colleagues, is a kind of keystone communicator or mentor in a particular sector of the school, and might in such a way bolster the test scores of students beyond her classroom.

Similarly, a principal might observe, where data could not, that a teacher serves as a magnet for marginal kids, and that it is the relationship with that adult that keeps those kids in school even if they are not greatly productive there. Possibly this parallels the effect reported in Head Start studies, which find no discernible lasting benefit from Head Start for students’ academic skills, but do show that Head Start participants were more likely to graduate from high school and less likely to be involved in crime.

Nonetheless, the march of data and in particular our ability to manipulate it via computers has been formidable.

In medicine, the superior ability of machines to incorporate and analyze data will bring new precision and predictability. New machine algorithms will enhance the diagnostic skill of doctors themselves, and will allow technical assistants to make some diagnoses that are now primarily the province of doctors, who are increasingly in short supply and expensive. In the March 2013 Atlantic, Jonathan Cohn (“The Robot Will See You Now”) reports that Watson, the supercomputer that bested former champions at Jeopardy!, is being taught to diagnose medical conditions by absorbing the medical literature along with the data stream from individuals’ medical tests and self-reports. In time Watson and its cyber descendants will become a critical adjunct to teams of doctors, nurses, physician’s assistants, and other holders of technical licensure. These innovations may yet play a role in reducing the cost of medical care as the Baby Boomers move toward the years when they will suck up medical care dollars.

Since I write from Florida, and under the influence of baseball’s spring training, allow me to point out as a frivolous example that baseball has its sabermetrics, a form of data mining famously championed by Billy Beane, general manager of the Oakland Athletics. Beane, challenged by the parsimonious budget provided by Oakland ownership, transformed a traditional old-boy network of expertise by unearthing player talent through sophisticated data analysis, talent that in enough cases old-school shoe leather had detected only imperfectly.

Note that in both examples, baseball and medicine, the use of data becomes innovative, and action is taken, when the problem looms in the near term, whether that problem is the need to compete with wealthier teams or the medical and fiscal urgency of the entitlement crisis as the boomers arrive at the point of their frailty.

Education has not escaped the trend.

(To be continued.)

