School Politics: Is Teacher Evaluation Destined for the Rabbit Hole? (Part A)

Summary: Teacher evaluation reforms are meeting predictable headwinds: a sentiment that we don’t know enough about what we are trying to do, and worries about the time cost to the principals doing the evaluating. Meanwhile, the standardized tests we have come to regard as the sacred cow of teacher evaluation may no longer be up to the task.

Teacher evaluation reforms, in those states and communities at the cutting edge, are experiencing the kind of growing pains one would expect. In “States Try to Fix Quirks in Teacher Evaluations,” a New York Times article by Jenny Anderson published February 19, 2012, principals who conduct evaluations and other professionals from several states discuss their experience as they implement the new mandates they face.

According to Grover J. Whitehurst, a senior fellow at the Brookings Institution, state efforts to reform teacher evaluation

“are racing ahead based on promises made to Washington or local political imperatives that prioritize an unwavering commitment to unproven approaches. There’s a lot we don’t know about how to evaluate teachers reliably and how to use that information to improve instruction and learning.”

I like that phrase, “unwavering commitment to unproven approaches”; it is too often the mantra of the education game. The only corollary I would add is that the commitment lasts until the next exciting new thing shows up, often along with a new round of folks in power.

Presumably, the intertwined goals of teacher evaluation and improved instruction and learning include the question of how we use the results of standardized testing. Though such a metric makes intuitive sense on one level, the interwoven nature of a high school student’s experience, particularly in a school that emphasizes reading, writing, and math across the curriculum, raises open-ended questions about the degree to which the results in an individual English teacher’s classroom, for example, should be taken as indicative of the quality of his or her teaching.

Theoretically, standardized test scores would more clearly indicate the quality of a teacher’s work at the elementary level, where a student has one teacher. But how is a principal to moderate the evaluation of a teacher who has a high percentage of special education students or second-language learners, or one who had the misfortune to inherit a number of students who had a weak teacher the previous year?

How reliable are test results if tests have been dumbed down, as feared in some states, in response to the pressures of No Child Left Behind?

These are not easy questions, but they are the kind one expects as rhetoric is turned into real products whose tires are now hitting the road. As one of the principals quoted in Anderson’s article comments, “We’re building it on the fly.”

On top of these quandaries, and by the testimony of the principals Anderson quotes, the more pivotal concern is the time it takes them to implement the new teacher evaluations faithfully, at the expense of other critical duties.

That is, the new evaluations are enormously time-intensive, which shouldn’t surprise anyone, because proper supervisory relationships are at base human relationships, and delicate ones, ideally designed not to bludgeon but to promote growth. They take time to build: trust must be established and a growth process created, so that the relationship is collaborative rather than an equation in which one side intimidates and the other hunkers down.

These concerns echo some of my earlier comments (see post 4/9/12 “School Culture and Politics: Whither the Money?”) about the conundrum of student contact time versus money that we face in schools. The evidence from charter schools and the at-risk student literature, among other sources, argues that we need more staffing to increase contact time with the students who present our most difficult challenges. Yet we have still to convince voters, politicians, and even ourselves that we know enough about what we are doing to justify increased expenditures in a difficult economic and political environment.

So what is it that takes so much time for supervising principals, aside from the given that a successful set of such relationships needs careful tending? In Tennessee, according to the NY Times article, there are four areas of evaluation, broken into twelve categories, which in turn proliferate into 116 subcategories. Egad. Sounds a bit like a blueprint for a factory floor. It gets better. New teachers are observed six times a year and veteran teachers four times a year, and each time all 116 subcategories must be scored. The article reports that it takes four to six hours just to input the data (hopefully for all observations combined, not for each one), and then of course there are pre-conferences in which “teachers explain and show the lesson, and post-conferences in which feedback is given.”

The state of Tennessee has proven somewhat flexible in the face of administrators’ complaints about the time burden: the mandate to score all 116 subcategories has been relaxed, among other labor-saving measures. For my taste, when a system gets too unwieldy, it is prima facie flawed.

Disturbingly, and in true bureaucratic fashion, Tennessee state education officials, while willing to tweak the system somewhat, are sending the message that it will not change significantly. As in many other cases, the devil will be in the details. In a system as large as the one governing Tennessee’s schools, good ideas cannot wilt before mere whining at the grassroots level. But such firmness becomes destructive intransigence when the “whining” turns out to be based on widespread experience and cogent argument, in this case from principals in the middle of the fray.

Something will have to give, and the result may be counterproductive. The Tennessee legislature is now considering a bill to lower the evaluation score required for tenure from five to four or three. That starts to sound a bit like dumbing down the test.

One more time: when a system is unwieldy, or constructed so that it poorly defines the endgame, the politico-educational nexus finds ways to mitigate the harmful effects, but in the process may vitiate the original noble attempt. In this case, that attempt was to reform teacher supervision and evaluation, ultimately to improve the quality of teaching, and, to a subversive extent, to get rid of poor teachers.

In other states, specifically Delaware, Maryland, and New York, state-directed reform of evaluation procedures has been tabled or delayed in various ways because of the same issue: how do we evaluate teachers accurately within the time available? I suspect the answer will be a compromise, namely simpler, less time-intensive, and hopefully more efficient systems, until the political process and the voting public are ready to entrust schools with more money for enhanced staffing, in this case in the principal-cum-evaluator ranks.

This is not to say all is bad. Despite these grinding difficulties, I believe that in my own school, and by similar testimony from a Tennessee state education official, “rich conversations” about teaching are becoming more frequent, and instruction has improved by becoming more consciously oriented around specific goals and by drawing on data from everyday student performance. We may be becoming too concerned with the trees and forgetting the forest, that is, focusing on skills training to the detriment of broader educational goals such as critical thinking or the fostering of citizenship. That does not change the fact that we have to get much better at giving kids those very skills (reading, writing, and computation) without which they will be crippled in the marketplace.

To be continued next week.

