When big data is bad data

Disks of never-before-released data from the Department of Education landed with a befuddling thud in New York City’s newsrooms at the end of February. The swarm of spreadsheets had promised to provide a single ranking of 18,000 teachers (by name!) from zero to 99 based on students’ standardized test scores.

A bonanza for education reporters, right? Time to celebrate? Well, not exactly; not for me, anyway.

My intrepid journalism students wondered why I didn’t seem to share their enthusiasm for the data. Wasn’t I the same teacher who became semi-deranged when they turned in stories without any quantitative evidence? Think of the stories to be done, the fun graphics to design.

Here were not only reams of data, but hot data—from the center of a national controversy over how teachers should be evaluated. Adding to the buildup, the reports had been locked away for more than a year while a city judge refereed a high-octane legal fracas between the teachers union and the city over whether to release them. Nearly a dozen news organizations had become either witting or unwitting pawns in this dispute when they filed Freedom of Information requests for the data’s release.

“Isn’t it our job to bring information into the light, and let the public judge for themselves?” one student asked me. She had learned her lessons well.

Last year, I was certain what my killjoy answer would be: Just because you have data doesn’t mean it is always right to publish it, especially if you know the numbers are no good. And these numbers do have huge problems. Everyone from economists, to educators, to knowledgeable city education reporters knows that the arcane algorithms that generated the teacher-rating numbers are as statistically flawed as they are politically fraught.

The complex formulas are meant to measure how much value a teacher contributes to a student’s learning growth (or lack of growth) over time. It would be useful if they actually did. But the data are riddled with mistakes, useless sample sizes, flawed measuring tools, and cavernous margins of error. The Department of Education says that a math teacher’s ranking could be off by 35 percent; an English teacher’s by 53 percent. That means a reading teacher with a ho-hum 35 could either be as horrid as a 1 or as awesome as an 86—take your pick. What election survey with these kinds of gaping margins would be published in the papers?

Most damning, and most often ignored in the coverage, is that the sole basis for these ratings is old student tests that have since been discredited by the New York State Board of Regents. The 2007-2010 scores used for these teacher rankings were inflated, the Regents determined. The Department of Education had lowered the pass score so far that the tests had become far too easy. So not only were the algorithms suspect, but the numbers fed into them were flawed. News organizations that publish them next to teachers’ names run the risk of not only knowingly misleading the public, but also of becoming entangled in the political web surrounding teacher evaluations, which extends from the mayor’s office, to the state house, to unions, philanthropy board rooms, and to the White House.

And yet, nearly every city news organization went ahead and printed them anyway.

To my mind, all reasons not to publish still exist. They are still true. But in the last month, I’ve come around to an opposite, perhaps more cynical, conclusion about the virtues of making them public. Publishing them, it seems to me, has had an odd, clarifying effect. Releasing the data to public scrutiny, alongside context and caveats, has exposed just how flawed they really are.

Apparently, the public has received that message. A Quinnipiac poll released in mid-March showed that 58 percent of the respondents approved of releasing the teacher data reports, while at the same time 46 percent believed they were flawed. The more the public sees, the less enamored they are (Go, Journalism!).

Perhaps that was what philanthropist Bill Gates feared in February when he lectured the media days before the data were released. In a February 23 op ed in The New York Times, “Shame is not the Solution,” Gates warned news organizations not to humiliate teachers by publishing their names next to their value-added rankings.

What was up? Gates is usually bullish on the use of test scores to evaluate teachers. Teachers’ feelings had rarely been a top concern. Yet, he argued, correctly, test scores were not “a sensitive enough measure to gauge effective teaching” all by themselves.

Some chose to believe Gates had finally come to appreciate the complexities and nuances of good teaching, something that cannot be boiled down to a number based on students’ tests. But if that were the case, he would have backed away from endorsing the value-added formulas. Instead, he advocated only that they be kept from the public. It’s more plausible to me that he is worried the public will turn against the test-driven accountability agenda promoted by his foundation—which has fueled policy for the last several decades—as these not-ready-for-primetime rankings are scrutinized to death. Whatever the case, it was an unexpected signal from one of the nation’s most influential data-driven reformers.

Even more surprising was the detour taken by Arne Duncan, the US Secretary of Education. Last year, Duncan applauded the Los Angeles Times for being the first newspaper to print rankings (its own) next to teachers’ names. He encouraged other news organizations to do the same. Recently, he posed and answered his own question to Education Week’s Stephen Sawchuk: “Do you need to publish every single teacher’s rating in the paper? I don’t think you do.”

All of the above

Obviously, the mood in recent months had been dialed back from fever pitch to tepid, possibly in recognition that teacher-bashing as a reform strategy had seen its best days.

Last year, New York Mayor Michael Bloomberg and Joel Klein, then the city’s Chancellor of Education, gave these numbers their high-five support. Even after Klein left to head the education division of Rupert Murdoch’s News Corp., his city department went so far as to encourage local news organizations to make sure they FOILED for the teacher data in a timely fashion—with names. And back then the department responded to their requests with uncharacteristic speed.

At the New York Daily News, Deputy Editor Arthur Browne said he discussed releasing the reports with department officials several times last year when he was editorial-page editor, in response to the bold Los Angeles Times move. “There wasn’t any resistance on Tweed’s part to us getting them,” Browne said in a recent interview, referring to Tweed Hall, city schools headquarters. Pressure mounted among the city’s education reporters—who were nearly all opposed to publishing the data. Some even threatened to quit over it. Just as the drama began to boil over, the United Federation of Teachers filed a lawsuit to stop the release.

That left Klein’s laid-back successor, Dennis Walcott, holding the data bag more than a year later, after the union’s court appeals had run their course. On February 24, Walcott finally released the reports, all wrapped in caveats and finger wagging. “It would be irresponsible for anyone to use this information to render judgments about individual teachers,” he wrote in a Daily News op ed that same day. He repeatedly reminded readers that the numbers were two years old, and should never be used in isolation. “I’m deeply concerned that some of our hardworking teachers might be denigrated in the media based on this information. That would be inexcusable.” Clearly, his heart wasn’t in it.

Walcott’s sheepish tone mystified news editors as their staff scrambled to build apps and technical platforms to house the numbers. “It was disarming,” said Mary Ann Giordano, editor of the New York Times’s SchoolBook.org. “Walcott was stepping back from these numbers, blaming news organizations if they published names.”

Editors had to work fast to decide what to do. Publish the raw spreadsheets? Take the numbers at face value and march out the best and worst teachers, one by one? Write thoughtful critiques of all the downsides of the data and publish them anyway, next to teachers’ names? Or refuse to publish at all, for fear of misleading the public with faulty figures and maligning teachers with bogus data?

Maybe it was the robust news climate, or all the tangled messages from on high, but for New York’s media, the answer was: all of the above.

Some outlets rose above the fray and refused to go near the reports on principle. The local Riverdale Press and two citywide online news services—InsideSchools.org and GothamSchools.org—all took the high road. “No amount of context could justify attaching teachers’ names to the statistics,” wrote Elizabeth Green, GothamSchools.org’s editor, in a column she had prepared a full year earlier. By contrast, NY1, the local 24-hour cable news television station, downloaded the entire Department of Education spreadsheet collection, which included three years of scores and more than 100 data points per teacher.

The New York Post surprised no one by taking the most reckless road of all, galloping through the numbers as if they represented reality, scooping up names for its gallery of the “best” and the “worst” teachers. Its editors and reporters did not bother to dwell on the caveats and nuances, or even to include, at least at first, each score’s margin of error. (It added the intervals later.)

The low point was on day two, when the Post ran a photo and story about Pascale Mauclair, the city’s so-called “worst teacher,” thus handing the union its first real teacher data report martyr.

The teachers union reported that Mauclair’s father opened his Queens apartment door the first day of the public release to find Post reporters telling him his daughter was the worst teacher in the city. Next, reporters found their way to his daughter’s apartment. She called police. Reporters turned to neighbors for comment. The Post story the next day placed Mauclair at the “bottom of the heap,” among those who do “zero, zilch, zippo” for students.

The backstory of her score does more to undermine the validity of the stats than the Post had in mind. Its reporters might have spent their time digging into the calculations behind her zero rating, by interviewing Mauclair’s principal and colleagues at PS 11 in Queens, where she taught small sixth-grade classes of recent immigrants. Her students do not speak English. It’s not uncommon for some to take the state exams after being in her class for only a few months. The union says her score was based on 11 students, only 7 of whom had enough data to compute a real report, a meaningless sample size by any measure. Her fellow teachers, parents of students, and her principal were nonplussed. “I would put my own children in her class,” Principal Anna Efkarpides told Leo Casey of the UFT. The Queens school is consistently one of the highest performing schools among similar schools, and Mauclair is one of its top teachers. “The truth is the truth.”

By contrast, the Daily News managed to steer clear of its rival’s instinct to tick off the 10 worst and 10 best. In many ways it exhibited the most caution of all the city newspapers, by weeding out all those teachers whose rankings were based on only one year’s worth of classes.

Last year, the News’s Arthur Browne told me that the data was obviously not perfect, but that was no reason not to publish. This year, he had apparently done more homework. “We were leery of naming names if we couldn’t be invested in the accuracy of them,” said Browne, who also edits the op ed page. “We screened out the biggest problems in the database. We got the margins of error down into the zero range. We are committed to publication of the data, with all the caveats. We believe the public can make sound judgments.”

Still, there were some head-scratchers. The News’s first-day headline was a case in point: “More Than a Dozen Teachers Earned Lowest Scores.” This was a “Bridges-Help-People-Cross-Rivers” kind of headline. The rankings are calculated on a curve, meaning there will always be dozens at the bottom, dozens at the top, wide swaths of fair-to-middlin’ in between. That’s the nature of a bell curve, another controversial aspect of this calculation, which means the city will never be able to announce that all its teachers are high-performing.
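To see why a curve guarantees a bottom tier, here is a minimal Python sketch. It is not the Department of Education's formula; the scores, the helper names (`percentile_ranks`, `bottom_count`), the 1,000-teacher pool, and the cutoff of 5 are all hypothetical, chosen only for illustration. It ranks a made-up pool of teachers against one another, then adds 40 points to every score and ranks them again. The number of teachers in the bottom tier does not change.

```python
import random


def percentile_ranks(scores):
    """Rank each score from 0 to 99 by the share of the pool that scored lower."""
    n = len(scores)
    return [sum(1 for other in scores if other < s) * 100 // n for s in scores]


def bottom_count(scores, cutoff=5):
    """Count how many members of the pool rank below the (hypothetical) cutoff."""
    return sum(1 for rank in percentile_ranks(scores) if rank < cutoff)


# Hypothetical raw scores for 1,000 teachers; these are not real data.
rng = random.Random(0)
weak_pool = [rng.randint(0, 100) for _ in range(1000)]

# The same teachers after every single one improves by 40 points.
strong_pool = [score + 40 for score in weak_pool]

# Relative ranking ignores the across-the-board gain: both counts are equal.
print(bottom_count(weak_pool), bottom_count(strong_pool))
```

Because each rank depends only on how a teacher compares with the rest of the pool, the count at the bottom stays the same no matter how much everyone improves.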

The Times goes both ways

This brings us to the puzzling experience of reading about the test data in The New York Times. In partnership with WNYC public radio, the Times produced the city’s most sophisticated stories, and, next to The Wall Street Journal, the most polished graphics. Reporters took care to detail the data’s myriad errors and political nuances, and to put them into context. A careful reader could not help but come away believing the numbers were radioactive.

Then the Times published every one of them anyway, with names.

Anna Phillips hammered out incisive blog after blog for SchoolBook.org about mistakes teachers found in their reports, about the DOE’s conflicting messages, about parents’ reactions. National education columnist Michael Winerip found a top-ranked school with bottom-ranked teachers to illustrate the numerical idiosyncrasies. In a second column he was the first to argue that publishing the bad numbers was the best thing that could happen to ultimately discredit them. SchoolBook.org editors created a helpful 14-point FAQ column covering nearly every base (except the inflated state tests). Teachers were invited to contribute blogs for the site. One 20-year veteran teacher wrote about being slapped with a 6th percentile ranking one year and exonerated by a 96th the next, underscoring how pointless, and demoralizing, they were.

And yet, there the rankings were, on display on its SchoolBook.org homepage, begging the question, why publish them at all?

Two reasons, explained Jodi Rudoren, the Times’s education editor at the time (she now heads the paper’s Jerusalem bureau): First, “We’re in the business of disclosing information that’s in the public interest,” she said. And second, “We do not operate in a vacuum,” meaning if the Times didn’t publish them, another news organization would anyway.

Some attention was paid to minimizing harm. The Times invited teachers to add comments next to their scores on a Google Doc, for example. By last count, only 60 out of a possible 18,000 had participated, most of them correcting their reports. Editors briefly flirted with the idea of somehow fiddling with the search function so that teachers’ rankings would not be the first thing that popped up when anyone Googled their name. “We considered it, but it’s a weird business to get involved in—suppressing searches,” said Rudoren. “It would be a gesture blocking the tabloidization of this data. But in the end, we felt it was not our role.”

The bottom line for Times editors was the fact that the Department of Education used this data to evaluate teachers, most recently stalling tenure decisions for those trapped in the bottom. Currently, the State Department of Education is generating new value-added reports that will be used in the city and beyond to make high-stakes decisions about teachers; and the legislature is embroiled in tortured debates over whether to make the results partially or fully public. “We thought it was important to provide parents with the same information that the DOE was using to evaluate teachers, shedding light on its decisions,” said Rudoren.

Maybe so, but are parents taking the data seriously? Principals in both New York and Los Angeles worried that parents would arm themselves with these numbers and storm their offices, causing chaos by demanding to switch their children from low-scoring teachers to higher scoring ones. If they are, the union representing New York City principals hasn’t heard about it yet. And if Los Angeles is a bellwether, in the two years since parents have been able to read their teachers’ scores in the paper, there has been little organizing around them.

Clear as mud

Perhaps this data dump was the best thing to happen to those who have been trying to steer the national school conversation away from testing and more testing, to thinking and learning how to learn. What should be central in all these stories is the fundamental problem: What’s the best way to evaluate teachers? How can authentic learning be measured? Should standardized tests be used at all to do it? They are one-day snapshots of how well one student answers a handful of basic, low-level questions. At best they are crude instruments; at worst, they are vulnerable to manipulation.

Sarah Wysocki’s tale from the nation’s capital shines a cautionary headlight on the real-life dangers that may lie ahead for districts that put a lot of stock in these numbers. In DC, value-added numbers count for a full half of a teacher’s evaluation, even more than in New York. Bill Turque reported in The Washington Post that the highly regarded fifth-grade teacher received a stellar review by her principal and peers, and was then fired because so many of her students didn’t show any progress on their math and reading tests last year.

Wysocki offered a new twist on the data troubles. She pointed out to Turque that about half her students arrived in her fifth-grade class with what she believed were inflated scores from their previous school, a school that is now under investigation for test tampering. No honest teacher could ever hope to improve on fraudulent scores. She appealed her dismissal, lost, and moved away.

Back in my own journalism class, I decided to walk through some of the teacher data online to see what we could learn. World Journalism Preparatory High School, in Flushing, Queens, popped up on the screen, a small, energetic 6th through 12th grade school that my students have become very familiar with since it opened in 2006. The DOE gave it a “B” grade this year on its controversial School Report Cards, and past years’ numbers have shown steady improvement. Its graduation rate is better than the average high school’s; its Regents English scores are above average. But the teachers? According to the clumsy measures, they ranked among the very worst in the city. Of course, it was hardly possible that all the children were teaching themselves.

Principal Cynthia Schneider has learned to ignore these spreadsheets over the last three years. “It’s not good data. It’s bad data, and we know it,” she said. “We know what we’re doing here.” The school had to send only eight kids to summer school last year to catch up, she said. The high school is ranked number 41 in the city. Still, her students and teachers suffered the indignity of a front-page article in the hyperlocal Whitestone Times—a photo of the school, plus the poor teacher rankings. Reporters did not call her for comment.

“I’m all about trying to get a handle on matching student achievement to teacher effectiveness. That’s a good thing,” said Schneider. “But that’s not what this does. At all.”

The data? Clear as mud. And now just about everybody in New York and beyond knows it.

LynNell Hancock is the H. Gordon Garbedian Professor of Journalism at Columbia, and director of the school’s Spencer Fellowship in Education Journalism.
