The next salvo came with the fall 2010 opening of Waiting for Superman, the emotionally charged and popular documentary that identifies poor classroom teachers as the primary cause of urban school failure. Promoted with the help of foundation support (Gates, Walton, Broad) and star power (Oprah Winfrey), the film sounded a now-familiar drumbeat: fire bad teachers, close failing schools, open privately run charter schools, incentivize teaching. NBC dedicated a week in September 2010, coinciding with the film's release, to stories on public education, mining the film as its main source of ideas. Critics later exposed factual errors in the documentary and took issue with its agenda, which may have had something to do with the film's failure to win an Oscar nomination. But in the fall, broadcast anchors and national columnists were using the documentary as a crash course.

The Los Angeles Times could not have chosen a better climate in which to launch its investigative project, “Grading the Teachers.” On August 14, 2010, the paper ran its first stories, and announced it would soon publish its own value-added ratings for 6,000 local elementary school teachers. Names included.

The idea had taken root in 2009, when education reporter Jason Song wrapped up a series called “Failure Gets a Pass,” about teacher discipline. Investigative reporter Jason Felch then joined Song to look further into evaluations. Frustrated by the district’s lack of hard numbers, the reporters and editors decided to calculate their own. The paper hired a respected economist from the Rand Corporation, Richard Buddin, who created a teacher-performance analysis using students’ third through fifth grade state math and reading exams.

Bold and gutsy, no question. No major news outlet had ever attempted to develop its own job performance system for individual public employees, let alone for something as nuanced as what teachers do. Times reporters and editors had thought through some of the ramifications. Before publication, the paper set up a website open only to the 6,000 teachers, so they could log in early and post comments if they wished. (About a third of the teachers did so.) Sidebars included an airing of the data’s shortcomings and the newspaper’s methodology.

Caveats aside, the sum of the series is a strong endorsement of the value-added model (inevitable, perhaps, since the model was the paper's own). "By the time we were done with the reporting," said Felch, "we found this was a very, very valuable statistic." It was certainly popular. The Times's teacher-rating site has attracted 1.8 million hits since it was launched; each of its page-one stories ranks among the most read of the year.

Here's a simplified look at how value-added models work: analysts estimate how well a child is expected to score on this year's reading and math tests by looking at her past results. The difference between the estimate and her actual score is attributed to the current teacher, for better or for worse. Each teacher's effectiveness with multiple students over several years is then boiled down to one statistic: a percentile ranking from most to least effective relative to his or her peers.
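The logic above can be sketched in a few lines of code. This is a deliberately toy illustration of the general idea, not the Times's actual model (Richard Buddin's analysis was far more sophisticated, using a regression with several years of data and statistical controls); the teachers, scores, and the naive "prediction equals last year's score" rule are all invented for demonstration.

```python
# Toy value-added sketch: predict each student's score, attribute the
# residual to the teacher, then rank teachers by percentile.
# All data below is fabricated for illustration.
from statistics import mean

# (student, teacher, last_year_score, this_year_score)
records = [
    ("s1", "Ms. A", 60, 72),
    ("s2", "Ms. A", 80, 85),
    ("s3", "Ms. B", 70, 68),
    ("s4", "Ms. B", 55, 50),
    ("s5", "Ms. C", 90, 92),
    ("s6", "Ms. C", 65, 70),
]

# Step 1: predict each student's score from her own past result.
# Here the "prediction" is simply last year's score; a real model
# fits a regression with many more variables.
residuals = {}
for student, teacher, past, actual in records:
    predicted = past
    residuals.setdefault(teacher, []).append(actual - predicted)

# Step 2: a teacher's value-added is the average residual of her students.
value_added = {t: mean(r) for t, r in residuals.items()}

# Step 3: convert each teacher's score to a percentile rank among peers.
ranked = sorted(value_added, key=value_added.get)
percentile = {t: round(100 * i / (len(ranked) - 1))
              for i, t in enumerate(ranked)}
```

Even this crude version shows why the rankings behave the way critics describe: the percentile step forces teachers onto a spread from 0 to 100 regardless of how small the underlying differences in average residuals actually are.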

Most teachers fall into the vast middle of the bell curve, where one score is virtually indistinguishable from another. Experts agree: the numbers are much more useful for the very top and bottom teachers. The father of value-added education stats himself, economist William Sanders, told NPR’s Morning Edition that he worries that parents using the data might jump to false conclusions about the teachers in the middle.

Proponents note that value-added numbers factor out “outside influences” like poverty and parents’ education levels, because students are compared to themselves, not to one standard. The measures, then, are less likely to give teachers low ratings just because they teach disadvantaged children.

Still, their limitations are legion. First, only a fraction of a district's teachers are included: those who teach reading and math, since there are no standardized tests for other subjects. No allowance is made for many "inside school" factors, such as the effect of team teaching, after-school tutors, substitute teachers, a child or a teacher who is absent for long periods of time, or an unstable school environment (a new principal, a violent incident, a district overhaul). And finally, critics ask: since the number is based on manipulating one-day snapshot tests, the value of which is a matter of debate, what does it really measure?

LynNell Hancock is the H. Gordon Garbedian Professor of Journalism at Columbia, and director of the school's Spencer Fellowship in Education Journalism.