Tuesday, April 17, 2007

Teacher Quality Stats--The Basic Measurement

This is the fourth in a series of posts about teacher quality statistical measurements. For more of these posts, click on the Teacher Quality Stats category on the left.

Student Achievement--The Basic Measurement.
In order to build a statistical view of a teacher, we have to start with a few basic building blocks. Although I have nothing more than anecdotal data to fully support this hypothesis, most teacher evaluations are done with little regard to actual student performance. I assume that most teacher evaluations consider student performance somewhat, but in the end the evaluation is based upon other criteria, such as attendance at work, evaluations of lesson plans, one or two in-class observations by the reviewer, and other factors such as continuing education credits and perhaps even a subjective feeling about a teacher. Indeed, many blogging teachers have complained that evaluation by principals engenders feelings of favoritism or other politics.

Traditionally, many of the measures of teacher quality would be experience, education, certifications, personal attitudes, and other such things that we think of when we think about quality teachers. Many of those qualities were enshrined in the "Highly Qualified Teacher" provisions of NCLB. But I believe that those concepts may not have as much relevance as we believe.

In Moneyball terms, Michael Lewis might describe the manner in which teachers are currently evaluated as "old knowledge." In baseball, old knowledge was represented by some baseball scouts and the traditional thinking about the future of baseball players. What Oakland A's General Manager Billy Beane learned was that a player's past performance is a much better indicator of future performance than traditional scouting judgments. So Beane started gathering data, and not just traditional data about players like batting average, hits, or strikeouts, but other data such as slugging percentage, on-base percentage, number of walks, and the number of pitches a batter sees against pitchers. Baseball, of course, lends itself to this kind of statistical knowledge, and there are literally dozens if not hundreds of people at every game gathering that basic information. Beane was after the "new knowledge."

But the "new knowledge" in gauging teacher knowledge may not be new, but it is going to a different criteria for judging teacher quality. Admittedly, much of what is discussed in this post is not new idea. But having said that the most basic building block of that knowledge is the advancement of students in the current measurement scheme. Simply put, a value-added model.

The current measurement scheme breaks student achievement down into grades and proficiency levels. Thus, there are 12 grades from elementary through high school and four proficiency levels: below basic, basic, proficient, and advanced. In my scheme, each proficiency level is worth .25 points relative to the grade level, with proficient being equal to the grade level. Thus every student has a grade level and a proficiency level, such as Grade 3 Proficient (3.0) or Grade 5 Below Basic (4.5--two levels below grade five). We have tests, such as they are, that measure this grade and proficiency level. (The quality of the tests is another variable that will have to be addressed, and there are more than a few policy concerns about the tests.)
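To make the arithmetic concrete, here is a minimal sketch in Python of the scoring scheme described above. The names and the dictionary are my own illustration, not anything from an existing system; the offsets simply encode the .25-point steps with proficient anchored at the grade level.

```python
# Illustrative sketch of the grade-plus-proficiency score described above.
# "Proficient" sits exactly at the grade level; each proficiency step is 0.25.
PROFICIENCY_OFFSETS = {
    "below basic": -0.50,   # two steps below proficient
    "basic": -0.25,         # one step below proficient
    "proficient": 0.00,     # on grade level
    "advanced": 0.25,       # one step above proficient
}

def achievement_score(grade: int, proficiency: str) -> float:
    """Convert a grade and proficiency level into a single number."""
    return grade + PROFICIENCY_OFFSETS[proficiency.lower()]

# The two examples from the post:
print(achievement_score(3, "proficient"))   # 3.0
print(achievement_score(5, "below basic"))  # 4.5
```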

Thus, at the beginning of an academic year, we must know the grade level and proficiency of each student in a teacher's class. That is our starting point for measuring student achievement and therefore teacher quality.

At a minimum performance level, we can and should expect each teacher to increase the achievement level of each student by one grade level per academic year. Thus a fourth grade teacher should raise each student from a third grade level to a fourth grade level. A teacher who achieves this minimal level of success would be credited with 1 improvement point for each such student; Grade 3 Proficient to Grade 4 Proficient is an increase of 1. In a class of 25, a teacher would garner 25 points if she did just what was expected of her.

For every change in proficiency level, the value added is .25, that is, one quarter of a grade level for each of the four proficiency levels. So, for example, a teacher who is able to advance a student from Grade 3 Basic to Grade 4 Proficient will have added 1.25 improvement points for that student. This simple value-added measurement looks solely at the student's achievement itself, without regard to any external factors such as the student's race, socioeconomic status, or even ESL status. Each student then becomes a data point for that teacher.
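A hedged sketch of the improvement-point calculation follows: the value added for one student is just the end-of-year score minus the start-of-year score, and a class total is the sum over students. The function name is my own.

```python
def improvement_points(start_score: float, end_score: float) -> float:
    """Value added for a single student-teacher relationship."""
    return end_score - start_score

# Grade 3 Basic (2.75) to Grade 4 Proficient (4.0) = 1.25 points,
# matching the example in the post.
print(improvement_points(2.75, 4.0))   # 1.25

# A class of 25 students, each moving up exactly one grade level,
# earns the teacher the expected 25 points.
class_scores = [(3.0, 4.0)] * 25
print(sum(improvement_points(start, end) for start, end in class_scores))  # 25.0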

For example, let us take two 4th grade teachers, each with 25 students. Here are their class breakdowns of before and after an academic year.
[Table: before-and-after grade and proficiency breakdowns for Teacher A and Teacher B]

If we aggregated the data for each teacher, we would conclude that each was a quality teacher, in that overall each did a good job advancing their students. Indeed, looking just at the summary data, one could easily think that Teacher A did a better job than B. Although the average increase was 1.05 for Teacher A versus 1.09 for Teacher B, Teacher A had a higher median increase than B, but also a wider range of results. Teacher B didn't have any students lose ground, but also didn't produce a dramatic turnaround like A did.
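For readers who want to see how those summary statistics would be computed, here is a small sketch. The two lists are placeholder values purely for illustration; they are not the actual class data from the table above and are not meant to reproduce the post's figures.

```python
from statistics import mean, median

def summarize(improvements: list[float]) -> dict[str, float]:
    """Mean, median, and range of a class's improvement points."""
    return {
        "mean": mean(improvements),
        "median": median(improvements),
        "low": min(improvements),
        "high": max(improvements),
    }

# Placeholder data only -- not the post's actual classes.
teacher_a = [2.0, 1.5, 1.25, 1.0, 1.0, 0.75, -0.25]
teacher_b = [1.25, 1.25, 1.0, 1.0, 1.0, 1.0, 1.0]
print(summarize(teacher_a))
print(summarize(teacher_b))
```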

So the question is: which teacher is better? Each teacher has apparently improved the lot of their students. By the old measures, we would say that each of these teachers is a quality teacher. We might even be right. Still, aggregating the data might lead us to believe that Teacher B did a better job than Teacher A, but is that necessarily the case?

Of course, looking at only one year is not particularly conclusive. Over time, several years of performance may yield a better answer as to which of these two teachers provides the most value added for their students.

The value-added method provides two important protections. First, as stated above, all student-centered information aside from proficiency level is kept completely out of the picture. No information on race, socio-economic status, parentage, residence, ESL status, or special ed status enters the equation. Each student-teacher relationship is measured solely on the successes of that relationship.

Second, teacher-centered information, such as where the teacher works, the teacher's education and background, and years of experience, is set aside in favor of looking only at what the teacher does with each student in hard terms. In pure statistics, this is the cleanest you can get. In terms of pure accountability, there can be no other base measurement: either a student is advancing on par with expectations or the student is not.

The value-added measurement also allows for a large number of data collection points. Each student-teacher relationship is measured, and the more data points you get, the more reliable a measure of the teacher's quality you get.

This measurement is not new. In fact, as Brett Pawlowski noted to me, the Education Consumers Foundation has used these kinds of measurements in a pilot project on school quality in Tennessee. See his posts here, here, and here.

However, suffice it to say at this stage, the value-added data would have to be collected on a student-by-student, teacher-by-teacher basis so that each teacher-student relationship can be measured. That is already the case in Tennessee: although student- and teacher-level data is kept private, school-by-school aggregate data is available.

Future Postings
Of course, there are aspects of teaching data collection that I have not addressed, and I expect a certain amount of criticism along those lines. In future posts, I intend to talk about class size, the issue of resources in the schools (that is, per-pupil spending), the issue of subjective, evaluation-type subjects, accounting for student turnover, ESL and special ed status, socio-economic status, and a whole range of other information. There are also other teacher-centered measurements that need to be addressed, such as certifications, licensure, education, years of experience, years in subject, years in school, years in grade, etc.

I suspect that once such data were actually collected on a wide scale, properly evaluated and analyzed, we might find that the old knowledge of what makes a good teacher may be completely debunked.

4 comments:

Anonymous said...

This is not a bad idea. I think it would take some refinement before you could actually use it to measure teachers. The problem with measuring teachers is that there are so many influences in students' lives other than the teacher and the school. What happens if Mom and Dad get divorced? Or for the kids I teach, what happens when Dad gets out of jail and comes home to wreak havoc? Great idea, though. It got me thinking.

Unknown said...

As I had stated, the value-added measurement is the starting point. Other aspects will have to be factored in, including things like resources for each child, SES, race, and the number of days in school.

However, intangibles like divorce and other events in a student's life cannot be accounted for. But in every data set there are outliers; that is why the model is designed to look at a large number of student-teacher relationships. If a teacher has 24 students who made progress and one who did not, and the pattern of progress continues over the course of years, then those one or two cases will be treated as outliers.

I am not a statistician by training and I know that the model will need refinement as I move forward. But there are some factors that are beyond the control of teachers and administrators and cannot be accounted for by any statistical model. But the goal of this project is to strip out these extraneous items and search for quality in teachers that can be studied and replicated.

Anonymous said...

Excellent series - I'd like to revisit it in a few weeks when I can spend more time thinking about it, but I wanted you to know that it's appreciated.

Unknown said...

Interesting. My questions and comments were way too long for a comment, so see here.