Wednesday, July 25, 2007

Multiple Test Format Accountability Schemes

EdWeek's NCLB: Act II blog is closely following the reauthorization of the No Child Left Behind Act. One of the big questions it is now asking is whether the accountability provisions will survive the reauthorization.

In my comments to that post, I noted that the political environment is far too heated for any sort of drastic changes, so my prediction is that Congress will punt until after the 2008 elections. But one of the deeper questions about the accountability provisions will have to be addressed: do we put too much emphasis on a single, snapshot-in-time measurement?

I will agree that once-a-year testing produces incentives that may not actually raise the academic achievement of students. Too much time is spent preparing for the tests, too much emphasis is given to "bubble kids," and there are far too many opportunities for states and school districts to game the system.

I believe that accountability is important and that data doesn't lie. We cannot scrap the accountability provisions altogether, but we do need to look for alternatives. One alternative method being tested is the so-called growth model, whereby states can show that their students are learning and are thus meeting the definition of adequate yearly progress.

So if there are already alternatives for measuring AYP, what about different models for testing? Some of those ideas include the use of so-called multiple measures for gauging AYP. I am not sure what Congress means by that, but the Education Trust and other groups worry that such multiple measures will gut the accountability provisions and confuse parents.

Yet I am not sure. Multiple measures can mean several things, and I would like to see the term used in a way that helps the growth-model people: multiple statewide tests used to measure progress. At a minimum we would be talking about three or four tests a year. The first test of each year would give a baseline starting point for the students, the middle test or tests would provide data to schools and teachers, allowing them to modify teaching content, and the final test would provide year-end measures.
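
As a rough illustration of the kind of data this schedule generates, here is a minimal sketch in Python. The students, scores, and proficiency cut score are all hypothetical, not drawn from any real assessment; the point is simply how a baseline, mid-year, and year-end score can be turned into growth figures a school could act on before the year is over.

# Hypothetical three-test schedule: baseline, mid-year, year-end.
# Student names, scale scores, and the cut score are invented for illustration.
scores = {
    "Student A": {"baseline": 310, "midyear": 335, "yearend": 362},
    "Student B": {"baseline": 355, "midyear": 350, "yearend": 358},
    "Student C": {"baseline": 280, "midyear": 305, "yearend": 340},
}

PROFICIENT = 350  # assumed cut score for "meets standard"

for name, s in scores.items():
    midyear_growth = s["midyear"] - s["baseline"]   # visible by mid-year, in time to adjust teaching
    total_growth = s["yearend"] - s["baseline"]     # the growth-model style measure for the year
    status = "proficient" if s["yearend"] >= PROFICIENT else "not yet proficient"
    print(f"{name}: mid-year growth {midyear_growth:+}, "
          f"year growth {total_growth:+}, year-end {status}")

The mid-year growth column is the piece a single spring test can never show, and it is the number that makes in-year corrections possible.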

The wealth of test data possible with this model is far superior to what any single-test format currently provides, and it can help alleviate some of the current system's worst features. Additionally, it can help advance many of the teacher quality issues that I and others have spoken about.

As a starting point, the multiple test format ("MTF") allows for mid-year corrections to curricula and teaching methods. The test data would allow schools to address areas that show weaknesses on the tests, helping students achieve the necessary standards on later tests. An MTF also enables the experts who develop tests and curricula to get a better match between the test standards and the curricula, so that the disconnect between what is taught and what is tested, and how it is tested, shrinks.

One of the most common complaints about the current single-test format is the "high stakes" testing atmosphere that is roundly decried by teachers and testing critics. An MTF lowers that stress level, as multiple tests breed familiarity with the format and provide a sort of leveling of scores. One bad day can be overcome with regular performance on the remaining test days.

The MTF is not only about helping students; it can also help school districts and administrators identify exceptional teachers and teachers needing some professional development. Growth models and an MTF would allow teachers to be measured against their peers on a number of different factors, and with far more data available to study, the evaluation becomes much more accurate. Issues such as merit pay can be more readily accepted when the primary means of measuring teaching skill is based on multiple tests, not on one test.

Of course, an MTF evaluation would cost more, and that is a real hurdle for such a scheme. However, a series of four tests can provide so much more data on what is working, what is not working, the match of curricula to tested items, the impact of a teacher on student learning, and so on, that the investment might be appealing. Instead of guessing what is happening in classrooms, principals, policymakers, and parents would have real data, available in time for corrections to be made rather than weeks or months after the fact.
