After a long stretch of silence I return with a newsletter that is going to land in your inboxes with a thud. It results from two really excellent Brown Bags. In fact, one of the Brown Bags, featuring Cathy Taylor on the WASL, inspired me to write the longest column I have ever produced for the AWM Newsletter, duly included here. Despite its length, though, I couldn't stand to ignore the other Brown Bag, featuring John Palmieri and Jack Lee on Writing Courses, because that one was also a classic. I will start with the latter, albeit briefly.
    The issue of teaching students to write mathematics and science well is hardly a new one, here or elsewhere. A number of years ago an attempt was made to address it by designating certain courses to be writing courses. Unfortunately, the designation was rather a shapeless one, and eventually it became evident that the writing aspect was having very little impact on students taking the courses. Recently the university's administration decided that it was time to admit that the old system was broken and work towards a new one. Grants were made available to faculty members to try out new ideas. John Palmieri won such a grant and, joined by Jack Lee, set about trying out his plan. The plan was based on the theory that no amount of commentary on a paper was going to penetrate deeply into a student's understanding unless the student also had to act on that commentary. Correspondingly, a certain number of problems were designated "Portfolio Problems". These went back and forth through several iterations, with the student responsible for creating a new and improved draft each time. Along with this came a very clear exposition to the student of what the goal was -- what kind of clarity and conciseness constitute good writing in mathematics.
    It doesn't take much reading between the lines to register that that format puts a huge burden on the professor. Jack and John acknowledged that, and had some mechanisms for offsetting it. On the other hand, both came out with quite positive feelings about the effort, based on student response and student progress. They are even planning to run a workshop on how to run such a class -- there's conviction for you!

    Onward to the following week's Brown Bag, as written up for the AWM:

    It was in 1990 that I began to put into action a plan to expand my love of teaching mathematics into some knowledge about Mathematics Education as a field. My opening salvo was attending an MER (Mathematicians and Educational Reform) workshop. Most of the sessions were fascinating, but one of the optional ones caused me to give a delicate shudder: Assessment. How could any respectable person occupy their time with such a grungy topic?
    I've come a long way since then. I've even become intrigued with, and played around with, a number of forms of classroom assessment, some of them modifications of the classic sit-down-and-shut-up test, some rather farther into left field. Simultaneously I have been aware of the assessment effects of the reauthorization of Title I of the Elementary and Secondary Education Act, which resulted in almost every state producing its own standards and assessments, and the cataclysmic impact of No Child Left Behind. These two have given me a constantly increasing awareness of the complexity and importance of large-scale assessment of the learning and teaching going on in schools statewide.
    Recently the last remnants of that original reaction were erased, and I came to realize that there are people occupying themselves with assessment who are not merely respectable but stellar. Furthermore the rest of us owe them a great debt of gratitude. In the process of learning that, I also found out a number of details and connections that had hitherto eluded me. For me, the context is the state of Washington, but the issues involved are present in all 50 states. My impression (small attack of chauvinism) is that Washington's procedures were particularly exacting, and the number of people involved and degree of follow-through were also outstanding. This I leave to the reader to figure out by checking on his or her home state.
    My source of all this information was a pair of talks by my colleague Catherine Taylor, who is a professor in the University of Washington's College of Education. Her field of specialty is Assessment, and she has recently returned to campus after a three-year stint as adviser to the Office of the Superintendent of Public Instruction. She spoke first to a bunch of members of the mathematics department, and then to a bunch of graduate students who have been working with K-12 teachers. Each group came in armed with many negative reactions to our state's current test, and in each the mood change was palpable. As one of my colleagues in the Math Department put it: "I'm a convert!"
    So what was it we learned? It started with some prehistory: the original Title I Act. It was passed by Congress in the late sixties with the admirable intention of improving the education of underachieving poor students. Unfortunately it had some fatal flaws, such as a provision that each school must keep improving its students' test scores, but if the scores improved beyond a certain point the school would abruptly lose its funding. Eventually a study by John Cannell unearthed some dramatic findings – for instance, that test manipulations were managing to put the average performance in nearly all states above the national average – and some unpleasant consequences of the format. The response to this was a 1993 reauthorization of Title I that mandated that states create their own academic standards and allowed them to choose or create their own assessment systems. The Washington legislature then set up a Commission on Student Learning (CSL) to address the task of producing both the standards and the assessment system. That's where things start getting impressive. The CSL didn't simply sit down and start writing. They assembled committees of educators and community members from throughout the state, and used their input. From that they produced the Essential Academic Learning Requirements (EALRs – pronounced as if they had something to do with long, thin fish). Then they sent the proposed EALRs out for review by an even larger community and revised them based on the reviews. The EALRs form a careful, thoughtful set. In mathematics they strongly reflect the NCTM Standards, with an emphasis on understanding and using mathematics, with computational fluency to be based on understanding of the operations being computed with. They also feature the inclusion of problem-solving, mathematical reasoning, mathematical communication, and connections as part of the content standards.
    And that, with all of its community consultation and review and multiple re-writings, was the easy part. After it came the construction and management of the WASL (Washington Assessment of Student Learning). Catherine gave us a full-page diagram of the steps and stages of that, and filled in with further details. I lost track of the number of iterations of writing and reviewing and re-writing that went into it, but I do know that well over a year of work went into it before the first field test was run, and that's less than halfway down her page. Then came pilot testing and a huge job of figuring out the scoring. The test is criterion-referenced rather than norm-referenced, which means that instead of being designed to produce a bell-shaped curve of scores, it aims basically to establish whether students have reached a level of proficiency appropriate to their grade level. Given that the EALRs had established that proficiency to include reasoning and ability to communicate, pure multiple-choice testing was clearly out of the question. There are some multiple-choice items (I liked Catherine's example of "Which of the following pieces of information do you have to have in order to solve the problem you just read?"), but also short-answer questions, where the answer must include some form of justification, in words, pictures, graphs, diagrams or whatever else the student chooses to use, and extended-response questions that open out in many directions. Next a consistent scoring system was established, then data for items were analyzed to select those that would be used on future tests.
    After the test was administered for the first time, a collection of people closely in touch with children of the relevant age took the test themselves and estimated where they would put the bar. Parents and teachers put it high, administrators put it low. Information about what percentage of the students who took the pilot test would be rated proficient given each of the bars was eventually released into the conversation, after which a suitable compromise took effect.
    Meanwhile the test items were field-tested in a large number of school districts and then examined by experts (including Catherine) for all manner of biases. The check for cultural bias ran beyond academic expertise – folks from OSPI held fora within various ethnic communities and learned yet more. For instance, a Native American elder pointed out that children of his nation would not get as far as the mathematics of a problem based on a survey, because it is not in their culture to ask questions of a stranger. Catherine ran multitudinous statistical tests for bias and found, for instance, that on the short-answer questions girls and minorities were at a slight advantage, and on the multiple-choice questions boys and whites were at a slight advantage, but the advantages balanced out.
    The writing assessment specialist for Washington's Department of Education worked with the scoring contractor to set up a rigorous training system for scoring the tests, which is done by teachers hired for the purpose. Tests run on the resulting scores indicate an extremely high rate of consistency in grading. In short, this is a really classy assessment.
    Then we get to the issue of public reaction. That's where the egg hits the fan. Partly, of course, that's because on any issue the noise tends to come from the negative. Beyond that, though, are some deeper issues. A fundamental one is sheer unfamiliarity. Teachers inevitably teach to the test – the system pretty much demands it – and for many years most tests have been geared to speedy production of calculations. The WASL is designed to change the whole slant of the assessment, and not only is that disorienting, but it is very demanding of teachers. On the other hand, to whatever extent teaching to the test changes the slant of the teaching towards achieving the EALRs, the pain is offset by some genuine benefits. Less easy to offset is the incredible pressure put on schools, and thence teachers, by the high stakes introduced by the No Child Left Behind Act. Of course teachers shouldn't pass the stress along to their students, but it's very hard not to. And what parent likes to see a child quaking at the thought of a test?
    With all this information rattling around in my head, I've been pondering what we as mathematicians, Washingtonian or not, can do. So far all I have been able to come up with is "Find out more". I propose this not simply as an intellectual exercise, but so that we who might actually be listened to have an answer to questions like "My fourth-grade son hasn't learned long division yet and I learned it in third – doesn't that mean he is getting less math?" [Answer: not if the time that would have been spent on the mechanics of division goes into its conceptual underpinnings] or "They used to put addition of fractions with unlike denominators on the fourth-grade test and this one doesn't have anything nearly that advanced – isn't that a dumbing down?" [Answer: a norm-based test is designed to spread scores out along a curve, so it puts in questions that are way above and way below expectations in order to make distinctions among students who are far away from the norm.]
    And, of course, my recurrent response to educational issues: stand by to support K-12 teachers in any way you can – they are a beleaguered population if ever there was one!

Addendum: Catherine very kindly proof-read my original draft of this column and corrected the more egregious of my errors. She then produced a comment on my final paragraph that I liked so much that I shall now reproduce it, thus converting hers to the column's concluding paragraph:

Mathematicians should look at what is in the tests because what is tested is what will be taught. If they think that kids should learn to think mathematically or attack ill-structured problems with some confidence, be able to apply math concepts and procedures in real-world situations, graph, diagram, etc., then they should be looking to see if that is what is being 'valued' on their state's test. What is tested tells kids what is valued AND what is tested tells them what it means to be a mathematician or to use mathematics. That's why our culture is so math-phobic - we have a very skewed idea about what it means to DO mathematics.