Cover Story
High-Stakes Questions

Do high-stakes tests boost student achievement? Can good schools be labeled 'low-performing'?
Which of the following is true?
(a) If the President of the United States says a school is great, it won't be put on a list of failing schools.
(b) If the U.S. Department of Education says a school is exemplary, it won't be put on a list of failing schools.
(c) If a school is succeeding in imparting both factual information and critical thinking skills to a diverse student body, it won't be put on a list of failing schools.
(d) A nationwide scientific experiment conducted over two decades has shown that high-stakes tests do not improve student achievement.
You won't be graded on this, but it's a very high-stakes question, because the new federal Elementary and Secondary Education Act (ESEA) mandates a testing regime that impacts every school.
"Every teacher knows tests have a role to play," says NEA Student Achievement Director Stephanie Fanjul. "Teachers use tests all the time, including standardized tests. We want to be sure our students are learning and growing. But there are lots of ways that we collect that information, not just tests. Almost never does a bubble sheet reflect back the breadth of what a child understands. When tests are punitive, all the attention is focused on the scores. That doesn't help us educate our children."
So now let's try to answer our question. Let's see...as experienced test takers, we know we can improve our chances by eliminating clearly wrong answers. This is an open-Web test, so let's do some research.
Checking out (a), we learn that President Bush, touring the country last May to promote the new federal education law, stopped at Vandenberg Elementary School in Southfield, Michigan, and said, "This is a successful school....This school doesn't quit on kids, and that's why it's heralded for its excellence." A few weeks later, Vandenberg found itself on the Michigan low-performing schools list.
For (b), we discover that USA Today found 19 U.S. Department of Education Blue Ribbon exemplary schools on low-performing lists.
Let's try (c). We investigate Hart Middle School in Rochester Hills, Michigan, a vibrant learning environment--and a Blue Ribbon winner. It, too, did not escape the low-performer list (see "Blue Ribbon or Below Par?").
That leaves (d), which seems the least likely answer--except that it's true.
Scientists at Arizona State University announced last December results of the most comprehensive study ever conducted of high-stakes testing. High-stakes testing, they concluded, does not improve student achievement.
The Arizona researchers took advantage of the fact that a giant experiment on high-stakes tests had been inadvertently conducted for the last 20 years: 28 states adopted high-stakes tests, while the others did not.
So the scientists posed a simple question: How did student achievement change when states put on the high-stakes pressure? Did students in high-stakes states improve more or less than students in the states that left student assessment mostly to local educators?
For an answer, the researchers looked at results on several national tests, including the reading and math tests of the National Assessment of Educational Progress (NAEP), which are given to random samples of students under the auspices of the U.S. Department of Education and are widely accepted as the nation's best measure of achievement. The Arizona scientists also looked at SAT, ACT, and Advanced Placement scores.
"The data presented in this study suggest that after the implementation of high-stakes tests, nothing much happens," researchers Audrey Amrein and David Berliner reported. "No consistent effects across states were noted. Scores seemed to go up or down in a random pattern."
There was actually a small tendency for states that adopted high-stakes testing to improve less on national tests than states that avoided the high-stakes pressure. But in most high-stakes testing states, the public impression has been that the tests work. That's because scores on the high-stakes tests themselves generally did improve, so state officials were able to claim success.
But the higher scores were apparently due to the enormous amounts of time and effort that schools poured into teaching the content and exact wording patterns that students would see on those particular tests. The improvement did not carry over into better performance on other tests of the same general content--the higher scores did not reflect real gains in learning.
More Drop-outs
In a second study, the Arizona researchers studied what happened to drop-out rates and graduation rates when states adopted high-stakes graduation tests.
The results were disturbing. Graduation rates fell. Drop-out rates rose.
Further investigation revealed that the reason wasn't just that students got discouraged or fled the pressure. The researchers found that some administrators encouraged low-achieving students to leave school, probably to improve their schools' scores.
The twin studies were funded by the Great Lakes Center for Education Research and Practice, a collaborative of NEA state affiliates in Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin. The studies were then reviewed by independent research scientists.
The results of these studies are consistent with earlier, more limited scientific assessments of high-stakes testing in Texas and Massachusetts.
High-stakes tests "result in narrowing the curriculum," says study co-author David Berliner. "People are doing anything they can to achieve the scores they need. They're not teaching students to transfer knowledge, but just to answer questions like those on the state test. They're not preparing students for other tests that measure the broader curriculum."
Up until now, Berliner said, high-stakes testing has impacted low-income and minority students most heavily. The study found that states adopting high-stakes tests tended to have more of these students than states that steered clear of these tests.
37 Ways to Fail
Under ESEA, state education planners are predicting that huge numbers of schools will begin showing up on low-performer lists in the 2004-05 school year.
Over the past few months, these officials have been examining the test scores of their schools to project how many are likely to run afoul of the new law, and many of these estimates have been made public. Many states expect that a majority of their schools will bear the low-performer label by 2004-05. Louisiana is predicting 85 percent; North Carolina, 60 percent; Massachusetts, close to half. Under the law, these schools will be eligible for extra help. But they will also be subjected to escalating penalties (see "ESEA Testing Timeline").
The high numbers are the result of two major provisions of the federal law.
One provision specifies that every state must set test score targets such that at least 20 percent of its students start out in schools that are below par.
The test score targets for each state are being calculated this year. Schools will have until the 2004-05 school year to hit these targets before they start suffering penalties.
The second provision applies the same test score standards, not just to the whole school, but to every major subgroup in the school: special education students, each racial or ethnic group, English-language learners, and low-income students. The idea is that a school should not be able to hide the low scores of a disadvantaged minority by averaging them in with a larger number of affluent, high-scoring students.
Will this help students? Texas has already been using a similar accountability system (see NEA Today, March 2002 issue) and has found that the gap between minority and white students' scores on the state test shrank. But a RAND study by Stephen Klein, Laura Hamilton, Daniel McCaffrey, and Brian Stecher, published in October 2000, showed that the achievement gap in NAEP scores actually got slightly larger.
Besides holding each subgroup of students to the same standard as the entire student body, ESEA also requires that at least 95 percent of students in each subgroup take the test. If any group misses the target test score in either reading or math, in any of the grades tested, or if fewer than 95 percent of any group take the test, the whole school fails.
A study for the Council of Chief State School Officers calculated that a school could have up to 18 test score targets to reach in each grade tested. Adding other requirements in the law, the council's study counted 37 separate opportunities to fail in each grade.
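For readers who want to see how unforgiving that all-or-nothing rule is, here is a minimal sketch in Python. It is only an illustration of the rule as described above, not the official calculation any state uses: the record layout, the function name, and the 40 percent target are hypothetical, and the only number taken from the law as reported here is the 95 percent participation floor.

```python
from dataclasses import dataclass

# From the article: at least 95 percent of each subgroup must take the test.
MIN_PARTICIPATION = 0.95


@dataclass
class SubgroupResult:
    """One subgroup's result in one subject and grade (hypothetical layout)."""
    subgroup: str
    subject: str           # "reading" or "math"
    grade: int
    pct_proficient: float  # share of tested students scoring proficient, 0 to 1
    pct_tested: float      # share of enrolled students who took the test, 0 to 1


def school_meets_targets(results, target):
    """Return (passed, misses). One miss by any subgroup fails the whole school."""
    misses = []
    for r in results:
        if r.pct_tested < MIN_PARTICIPATION:
            misses.append((r.subgroup, r.subject, r.grade, "participation"))
        elif r.pct_proficient < target:
            misses.append((r.subgroup, r.subject, r.grade, "score"))
    return (len(misses) == 0, misses)


# Illustrative numbers only: a school whose students as a whole score well
# above a hypothetical 40 percent target, with one subgroup short in math.
results = [
    SubgroupResult("all students", "reading", 8, 0.82, 0.99),
    SubgroupResult("all students", "math", 8, 0.78, 0.99),
    SubgroupResult("English-language learners", "math", 8, 0.35, 0.97),
]
passed, misses = school_meets_targets(results, target=0.40)
print(passed, misses)
```

In this made-up case the school as a whole clears the bar comfortably, yet the single subgroup shortfall is enough to put it on the list.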
The reasoning behind the new federal law is that high-stakes pressure will force schools to pull themselves up. The predictions released by state education officials, however, indicate they are bracing for the worst.
Many state planners believe, on the basis of current test scores, that many schools where the student body as a whole reaches the target scores will "fail" because one or more subgroups in one or more grades misses the target. This is why they project that the number of "failing" schools will be much higher than 20 percent in 2004-05.
And after 2004-05, the lists are predicted to get even longer because the law requires that the target be raised every two or three years on a steep path that reaches 100 percent proficiency in the year 2014. The California Department of Education recently projected its results and found that by 2014, 98 percent of its schools will flunk.
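The rising bar can be sketched with a few lines of Python. This is a rough illustration, not any state's actual schedule: it assumes the target climbs in equal steps at the raise points listed in the ESEA Testing Timeline, and the 40 percent starting target is a hypothetical figure chosen only to make the arithmetic visible.

```python
def target_schedule(start_pct, raise_years, final_pct=100.0):
    """Spread the gap between start_pct and final_pct evenly over the raise years."""
    step = (final_pct - start_pct) / len(raise_years)
    return {year: start_pct + step * (i + 1) for i, year in enumerate(raise_years)}


# Raise points follow the article's timeline; the starting target is hypothetical.
schedule = target_schedule(40.0, ["2004-05", "2007-08", "2010-11", "2014"])
for year, pct in schedule.items():
    print(year, pct)
```

Under these assumptions, a school would face a target 15 points higher at each raise, whether or not its scores have kept pace.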
To avoid this high failure rate, some states are starting to lower their standard of what is "proficient" so more schools can meet the federal targets. But many states don't want to go that route. California, for example, decided not to alter its standards despite that expected 98 percent failure rate.
Then there's the issue of funding--hint: current levels won't improve any school's chance of success. Despite the new rules that apply to every public school under ESEA, the federal share of education costs remains about 8 percent, and the President's proposed budget for the 2003-04 school year provides for no overall increase--it actually cuts ESEA by $90 million. A recent study in New Hampshire estimated that the new law will cost $575 per student, but the federal government will only provide $77.
"This whole situation demonstrates that you can't build a good accountability system from afar," says NEA Student Achievement Director Stephanie Fanjul. "It has to be created on the ground, by teachers and support professionals in partnership with the families, and with school boards and superintendents, all focused on the goal of helping our children learn.
"Our members have wholeheartedly supported accountability systems developed in that way. People in their communities know best how to meet their challenges, and government at its best facilitates that process. It doesn't dictate from Washington."
--Alain Jehlen
For More: See the Arizona studies at www.asu.edu/educ/epsl/EPRU/epru_2002_Research_Writing.htm, and the RAND study at www.rand.org/publications/IP/IP202.
ESEA Testing Timeline
2002-03
Using a federal formula applied to 2001-02 state test results, each state calculates a percentage of students who should score "proficient" in reading and math this year. Each school must meet this target for all students and for each subgroup of students. In many states, officials say most schools are below their targets. Schools have two years to reach the target. (For schools that fell short of state targets for two or more years under the old federal law, the new law says penalties are supposed to have started this year.)
2003-04
Schools strive to make "adequate yearly progress" (AYP), enough to reach the target by 2004-05.
2004-05
States must raise the target percentage of "proficient" students even higher. Title I schools that have not made AYP for two years are to get extra help. Also, they must offer students the choice of transferring to other public schools and must pay for transportation.
2005-06
Annual testing in reading and math is now required in each of grades 3-8 and at least once in grades 10-12. Title I schools that have not made AYP for three years must offer supplemental services such as tutoring.
2006-07
Title I schools that have not made AYP for four years must take "corrective action," ranging from hiring an outside expert to replacing some staff members.
2007-08
States must again raise the target percentage of students who score "proficient." (These test score targets go up in equal increments to reach 100 percent in 2014.) Meanwhile, annual testing in science begins.
June 30, 2008
The law expires or (more likely) is reauthorized by Congress with changes.
2008-09
Title I schools that have not achieved AYP for six years must reopen as charter schools, replace staff, hire private managers, or give control to the state.
2010-11
States once again raise the percentage of students who must score "proficient" in each school and in each subgroup of students.
June 2014
Every student in America is "proficient."
Blue Ribbon Or Below Par?
A Great School Gets a Bad Label
Educators at the Hart Middle School in Michigan were delighted last May when they learned they had won the coveted Blue Ribbon award from the U.S. Department of Education.
But the euphoria crashed at the first staff meeting of the new school year when the principal broke the news that Hart was also on a list of low-performing schools. "We were all just floored," says science teacher Nate Childers. "There was a lot of anger. How could this happen?"
Hart wasn't the only school with this strange Blue Ribbon/low-performer rating. USA Today conducted a partial survey of schools on low-performing lists and found 19 Blue Ribbon schools among them--19 schools singled out for excellence and then told they were below par and must shape up or face punishment. Ten of the 19 were in Michigan.
Hart fully deserved its Blue Ribbon. It does an outstanding job of educating a highly diverse student body, which includes children of millionaires and children from a trailer park.
Each week, two or three teachers go out to the trailer park after school to work with students there. Hart also has a Saturday school for students who need extra help. Parent involvement is strong.
Hart has solid school programs across the board. There's an active school choir, and the faculty recently installed climbing walls in the gym to challenge students.
And the academic program is top of the line. Science teacher Childers, for example, takes his students to a local river every year to monitor pollution. One year, they discovered a spike in their readings, traced the pollution to a construction site, and got it shut down--an unforgettable learning experience for eighth graders.
"This is not a simulation," notes Childers. "They're doing real science."
Support professionals were among those interviewed by the federal evaluator during the Blue Ribbon competition. They, too, play a key role in building the educational environment.
But the low-performing label has nothing to do with unforgettable learning experiences or sound educational environments. It's attached to any school that doesn't meet test score standards.
More than 40 percent of Michigan schools were on last year's low-performer lists, more than in any other state. Why? Because for last year's list, the new federal law rested on old state standards. Michigan had extremely tough standards, intending them as targets that schools should shoot for. Michigan not only required that students score high, but also that the scores rise each year. Although Hart students scored high, scores didn't rise enough--ironically in science--to meet the requirement.
Now that the stakes are higher, Michigan and some other states with high standards have decided to relax them in an effort to shrink the low-performer list. Hart Middle School is off the hook--for now.
But the first year's results from the new federal law are just a taste of things to come. State education officials are predicting even more schools will land on low-performer lists when the new law reaches a new phase of implementation in the 2004-05 school year (see accompanying story).
Meanwhile, after USA Today released its list of 19 Blue Ribbon "low performers," the federal Department of Education took steps to avoid a repeat. In the future, getting a Blue Ribbon will depend on test scores.
The Straitjacket of Standardized Tests
A Portland teacher wonders: Where is the standardized test that can measure passion for learning, respect for others, and human empathy?
By Tom McKenna
When I first met Sol Shapiro he was in his 80s, living alone in a retirement home. He was the first person my Portland high school history class interviewed for an oral history project about old South Portland, Oregon. My students were primarily African American. Sol was Jewish. They were young. He was old. Neither was really excited about the encounter.
We met in Sol's apartment. The interview began.
"My name is Sol Shapiro. I am very familiar with old South Portland." Silence. Quietly, almost imperceptibly, Sol began to weep. My students were stunned. They turned off the video camera. Soon Felicia ventured to place her hand on Sol's shoulder, "It's OK, Mr. Shapiro, we understand."
Sol cleared his throat, removed his glasses, and dabbed his tears. "Excuse me, please," he said. "I'm very sorry." He rose and left the room. My students looked at me with puzzled expressions. "What do we do now?"
We waited. After a few minutes, Sol returned. He started sharing artifacts from his life with us, a steady stream of photographs, letters, and religious pieces, each accompanied by a tale from his past. Students came to class the next day with a new outlook on old people.
Unfortunately, given the demands of current educational "reform," teachers who want to give their students this kind of indelible learning experience are finding it more and more difficult to do.
We feel pressured to prepare students to do well on high-stakes, standardized tests. These tests have become the measure of our students' learning. Clearly, the tests threaten to define the way we teach. In a world enriched with difference, the hidden curriculum of much of this "reform" is singularity, sameness, and compliance.
What could a multiple-choice test reveal about what my students learned in the South Portland project? Lives changed. Students were moved to social action. They sat in an orthodox synagogue with yarmulkes on their heads and learned about Judaism. They became passionate experts on urban renewal.
Oral history can be a powerful classroom tool. Out of necessity, students acquire valuable skills in pursuit of learning that matters. They formulate questions for interviews. They work collectively to solve problems that threaten to derail hours of work. Text needs to be written and written well. Discovery leads to questions, and research is needed to find answers to those questions. Research leads to surprise, surprise breeds excitement, excitement spills over into passion.
James never missed history class. Often, he had to sneak in and out of my room to hide from the dean because he rarely attended his other classes. He was our number one cameraman and interviewer. Jennifer uncovered a quote from a neighborhood meeting, long lost in dusty boxes, that moved her to angry tears. History came alive for her when she read the comment of a state official about the people who would be moved by urban renewal: "Frankly, we don't give a damn about the renters."
The result of our work was a 30-minute documentary about South Portland and the urban renewal that destroyed it. We were invited to show it at the Portland Art Museum auditorium. I got there early and stood outside to help direct my students and their families to a facility where none had ever been, in a part of town where few ever ventured.
About 250 people attended our premiere that night. The students deftly answered questions and talked extensively about their experience. Afterward, students, their families, and former residents of South Portland gathered at my home. Students commandeered my stereo, and their music boomed throughout the house. I went to turn it down, but stopped when I saw what was going on in my living room. Dancing hand in hand were two groups of people about as different as I could imagine and who, when I first approached them about getting together, resisted the idea. Young, old, African-American, Jewish were joined together in a celebration of each other and of the new understanding that our project helped them achieve. They embraced the differences that once kept them apart.
Find me the standardized test that can measure the meaning of that embrace.
Tom McKenna (tmckenna@pps.k12.or.us) is the Social Studies Coordinator for Portland Public Schools. This article is adapted from a story originally published in The Oregonian. Names have been changed.