This paper provides an overview of some current issues in test administration and also serves as an introduction to the other papers in this symposium. State and local achievement testing programs consume substantial financial resources, as well as substantial time from students, teachers, and other staff involved in collecting and reporting test data. In addition, many research studies, program evaluations, and policy decisions rely heavily on the results of these achievement tests, on the assumption that the test data were collected under standardized conditions. Is this assumption valid? If not, how can we improve the administration of the tests we give?
Observing and monitoring the testing in my own district has led me to conclude that test administrations are often not standardized, and that on such occasions the tests do not produce valid results. While a few of the incidents I have observed during the last several years of monitoring involved extreme and intentional breaches of appropriate practice, such ethical lapses have been quite rare. Unintentional misadministrations occurred with much greater frequency, and it is these more common problems that seem particularly worthy of discussion. In addition, as the purpose and format of current assessments evolve, some contextual factors seem to be contributing to these administration problems. Other school districts may be facing the same issues, and the papers in this symposium may help start a dialogue about them.
Many of the mistakes we observed resulted from teachers who were complacent, apathetic, or ignorant about proper test administration. For example, prior to the state writing assessment, one teacher failed to remove or cover a bulletin board about the steps to good writing; the bulletin board had been in her room for the past 18 months. The teacher said, "It was like part of the furniture; I didn't even notice it or think about it." A bit less complacency and more concern about the testing process would have been useful in this case! Surprisingly, a significant number of teachers also apparently do not read the instructions before administering a test. Some teachers seem to believe that they know how to administer any standardized test because they have administered a different one in the past. For example, we found one teacher administering a new reading test that was supposed to include two five-minute rest breaks within the testing time, instead of the single break in the district's previous test. The teacher had not read the instructions and did not notice this difference. She compounded her error by misreading the script during testing and therefore omitted the section of testing time that should have fallen between the two breaks. Her students lost 20 minutes they should have had to work on the test, and all of their scores were invalidated. A bigger concern is the number of other teachers who made the same mistake in classes that were not among the fewer than one percent audited or monitored.
A second type of problem occurred as teachers struggled with the increasing complexity of testing procedures. For example, on North Carolina's standardized mathematics tests, calculators may be used for some items but not for others. Annual changes in state or local test administration procedures often make correct administration even harder. One year, the State of North Carolina allowed middle school students to use special calculators if they had been instructed in how to use them. The next year, the state required that students have access to these calculators when taking the same test, even if they had not been taught how to use them. Similarly, in North Carolina it is acceptable to read test instructions to learning disabled students during the state's mathematics test, but not during the reading test.
The complexities of these examples pale, however, beside the requirements of some performance assessments. Such assessments may require students to perform multi-step activities or experiments that in turn require complex procedures for set-up and administration. Students may work in groups, while teachers attempt to ensure that individual students' answers are not shared. Deviations from the "standardized" procedures of such assessments are hard to define, and the consequences of any variation in administration for the validity of the results are hard to interpret. Meanwhile, these complexities lead to mistakes even by conscientious teachers.
The administration of some tests has become extremely burdensome for school staff. While teachers and policy makers may agree on the benefits of performance assessment when it is an integral part of instruction, they may not agree on who should score student papers for the state's writing assessment, or on who should count 1,200 paper protractors into classroom sets of 30 for the middle school mathematics assessment. How much instructional time should be devoted to administering and scoring state-mandated performance tasks in the classroom? Ideally, all tests would be part of an instructional unit, and teachers would administer them as part of their instruction. Unfortunately, concerns about accountability, reliability, and test security usually mean that state-mandated tests are simply "add-ons" to teachers' workloads. Furthermore, the burdens of these tests can also affect students adversely. In North Carolina middle schools, the state-mandated computer skills tests require so much time to administer that testing closes the schools' computer labs to all other uses for up to six weeks of the school year. (These tests take 90 minutes for each group of 15 students and require the presence of two teachers or other professional staff.)
Many of the test administration problems we have observed could be ameliorated with better pre-service and in-service training for teachers and principals. Pre-service training should include information about various types of assessments and standardization issues. In-service training should provide information on these topics, as well as information on each particular test. Many teachers and principals come to their roles with very little academic training in the area of tests and measurements. We will need help from colleges of education to address these pre-service issues.
Our concern about providing more in-service training for current teachers is not based solely on our observations of their mistakes. At the 1993 NCME meeting in Atlanta, Bill Moore presented data showing that a sample of teachers correctly identified fewer than half of the inappropriate testing practices (Moore, 1993). While we acknowledge the need for training, the task can seem almost insurmountable in larger districts. Practically speaking, how do we train 5,000 teachers in a large school system? Even if a district's testing department had the resources to conduct the training, many test directors would still find it difficult to mandate it because of restrictions on teacher work hours, extra-duty activities, or other staff development requirements that consume teachers' time. There is some hope, at least, for training principals and building test coordinators, because there are fewer of them. Relying on them to train teachers may be the only alternative in large school districts.
In our observations at schools with test administration problems, some clear patterns emerged. Most problems occurred in schools with fairly new principals (three years or less of experience) and with a new or weak building test coordinator (BTC). Effective BTCs need to be knowledgeable and committed, but they also need the full support and backing of the principal. If we are to rely on BTCs to train teachers in test administration, they must have sufficient authority and resources to do their jobs. Kevin Matter's paper discusses ways to ensure the success of BTCs, as well as other ways to improve test administration practices in schools.
In a high-stakes testing environment, such as currently exists in a number of states and local districts, test administration issues receive more scrutiny than when the stakes are low. The State Board of Education in North Carolina recently appointed a statewide panel to monitor compliance issues related to the high-stakes testing program created by reform legislation. At the same time that scrutiny is increasing, however, the higher stakes associated with these testing programs also increase the pressure on schools and districts to raise students' scores. Without knowledge of appropriate testing practices, teachers in high-stakes environments may be more likely to adopt questionable strategies. For example, at the school level, some students may be discouraged from taking the test if the results have consequences for schools or teachers. We need to close the gap in teachers' knowledge about acceptable and ethical practices in order to preserve the credibility of the testing programs we are asked to implement.
In recent years, there has been more emphasis on including special education students and ESL students in testing. This trend stems from the broader emphasis on greater "inclusion" of these students, and also from a well-intentioned effort to hold schools accountable for all their students. More testing of special populations, however, raises issues of exemptions and accommodations for these students. While there are advantages to having instructional standards for all students, the special accommodations required for testing special education students (e.g., small-group administration, read-aloud, or Braille translations) can become overwhelming at the building level, and the data produced by such tests are often difficult to interpret. If students are scoring at the chance level, we also risk causing them some trauma while obtaining no useful information from the effort.
Clearly, some of these test administration issues are not new (e.g., ethical test preparation and exemptions for special populations), although they may have new dimensions; other issues are fairly recent (e.g., large-scale administration of complex performance assessments). The papers that follow provide additional insights into both the issues and possible ways to address them. Marty Ward recently chaired a statewide committee working to develop "model" state and local policies about testing. She has also worked in the testing office of the state education department in North Carolina and now heads the testing program in Guilford County (Greensboro), North Carolina. Kevin Matter has worked in school district testing offices for almost 20 years. In addition to winning frequent awards for testing reports and publications, he has developed an excellent system for ensuring high-quality test administrations within his own district. Finally, Guy Glidden, outgoing NATD president, has prepared a discussion guide to help direct our discussion toward improving test administration in our own state and district testing programs.
Moore, W. P. (1993, April 13). Preparation of students for testing: Teacher differentiation of appropriate and inappropriate practices. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Atlanta.