Synthesis of Reviews of “The Value-Added Achievement Gains of NBPTS-Certified Teachers in Tennessee: A Brief Report”
by Susan Fuhrman
Introduction
In May 2002, J.E. Stone, Ed.D., of the College of Education at East Tennessee State University, produced a 7-page paper entitled “The Value-Added Achievement Gains of NBPTS-Certified Teachers in Tennessee: A Brief Report.” The Education Commission of the States asked four scholars to review the study. This synthesis summarizes the comments of Dominic Brewer, Susan Fuhrman, Robert Linn, and Ana Maria Villegas.
Research Problem
The Stone study addresses the question of whether Tennessee’s teachers certified by the National Board for Professional Teaching Standards (NBPTS) “…are exceptionally successful in improving the achievement scores of their students.” The author uses the scores of teachers in the Tennessee Value Added Assessment System (TVAAS) database and defines “exceptional” teaching as that which “brings about an improvement in student achievement equal to 115% of one year’s academic growth in the local school system.” That standard is used by the state of Tennessee and is the same standard used to identify “high performing” teachers in a new Chattanooga, TN, incentive program. Stone finds no NBPTS teachers meeting the standard in all the required subjects and over the three years required by the Chattanooga system, and therefore questions the effectiveness of the Board system.
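The standard described above can be sketched as a simple check. This is only an illustration under stated assumptions: the function name, the subject names, and the numbers are hypothetical, and the sketch assumes that a teacher must reach 115% in each of the three most recent years for every required subject. The Stone report, as summarized here, does not say whether the rule is applied year by year or to a multi-year average.

```python
from typing import Dict, List, Set

THRESHOLD = 115.0   # percent of one year's growth in the local system
REQUIRED_YEARS = 3  # years of data the Chattanooga program is said to require


def meets_standard(gains: Dict[str, List[float]], required_subjects: Set[str]) -> bool:
    """Return True if every required subject has at least REQUIRED_YEARS of
    percent gains and each of the most recent REQUIRED_YEARS is at or above
    THRESHOLD (an assumed reading of the rule, not the official one)."""
    for subject in required_subjects:
        yearly = gains.get(subject, [])
        if len(yearly) < REQUIRED_YEARS:
            return False  # not enough years of data for this subject
        if min(yearly[-REQUIRED_YEARS:]) < THRESHOLD:
            return False  # at least one recent year falls short
    return True


# Hypothetical subjects and percent gains, not taken from the Stone report.
required = {"math", "reading"}
teacher_x = {"math": [120.0, 118.0, 116.0], "reading": [117.0, 121.0, 115.0]}
teacher_y = {"math": [130.0, 125.0, 119.0], "reading": [122.0, 109.0, 118.0]}
teacher_z = {"math": [140.0, 135.0], "reading": [138.0, 129.0]}

print(meets_standard(teacher_x, required))  # True: every subject, every year
print(meets_standard(teacher_y, required))  # False: one reading year below 115
print(meets_standard(teacher_z, required))  # False: only two years of data
```

Under this reading the bar is demanding: a single weak year in a single subject, or a missing year of data, is enough to fail, even when the other scores are high.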
The reviewers agree that the problem Stone addresses, the relationship between student achievement gains and teacher certification by the National Board for Professional Teaching Standards, is important. As one reviewer noted, “…the paper is one of the very few that actually attempts to link student achievement to national Board Certification.” Previous studies have shown that teachers value the certification process as an important professional development experience (Kelley and Gardner 2002) and that certification has been related to quality teaching. For example, Bond et al. (2000) found that the students of Board-certified teachers showed deeper understanding of teacher-designed units than the students of teachers who sought certification but did not get it (as cited in Borman 2002). But these studies have not focused on the link to student achievement. At least one reviewer also noted that these previous studies have not been subjected to the kind of scrutiny now being applied to the Stone study.
In part, the absence of studies focusing on the consequences of NBPTS for student learning relates to the newness of Board certification; only recently are there sufficient numbers of certified teachers to support such research. In part, the absence of such studies relates to the Board’s own approach to identifying excellent teachers—examining their practices rather than the learning of their students. Reviewers believe that, regardless of whether Board certification takes student achievement into account in its own processes, policymakers are supporting Board activities in the hope of improved student performance, and they want to know whether certification is related to student learning.
Subjects and Sample
The study sample consists of 16 of the 40 teachers in Tennessee who have received NBPTS certification. It is not clear how the 16 teachers were selected from the 40 with NBPTS certification. Reviewers presume that the other 24 teachers are not in grades 3-8, where students are tested annually under the Tennessee Value Added Assessment System.
Stone’s failure to explain how the 16 teachers were selected from the total “n” of 40 is just one example of the absence of any descriptive data about the subjects. Reviewers would have liked the following information about the sample teachers:
Without such descriptive information, readers cannot judge how representative the 16 teachers are of all Board Certified Teachers, of teachers in their school systems or of teachers in the state. One cannot look for patterns between the teachers’ performance and their preparation/credentials or between their performance and their students’ characteristics. As one reviewer says, “…by not describing the sample adequately, Dr. Stone provides no way for readers to explore alternative explanations for the reported findings,” and “the non-random nature of the selections process restricts the generalizability of the findings to the teachers studied. Technically speaking, this is a descriptive study of 16 teachers.”
Instruments
Stone uses teacher effects data from the Terra Nova test, as included in Tennessee’s value-added analysis, TVAAS. Reviewers understand that Terra Nova is a commonly used commercial assessment and probably a reasonable choice for measuring student achievement. However, readers are given no information about the extent to which the assessment is aligned to Tennessee’s academic standards and thus about whether it is a valid measure of student learning in that state. Further, some teachers increase student scores on multiple-choice tests like Terra Nova by narrowly focusing on the specific knowledge and skills it covers. If teachers recognized by the NBPTS do not focus so narrowly while other teachers do, their students may not perform as well as the students of other teachers.
Reviewers appreciate the sophistication of the value-added analyses conducted by William Sanders that make up TVAAS. However, readers are given no explanation of how these scores are derived, and the Sanders system has been criticized for its secrecy and absence of outside scrutiny. The model controls for students’ prior achievement, but this may not be enough to assure that all background factors are irrelevant and that teachers’ scores represent only the value they add. As one reviewer says, “it is not entirely clear that taking prior achievement into account is all that is needed to level the playing field for teachers who teach students who come from different backgrounds and who receive different amounts of academic support from home during the school year.” Further, teachers’ scores or percent gains are calculated relative to other teachers in their systems. It is not clear that it is appropriate to compare teachers from different systems.
Procedures
Stone’s approach is to examine the percent gains for the NBPTS teachers in the various tested subjects they teach and for the various years in which testing data is available for these teachers.
Reviewers note several problems in Stone’s approach. First, a sample of 16 teachers is much too small to support any generalizations. Furthermore, because teacher scores are extremely volatile from year to year (as is evident from Appendix A), TVAAS considers teachers’ reports “official” only when three years of data are available. Such data are not available for 10 of the teachers included in Stone’s study. Therefore, the true sample includes only 6 teachers, an obviously inadequate number.
Second, the volatility of teacher scores raises additional issues. Not only are there significant year-to-year differences, but teacher scores also vary significantly by subject and by school system. “Large variability suggests that the results are quite unreliable,” according to one reviewer. Consider the differences among school systems. A teacher with a relatively small effect could be cited for a large percentage gain if the systemwide effect were also relatively small.
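The arithmetic behind this concern can be sketched with a small, purely illustrative calculation. The example assumes a simplified definition in which a teacher's percent gain is the teacher's average student gain divided by the local system's average gain. The actual TVAAS computation is not described in the Stone report, and the function name and all numbers below are hypothetical.

```python
def percent_of_system_gain(teacher_gain: float, system_gain: float) -> float:
    """Teacher's average gain as a percentage of the local system's average gain.

    Simplified, assumed definition for illustration only.
    """
    if system_gain <= 0:
        raise ValueError("system_gain must be positive")
    return 100.0 * teacher_gain / system_gain


# Hypothetical scale-score gains, not taken from the Stone report.
# Teacher A: modest gain in a system whose average gain is also small.
teacher_a = percent_of_system_gain(teacher_gain=6.0, system_gain=5.0)
# Teacher B: twice the absolute gain, in a system with a larger average gain.
teacher_b = percent_of_system_gain(teacher_gain=12.0, system_gain=12.0)

print(f"Teacher A: {teacher_a:.0f}% of system gain")  # 120%
print(f"Teacher B: {teacher_b:.0f}% of system gain")  # 100%
```

In this sketch Teacher A clears a 115% bar while Teacher B does not, even though Teacher B's students gained twice as much in absolute terms. This is the sense in which percent gains from different systems may not be comparable.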
Third, examining the scores in different ways leads to different conclusions. For example, examination of medians and percentile scores, as opposed to means, shows that the scores at the 75th percentile come close in language arts and mathematics to the 115% gain chosen by Stone to represent exemplary performance.
Finally, because, as noted previously, nothing is known about the teachers in the sample, there is no way to interpret the scores presented in Appendix A. We don’t know whether the teacher scores in the table were earned after Board certification or represent both pre- and post-certification measures. We don’t know the fields of certification and have no way of interpreting different subject or grade level scores. As one reviewer notes, “In other words there is no way using the methods presented one could make any valid inferences about cause and effect.”
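A short sketch shows how the choice of summary statistic can change the picture. The percent-gain values below are invented for illustration and are not the Appendix A data; the variable names are likewise hypothetical. The sketch uses Python's standard `statistics` module, whose `quantiles` function with `n=4` returns the three quartile cut points.

```python
import statistics

# Hypothetical percent gains for eight teachers (not the Appendix A data).
percent_gains = [70, 85, 90, 95, 100, 105, 112, 118]

mean_gain = statistics.mean(percent_gains)
median_gain = statistics.median(percent_gains)
# quantiles(..., n=4) returns [Q1, Q2, Q3]; the last entry is the 75th percentile.
p75_gain = statistics.quantiles(percent_gains, n=4)[2]

print(f"mean            = {mean_gain:.1f}")
print(f"median          = {median_gain:.1f}")
print(f"75th percentile = {p75_gain:.1f}")
```

With these made-up numbers the mean sits well below 115, while the 75th percentile is much closer to it. A report that quotes only the mean would hide the fact that the upper quarter of the group approaches the standard.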
Results and Conclusions
The reviewers are unanimous in asserting that the conclusions reached by Stone, that “the findings of this study present a serious challenge to NBPTS’ claims…” and that “…they suggest that public expenditures on NBPTS certification be suspended…,” are completely unsupported by the study. These conclusions severely overreach, considering the methodological limitations identified by reviewers. It would be hazardous enough to base recommendations about the whole NBPTS system on a study of teachers in only one state. Relying on the Stone study, which provides no data about sample teachers that would enable interpretation and includes an extraordinarily small sample, is impossible.
Stone anticipates some criticism in his own brief report. About the small sample size, he says that other studies of NBPTS, with different findings, also have small samples. Reviewers find this response, and Stone’s other attempts to rationalize the deficiencies of the sample, inadequate. Other studies do not, as this one does, focus on the link between NBPTS status and student achievement. Given the importance of this topic, it is imperative that studies have adequate sample size and include sufficient information to interpret findings.