Assessment: An Accountability Revolution. In
Marci Kanstoroom and Chester E. Finn, Jr. (Eds.),
Better Teachers, Better Schools. (Washington,
DC: Thomas B. Fordham Foundation, 1999).
Citizens and policymakers concerned with the
quality of public schooling are calling for improved
standards and accountability. They want
objective evidence that students are meeting high standards.
Several states have responded by implementing
periodic achievement examinations and setting
a high score as the expected minimum. Virginia,
for example, set a requirement that 70% of each
school's students must meet or exceed a challenging
cutoff score on its new Standards of Learning
test. In a pilot run in 1998, fewer than
3% of schools were successful. Similarly,
Washington set a standard (but not a mandate)
that 80% of each school's students should exceed
a high minimum score on the Washington Assessment
of Student Learning. Only 70% of students
at the state's highest-performing school were able to meet it.
Inevitably, such high rates of failure call the
standards themselves into question (Sanders &
Horn, 1995). Can all schools reasonably
be expected to reach the same standard within
the same time frame? For schools whose students
are mostly high achievers, the task may be surmountable;
but what about the schools with high percentages
of disadvantaged students? Moreover, can
annual reports showing only the percentage of
students reaching a set standard accurately and
fairly reflect the work being done with low achievers?
Clearly, if percentage passing is the only indicator
of school and teacher performance, the answer
must be no. A given school or teacher could
make great strides with disadvantaged students
yet fail to reach a standard that would easily
be attained by a more advantaged population.
Beyond considerations of accuracy and fairness
there is the question of whether a standards-based
accountability policy creates perverse incentives.
Surely it does. Teachers are encouraged
to recommend difficult-to-teach students for programs
in which they will be exempted from testing--special
education, for example. States such as Texas
are already encountering this problem. Also,
instead of working to encourage all students to
do their best, standards-based accountability
effectively requires schools and school systems
to put their greatest effort into ensuring that
low achieving students reach the minimum.
In effect, they are encouraged to serve the least
talented students at the expense of the most talented.
Few parents or policymakers would agree with these priorities.
If minimum score requirements are lowered in response
to these concerns, a different question arises:
Have policymakers accommodated standards to the
least well equipped students and thereby set a
mediocre expectation for all? Alternatively,
if they adjust expected minimums to account for
differences in student populations, are they effectively
consigning some students to inferior schooling?
Thus far states have responded to these questions
by maintaining that students in all schools should
be expected to reach the same high minimum standards.
Whether these standards will be politically sustainable
in the face of disproportionate failure rates
among disadvantaged and minority students remains
to be seen.
Plainly, a better approach to school and teacher
accountability is needed and fortunately one is
available: Value-added assessment. Value-added
assessment permits citizens and policymakers to
see how much the students in a given school or
classroom are gaining annually in relation to
their past history of achievement. As explained
below, it permits schools and teachers to be judged
on the basis of how much progress they have made
with their students regardless of entering achievement
levels. The perverse incentives resulting
from standards-based accountability are thereby avoided.
Yet value-added assessment does not replace standards-based
assessment. As a matter of sound policy,
schools and school systems must be concerned both
with student academic gains and with the percentage
of students who attain the achievement minimums
required for advancement or graduation.
What value-added assessment offers is a means
whereby citizens, policymakers, and school administrators
can accurately determine how much schools, school
systems, and individual teachers are contributing
to the attainment of expected achievement levels
irrespective of the students they were assigned.
There is widespread agreement that teacher quality
is critical to classroom success. A recent
report by Sanders and Rivers (1996) has shown
very substantial differences in the achievement
gains earned by students who have excellent teachers
versus those who have ineffective ones.
The differences were large enough to decisively
shape the subsequent academic careers of the students involved.
The teachers in the Sanders and Rivers study were
all fully licensed as are most teachers in public
schools. Plainly, however, there were important
differences in their effectiveness and to some
extent these differences were due to their training.
Surprisingly, the question of the degree
to which teacher training enables teachers to
boost student achievement is one for which there
are few clear answers. The quality of teacher
training has traditionally been assessed by a
review of program inputs, e.g., whether the program
includes certain courses, whether faculty are
properly credentialed, whether the institution
hosting the program is properly funded and accredited,
etc. Whether teachers trained by a fully
approved program are able to do a superior job
of producing measured student achievement, however,
has not been unambiguously answered by research.
In response to growing recognition of the need
for improved teacher quality, the National Council
for the Accreditation of Teacher Education (NCATE)
has proposed changes in its standards for accreditation
of teacher training programs. Instead of
the traditional review of inputs--i.e., a curriculum-based
review--NCATE plans to conduct "performance-based"
reviews--i.e., reviews of program strengths and
weaknesses as evidenced by the demonstrated knowledge
and skills of program graduates. Performance-based
reviews will purportedly answer the question of
whether newly trained teachers are truly prepared
to teach. The problem, however, is that
NCATE considers teachers to be well prepared when
they understand, use, and believe in the pedagogical
concepts that the teacher training community has
been prescribing for years. In all likelihood,
improved teacher training by this definition will
only produce more of the same.
NCATE is already the largest accreditor of teacher
training programs. Its standards are being
promoted nationally as the key to improved teacher
preparation. If NCATE and its allies have
their way, NCATE standards will become de facto
national standards for teacher training.
Value-added assessment provides a critically important
alternative to NCATE's concept of teacher quality.
Instead of judging teacher quality by observing
NCATE's preferred competence indicators, value-added
assessment permits observers to appraise the record
of student achievement gains produced by recent
program graduates. In other words, instead
of judging teacher effectiveness only on the basis
of indicators that NCATE believes predict effective
teaching, value-added assessment permits effectiveness
to be judged on the grounds of demonstrated success
in producing actual achievement. The Sanders
and Rivers study noted above examined just such
data drawn from the Tennessee Value-Added Assessment System (TVAAS).
TVAAS is the heart of Tennessee's education accountability
system. TVAAS has been in operation since the
late 1980s; since 1995, it has also produced
value-added teacher effectiveness data
for review by principals and other school system
personnel. With a data base such
as that provided by TVAAS, any state that desired
to assess the quality of its teacher training
programs could do so by aggregating the value-added
performance of novice teachers trained by each program.
Value-added assessment is further explained below,
and an example of a TVAAS "Teacher Report"
is provided in the Appendix.
Value-added assessment is a system of statistical
analysis that summarizes annual gains in student
achievement. The most recent and mathematically
sophisticated version of value-added assessment
was developed by Dr. William Sanders at the University
of Tennessee (Sanders, Saxton, & Horn, 1997).
It has been used in Tennessee since the early
nineties. A slightly different kind of value-added
assessment using a different type of statistical
analysis was used as early as 1984 in the Dallas
(TX) Independent School District (Webster & Mendro, 1997).
Tennessee tests all students annually in grades
3-8 with a customized version of McGraw-Hill's
Terra Nova instrument. The testing program
is called the Tennessee Comprehensive Assessment
Program (TCAP) and the results are used to inform
students, parents, and teachers about individual
pupil achievement. The Tennessee Value-Added
Assessment System (TVAAS) produces annual reports
of the aggregate student achievement gains produced
by each teacher, school, and system in Tennessee's
public schools. The annual reports for
school systems are broken down by school and grade
for each of the 5 subject areas measured by the
TCAP exam (math, science, reading, language, and
social studies). The reports for teachers
aggregate the gains earned by all students for
which the teacher was responsible. Under
current Tennessee law, school and school-system reports,
but not individual teacher reports, are made public.
TVAAS reports express achievement gains in scale
score points and in the form of comparisons to
national, state, and local averages. For
example, 25 points is a typical amount of gain
in student math achievement produced by 4th grade
Tennessee teachers. The average gain in
math produced by 4th grade teachers nationally
is 26 points. Therefore, the typical Tennessee
teacher is producing gains in 4th grade math equal
to 96% of the national average. In 4th grade
science, by contrast, Tennessee teachers are producing
gains equal to 115% of the national average.
For critically important comparisons, a three-year
rolling average is used to assure statistical
stability. For example, Washington County's
Boones Creek Middle School produced a three year
average gain (1993-95) of 65 scale score points
in language arts for grades 5-8. The national
average gain in language arts for grades 5-8 is
50 scale score points. Thus, Boones Creek
Middle School produced gains equivalent to 130%
of the national average language arts gains.
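The arithmetic behind these comparisons is simple division; the following sketch (with an illustrative function name, not drawn from TVAAS itself) reproduces the two examples above:

```python
def gain_vs_national(local_gain: float, national_gain: float) -> float:
    """Express a mean scale-score gain as a percentage of the
    national average gain for the same grade and subject."""
    return 100 * local_gain / national_gain

# Tennessee 4th-grade math: 25-point average gain vs. a 26-point national norm
print(round(gain_vs_national(25, 26)))  # 96

# Boones Creek Middle School, language arts grades 5-8 (1993-95 rolling average)
print(round(gain_vs_national(65, 50)))  # 130
```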
By comparing students' current achievement to their
past performance and aggregating the results,
value-added assessment permits all stakeholders
to see the impact of individual teachers, schools,
and school systems on the average progress of
the students for whom they are responsible. Not
incidentally, value-added assessment can also
be used by education decision-makers to assess
the effectiveness of everything from the latest
curricular innovations, to the preparedness of
novice teachers, to the quality of the programs
in which teachers were trained.
The statistical analysis employed in value-added
assessment is an advanced form of "analysis
of variance" called Henderson's "mixed
model." It is described in "The Tennessee
Value Added Assessment System" by Sanders,
Saxton, and Horn (1997) and in several other sources
cited below (Harville, 1976; Harville, 1977; Henderson,
1982; McLean, Sanders, & Stroup, 1991; Patterson
& Thompson, 1971; Raudenbush & Bryk,
1988; Sanders, 1989). It produces a "best
linear unbiased estimate" of the influence
on annual student achievement gains attributable
to teachers, schools, and school systems. Of technical
significance, value-added estimates of teacher
influence are derived from a multi-year "layered"
computational model and corrected by a "shrinkage
estimate." These two features substantially
reduce the possibility of false negative or false
positive estimates and ensure that the resulting
indicators of achievement gain are as exact as
fairness will permit.
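The protective effect of shrinkage can be illustrated with a toy calculation (a simplified stand-in, not the actual TVAAS mixed-model estimator): a teacher's estimated gain starts at the district mean and is pulled toward the teacher's own data only as evidence accumulates.

```python
def shrunken_gain(teacher_mean: float, n_students: int,
                  district_mean: float, k: float = 20.0) -> float:
    """Toy shrinkage estimator: a weighted blend of the teacher's
    observed mean gain and the district mean, weighted by the amount
    of data. The constant k governs how much evidence is needed before
    the teacher's own data dominates (illustrative, not a TVAAS value)."""
    w = n_students / (n_students + k)
    return w * teacher_mean + (1 - w) * district_mean

# With few students the estimate stays near the district mean...
print(round(shrunken_gain(40, 5, 25), 1))    # 28.0
# ...with many students it moves toward the observed teacher mean.
print(round(shrunken_gain(40, 200, 25), 1))  # 38.6
```

The effect is that a striking result based on a handful of students is discounted, which is exactly the protection against false positives and false negatives described above.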
Compared to other methodologies for computing
student achievement gains, value-added analysis
is more precise and less vulnerable to manipulations
that can distort results. For example, the hierarchical
linear regression analysis used in Dallas, TX
can exaggerate the differences attributable to
a factor such as funding and correspondingly underestimate
the differences attributable to teaching effectiveness
if a variable such as per-pupil spending is prematurely
entered into the statistical adjustment of student scores.
Value-added assessment is statistically robust,
but the validity of its results depends on certain
preconditions. At a minimum, it requires annual
testing of students in all grades with a reliable
and valid achievement test. Portfolio assessment
and other forms of assessment that lack reliability
and objectivity will not suffice. Neither will
standardized achievement tests that have been
revised to enhance their marketability to educators
at the expense of diminished academic content.
No amount of analysis can transform the substance
and meaning of fundamentally flawed data. Perhaps
this limitation is best expressed in the statistician's
time-honored adage "Garbage in, garbage out."
Yet, nothing in the use of value-added assessment
precludes teachers from also using portfolios
or any other form of assessment they deem necessary
and desirable. Most educators believe that schooling
should serve aims beyond those that can be measured
by achievement tests and so they favor a variety
of assessments. Parents and the public are not
necessarily opposed to these broader aims, but
they do disagree with the vast majority of educators
about priorities. Whereas educators may view measured
academic achievement as only one outcome among
many, parents, taxpayers, and policymakers view
it as the indispensable essential core of student
(and teacher) results. No matter what other benefits
good schooling may produce, those who fund the
schools and who enroll their children in them
will not be satisfied if the visible gains in
objectively measured academic achievement are
insufficient. Like an annual audit conducted by
an external auditor, value-added assessment is
an objective means whereby the consuming public
can see whether its priorities are being respected
and its hopes fulfilled.
In addition to requiring the use of a valid and
reliable achievement test, value-added assessment
requires that the items used in each annual testing
be fresh, non-redundant, and tied to an underlying
linear scale. The forms used at each grade level
must include a sufficiently wide range of items
such that "ceiling" and "floor"
effects are highly improbable. Also the scores
produced by the test must be reported on a common
scale that spans the range of grades for which
the test is appropriate.
The purpose of these requirements is to ensure
that the effectiveness of teachers, schools, and
systems is tracked yearly, measured in understandable
terms, and not artificially limited by the assessment
process itself. In particular, the use of fresh
test items ensures that the gains calculated from
value-added assessment represent student progress
along the full spectrum of curricular objectives
and not just improvements in the material sampled
by the test, i.e., it discourages teaching to the test.
In order to ensure fair assessments of teachers,
Tennessee's value-added assessment reports include
only those pupils who have attended school for
at least 150 days and are not eligible for special
education services. [Special education students
are assessed through "individual education
plans."] For teachers who have taught a given
student for less than a full year, only those
students who have been the teacher's responsibility
for more than 75 days are counted. Teachers
whose subjects are not covered by the annual achievement
examinations (e.g., art and music) are not assessed
by value-added indicators.
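Stated as code, these inclusion rules amount to a simple filter. A minimal sketch (the record field names are hypothetical, not TVAAS's actual data layout):

```python
def countable_for_teacher(record: dict) -> bool:
    """Apply Tennessee's inclusion rules for teacher value-added reports:
    at least 150 days of school attendance, not eligible for special
    education services, and more than 75 days as the teacher's
    responsibility. Field names are hypothetical."""
    return (record["days_enrolled"] >= 150
            and not record["special_ed_eligible"]
            and record["days_with_teacher"] > 75)

students = [
    {"days_enrolled": 160, "special_ed_eligible": False, "days_with_teacher": 120},
    {"days_enrolled": 140, "special_ed_eligible": False, "days_with_teacher": 120},
    {"days_enrolled": 170, "special_ed_eligible": True,  "days_with_teacher": 170},
]
print(sum(countable_for_teacher(s) for s in students))  # 1
```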
Value-added assessment offers several important
advantages when compared to other forms of educational accountability.
1. It expresses teacher, school, and school system
effectiveness in terms of increases in achievement
over previous performance. In other words, in
the computations that underlie teacher, school,
and system effectiveness, each student is compared
to his or her own record of achievement over a
period of several years. By contrast, most present-day
education accountability systems assess effectiveness
by comparing current student achievement to an
average or to an arbitrarily prescribed standard.
The failure of education accountability systems
to consider gain relative to previous achievement
can result in misleadingly negative evaluations
for educators who are producing substantial but
insufficient gains with disadvantaged students
or misleadingly positive evaluations of educators
who are producing mediocre gains with talented
and advantaged students.
2. It excludes from the estimates of teacher,
school, and school system effectiveness the influence
of all pre-existing differences among students.
These include but are not limited to race, socioeconomic
status, previous learning, intelligence, and all
other factors, known and unknown, that have influenced
previous achievement. In contrast to "regression
analysis" approaches to student gain assessment,
the "mixed model" approach employs statistical
"blocking" to remove the contribution
of suspected biasing influences. Blocking has
the advantage of removing differences without
the necessity of measuring and computing the magnitude
of each of the various excluded factors.
As counterintuitive as the notion of removing
differences may seem, empirical studies of value-added
assessment have demonstrated that it does remove
differences among students and thereby levels
the playing field for teachers, schools, and systems.
Statistical analyses of Tennessee's value-added
scores have shown no relationship between annual
gains and previous achievement, race, eligibility
for free or reduced lunch, or any other of a variety
of potentially biasing differences among students
(University of Tennessee Value-Added Research
and Assessment Center, 1995).
Although value-added assessment removes pre-existing
differences, it must be noted that such differences
are not the only factors beyond an educator's
control that can influence student gains.
Neither mixed-model analysis nor any other means
of educational gain assessment automatically removes
the effects that might result from "exogenous"
influences arising during the course of the school
year. For example, student illness or a natural
disaster during the course of a school year might
adversely affect a student's achievement gains.
Conversely, improvements in family income or the
introduction of better community health care might
contribute positively to achievement.
The influence of exogenous variables can and must
be considered--especially as they impact a given
school year--and mixed model methodology is able
to incorporate such considerations. Happily, however,
value-added analysis--properly interpreted--minimizes
the need to do so. First, data is averaged over
a period of years (permitting positive and negative
influences to counterbalance each other) and,
second, the gains of teachers, schools, and systems
can be compared to the gains of other teachers,
schools, and systems that have been exposed to
the same or similar influences.
3. Mixed model value-added assessment is able
to isolate the achievement effects produced by
an individual teacher so long as students have
been taught by that teacher for at least 75 days
per semester. As a result, it is possible to assess
teaching effectiveness regardless of whether teaching
has been undertaken on a departmental basis, a
team basis, or a traditional self-contained classroom basis.
4. The influence of a given teacher on student
gains is expressed in the form of a "shrunken"
or "regressed" estimate, i.e., an estimate
that guards against an unfair assessment. In other
words, the value-added system takes a very conservative
approach to assessing teacher impact and thus
ensures that those who are identified as effective
or ineffective are deserving of their classification.
5. Value-added assessment using mixed-model methodology
makes use of all student scores despite the fact
that some students will have missed tests and
have incomplete sets of data. By contrast, methodologies
such as regression analysis exclude students for
whom complete data is lacking and thus they typically
remove substantial numbers of students when analyses
span 4 or 5 years. Because poorer performing
students are often the ones to miss tests, the
exclusion of such students can substantially inflate
achievement gain estimates.
6. As described above, value-added assessment
permits comparisons to national average student
gains and thus provides an understandable measure
of student progress. However, a caveat
must be noted. Gain scores depict how well
students are progressing beyond their previous
skills and knowledge but do not show how they
stand with respect to an external benchmark of
attainment, i.e., a national norm or prescribed
standard. For this reason, comparison to national
average gains is not a sufficient basis for judging
education outcomes. A complete assessment requires
consideration of both value-added performance
and performance referenced to an external standard.
Tennessee's value-added reports, for example,
are concerned primarily with average student gains
and the comparison of those gains to national
averages. Additionally, however, the TVAAS
reports include average levels of achievement
and appropriate national norms against which they
may be judged.
An alternative to Tennessee's reporting system
is one in which the annual learning gains produced
by a given teacher, school, or system are compared
to the annual learning gain necessary to bring
students to an externally referenced benchmark.
Although not currently used by any state, such
a report would make it possible to consider both
indicators simultaneously. For example, a school
system with a substantial number of disadvantaged
students might need to produce learning gains
equal to 110% of the national average gains in
order to reach national grade level standards
by the 8th grade.
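The report sketched here reduces to a required-gain calculation. An illustrative computation (all numbers hypothetical except the 110% figure, which is from the example above):

```python
def required_gain_pct(deficit: float, years: int, national_gain: float) -> float:
    """Annual gain, as a percentage of the national average annual gain,
    needed to erase a scale-score deficit by a target grade.
    The deficit and national-gain values below are hypothetical."""
    return 100 * (national_gain + deficit / years) / national_gain

# e.g. students 20 scale-score points behind grade level, 4 years until
# the 8th grade, and a national average annual gain of 50 points:
print(round(required_gain_pct(20, 4, 50)))  # 110
```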
Although it employs some complex statistics, value-added
assessment creates a simple but enormously important
change in the educational landscape. It enables
parents, taxpayers, and education decision-makers
to see for themselves whether schools are working.
It does so by greatly simplifying the process
of interpreting reports on school effectiveness.
Such a change can revolutionize education. The
public has been flooded with information about
school quality but making sense of it has required
experts and most of the experts have been educators
who work for or with the schools. Now schools
can produce a balance sheet and report an objective
bottom line that is understandable to the interested
citizen. Eventually, resources and students will
flow to the effective schools and away from the ineffective ones.
For the interested reader there is a fairly extensive
literature pertaining to value-added assessment.
Although no one has yet written an account of
mixed model methodology suitable for a general
audience, it has been critically examined and
reviewed by a number of scholars and policy experts.
Perhaps more importantly, value-added assessment
has been used successfully in Tennessee for nearly
10 years and many of the interested parties have
learned how to interpret and make use of it. Some
schools have used it to identify weaknesses and
have, as a result, made phenomenal gains.
Other schools--notably rural schools and schools
with high percentages of economically disadvantaged
students--have been able to show that they are
doing a much better job of teaching than had been
evidenced by indicators such as expenditures and
the use of the latest educational practices. On
the whole, student achievement in Tennessee has
been improving over the years that value-added
assessment has been in place.
For a general description see:
McLean, R. A., & Sanders, W. L. (1984). Objective
component of teacher evaluation: A feasibility study
(Working Paper No. 199). Knoxville: University of Tennessee.
Sanders, W. L., & Horn, S. P. (1994). The Tennessee
Value-Added Assessment System (TVAAS): Mixed-model
methodology in educational assessment. Journal of
Personnel Evaluation in Education.
Sanders, W. L., & Horn, S. P. (1995). Educational
assessment reassessed: The usefulness of standardized
and alternative measures of student achievement as
indicators for the assessment of educational outcomes.
Education Policy Analysis Archives, 3(6). Available:
http://epaa.asu.edu/epaa/v3n6.html
Sanders, W. L., & Rivers, J. C. (1996, November).
Cumulative and residual effects of teachers on future
student academic achievement. (Available from UTVARC,
225 Morgan Hall, P.O. Box 1071, Knoxville, TN 37901-1071)
Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997).
The Tennessee value-added assessment system: A
quantitative, outcomes-based approach to educational
measurement. In J. Millman (Ed.), Grading teachers,
grading schools: Is student achievement a valid
evaluation measure? (pp. 137-162). Thousand Oaks, CA:
Corwin Press.
Sanders, W. L., Saxton, A. M., Schneider, J. F.,
Dearden, B. L., Wright, S. P., & Horn, S. P. (1994).
Effects of building change on indicators of student
academic growth.
Webster, W. J., & Mendro, R. L. (1997). The Dallas
value-added accountability system. In J. Millman (Ed.),
Grading teachers, grading schools: Is student achievement
a valid evaluation measure? (pp. 81-99). Thousand Oaks,
CA: Corwin Press.
For empirical findings with respect to value-added
model performance see:
University of Tennessee Value-Added Research and
Assessment Center. (1995). Graphical summary of
educational findings from the Tennessee Value-Added
Assessment System. (Available from UTVARC, 225 Morgan
Hall, P.O. Box 1071, Knoxville, TN 37901-1071)
For an evaluation and policy analysis see:
Gormley, W. T., Jr., & Weimer, D. L. (1999).
Organizational report cards. Cambridge, MA:
Harvard University Press.
For a comprehensive technical review see:
Bock, R. D., & Wolfe, R. (1996, January 23). Audit
and review of the Tennessee Value-Added Assessment
System (TVAAS): Preliminary report. (Available from
the Tennessee Office of the Comptroller of the
Treasury, State Capitol, Nashville, TN)
For technical discussion of "mixed model"
statistical analysis see:
Harville, D. A. (1976). Extension of the Gauss-Markov
theorem to include the estimation of random effects.
Annals of Statistics, 4(2), 384-395.
Harville, D. A. (1977). Maximum likelihood approaches
to variance component estimation and to related problems.
Journal of the American Statistical Association.
Henderson, C. R. (1982). Analysis of variance in the
mixed model: Higher-level, nonhomogeneous, and random
regressions. Biometrics, 38, 623-640.
McLean, R. A., Sanders, W. L., & Stroup, W. W. (1991).
A unified approach to mixed linear models. The American
Statistician, 45(1), 54-64.
Patterson, H. D., & Thompson, R. (1971). Recovery of
interblock information when block sizes are unequal.
Biometrika, 58, 545-554.
Raudenbush, S. W., & Bryk, A. S. (1988). Methodological
advances in analyzing the effects of schools and
classrooms on student learning. Review of Research in
Education, 15, 423-479.
Sanders, W. L. (1989). A multivariate mixed model. In
Applications of mixed models in agriculture and related
disciplines (Southern Cooperative Series Bulletin No. 343,
pp. 138-144). Baton Rouge: Louisiana Agricultural
Experiment Station.
For a review of several current approaches to
assessing educational effectiveness see:
Millman, J. (Ed.). (1997). Grading teachers, grading
schools: Is student achievement a valid evaluation
measure? Thousand Oaks, CA: Corwin Press.
For sample school and system value-added reports contact:
Tennessee Department of Education
6th Floor, Andrew Johnson Tower
710 James Robertson Parkway
Nashville, TN 37243-0375
For school and system value-added reports in a
consumer-friendly on-line display:
The Tennessean (Nashville)
For technical information regarding value-added
analysis and its implementation contact:
Dr. William Sanders
University of Tennessee Value-Added Research and
Assessment Center
225 Morgan Hall
P.O. Box 1071
Knoxville, TN 37901
Phone: (423) 974-7336
Fax: (423) 974-7448
TVAAS TEACHER REPORT
*** TEACHER COPY ***

Teacher: XXXXXXXXXXXXXXXX (000000000)
System:  XXXXXXXXXXXXXXXX (000)
School:  XXXXXXXXXXXXXXXX (000)

Gains and (in parentheses) their Standard Errors
[Subject-by-subject values are omitted from this sample.]

USA Norm Gain:
State Mean Gain:
1995 Teacher Gain:
1995 System Gain:
1996 Teacher Gain:
1996 System Gain:
1997 Teacher Gain:
1997 System Gain:

Teacher vs. Norm:   NDD from Norm
Teacher vs. State:  NDD from Mean
Teacher vs. System: NDD from Mean

Note: NDD = Not Detectably Different (within
2 standard errors).

Teacher 3-Year-Average Gain in Scale Score
Units with Approximate 95% Confidence Intervals

          (----L------T S N-------)
Soc. St.  (----------LS*------------)

Legend: T = Teacher Gain, L = System
(LEA) Mean Gain, S = State Mean Gain, N = National
Norm Gain. An asterisk (*) indicates that
2 or more of the above symbols coincide.
The estimated teacher gains presented here
are the official TVAAS estimates from the statistical
mixed model methodology which protects each teacher
from misleading results due to random occurrences.
Each teacher's gain is assumed to be equal to
the average gain for the district until the weight
of the data pulls the estimate away from the district average.
This year's estimates of previous years' gains
may have changed as a result of incorporating
the most recent student data.
J. E. Stone is an educational psychologist in
the College of Education at East Tennessee State University.
He also heads the Education Consumers ClearingHouse.
This paper was completed with the support of The
Foundation Endowment, 611 Cameron Street, Alexandria, VA,
<firstname.lastname@example.org>, (703) 683-1077.
Reprints may be obtained from The Foundation
Endowment or from the Thomas B. Fordham Foundation,
Street, NW, Suite 600, Washington, DC 20006.