Choices and Decisions for Test Instruments in Cognitive and Achievement Assessment

Region 4 Education Service Center, Houston, Texas.
Thursday 7 June 2012
Ron Dumont Ed.D., NCSP
Director: School of Psychology
Fairleigh Dickinson University
1000 River Road – TWH101
Teaneck, NJ 07666
[email protected]
John Willis Ed.D., SAIF
Senior Lecturer in Assessment – Rivier College
419 Sand Hill Road
Peterborough, NH
[email protected]
LD means a disorder in one or more of the basic psychological processes involved in understanding or in using language.
You cannot identify an LD without specifying a disorder and showing how it impairs schoolwork.
Severity of LD is not measured by the severity of the weakness in basic process(es). It is measured by the severity of the
impact on academic achievement. Real-life academic achievement is often more important than achievement test scores.
Careful, diagnostic assessment of achievement is often the core of the evaluation.
The “exclusions” are not as important as most people think, as long as the disorder in basic process(es) and impact on
academic achievement have been documented thoroughly and properly.
Students may have LD as a disability secondary to another disability, even intellectual handicap.
Global intelligence measured by total IQ scores (GCA, MPC, BCA, etc.) is usually not a helpful construct for
understanding the cognitive functioning of students with specific learning disabilities. Official definitions of LD do not
define “intellectual ability” even as “intelligence,” much less as an “intelligence test score.” Cognitive ability factors are
usually the most helpful level of analysis.
The factor structure of a test for persons with certain, specific disabilities may be very different from the factor structure for
the norming sample.
It is almost always better to adopt an appropriate test than to adapt an inappropriate one for a student with a severe disability.
Parents, teachers, and the students themselves make important contributions to the evaluation, and they must be included in
the process. Examiners should elicit genuine referral questions, not just “issues” to be answered by the evaluation.
Interviews and questionnaires are essential parts of a complete evaluation.
To ensure the valuable contributions of parents, teachers, and students, evaluation results must be as clear as possible.
Jargon and statistics must be defined very clearly.
The total evaluation must be integrated, which is not achieved with a staple.
LD identification is a professional judgment by a team, not an exercise in arithmetic.
However, any arithmetic involved should be accurate.
Tests and scores are not interchangeable. A student’s age- and grade-equivalent scores (which are horrible statistics) will
not come out in the same rank order as the student’s standard scores. Age-based and grade-based norms differ, and often
both must be reported. Discontinuities between fall, winter, and spring norms can be dramatic. The same performance
yields very different scores on different tests. The same grade equivalent yields different standard scores on different tests.
Reading tests use an extraordinary variety of methods to assess reading. They are not interchangeable. Often, several
reading tests must be used for a complete picture. Reading fluency and study skills are important.
Math tests almost always require limits-testing. Correctly computed scores, which count “math-fact” errors and misread
operation signs against the student, must not be confused with the reality of the student’s math skills.
Writing must be assessed carefully. The best formal written expression tests have many flaws. Writing samples may be
needed in addition to tests.
Achievement testing should include detailed descriptions of actual skills, gaps, and weaknesses. It is useless simply to
print a table of test scores and describe the scores in words.
Tests that combine separate skills in single scores (e.g., reading decoding and reading comprehension or math calculation
and math applications) are as useful as a second handle on a snow shovel.
Discrepancy formulae are statistical nightmares. The absence of some discrepancy should not be used to exclude children
from LD classification. Discrepancies should be thought of as presumptive in nature, not exclusive.
Norms are much more important than most people think. Norms are worse than most people think. Validity and reliability
matter. Validity for specific purposes and reliability over realistic spans of time are rarely documented.
Diagnoses are not political or economic decisions.
Relative and transient weaknesses must be taken seriously.
Examiners must use the best instruments available. Inadequate tests should be used only when they are absolutely
necessary and are the best in existence for the purpose.
All disabilities, including LD, can be seen as mismatches between learning style and instruction. Changes in circumstances
can alter the need for special education.
Evaluation processes that always or never recommend highly restrictive placements are equally suspect.
Fads in diagnosis and treatment must be avoided.
An evaluation, whether initial or a re-evaluation, must be a careful, thoughtful, thorough process.
Concrete recommendations, individually planned for the student, are an important goal of the evaluation. Stock,
boilerplate recommendations are not much help. A useful evaluation with recommendations does not cost much more than
a useless one without recommendations.
Computers don’t write reports. People do.
Evaluations should be individual and humanistic, consider multiple intelligences, reflect reality beyond test scores, and
accept the possibility of improvement in areas of weakness as well as potential stability of individual patterns of strengths
and weaknesses.
There is often a huge gap between the science of LD identification and the social policy involved with the identification of LD.
Best practice and educational law are often in conflict.
There is often a distinct difference between an evaluation for classification and an evaluation for diagnosis of educational disabilities.
A. Test results must be placed in context, including cautious, skeptical interpretation of:
1. The student's actual, real-life performance in school;
2. The student's educational history;
3. The student's personal, familial, and medical history;
4. Reports and referral questions from therapists, parents, and the student; actively solicit reports
and questions;
5. The student's self-reports;
6. Previous evaluations; and
7. Concurrent findings of other evaluators.
8. All data must be integrated, which is not achieved with a staple.
B. Commonalities must be given due weight; disparities must be acknowledged and discussed.
C. The evaluation and preliminary conclusions may be done "blind," but the final decisions must take
into account the above considerations.
A. Do not alter standardized tests, especially normed, standardized tests.
1. Obey basal and ceiling rules, and make sure we have them right, since they vary from test to test.
2. Read test instructions and items verbatim; give demonstrations as instructed.
a. Do not coach, help, or teach except as instructed.
b. Do not ad lib.
c. Do not trust our memories.
d. Practice until you can deliver your lines in a smooth, conversational tone.
e. Do not give unauthorized feedback.
f. Adhere to time limits, timed presentations, and timed delays.
3. Learn and practice a test to complete mastery before using it. Obtain qualified supervision when
using a new test.
4. If we "test the limits," explain what we did, why, and how. Make absolutely certain that results
of limit-testing cannot possibly be confused with valid scores. Do not test limits in ways that
will influence later items, e.g., by providing extra practice.
5. For students with severe or low-incidence disabilities, try to adopt appropriate tests rather than
adapt inappropriate ones.
6. Test students in their specific native language (e.g., Puerto Rican vs. Castilian Spanish or
American Sign Language vs. Signed English) with tests normed in those languages.
7. Consider consequences of taking subtests out of context.
B. Pay attention to test norms [we are responsible for using appropriate tests.]
1. Make sure that tests are normed on a genuinely representative sample:
a. Sufficiently large;
b. Appropriately stratified and randomized (or exhaustive);
i. sexes
ii. geographic regions
iii. racial and ethnic groups
iv. disabilities
v. income and educational levels
vi. other germane variables.
vii. interactions of these variables
c. National, international, or appropriate, clearly specified subgroup;
d. Truthfully and completely presented in test manual;
e. Recent; and
f. The appropriate test level for the student.
g. age-based vs. grade-based norms [we often need both for a complete understanding];
h. consider differences in norms for sexes, races, regions, incomes, and other variables.
2. When the best-normed test in existence for the purpose is not very well normed:
a. Look again.
b. In the report clearly explain the problems and the probable consequences.
C. Error in norms tables
1. Printed norms.
2. Computerized norms.
D. Princess Summerfallwinterspring
E. Be skeptical of publishers' claims.
A. Standard error of measurement (SEm).
1. Consistently use 90% (1.65 or 1 2/3 SEm) or 95% (1.96 or 2 SEm) confidence bands, even if it
is difficult for you.
2. Explain the meaning of the confidence band clearly.
3. Be certain we understand it ourselves.
4. Believe it; recognize that a test score was obtained once, at a specific time and place.
5. Recognize and explain that the confidence band does not include errors and problems with test
administration and conditions.
6. Distinguish between reliability and validity.
7. Distinguish between standard error of measurement (SEm) and standard error of estimate (SEest).
8. Be sure to use the correct confidence band for the appropriate score: raw, W, standard score,
percentile rank, etc.
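The 1.65 and 1.96 multipliers above can be applied mechanically. A minimal sketch (not part of the original handout; the score and SEm values are hypothetical):

```python
# Confidence bands around an obtained standard score (mean 100, SD 15),
# using the multipliers given in the text:
# 90% band = score +/- 1.65 SEm; 95% band = score +/- 1.96 SEm.
def confidence_band(score, sem, level=90):
    """Return the (lower, upper) confidence band for an obtained score."""
    multipliers = {90: 1.65, 95: 1.96}  # z-values from the normal curve
    z = multipliers[level]
    return (round(score - z * sem), round(score + z * sem))

# Hypothetical example: obtained standard score 85 with SEm = 3.
print(confidence_band(85, 3, 90))  # (80, 90)
print(confidence_band(85, 3, 95))  # (79, 91)
```

This brackets the obtained score, as the text directs; note that some manuals instead center bands on estimated true scores, which is one more reason to read the manual for each test.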
B. Determine (and worry about) how reliability data were obtained [we are responsible for using
appropriate tests].
1. How large were the samples? [They are often very small.]
2. Are there reliability data for students similar to the student being tested?
3. Are we using internal consistency measures to estimate test-retest reliability?
4. Are the time intervals comparable to those we are concerned with?
C. Cut-off scores.
D. Be skeptical of publishers' claims.
A. Validity for what purposes?
B. Validity for what groups?
C. Determine (and worry about) how validity data were obtained [we are responsible for using
appropriate tests.]
1. How large were the samples? [They are often very small.]
2. Are there validity data for students similar to the student being tested?
3. What were the criterion measures? Are we using a closed circle of validating tests against other
very similar tests?
4. Are the time intervals comparable to those we are concerned with?
D. Construct validity.
E. Standard error of estimate (SEest).
F. "Incremental validity."
G. Interpret tests only in ways for which validity has been established.
H. Keep a record of all test data for follow up establishing trends and understanding how the test
works in the local situation.
A. Use two straight-edges on the table, and if necessary, photocopy the norms table and draw lines and
circle numbers.
B. Check accuracy of tables by inspecting adjacent scores.
C. Read table titles and headings aloud while scoring.
D. Recheck all scores.
E. Check them again.
F. Get someone else to check them.
G. Score by both age and grade norms, if available, and compare the results.
H. Record the student's name and the date on all sheets of paper.
I. Check the student's birthdate and age with the student. Calculate the age correctly by the rules for
the particular test.
J. Make sure we (not just business managers) are on publishers' mailing lists.
K. Perform thought experiments with tables, e.g., What if the student had made two lucky or unlucky
guesses? What if the student were 6 months older or younger? etc.
L. Record all responses verbatim.
M. Keep raw data for future use.
N. Use consistent notations for correct and incorrect answers, no responses, "I don't know" responses
("I have no clue"), and examiner's questions. Make sure the examinee cannot determine which
notations you are making from the number or direction of pencil strokes.
O. Use protractors and templates consistently, carefully, and correctly. If uncertain, have someone
check your work.
P. Follow computer-scoring instructions slavishly, including the sequence in which you turn on the
CPU and peripheral equipment.
Q. Check accuracy of computer results by occasionally hand scoring.
R. Make sure you have the latest version of the scoring program, that you know of any new glitches
in it, and that you have the protocols that go with that version.
S. Understand and clearly explain differences among standard scores, scaled scores, normal curve
equivalents, percentile ranks, and other scores.
T. Use age-equivalent ("mental age") and grade-equivalent scores sparingly, if at all, explain them
and their myriad limitations clearly, and make sure they have some relationship to reality. Bracket
them with 90% or 95% confidence bands, just as you do standard scores.
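As a concrete illustration of item S: the common derived scores are all re-expressions of the same z-score (distance from the mean in standard-deviation units), so any one can be converted to the others through z. A sketch, not part of the original handout:

```python
from math import erf, sqrt

# All of these derived scores restate one z-score on different scales.
def z_to_scores(z):
    return {
        "standard score": round(100 + 15 * z),  # mean 100, SD 15
        "scaled score": round(10 + 3 * z),      # mean 10, SD 3
        "NCE": round(50 + 21.06 * z),           # normal curve equivalent
        # percentile rank: percent of the normal curve falling below z
        "percentile rank": round(100 * 0.5 * (1 + erf(z / sqrt(2)))),
    }

print(z_to_scores(-1.0))
# {'standard score': 85, 'scaled score': 7, 'NCE': 29, 'percentile rank': 16}
```

The conversions assume a normal distribution, which is why equal-looking percentile differences near the middle of the distribution represent much smaller ability differences than the same percentile differences at the extremes.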
A. What to use.
B. What not to use.
C. When to use them.
A. What to use.
B. What not to use.
C. When to use them.
A. What to use.
B. What not to use.
C. When to use them.
Assess all relevant skills with appropriate instruments or procedures.
Use tests with sufficient numbers of items.
Fill in gaps in skills assessed.
Follow up hypotheses.
A. Between what and what? Hope and experience?
B. Reality vs. test scores.
C. Co-normed tests and linked tests.
D. How to understand and make appropriate choices.
1. Simple difference.
2. Regressed difference.
3. Grade-equivalent differences.
4. NCE differences.
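The first two choices above can be made concrete. A sketch with hypothetical numbers (assuming standard scores with mean 100, SD 15, and an ability-achievement correlation r that must come from the tests' co-norming data):

```python
# Simple difference: ability minus achievement, ignoring regression to the mean.
def simple_difference(ability, achievement):
    return ability - achievement

# Regressed difference: compare achievement with the achievement score
# *predicted* from ability, using the correlation r between the two measures.
def regressed_difference(ability, achievement, r):
    predicted = 100 + r * (ability - 100)  # regression toward the mean
    return predicted - achievement

# Hypothetical student: ability 120, achievement 95, assumed r = .60.
print(simple_difference(120, 95))           # 25
print(regressed_difference(120, 95, 0.60))  # 17.0 (predicted 112 minus 95)
```

The regressed method expects high-ability students to score somewhat lower on achievement (and low-ability students somewhat higher) than a simple subtraction assumes, which is why the two methods can classify the same student differently.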
A. Distinguish clearly between different tests, clusters, factors, subtests, and scores with similar titles.
1. e.g., "Reading Comprehension" is not the same skill on different reading tests.
2. e.g., "Processing Speed" is not the same ability on different intelligence tests.
B. Explain with words and pictures all the statistics we use in our reports.
C. Explain differences between different statistics for different tests that we combine in our reports.
D. Explain the names (e.g., "Below Average") for the statistics we use in our reports.
E. Explain differences between different names for the same scores on various tests that we combine
in our reports.
F. Distinguish clearly between findings and implications.
G. Interpretation and recommendations require understanding of the disabilities and the programs, not
merely of the tests.
H. Identification of a disability is a reasoned, clinical judgment, not an exercise in arithmetic.
I. Offer specific, detailed recommendations and give a rationale for each.
J. Beware of fads in diagnoses and recommendations, both general and our own.
K. Eschew boilerplate.
L. Use computer reports to help interpret data and plan reports. Do not include or retype the actual
printouts in reports.
M. Remember that students' skills in related areas may differ dramatically and unexpectedly.
N. Use tests that distinguish between skills rather than lumping them together. For example, a
combined reading score based on both oral reading and reading comprehension is about as useful as
a second handle on a shovel.
O. A full scale, composite, or total IQ score is not kismet.
P. If it happens, it must be possible.
Q. Resist pressures to lie.
R. Explain the mechanism of the disability.
1. For example, a specific learning disability is a disorder in one or more of the basic
psychological processes involved in understanding or in using language, spoken or written,
which may manifest itself in an imperfect ability to listen, speak, think, read, write, spell, or do
mathematical calculations. So tell us about the student's disorder(s).
2. Similarly, a serious emotional disturbance must be based on a psychological disorder. So
specify, define, and explain the disorder, not just the behaviors.
S. Report genuinely germane observations from test sessions, but be clear (in our own minds as well
as in our reports) that behavior in a test session may be unique to that test session and may never be
seen in any other context.
T. Pay attention to our own observations. If we cite the student's boredom or fatigue, we should not hit
the Autotext button for "Test results are assumed to be valid." Explain why we did not use more
and shorter test sessions. [Why didn't we?]
U. There is no such thing as a routine triennial reevaluation.
V. Consider practice effects when tests are readministered. Consider differential practice effects on
different subtests. How many WISCs are too many?
W. Severity of an educational disability is measured by its impact on school functioning and
achievement, not by scores on diagnostic tests.
X. Evaluate the entire pattern of the student's abilities, not merely weaknesses.
Y. Revisit the verbatim record of the student's actual responses before accepting "canned"
interpretations from the manual, handbook, or computer printout. For instance, WISC-IV
Comprehension measures Social Studies achievement as well as "social comprehension," Picture
Completion almost never measures the "ability to distinguish essential from nonessential details,"
and young children can earn high scores for "verbal abstract reasoning" with entirely concrete
responses to Similarities.
Z. Base conclusions and recommendations on multiple sources of convergent data (not just test scores).
All of this information is either taken or adapted from Table 2.1, pp. 32-41, in D. P. Flanagan, K. S. McGrew, &
S. O. Ortiz (2000), The Wechsler Intelligence Scales and Gf-Gc Theory: A Contemporary Approach to
Interpretation (Boston: Allyn & Bacon), which was slightly changed from Table 1-1, pp. 15-19, in K. S. McGrew,
& D. P. Flanagan (1998), The Intelligence Test Desk Reference (ITDR): Gf-Gc Cross-Battery Assessment
(Boston: Allyn & Bacon). "Most all definitions were derived from Carroll [J. B. Carroll (1993) Human Cognitive
Abilities: A Survey of Factor-Analytic Studies (Cambridge, Eng.: Cambridge University Press)]. Two-letter factor
codes (e.g., RG) are from Carroll (1993a). Information in this table was adapted from McGrew [K. S. McGrew
(1997), Analysis of the major intelligence batteries according to a proposed comprehensive Gf-Gc framework in
D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (1997), Contemporary Intellectual Assessment: Theories Tests,
and Issues (New York: The Guilford Press)] with permission from Guilford Press" (Flanagan, McGrew, & Ortiz,
2000, p. 41). For the most recent information, please see D. P. Flanagan, S. O. Ortiz, and V. Alfonso (2007),
Essentials of Cross-Battery Assessment (2nd ed.) (New York: Wiley & Sons); D. P. Flanagan, S. O. Ortiz, V.
Alfonso, & J. T. Mascolo (2006), Achievement test desk reference (ATDR-II): A guide to learning disability
identification (2nd ed.) (New York, NY: Wiley); and N. Mather and L. Jaffe (2002), Woodcock-Johnson III:
Recommendations, reports, and strategies (New York, NY: Wiley). This information is provided only as an
illustrative guide to information in the publications cited above and cannot stand alone, so do not copy this
information. You will need the explanations and worksheets in at least one of the cited references. Not all of
these classifications may agree with those in the cited sources, so – again – it is absolutely necessary to use one or
more of those sources.
Fluid Intelligence Gf
[Note that this is reasoning,
not math knowledge, Gq.]
Comprehension/Knowledge Gc
General Sequential Reasoning (RG) [deduction]
KAIT Logical Steps
WJ III Analysis-Synthesis
LIPS-R Picture Context
LIPS-R Visual Coding
UNIT Cube Design
Induction (I)
Raven's Progressive Matrices
WAIS, WISC Matrix Reasoning
DAS Matrices
DAS Picture Similarities
DAS Sequential and Quantitative Reasoning (also RQ)
KAIT Mystery Codes
WJ III Concept Formation
WJ III Number Matrices
LIPS-R Classification
LIPS-R Design Analogies
LIPS-R Repeated Patterns
LIPS-R Sequential Order
RIAS Odd-Item Out
Quantitative Reasoning (RQ)
DAS Sequential & Quantitative Reasoning (also I)
Piagetian Reasoning (RP)
Speed of Reasoning (RE)
Language Development (LD)
OWLS Oral Expression
WISC/WAIS/WPPSI Comprehension (also K0)
WISC/WAIS/WPPSI Similarities (also VL)
DAS Similarities
DAS Verbal Comprehension (also LS)
RIAS Guess What (also K0 and VL)
RIAS Verbal Reasoning (also VL)
RIAS Verbal Memory (also Gsm MS)
Comprehension/Knowledge Gc (continued)
[VL emphasizes vocabulary over
broader language development (LD).]
WJ III Memory for Sentences (also Gsm MS)
WRAML Story Memory (also Glr MM)
Lexical Knowledge (VL)
WISC/WAIS/WPPSI Vocabulary (also LD)
DAS Word Definitions (also LD)
DAS Naming Vocabulary (also LD)
WJ III Picture Vocabulary (also K0)
WJ III Verbal Comprehension (also LD)
RIAS Guess What (also K0 and LD)
RIAS Verbal Reasoning
Listening Ability (LS)
OWLS Listening Comprehension
WJ III Oral Comprehension
NEPSY Comprehension of Instructions
General (verbal) Information (K0)
WJ III General Information
WISC/WAIS/WPPSI Comprehension (also LD)
WISC/WAIS/WPPSI Picture Completion (also Gv CF)
RIAS What's Missing (also Gv CF)
PPVT-III (also VL and LD)
WJ III Picture Vocabulary (also VL)
RIAS Guess What (also LD and VL)
Information about Culture (K2)
KAIT Famous Faces
K-ABC Faces and Places
WJ-R Humanities
General Science Information (K1)
WJ-R Science
Geography Achievement (A5)
WJ-R Social Studies
Communication Ability (CM)
Oral Production and Fluency (OP)
Grammatical Sensitivity (MY)
Foreign Language Proficiency (KL)
Foreign Language Aptitude (LA)
CHC Theory and Cross Battery Knowledge (CHC CBA)
Visual Processing Gv
Spatial Relations (SR)
WISC/WAIS/WPPSI Block Design (also VZ)
DAS Pattern Construction
LIPS-R Figure Rotation (also VZ)
WRAML Visual Learning (also MV & Gsm MS)
WJ III Spatial Relations (also VZ)
WISC/WAIS Object Assembly (also CS)
Visual Memory (MV)
WRAML Finger Windows (also Gsm MV)
WRAML Visual Learning (also SR & Gsm MS)
DAS Recall of Designs
DAS Recognition of Pictures
KAIT Memory for Block Designs
WJ III Picture Recognition
LIPS-R Immediate Recognition
LIPS-R Forward Memory
WRAML Picture Memory
WRAML Design Memory
RIAS Nonverbal Memory
[The current versions of CHC theory
do not handle drawing tests very well.
Carroll (1993) did not have many, if
any, drawing tests (e.g., VMI, Bender)
in his massive data set.]
Visualization (Vz)
WPPSI-R Geometric Design (also P2)
LIPS-R Form Completion (also SR)
LIPS-R Matching
LIPS-R Paper Folding
LIPS-R Figure Rotation (also SR)
WISC/WAIS/WPPSI Block Design (also SR)
VMI-5 (also P2?)
Bender Visual-Motor Gestalt (also P2?)
DAS Block Building
DAS Matching Letter-Like Forms
WJ III Spatial Relations (also SR)
WJ III Block Rotation
Closure Speed (CS)
WPPSI/WAIS Object Assembly (also SR)
WJ III Visual Closure
Flexibility of Closure (CF)
LIPS-R Figure-Ground
RIAS What's Missing (also Gc K0)
WISC/WAIS/WPPSI Picture Completion (also Gc K0)
Spatial Scanning (SS)
WISC-III PI Elithorn Mazes
Porteus Mazes
Serial Perceptual Integration (PI)
K-ABC Magic Window
Length Estimation (LE)
Perceptual Illusions (IL)
Perceptual Alternations (PN)
Imagery (IM)
Auditory Processing Ga
[Current versions of CHC
theory need more sophistication in
dealing with phonological awareness.]
Phonetic Coding: Analysis (PC:A)
GFW Auditory Discrimination
GFTA Test of Articulation
WJ III Incomplete Words
WJ III Sound Awareness
Phonetic Coding: Synthesis (PC:S)
WJ III Sound Blending
Speech Sound Discrimination (US)
WJ III Auditory Attention (also U3)
WJ III Sound Patterns - Voice (also U3)
Wepman Auditory Discrimination Test
Resistance to Auditory Stimulus Distortion (UR)
WJ III Auditory Attention and GFW Selective Attention
Memory for Sound Patterns (UM)
General Sound Discrimination (U3)
WJ III Auditory Attention (also US)
Temporal Tracking (UK)
Musical Discrimination & Judgment (U1, U9)
WJ III Sound Patterns - Music
Seashore Music Appreciation Test
Maintaining and Judging Rhythm (U8)
Sound-Intensity/Duration Discrimination (U6)
Sound-Frequency Discrimination (U5)
Hearing & Speech Threshold Factors (UA,UT,UU)
Absolute Pitch (UP)
Sound Localization (UL)
Short-Term Memory Gsm
Memory Span (MS)
DAS Recall of Digits
K-ABC Number Recall
K-ABC Word Order
WJ III Memory for Words
WRAML Number-Letter Memory
WJ III Memory for Sentences (also Gc LD)
WRAML Sentence Memory (also Gc LD)
WRAML Finger Windows (also Gv MV)
WRAML Visual Learning (also Gv SR & MV)
RIAS Verbal Memory (also Gc LD)
Working Memory (MW)
WISC/WAIS Letter-Number Sequencing
WJ III Numbers Reversed
WJ III Auditory Working Memory
Learning Abilities (L1)
Long-Term Storage and
Retrieval Glr
Associative Memory (MA)
KAIT Rebus Learning and Delayed Recall
WJ III Memory for Names and Delayed Recall
WJ III Visual-Auditory Learning (also MM)
WJ III Visual-Auditory Learning Delayed (also MM)
LIPS-R Delayed Recognition
LIPS-R Associated Pairs (also MM)
LIPS-R Delayed Pairs (also MM)
WRAML Sound-Symbol
Meaningful Memory (MM)
WRAML Story Memory (also Gc LS)
WJ III Story Memory and Delayed Recall
WMS-III Logical Memory II
CMS Stories 2
WJ III Visual-Auditory Learning (also MA)
WJ III Visual-Auditory Learning Delayed (also MA)
LIPS-R Associated Pairs (also MA)
LIPS-R Delayed Pairs (also MA)
Free Recall Memory (M6)
DAS Recall of Objects
WRAML Verbal Learning
Ideational Fluency (FI)
WJ III Retrieval Fluency
Associational Fluency (FA)
Expressional Fluency (FE)
Naming Facility (NA)
WJ III Rapid Picture Naming
[Glr involves the ability to store and
retrieve information, not the amount
of information that is stored (Gc).]
[Glr tasks involve controlled
learning. General information
tests, for example, cannot distinguish
information that was forgotten from
information that was never known
at all.]
Word Fluency (FW)
Figural Fluency (FF)
NEPSY Design Fluency
Figural Flexibility (FX)
Sensitivity to Problems (SP)
Originality/Creativity (FO)
Learning Abilities (L1)
Processing Speed Gs
Perceptual Speed (P)
WISC/WAIS Symbol Search (also R9)
WJ III Visual Matching (also R9)
WJ III Cross Out
LIPS-R Attention Sustained
Rate of Test Taking (R9)
WISC/WAIS Digit Symbol-Coding
WJ III Pair Cancellation
WISC Symbol Search (also P)
WJ III Visual Matching (also P)
DAS Speed of Information Processing (also Gt R7)
Number Facility (N)
Decision/Reaction Time
or Speed Gt
Simple Reaction Time (R1)
Choice Reaction Time (R2)
Semantic Processing Speed (R4)
Mental Comparison Speed (R7)
DAS Speed of Information Processing (also R9)
WJ III Decision Speed
Speed of Making Errors (RXXXXXX)
Quantitative Knowledge Gq
Mathematical Knowledge (KM)
Mathematical Achievement (A3)
Reading/Writing Grw
Reading Decoding (RD)
Reading Comprehension (RC)
Verbal (printed) Language Comprehension (V)
Cloze Ability (CZ)
Spelling Ability (SG)
Writing Ability (WA)
English Usage Knowledge (EU)
Reading Speed (RS)
Please see the attached grids showing characteristics of some commonly used achievement tests. The performance
characteristics of achievement tests are especially important. For example, an untimed reading comprehension
test with only one or two sentences per item may give very different results from a timed reading test with several
sentences per item, and a test of writing a story may yield a very different impression than one of writing
individual sentences.
Alessi, G. (1988). Diagnosis diagnosed: A systematic reaction. Professional School Psychology, 3 (2), 145-151. A
provocative and important article. Why do we diagnose children as LD when there are other factors that should be
explored? Examiners readily acknowledge in theory, but almost never cite in practice such causes of underachievement as
poor instruction, defective school management policies, or inadequate curriculum.
Bracken, B. A. (1988). Ten psychometric reasons why similar tests produce dissimilar results. Journal of School Psychology,
26, (2), 155-166. If you ever administer more than one test to a child you have discovered that not all tests give the same
results. Dr. Bracken explains 10 of the most common reasons for such dissimilar results.
Breaux, K. C., & Frey, F. E. (2009) Assessing Writing Skills Using Correct–Incorrect Word Sequences: A National Study.
Poster Session, National Association of School Psychologists Conference. Retrieved from
Brunnert, K. A., Naglieri, J. A., & Hardy-Braz, S. T. (2008). Essentials of WNV assessment. Hoboken, NJ: Wiley.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, Eng.: Cambridge University Press.
Carroll, J. B. (1997). The three-stratum theory of cognitive abilities. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison
(Eds.), Contemporary intellectual assessment (ch. 7, pp. 122-130). New York, NY: Guilford Press.
Cheramie, G. M., Goodman, B. J., Santos, V. T., & Webb, E. T. (2007) Teacher perceptions of psychological reports submitted
for emotional disturbance eligibility. Journal of Education and Human Development, 1(2). Article accessed through
Cheramie, G. M., Parks, L., & Schuler, A. (2011). Math problem-solving: Applying a processing model to LD determination.
In N. Mather & L. E. Jaffe (Eds.) Comprehensive evaluations: Case reports for psychologists, diagnosticians, and special
educators (pp. 356-371). New York, NY: Wiley.
Cheramie, G. M., Stafford, M. E., & Mire, S. S. (2008) The WISC-IV General Ability Index in a non-clinical sample. Journal
of Education and Human Development, 2(2). Article accessed through
Cohen, S. A. (1971). Dyspedagogia as a cause of reading retardation: Definition and treatment. In B. Bateman (Ed.), Learning
Disorders (Vol. 4, pp. 269-291). Seattle, WA: Special Child.
Dehn, M. J. (2006). Essentials of processing assessment. New York: Wiley.
Dumont, R., Willis, J. O., & Elliott, C. D. (2008). Essentials of DAS-II assessment. Hoboken, NJ: Wiley.
Dumont, R., Willis, J, & McBride, G. (2001). Yes, Virginia, there is a severe discrepancy clause, but is it too much ado about
something? The School Psychologist, APA Division of School Psychology, 55 (1), 1, 4-13, 15.
Elliott, C. D. (2007). Differential Ability Scales 2nd edition introductory and technical handbook. San Antonio: The
Psychological Corporation. A valuable text on cognitive assessment in general, not only the DAS-II. There is a brief
appendix with a clear, concise, helpful explanation of Item Response Theory.
Embretson, S. E., & Hershberger, S. L. (Eds.) (1999). The new rules of measurement: What every psychologist and educator
should know. Mahwah, NJ: Lawrence Erlbaum.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
Farrall, M. L. (2012). Reading assessment: Linking language, literacy, and cognition. New York, NY: Wiley. The definitive
text on reading assessment. Don't assess reading without it.
Fiorello, C. A., Hale, J. B., McGrath, M., Ryan, K., & Quinn, S. (2002). IQ interpretation for children with flat and variable test
profiles. Learning and Individual Differences, 13, 115 – 125.
Flanagan, D. P. (2001). Assessment Service Bulletin Number 1: Comparative features of major intelligence batteries: Content,
administration, technical features, interpretation, and theory. Itasca, IL: Riverside Publishing.
Flanagan, D. P., & Harrison, P. L. (Eds.) (2012). Contemporary intellectual assessment third edition: Theories, tests and
issues. New York: Guilford Press. Thirty-six chapters of authoritative theoretical and practical information on the history
of intelligence assessment; CHC (Gf-Gc), PASS, triarchic successful intelligence, and multiple-intelligences theories; the
newest Wechsler scales, Differential Ability Scales 2nd ed., Kaufman Assessment Battery for Children 2nd ed., Stanford-Binet
Intelligence Scales 5th ed., Woodcock-Johnson III Normative Update, Das-Naglieri Cognitive Assessment System, Universal
Nonverbal Intelligence Test, Reynolds Intellectual Assessment Scales, and NEPSY-II;
contemporary interpretive approaches and their relevance for intervention; cognitive assessment in different populations;
and contemporary and emerging issues. Many of the chapters are written by the authors of the tests and developers of the
theories.
Flanagan, D. P., & Kaufman, A. S. (2009). Essentials of WISC-IV assessment (2nd ed.). Hoboken, NJ: Wiley.
Flanagan, D. P., Ortiz, S. O., & Alfonso, V. (2007). Essentials of cross-battery assessment (2nd ed.). Hoboken, NJ: Wiley.
See also their work on the McGrew, Flanagan, and Ortiz integrated cross-battery method and on practical
application of CHC theory to assessment.
Flanagan, D. P., Ortiz, S. O., Alfonso, V., & Mascolo, J. T. (2006). Achievement test desk reference (ATDR-II): A guide to
learning disability identification (2nd ed.). New York, NY: Wiley. A serious treatment of achievement tests from the
standpoint of CHC theory (a computerized version of the book's CHC test classifications is available for download).
Floyd, R. (2002). The Cattell-Horn-Carroll (CHC) Cross-Battery Approach: Recommendations for school psychologists.
Communiqué, 30 (5), 10-14.
Floyd, R. G., Evans, J. J., & McGrew, K. S. (2003). Relations between measures of Cattell-Horn-Carroll (CHC) cognitive
abilities and mathematics achievement across the school-age years. Psychology in the Schools, 40 (2), 155-171.
Goldman, J. J. (1989). On the robustness of psychological test instrumentation: Psychological evaluation of the dead. In
Glenn G. Ellenbogen (Ed.), The Primal Whimper: More Readings from the Journal of Polymorphous Perversity. New
York: Ballantine, Stonesong Press. Essential information on testing the dead.
Hale, J. B., & Fiorello, C. A. (2001). Beyond the academic rhetoric of 'g': Intelligence testing guidelines for practitioners. The
School Psychologist, 55, 113-117, 131-135, 138-139.
Hale, J. B., & Fiorello, C. A. (2004). School neuropsychology: A practitioner's handbook. New York: The Guilford Press.
Readable and valuable for evaluators who are not neuropsychologists or neurosurgeons.
Hale, J. B., Hoeppner, J. B., & Fiorello, C. A. (2002). Analyzing Digit Span components for assessment of attention processes.
Journal of Psychoeducational Assessment, 20 (2), 128-143.
Horn, J. L., & Blankson, N. (2005). Foundation for better understanding of cognitive abilities. In D. P. Flanagan, & P. L.
Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed.) (pp. 41-68). New York:
Guilford Press.
International Reading Association (1982). Misuse of grade equivalents: resolution passed by the Delegates Assembly of the
International Reading Association, April 1981. Reading Teacher, January, p. 464.
Jaffe, L. E. (2009). Development, interpretation, and application of the W score and the relative proficiency index (Woodcock-
Johnson III Assessment Service Bulletin No. 11). Rolling Meadows, IL: Riverside Publishing.
Kaufman, A. S. (2009). IQ Testing 101. New York: Springer Publishing. A wonderfully accessible, but informative
review of the history and practice of intelligence testing. Recommended for both evaluators and their spouses
who work in other fields of endeavor.
Kaufman, A. S., & Kaufman, N. L. (Eds.) (2001). Specific learning disabilities and difficulties in children and adolescents:
Psychological assessment and evaluation. Cambridge, Eng.: Cambridge University Press.
Kaufman, A. S., Lichtenberger, E. O., Fletcher-Janzen, E., & Kaufman, N. L. (2005). Essentials of KABC-II assessment.
Hoboken, NJ: Wiley.
Lichtenberger, E. O., Mather, N., Kaufman, N. L., & Kaufman, A. S. (2004). Essentials of assessment report writing. New
York, NY: Wiley.
Lichtenberger, E. O., & Breaux, K.C. (2010). Essentials of WIAT-III and KTEA-II assessment. New York, NY: Wiley.
Lichtenberger, E. O., & Kaufman, A. S. (2009). Essentials of WAIS-IV Assessment. New York, NY: Wiley.
Lyman, H. B. (1997). Test scores and what they mean (6th ed.). Boston: Allyn and Bacon. Tremendously valuable, tightly
focused, very readable discussion of the statistics used in reporting test scores. Highly recommended.
Mather, N., & Jaffe, L. E. (2004). Woodcock-Johnson III: Reports, recommendations, and strategies (with CD). New York,
NY: Wiley. An extraordinarily thorough and helpful treatment of the WJ III, including many very useful forms, sample
reports, 158 pages of specific, practical recommendations, and 85 pages of detailed explanations of teaching strategies,
which evaluators can (and should) use in their reports. Even if you never use the WJ III, this book is extremely useful with
any assessment interpretation.
Mather, N., & Jaffe, L. E. (Eds.) (2010). Comprehensive evaluations: Case reports for psychologists, diagnosticians, and
special educators. New York, NY: Wiley. Fifty-eight sample evaluation reports with commentaries by the authors
addressing a wide variety of disabilities and referral concerns. An invaluable resource for anyone writing psychological,
neuropsychological, or educational evaluations.
Mather, N., Wendling, B. J., & Roberts, R. (2009). Writing assessment and instruction for students with learning
disabilities (2nd ed.). San Francisco: Jossey-Bass. Don't even think of testing or teaching writing skills without this book,
which contains detailed, specific, helpful information and advice and 210 figures and exhibits of student writing samples
with commentary.
Mather, N., Wendling, B. J., & Woodcock, R. W. (2001). Essentials of WJ III Tests of Achievement assessment. New York, NY: Wiley.
McBride, G. M., Dumont, R., & Willis, J. O. (2004). Response to response to intervention legislation: The future for school
psychologists. The School Psychologist, 58, 3, 86-91, 93.
McBride, G. M., Dumont, R., & Willis, J. O. (2011). Essentials of IDEA for assessment professionals. New York: Wiley.
Don't get sued without it.
McCallum, S., Bracken, B., & Wasserman, J. (2001). Essentials of nonverbal assessment. Hoboken, NJ: Wiley. General
principles, UNIT, LIPS-R, and other instruments. Includes interpretive worksheets.
McGrew, K. S., Flanagan, D. P., Keith, T. Z., & Vanderwood, M. (1997). Beyond g: The impact of Gf-Gc specific cognitive
abilities research on the future use and interpretation of intelligence tests in the schools. School Psychology Review, 26.
National Research Council (2002). Mental retardation: Determining eligibility for Social Security benefits. Committee on
Disability Determination for Mental Retardation. Daniel J. Reschly, Tracy G. Myers, & Christine R. Hartel, editors.
Division of Behavioral and Social Sciences and Education. Washington, D.C.: National Academy Press. Available free online.
Despite the apparently narrow focus of the title, this volume is an excellent compendium of recent information about the
diagnosis of mental retardation, produced by the 16-member Committee of the National Research Council. There are
informative chapters on intellectual assessment, adaptive behavior assessment, the relationship of intelligence and adaptive
behavior, and differential diagnosis. Highly recommended for evaluators and teams making identifications of mental
retardation or differential identifications of mental retardation and specific learning disabilities.
O'Neill, A. M. (1995). Clinical inference: How to draw meaningful conclusions from tests. New York, NY: Wiley. A
valuable antidote to the increasing mechanization and depersonalization of assessment. Highly recommended.
Ortiz, S. O., & Flanagan, D. P. (2002a). Cross-Battery Assessment revisited: Some cautions concerning "Some Cautions"
(Part I). Communiqué, 30 (7), 32-34. See Floyd (2002), Ortiz & Flanagan (2002b), Watkins, Glutting, & Youngstrom
(2002) and Watkins, Youngstrom, & Glutting (2002).
Ortiz, S. O., & Flanagan, D. P. (2002b). Cross-Battery Assessment revisited: Some cautions concerning "Some Cautions"
(Part II). Communiqué, 30 (8), 36-38. See above.
Pierce, A., Miller, G., Arden, R., & Gottfredson, L. S. (2009). Why is intelligence correlated with semen quality? Biochemical
pathways common to sperm and neuron function, and their vulnerability to pleiotropic mutations, Communicative &
Integrative Biology, 2(5), 385-387.
Prifitera, A., Saklofske, D. H., & Weiss, L. G. (Eds.) (2008). WISC-IV: Clinical assessment and intervention (2nd ed.). Burlington,
MA: Academic Press (Elsevier). Tons of good information, including norms for the Cognitive Proficiency Index (CPI) as
well as the General Ability Index (GAI).
Psychological Corporation (2010). WIAT®–III Essay Composition: “Quick Score” for Theme Development and Text
Organization. Good example of the importance of frequently revisiting Web pages for all the tests we use.
Raiford, S. E., Weiss, L. G., Rolfhus, E., & Coalson, D. (2005/2008). General Ability Index. WISC-IV Technical Report #4
(updated December 2008). San Antonio, TX: Pearson Education.
Reynolds, C. R. (1997). Forward and backward memory span should not be combined for clinical analysis. Archives of
Clinical Neuropsychology, 12, 29-40.
Reynolds, C. R., & Fletcher-Janzen, E. (Eds.) (2007). Encyclopedia of Special Education (3rd ed.). New York, NY: Wiley.
Everything you'd want to know.
Roid, G. H., & Barram, R. A. (2004). Essentials of Stanford-Binet Intelligence Scales (SB5) assessment. New York, NY: Wiley.
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). San Diego: Jerome M. Sattler, Publisher. THE
book on assessment.
Sattler, J. M., & Dumont, R. P. (2004). Assessment of children: WISC-IV and WPPSI-III supplement. San Diego, CA: Jerome
M. Sattler, Publisher.
Sattler, J. M., & Hoge, R. (2006). Assessment of children: Behavioral, social and clinical foundations (5th ed.). San Diego:
Jerome M. Sattler, Publisher. Comprehensive treatment of educational and psychological assessment of children
emphasizing particular disabilities, behavior assessment, and other clinical topics. Essential reference, along with the
companion Cognitive Foundations. The valuable appendices include many extremely helpful semistructured interviews.
Sattler, J. M., & Ryan, J. J. (2009). Assessment with the WAIS-IV. San Diego, CA: Jerome M. Sattler, Publisher.
Schrank, F. A., & Flanagan, D. P. (2003). WJ III clinical use and interpretation: Scientist-practitioner perspectives. New
York: Academic Press.
Schrank, F. A., Flanagan, D. P., Woodcock, R. W., & Mascolo, J. T. (2001). Essentials of WJ III cognitive abilities
assessment. New York, NY: Wiley.
Schultz, M. K. (1988). A comparison of Standard Scores for commonly used tests of early reading. Communiqué, 17 (4), 13.
Specific data are obsolete, but the issue remains fresh as the morning dew.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of
literacy. Reading Research Quarterly, 21, 360-407. Important paper frequently misquoted and misinterpreted.
Stanovich, K. E. (2009). What intelligence tests miss: The psychology of rational thought. New Haven: Yale University
Press. Provocative and only slightly unfair critique. Worth reading.
Steingart, S. K. (2004). The web-connected school psychologist: The busy person's guide to school psychology on the Internet
(2nd ed.). Longmont, CO: Sopris West. Two versions: inexpensive with a really valuable CD and extremely inexpensive
without the CD. Companions to Sandy's wonderful Web site, all of which we highly
recommend. In addition to a huge, tremendously useful collection of thoughtfully categorized and helpfully annotated
Internet addresses for resources relevant to school psychology, the book includes a clear, detailed, practical tutorial for
taking advantage of the Internet's resources and a glossary. Don't go on line without it.
Sternberg, R. J., & Kaufman, S. B. (Eds.) (2011). The Cambridge handbook of intelligence. New York, NY: Cambridge
University Press.
Watkins, M. W., Glutting, J., & Youngstrom, E. (2002). Cross-battery cognitive assessment: Still concerned. Communiqué, 31
(2), 42-44. See Floyd (2002), Ortiz & Flanagan (2002a, 2002b), and Watkins, Youngstrom, & Glutting (2002).
Watkins, M. W., Youngstrom, E. A., & Glutting, J. J. (2002). Some cautions regarding Cross-Battery Assessment.
Communiqué, 30 (5), 16-20. See Floyd (2002), Ortiz & Flanagan (2002a, 2002b), and Watkins, Glutting, & Youngstrom (2002).
Weiss, L. G., Saklofske, D. H., Prifitera, A., & Holdnack, J. A. (Eds.) (2006). WISC-IV: Advanced clinical
interpretation. Burlington, MA: Academic Press (Elsevier).
Weiss, L. G., Saklofske, D. H., Coalson, D., & Raiford, S. E. (Eds.) (2010). WAIS-IV: Clinical use and interpretation.
Burlington, MA: Academic Press (Elsevier). Includes norms for the Cognitive Proficiency Index (CPI) to contrast
with the General Ability Index (GAI).
Wendling, B. J., & Mather, N. (2009). Essentials of evidence-based academic interventions. New York: Wiley.
Willis, J. O. & Dumont, R. P. (2002). Guide to identification of learning disabilities (3rd ed.). Available (paper) from
[email protected] and (CD) from [email protected]
Willis, J. O., & Dumont, R. (2006). And never the twain shall meet: Can response to intervention and cognitive assessment be
reconciled? Psychology in the Schools, 43, 8, 901-908.
Willis, J. O., Dumont, R., & Kaufman, A. S. (2011). Factor analytic models of intelligence. In R. J. Sternberg & S. B. Kaufman
(Eds.), The Cambridge handbook of intelligence (pp. 39-57). New York, NY: Cambridge University Press.
Zirkel, P. A. (2009). Legal eligibility of students with learning disabilities: Consider not only RTI but also §504. Learning
Disability Quarterly, 32(2), 51-53.
1. Is there a problem with academic performance? Problems may be subtle or difficult to document, but if
there are no academic problems at all, there is no educational disability. [A problem with an important life
function other than academic performance might trigger an identification under Section 504 of P.L. 93-112 or
the Americans with Disabilities Act (ADA).] Pay close attention to reports of problems that do not cause low
marks even though they interfere with learning. For example, the teacher might already be providing an
informal program of special education; marks might be based 25% on attendance, 50% on simply turning in
homework regardless of quality, and 25% on class participation; or marks might be based on an erroneous
perception of the student's academic potential.
B. Does the student have low scores on group or individual achievement tests?
1. Look at any history of test scores. [You may want to use the attached forms for recording past test
scores.] Be cautious, though, with tests that are used so frequently that the expected growth from test
to retest is less than the 90% confidence band or even the SEm. Check the tables.
2. Look at the pattern of strengths and weaknesses on the test scores. Some group tests offer item
analyses. Even though the norm-referenced tests do not function well as criterion-referenced
measures, those analyses may contain useful information.
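The caution above about expected retest growth smaller than the confidence band can be made concrete. A minimal sketch in Python using classical test theory; the SD of 15 and reliability of .90 are illustrative assumptions, not values from any particular test's manual:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(score, sd, reliability, z=1.645):
    """Band around an obtained score; z = 1.645 gives roughly 90% confidence."""
    half_width = z * sem(sd, reliability)
    return (score - half_width, score + half_width)

# Illustrative values only: standard score 100, SD 15, reliability .90
low, high = confidence_band(100, 15, 0.90)
print(round(sem(15, 0.90), 2))        # SEm of about 4.74
print(round(low, 1), round(high, 1))  # band of roughly 92.2 to 107.8
```

If the expected test-to-retest growth is smaller than this band (here, about ±7.8 points), an apparent gain may be measurement noise rather than real change; always check the manual's tables.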
C. Is the student receiving low or failing marks in a class?
1. Again, track the history of class marks. [You may want to use the attached form for recording past marks.]
2. Try to determine the basis for the student's marks. High marks might be based on special marking criteria.
D. Is the student working much too hard or much too long to earn adequate marks?
1. Parents may be the best source of this information. A parent interview is essential. We need to know
also what the parents would like to learn from the evaluation.
2. Be sure to interview the student. Sometimes it helps to obtain a copy of the report card and discuss it
in detail with the student. What does the student want to learn from the evaluation?
E. Is the teacher making extraordinary adaptations or accommodations for the student?
1. Teacher interviews are essential. We need to know what has been done, what is being done, how well
those interventions have worked and are working, and what specific things the teachers would like to
learn from the evaluation.
2. The classroom observation is often more useful for observing the teaching and the environment than
for observing the student.
F. Is there a notably deficient specific area of performance (e.g., tests, homework, note-taking)?
1. Is there another indication of insufficient academic performance?
2. Has a Problem Solving Model been instituted that involved data gathering relative to the student's ability to
respond to scientific, research-based interventions? The law has moved away from using discrepancy
models to identify children with specific learning disabilities. The school is not required to determine if the
child has a severe discrepancy between achievement and intellectual ability to find that the child has a specific
learning disability and needs special education services. The school may use response to intervention to
determine if the child responds to scientific, research-based intervention as part of the evaluation process.
(Section 1414(b)(6))
1. Have the interventions been validated as scientific and research-based?
2. Did the interventions use rigorous data analyses that were adequate to test the stated hypotheses and
justify the general conclusions drawn?
3. Did the interventions employ empirical methods that draw on observation or experiments?
4. Did the interventions use objective procedures that relied on measurements or observational methods
that provide reliable and valid data across evaluators and observers, across multiple measurements and
observations, and across studies by the same or different investigators?
5. Were the interventions evaluated and accepted by peer-reviewed journals or approved panels of
independent experts?
3. Are there one or more disorders in basic psychological processes involved in understanding or in using
language, spoken or written? This step
follows next in a logical sequence, but determination of any disorder(s) may not be clear until completion of
psychological, educational, speech and language, occupational therapy, physical therapy, vision, hearing or
other evaluations. There should be multiple, convergent confirmations of any disorders.
Can each disorder be observed or inferred from academic performance?
1. Again, we need to consider all aspects and all measures of academic performance.
2. We are looking for possible cause-and-effect relationships between basic processes and academic
performance. There needs to be a real-life connection between our hypotheses and what is actually
happening with the student's performance in school.
Can each disorder be documented through assessment?
1. Once we have documented the deficient achievement and are looking for possible reasons, it becomes
more permissible to use poorly normed and completely informal measures and observations. Formal
assessment of ability and achievement levels needs to be done, at least in part, with extremely well-
normed, reliable instruments that are valid for their intended purposes, but exploring within the area of
deficient achievement may (and sometimes, given the state of the art, must) be done with less
statistical rigor. The disorders need to be demonstrated clearly, reliably, and convincingly, but not
always as test scores. The severity of a learning disability is measured by the severity of its impact on
achievement, not by the severity of any basic-process disorder.
2. The McGrew, Flanagan, and Ortiz Integrated Cattell-Horn-Carroll (CHC) Cross-Battery Approach is a
very useful framework for considering many, though not all, basic-process disorders.
3. Can the Team make a logical argument that each identified disorder manifests itself in an imperfect
ability to listen, think, speak, read, write, spell, or do mathematical calculations?
It is not enough
simply to specify deficient achievement and a disorder. There needs to be a logical, cause-and-effect
relationship between the two.
A. As noted above, we need to demonstrate how the purported basic-process disorder is impairing the
carefully documented achievement area. This demonstration will require a thorough analysis of the
student's academic skills. A low test score or low class mark is not enough. We need to show the
mechanisms operating in the deficient achievement area(s). Examples of misaligned math problems
worked left-to-right and bottom-to-top might, for instance, demonstrate the effect of a visual perception
problem on math. The assumption that a visual perception problem impaired listening comprehension
might be more difficult to justify unless, for example, we could show how deficient visual imagery was
interfering with the listening comprehension.
B. Research evidence can be cited to show relationships between certain basic processes (e.g., phonological
abilities or rapid naming) and certain areas of achievement (e.g., reading decoding).
C. Some clearly identifiable disorders have no discernable effect on achievement. Simply finding a disorder
does not establish a learning disability (e.g., JOW's severe rhythm disorder greatly impairs his singing,
dancing, and clapping in time to music, but the effect on academic achievement is trivial, only diminishing
his appreciation of poetry).
D. It is the disorder in the basic psychological process that distinguishes a specific learning disability from the
disabilities and disadvantages ruled out in the federal regulations [300.7(c)(10)] for learning disabilities
(". . .learning problems that are primarily the result of visual, hearing, or motor disabilities, of mental
retardation, of emotional disturbance, or of environmental, cultural, or economic disadvantage").
E. It is essential, as much as possible, to distinguish learning disabilities from dyspedagogia and apedagogia
[300.541(1) "The child does not achieve commensurate with his or her age and ability levels in one or
more of the areas listed in paragraph (a)(2) of this section, if provided with learning experiences
appropriate for the child's age and ability levels"].
[Note that Steps 4 through 6 involve determination of Learning Disabilities based upon the “severe
discrepancy model.” If the determination is being made without examining aspects of severe discrepancy,
go to step 7.]
4. What is the best estimate of the student’s actual intellectual ability? See Mark 4:25. The Team must not
allow a disorder to depress estimates of both intelligence and achievement and then mindlessly conclude
there is no discrepancy between the two. For example, verbal and visual/spatial learning disabilities,
respectively, will depress verbal (Gc) and visual-spatial (Gv) intelligence measures. For another example, a
disorder in quantitative knowledge (Gq) would depress the WISC Arithmetic and Verbal IQ scores and DAS
Sequential & Quantitative and Nonverbal (fluid) Scale scores. Logically, the intelligence test should be chosen
only after the basic-process disorders have been delineated. The McGrew, Flanagan, and Ortiz Integrated
Cattell-Horn-Carroll (CHC) Cross-Battery Approach can be a very useful framework for considering
intellectual abilities.
A. Which scales, factors, or subtests on intelligence tests are likely to be depressed by the disorder or disorders?
B. Which intelligence test, scales, or factors would be likely to yield an estimate of actual intellectual ability
uncontaminated by the disorder or disorders?
C. What is the best estimate of the student’s actual intellectual ability based on those measures?
D. Have we considered at least all of the broad abilities in the McGrew, Flanagan, and Ortiz Integrated
Cattell-Horn-Carroll (CHC) theory? It is not prudent, for example, to use a test, such as the WISC-III, that
omits fluid reasoning unless we supplement it with a measure of that ability.
5. Is there a severe discrepancy between the student’s level of intellectual ability (4. C.) and the student’s
achievement in one or more of the following areas? Remember that achievement and even ability may be
assessed by means other than test scores (1. B. – 1. F.). Maintain a bias in favor of reality. Achievement tests
must be chosen thoughtfully. For example, a very brief achievement test is not a valid measure of academic
performance for a student with a short attention span, and an untimed, silent reading test will not pick up
problems with reading fluency. Do not obsess over formulae. Some data will not fit formulae. The Team must
apply reasoned, professional judgment, not simply indulge in an exercise in arithmetic. By our interpretation
of federal law and by most state laws, it is not lawful to deny services to a student who truly has a learning
disability simply because of the results of a statistical exercise. A statistical comparison of ability and
achievement must use only one set of norms (e.g., national age or grade) and should use predicted
achievement scores rather than simple differences. Remember
that these achievement areas have many components, including, for example, vocabulary or factual
knowledge, fluency, and independence. Few, if any, achievement tests cover all aspects of the requisite skills. Do
not use tests on which the student receives very low or nearly perfect raw scores; find tests on which the
student passes and fails several items.
oral expression;
listening comprehension;
written expression;
basic reading skill;
reading fluency;
reading comprehension;
mathematics calculation; or
mathematics reasoning.
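The recommendation above to use predicted achievement scores rather than simple differences can be sketched as follows. This is a minimal illustration of regression to the mean; the ability-achievement correlation of .60 is an assumed, illustrative value, not a figure from any particular test pair:

```python
def predicted_achievement(ability_ss, r_xy, mean=100.0):
    """Regression-based prediction: expected achievement regresses toward the
    mean in proportion to the ability-achievement correlation r_xy.
    (Assumes both measures use the same mean and standard deviation.)"""
    return mean + r_xy * (ability_ss - mean)

def discrepancy(ability_ss, achievement_ss, r_xy):
    """Predicted minus obtained achievement (positive = below prediction)."""
    return predicted_achievement(ability_ss, r_xy) - achievement_ss

# Illustrative case: ability standard score 120, achievement 90, assumed r = .60
print(predicted_achievement(120, 0.60))  # 112.0
print(discrepancy(120, 90, 0.60))        # 22.0
# A simple-difference comparison (120 - 90 = 30) would overstate the discrepancy.
```

The design point is the one the text makes: because ability and achievement are imperfectly correlated, a high-ability student's expected achievement is closer to the mean than the ability score itself, so simple differences systematically overidentify high scorers and underidentify low scorers.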
6. Are the discrepancies caused primarily by the disorders? There is absolutely nothing in IDEA to suggest
that a student cannot have a learning disability in addition to other disorders. However, the particular
discrepancy ("learning problems") in question must not be primarily the result of a vision, hearing, or motor
disability, of mental retardation, of emotional disturbance, or of environmental, cultural, or economic
disadvantage [300.7 (c) (10) (ii)], even if one or more of those disorders or disadvantages may be causing
other, separately identified learning problems. For example, a child with mental retardation might also have a
specific learning disability in math with extremely low achievement severely discrepant from low predicted
achievement because of a disorder in working memory. Similarly, a deaf or blind child might have unusual
difficulty learning American Sign Language or Braille because of visual perceptual weaknesses. If we have
been careful in our identification and analysis of the disorder(s), we should be able to separate them and
their effects from the effects of disadvantages and other disabilities.
7. Does the student require special modifications of, or accommodations in, the educational program in
order to achieve at levels commensurate with age and ability (4. C.)? Here is the crucial issue for
identification under Section 504 or the ADA. The needed accommodations or modifications should be more
than we would routinely ask of a teacher of moderate skill, experience, and dedication.
8. Does the student require a uniquely designed program of special instruction in order to achieve at levels
commensurate with age and ability (4. C.)? This is the crucial issue for identification of an educational
disability. If the student does not require a uniquely designed program of special instruction, but meets the
other criteria, the identification should probably be under Section 504 rather than the Individuals with
Disabilities Education Act.
Some Useful Assessment Addresses
Best Two Sources we have Found – Always Start Here
Sandra Steingart's School Psych Resources Online
Jim Wright's Intervention Central
Assessment (approximately alphabetical)
Dumont/Willis on the Web
Current Issues in Ed, Arizona State U. online journal
Dynamic Assessment web site
Marley Watkins: articles, software, tests, ASCA, etc.
Samuel Ortiz's School Psychology Homepage
papers by David Lohman
IAP list home page – lots of valuable documents
reviews of many books on cognitive assessment
Practical Assessment, Research, and Evaluation
Dumont Excel test templates
Code of Fair Testing Practices in Education
Assoc. of Specialists in Assmt. of Intell. Functioning
Strunk's Elements of Style on line
Official Cross-Battery Web Site
Dynamic Assessment Web Site
extensive OCR Guide for High-Stakes Testing
ETS rules for documenting disabilities
Florida Center for Reading Research.
Institute for Applied Psychometrics (IAP)
Joel Schneider's info., scoring programs & tests
large Web site on intelligence testing
Jim Wright's Intervention Central
BICS vs. CALP, ESL assessment
information about students with visual impairments
Low Vision Gateway – lots of good information
National Association of Test Directors
graphing tools: McDougal, Clark, & Wilson
Chronological age calculator
Steve Draper on Hawthorne, Pygmalion, placebo
Disability Evaluation for Social Security
SEDL Reading Assessment Data Base
Gary Canivez: articles, software, etc.
U.S.D.O.E. What Works Clearinghouse (WWC)
Word Finding difficulty - Diane German
Information on Specific Tests
DAS-II information
Insight DVD administered group CHC ability test
Northwest Evaluation Association NWEA MAP
WISC-IV Technical Reports, etc.
WAIS-IV information
Stanford-Binet Assessment Service Bulletins
WJ III Assessment Service Bulletins
RIAS PowerPoint
IDEA one-stop shopping site
U.S. DOE Doing What Works
U.S. DOE updates
FAQ about Section 504
OCR on high stakes testing
Family Policy Compliance Office (FPCO): FERPA
ordering page for Dept. of Ed. publications
Joint FERPA & HIPAA Guidance
Texas Education Agency
excellent, parent-oriented sped law web site
Achievement Test Grid
Tests with similar purposes and even similar names are not interchangeable. Differences in norms
(who was tested, where, and how long ago), content, format, and administration rules (e.g., time limits,
bonus points for speed, rules for determining which items are administered to each examinee) can yield
very different scores for the same individual on two tests that superficially seem very similar to each
other. See, for example, Bracken (1988) and Schultz (1988).
Tests that purport to measure the same general ability or skill may sample different component
skills. For example, if a student has word-finding difficulties, a reading comprehension test that requires
recall of one, specific correct word to complete a sentence with a missing word (cloze technique) might be
much more difficult than an otherwise comparable reading comprehension test that offers multiple choices
from which to select the correct missing word for the sentence (maze technique) or a test with open-ended
comprehension questions. Similarly, for a student with adequate reading comprehension but weak short-term
memory, the difference between answering questions about a reading passage that remains in view
and answering questions after the passage has been removed could make a dramatic difference in scores.
The universal question of students taking classroom tests – "Does spelling count?" – certainly applies to
interpretation of formal, normed tests of written expression.
Such differences in format make little difference in average scores for large groups of examinees,
so achievement-test manuals usually report robust correlations between various achievement tests despite
differences in test format and specific skills sampled (see, for example, McGrew, 1999, or the validity
section of any test manual). [These issues also apply to tests of cognitive ability, where they also receive
insufficient attention.]
Examiners need to select achievement tests that measure the skills relevant to the referral concerns
and to avoid or carefully interpret tests that measure some ability (such as word-finding or memory) that
would distort direct measurement of the intended achievement skill. When selecting or interpreting a test,
think about the actual demands imposed on the examinee, the referral concerns and questions, and what
you know about your examinee's skills and weaknesses.
The tables below are based on those developed by Sara Brody (2001) and are being corrected over
time by Ron Dumont, Laurie Farr Hanks, Melissa Farrall, Sue Morbey, and John Willis. They are only
rough summaries, but they at least illustrate the issue of test content and format and may possibly help
guide test selection. For extensive information on achievement tests, please see the test manuals, the
publishers' Web pages for the tests, the Buros Institute Test Reviews Online, and the Achievement Test
Desk Reference (Flanagan, Ortiz, Alfonso, & Mascolo, 2006).
Bracken, B. A. (1988). Ten psychometric reasons why similar tests produce dissimilar results. Journal of School Psychology, 26(2), 155-166.
Brody, S. (Ed.). (2001). Teaching reading: Language, letters & thought (2nd ed.). LARC Publishing, P.O. Box 801, Milford, NH 03055 (603-880-7691).
Buros Institute of Mental Measurements (n.d.). Test reviews online.
Flanagan, D. P., Ortiz, S. O., Alfonso, V., & Mascolo, J. T. (2006). Achievement test desk reference (ATDR-II): A guide to learning disability identification (2nd ed.). Hoboken, NJ: Wiley.
McGrew, K. S. (1999). The measurement of reading achievement by different individually administered standardized reading tests: Apples and apples, or apples and oranges? (IAP Research Report No. 1). Clearwater, MN: Institute for Applied Psychometrics.
Schultz, M. K. (1988). A comparison of standard scores for commonly used tests of early reading. Communiqué, 17(4), 13.
[Table: a comparison of reading tests by content and format: oral word-list reading (real words and nonsense words), silent vs. oral passage reading, timed vs. untimed administration, and comprehension format (written or oral multiple-choice, open-ended questions, modified cloze). Tests included: Woodcock-Johnson III; Oral & Written Language Scales (OWLS-II); Gray Oral Reading Test (5th ed.); Gray Silent Reading Test; Wechsler Individual Achievement Test-III; Kaufman Test of Educational Achievement-II; Test of Reading Comprehension (4th ed.); Peabody Individual Achievement Test-Revised (NU); Nelson-Denny Reading Test; Gates-MacGinitie Reading Tests (4th ed.); Slosson Oral Reading Test-Revised; Wide Range Achievement Test (4th ed.); Diagnostic Assessments of Reading.]
Key: italics = time limits; mod. cloze = modified cloze method, in which the student orally supplies a missing key word in the passage to demonstrate comprehension; underscored subtests require the student to answer from memory without the item available for review.
Selected footnotes from the table: norms in one-month intervals; grade norms are seasonal; special norms for examinees who cannot read the first (easiest) grade-appropriate passage and must drop back to easier passages (interpret very carefully); a wide variety of reading tasks with letters, digraphs, blends, diphthongs, silent -e, r-controlled vowels, syllables, polysyllabic words, etc.; Comprehensive Form used (the Brief Form combines oral word-list reading and comprehension in one score which we do not find useful for diagnosis); unusual multiple multiple-choice format; for one subtest the examinee must rearrange sentences in logical order, and for the other the examinee answers 5 multiple-choice questions about each passage; multiple-choice matching from memory of one of four pictures to each printed sentence after the sentence is removed; the examinee marks the word he or she is reading 60 seconds after the starting time; norms available for extended time.
Dumont, Farr Hanks, Farrall, Morbey, & Willis, 3/7/12. The organization of these tables is borrowed from Table 11.1, p. 308, in Brody, S. (Ed.) (2001), Teaching reading: Language, letters & thought (2nd ed.), LARC Publishing, P.O. Box 801, Milford, NH 03055 (603-880-7691). The selection and sequence are entirely arbitrary, whimsical, and based on personal experience. The listing of categories for tests is as accurate as we could make it, but there may be errors. We would be grateful for corrections and suggestions for improvement.
A Sampling of Tests Measuring Aspects of Phonological Awareness
Melissa Farrall, Ph.D. & Sara Brody, Ed.D.

Test of Auditory-Perceptual Skills-Revised (TAPS-R): Auditory Word Discrimination subtest: identifying whether two words spoken by the examiner are SAME or DIFFERENT.
Test of Phonological Awareness (TOPA): marking pictures of orally presented words that are distinguished by the same or different sound in the word-final position.
Rosner Test of Auditory Analysis Skills (TAAS): Say "cowboy" without the "cow." Say "picnic" without the "pic." Say "cart" without the "/t/." Say "blend" without the "/bl/."
Lindamood Auditory Conceptualization Test (LAC-3): using colored blocks and felt squares to represent differences or changes in sequences of speech sounds in syllables.
Woodcock-Johnson III (WJ III)‡: Incomplete Words, Sound Blending, Auditory Attention, Auditory Working Memory, Rapid Picture Naming, Word Attack, Spelling of Sounds, Sound Awareness.
Goldman-Fristoe-Woodcock Auditory Skills Test Battery (GFW): many auditory tasks as well as reading and spelling nonsense words. Very, very old norms.
Roswell-Chall Auditory Blending Test: blending sequences of sounds spoken by the examiner.
The Phonological Abilities Test (PAT) by Muter, Hulme, & Snowling.
The Phonological Awareness Test 2 (TPAT-2) by Robertson & Salter.
Comprehensive Test of Phonological Processing (CTOPP): measures of phonological awareness, memory, & rapid naming.
Kaufman Test of Educational Achievement-II (KTEA-II)‡ by Alan & Nadine Kaufman.
Differential Ability Scales-II (DAS-II): Phonological Processing and Rapid Naming subtests.

‡ A new edition is in development.
This table is adapted (stolen) from Brody, S. (Ed.) (2001), Teaching reading: Language, letters & thought (2nd ed.), LARC Publishing, Milford, NH, with extensive revisions by Melissa L. Farrall. The selection and sequence of tests are entirely arbitrary, whimsical, and based on personal experience. The listing of categories for tests is as accurate as we could make it, but there may be errors. We would be grateful for corrections and suggestions for improvement.
[Table: a comparison of written expression tests by content and format: spelling of dictated real words and nonsense words (including spelling in context); vocabulary; constrained and open-ended sentence writing; editing; and story/essay writing scored for content, syntax, punctuation, and writing speed, with holistic scoring and item analysis on some measures. Tests included: Oral & Written Language Scales (OWLS-II); Test of Written Language (3rd ed.); Test of Written Language (4th ed.); Wechsler Individual Achievement Test-III; Woodcock-Johnson III; Peabody Individual Achievement Test-Revised; Kaufman Test of Educational Achievement-II; Test of Written Spelling (3rd ed.); Wide Range Achievement Test-4.]
Key: italics = time limits.
Selected footnotes from the table: some scores are very strongly influenced by the amount written in 15 minutes; one written expression test has several different types of items yielding a total score; a scoring system is provided for analyzing stories and essays written separately from the formal test; the student summarizes a story as part of the scoring of the Writing Samples test; norms by one-month intervals; seasonal grade norms; Comprehensive and Brief Forms. The norms are excessively old, but the Goldman-Fristoe-Woodcock Auditory Skills Test Battery was "way ahead of its time" (Kevin McGrew) and is worth studying.
Dumont, Farr Hanks, Farrall, Morbey, & Willis, 8/1/11.
[Table: a comparison of mathematics tests by content and format: paper-and-pencil computation and applications administered with or without paper and pencil, including norms for misreading signs. Tests included: Kaufman Test of Educational Achievement-II‡; Wechsler Individual Achievement Test-III; Peabody Individual Achievement Test-Revised (NU); Wide Range Achievement Test-4; KeyMath III; Woodcock-Johnson III‡.]
Key: italics = time limits; ‡ = a new edition is in development.
Selected footnotes from the table: Comprehensive Form used (the Brief Form combines computation and applications in one score we do not find useful for diagnosis); seasonal grade norms; norms at one-month intervals.
Dumont, Farr Hanks, Farrall, Morbey, & Willis, 8/1/11.
Summary of Design Characteristics for Several Cognitive Assessment Batteries

[Table: for each battery, the age range, the total and factor scores offered, the number and distribution of subtests, and the approximate administration time. Representative entries: CAS (ages 5-17; Full Scale plus Planning, Attention, Simultaneous, and Successive scales; 12 subtests, 3 per scale; about 60 minutes); K-BIT 2 (ages 4-90; Crystallized (Verbal), Fluid (Nonverbal), and IQ Composite; 2 subtests, 1 per scale; about 20 minutes); RIAS (ages 3 to 90; Composite Intelligence Index, Verbal and Nonverbal Intelligence, and Composite Memory Index; 6 subtests: 2 verbal, 2 nonverbal, 2 memory; 20 to 35 minutes); WISC-IV (ages 6 to 16-11; Full Scale IQ plus Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed; 15 subtests: 5 verbal comprehension, 4 perceptual reasoning, 3 working memory, 3 processing speed; about 70 minutes).]

Note: CAS = Cognitive Assessment System; CTONI = Comprehensive Test of Nonverbal Intelligence; DAS-II = Differential Ability Scales, Second Edition; KAIT = Kaufman Adolescent and Adult Intelligence Test; KABC-II = Kaufman Assessment Battery for Children, Second Edition; K-BIT 2 = Kaufman Brief Intelligence Test, Second Edition; Leiter-R = Leiter International Performance Scale-Revised; RIAS = Reynolds Intellectual Assessment Scales; SIT-R3 = Slosson Intelligence Test Revised, Third Edition for Children and Adults; SB5 = Stanford-Binet Intelligence Scale, Fifth Edition; TONI-3 = Test of Nonverbal Intelligence, Third Edition; UNIT = Universal Nonverbal Intelligence Test; WASI-II = Wechsler Abbreviated Scale of Intelligence, Second Edition; WAIS-IV = Wechsler Adult Intelligence Scale, Fourth Edition; WISC-IV = Wechsler Intelligence Scale for Children, Fourth Edition; WPPSI-IV = Wechsler Preschool and Primary Scales of Intelligence, Fourth Edition; WJ III = Woodcock-Johnson III Tests of Cognitive Abilities.
Measures of Phonological Processing Found on a Selection of Tests

Auditory Discrimination Test (2nd ed.): Word Pairs (same/different).
The Phonological Awareness Test: Rhyming (discrimination & production); Segmentation (sentences, syllables, phonemes); Isolation (initial, final, & medial sounds); Deletion (syllables & phonemes); Substitution (with and without manipulatives); Blending (syllables & phonemes); Decoding (nonword reading).
Comprehensive Test of Phonological Processing: Deletion, Reversal, Blending, Segmenting, Rapid Naming.
Diagnostic Achievement Battery (3rd ed.): Phonemic Analysis subtest (deletion).
Wechsler Individual Achievement Test (2nd ed.): Pseudoword Decoding (nonword reading).
Gray Oral Reading Tests-Diagnostic: Word Attack.
Woodcock Diagnostic Reading Battery: Word Attack (nonword reading); Incomplete Words (closure); Sound Blending.
Illinois Test of Psycholinguistic Abilities (3rd ed.): Sound Deletion (deletion); Rhyming Sequences (rhyming sound); Decoding (nonword reading); Sound Spelling (nonword spelling).
Woodcock-Johnson III Tests of Achievement: Word Attack (nonword reading); Spelling of Sounds (nonword spelling); Sound Awareness (rhyming, deletion, substitution, reversal).
Kaufman Test of Educational Achievement-II: Phonological Awareness (rhyming, matching, blending, segmenting, and deleting); Nonsense Word Decoding (nonword reading); Decoding Fluency (speed of nonword reading).
Woodcock-Johnson III Tests of Cognitive Abilities: Sound Blending; Incomplete Words (closure); Auditory Attention.
Phonological Processing: match phoneme to picture of whole word.
Sound-picture: Deletion, Substitution.
Speeded Naming: Rapid Naming.
Woodcock-Johnson III Diagnostic Supplement: Sound Patterns-Voice (word pairs, nonwords).
Process Assessment of the Learner: Syllables (segmenting); Phonemes (deletion); Rimes (deletion); Pseudoword Decoding (nonword reading); RAN Words, Digits, Words and Digits (rapid naming).
Woodcock Reading Mastery Tests-Revised (NU): Word Attack (nonword reading).
Test of Language Development (3rd ed.): Phonemic Analysis (segmenting).
Test of Phonological Awareness: Kindergarten, Initial Sound (same/different); Early Elementary, Final Sound (same/different).
Test of Phonological Awareness Skills: Rhyming, Closure, Sequencing, Deletion.
Test of Word Reading Efficiency: Phonetic Decoding (nonword reading).
Adapted from Willis, J. O. & Dumont, R. P. (2002). Guide to identification of learning disabilities (3rd ed.).
Age (A) or Grade (G) Norms:
Oral Reading: words aloud from a list
reading nonsense words aloud (phonics)
reading speed/fluency
accuracy in reading paragraphs aloud
Reading Vocabulary: synonyms
antonyms and synonyms
antonyms, synonyms, and analogies
Reading Comp: written multiple-choice questions
oral, multiple-choice questions
oral, open-ended questions
oral words to fill in blanks (cloze)
matching pictures to sentences
Spelling: writing dictated words
writing dictated nonsense words
spelling accuracy in story or essay
multiple-choice spelling test
Writing: writing mechanics/style
writing sentences according to directions
writing fluency
Story/Essay Writing: vocabulary
punctuation/writing mechanics/style
holistic assessment
Math: paper-and-pencil computation
applications – oral without paper and pencil
applications – with paper and pencil
orally presented multiple-choice
multiple-choice presented in writing
* silent reading
scores are reported as:
Reading Vocabulary
Reading Comprehension
Word Study Skills/Word Analysis
Total Reading
Language Mechanics
Language Expression
Total Language
Math Computation
Math Concepts
Math Applications
Total Math
Social Studies
Environment/Science & Soc. Studies
Study Skills
Total Battery
Aptitude/Cognitive Ability Test
Ten Top Problems with Normed Achievement Tests for Young Children
Ron Dumont
John O. Willis
Fairleigh Dickinson University
Rivier College
1. There is no agreed-upon preschool or kindergarten curriculum at national, state, and even, in some cases, local levels. It is difficult to sample a curriculum that does not exist. For higher grades, there is at least some commonality among the many curricula at a given grade level. The same skill may be placed at very different levels in different curricula.
2. Young children are often inconsistent in their responses, which would argue for more items to increase reliability.
3. But young children have short attention spans and they fatigue easily, which argues for fewer items.
4. Sampling works well for a large domain. For example, if a child is expected to have a reading vocabulary of
3,000 words, it is pretty easy to estimate reading skill with a 25-word test. However, if a child is expected to
have a reading vocabulary of 10 words, your 25-word test could, by pure chance, easily sample all 10 or none
of them, giving an inflated or depressed estimate of the child's reading ability. Similarly, many achievement
tests for young children have only a few letter-naming items, rather than 53. If a child knows ten to sixteen
letters, a ten-item test could easily hit or miss all of them by pure chance. If you test a child on Monday, and
the teacher teaches the vowels on Tuesday, that could be the difference between a score of zero and a score of
five on the ten-item test.
5. Young children develop new skills so rapidly that norms tables should be divided by weeks, not three, four, or
six months. The difference between age 10-0-0 to 10-6-29 and age 10-7-0 to 10-11-29 may be trivial, but the
difference between 4-0-0 to 4-6-29 and 4-7-0 to 4-11-29 is tremendous.
6. Item format matters a lot more for younger children. Most ten-year-olds don't care whether an addition
problem is presented horizontally or vertically, but five-year-olds do. The space between lines on writing
paper and the presence or absence of a dotted midline is a deal-breaker for most kindergarten students.
7. Item gradients are necessarily very steep for younger children. There aren't any clearly defined steps between
not being able to write the letter M and being able to write it.
8. Norming samples are also a huge problem at the preschool level. If you carefully sample geographic regions,
parental education and income, and other germane variables, you can be fairly safe in using whatever public
and private schools (in the right proportions) are available. However, at the preschool level, there is a huge
difference between the Mary Poppins School of Unfettered Self-Expression and Free Play and the John Stuart
Mill Preschool of Relentless Academic Pressure. Low-income kids from the JSMPRAP are likely to score
higher on academic achievement tests than rich kids from the MPSUSEFP. A truly representative national
sample (especially with only 100 to 300 kids per year of age) is virtually unobtainable.
9. There often is insufficient floor for young children on achievement tests (see, for example, Bracken & Walker, 1997).
10. Consequently, formal and informal, criterion-based tests (with exhaustive sampling, e.g., all 53 letters, all sums
up to ten, etc.); curriculum-based measurement; classroom observations; and work samples are likely to be
much more informative than normed tests up to at least a mid-second-grade level of achievement (regardless of
actual grade placement).
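The small-domain sampling problem in point 4 can be made concrete with a short simulation. The numbers here are hypothetical, chosen only to mirror the letter-naming example; no real test is being modeled:

```python
# Hypothetical illustration of point 4: a child knows exactly 10 of the
# 26 letter names, and a screening test samples only 10 letters.
# How much can the obtained score swing on luck of the draw alone?
import random

random.seed(1)
known = set(random.sample(range(26), 10))  # the 10 letters this child knows

scores = []
for _ in range(10_000):
    items = random.sample(range(26), 10)   # the 10 letters a test happens to ask
    scores.append(sum(1 for letter in items if letter in known))

print("lowest score across simulated tests: ", min(scores))
print("highest score across simulated tests:", max(scores))
print("mean score:", sum(scores) / len(scores))
```

With exhaustive sampling (all 26 letters) the score would be the same every time; with a 10-item sample it ranges widely by chance alone, which is the argument for criterion-based, exhaustive measures at this level.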
Bracken, B. A. (1988). Ten psychometric reasons why similar tests produce dissimilar results. Journal of School Psychology, 26(2), 155-166.
Bracken, B. A., & Walker, K. C. (1997). The utility of intelligence tests for preschool children. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 484-502). New York: Guilford Press.
Schultz, M. K. (1988). A comparison of standard scores for commonly used tests of early reading. Communiqué, 17(4), 13.
Keith Stanovich's Matthew Effects
Matthew 25:29 "For unto every one that hath shall be given, and he shall have abundance: but from him that hath
not shall be taken away even that which he hath."
A student who gets off to a slow start in reading for any reason is likely to keep falling farther behind, rather than
catching up, as other students continue to get more reading done per unit of time and keep progressing. Stanovich, K.
E. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy.
Reading Research Quarterly, 21, 360-407.
This is a really important concept! First, a slow start in reading is a regular education emergency. Regular
education needs to check progress and needs to intervene promptly. Second, not all weak readers have specific
learning disabilities. It might be a cumulative Matthew effect. They need immediate intervention as soon as the
weakness is belatedly discovered, but the intervention may not be special education. Did I mention that this is
really important?
The Mark Penalty
Mark 4:25 "For he that hath, to him shall be given: and he that hath not, from him shall be taken even that which
he hath."
"In this situation, both measures – 'ability' and 'achievement' – are depressed by the same disorder. Therefore, the
distinction between 'achievement and intellectual ability' is rendered meaningless by the contamination of both
areas." (Willis, J. O. & Dumont, R. P. (2002). Guide to Identification of Learning Disabilities (3rd ed., pp. 131-2).
Peterborough, NH: authors.) Not only do blind and deaf children tend
to get artificially low full scale or composite scores on mental ability tests, but so do children with visual
perception (Gv) and auditory perception (Ga) weaknesses (as well as children with other basic-process
weaknesses). Then, it falsely appears that they have limited intelligence rather than a specific learning disability.
The Luke Composite Effect
Luke 8:18 "Take heed therefore how ye hear: for whosoever hath, to him shall be given; and whosoever hath not,
from him shall be taken even that which he seemeth to have."
Total or composite scores will be more extreme (farther from the mean) than the average of the component scores
(unless all of the component scores are perfectly correlated). [See, for example, McGrew, K. S. (1994). Clinical
interpretation of the Woodcock-Johnson Tests of Cognitive Ability-Revised. Boston: Allyn and Bacon.] On the
WISC-IV, VCI 65, PRI 65, WMI 65, PSI 65 = FSIQ 57, and VCI 134, PRI 135, WMI 135, PSI 136 = FSIQ 144.
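The arithmetic behind this effect can be sketched as follows. This is a generic psychometric illustration with a hypothetical average intercorrelation (r = .6), not any test's actual norms table:

```python
# Why a composite is more extreme than the average of its parts:
# the standard deviation of the MEAN of k standardized scores with
# average intercorrelation r is sqrt((1 + (k-1)*r) / k), which is
# less than 1 whenever r < 1. Re-standardizing that mean therefore
# pushes the composite farther from 100 than the components are.
from math import sqrt

def composite(scores, r, mean=100.0, sd=15.0):
    """Standard-score composite of equally weighted component scores
    sharing an average intercorrelation r (hypothetical values)."""
    k = len(scores)
    z_mean = sum((s - mean) / sd for s in scores) / k
    sd_of_mean = sqrt((1 + (k - 1) * r) / k)
    return mean + sd * (z_mean / sd_of_mean)

print(round(composite([65, 65, 65, 65], r=0.6)))    # farther below 100 than 65
print(round(composite([135, 135, 135, 135], r=0.6)))  # farther above 100 than 135
```

The lower the intercorrelations, the smaller sd_of_mean and the more extreme the composite; with r = 1 the composite equals the components, which is why the effect disappears only when the subscores are perfectly correlated.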
The Luke Jeopardy
Luke 19:26 "For I say unto you, That unto every one which hath shall be given; and from him that hath not, even
that he hath shall be taken away from him."
Students who have one known disability are very likely to have additional disabilities. It is essential not to
overlook other possible disabilities and weaknesses, nor to automatically attribute all problems to the initially identified disability.
The Other Matthew Effect
Matthew 13:12 "For whosoever hath, to him shall be given, and he shall have more abundance: but whosoever
hath not, from him shall be taken away even that he hath."
If a student has a major disability, any additional disabilities or weaknesses ("The Luke Jeopardy") are likely to
have more severe effects on the student's functional abilities than they would in isolation.
Raw Scores of Zero
Ron Dumont, Fairleigh Dickinson University, & John Willis, Rivier College
We are severely allergic to zero raw scores. Jeri J. Goldman (1989) is the pioneer in the study of zero
raw scores. We have tried to contribute to this vital field with the Evaluation of Sam McGee.
A zero raw score can sometimes mean that the student lacks an essential skill that is not the intended
target of the test. The WJ III test of Editing is a good example, used by the publisher, Riverside, in
communications with WJ III purchasers. Editing is intended to measure a specific editing skill in kids who can
read. If the student cannot read (especially if it is because nobody has tried to teach this four-year-old how to
read yet), then what should have been a test of proofreading skill inadvertently becomes just another reading test.
More important, though, is that zero raw scores near the bottom end of a test's age range can produce
insanely high scores (for example in Goldman's and our dead students).
Another problem is that you don't know whether it was a high zero or a low one. Say, for example, the
zero raw score yields a standard score of 65. Was the student almost capable of getting one item correct and
leaping to a standard score of 78? Or, if the test had had sufficient bottom, would the student still have gotten
a zero, reflecting a true ability at a standard-score level of 20?
Sampling error becomes a horrendous problem with zero raw scores. We should worry when scores
are based on only a few correct responses, much less zero. Suppose a beginning first grader can read 25 words
at sight and has no other word attack skills. By sheer chance, that kid's set of 25 words might include 10 words
on the WJ III, 5 on the KTEA-II, and none on the WIAT-III (or any other combination). The closer you are to
the bottom of the test, the more sampling error will bite you. Schultz's (1988) comparison of standard scores
for commonly used tests of early reading, discussed above, found that different achievement tests yielded
widely different scores for the same test performance, e.g., standard score of 98 on one test and 65 on another.
A zero raw score might mean that the student simply missed the whole point of the exercise, even
though the student was reasonably competent in the underlying ability that the test was intended to measure. A
child might, for example, be able to demonstrate verbal abilities pretty well on an analogies test, but not on a
similarities one. Daniel (1986) found, for another example, that block-design tests with flat tiles and with
three-dimensional cubes measured the same construct in sixth grade children. However, the 3-D cubes might
confuse some younger children who could have demonstrated their abilities with flat tiles (Elliott, 1990).
Even if a zero raw score should happen, by accident, to produce a valid standard score, you certainly
would not have much data to work with. We strongly recommend that if you get within about one SEm of a
zero raw score, it would be prudent to find some other measure of that ability that had more bottom.
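For readers who want the rule of thumb in numbers: the standard error of measurement is SEm = SD x sqrt(1 - reliability). A minimal sketch follows; the reliability value is hypothetical, so use the one published in the actual test manual:

```python
# SEm = SD * sqrt(1 - reliability): the expected spread of obtained
# scores around a true score, in the test's standard-score metric.
from math import sqrt

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement for a given score SD and
    reliability coefficient (hypothetical illustration)."""
    return sd * sqrt(1 - reliability)

# Standard-score metric (mean 100, SD 15) with a hypothetical
# reliability of .91:
print(round(sem(15, 0.91), 1))  # -> 4.5
```

So on a standard-score metric with reliability near .90, scores within roughly 4 or 5 points of the test's floor would trigger the same caution: find another measure of that ability with more bottom.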
Daniel, M. H. (1986, April). Construct validity of two-dimensional and three-dimensional block design. Paper
presented at the annual convention of the National Association of School Psychologists, Hollywood, FL.
Elliott, C. D. (1990). Differential Ability Scales introductory and technical handbook. San Antonio: The
Psychological Corporation.
Goldman, J. J. (1989). On the robustness of psychological test instrumentation: Psychological
evaluation of the dead. In Glenn G. Ellenbogen (Ed.) The Primal Whimper: More Readings from the
Journal of Polymorphous Perversity. New York: Ballantine, Stonesong Press.
Schultz, M. K. (1988). A comparison of Standard Scores for commonly used tests of early reading.
Communiqué, Newsletter of the National Association of School Psychologists, 17 (4), 13.
Executive Functions as Basic Psychological Processes
For reasons that are not clear to us, people keep emailing us to ask whether "executive functions" qualify as
"basic psychological processes" for the purpose of identifying a specific learning disability under IDEA.
Although we have already replied to 35 of the 37 people in the English-speaking world who have the slightest
interest in our opinion, here it is for the other two.
1. Executive functions are perfectly respectable basic psychological processes. They are basic. They are
psychological. They are processes. What more could you ask?
2. "Executive functions" should be written in the plural because there are lots of them and an individual may have
different levels of functioning in different functions.
3. Cognitive ability tests (even ones that tap many different cognitive abilities, such as the Wechslers, DAS,
KABC, Stanford-Binet, or WJ) do not measure all or even close to all potential basic psychological processes.
Even the ones designed to produce a profile of strengths and weaknesses focus on tests that primarily measure
intelligence and only secondarily measure other processes. A "flat profile" on a test of cognitive ability does
not rule out anything (well, OK, it does rule out a jagged profile on that administration of that test).
In fact, one reason that prefrontal lobotomies remained popular as long as they did was that lobotomies
usually did not make significant differences in IQ scores on the Wechsler and Binet scales! [Goldstein, K. (1950).
Prefrontal lobotomy; analysis and warning. Scientific American, 182 (2), 44-47.]
When people are constructing an IQ test, one thing they usually do is throw out any candidate subtests that do not have at least reasonably high correlations with psychometric g.[18] A subtest that does not measure intellectual ability usually does not make the cut. [An exception is the DAS-II, which puts high-g-loading subtests in its "Core" battery to compose a General Conceptual Ability (GCA) score, but then adds a bunch of "Diagnostic" subtests that yield valuable information but do not necessarily have particularly high g loadings. The Diagnostic subtests are not included in the GCA. You can do something somewhat similar by separately considering the GAI and CPI scores on the WISC-IV or WAIS-IV.]
Cecil Reynolds and others have made the point on the NASP listserv (and in many publications that I
cannot cite from memory) that profile analysis on an IQ test is risky at best because the g loadings of the subtests
make them contaminated measures of anything else they might be measuring. That is the reason, for example, that
an OT may find horrendous weaknesses with visual perception, but the SAIF or psychologist may report only mild
problems (if any) on cognitive ability subtests, such as WISC Block Design or DAS Pattern Construction, that are
"supposed to" tap visual perceptual skills. The cognitive ability subtests primarily measure intelligence and only
secondarily measure other specific abilities. This issue would argue for using dedicated tests (such as the ones
with funny names employed by OTs, SLPs, and neuropsychologists) to measure abilities other than intelligence
and using IQ tests that are tightly focused on measuring g. I agree with the observation, but still prefer to begin my
assessment with a profile-producing measure, such as the DAS-II, KABC-II, or Wechsler scales, and then follow
up leads with the specialized tests, as recommended by many authorities.
4. The Office of Special Education Programs and Office of Special Education and Rehabilitative Services (OSEP
and OSERS) have repeatedly made it clear that the law does not require documentation of a processing disorder
for identification of a specific learning disability. Legally (unless state law requires otherwise), the process
weakness may simply be inferred. [McBride, G.M., Dumont, R., & Willis, J.O. (2011, pp. 79, 80, 86, 88-92).
Essentials of IDEA for Assessment Professionals. New York: Wiley].
5. However, we believe that best practice does call for documentation of a disorder in one or more of the basic
psychological processes involved in understanding or in using language, spoken or written, that may manifest itself
in the imperfect ability to listen, think, speak, read, write, spell, or to do mathematical calculations. In fact, we
believe we need not only to clearly name and define the specific weak process (not merely vaguely calling it some
sort of process or executive function disorder, but specifying one or more impaired specific processes such as, for
example, spatial perception, auditory working memory, fine-motor coordination, task initiation, multi-tasking,
self-monitoring and evaluation, or planning), but also to explain how that weakness impairs achievement.
[Footnote on g: general, global, overall intelligence. On most IQ tests, the total score (e.g., Wechsler Full Scale IQ) attempts to
measure this psychometric g. Factor analytic techniques can yield a more sophisticated estimate of g. For a very readable
discussion of these issues, I recommend Alan S. Kaufman's IQ Testing 101 (New York, NY: Springer, 2009, ISBN
978-0-8261-0629-2), which is totally accessible to readers with no background in psychological assessment, but held the
attention of, and provided new information for, someone who has been doing IQ testing as a hobby for 43 years.]
6. Achievement may be measured by normed test scores. However, it may also, legally and logically, be measured
by informal assessments, classroom performance, or academic grades. Normed achievement tests do not even
attempt to assess many essential academic skills, such as working through multiple drafts to produce a ten-page
essay; reading, comprehending, and remembering a 40-page chapter in a science or history text; or sustaining
attention to, understanding, taking notes on, and remembering a 40-minute lecture. High scores on even the best
of the normed achievement tests do not rule out possible academic problems.
7. Many of the important executive functions involve activities that last for many minutes, hours, days, weeks, or
months. By their nature, most executive functions are issues of consistent, typical performance, not the peak
performance targeted by most tests of cognitive ability and academic achievement. There are some useful direct
tests of some executive functions, such as the NEPSY-II and Delis-Kaplan Executive Function System (D-KEFS).
Memory tests, such as the TOMAL-2 and WRAML2, include some executive functions. There are also
procedures, such as the Rey-Osterrieth Complex Figure, the WISC-IV Integrated, or the Advanced Clinical
Solutions for the WAIS-IV and WMS-IV, which can be helpful.
You can also learn a lot about a student's executive functions by observing the student's approach to the
tasks. For example, if the student is struggling to correctly orient the last of the identical patterned cubes to copy a
Block Design, does the student rotate it once, rotate it again, and then – instead of rotating it once more into the
correct orientation, flip the block over to start again from scratch with an identical face? Does the student work
very slowly and carefully on the timed Coding subtest, but quickly and impulsively on the untimed, multiple-choice
Picture Concepts and Matrix Reasoning subtests? Does the student underestimate the difficulty of
repeating dictated series of digits and then devote more mental energy to repeating other series backwards, thereby
recalling more digits backward than forward? Or does the student fail to learn from experience?
However, we also need to rely on the observations of parents, teachers, and therapists who interact with the
student for months or years. Standardized, normed questionnaires allow us to collect and quantify those
observations. Most authorities consider it best practice to first use a broad-spectrum questionnaire, such as the
BASC-2, and then a more narrowly focused one, such as the Behavior Rating Inventory of Executive Function
(BRIEF). The risk of beginning an assessment with a narrowly focused test or questionnaire is Maslow's hammer
problem.19 Many of us recall the widespread use of school readiness tests that identified maturity and immaturity.
Although thoughtful examiners knew better and used their knowledge appropriately, the formal test scores perforce
classified intellectual disability, cerebral palsy, deafness, blindness, and childhood schizophrenia as "immaturity."
It is imprudent to prematurely narrow our choices in the assessment process.
8. Finally, as with any assessment process, we want to understand and explain how the disorder actually impairs
academic achievement or performance. We want to understand the mechanism. For one example, it might be a
stretch to attribute poor math fluency to a weakness in planning. The burden of proof would be on the evaluator
suggesting that cause-and-effect relationship. However, if the student was given a low rating from parents and
even lower from teachers on the BRIEF Plan/Organize scale,20 was observed to plan poorly on written expression
and block design tasks in the assessment, and scored low on Elithorn Mazes on the WISC-IV Integrated and
planning tasks on the KABC-II, D-KEFS or NEPSY-II, you might make a pretty strong argument that poor
planning ability helped explain the student's confused, incomplete, and late homework; the lack of practice caused
by the inadequate homework; and the resulting lack of skill and fluency on tasks the homework was designed to
reinforce. You might also be able to demonstrate how poor planning interfered with computation of multiple-step
math computation examples and solution of multiple-step math story problems, with writing assignments, and with
the planning of experiments in the science lab (including the one that required evacuation of the school last
19 To the man who only has a hammer, everything he encounters begins to look like a nail. – Abraham H. Maslow
20 The BRIEF Plan/Organize scale assesses the "ability to manage current and future-oriented task demands . . .
anticipate future events, set goals, and develop appropriate steps ahead of time to carry out a task or activity . . .
bring order to information and to appreciate main ideas of key concepts."
When a new test is developed, it is normed on a sample of hundreds or thousands of people. The sample
should be like that for a good opinion poll: female and male, urban and rural, different parts of the
country, different income levels, etc. The scores from that norming sample are used as a yardstick for
measuring the performance of people who then take the test. This human yardstick adjusts for the
difficulty levels of different tests: the student is being compared to other students on both difficult and
easy tasks. You can see from the illustration below that there are more scores in the middle than at the
very high and low ends. Many different scoring systems are used, just as you can measure the same
distance as 1 yard, 3 feet, 36 inches, 91.4 centimeters, 0.91 meter, or 1/1760 mile.
PERCENTILE RANKS (PR) simply state the percent of persons in the norming sample who scored the
same as or lower than the student. A percentile rank of 50 would be Average – as high as or higher than
50% and lower than the other 50% of the norming sample. The middle half of scores falls between
percentile ranks of 25 and 75.
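As a minimal sketch (the function name and sample data are illustrative, not from any test manual), the percentile-rank definition above can be computed directly from a norming sample:

```python
def percentile_rank(score: float, norming_sample: list) -> float:
    """Percent of the norming sample scoring the same as or lower
    than the given score (the definition of PR used above)."""
    at_or_below = sum(1 for s in norming_sample if s <= score)
    return 100 * at_or_below / len(norming_sample)

# A score tying the 5th lowest of 10 norming scores earns a PR of 50.
print(percentile_rank(5, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 50.0
```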
STANDARD SCORES ("quotients" on some tests) have an average (mean) of 100 and a standard
deviation of 15. A standard score of 100 would also be at the 50th percentile rank. The middle half of
these standard scores falls between 90 and 110.
SCALED SCORES ("standard scores" on some tests) are standard scores with an average (mean) of 10
and a standard deviation of 3. A scaled score of 10 would also be at the 50th percentile rank. The middle
half of scaled scores falls between 8 and 12.
V-SCALE SCORES have a mean of 15 and standard deviation of 3. A v-scale score of 15 would also be
at the 50th percentile rank and in Stanine 5. The middle half of v-scale scores falls between 13 and 17.
T SCORES have an average (mean) of 50 and a standard deviation of 10. A T score of 50 would be at
the 50th percentile rank. The middle half of T scores falls between approximately 43 and 57.
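Because all of these scoring systems are linear transformations of the same z score (standard deviations from the mean), conversions among them are simple arithmetic. A sketch using only Python's standard library (the function name is an illustration):

```python
from statistics import NormalDist

def z_to_metrics(z: float) -> dict:
    """Express one z score in each of the scoring systems above."""
    return {
        "standard_score": round(100 + 15 * z),  # mean 100, SD 15
        "scaled_score": round(10 + 3 * z),      # mean 10, SD 3
        "v_scale_score": round(15 + 3 * z),     # mean 15, SD 3
        "t_score": round(50 + 10 * z),          # mean 50, SD 10
        # PR: percent of a normal distribution at or below this z score
        "percentile_rank": round(100 * NormalDist().cdf(z)),
    }

# One SD above the mean: standard 115, scaled 13, v-scale 18, T 60, PR 84.
print(z_to_metrics(1.0))
```

Note that z = ±0.67 reproduces the "middle half" boundaries quoted above: standard scores 90 and 110, scaled scores 8 and 12, v-scale scores 13 and 17, and T scores of approximately 43 and 57.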
[Normal-curve figure: 200 &s, with each && representing 1% of the norming sample. The score ranges aligned under
the curve are reconstructed in the table below.]

z Score          Standard   Scaled    V-Scale   T Score   Percentile   Percent in
                 Scores     Scores    Scores              Ranks        each range
below –2.00      – 69       1 – 3     – 8       – 29      – 02         2%
–2.00 – –1.34    70 – 79    4 – 5     9 – 10    30 – 36   03 – 08      7%
–1.33 – –0.68    80 – 89    6 – 7     11 – 12   37 – 42   09 – 24      16%
–0.67 – 0.66     90 – 109   8 – 12    13 – 17   43 – 56   25 – 74      50%
0.67 – 1.32      110 – 119  13 – 14   18 – 19   57 – 62   75 – 90      16%
1.33 – 1.99      120 – 129  15 – 16   20 – 21   63 – 69   91 – 97      7%
2.00 and above   130 –      17 – 19   22 – 24   70 –      98 –         2%

The figure also aligned verbal classifications with these ranges, for example Wechsler IQ: Very Low (– 69), Below
Average (70 – 84), Average (85 – 115), Above Average (116 – 130), and Very High/Very Superior (131 –); and
Vineland Adaptive: Low (– 70), Moderately Low (71 – 85), Adequate (86 – 114), Moderately High (115 – 129), and
High (130 –).

Adapted from Willis, J. O., & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley
Custom Publishing, 1998, p. 27).
RELATIVE PROFICIENCY INDEXES (RPI) show the examinee's level of proficiency (accuracy, speed, or
whatever is measured by the test) at the level at which peers are 90% proficient. An RPI of 90/90 would mean that, at
the difficulty level at which peers were 90% proficient, the examinee was also 90% proficient. An RPI of 95/90 would
indicate that the examinee was 95% proficient at the same level at which peers were only 90% proficient. An RPI of
75/90 would mean that the examinee was only 75% proficient at the same difficulty level at which peers were 90%
proficient.
RPI                Proficiency with Age- or      Age- or Grade-Level
                   Grade-Level Tasks             Tasks will be:
98/90 to 100/90    Very Advanced                 Extremely Easy
95/90 to 98/90     Advanced                      Very Easy
82/90 to 95/90     Average to Advanced           Easy
67/90 to 82/90     Average                       Manageable
24/90 to 67/90     Limited to Average            Difficult
3/90 to 24/90      Limited                       Very Difficult
0/90 to 3/90       Very Limited                  Extremely Difficult
Adapted from Jaffe, L. E. (2009). Development, interpretation, and application of the W score and the relative proficiency index
(Woodcock-Johnson III Assessment Service Bulletin No. 11). Rolling Meadows, IL: Riverside Publishing.
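The RPI bands can also be looked up programmatically. A sketch, assuming the cutoffs in Jaffe's (2009) table (the function name is hypothetical, and the scale is the examinee side of the x/90 ratio):

```python
def rpi_band(examinee_proficiency: float) -> tuple:
    """Map the examinee side of an RPI (the x in x/90, 0-100) to the
    proficiency and task-difficulty labels in Jaffe's (2009) table."""
    bands = [  # (lower cutoff, proficiency label, task-difficulty label)
        (98, "Very Advanced", "Extremely Easy"),
        (95, "Advanced", "Very Easy"),
        (82, "Average to Advanced", "Easy"),
        (67, "Average", "Manageable"),
        (24, "Limited to Average", "Difficult"),
        (3, "Limited", "Very Difficult"),
        (0, "Very Limited", "Extremely Difficult"),
    ]
    for cutoff, proficiency, difficulty in bands:
        if examinee_proficiency >= cutoff:
            return proficiency, difficulty

# The 75/90 example above: 75% proficient where peers are 90% proficient.
print(rpi_band(75))  # → ('Average', 'Manageable')
```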
Random Comments on the DAS-II and WISC-IV
Table 8.10, p. 169 in the DAS-II Introductory and Technical Handbook (Elliott, 2007b) shows the WISC-IV and
DAS-II scores obtained by 202 children of ages 6-0 through 16-11 tested about three weeks apart (1 to 72 days) with the
DAS-II first. The sample is small compared to standardization samples, but fairly large for studies of this kind.
Elliott notes (2007, p. 168) that the DAS-II scores are slightly lower than the WISC-IV ones, as would be expected by
the Flynn Effect (e.g., Flynn, 1998) and possible learning of test procedures by taking the DAS-II first. I don't know why
counterbalanced order was not used. I would add that the WISC-IV Index score that is highest compared to other WISC-IV
Index scores and DAS-II Cluster scores was, as would be expected from the Flynn Effect, Perceptual Reasoning (mean =
106.2). The other WISC-IV Index and DAS-II Cluster scores ranged from 98.8 to 101.2 (FSIQ 103.1).
When an individual student had notably different scores on the WISC-III and DAS, the culprit was usually the DAS
Nonverbal (fluid) Reasoning Cluster. In most cases, the WISC-III Verbal Comprehension and DAS Verbal Ability scores
would be close to one another, and the WISC-III Perceptual Organization and the DAS Spatial Ability scores would also hang
together. A high or low Nonverbal Reasoning Cluster score would usually cause any noteworthy difference between the child's
WISC-III FSIQ and DAS GCA scores (see, for example, Dumont, Cruse, Price, & Whelley, 1996).
The WISC-IV dropped Picture Completion from the Perceptual Index (which changed from a 4-subtest Perceptual
Organization Index to a 3-subtest Perceptual Reasoning) and added two subtests that appear to be measures more of fluid
reasoning (Gf) than visual-spatial thinking (Gv) (see Farrall's "Myth of the WISC-III/WISC-IV Retest"). I would speculate that
the WISC-IV Perceptual Reasoning Index might be closer to the DAS-II Special Nonverbal Composite rather than showing the
closer relationship to the Spatial Ability than to the Nonverbal Reasoning that we saw with the WISC-III and DAS.
The DAS-II Verbal Ability and WISC-IV Verbal Comprehension have very similar word-defining subtests. The
greatest difference is the 2-1-0 WISC-IV and 1-0 DAS-II scoring. In the comparison of the two tests (Elliott, 2007),
Vocabulary and Word Definitions correlated .64, with the WISC-IV score (a mean scaled score of 10.3) only slightly higher
than the DAS-II (a mean T score of 49.6). Both instruments have subtests of explaining how different words can be alike.
The DAS-II uses three words and 1-0 scoring for most items (and in Elliott's study was given first), but the WISC-IV uses
only two words per item and 2-1-0 scoring for most items. The correlation between the two was .63 with extremely close
scores (50.5 DAS-II and 10.5 WISC-IV). The WISC-IV Verbal Comprehension Index, however, includes a third subtest:
Comprehension. In Elliott's (2007) study, the WISC-IV Comprehension had correlations of .39 with the DAS-II Verbal
Comprehension, .57 with Naming Vocabulary, .46 with Word Definitions, and .43 with Verbal Similarities. I might
anticipate that a divergent Comprehension score on the WISC-IV could cause a notable difference between the Verbal
Comprehension Index and the Verbal Ability Cluster. The difference between 2-1-0 and 1-0 scoring could separate
WISC-IV and DAS-II verbal scores for some children. Also, some children probably profit from the three words on the
DAS-II Verbal Similarities, because they can pass an item even if they do not know one of the words, but other children
may be confused by so many words at once.
The WISC-IV Working Memory Index appears to be roughly equal parts simple short-term memory (Digits
Forward and easier LNS items) and working memory (Digits Backward and higher LNS items). It correlated .74 with the
DAS-II Working Memory, which includes mostly working memory tasks: Digits Backward (but not Digits Forward) and
Recall of Sequential Order.
The DAS-II Processing Speed includes Rapid Naming as well as the more traditional Speed of Information
Processing. The WISC-IV still uses the paper-and-pencil Coding and Symbol Search. However, the verbal-response Rapid
Naming has very slightly higher correlations with the paper-and-pencil WISC-IV subtests than does the DAS-II
paper-and-pencil Speed of Information Processing.
Dumont, R., Cruse, C., Price, L., & Whelley, P. (1996). The relationship between the Differential Ability Scales and the WISC-III for students
with learning disabilities. Psychology in the Schools, 33, 203-209.
Elliott, C. D. (2007). Differential Ability Scales 2nd edition introductory and technical handbook. San Antonio, TX: Pearson.
Farrall, M. L. The myth of the WISC-III/WISC-IV retest: The apples and oranges effect. (Retrieved 23 May 2007.)
Flynn, J. R. (1998). IQ gains over time: Toward finding the causes. In U. Neisser (Ed.), The rising curve: Long-term gains in IQ
and related measures (pp. 25-66). Washington, DC: American Psychological Association.