In a previous post I suggested curriculum-based measurement would be a beneficial tool for math teachers. However, I didn’t want to give it short shrift in that post because it deserves so much more! That is why we are here today.

Big takeaway:
Crap data lead to crap decisions.
Below is an advance organizer for the structure of the post.
From the beginning: CBM, CBE, CBA, CBD (oops, not that one)
Framework for thinking about CBM research evidence and its use
Tidbits for implementation (What does all this jawn mean? Don’t forget Art)
The Origin Story
We are going to start here in 1991 with a quote from Lindsley…and then go back…and then go forward.
Now, after collecting tens of thousands of records of hundreds of different behaviors, I am convinced that frequency is much more than Skinner's universal datum. I am convinced that frequency is actually a dimension of behavior. When you change the frequency, you have changed the behavior. Just as frequency is a dimension of light, sound, and electricity, frequency is a dimension of behavior. Frequency should not be considered a mere measure of behavior, it is a dimension of behavior. You have not accurately described a behavior until you have stated its frequency.
This quote unpacks a couple of important considerations for our origin story of curriculum-based measurement for current-day practice.
Curriculum-based measurement can trace its use of rate of responding back to the field of behavioral psychology and B. F. Skinner (yeah, the rat and pigeon guy). In fact, Skinner suggested that identifying how useful rate of responding is for understanding behavior was one of his greatest accomplishments.
Precision teaching adopted rate of responding and applied it to teaching children academic skills (I can hear the squeal of delight from Olivia Enders and Rick Kubina). Through precision teaching1, early researchers identified useful instructional tactics and measurement tools that did a phenomenal job of evaluating a student’s likelihood to Retain, Endure, Apply, and Perform to current standards (REAPS; Haughton, 1984).
While education has engaged in its never-ending pendulum swing, pitting “back to basics” against “reform-based” frameworks, some researchers instead focused on evaluating student responsiveness to the school curriculum and on identifying measurement tools that could be used to carefully evaluate student learning. The major impetus was that a critical focus on instructional responsiveness, paired with tools that produce psychometrically sound data, can allow educators to optimize their instruction to ensure all students are progressing. In the words of Siegfried Engelmann:
Well, yeah, from the beginning, that was our motto, and it offended a lot of traditional educators. But it was: If the learner hasn’t learned, the teacher hasn’t taught, and that it’s not a question of the learner’s ability, it’s a question of the teacher’s ability. These kids are capable of learning, certainly at different rates, but learning anything we want to teach them.
Jump to 1985
Exceptional Children (a flagship journal in the field of special education) published a special issue led by Dr. James Tucker focused on alternatives to standardized testing.
Curriculum-Based Instructional Assessment (Gickling & Thompson) was introduced. It focuses on instruction and how to use data from student performance on curriculum tasks to better match instruction, offering a solution to the challenges of relying on large-scale standardized assessments. Large-scale standardized assessments do not provide actionable feedback to teachers. They can only be administered so frequently, because they can be anxiety inducing for children and steal instructional time. Thus, teachers have limited tools available to gauge the effectiveness of their instruction on a day-to-day basis. Furthermore, large-scale standardized assessments do not always align with the actual instructional materials and curriculum teachers are using. This mismatch makes it difficult to know whether teachers are being effective, because the measurement tool does not align with the instruction occurring in the classroom. Another challenge was the mismatch between the difficulty of the curriculum and a child’s current knowledge.

Thus, the authors proposed that teachers carefully consider (a) the type of task, (b) the number of items in the task, and (c) the student’s performance on the task. Considering student performance skill by skill, instead of by “grade level,” is important. For example, stating “the child is currently at a third-grade instructional level in math” is a pointless statement. There are SO many skills to focus on, and the authors of this article aimed to highlight the need for skill-by-skill analysis, using the curriculum materials available to teachers to determine future instruction. This framework of identifying what is frustrational, instructional, or independent (mastery) is not new; it can be traced back to Betts and was later proposed as the instructional hierarchy (read more here on history).
Yet, this was an influential article unpacking how teachers can use curriculum materials and assessment tools to best match instruction to student current knowledge.
Curriculum-Based Measurement (Deno) was introduced. It focuses on measurement, with careful consideration to ensure the tools produce psychometrically sound data for decision-making. I don’t want to rehash the above, but the article highlights limitations of using large-scale standardized assessments as the only measurement tool. It also noted that teachers preferred “snapshots” of student performance, gathered through observation, to guide future instruction. However, research at that time found that teacher observation, and the resulting teacher judgment of whether students met objectives, was not accurate (still a concern). Deno laid out clear criteria for evaluating the usefulness of curriculum-based measures. They needed to be:
Reliable and valid if the results of their use were to be accepted as evidence regarding student achievement and the basis for making instructional decisions.
Simple and efficient if teachers were going to use them, or teach others to use them, to frequently monitor student achievement.
Easily understood so that the results could be clearly and correctly communicated to parents, teachers, and students.
Inexpensive since multiple forms were to be required for repeated measurement.
A major emphasis was the need for useful data to inform instruction on a daily basis. Large-scale standardized assessments are inappropriate for this purpose. Relying on teacher observations to inform this decision-making process is unreliable. Thus, creating a set of tools for teachers to use that align to curriculum, are quick and easy to administer, easy to interpret, and inexpensive would be beneficial.
Framework for Thinking about Curriculum-Based Measurement Research
Lynn Fuchs, who was influenced by Deno, laid out a framework for thinking about CBM research.
Technical adequacy of scores at a static time point. This involves evaluating the reliability, validity, sensitivity, and specificity of scores from a CBM task at a specific time point. Within a multi-tiered systems of support framework (MTSS; your district might call this response to intervention [RTI]), this provides evidence that the CBM tool offers useful information for screening decisions. For example, if you administer a CBM task, can you ensure:
it provides consistent scores for each individual across repeated administrations
whoever “scores” the task will reach the same score
the items included on the tool reflect content that is important for the overall domain
the scores correlate with other measures that are “good” measurement tools for the content domain (e.g., standardized tools, end-of-year state assessments)
it correctly identifies students at risk
it correctly identifies students not at risk
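To make the last two bullets concrete, here is a minimal sketch of how sensitivity and specificity of a screening decision are computed. The function name and all the numbers are mine, invented purely for illustration; they do not come from any actual CBM tool or dataset.

```python
def screening_accuracy(flagged_at_risk, truly_at_risk):
    """Sensitivity and specificity from two parallel lists of booleans:
    whether the screener flagged each student, and whether each student
    truly turned out to be at risk on a criterion measure."""
    pairs = list(zip(flagged_at_risk, truly_at_risk))
    tp = sum(f and t for f, t in pairs)            # flagged and truly at risk
    fn = sum((not f) and t for f, t in pairs)      # missed at-risk students
    tn = sum((not f) and (not t) for f, t in pairs)  # correctly cleared
    fp = sum(f and (not t) for f, t in pairs)      # false alarms
    sensitivity = tp / (tp + fn)  # share of truly at-risk students caught
    specificity = tn / (tn + fp)  # share of not-at-risk students cleared
    return sensitivity, specificity

# Hypothetical screening results for eight students
flagged = [True, True, False, True, False, False, False, True]
actual  = [True, True, True, False, False, False, False, False]
sens, spec = screening_accuracy(flagged, actual)
# sens = 2/3 (one at-risk student missed); spec = 3/5 (two false alarms)
```

The point of the sketch is that both errors matter: a screener that flags everyone has perfect sensitivity but useless specificity, and vice versa.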
Quick aside. This is VERY damn difficult to do! So pretending that a teacher observing student behavior on the fly, district personnel creating a “benchmark” instrument on the fly, or something pulled off of Teachers Pay Teachers will adequately meet this threshold is honestly foolish.
Technical features of slopes. This involves investigating whether the “slope” of progress (e.g., week-to-week growth) is correlated with student progress in the broader academic domain. Within a multi-tiered systems of support framework (MTSS), this provides evidence that the CBM tool offers useful information for progress monitoring decisions. For example, if you administer a CBM task each week, can you ensure:
week-to-week student “growth” actually reflects learning and not just measurement error (i.e., variations in difficulty of the individual assessment tool)
week-to-week growth is consistent across time rather than highly variable because of measurement error (i.e., variations in difficulty of the individual assessment tool)
the measurement tool is sensitive enough on a week-to-week basis to identify new learning (i.e., does the score increase when a student has learned new knowledge?)
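For readers who like to see the arithmetic, the “slope” in these bullets is just an ordinary least-squares fit of weekly scores against week number. A minimal sketch, with invented scores (the numbers and function name are mine, not from any published CBM dataset):

```python
def weekly_slope(scores):
    """Least-squares slope of one score per week against week number (0, 1, 2, ...)."""
    n = len(scores)
    weeks = range(n)
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    numerator = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    denominator = sum((w - mean_w) ** 2 for w in weeks)
    return numerator / denominator  # units gained per week

# Six hypothetical weekly scores (e.g., digits correct per minute)
scores = [12, 14, 13, 17, 18, 20]
slope = weekly_slope(scores)  # 1.6 digits correct per minute per week
```

Note how the week 3 dip (17 → 13 the week before) barely moves the fitted slope; that is exactly why a trend line is preferred over eyeballing two adjacent data points, where measurement error dominates.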
Quick aside. This is one of the challenging things for math. There are so many skills embedded within a grade-level curriculum that it is difficult to find one specific tool to use for the entire year that is sensitive to student learning week to week! This is an argument we laid out in an article I co-authored (I am very sorry to refer to an article I co-wrote; I know that can be annoying, but it is relevant here).
Instructional Utility. This involves investigating whether teachers can actually administer the CBM, score it, interpret the data, and then adjust their instruction and observe student gains. This is CLEARLY getting at the consequential validity of using CBM scores to enhance student knowledge. A more recent review unpacks the literature on this specific aspect of CBM research.
Tidbits for Implementation
Art Dowdy would be asking me, “Yo, what does all this jawn mean?”
The critical aspects I teach in my undergraduate assessment class and my graduate course on single-case research design (yo #ScrdChat) can be categorized into the following domains:
CBM selection
Fidelity of administration
Fidelity of scoring
Standardization of environment
Data-literacy
CBM Selection
My friend Brian Poncy likes to remind me (and apparently Ryan Farmer as well), “You can’t do anything unless you understand your dependent variable.” Thus, the CBM task you select must be aligned with what you are currently teaching! For example, suppose I select a math CBM focused on concepts and applications for a student’s current grade level (it likely samples items from across the entire grade), but I am actually providing intervention on content below that grade level. The student can be learning a bunch of math content week to week, yet their performance on that specific measurement tool may not reflect it, because the tool is not aligned with what I am teaching and thus will not be sensitive to their learning. We need to be acutely attuned to which items or tasks are included on the curriculum-based measure we select, to ensure it aligns with the current instruction we are providing and will be sensitive enough to detect the learning that occurs.
Fidelity of Administration
The properties of reliability and validity belong to the SCORES we collect, not the instrument. I cannot ASSume (see what I did there?) the CBM itself is valid/reliable; those properties are reserved for the scores. We must ensure the protocol for administering the CBM is adhered to. This involves training personnel, practice by the personnel, and then observation/feedback on correct/incorrect administration. For example, if the time limit is 2 min for a CBM task and the administrator gives 4 min to complete the task, then ALL your data will become problematic to interpret.

Another quick anecdote: I was leading a research study in which paraeducators were tasked with reading math word problems aloud to students (we did this to hopefully override issues with decoding and reading fluency for students in the intervention). Unfortunately for the research, but great for practice, some of the paraeducators wanted to be helpful by emphasizing specific words, rewording parts of the word problem to aid understanding, etc. We corrected these deviations ASAP to protect the credibility of the data. But this highlights a huge point for CBM data: deviations in administration can lead to differential student scores that reflect the level of support/help rather than knowledge. Sticking to the standardized protocol is critical to ensure we are measuring growth in student knowledge, not fluctuations in administration.
Fidelity of Scoring
I scored the CBM as 35 digits correct per minute and my friend scored it as 25 digits correct per minute. Uh oh! This is a critical problem because now student score fluctuation is a by-product of scoring and not their true knowledge. As with the administration protocol above, it is critical to train personnel on the scoring protocol. One suggestion is to save a permanent product (e.g., a worksheet, or an audio recording for some CBM tasks) and to practice scoring together to enhance agreement. Furthermore, occasionally double scoring student CBM tasks across time can prevent observer drift (i.e., the person scoring gradually changing how they score over time).
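One simple way to monitor scoring agreement during those double-scoring check-ins is a percent-agreement calculation across probes. A minimal sketch, assuming two scorers have independently scored the same five probes (all numbers and names here are invented for illustration):

```python
def percent_agreement(scorer_a, scorer_b, tolerance=0):
    """Share of probes on which two scorers agree, within an optional
    tolerance (e.g., allow scores to differ by 1 digit and still count)."""
    agree = sum(abs(a - b) <= tolerance for a, b in zip(scorer_a, scorer_b))
    return agree / len(scorer_a)

# Hypothetical digits-correct-per-minute scores on the same five probes
scorer_1 = [35, 22, 41, 18, 27]
scorer_2 = [35, 25, 41, 18, 26]

exact = percent_agreement(scorer_1, scorer_2)          # 3 of 5 match exactly
within_one = percent_agreement(scorer_1, scorer_2, 1)  # 4 of 5 within 1 digit
```

If agreement drifts downward across the year, that is the cue to retrain on the scoring protocol before trusting the progress monitoring graph.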
Standardization of Environment
I have the following data collection schedule
Week 1: Monday morning, first thing in the morning
Week 2: Wednesday right after lunch
Week 3: Thursday right before dismissal
Week 4: Wednesday right before lunch
Week 5: Friday right after math (a substitute was in for the general education teacher this day)
I hope I made this clear: the time of day is varying along with the day of the week. Furthermore, I could have added more variation, such as sometimes collecting data whole class in the general education setting versus sometimes pulling the student aside to do it individually. The key point is that each time we alter environmental variables we *might* be impacting the student’s performance. This influences the score we collect and undermines our confidence in interpreting whether a change in performance reflects learning from instruction or is a by-product of our varying the environment in which we collected the data. Standardizing the environment (i.e., day of week, time of day, group setting, etc.) each time we collect data will hopefully minimize the impact of external variables on scores, so we can trust that variations are due to learning and not something else.
Data-Literacy
I do not usually do this, but this book is freaking phenomenal and I use it in my undergraduate assessment course. It is a *little* outdated now on some of the more recent research on specific CBM tasks for each content domain and their psychometric properties. But it does such a nice job discussing how to organize the time-series data collected from CBMs, how to create a graph to analyze, and how to make decisions based on student responsiveness. I will also refer readers to this other free resource on using CBMs for progress monitoring and engaging in analysis of time-series data.
Why is this important? Prior research (here is one sample from a special issue) found that pre-service and in-service teachers had difficulty interpreting progress monitoring data from CBMs to make informed decisions. The critical aspects here are to (a) have a clear and defensible rationale for selecting an end-of-unit/semester/year goal and (b) have clear decision rules for when you would adjust instruction because growth is not sufficient.
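As one illustration of what a "clear decision rule" can look like, here is a sketch of a common four-point rule: draw an aim line from the starting score to the goal, and flag instruction for adjustment if the four most recent data points all fall below that line. The function names, rule choice, and data below are mine, offered only as an example of making a rule explicit, not as the rule any particular resource prescribes.

```python
def below_aim_line(scores, start_score, goal_score, total_weeks):
    """For each week, is the observed score below the aim line's expected value?"""
    weekly_gain = (goal_score - start_score) / total_weeks
    return [score < start_score + weekly_gain * week
            for week, score in enumerate(scores)]

def four_point_rule(scores, start_score, goal_score, total_weeks):
    """Signal an instructional change when the last four points sit below the aim line."""
    flags = below_aim_line(scores, start_score, goal_score, total_weeks)
    return len(flags) >= 4 and all(flags[-4:])

# Hypothetical case: start at 10, aiming for 28 over a 12-week period
observed = [10, 11, 11, 12, 12, 13]  # six weeks of scores so far
change_needed = four_point_rule(observed, start_score=10,
                                goal_score=28, total_weeks=12)
# The student is growing, but slower than the aim line requires,
# so the rule signals that instruction should be adjusted.
```

The value of writing the rule down (in code or on paper) is that the decision no longer depends on who happens to be eyeballing the graph that week.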
Conclusion
Within math we have a lot of CBM tasks with research supporting their use (see Nelson et al.) across Phases 1, 2, and 3 of the framework Lynn Fuchs laid out (see above). The purpose of CBMs is to (a) align to curriculum; (b) provide usable information for teachers immediately; (c) be quick and easy to administer, score, and interpret; and (d) demonstrate strong psychometric properties to enhance our confidence in using them for instructional decision-making.
In my next post I will discuss the specific math CBM tasks available to teachers and why/how they may consider using them as part of their instructional environment.
Precision teaching is not the focus of this post but interested readers can check out this systematic review evaluating promising effects. The Great Falls Project is one of the most cited historical demonstrations of its applicability in schools.