Distinguish Among Reversal, Multiple-Baseline, Multielement, and Changing-Criterion Designs
If you’re training for the BACB exam, writing a research proposal, or trying to pick the right study design for your clinic’s data, you’ve likely hit a wall with single-case experimental designs. The names sound similar. The logic overlaps in ways that aren’t always obvious. And the stakes are real—choose the wrong design, and you won’t be able to claim your intervention actually caused the change you’re measuring.
This post cuts through the confusion by explaining four core single-case designs: reversal (ABAB), multiple-baseline, multielement (alternating-treatments), and changing-criterion. By the end, you’ll understand how each one demonstrates experimental control, when to use it in practice, and—just as importantly—when not to use it.
Quick Summary
Single-case experimental designs let you demonstrate that your intervention caused a change in one person’s behavior by systematically changing conditions and watching what happens. Reversal designs (also called ABAB) show control by withdrawing treatment and watching behavior revert, then reintroducing treatment to see the effect return. Multiple-baseline designs stagger the start of the same intervention across different people, behaviors, or settings—no withdrawal needed—and show control when behavior changes only as the intervention arrives in each tier. Multielement designs (alternating-treatments) rapidly switch between two or more conditions to compare them side-by-side without withdrawal. Changing-criterion designs keep the intervention steady but gradually shift the performance target upward in steps, showing control when behavior tracks each new goal.
Each design shines in different situations. Reversal offers the most direct demonstration of control, but it’s off the table when you can’t safely withdraw treatment. Multiple-baseline works when withdrawal is unsafe or behavior is irreversible. Multielement is fast for comparing two approaches. Changing-criterion fits gradual shaping. Your choice affects not just your data—it affects client safety, how long intervention takes, and whether you’ll have solid proof that your program works.
Clear Explanation of the Topic
What Is a Single-Case Experimental Design?
A single-case design is a research approach where you focus intensely on one person (or one small group) and measure their behavior repeatedly across different conditions. Instead of comparing 30 people in a treatment group to 30 in a control group, you take one learner and change what’s happening around them while tracking how they respond. The goal is to show a functional relation—proof that the intervention caused the change, not luck or other factors.
To show a functional relation, you need replication. You have to see the same pattern happen more than once. Maybe you introduce the intervention, see improvement, withdraw it to watch behavior slip backward, then reintroduce it to see improvement return. Or maybe you introduce the same intervention to three different people at three different times and see improvement each time. When the pattern repeats, you’ve got stronger evidence that the intervention made the difference.
Reversal Design (ABAB)
A reversal design is the most straightforward. You start with a baseline (A) where you measure behavior without intervention. You’re essentially asking, “What does this behavior look like on its own?” Then you introduce your intervention (B) and measure again. If behavior changes when the intervention arrives, that’s promising—but not proof yet. The change could be coincidence, or something else might have shifted.
That’s why reversal designs include a crucial step: you withdraw the intervention and return to baseline conditions (A again). If the behavior was truly caused by your intervention, it should revert toward baseline levels. Then you reintroduce the intervention (B) a second time. When behavior improves again, you’ve seen the pattern twice—baseline → improvement → reversion → improvement. That repetition clinches the argument that your intervention caused the change.
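To see that logic in numbers, here’s a minimal Python sketch using made-up session counts (every number is illustrative, not real data). It computes each phase’s mean; for a behavior you’re reducing, the reversal pattern shows up as high, low, high, low:

```python
# Hypothetical ABAB data: counts of a problem behavior per session.
# All numbers are made up for illustration.
phases = {
    "A1 (baseline)":       [8, 9, 7, 8],
    "B1 (intervention)":   [4, 3, 2, 2],
    "A2 (withdrawal)":     [7, 8, 7],
    "B2 (reintroduction)": [2, 1, 2, 1],
}

for label, counts in phases.items():
    mean = sum(counts) / len(counts)
    print(f"{label}: mean = {mean:.1f} per session")

# For a behavior you're reducing, control shows up as
# high (A1) -> low (B1) -> high (A2) -> low (B2).
```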
The strength of reversal designs is their internal validity—the confidence that the intervention, not something else, made the difference. The challenge is that they’re ethically risky. If your intervention prevents self-injury, reduces aggression, or teaches a critical safety skill, withdrawing it can be harmful. Withdrawal also doesn’t make sense for behaviors that don’t revert—once a child learns to tie shoes, removing the teaching program doesn’t erase the skill. Reversal designs work best for behaviors that are reversible and safe to withdraw.
Multiple-Baseline Design
A multiple-baseline design solves the ethical problem of withdrawal. Instead of introducing and removing an intervention with one person, you introduce the same intervention at different times across different people, behaviors, or settings. You stagger when the intervention starts, so you can see the effect happen multiple times without ever taking the help away.
Here’s a concrete picture. You might track three students’ on-task behavior. All three start in baseline (no intervention). After collecting stable baseline data on Student 1, you introduce your intervention to Student 1 only. Students 2 and 3 stay in baseline. You watch Student 1 improve while Students 2 and 3 stay unchanged. Then you introduce the intervention to Student 2, while Student 3 remains in baseline. Student 2 improves while Student 3 stays the same. Finally, you introduce the intervention to Student 3, and they improve too. The staggered timing means the effect happens multiple times, proving the intervention matters—not time passing or seasonal changes.
You can also use multiple-baseline designs across different behaviors or across different settings. The logic is identical: if behavior only improves when and where you introduce the intervention, you’ve got solid proof of a functional relation.
The main requirement is a stable or predictable baseline in each tier before you introduce the intervention. If baseline data are wild and scattered, you won’t see a clean effect when the intervention starts. You also need the tiers to be independent—if teaching Student 1 accidentally teaches Student 2 in the same classroom, that tier independence breaks down, and your logic falls apart.
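How stable is “stable”? There’s no universal rule, but one common heuristic asks whether the most recent data points cluster within a narrow band around their own mean. A rough Python sketch of that idea; the 5-point window and 20% band are illustrative choices, not a standard:

```python
def is_stable(data, window=5, band=0.20):
    """Rough stability check: do the last `window` points all fall within
    +/- `band` (as a fraction) of their own mean? The window size and
    band width are illustrative defaults, not a formal criterion."""
    recent = data[-window:]
    if len(recent) < window:
        return False  # not enough data to judge yet
    mean = sum(recent) / len(recent)
    if mean == 0:
        return all(x == 0 for x in recent)
    return all(abs(x - mean) <= band * mean for x in recent)

# Example: a student's percent of intervals on-task across baseline sessions.
baseline = [32, 35, 30, 33, 31, 34]
print(is_stable(baseline))  # True: ready to introduce the intervention
```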
Multielement (Alternating-Treatments) Design
A multielement design, also called an alternating-treatments design, is your tool when you want to compare two interventions quickly without withdrawal. Instead of running one intervention for weeks and then switching, you alternate between them—sometimes daily, sometimes within the same session. You might use a token economy on Mondays and Wednesdays and a different reward system on Tuesdays and Thursdays, then see which produces higher on-task behavior.
The logic here is different from reversal and multiple-baseline. You’re not trying to show that treatment is better than nothing. You’re answering, “Which of these two approaches works better for this learner?” Multielement designs are efficient—you can answer that question quickly without long baseline periods.
For a multielement design to work cleanly, the two conditions need to be clearly distinct so the learner knows which approach is in effect. If both conditions look the same, the learner will get confused and results will be murky. That’s why you use discriminative stimuli (cues)—maybe a red card means token economy and a blue card means the alternative system.
The catch is carryover. If one approach’s effects linger into the next condition, your comparison gets contaminated. Say you use a high-energy reward on Monday. The learner might still be excited Tuesday even though you’re using a lower-energy reward. That leftover excitement makes the second approach look more effective than it actually is. Good practice is to use enough time between conditions, strong discriminative cues, and sometimes counterbalancing to reduce carryover.
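Counterbalancing can be as simple as alternating which condition leads each day, so neither approach always benefits from going first. A quick Python sketch; the condition names are placeholders:

```python
def alternation_schedule(days, conditions=("Token economy", "Alternative system")):
    """Counterbalance which condition runs first each day so neither
    systematically benefits from leading. Condition names are
    placeholders for illustration."""
    schedule = []
    for day in range(1, days + 1):
        a, b = conditions
        first, second = (a, b) if day % 2 == 1 else (b, a)
        schedule.append((day, first, second))
    return schedule

for day, first, second in alternation_schedule(4):
    print(f"Day {day}: {first} first, then {second}")
```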
Changing-Criterion Design
A changing-criterion design is perfect for shaping behavior through gradual steps. You pick an intervention and stick with it, but you change the goal (the criterion for reinforcement) in stepwise increments. A student might start by earning a reward for completing 5 math problems. Once they hit that reliably, the criterion shifts to 8 problems. Then 12. Then 15. The intervention stays the same. What changes is the target.
Control comes from watching behavior track the criterion changes. When the goal shifts from 5 problems to 8, you should see the student improve toward 8. When it shifts to 12, they aim for 12. If behavior consistently matches the new target, you’ve shown a functional relation—the criterion shift caused the behavior change.
For this design to work, you need realistic step sizes. If you jump from 5 problems to 50, the student will fail and get discouraged. Steps should be achievable but challenging. You also want each subphase to be long enough for behavior to stabilize before changing the target.
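One way to make that advancement rule concrete is to hold the criterion until the learner meets it on a set number of consecutive sessions, then step it up. A hedged Python sketch of that logic; the step size of 3 and the two-session rule are illustrative choices that a real program would set clinically:

```python
def next_criterion(sessions, criterion, step=3, required=2):
    """Advance the criterion only after `required` consecutive sessions
    at or above the current target. The step size and the two-session
    rule are illustrative, not prescriptive."""
    recent = sessions[-required:]
    if len(recent) == required and all(s >= criterion for s in recent):
        return criterion + step
    return criterion

# Hypothetical math-problem counts per session, current criterion of 8.
sessions = [5, 6, 8, 9]
print(next_criterion(sessions, criterion=8))  # 11: last two sessions met 8
```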
Changing-criterion designs don’t work for teaching entirely new skills. They’re built for behaviors already in a learner’s repertoire—you’re just adjusting frequency, duration, or precision. They also work well when you want gradual change rather than a sudden flip.
Why This Matters
Picking the right design isn’t an academic exercise. It determines whether you can honestly say your intervention caused the improvement. It affects how long your learner waits for effective treatment. It shapes what’s safe and ethical to do. And it changes how confident you can be when presenting your data to a parent, a school, or a referral source.
If you use a reversal design when withdrawal would be harmful, you expose your learner to unnecessary risk—that’s not acceptable. If you use a multiple-baseline design without staggering the starts enough, simultaneous improvement could reflect a shared outside event, and your replication won’t be credible. If you use a multielement design without clear condition cues, the learner won’t discriminate the conditions and your comparison will blur. If you use changing-criterion to teach a brand-new skill, you’ll be frustrated because the learner has nothing to build on.
Misunderstandings about these designs also lead clinicians to misinterpret their own data. A clinician might run what looks like a reversal but pull the treatment back only partway, muddying the withdrawal phase and weakening the claim of control. Another might assume a multiple-baseline design is weaker than a reversal, when both can be equally rigorous if designed well. Someone else might introduce an intervention and, two weeks later when behavior improves, think they’ve proven the intervention works—without realizing that time alone could be the cause.
Key Features and Defining Characteristics
Reversal Design
Structure: Baseline (A) → Intervention (B) → Return to Baseline (A) → Reintroduction of Intervention (B).
The minimum experimental arrangement is A-B-A. Most applied studies use A-B-A-B, which adds a second demonstration of the effect and ends with treatment in place. Some use longer patterns.
How control is shown: When behavior shifts along with the intervention—improving in B, reverting in A, improving again in B—the pattern suggests the intervention caused the change. The replication rules out coincidence.
When it’s appropriate: Behaviors that safely revert when support is removed, no harm from withdrawal, and you have time for multiple phases.
When it’s problematic: Irreversible learning, dangerous behaviors, and situations where withdrawing effective treatment violates ethics or policy.
Multiple-Baseline Design
Structure: Two or more tiers (participants, behaviors, or settings) begin baseline at the same time. The intervention is introduced to Tier 1, then Tier 2, then Tier 3—while the baselines of later tiers continue. No withdrawal.
How control is shown: Behavior in each tier improves only when the intervention reaches that tier (not before), so the timing of improvement replicates the effect without withdrawal.
When it’s appropriate: When withdrawal is unsafe, unethical, or impractical. When change is expected to be permanent. When you have access to multiple tiers that can be staggered.
When it’s problematic: Unstable baselines, very few tiers, or tiers that aren’t independent.
Multielement (Alternating-Treatments) Design
Structure: Rapid alternation between two or more conditions, with clear discriminative cues for each.
How control is shown: Differential responding under different conditions. If Condition A consistently produces better behavior than Condition B, the contrast supports a functional relation.
When it’s appropriate: Quick comparisons between treatments. When withdrawal isn’t an issue. When you want an answer fast and both conditions are viable.
When it’s problematic: Carryover between conditions, poor discriminability, or high baseline variability.
Changing-Criterion Design
Structure: Baseline → Intervention with Criterion 1 → Criterion 2 → Criterion 3 (and so on). The intervention itself doesn’t change; the performance target does.
How control is shown: Behavior tracks each new criterion. When the goal shifts, behavior follows. Consistent tracking across multiple criterion changes shows a functional relation.
When it’s appropriate: Shaping goals. Stepwise movement toward a long-term target. When the behavior is already partly in the learner’s repertoire.
When it’s problematic: Teaching entirely new behaviors, expecting sudden changes, or settings where criterion adjustments aren’t practical.
When You Would Use This in Practice
Decision Points: How to Choose
Start by asking these questions; a short decision sketch in code follows the list:
Is withdrawal ethical and safe? If yes and the behavior is reversible, a reversal design is strong. If no, consider multiple-baseline or changing-criterion.
Do you have multiple tiers you can stagger? If yes and withdrawal is undesirable, multiple-baseline is excellent. If no, consider reversal or multielement.
Do you want to compare two interventions side-by-side? If yes and you need a fast answer, multielement is efficient.
Is the goal gradual shaping toward a target? If yes, changing-criterion is a natural fit. If the behavior needs to emerge suddenly or you’re teaching something new, it’s not ideal.
Is baseline behavior volatile or hard to predict? If yes, multielement might be faster. If baseline is stable, you have more options.
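Here’s the decision sketch promised above: a deliberately simplified mapping from those questions to a candidate design. It’s a thinking aid, not a substitute for clinical judgment, and the question order reflects one reasonable set of priorities:

```python
def suggest_design(withdrawal_safe, behavior_reversible, multiple_tiers,
                   comparing_treatments, gradual_shaping):
    """Map the decision questions to a candidate design. Simplified:
    real choices weigh ethics, time, and data quality together,
    and more than one design may fit a given case."""
    if comparing_treatments:
        return "Multielement (alternating-treatments)"
    if gradual_shaping:
        return "Changing-criterion"
    if withdrawal_safe and behavior_reversible:
        return "Reversal (ABAB)"
    if multiple_tiers:
        return "Multiple-baseline"
    return "No clean fit; consult your supervisor"

print(suggest_design(withdrawal_safe=False, behavior_reversible=False,
                     multiple_tiers=True, comparing_treatments=False,
                     gradual_shaping=False))  # Multiple-baseline
```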
Real Clinical Scenarios
Scenario 1: Teaching a new skill where you can safely withdraw prompts. A child is learning to request breaks using AAC. You establish baseline, introduce full verbal prompts and modeling, see requests increase, then gradually fade prompts to see if the behavior holds. This could be a reversal design if you fully remove supports at some point, though you’d need to be confident the child won’t suffer from a temporary loss of the skill.
Scenario 2: Reducing self-injurious hand-biting where withdrawal would be harmful. A multiple-baseline design makes sense. You might track biting in three contexts (sensory room, classroom, home), introduce a sensory-heavy intervention in the sensory room first, then classroom, then home—each context improves only after the intervention arrives.
Scenario 3: Comparing two reinforcer types to see which motivates a learner more. You could alternate between access to Toy A and Toy B within the same session. Rapid alternation with clear cues shows which toy maintains higher engagement.
Scenario 4: Increasing homework completion over several weeks. Week 1: student earns a reward for 5 problems. Week 2: criterion shifts to 8 problems. Week 3: 12 problems. If the student reliably hits each target, behavior is tracking the criterion, and you’ve demonstrated control without ever removing your intervention.
Examples in ABA
Example 1: Reversal Design for Reducing Self-Injury
A child’s hand-biting behavior is high during baseline (average 8 bites per 10-minute session). You introduce a sensory-based intervention (fidget tools, textured objects, specific hand movements) in Phase B. Over two weeks, biting drops to an average of 2 per session. You then withdraw the intervention for one week (return to A), and biting climbs back to 7 per session. You reintroduce the sensory intervention in Phase B again, and biting drops to 1-2 per session within days.
Why this works: The behavior shifts with the intervention three separate times, creating a clear replication pattern. The timing is tight—change happens quickly after each phase shift—and the direction is consistent.
Safety note: Before running this design, the team documented that temporary withdrawal wouldn’t create safety risks, obtained parental consent explaining the withdrawal, and had a plan to reintroduce the intervention if biting spiked beyond a pre-set threshold.
Example 2: Multiple-Baseline Across Settings for Increasing On-Task Behavior
A teacher wants to increase on-task behavior across three classroom environments (Math, Reading, Specials). All three settings start in baseline. After Week 1, the intervention (a visual timer and specific praise for on-task behavior) is introduced in Math only. Math improves while Reading and Specials stay unchanged. In Week 3, the intervention is introduced to Reading; Reading improves while Specials stays unchanged. In Week 5, the intervention reaches Specials, and it improves too.
Why this works: The effect appears three times, each aligned with when the intervention is introduced to a new setting. No withdrawal happens. The staggered timing proves that something about introducing the program—not time passing—caused the improvement.
Stability note: The baselines needed to be stable enough that the team could clearly see the change when the intervention started.
Example 3: Multielement Design for Reinforcer Preference
A therapist wants to know which toy a learner prefers: a spinning top or stacking rings. Each day, the therapist alternates conditions. On Top Days (blue card visible), the learner works for access to the spinning top. On Ring Days (yellow card visible), the learner works for access to the stacking rings. Over two weeks, the learner consistently completes more trials on Top Days and maintains engagement longer.
Why this works: The learner quickly “votes” with their behavior. No long baseline period is needed. The clear cues help the learner distinguish which condition is active.
Carryover risk: If the learner gets overstimulated by the spinning top on a Top Day, they might be tired and less engaged the next Ring Day. The team would watch for patterns and adjust spacing if carryover appears.
Example 4: Changing-Criterion Design for Increasing Reading Fluency
A student reads 15 words per minute in baseline. The intervention—small-group fluency practice with targeted error correction—is introduced, and the first criterion is set at 18 words per minute. The student reaches that by week 2. The criterion shifts to 22 words per minute; the student reaches it by week 4. Then 26, then 30. Each time the target changes, the student’s performance tracks upward to meet it.
Why this works: Behavior consistently matches the new criterion. The intervention stays the same, but the goal shifts and behavior follows. Three or more criterion changes provide replication.
Realistic steps: The steps (3-4 words per minute) are challenging but achievable. If steps were too large, the student would likely fail and disengage.
Examples Outside of ABA
Multiple-Baseline Across Departments (Business Software)
A tech company rolls out a new project management system to reduce time-to-completion for team projects. Engineering adopts it in Month 1 (baseline: 6-week average; with the system: 4 weeks). Marketing stays in baseline. In Month 3, Marketing adopts the system and improves to 5 weeks. In Month 5, Operations adopts it and improves to 3.5 weeks. The staggered rollout supports the claim that the system, not time passing, drives the efficiency gains, without ever rolling it back.
Changing-Criterion Design in Fitness
A personal trainer helps a client build endurance. Week 1, the goal is 20 minutes on the treadmill. Week 3, the goal shifts to 25 minutes. Week 5, it’s 30 minutes. The intervention (same workouts, same coaching) stays constant, but the duration target steps upward. The client’s behavior tracks each new criterion, demonstrating a functional relation.
Common Mistakes and Misconceptions
Reversal Design Misuses
A clinician reduces aggression using a new behavior plan, sees improvement, then removes the plan to “prove” it works. The aggression returns—but this is ethically risky if aggression is dangerous. A safer choice would be a multiple-baseline across settings.
Another pitfall: assuming that because reversal is strong, it’s always possible. It’s not. Once a child learns to read, you can’t unread them. Once self-injury is reduced through skill-building, there’s no way to withdraw the new skill; the learning doesn’t meaningfully reverse.
Multiple-Baseline Pitfalls
Unstable baselines. If one student’s off-task behavior is stable at 30% and another’s ranges from 20% to 60%, it’s hard to see clear improvement when the intervention hits the erratic baseline. Collect baseline data long enough to confirm stability.
Poorly staggered starts. If all three students get the intervention within a week, you lose the staggered replication logic—any improvement might be due to external factors rather than the intervention’s timing.
Too few tiers. A multiple-baseline with only two participants or two behaviors is thin. Most researchers and supervisors prefer at least three tiers for solid replication.
Multielement Pitfalls
Unclear discriminative stimuli. If the learner can’t tell which condition is active, the conditions can’t produce differential responding. Conditions need to look, sound, or feel different.
Insufficient spacing. If you alternate conditions too quickly without any gap, carryover and fatigue can confound results.
High baseline variability. If baseline is all over the place, spotting differences between conditions becomes harder.
Changing-Criterion Pitfalls
Step sizes that are too large or too small. If steps are too big, the learner fails. If too small, it’s unclear the criterion change caused improvement.
Overly short subphases. If you change the criterion before behavior stabilizes, you won’t know if the new criterion pulled the improvement. Aim for at least two stable data points at each level before shifting.
Using it for entirely new behaviors. Changing-criterion shines when the behavior is already in the learner’s repertoire—you have a skill you can refine. It’s not a tool for skills outside the learner’s current ability.
Ethical Considerations
When Reversal Isn’t Appropriate
If withdrawal worsens behavior, causes harm, or violates your agency’s ethical policies, reversal is off the table. Self-injurious behavior, dangerous behavior, or skills meant to last shouldn’t be withdrawn just to prove control.
Informed consent is non-negotiable. Any time you plan to withdraw or cycle an intervention, caregivers and the learner (if able) must understand why, what it means, and what safeguards are in place.
Universal Safeguards Across All Designs
Document your rationale. Why did you choose this design over others? What alternatives did you consider?
Monitor for adverse effects. If behavior deteriorates, distress appears, or safety becomes a concern, pause the design and consult your supervisor.
Prioritize client dignity and safety above research elegance. A messier design that keeps someone safe is always better than a “perfect” design that risks harm.
Team communication. Supervisors, caregivers, and staff involved should understand the design before it starts.
Regulatory awareness. Some agencies, schools, or insurance policies have rules about what kinds of phase changes are allowed. Know your environment’s guidelines.
Practice Questions
Question 1
Scenario: A therapist evaluates a token economy by running baseline, implementing tokens, withdrawing tokens to see if behavior returns, and then reintroducing tokens.
Which design is this? Reversal design.
Why: It uses withdrawal and reintroduction (A-B-A-B structure) to show replication. When behavior changes with the presence and absence of tokens, control is demonstrated.
Why the others don’t fit: Multiple-baseline doesn’t withdraw treatment. Multielement rapidly alternates conditions. Changing-criterion adjusts targets rather than withdrawing treatment.
Question 2
Scenario: A teacher wants to increase on-task behavior across three classrooms. She introduces the same intervention at different times in each classroom and doesn’t remove it.
Which design is this? Multiple-baseline design.
Why: Staggered introduction across tiers without withdrawal. Control is shown by the timing of improvement matching the timing of intervention introduction.
Why the others don’t fit: Reversal requires withdrawal. Multielement alternates rapidly within the same setting. Changing-criterion shifts performance targets.
Question 3
Scenario: Two teaching methods are alternated daily to see which produces higher accuracy in the same learner during the same session.
Which design is this? Multielement (alternating-treatments) design.
Why: Two conditions are rapidly alternated with discriminable cues. The comparison happens within the same individual without withdrawal.
Why the others don’t fit: Reversal requires withdrawal. Multiple-baseline staggers across tiers. Changing-criterion adjusts goals, not conditions.
Question 4
Scenario: A student’s weekly reading goal increases by small increments (22 → 25 → 28 words per minute) while the same intervention continues. Behavior tracks each new goal.
Which design is this? Changing-criterion design.
Why: The intervention is constant; the performance criterion changes stepwise, and behavior follows—demonstrating control without withdrawal.
Why the others don’t fit: Reversal withdraws treatment. Multiple-baseline staggers starts across tiers. Multielement alternates conditions.
Question 5
Scenario: A behavior is dangerous, and the learning involved is irreversible (acquired skills aren’t expected to be unlearned). Which design is least appropriate?
Which design is least appropriate? Reversal design.
Why: Reversal requires withdrawing effective treatment. For dangerous or irreversible behaviors, withdrawal is unsafe and unethical.
Why the others fit better: Multiple-baseline avoids withdrawal entirely. Multielement compares treatments without withdrawal. Changing-criterion works without withdrawal.
Related Concepts
Single-case experimental design is the umbrella family that includes all four designs. Each one is a tool within that toolbox.
Baseline logic underpins all single-case designs. By establishing what behavior looks like without intervention, you create a comparison standard for measuring change.
Replication (within-study) is how each design proves control. Reversal replicates by withdrawing and reinstating. Multiple-baseline replicates by staggering starts. Multielement replicates by showing consistent differential responding. Changing-criterion replicates by behavior tracking multiple criterion shifts.
Carryover effects are threats to validity, especially in multielement designs. When one condition’s effects linger, your comparison gets muddied.
Visual analysis is the skill of reading graphs to evaluate level, trend, variability, immediacy, and overlap. These elements help you judge whether experimental control is demonstrated (a small sketch of one overlap metric appears after this list).
Social validity asks whether your procedures and results matter and are acceptable to the people involved. Withdrawing an effective treatment might have internal validity but poor social validity.
Multiple-probe design is a related concept: instead of running full baselines on all tiers, you probe periodically. This is a practical variant when full baselines are impractical.
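Here’s the overlap sketch mentioned under visual analysis. One widely used overlap metric in single-case research is the percentage of non-overlapping data (PND): for a behavior you’re increasing, the share of intervention points that exceed the highest baseline point. The numbers below are hypothetical:

```python
def pnd(baseline, intervention):
    """Percentage of non-overlapping data for a behavior being increased:
    the share of intervention points above the highest baseline point.
    For a behavior being decreased, compare against the baseline minimum
    instead."""
    ceiling = max(baseline)
    above = sum(1 for x in intervention if x > ceiling)
    return 100 * above / len(intervention)

# Hypothetical on-task percentages.
baseline = [30, 35, 32, 28]
intervention = [50, 62, 58, 70, 66]
print(f"PND = {pnd(baseline, intervention):.0f}%")  # PND = 100%
```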
Frequently Asked Questions
When should I avoid a reversal design?
Avoid reversal if withdrawing treatment would be harmful, unethical, or impractical. Self-injurious behavior, aggression, safety-critical skills, and irreversible learning are poor candidates. Consult your supervisor and caregivers, review your agency’s policies, and consider multiple-baseline or changing-criterion as safer alternatives.
How do I decide between multielement and multiple-baseline?
Use multielement if you want a fast answer about which of two conditions works better, if carryover is minimal, and if discriminability is high. Use multiple-baseline if withdrawal is undesirable, behavior change is expected to be irreversible, or you have time for staggered starts.
Can I combine designs in one study?
Yes. Researchers sometimes embed a multielement comparison within a multiple-baseline structure. The logic becomes more complex, so document it clearly and ensure replication remains strong.
How long should baselines be in a multiple-baseline design?
Baselines should be stable or show a predictable trend before introducing the intervention. Stability is often judged by the last 3–5 data points clustering within a reasonably narrow range. The exact length depends on the data.
What are carryover effects and why do they matter?
Carryover occurs when the effects of one condition persist into the next, contaminating your comparison. It undermines internal validity. Reduce it by using strong discriminative cues, adequate time between conditions, and counterbalancing condition order.
Is changing-criterion appropriate for sudden behavior changes?
No. Changing-criterion assumes behavior will gradually shift with each criterion change. It’s built for shaping and incremental progress. If you expect a sudden flip, reversal or multielement would fit better. If the behavior is entirely new, changing-criterion won’t work—there’s no existing skill to refine.
Key Takeaways
The four single-case designs each solve a different problem. Reversal demonstrates control through withdrawal and replication, offering the strongest logic when safe and ethical. Multiple-baseline avoids withdrawal by staggering intervention starts, making it safer for irreversible or dangerous behaviors. Multielement compares treatments quickly within the same individual, ideal for preference or efficiency questions. Changing-criterion shapes behavior through stepwise targets without withdrawal, perfect for gradual progress toward a goal.
Your choice isn’t just about research elegance. It affects how long your learner waits for help, what risks you take, and whether you can defend your data with confidence. Always weigh experimental rigor against client safety, ethical responsibility, and practical feasibility. When you’re unsure, consult your supervisor, involve caregivers in the decision, and document your reasoning. That combination of scientific rigor and human-centered care is what makes ABA trustworthy and effective.