Comparing Human and Animated Video Modeling for Teaching Conversation Skills
Video modeling is a widely used teaching strategy in ABA, but clinicians often wonder which format works best. This study directly compares human video models to animated versions for teaching conversation skills—including facial expressions and body language—to children with autism. The findings offer practical guidance for personalizing video-based instruction in your clinical work.
What Is the Research Question Being Asked and Why Does It Matter?
This study asked a straightforward clinical question: when you use video modeling to teach conversation skills, should the model be a real person or an animated version of that same person?
The target skills went beyond saying the right words. Learners also needed to copy the speaker’s facial expression and body language. In real conversations, the words and the nonverbal behavior often need to happen together for someone to appear engaged and be understood.
This matters because many clinicians already use human video modeling, but animated videos may be easier to edit and reuse, and some learners may find them more engaging. If animated videos work about as well as human videos for certain learners, that gives teams more flexibility. It also matters because some learners may respond better to one format than the other, so picking the right one could save time and reduce frustration.
What Did the Researchers Do to Answer That Question?
Eight children with autism, ages 5–8, participated. Sessions took place in their self-contained school classrooms. None of the children were receiving other direct teaching on intraverbals or motor imitation during the study.
The researchers created two short videos for each learner: one human and one animated. The animated video was made by tracing over the human video (rotoscope style), so the model, setting, and actions were as similar as possible across formats.
Each video showed three short social interactions. In each interaction, the model did three things: said a specific answer to a spoken prompt (the intraverbal), made a specific facial expression, and did a specific body movement.
After watching a video (40–60 seconds, repeated three times per session), a teacher presented the same prompts and scored whether the learner matched the exact vocal response, facial expression, and body movement within 5 seconds. Scores could range from 0 to 9 per session (3 interactions × 3 responses).
Half the learners started with animated videos and then switched to human videos; the other half did the opposite. Mastery was set at 6 out of 9 correct for three consecutive sessions. If a learner did not reach mastery after 10 sessions, that condition was stopped.
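For teams that track session data electronically, the scoring and decision rules above can be written down in a few lines. This is a minimal sketch, assuming a simple per-trial record of the three scored responses; the function names and data structure are illustrative, not taken from the study:

```python
# Illustrative sketch of the session scoring and mastery rules described above.
# Names and structure are hypothetical; adapt to your own data system.

MAX_SCORE = 9        # 3 interactions x 3 responses per session
MASTERY_SCORE = 6    # minimum correct responses per session
MASTERY_STREAK = 3   # consecutive sessions at or above MASTERY_SCORE
SESSION_LIMIT = 10   # condition stopped if mastery is not met by this point

def session_score(interactions):
    """Each interaction is a dict of three scored responses (True/False)."""
    return sum(
        int(trial["vocal"]) + int(trial["face"]) + int(trial["body"])
        for trial in interactions
    )

def condition_status(session_scores):
    """Return 'mastered', 'discontinued', or 'in progress' for one video format."""
    streak = 0
    for score in session_scores:
        streak = streak + 1 if score >= MASTERY_SCORE else 0
        if streak >= MASTERY_STREAK:
            return "mastered"
    if len(session_scores) >= SESSION_LIMIT:
        return "discontinued"
    return "in progress"

# Example: session scores of 4, 7, 7, 8 meet the criterion on the fourth session.
print(condition_status([4, 7, 7, 8]))  # "mastered"
```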
The researchers also ran generalization probes with a different person after mastery and a maintenance probe about three weeks later, using the same test format without extra prompts or rewards.
How You Can Use This in Your Day-to-Day Clinical Practice
If you’re deciding between human and animated video models, this study supports a practical takeaway: neither format wins for all learners. Expect individual differences.
Some learners learned more from the human video, some learned more from the animated video, and several showed little difference. One learner only met mastery with the human video; another only met mastery with the animated video. Rather than debating which format is best overall, treat video format like any other clinical variable you test and personalize.
Start With a Quick Comparison
Before investing time in a big video library, try a brief comparison. Make one short human video and one short animated video for the same targets, then run a trial period with each.
Keep scoring simple: does the learner produce the skill after watching, and how quickly does it improve across sessions? If one format clearly leads to faster, steadier growth, use it more often. If both look similar, choose whichever is easier for your team to make, update, and use with dignity—for example, a format the learner seems comfortable watching.
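If you log session scores for both formats, a rough comparison can be as simple as looking at the trend in each series and asking whether one is clearly steeper. Here is a minimal sketch; the helper names and the "clearly better" margin are illustrative assumptions, not part of the study, and no threshold replaces clinical judgment:

```python
# Rough comparison of learning trends across two video formats (0-9 session scores).
# The margin below is an arbitrary illustration; interpret trends clinically.

def average_gain(scores):
    """Mean session-to-session change in correct responses."""
    if len(scores) < 2:
        return 0.0
    return (scores[-1] - scores[0]) / (len(scores) - 1)

def compare_formats(human_scores, animated_scores, margin=1.0):
    human_trend = average_gain(human_scores)
    animated_trend = average_gain(animated_scores)
    if human_trend - animated_trend > margin:
        return "human video trending faster"
    if animated_trend - human_trend > margin:
        return "animated video trending faster"
    return "similar trends; pick the format that is easier to make and more comfortable for the learner"

# Example: steady growth with the human video, flat responding with the animated one.
print(compare_formats([1, 3, 5, 7], [2, 2, 3, 3]))  # "human video trending faster"
```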
Be Careful About What Counts as “Correct”
In this study, answers were only scored correct if they matched the exact words used in the video. That’s good for tight measurement, but it can be limiting clinically: other answers may still be appropriate in real life.
If your real goal is flexible conversation, consider scoring plans that allow a range of acceptable responses. Or plan a second step where you teach variation after the first response is learned. Also think about whether copying the exact facial expression is always a meaningful goal. Sometimes a learner can show interest in other respectful ways.
Consider Adding Reinforcement
The teaching setup in this study did not include programmed reinforcement for correct responding during tests. That means acquisition here may not look like your normal clinical sessions, where you’d usually reinforce correct responses and keep motivation high.
You don’t need to copy their no-consequences approach. If your learner isn’t improving with video modeling alone, it’s reasonable to add reinforcement for correct responding—as long as you do it in a way that supports choice and doesn’t turn the interaction into forced performance. Reinforce participation, attempts, and functional communication (including asking for help or a break), not just perfect matching.
Check Prerequisites
Use this study as a reminder to check the prerequisites that make video modeling work. The learners had some basic imitation skills, but that didn’t guarantee they would learn facial and body imitation from watching a video.
Video modeling often requires the learner to watch, remember, and then respond after a delay. If video modeling isn’t working, assess whether the learner can:
- Attend to a video for the needed time
- Imitate actions after a short delay
- Tolerate the teaching format
If any of those are weak, teach those component skills first or change the format—shorter clips, fewer targets, more breaks, or more active responding.
Keep Targets Socially Meaningful
In this study, targets were chosen partly so they were unlikely to be taught elsewhere, which helps research control but can reduce real-life value.
Clinically, pick intraverbals, facial expressions, and body language that actually help the learner in their daily routines. If the learner never needs the exact phrase, or if the facial expression looks unnatural for them, the teaching may not generalize well and may not support dignity.
Plan for Generalization
Generalization testing in the study was limited, and results were inconsistent across learners. The researchers checked generalization across people after mastery, but the main teaching and testing happened in one classroom area with scripted prompts.
Once a learner starts responding after videos, quickly practice the same skills with different people, different tones of voice, and in real activities. Start small: one new adult, one new setting, and one natural moment per day where the learner can use the phrase and a comfortable body response.
If the skill shows up only right after the video, treat that as a cue to fade the video and move toward real interactions.
Don’t Assume Lack of Progress Means the Learner Can’t Learn From Video
Three of the eight learners did not meet mastery in either condition, and the study stopped conditions after 10 sessions without mastery. For your cases, lack of progress is a signal to adjust—not a final verdict.
Try reducing the response load (teach only the vocal response first, then add body language later), shortening the video, increasing the match between the learner’s current repertoire and the modeled response, or adding prompts and reinforcement.
Also watch for competing behaviors that make certain facial expressions hard, like a learner who smiles throughout. You may need to pick expressions that fit their natural behavior rather than trying to replace it.
Use Video Modeling as a Tool, Not a Full Social Skills Program
This study tested whether learners could copy specific scripted responses after watching a model. It did not show that learners became better conversation partners in free play, made more friends, or felt better in social situations.
When you use video models, tie them to a bigger plan: giving the learner ways to start, keep, and end interactions that meet their own goals—with choices about how they show interest and how much social interaction they want.
Works Cited
Bloh, C., Bacon, L., Begel, B., Madara, K., & Koller, B. (2025). Comparing human video modeling to animated video modeling for learners with autism. The Analysis of Verbal Behavior, 41, 262–279. https://doi.org/10.1007/s40616-025-00224-y



