Discussion about this post

Terry Underwood

I’m also not sanguine about much of the empirical research I read. I agree with you, Nick, that implementation studies are needed. Importantly, because AI is a multidimensional, complex variable that is highly sensitive to learners’ skill levels and expertise, any quantitative measure must account for variance among student users. When one relies on numbers for evidence, accounting for variance is at the top of the list. Then there’s the problem of a theoretical framework for human pedagogy. We don’t get any such situated theory, right? I can see why Wess holds up red flags. Thanks for providing this glimpse into the early stages of quantitative research in this arena. Actionable results will likely build explanatory theories to test, and then use mixed methods.

Wess Trabelsi

Thanks so much for this! I always enjoy reading your posts—they’re insightful and clearly well-researched. This time, though, I have some constructive criticism. I spent a good few hours digging into the studies you shared, so I hope this feedback is helpful and not taken the wrong way. Our goals seem very aligned; I’m just coming at this with a focus on secondary education, which might differ a bit from your scope. I’m detailing everything here in case any secondary teachers reading along are curious.

Here are my thoughts on the studies you referenced:

Harvard Study (#1): This one stands out because it actually tested students without AI after they used it, which is critical for assessing lasting impact (even short-term). That said, as you pointed out, the sample is Harvard undergrads—not exactly representative of a broader student population.

Studies #2 and #3: Both focus on adult learners, and neither tested participants without AI post-intervention. This leaves open questions about how much learning is truly retained when the AI is taken away.

Middle School Tutoring Study (#4): While the sample here is closer to what I’m interested in, the study focuses on adaptive AI software (IXL, i-Ready, MATHia), which is different from generative AI tools like ChatGPT that most of us are curious about. Also, there’s a potential conflict of interest given the affiliations with these tools’ developers. I couldn’t find anything conclusive, but the lack of state test results is frustrating. Any classroom teacher would ask, “What happened on the tests after students used this?” The absence of that data feels like a red flag.

Indonesian Graduate Study (#5): This one involves just seven grad students studying English translation. While it’s interesting, it’s hard to see how this applies to secondary education in contexts like the U.S.

I’d hoped to find something here that countered the Wharton study (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486), which showed that students using AI tutors did much better during practice but then underperformed on standardized assessments once the AI was removed. That’s a huge concern—AI might help with short-term performance but hinder actual learning. Dan Meyer was quick to share that one...

For someone like me, focused on generative AI’s impact on secondary education, these studies don’t really address the key questions I have. That said, I really appreciate the work you put into curating and analyzing them. This conversation is so important, and I look forward to seeing more from you!

Thanks again for your hard work on this!

