Discussion about this post

Terry Underwood

I’m also not sanguine about much of the empirical research I read. I agree with you, Nick, that implementation studies are needed. Importantly, because AI is a multidimensional, complex variable that is highly sensitive to learners’ skill levels and expertise, any quantitative measure must account for variance among student users. When one relies on numbers for evidence, accounting for variance is at the top of the list. Then there’s the problem of a theoretical framework for human pedagogy. We don’t get any such situated theory, right? I can see why Wess holds up red flags. Thanks for providing this glimpse into the early stages of quantitative research in this arena. Actionable results will likely build explanatory theories to test, and then use mixed methods.

Wess Trabelsi

Thanks so much for this! I always enjoy reading your posts—they’re insightful and clearly well-researched. This time, though, I have some constructive criticism. I spent a good few hours digging into the studies you shared, so I hope this feedback is helpful and not taken the wrong way. Our goals seem very aligned; I’m just coming at this with a focus on secondary education, which might differ a bit from your scope. I’m detailing everything here in case any secondary teachers reading along are curious.

Here are my thoughts on the studies you referenced:

Harvard Study (#1): This one stands out because it actually tested students without AI after they used it, which is critical for assessing lasting impact (even short-term). That said, as you pointed out, the sample is Harvard undergrads—not exactly representative of a broader student population.

Studies #2 and #3: Both focus on adult learners, and neither tested participants without AI post-intervention. This leaves open questions about how much learning is truly retained when the AI is taken away.

Middle School Tutoring Study (#4): While the sample here is closer to what I’m interested in, the study focuses on adaptive AI software (IXL, i-Ready, MATHia), which is different from generative AI tools like ChatGPT that most of us are curious about. Also, there’s a potential conflict of interest given the affiliations with these tools’ developers. I couldn’t find anything conclusive, but the lack of state test results is frustrating. Any classroom teacher would ask, “What happened on the tests after students used this?” The absence of that data feels like a red flag.

Indonesian Graduate Study (#5): This one involves just seven grad students studying English translation. While it’s interesting, it’s hard to see how this applies to secondary education in contexts like the U.S.

I’d hoped to find something here that countered the Wharton study (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486), which showed that students using AI tutors did much better during practice but then underperformed on standardized assessments once the AI was removed. That’s a huge concern—AI might help with short-term performance but hinder actual learning. Dan Meyer was quick to share that one...

For someone like me, focused on generative AI’s impact on secondary education, these studies don’t really address the key questions I have. That said, I really appreciate the work you put into curating and analyzing them. This conversation is so important, and I look forward to seeing more from you!

Thanks again for your hard work on this!

