6 Comments

I’m also not sanguine about much of the empirical research I read. I agree with you, Nick, that implementation studies are needed. Importantly, because AI is a multidimensional, complex variable that is highly sensitive to learners’ skill levels and expertise, any quantitative measure must account for variance among student users. When one relies on numbers for evidence, accounting for variance is at the top of the list. Then there’s the problem of a theoretical framework for human pedagogy. We don’t get any such situated theory, right? I can see why Wess holds up red flags. Thanks for providing this glimpse into the early stages of quantitative research in this arena. Actionable results will likely build explanatory theories to test, and then mixed methods can follow.


Thanks so much for this! I always enjoy reading your posts—they’re insightful and clearly well-researched. This time, though, I have some constructive criticism. I spent a good few hours digging into the studies you shared, so I hope this feedback is helpful and not taken the wrong way. Our goals seem very aligned; I’m just coming at this with a focus on secondary education, which might differ a bit from your scope. I'm detailing everything here in case any secondary teachers are curious...

Here are my thoughts on the studies you referenced:

Harvard Study (#1): This one stands out because it actually tested students without AI after they had used it, which is critical for assessing lasting (even short-term) impact. That said, as you pointed out, the sample is Harvard undergrads—not exactly representative of the broader student population.

Studies #2 and #3: Both focus on adult learners, and neither tested participants without AI post-intervention. This leaves open questions about how much learning is truly retained when the AI is taken away.

Middle School Tutoring Study (#4): While the sample here is closer to what I’m interested in, the study focuses on adaptive AI software (IXL, i-Ready, MATHia), which is different from generative AI tools like ChatGPT that most of us are curious about. Also, there’s a potential conflict of interest given the affiliations with these tools’ developers. I couldn’t find anything conclusive, but the lack of state test results is frustrating. Any classroom teacher would ask, “What happened on the tests after students used this?” The absence of that data feels like a red flag.

Indonesian Graduate Study (#5): This one involves just seven grad students studying English translation. While it’s interesting, it’s hard to see how this applies to secondary education in contexts like the U.S.

I’d hoped to find something here that countered the Wharton study (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486), which showed students using AI tutors did much better during practice but then underperformed on standardized assessments once the AI was removed. That’s a huge concern—AI might help with short-term performance but hinder actual learning. Dan Meyer was quick to share that one...

For someone like me, focused on generative AI’s impact on secondary education, these studies don’t really address the key questions I have. That said, I really appreciate the work you put into curating and analyzing them. This conversation is so important, and I look forward to seeing more from you!

Thanks again for your hard work on this!


This is great, Wess!!! To be honest, you have done a deeper dive than I have. You should make this the subject of your own response article now that you are this deep in.

Know that I operate from a secondary frame so I too am concerned about the lack of research foundation we are operating from right now.

As you probably know, psych and ed profs have ready access to undergrads, so most studies focus on this demographic. Staging a study with actual high school students is a much more complicated affair, and there are only a handful of such studies out there right now, none of them very conclusive.

We are still in the anecdotal and analogical phase.

Know too that I had a longer section with recommendations about methodologies moving forward that I decided to drop from the text because I didn't want to try the patience of my average reader.

You are giving voice to some of my recommendations--particularly the need for more attention to post-AI-integration data gathering.

It is going to be a long road ahead.

My gut says there is a diamond in the rough here---if the learning experience is constructed in a really intentional way.

Figuring out what that looks like will be the effort of the next phase---and unfortunately, it will most likely be forged through cautious, thoughtful teacher experimentation---as it always has been in secondary education during periods of deep technological transformation.

Be well.

Let's work on something together in the near future.

Nick


Thanks, Nick. Like I said, our goals seem very aligned, which is why I'm following your thinking.

For my part, I'm not as prolific a writer as you are... I've been trying to birth my next article for a while. Maybe tonight? You said "psych and ed profs have ready access to undergrads so most studies focus on this demographic," which suggests you're going to either get, or enjoy, my intro ;)


Excellent, detailed post. You are right to mention implementation. Do take a look at Rouzana's great work in this field: https://mosinian.com/


Thanks for the tip. Just signed up for their newsletter.
