Discussion about this post

wess trabelsi:

I remember how I felt when I tried o3 after learning that it had "achieved an ELO score of 2727 on Codeforces, a competitive programming platform, surpassing OpenAI's Chief Scientist's score of 2665", only to watch it fail miserably on a pretty dumb Google Apps Script project I had in mind...

I recently watched a recorded webinar my supervisors sent me. It was meant for administrators, and the instructor was making very bold claims, such as "it's great for scheduling," and I totally called BS on that. Thanks for confirming; I didn't even try it.

Martijn Sengers:

Hi Nick, great article and experiment. I wonder what would have happened if you had supplied the model with several examples (via RAG, for instance). I fully agree with your observations, but I have managed to get the model to produce better solutions by teaching it that simple questions get simple answers, which looks good or impressive to outsiders. If it looks like shit, it is shit (Boogie Nights). Give it context and you will be surprised 😉
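For readers trying to picture what "supply examples and context" means in practice, here is a minimal sketch of few-shot prompting, assuming the OpenAI Python SDK; the model name, the hard-coded example pair, and the spreadsheet questions are placeholders for illustration, not anything from the article or the comment.

```python
# Minimal sketch: few-shot prompting with "retrieved" examples placed in context.
# In a real RAG setup the examples would come from a store of known-good pairs;
# here they are hard-coded placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

few_shot_examples = [
    {"role": "user", "content": "Sum column B in a Google Sheet."},
    {"role": "assistant", "content": "=SUM(B:B)"},
]

messages = (
    [{"role": "system", "content": "Answer simply. Simple questions get simple answers."}]
    + few_shot_examples
    + [{"role": "user", "content": "Average the non-empty cells in column C."}]
)

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
```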
