18 Comments
wess trabelsi:

I remember how I felt when I tried o3 after learning that it had "achieved an ELO score of 2727 on Codeforces, a competitive programming platform, surpassing OpenAI's Chief Scientist's score of 2665," only to watch it fail miserably on a pretty dumb Google Apps Script project I had in mind...

I recently watched a recorded webinar my supervisors sent me. It was meant for administrators, and the instructor was making very bold claims, such as "it's great for scheduling." I totally called BS on that. Thanks for confirming; I didn't even bother trying.

wess trabelsi:

ACTUALLY - I got access to Manus.im and paid after I quickly ran out of free tokens, and I tried a scheduling task. I asked it to schedule 20 students for one-time presentations to 10 teachers within two weeks, with each presentation occurring during a teacher prep period and no teacher hosting more than 3 presentations total. I gave it the class list as a CSV and the teacher prep schedule as a CSV: each teacher has two prep periods a day, but all the teachers have different prep times.

Not only did Manus nail it (after a first glitch), it also built me a website on the spot where I can view the results per day, per teacher, etc. Check it out: https://aemnybne.manus.space/
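
For reference, the computation itself is small once it's written as code rather than reasoned out in prose. Here's a minimal greedy sketch of the same assignment in Python; the file names and column headers (students.csv with a "name" column, prep_schedule.csv with "teacher", "day", and "period" columns) are assumptions for illustration, not the actual files I uploaded.

```python
import csv
from collections import defaultdict

MAX_PER_TEACHER = 3  # no teacher hosts more than 3 presentations

# Assumed layout: students.csv has a "name" column; prep_schedule.csv
# has "teacher", "day", "period" columns covering the two-week window.
with open("students.csv", newline="") as f:
    students = [row["name"] for row in csv.DictReader(f)]

with open("prep_schedule.csv", newline="") as f:
    prep_slots = [(row["teacher"], row["day"], row["period"])
                  for row in csv.DictReader(f)]

load = defaultdict(int)   # presentations assigned to each teacher so far
used = set()              # (teacher, day, period) slots already taken
schedule = []

for student in students:
    for teacher, day, period in prep_slots:
        slot = (teacher, day, period)
        if load[teacher] < MAX_PER_TEACHER and slot not in used:
            load[teacher] += 1
            used.add(slot)
            schedule.append((student, *slot))
            break
    else:
        print(f"No slot found for {student}")  # greedy search can dead-end

for student, teacher, day, period in schedule:
    print(f"{student}: {teacher}, {day}, period {period}")
```

A problem this size (20 students against a couple hundred prep slots) is easy for a greedy pass; the interesting part is that the agent chooses to generate and run something like this instead of guessing at a schedule token by token.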

Nick Potkalitsky:

Good to know. I just got access last week but haven't played around with it. The more I experiment, the more I think it comes down to materials formatting and prompting. The capability seems to be latent in this generation of models, but different models require slightly different activations on the user's end.

wess trabelsi:

It's really about writing and executing code so that it can compute your task instead of relying on LLM inference alone. These new tools essentially design for themselves the tools they need to meet your demands.

Martijn Sengers:

Hi Nick, great article and experiment. I wonder what would have happened if you had primed the model with several examples (RAG, for instance). I fully agree with your observations, but I have managed to get the model to produce better solutions by teaching it that simple questions give simple answers, answers that look good or impressive to outsiders. If it looks like shit, it is shit (Boogie Nights). Give it context and you will be surprised 😉

Nick Potkalitsky:

Thanks. I just reworked my prompting pathway and took things a little slower. This time I asked GPT-4 to teach me how to prompt it for success. Results were much, much better!!!!

Tales Fernandes Costa:

Hi Nick, very good article. The complexity of scheduling problems rises very quickly with the number of variables and restrictions applied. In these cases, the effort to solve them is much higher than for straightforward organizational tasks. Another issue is that LLMs don't behave like conventional functions; sometimes they will give different answers for the same input prompts and parameters. Here is a link that may help future efforts: https://timefold.ai/
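
To make that concrete, here is a rough sketch of how a dedicated solver handles this kind of problem. It uses Google OR-Tools CP-SAT rather than Timefold (I won't vouch for Timefold's API here), and the students, teachers, and prep slots are toy placeholders rather than real data.

```python
from ortools.sat.python import cp_model

# Toy stand-ins for the real rosters and prep schedules.
students = [f"student_{i}" for i in range(20)]
teachers = [f"teacher_{t}" for t in range(10)]
# Assume each teacher has 2 prep periods per day over 10 school days.
prep_slots = {t: [(d, p) for d in range(10) for p in (0, 1)] for t in teachers}

model = cp_model.CpModel()

# x[s, t, d, p] == 1 iff student s presents to teacher t on day d, period p.
x = {(s, t, d, p): model.NewBoolVar(f"{s}_{t}_{d}_{p}")
     for s in students for t in teachers for d, p in prep_slots[t]}

# Each student presents exactly once.
for s in students:
    model.Add(sum(x[s, t, d, p] for t in teachers for d, p in prep_slots[t]) == 1)

for t in teachers:
    # No teacher hosts more than 3 presentations in total...
    model.Add(sum(x[s, t, d, p] for s in students for d, p in prep_slots[t]) <= 3)
    # ...and at most one presentation per prep slot.
    for d, p in prep_slots[t]:
        model.Add(sum(x[s, t, d, p] for s in students) <= 1)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for (s, t, d, p), var in x.items():
        if solver.Value(var):
            print(f"{s} -> {t}, day {d}, period {p}")
```

Unlike an LLM pass, the same inputs produce the same schedule every run, and if the constraints can't be satisfied the solver says so instead of quietly returning something broken.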

Nick Potkalitsky:

Very cool, Tales. I will check this out. I was actually hoping to tap into a little of that unconventionality, in light of the complexity of the task at hand. I tried again with a much more clearly delineated prompt cycle and had better results.

Mark Laurence:

I wonder if you'd have gotten better results using Deep Research, Nick? I've found it does great things with uploaded data that I would otherwise have used the o-series models for. I can't guarantee you'd get a better result, but it's definitely worth trying.

Nick Potkalitsky:

Good advice. I was hoping someone would help me troubleshoot.

Nick Potkalitsky:

I actually just downgraded to GPT-4. Had better results. Lol!!!

Michael Woudenberg:

For how good it is at many things, it's terrible at many others. Which is fine! Because we haven't even figured out what to do with what we have!

Terry Underwood:

From Google:

Fallback Mechanisms:

Purpose: If an LLM call fails or encounters an issue, have alternative paths or actions in place.

Examples:
* Retry with a different LLM provider: if one LLM fails, try another.
* Use a simpler prompt: if the original prompt is too complex, simplify it.
* Route the request to a human: if the LLM cannot handle the task, delegate it to a human operator.
* Provide a default response: if the LLM fails to generate a response, provide a predefined default response.

From Gemini (Implementation):

Optimization:
* If the LLM's solution is not optimal, consider using optimization algorithms (e.g., genetic algorithms, constraint programming) to further refine the schedule.
* These algorithms can be integrated with the LLM's output to improve efficiency and fairness.

Implementation:
* Integrate the generated schedule into a scheduling system or database.
* Communicate the schedule to students and teachers.
* Create a system for feedback and change requests.

Important Considerations:
* LLM Capabilities: LLMs are powerful, but they may not always produce perfectly optimal solutions. Complex scheduling problems may require additional optimization techniques.
* Prompt Engineering: The quality of the LLM's output depends heavily on the clarity and precision of the prompt.
* Data Accuracy: Ensure the accuracy and completeness of the input data.
* Ethical Considerations: Address any potential biases or fairness issues in the scheduling process.
* Scalability: Consider the scalability of the solution for large datasets.
* Error Handling: Implement robust error handling to address potential issues during LLM execution.
* Testing: Thoroughly test the system with various scenarios to ensure its reliability.

This protocol provides a comprehensive framework for using an LLM to generate student-teacher schedules. Adapt it to your specific needs and constraints for optimal results.
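
The fallback idea above is straightforward to wire up. Here's a bare-bones sketch in Python; call_llm and the provider names are placeholders standing in for a real client library, not actual APIs.

```python
import time

DEFAULT_RESPONSE = "Could not generate a schedule automatically; routing to a human."

def call_llm(provider, prompt):
    """Placeholder for a real client call; raise or return None on failure."""
    raise NotImplementedError

def generate_schedule(prompt, simpler_prompt):
    # 1. Retry, then switch to a different LLM provider if one fails.
    for provider in ("provider_a", "provider_b"):
        for attempt in range(2):
            try:
                result = call_llm(provider, prompt)
                if result:
                    return result
            except Exception:
                time.sleep(2 ** attempt)  # brief backoff before the next try

    # 2. Fall back to a simpler prompt.
    try:
        result = call_llm("provider_a", simpler_prompt)
        if result:
            return result
    except Exception:
        pass

    # 3. Last resort: predefined default response / hand-off to a human operator.
    return DEFAULT_RESPONSE
```

None of this makes the model smarter, but it keeps a single bad completion from taking the whole workflow down.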

Adam:

Good article, Nick. You've highlighted one of the main challenges for state-of-the-art AI in 2025: can these models become robust reasoners? I wrote about this last year, using a little reasoning test of my own, which I continue to run on the latest reasoning models as they are released. My test is in some ways quite similar to yours: it involves scheduling substitutions and playing time for a 5-a-side football team.

See: https://ai-navigator.medium.com/can-ai-reason-4adf80ebc9b1

And: https://www.youtube.com/watch?v=h_DCekOtFqk

I do expect a lot of progress in AI reasoning in the next couple of years. It will be essential for AI to take the next leap in usefulness.

Nick Potkalitsky:

Awesome!!! I will check this out!!! When I think back to the models we were using two years ago, you have to admit that things are moving quickly.

Daniel Nest:

Solid test, disappointing results for o3-mini.

Saty Chary:

Hi Nick, nice article.

In a sense, it's not surprising: for the past 70 years, i.e., since AI's inception, what has worked is deep/narrow AI (and even that with significant help from humans in the form of rules, data, and goals); what has remained elusive is generalizing across disparate domains.

Nick Potkalitsky:

The double-checking inherent in the drawn-out reasoning process seems not to gel with this type of task. GPT-4 is more of a workhorse.
