I’ve been really curious about what’s actually happening behind the scenes when you ask OpenAI’s o1 model a question. From what OpenAI does show us, it seems clear that the model breaks the question down, tackles the problem in steps, reviews its own work, and so on. But given how long the responses take to generate, and how secretive the process is, I assumed there must be something pretty exotic going on under the hood to make it all work.
Recently I’ve been researching how to fine-tune Large Language Models (LLMs) like GPT on a single GPU in Colab (a challenging feat!), comparing both the free Tesla T4 and the paid GPU options.