1. Understanding language modeling from scratch
  2. https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV + https://cs336.stanford.edu/
  3. https://pub.sakana.ai/diffusionblocks/
  4. https://arxiv.org/pdf/2606.02437 PEFT, multi-LoRA, similar to models as models
  5. https://loniss.com/cambrian-thesis
  • useful verifiers for understanding inference serving
    • explain intuitively how Thinking Machines’ interaction model inference differs from the typical large-scale inference performed by frontier labs
    • why can Cerebras chips serve gpt-oss-120b 3x faster than any other chip?
    • how does the design of Rubin differ from the design of Blackwell? predict what the design of Feynman will be
    • how does the current inference paradigm support or inhibit continual learning via weight updates? put another way, if you imagined a frontier model updating its weights based on daily data, would the hardware architecture of inference change? if so, how? if not, why?