- Understanding language modeling from scratch
- https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV + https://cs336.stanford.edu/
- https://pub.sakana.ai/diffusionblocks/
- https://arxiv.org/pdf/2606.02437 (PEFT, multi-LoRA, similar to models as models)
- https://loniss.com/cambrian-thesis
- useful questions for verifying my understanding of inference serving
- explain intuitively how inference for Thinking Machines' interaction model differs from the typical large-scale inference performed by frontier labs
- why can Cerebras chips serve gpt-oss-120b 3x faster than any other chip?
- how does the design of Rubin differ from the design of Blackwell? predict what the design of Feynman will be
- how does the current inference paradigm support or inhibit continual learning via weight updates? put another way, if a frontier model updated its weights daily on new data, would the hardware architecture of inference change? if so, how? if not, why not?