How can fixed state vectors handle new actors in the environment? Does the pointer analogy make sense? (per discussion leading to models-as-tools-for-models)
Frontier models probably won’t be trained on specific environments. If further performance requires predicting the consequences of your own actions on the environment and/or predicting the behavior of others (assuming others’ behavior is mapped into the state space at all, which is the state-space dynamism question), will frontier models lag behind models optimized for domain-specific environment prediction?
Or will models as models just take this over, even if that is true?
This is in the vein of code-as-generalization, which requires evals and compute.
starting up / cdev as prior updating / RL world modeling
eval + sensors + actuators being? new software network effects
Turn Levine’s paper and explanation into a blog post; perhaps code up an experiment as well, if one doesn’t exist already.
https://gemini.google.com/app/2e7b447ca45202f4 Explain how the work here (Avalon) is just approximating AIXI by maintaining expected distributions over the actions of others; how it relates to Levine’s work (also modeling expected distributions over the actions of others, but more broadly); how it potentially relates to models as tools for models (model distributions in micromodel space, not context space, but not weight space either); and how it relates to the data/reward bottleneck as explained by Randall (where would the data for that come from?). More generally, all of this is supervised learning, which contrasts with self-play, but even self-play needs rewards.
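Possible starting point for the experiment: a toy sketch of "maintaining expected distributions over the actions of others". This is not the Avalon work's method or Levine's; it is a plain Bayesian update over a hidden role, with the role names, action names, and likelihood numbers all invented for illustration. The loose link to AIXI is that AIXI predicts with a Bayesian mixture over all computable environments, while this toy mixes over just two hand-written hypotheses.

```python
from typing import Dict

# Hypothetical setup: one other player whose hidden role is "good" or "evil".
# These likelihoods P(action | role) are made-up numbers, not taken from any paper.
ACTION_LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "good": {"approve": 0.8, "reject": 0.2},
    "evil": {"approve": 0.4, "reject": 0.6},
}


def update_belief(belief: Dict[str, float], action: str) -> Dict[str, float]:
    """Bayes rule: posterior over roles after observing one action."""
    unnormalized = {
        role: prob * ACTION_LIKELIHOOD[role][action]
        for role, prob in belief.items()
    }
    total = sum(unnormalized.values())
    return {role: weight / total for role, weight in unnormalized.items()}


def predict_actions(belief: Dict[str, float]) -> Dict[str, float]:
    """Expected distribution over the other player's next action,
    i.e. the mixture of per-role action distributions weighted by the belief."""
    actions = next(iter(ACTION_LIKELIHOOD.values())).keys()
    return {
        action: sum(belief[role] * ACTION_LIKELIHOOD[role][action] for role in belief)
        for action in actions
    }


# Start from a uniform prior and fold in a short, invented observation history.
belief = {"good": 0.5, "evil": 0.5}
for observed in ["reject", "reject", "approve"]:
    belief = update_belief(belief, observed)

prediction = predict_actions(belief)
print(belief)
print(prediction)
```

The belief here lives in a small explicit table rather than in context or in weights, which is roughly the "micromodel space" idea above. Where the likelihood table would come from in a real setting is exactly the data/reward bottleneck question.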