what is the relation between interaction and MARL? how does that relation explain karten’s take that “realtime envs with agent actions” is the new paradigm? this seemingly relates to ‘prospective learning’, which in turn relates to ‘dynamic preferences’
prospective vs retrospective learning??
it does feel like predicting the behavior of the user you are engaging with would be hugely economically beneficial for modern labs. are they not doing reward modeling on simply predicting what the user says? like the conversation version of echo? they must be, no? try it as an experiment?
i defined an interaction model to jakub as a model that optimizes for a dynamic reward that is clear when hit but hard to codify beforehand. so in some sense the model learns a function approximation of its user's reward model, one that is in some ways superior to the user's own (it cannot necessarily produce ground truth, but it may predict at a superhuman level)
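a minimal sketch of the "clear when hit but hard to codify" idea, under heavy assumptions: the user is simulated by a hypothetical `user_feedback` function with a hidden linear preference (`hidden_w`, made up for illustration), the model only ever sees hit/miss signals, and it fits a logistic approximation of that reward online. the reward here is held fixed for simplicity, so this shows only the function-approximation half of the definition, not the dynamic half.

```python
import math
import random


def user_feedback(x, hidden_w=(2.0, -1.0)):
    """Hypothetical user: says 1 ("hit") or 0 ("miss") for a proposal x.

    hidden_w is an assumption for illustration; the model never reads it.
    """
    return 1 if hidden_w[0] * x[0] + hidden_w[1] * x[1] > 0 else 0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_reward_model(n_rounds=2000, lr=0.5, seed=0):
    """Online logistic regression on hit/miss feedback only."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(n_rounds):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))  # a proposal shown to the user
        y = user_feedback(x)                          # the only signal the model gets
        p = sigmoid(w[0] * x[0] + w[1] * x[1])        # model's predicted hit probability
        for i in range(2):
            w[i] -= lr * (p - y) * x[i]               # gradient step on the log loss
    return w


def agreement(w, n=500, seed=1):
    """Fraction of fresh proposals where the learned model agrees with the user."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        pred = 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0
        hits += int(pred == user_feedback(x))
    return hits / n


learned_w = train_reward_model()
score = agreement(learned_w)
print("learned weights:", learned_w)
print("agreement with user on fresh proposals:", score)
```

the point of the toy: the user never writes the rule down, yet the model ends up with an explicit approximation of it that it can query as often as it likes.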
conceptually, it makes sense to first prove that you can find a local or global minimum on a static loss function, because trying to find a local or global minimum on a dynamic loss function (which theoretically is what continual learning is) is presumably the harder problem
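a toy illustration of the static vs dynamic distinction, assuming a 1-d quadratic loss `(x - c_t)**2` whose minimum `c_t` either stays put or drifts linearly (the drift rate, learning rate, and function names are arbitrary choices for the sketch). plain gradient descent converges on the static loss but only tracks the moving minimum with a persistent lag.

```python
def gradient_descent(targets, lr=0.1, x0=0.0):
    """Take one gradient step per time step on loss_t(x) = (x - targets[t])**2.

    Returns the distance |x - targets[t]| measured after each step.
    """
    x = x0
    errors = []
    for c in targets:
        grad = 2.0 * (x - c)   # derivative of (x - c)**2
        x -= lr * grad
        errors.append(abs(x - c))
    return errors


steps = 200
drift = 0.05  # assumed per-step movement of the minimum in the dynamic case

static_targets = [1.0] * steps
drifting_targets = [1.0 + drift * t for t in range(steps)]

static_err = gradient_descent(static_targets)
drift_err = gradient_descent(drifting_targets)

print("final error, static loss:  ", static_err[-1])
print("final error, drifting loss:", drift_err[-1])
```

for this particular setup the lag settles at `drift * (1 - 2*lr) / (2*lr)`, so it shrinks with a larger step size but never reaches zero while the minimum keeps moving. that is one concrete sense in which the dynamic problem asks for tracking rather than convergence.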
Corrigibility might be an overlapping set with interaction
slow response times make it much much harder for “interaction” to occur
https://arxiv.org/pdf/2606.03237 another paper as a call for cooperation paradigms during training. considered solipsistic. referenced in “from agi to asi”