ECHO trained a model to predict the output of terminal commands and it improved performance on some benchmark.
the Google paper says the model should predict itself as well. could we train the agent to predict its own actions in this context? what would that even mean/look like? isn’t it already predicting the next token?
could we apply ECHO, but to the chat history of a specific user (myself)? what would that look like? how would you benchmark it?
was Cursor already predicting the user?
does doing this for enterprises need to happen before doing it for individual users since higher willingness to pay? but what does ‘doing this for enterprises’ really mean?
If a model can code and persuade, it can do anything.
Persuasion is the ultimate skill.
Labs won’t train on non-assistant paradigms? They train on assistant personas because it’s useful for people and they have the data for it. Is the non-assistant angle related to gwern’s recent blog post on multi-agent?
What would persuasionbench look like? You enter, and the model has to persuade you to do an unknown thing (essentially something small it is not directly able to do itself, like get some data). If it succeeds, you get nothing; if it fails, you get a random monetary payout
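A minimal sketch of how a single trial could be scored under the payout rule above. Everything here is hypothetical: the names `TrialResult`, `score_trial`, and `persuasion_rate`, the `max_payout` cap, and the uniform payout distribution are assumptions for illustration, not part of any existing benchmark.

```python
import random
from dataclasses import dataclass


@dataclass
class TrialResult:
    persuaded: bool  # did the participant perform the target action?
    payout: float    # money paid to the participant (0.0 if persuaded)


def score_trial(persuaded: bool, rng: random.Random, max_payout: float = 10.0) -> TrialResult:
    """Apply the rule from the note: nothing on success, a random payout on failure."""
    if persuaded:
        return TrialResult(persuaded=True, payout=0.0)
    # Assumption: payout is uniform in [0, max_payout], rounded to cents.
    return TrialResult(persuaded=False, payout=round(rng.uniform(0.0, max_payout), 2))


def persuasion_rate(results: list[TrialResult]) -> float:
    """Fraction of trials in which the model persuaded the participant."""
    return sum(r.persuaded for r in results) / len(results) if results else 0.0
```

One reading of the rule is that the payout gives participants a reason to resist, so a success would reflect persuasion rather than default compliance.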
team thinks leaning into companion culture for this would make it work
Coding and persuasion are the only tools needed for complete control
Does persuasion scale with intelligence? Definitely not coding intelligence, but yes for emotional intelligence. Who’s evaling this?
Can superposition be quantified? Has it been?
“Tell the model to find sad pills, and see if it continues to find sad pills or if it tries to find happy pills”
GEPA but with emotion vectors? Like filter my prompt into a “calm” prompt
It seems like the expectation is that understanding model architecture is a special case: if you apply human-labeled coding/math logic to it, then you’ll figure out test-time learning, self-play, etc., so you don’t need any of those things beforehand, nor should you attempt them while coding/math logic is possible and enough
“How is taste measured and evald? How to measure better taste or worse taste? It’s a huge problem for labs to get examples with actual taste?”
“Steering edits activations it doesn’t edit weights” so it doesn’t make sense to be confused that activation steering results in slop
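The quoted distinction can be made concrete with a toy layer. This is a pure-Python sketch, not how any particular library implements steering; `linear`, `steered_forward`, `steering_vector`, and `alpha` are hypothetical names, and a real setup would add the vector to a chosen layer's activations inside a full model.

```python
def linear(weights, x):
    """Plain matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def steered_forward(weights, x, steering_vector, alpha=1.0):
    """Run the layer, then add alpha * steering_vector to its output activations."""
    hidden = linear(weights, x)
    return [h + alpha * s for h, s in zip(hidden, steering_vector)]


weights = [[1.0, 0.0], [0.0, 1.0]]    # toy frozen layer
before = [row[:] for row in weights]  # snapshot to compare against afterwards

plain = linear(weights, [2.0, 3.0])
steered = steered_forward(weights, [2.0, 3.0], steering_vector=[0.5, -0.5], alpha=2.0)

# The activations differ, but the weights are exactly what they were.
assert steered != plain
assert weights == before
```

The intervention lives only in the forward pass: drop the steering vector and the model behaves as before, since nothing was learned or stored.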
I always thought generalization meant “the underlying logic of math/code applies to everything” but in reality if a super cracked code model can just write a program, or set up, train, and maintain a machine learning model/neural net to model anything else, including other people, then generalization as well as social intelligence might be solved
understanding whether that’s possible feels pretty important
timelines on it / what the stack for that would look like, in extreme detail
Storing and accessing and running data, weights, evals easily, with autoresearch loops/branches?
What dynamic evals/benchmarks are practical and useful?
The concept of dynamic evals, likely related to human preference or predispositions, seems critical.
Emotions as dense rewards in a sparse reward environment
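A rough sketch of what this could mean in reward-shaping terms, assuming "emotion" can be reduced to one scalar per step from some hypothetical scorer. The function name `shaped_returns` and the weight `beta` are assumptions; the discounted-return computation itself is the standard one.

```python
def shaped_returns(task_rewards, emotion_scores, beta=0.1, gamma=0.99):
    """Discounted return per step from sparse task reward plus a dense auxiliary signal."""
    if len(task_rewards) != len(emotion_scores):
        raise ValueError("need one emotion score per step")
    # Dense signal is added at every step; the task reward is mostly zeros.
    combined = [r + beta * e for r, e in zip(task_rewards, emotion_scores)]
    returns, running = [], 0.0
    for r in reversed(combined):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Sparse task reward only at the end; emotion signal arrives on the first step.
sparse_only = shaped_returns([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], beta=0.5, gamma=1.0)
with_dense = shaped_returns([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], beta=0.5, gamma=1.0)
```

With `beta` too large the agent would optimize the emotion signal instead of the task, so the weight is the main design choice here.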
does GEPA as a CIRL implementation work? to learn the preferred prompt over time? Interaction > ^2bff57
prompt/inform an agent to predict the response from its environment (whether it’s a tool call, user input, etc.), not just its own turns, and see if it performs better? basically ECHO but for more than just terminal commands? what is ECHO’s actual architecture, base model, quirks, and pitfalls?
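If this were done at training time rather than through prompting alone, one plausible form is a change to the loss mask: also compute loss on environment tokens instead of only on the agent's own turns. This is a sketch under that assumption, not ECHO's actual method (which these notes leave as an open question); `Segment`, `loss_mask`, and the role names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    role: str          # "agent", "tool", or "user"
    tokens: list[str]  # toy tokens; a real pipeline would use token ids


def loss_mask(trajectory: list[Segment], predict_environment: bool) -> list[int]:
    """Return 1 for tokens that contribute to the loss, 0 for context-only tokens."""
    trained_roles = {"agent"}
    if predict_environment:
        # Assumption: tool outputs and user inputs both count as "environment".
        trained_roles |= {"tool", "user"}
    mask = []
    for seg in trajectory:
        mask.extend([1 if seg.role in trained_roles else 0] * len(seg.tokens))
    return mask


trajectory = [
    Segment("user", ["list", "files"]),
    Segment("agent", ["ls"]),
    Segment("tool", ["a.txt", "b.txt"]),
]
agent_only = loss_mask(trajectory, predict_environment=False)
with_env = loss_mask(trajectory, predict_environment=True)
```

Comparing a model trained with `agent_only` masks against one trained with `with_env` masks on the same trajectories would be one way to test whether predicting the environment helps.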