I say “no opinion on Devin”, like that’s gonna be the big thing this week.
Then Quiet-STaR comes out…
It just sounds like the creator made a thing that wasn’t what people wanted.
It just feels like the question to ask then isn’t “but how do I get them to choose the thing despite it not being what they want?”
“Hard work goes to waste when you make a thing that people don’t want” is … true. But I would say it’s a stretch to call it a “problem”. It’s just an inescapable reality. It’s almost tautological.
Look at houses. You made a village with a diverse bunch of houses. But more than half of those, nobody wants to live in. Then “how do I get people to live in my houses?” “Build houses that people actually want to live in.” Like, you can pay people money to live in your weird houses, sure, I just feel like you have missed the point of being an architect somewhat.
I posted a bunch of Substack versions of Zvi’s stuff here earlier and they didn’t get as many upvotes. So like the pigeons driven crazy by random stimulus in the intermittent reinforcement studies, I have only posted the WordPress versions since.
Also they don’t have an annoying popup.
Can you judge if the model is being truthful or untruthful by looking at something like |states . honesty_control_vector|? Or dynamically chart mood through a conversation?
Can you keep a model chill by actively correcting the anger vector coefficient once it exceeds a given threshold?
Can you chart per-layer truthfulness through the layers to see if the model is being glibly vs cleverly dishonest? With glibly = “decides to be dishonest early”, cleverly = “decides to be dishonest late”.
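To make the questions above concrete, here is a rough sketch of what I have in mind, in plain Python on toy vectors. Everything in it is an assumption for illustration: the function names are made up, the "states" are hand-written lists rather than real activations, and I use the signed projection onto the control vector rather than the absolute value, since the sign is what would separate "truthful" from "untruthful". Whether real control vectors behave this cleanly is exactly the open question.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Normalize so the projection is a coefficient along the direction.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def projection(state, direction):
    """Signed coefficient of `state` along `direction`."""
    return dot(state, unit(direction))

def per_layer_scores(layer_states, layer_directions):
    """One projection per layer; each layer has its own (hypothetical) control vector."""
    return [projection(s, d) for s, d in zip(layer_states, layer_directions)]

def first_crossing(scores, threshold):
    """Index of the first layer whose score drops below `threshold`, else None.
    A low index would be 'glib' dishonesty, a high index 'clever' dishonesty."""
    for i, score in enumerate(scores):
        if score < threshold:
            return i
    return None

def clamp_along(state, direction, max_coeff):
    """If the coefficient along `direction` exceeds `max_coeff`,
    subtract the excess; otherwise return the state unchanged."""
    u = unit(direction)
    coeff = dot(state, u)
    if coeff <= max_coeff:
        return list(state)
    excess = coeff - max_coeff
    return [x - excess * y for x, y in zip(state, u)]

# Toy data: three "layers" of 2-d states, same honesty direction for each.
honesty = [1.0, 0.0]
states = [[2.0, 1.0], [0.5, 1.0], [-1.0, 1.0]]
scores = per_layer_scores(states, [honesty] * 3)
crossing = first_crossing(scores, 0.0)

# Toy "anger" correction: cap the coefficient along the direction at 1.0.
calmed = clamp_along([3.0, 4.0], [1.0, 0.0], 1.0)
```

Charting mood through a conversation would be the same `projection` call applied per token or per turn instead of per layer.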
Ah, the life~
Sure, but those systems are known to be useless. Far as I can tell, Zvi’s comment holds up. Falsely believing that an image was AI generated should not count as “AI involvement”, imo.
Wonder who at NIST is actually going to leave over this (if anyone).
As a doomer, honestly I can never parse if this sort of thing is “AI ESG trying to defect against us even though we’ve been trying to play nice with them very hard” or “AI accelerationists trying to drive a wedge into AI safety.”
In other words, “big, if true”.
Okay I take it back, this is … wait, she’s not even defecting, she’s just shitting on him randomly for lolz. Destroy Twitter when?
Well let’s all hope he’s right about that.