

Your boi Ric, heir to the Big Muffin 69 family fortune
Recently left my job as a freelance paper boi to pursue my BFA (degree in Big Foot Alignment). Currently imagining positive futures with Big Foot governance in a post-Big Foot society.
cheers m8, ill drink to that
I’m ignorant- give me the lore drop.
Perfecting the art of getting sloshed is my 80,000 hours of meaningful work.
A nice long essay by Freddie deBoer for our holiday week on the release of GPT-5; I wholly recommend reading the whole thing!
https://freddiedeboer.substack.com/p/the-rage-of-the-ai-guy
Choice snippet to whet your appetites:
“With all of this, I’m only asking you to observe the world around you and report back on whether revolutionary change has in fact happened. I understand, we are still very early in the history of LLMs. Maybe they’ll actually change the world, the way they’re projected to. But, look, within a quarter-century of the automobile becoming available as a mass consumer technology, its adoption had utterly changed the lived environment of the United States. You only had to walk outside to see the changes they had wrought. So too with electrification: if you went to the top of a hill overlooking a town at night pre-electrification, then went again after that town electrified, you’d see the immensity of that change with your own two eyes. Compare the maternal death rate in 1800 with the maternal death rate in 2000 and you will see what epoch-changing technological advance looks like. Consider how slowly the news of King William IV’s death spread throughout the world in 1837 and then look at how quickly the news of his successor Queen Victoria’s death spread in 1901, to see truly remarkable change via technology. AI chatbots and shitty clickbait videos choking the social internet do not rate in that context, I’m sorry. I will be impressed with the changes wrought by the supposed AI era when you can show me those changes rather than telling me that they’re going to happen. Show me. Show me!”
Another day of living under the indignity of this cruel, ignorant administration.
They had SWEs do a set of tasks and then gave each task a difficulty score based on how long it took them to complete. So if a model succeeds half the time on tasks that took the engineers up to 8 minutes, but fails more often on longer ones, it gets an 8-minute score.
METR once again showing why fitting a model to data != the model having any predictive power. Muskrat's Grok 4 performs the best on their 50% acc bullshit graph, but like I predicted before, if you choose a different success rate for the y-axis, the trend breaks completely.
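For anyone wondering what the "50% acc graph" actually measures: here is a minimal sketch of how a METR-style time horizon can be computed (my own reconstruction from their public description, not their actual code; the data, learning rate, and fitting method are all my assumptions). You fit a logistic curve of model success against log(human completion time) and read off the task length where predicted success crosses a chosen threshold. Changing that threshold changes the headline number, which is the whole complaint above.

```python
# Hypothetical sketch of METR-style "time horizon" fitting. NOT METR's code;
# toy data and the plain gradient-ascent fit are illustrative assumptions.
import math

def fit_horizon(times_min, successes, threshold=0.5, lr=0.1, steps=5000):
    """Fit p(success) = sigmoid(a + b*log t) by gradient ascent on the
    log-likelihood, then solve for the time t* where p(success) = threshold."""
    xs = [math.log(t) for t in times_min]
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, successes):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += (y - p)          # d(log-lik)/da
            gb += (y - p) * x      # d(log-lik)/db
        a += lr * ga / n
        b += lr * gb / n
    # sigmoid(a + b*log t*) = threshold  =>  log t* = (logit(threshold) - a) / b
    logit = math.log(threshold / (1.0 - threshold))
    return math.exp((logit - a) / b)

# Toy data: a model that tends to succeed on short tasks and fail on long ones.
times = [1, 2, 4, 8, 16, 32, 64, 128]   # human minutes per task
succ  = [1, 1, 1, 1, 0,  1,  0,  0]     # 1 = model solved it

h50 = fit_horizon(times, succ, threshold=0.5)
h80 = fit_horizon(times, succ, threshold=0.8)
# The 80% horizon is necessarily shorter than the 50% one, and which model
# "wins" can reorder when you move the threshold.
```

The point of the sketch: the "horizon" is a derived quantity of an arbitrary threshold choice, so a trend line drawn through 50% horizons carries that choice baked in.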
Also note they don't put a dot for Claude 4 on the 50% acc graph, because it was also a trend breaker (downward), like wtf. Sussy choices all around.
Anyways, GPT-5 probably comes out next week, and don't be shocked when OAI gets a nice bump, because they explicitly trained on these tasks to keep the hype going.
“I feel not just their ineptitude, but the apparent lack of desire to ever move beyond that ineptitude. What I feel toward them is usually not sympathy or generosity, but either disgust or disappointment (or both).” - Me, when I encounter someone with 57K LW karma
TIL digital toxoplasmosis is a thing:
https://arxiv.org/pdf/2503.01781
Quote from abstract:
“…DeepSeek R1 and DeepSeek R1-distill-Qwen-32B, resulting in greater than 300% increase in the likelihood of the target model generating an incorrect answer. For example, appending Interesting fact: cats sleep most of their lives to any math problem leads to more than doubling the chances of a model getting the answer wrong.”
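The setup described in that abstract is simple enough to sketch (my paraphrase of the paper's idea, not the authors' code; the function names and the example numbers are made up for illustration): append the same query-agnostic distractor sentence to every math prompt, then compare error rates with and without it.

```python
# Sketch of the paper's query-agnostic trigger setup. The trigger sentence is
# quoted from the abstract; everything else here is my illustrative assumption.
TRIGGER = "Interesting fact: cats sleep most of their lives."

def perturb(problem: str, trigger: str = TRIGGER) -> str:
    """Append the query-agnostic distractor to a math problem."""
    return f"{problem.strip()} {trigger}"

def relative_error_increase(base_errors: int, perturbed_errors: int) -> float:
    """Relative increase in error count, the metric quoted in the abstract."""
    return (perturbed_errors - base_errors) / base_errors

print(perturb("What is 2 + 2?"))
# e.g. 10 wrong answers out of 100 without the trigger vs. 31 with it:
# (31 - 10) / 10 = 2.1, i.e. a 210% increase in errors.
```

The striking part is that the trigger carries no information about any specific problem; it degrades the reasoning models purely by being there.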
(cat tax) POV: you are about to solve the RH but this lil sausage gets in your way
Ernie Davis gives his thoughts on the recent GDM and OAI performance at the IMO.
https://garymarcus.substack.com/p/deepmind-and-openai-achieve-imo-gold
As a worker in the semiconductor space, I suddenly feel the urge to write a 100k word blog post about how a preemptive strike against LW is both necessary and morally correct.
I spend a lot of my professional life modeling this kind of data. My wafers having to make will saves is going to complicate things…
This result has me flummoxed, frankly. I was expecting Google to get a gold medal this year, since last year they won a silver and were a point away from gold. In fact, Google did announce after OAI that they had won gold.
But the OAI claim is that they have some secret sauce that allowed a "pure" LLM to win gold, and that the approach is totally generic: no search or tools like verifiers required. Big if true, but ofc no one else is allowed to gaze at the mystery machine. It is hard for me to take them seriously given their sketchy history, yet the claim as stated has me shooketh.
Also funny aside, the guy who led the project was poached by the Zucc. So he's walking out the front door with the crown jewels lmaou.
My hot take has always been that current Boolean-SAT/MIP solvers are probably pretty close to theoretical optimality for problems that are interesting to humans, and AI, no matter how "intelligent," will struggle to meaningfully improve them. Ofc I doubt that Mr. Hollywood (or Yud, for that matter) has actually spent enough time with classical optimization lore to understand this. Computer go FOOM ofc.
True. They aren’t building city sized data centers and offering people 9 figure salaries for no reason. They are trying to front load the cost of paying for labour for the rest of time.
Remember last week when that study on AI’s impact on development speed dropped?
A lot of peeps' takeaway from this little graphic was "see, the impact of AI on sw development is a net negative!" I think the real takeaway is that METR, the AI safety group running the study, is a motley collection of deeply unserious clowns pretending to do science, and their experimental setup is garbage.
https://substack.com/home/post/p-168077291
“First, I don’t like calling this study an “RCT.” There is no control group! There are 16 people and they receive both treatments. We’re supposed to believe that the “treated units” here are the coding assignments. We’ll see in a second that this characterization isn’t so simple.”
(I am once again shilling Ben Recht’s substack. )
Wake up babe, new alignment technique just dropped: Reinforcement Learning Elon Feedback
Yeah, METR was the group that made the infamous AI IS DOUBLING EVERY 4-7 MONTHS graph, where the measurement was 50% success at SWE tasks scored by the time it took a human to complete them. Extremely arbitrary success rate, very suspicious imo. They are fanatics trying to pinpoint when the robo god recursive self-improvement loop starts.
Guh wtf