https://www.404media.co/this-is-doom-running-on-a-diffusion-model/
We can boil the oceans to run a worse version of a game that can run at 60fps on a potato, but the really cool part is that we need the better version of the game to exist in the first place and also the new version only runs at 20fps.
Uhhh about that…
Oh it’s definitely comparable. See, I’ll compare:
The visual quality of GameNGen is worse than that of the original game.
Man, that looks like ass. Sorry, looks like an ass.
only real gamers want to shoot Thongmonster Pro. coming soon to a doom WAD near you
These videos are, of course, suspiciously cut to avoid showing all the times it completely fucked up, and they still show the engine completely fucking up:

“This door requires a blue key” stays on screen forever
the walls randomly get bullet damage for no reason
the imp teleports around, getting lost in the warehouse brown
the level geometry fucks up and morphs
it has no idea how to apply damage floors
enemies resurrect randomly, because how do you train the model to know about arch-viles and/or Nightmare difficulty
finally: it seems like the player cannot die, because I bet it was trained on demos of successful runs of levels and not on the player dying.

it’s interesting that the only real “hallucination” I can see in the video pops up when the player shoots an enemy, which results in some blurry feedback animations
The training data was definitely stolen from https://dsdarchive.com/, right?
Well, good news for the author, it’s time for him to replay doom because it’s clearly been too long.
The poison floor not hurting the player, trapping him forever, was a good thing to end on.
I was just watching the vid! I was like, oh wow, all of these levels look really familiar… it’s not imagining new “Doom” locations, it’s literally a complete memorization of the levels. Then I saw their training scheme involved an agent playing the game, and suddenly I’m like, oh, you literally had the robot navigate every level and look around 360° to get an image of all locations and POVs, didn’t you?
and yet, with zero evidence to support the claim, the paper’s authors are confident that their model can be used to create new game logic and assets:

Today, video games are programmed by humans. GameNGen is a proof-of-concept for one part of a new paradigm where games are weights of a neural model, not lines of code. GameNGen shows that an architecture and model weights exist such that a neural model can effectively run a complex game (DOOM) interactively on existing hardware. While many important questions remain, we are hopeful that this paradigm could have important benefits. For example, the development process for video games under this new paradigm might be less costly and more accessible, whereby games could be developed and edited via textual descriptions or examples images. A small part of this vision, namely creating modifications or novel behaviors for existing games, might be achievable in the shorter term. For example, we might be able to convert a set of frames into a new playable level or create a new character just based on example images, without having to author code.
the objective is, as always, to union-bust an industry that only recently found its voice
Which is funny, as creating new levels in an interesting way is very hard. What made John Romero great is that he was very good at level design; he made it look easy. People have been making new levels for ages, but only a few of them are good. (Of course, that’s also because you cannot recreate the experience of playing Doom for the first time, so new experiences will need to be ‘my house’ levels of complexity.)
I can allow one (1) implementation of Doom on GenAI, in the spirit of the “port Doom on everything” stunt. Now that it’s been done, I hope I don’t have to condone any more.
I can’t remember seeing an AI take on Bad Apple, but I assume the quota’s already filled on that one ages ago as well.
Oh god is this the first time we have to sneer at a 404 article? Let’s hope it will be the last.
It’s running at frames per second, not seconds per frame, so it’s not too energy-intensive compared with the generative versions.
it’s interesting that the only real “hallucination” I can see in the video pops up when the player shoots an enemy, which results in some blurry feedback animations

Ah yes, issues appear when shooting an enemy, in a shooter game. Definitely not proof that the technology falls apart when it’s made to do the thing that it was created to do.
e: The demos made me motion sick. Random blobs of colour appearing at random and floor textures shifting around aren’t hallucinations?
yeah, this is weirdly sneerable for a 404 article, and I hope this isn’t an early sign they’re enshittifying. let’s do what they should have done and take a critical look at, ah, GameNGen, a name for their research they surely won’t regret
wow! it’s a shame that creating this model involved plagiarizing every bit of recorded doom footage that’s ever existed, exploiting an uncounted number of laborers from the global south for RLHF, and burning an amount of rainforest in energy that also won’t be counted. but fuck it, sometimes I shop at Walmart so I can’t throw stones, and this sounds cool, so let’s grab the source and see how it works!
just kidding, this thing’s hosted on github but there’s no source. it’s just a static marketing page, a selection of videos, and a link to their paper on arXiv, which comes in at a positively ultralight 10 LaTeX-formatted letter-sized pages when you ignore the many unhelpful screenshots and graphs they included
so we can’t play with it, but it’s a model implementing a game engine, right? so the evaluation strategy given in the paper has to involve the innovative input mechanism they’ve discovered that enables the model to simulate a gameplay loop (and therefore a game engine), right? surely that’s what convinced a pool of observers with more-than-random-chance certainty that the model was accurately simulating doom?

Human Evaluation. As another measurement of simulation quality, we provided 10 human raters with 130 random short clips (of lengths 1.6 seconds and 3.2 seconds) of our simulation side by side with the real game. The raters were tasked with recognizing the real game (see Figure 14 in Appendix A.6). The raters only choose the actual game over the simulation in 58% or 60% of the time (for the 1.6 seconds and 3.2 seconds clips, respectively).
of course not. nowhere in this paper is their supposed innovation in input actually evaluated — at no point is this work treated experimentally like a real-time game engine. also, and you pointed this out already — were the human raters drunk? (honestly, I couldn’t blame them — I wouldn’t give a shit either if my mturk was “which of these 1.6 second clips is doom”) the fucking thing doesn’t even simulate doom’s main gameplay loop right; dead possessed marines just turn to a blurry mess, health and armor don’t make sense in any but the loosest sense, it doesn’t seem to think imps exist at all but does randomly place their fireballs where they should be, and sometimes the geometry it’s simulating just casually turns into a visual paradox. chances are this experimental setup was tuned for the result they wanted — they managed to trick 40% of a group of people who absolutely don’t give a fuck into thinking the incredibly short video clip they were looking at was probably the real game. amazing!
if we ever get our hands on the code for this thing, I’m gonna make a prediction: it barely listens to input, if at all. the video clips they’ve released on their site and YouTube are the most coherent this thing gets, and it instantly falls apart the instant you do anything that wasn’t in its training set (aka, the instant you use this real-time game engine to play a game and do something unremarkably weird, like try to ram yourself through a wall)
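if/when the weights ever drop, that prediction is easy to test. a minimal sketch of the probe I mean, where generate_frame and the discrete action encoding are hypothetical stand-ins for whatever interface the real model has: feed identical context frames with different action sequences and see if the outputs actually diverge.

```python
import numpy as np

def input_sensitivity(generate_frame, context_frames, rng, n_trials=20):
    """Crude probe: does the "engine" actually listen to input?

    generate_frame(context_frames, actions) -> HxWx3 uint8 frame is a
    hypothetical interface; the real model and weights aren't public.
    Returns the mean per-pixel difference between frames generated from
    identical contexts but different random action sequences.
    """
    diffs = []
    for _ in range(n_trials):
        # assume some small discrete action set (move, turn, fire, ...)
        a1 = rng.integers(0, 8, size=len(context_frames))
        a2 = rng.integers(0, 8, size=len(context_frames))
        f1 = generate_frame(context_frames, a1).astype(float)
        f2 = generate_frame(context_frames, a2).astype(float)
        diffs.append(np.abs(f1 - f2).mean())
    # a score near zero means the actions are being ignored and it's
    # just denoising toward "average doom footage"
    return float(np.mean(diffs))
```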
were the human raters drunk? (honestly, I couldn’t blame them — I wouldn’t give a shit either if my mturk was “which of these 1.6 second clips is doom”)

“I’unno, I’m fuckin’ wasted and guessin’ at random.”
“So, your P(doom) is 50%.”
Fuck, you beat me to the P(doom) joke. Well done.
The paper is so bad…
What is up with AI papers using fancy symbols to notate abstract concepts when there isn’t a single other instance of the concept to be referred to?
They offer a bunch of tables with numbers in a metric that isn’t explained, showing that the numbers are exactly the same for the “random” and “agent” policies; in other words, inputs don’t actually matter! And they say they want to use these metrics for training future versions. Good luck.
For the sample size they’re using, 60% seems like a statistically significant rate, and they only tested clips of at most 3.2 seconds, generated right after real gameplay footage.
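to put a rough number on that, here’s a sketch of the significance check, assuming the 58% figure pools all 130 clip judgments (the paper doesn’t spell out the per-rater breakdown):

```python
from scipy.stats import binomtest

# 10 raters, 130 short clips; raters picked the real game over the
# simulation ~58% of the time on the 1.6 s clips.
k = round(0.58 * 130)  # ~75 correct picks, assuming pooled judgments
result = binomtest(k, n=130, p=0.5, alternative="greater")
print(result.pvalue)   # ~0.05: only just distinguishable from chance
```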
Sidenote: Auto-regressive models for much shorter periods are really useful for when audio is cutting out. Those use really simple math, they aren’t burning any rainforests
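and for anyone curious, “really simple math” is not an exaggeration: the classic trick is linear prediction, i.e. fit an autoregressive model to the samples just before the dropout and extrapolate into the gap. a toy sketch (real packet-loss concealment in audio codecs is fancier, but the core really is this small):

```python
import numpy as np

def conceal_gap(samples, gap_len, order=16):
    """Toy packet-loss concealment: fit an AR(order) model to the
    samples before a dropout via least squares, then extrapolate."""
    history = np.asarray(samples, dtype=float)
    # lagged design matrix: predict x[t] from x[t-1] ... x[t-order]
    X = np.column_stack([history[order - i - 1 : len(history) - i - 1]
                         for i in range(order)])
    y = history[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    out = list(history)
    for _ in range(gap_len):
        recent = np.array(out[-order:][::-1])  # newest sample first
        out.append(float(coeffs @ recent))
    return np.array(out[len(history):])  # just the synthesized samples
```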
I’m willing to retract my statement that these guys don’t have any ulterior motives.
There are serious problems with how easy it is to adopt the aesthetic of serious academic work without adopting the substance. Just throw in a bunch of meaningless graphs and equations, pretend some of the things you’re talking about are represented by Greek letters, and it’s close enough for even journalists who should really know better (to say nothing of the VCs who hold the purse strings) to take you seriously and adopt the “it doesn’t make sense because I’m missing something” attitude.
The paper starts with a weirdly bad definition of “computer game” too. It almost makes me think that (gasp) the paper was written by non-gamers. (For reference, the loop they mean is sketched after the counterexamples below.)

Computer games are manually crafted software systems centered around the following game loop: (1) gather user inputs, (2) update the game state, and (3) render it to screen pixels. This game loop, running at high frame rates, creates the illusion of an interactive virtual world for the player.
No rendering: Myst
No frame rate: Zork
No pixels: Asteroids
No virtual world: Wordle
No screen: Soundvoyager, Audio Defense (well these examples have a vestigial screen, but they supposedly don’t really need it)
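that textbook loop, for reference, in fixed-timestep form; get_input, update, and render here are hypothetical stand-ins (and vanilla Doom really does tick its game logic at 35 Hz):

```python
import time

TICK = 1 / 35  # vanilla Doom runs its game logic at 35 tics per second

def run(game_state):
    # the loop from the paper's definition, fixed-timestep flavor;
    # get_input / update / render are hypothetical stand-ins
    previous = time.monotonic()
    lag = 0.0
    while not game_state.quit:
        now = time.monotonic()
        lag += now - previous
        previous = now
        inputs = get_input()            # (1) gather user inputs
        while lag >= TICK:
            update(game_state, inputs)  # (2) update the game state
            lag -= TICK
        render(game_state)              # (3) render it to screen pixels
```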
Excel is a game.
things that are games:
things that aren’t games:
More computer games:
More computer non-games:
Where’s the part where they have people play with the game engine? Isn’t that what they supposedly are running, a game engine? Sounds like what they really managed to do was recreate the video of someone playing Doom, which is yawn.
right! without that, all they can show they’re outputting is averaged, imperfect video fragments of a bunch of doom runs. and maybe it’s cool (for somebody) that they can output those at a relatively high frame rate? but that’s sure as fuck not the conclusion they forced — the “an AI model can simulate doom’s engine” bullshit that ended up blowing up my notifications for a couple days when the people in my life who know I like games but don’t know I hate horseshit decided I’d love to hear about this revolutionary research they saw on YouTube
My intention was more to sneer at the research:
Diffusion Models Are Real-Time Game Engines
The tone of the article was unusual, putting in way too large a quote from the researchers and taking them at their word. Maybe it’s sarcasm I’m not getting, but either way, the “research” is just a bit of fun if the only goal was getting Doom to run.