@rorschach200

rorschach200@alien.top · 10 months ago

The bus width needed to be what it needed to be. That left 2 possibilities - 12 GB and 24 GB. The former was way low for 4090 to work in its target applications. 24 it became.

This is exactly what drives these decisions.

What do you think drives them?

rorschach200@alien.top · 10 months ago

by improving the RT & tensor cores

and HW support for DLSS features and CUDA as a programming platform.

It might be “a major architecture update” by the amount of work that Nvidia engineering will have to put in to pull off all the new features and RT/TC/DLSS/CUDA improvements without regressing PPA - that’s where the years of effort will be sunk - and possibly large improvements in perf in selected application categories and operating modes, but a very minor improvement in “perf per SM per clock” in no-DLSS rasterization on average.

rorschach200@alien.top · 10 months ago

Why actually build the 36 GB one though? What gaming application will be able to take advantage of more than 24 for the lifetime of 5090? 5090 will be irrelevant by the time the next gen of consoles releases, and the current one has 16 GB for VRAM and system RAM combined. 24 is basically perfect for top end gaming card.

And 36 will be even more self-canibalizing for professional cards market.

So it’s unnecessary, expensive, and canibalizing. Not happening.

rorschach200@alien.top · 10 months ago

GA102 to AD102 increased by about 80%

without scaling DRAM bandwidth anywhere near as much, only partially compensating for that with a much bigger L2.

For 5090 on the other hand we might also have clock increase going (another 1.15x?), and proportional 1:1 (unlike Ampere -> Ada) DRAM bandwidth increase by a factor of 1.5 due to GDDR7 (no bus width increases necessary; 1.5 = 1.3 * 1.15), so this is 1.5x perf increase 4090 -> 5090, which has to be further multiplied by whatever u-architectural improvements might bring, like Qesa is saying.

Unlike Qesa, though, I’m personally not very optimistic regarding those u-architectural improvements being very major. To get from 1.5x that comes out of node speed increase and the node shrink subdued and downscaled by node cost increase, to recently rumored 1.7x one would need to get (1.7 / 1.5 = 1.13) 13% perf and perf/w improvement, which sounds just about realistic. I’m betting it’ll be even a little bit less, yielding more like 1.6x proper average, that 1.7x might have been the result of measuring very few apps or outright “up to 1.7x” with “up to” getting lost during the leak (if there was even a leak).

1.6x is absolutely huge, and no wonder nobody’s increasing the bus width: it’s unnecessary for yielding a great product and even more expensive now than it was on 5nm (DRAM controllers almost don’t shrink and are big).

rorschach200@alien.top · 10 months ago

The “gain” is largely a weighted average over all apps, not a max realizing in couple of outliers. It’s the bulk that determines the economics of the question, not singular exceptions.
The current status is heavily dominated by the historical state of affairs, as not enough time has passed to do much yet. Complex heterogenous cache hierarchies that generalize poorly is a very recent thing in CPUs, in GPUs it was the case for decades now, and in GPUs that is not the only source of large sensitivity to tuning.

rorschach200@alien.top · 10 months ago

Also, GPUs are full of sharp performance cliffs and tuning opportunities, there is a lot to be gained. CPUs are a lot more resilient and generic - a lot less to be gained there.

rorschach200@alien.top · 10 months ago

From the video for convenience:

“Why did Intel only choose to enable Intel® Application Optimization on select 14th Gen processors? Settings within Intel® Application Optimization are custom determined for each supported processor, as they consider the number of P-cores, E-cores, and Intel® Hyperthreading Technology. Due to the massive amount of custom testing that went into the optimized setting parameters specifically gaming applications [sic], Intel chose to align support for our gaming-focused processors.”
- Intel

(original page quoted from)