• JuanElMinero@alien.topB · 10 months ago

    Am I reading those Cuda core projections right?

    GA102 to AD102 increased by about 80%, but the jump from AD102 to GB202 is only slightly above 30%, on top of no large gains from going to 3nm?

    Might not turn out that impressive after all.

    • Qesa@alien.topB · 10 months ago

      It’s highly likely to be a major architecture update, so core count alone won’t be a good indicator of performance.

      • ResponsibleJudge3172@alien.topB · 10 months ago

        ‘Ampere Next’ referred to the datacenter lineup, which ended up being the biggest architectural change in datacenter GPUs since Volta vs. GP100. And ‘Ampere Next Next’ referred to datacenter Blackwell, which is MCM, so again a big change.

      • Eitan189@alien.topB · 10 months ago

        It isn’t a major architecture update. Nvidia’s slides from Ampere’s release stated that the next two architectures after Ampere would be part of the same family.

        Performance gains will be had by improving the RT & tensor cores, using an improved node, probably N4X, to facilitate clock speed increases at the same voltages, and by increasing the number of SMs across the product stack. The maturity of the 5nm process will allow Nvidia to use larger dies than they could in Ada.

        • rorschach200@alien.topB · 10 months ago

          by improving the RT & tensor cores

          and HW support for DLSS features and CUDA as a programming platform.

          It might be “a major architecture update” in terms of the amount of work Nvidia engineering will have to put in to pull off all the new features and RT/TC/DLSS/CUDA improvements without regressing PPA - that’s where the years of effort will be sunk. That could mean large improvements in perf in selected application categories and operating modes, but a very minor improvement in “perf per SM per clock” in no-DLSS rasterization on average.

    • Baalii@alien.topB · 10 months ago

      You should be looking at transistor count, if anything at all; “CUDA cores” is only somewhat useful when comparing different products within the same generation.

      • ResponsibleJudge3172@alien.topB · 10 months ago

        Still very accurate if you know what to look for.

        For example, knowing why Ampere and Turing CUDA cores scale differently lets you predict how an Ampere GPU scales vs. a Turing GPU.

        It’s also why we knew Ada would scale linearly, except with the 4090, which was nerfed to be more efficient.

    • capn_hector@alien.topB · 10 months ago

      GA102 to AD102 increased by about 80%, but the jump from AD102 to GB202 is only slightly above 30%,

      Maybe GB202 is not the top chip, and the top chip is named GB200.

      I mean, you’d expect this die to be called GB102 based on the recent numbering scheme, right? Why jump to 202 right out of the gate? They haven’t done that in the past: AD100 is the compute die and AD102, 103, 104… are the gaming dies. In fact this has been extremely consistent all the way back to Pascal; even when there is a compute uarch variant that is different (and GP100 is quite different from GP102, etc.), it’s still called the 100.

      But if there were another die above it, you’d call it GB100 (like Maxwell’s GM200 or Fermi’s GF100). That name is obviously already taken - GB100 is the compute die - so you bump the whole numbering series to 200, meaning the top gaming die is GB200.

      There is also precedent for calling the biggest gaming die the x110, like Kepler’s GK110 or Fermi’s GF110 (in the 500 series). But they haven’t done that in a long time, not since Kepler, probably because it ruins the “bigger number = smaller die” rule of thumb.

      Of course it’s possible the 512-bit rumor was bullshit, or this one is. But it’s certainly an odd flavor of bullshit: if you were making something up, wouldn’t you make up something that made sense? Odd details like that potentially lend it credibility, because you’d call it GB102 if you were making it up. It will also be easy to corroborate against future rumors: if nobody ever mentions GB200-series chips again, then this was probably just bullshit, and vice versa. Just like Angstronomics and the RDNA3 leak - once he’d nailed the first product, the N32/N33 information was highly credible.

      • scytheavatar@alien.topB · 10 months ago

        It has already leaked: GB200 is a chiplet design that will be exclusive to server customers. GB202 will be used for the 5090.

    • rorschach200@alien.topB · 10 months ago

      GA102 to AD102 increased by about 80%

      without scaling DRAM bandwidth anywhere near as much, only partially compensating for that with a much bigger L2.

      For the 5090, on the other hand, we might also have a clock increase (another 1.15x?) and a proportional 1:1 (unlike Ampere -> Ada) DRAM bandwidth increase by a factor of 1.5 thanks to GDDR7, with no bus width increase necessary (1.5 ≈ 1.3 × 1.15). So that’s a 1.5x perf increase from 4090 to 5090, which has to be further multiplied by whatever the microarchitectural improvements might bring, like Qesa is saying.

      Unlike Qesa, though, I’m personally not very optimistic about those microarchitectural improvements being very major. To get from the 1.5x that comes out of the node’s speed increase and shrink (subdued and downscaled by the node’s cost increase) to the recently rumored 1.7x, one would need another 13% perf and perf/W improvement (1.7 / 1.5 ≈ 1.13), which sounds just about realistic. I’m betting it’ll be even a little bit less, yielding more like a 1.6x proper average; that 1.7x might have been the result of measuring very few apps, or an outright “up to 1.7x” with the “up to” getting lost during the leak (if there even was a leak).

      1.6x is absolutely huge, and no wonder nobody’s increasing the bus width: it’s unnecessary for yielding a great product and even more expensive now than it was on 5nm (DRAM controllers barely shrink and are big).
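      The back-of-envelope arithmetic in this comment can be written out explicitly. A minimal sketch, using the commenter's rumored figures (1.3x core count, 1.15x clock, 1.5x GDDR7 bandwidth - none of these are confirmed specs):

```python
# Back-of-envelope 4090 -> 5090 scaling estimate, following the comment
# above. All inputs are rumored/assumed figures, not confirmed specs.
core_scaling = 1.3    # assumed increase in CUDA cores / SMs (rumor)
clock_scaling = 1.15  # assumed clock speed increase on the newer node

raw_scaling = core_scaling * clock_scaling  # ~1.5x raw throughput
bandwidth_scaling = 1.5  # GDDR7 on the same bus width, per the comment
# Bandwidth grows ~1:1 with compute, so the ~1.5x shouldn't be bandwidth-bound.

rumored_total = 1.7
implied_uarch_gain = rumored_total / raw_scaling  # ~13-14% left for uarch

print(f"raw scaling:        {raw_scaling:.3f}x")        # ~1.495x
print(f"implied uarch gain: {implied_uarch_gain:.3f}x")  # ~1.137x
```

      Swapping in different assumed core or clock ratios shows how sensitive the rumored 1.7x total is to that residual ~13% microarchitectural gain.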

    • ResponsibleJudge3172@alien.topB · 10 months ago

      It’s expected to be like Ampere. Ampere was a 17% increase in SMs (RTX 3090 Ti vs. Titan RTX), but the SM itself was improved such that it yielded about a 33% improvement per SM in ‘raster’ and massive improvements in occupancy for RT workloads. So the 3090 Ti ended up 46% faster in ‘raster’ than the Titan RTX.

      The TPC and GPC of Blackwell are rumored to be overhauled, with a more hesitant rumor that the SM is also being improved.
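      A quick sanity check on the multiplicative split implied by the Ampere figures in this comment (all inputs are the comment's numbers; the per-SM value is derived, not quoted):

```python
# Check the Ampere scaling example quoted above: 3090 Ti vs Titan RTX.
# Inputs are the comment's figures; treat them as rumor-grade, not specs.
sm_increase = 1.17        # ~17% more SMs (3090 Ti vs Titan RTX)
total_raster_gain = 1.46  # ~46% faster overall in 'raster'

# If total gain splits multiplicatively into SM count x per-SM uplift,
# the implied delivered per-SM raster uplift is:
per_sm_gain = total_raster_gain / sm_increase
print(f"implied per-SM gain: {per_sm_gain:.3f}x")  # ~1.25x
```

      Note that the quoted 33% per-SM figure doesn't quite compose with the other two numbers (1.17 × 1.33 ≈ 1.56, not 1.46), so at least one of the three is approximate.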