• From-UoM@alien.topB
      1 year ago

      FP8 performance per unit of die area, apparently.

      The 4090 has 660 FP8 TFLOPS (which is insane when you think about it) and got banned.

      The H100 is 1930 FP8 TFLOPS.

      With sparsity, both can do up to 2x that.

      The 4090 has more FP32 performance than the H100, though.

    • siazdghw@alien.topB
      1 year ago

      It’s slightly complex, as there are two metrics it needs to be under.

      See this chart:

      https://cdn.mos.cms.futurecdn.net/dHjnhPMk93HuDPBYnXBzLV.png

      Total Processing Performance (TPP) is essentially the listed processing power multiplied by the bit length of the operation (e.g., FLOPS or TOPS × 8/16/32/64), without sparsity.

      Performance Density is calculated by dividing TPP by the die area, measured in square millimeters.
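
To make the two metrics concrete, here is a rough sketch that plugs in the dense FP8 figures quoted earlier in the thread. The die areas (about 609 mm² for the 4090 and about 814 mm² for the H100) are my own assumptions and were not given above, and the function names are just illustrative. It only computes the two numbers; it does not encode the actual limits from the chart.

```python
def total_processing_performance(tera_ops: float, bit_length: int) -> float:
    """TPP: dense (non-sparse) TFLOPS or TOPS multiplied by the operation's bit length."""
    return tera_ops * bit_length


def performance_density(tpp: float, die_area_mm2: float) -> float:
    """Performance density: TPP divided by die area in square millimeters."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return tpp / die_area_mm2


# name -> (dense FP8 TFLOPS as quoted in the thread, ASSUMED die area in mm^2)
GPUS = {
    "RTX 4090": (660.0, 609.0),
    "H100": (1930.0, 814.0),
}

if __name__ == "__main__":
    for name, (fp8_tflops, die_area) in GPUS.items():
        tpp = total_processing_performance(fp8_tflops, bit_length=8)
        density = performance_density(tpp, die_area)
        print(f"{name}: TPP = {tpp:.0f}, performance density = {density:.2f}")
```

With those inputs the 4090 comes out at a TPP of 5280, and whether that clears the limits depends on the thresholds in the chart linked above.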