How a GPU is broken down usually determines how other things are shared between those ALUs. I’ll use ARC Alchemist for this because I have the spec sheet for it.
The A770 is broken down into 32 Xe Cores. This means it has 4096 shading units, 256 TMUs, 128 ROPs, 512 Execution Units, 512 Tensor Cores, 32 RT Cores, and 16MB of L2 cache.
You can also think of this as each Xe Core being made of 128 shading units, 8 TMUs, 4 ROPs, 16 Execution Units, 16 Tensor Cores, 1 RT Cores, and 512KB of L2 cache.
That Xe Core is the smallest unit you could break an Alchemist GPU into and still have every part of the larger whole. You can’t literally just do that, cut the GPU in half, but if I had to draw a diagram of one that is what would be in each Xe Core block.
I’m not going to get into the technical side of how GPU design works, partly because that’s an entire doctorate thesis to write out, and also because I work on the CPU side and those guys are wizards to me.
Reduce your undervolt slightly. -120mv is possibly stable enough given it lasted a few minutes.