This is a follow up on the “Pascal and Polaris” post I did some time ago.
As you might remember, based on the pre-released benchmarks and some rough estimations, I got to the conclusion that the 1080/1070 (Pascal) seems to have exactly the same performance per clock as Titan X (Maxwell). It seemed to be about 40% more energy efficient, which is the expected benefit from a new process (16nm) and the FinFet transistors. Being cheaper is a bonus.
Without further ado, here is how a streaming multiprocessor looks like in P100.
We can see a few things – there are double precision (DP) calculation units on top of the single precision 32bit (core) ones, we have 32kb registers for each 32FP units (1 to 1 ratio), we have 64kb of shared memory for 64 FP units (1 to 1 ratio).
Lets move to 1080 (GP104).
We can spot easily a few differences. First, there are no double precision (64bit) units. Second, we can see that there are 16kb of register for 32 single precision (1 : 2 ratio), there is 96kb of shared for 128 cores (1 : 1.3 ratio).
Now, let me get from the Maxwell white paper how the old Maxwell (Titan X/980/970/etc) streaming multiprocessors looked like.
So, do you see the difference between the Maxwell SMX and the GP104 SMX ? Well, no need to scroll up, because there is no difference. The GP104 is actually the good old Maxwell architecture build into 16nm transistors, instead of 28nm ones. With very few (if any) adjustments. And thats why the performance per clock of GP104 is exactly the same as Maxwell GM204. Making architecture work with two different memory types (GDDR vs HBM) is very hard from what I have heard, and I always asked myself how NVIDIA are going to achieve that. Now I know.
This of course means that all the GPGPU features of Pascal, like native half 16bit and double 64bit support, more registers, more shared memory, shared virtual memory with the CPU, HBM2.0 and NVLINK will be exclusive for the premium professional Tesla P100 (GP100). The compute preemption seems to be in the 1080 GP104 however. And the Pascal Quadro/Tesla GPUs seems to be quite different to the gamer ones, at last.
I can guess that this will be the case until the HBM2 becomes cheaper. Which will take a while.