
980, 1080, P100

GPU board close shot

This is a follow-up to the “Pascal and Polaris” post I did some time ago.

As you might remember, based on the pre-release benchmarks and some rough estimates, I came to the conclusion that the 1080/1070 (Pascal) seems to have exactly the same performance per clock as the Titan X (Maxwell). It seemed to be about 40% more energy efficient, which is the expected benefit of a new process (16nm) and FinFET transistors. Being cheaper is a bonus.

Well, today we have the Tesla P100 (GP100 core) white paper, as well as the GTX 1080/1070 (GP104 core) white paper. And here are some important things …

Without further ado, here is what a streaming multiprocessor looks like in the P100.

Pascal GP100 SMX design

We can see a few things: there are double precision (DP) units on top of the single precision 32-bit ones (the "cores"), there are 32K 32-bit registers for every 32 FP32 units (1 to 1 ratio), and there is 64KB of shared memory for 64 FP32 units (1 to 1 ratio).

Let's move on to the 1080 (GP104).
Pascal GP104 SMX design

We can easily spot a few differences. First, there are no double precision (64-bit) units. Second, there are 16K 32-bit registers for every 32 single precision units (1 : 2 ratio), and there is 96KB of shared memory for 128 cores (1 : 1.3 ratio).
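
To make the ratios above concrete, here is a rough sketch that divides the quoted resources by the number of FP32 cores they serve. It takes the figures in this post at face value; the helper name `per_core` and the layout of the `chips` table are made up for illustration and do not come from either white paper.

```python
# Per-core resources, using only the figures quoted in this post.
# Assumption: the register figure is counted per 32-core block and the
# shared memory figure per whole SM, as described in the text above.

def per_core(amount, cores):
    """Return how much of a resource is available per FP32 core."""
    return amount / cores

chips = {
    # name: (register figure per block, cores per block, shared memory per SM, cores per SM)
    "GP100": (32, 32, 64, 64),
    "GP104": (16, 32, 96, 128),
}

ratios = {}
for name, (regs, block_cores, shared, sm_cores) in chips.items():
    ratios[name] = (per_core(regs, block_cores), per_core(shared, sm_cores))
    print(f"{name}: {ratios[name][0]:.2f} register units/core, "
          f"{ratios[name][1]:.2f} shared units/core")
```

Read this way, each GP104 core gets half the registers and three quarters of the shared memory that a GP100 core gets, which is where the 1 : 2 and 1 : 1.3 ratios come from.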

Now, let me pull up from the Maxwell white paper what the old Maxwell (Titan X/980/970/etc.) streaming multiprocessors looked like.
Maxwell SMX design

So, do you see the difference between the Maxwell SMX and the GP104 SMX? Well, no need to scroll up, because there is no difference. The GP104 is actually the good old Maxwell architecture built on 16nm transistors instead of 28nm ones, with very few (if any) adjustments. And that's why the performance per clock of GP104 is exactly the same as Maxwell GM204. Making an architecture work with two different memory types (GDDR vs HBM) is very hard from what I have heard, and I always asked myself how NVIDIA was going to achieve that. Now I know.
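
If the per-clock performance really is identical, any gain in peak throughput has to come from core count and clock speed alone. The sketch below illustrates that with the usual "2 FLOPs per core per clock" rule of thumb (one fused multiply-add). The core counts and boost clocks are assumptions taken from the public spec sheets of the GTX 980 and GTX 1080, not from this post, and `peak_gflops` is a hypothetical helper.

```python
# Theoretical peak FP32 throughput, assuming 2 FLOPs (one FMA) per core per clock
# on both architectures. Specs below are assumed from public spec sheets.

FLOPS_PER_CORE_PER_CLOCK = 2

def peak_gflops(cores, clock_mhz):
    """Peak single precision throughput in GFLOPS."""
    return FLOPS_PER_CORE_PER_CLOCK * cores * clock_mhz / 1000.0

gtx_980 = {"cores": 2048, "clock_mhz": 1216}   # GM204, boost clock (assumed)
gtx_1080 = {"cores": 2560, "clock_mhz": 1733}  # GP104, boost clock (assumed)

core_gain = gtx_1080["cores"] / gtx_980["cores"]
clock_gain = gtx_1080["clock_mhz"] / gtx_980["clock_mhz"]
speedup = peak_gflops(**gtx_1080) / peak_gflops(**gtx_980)

print(f"more cores:   x{core_gain:.2f}")
print(f"higher clock: x{clock_gain:.2f}")
print(f"peak FP32:    x{speedup:.2f}")
```

Under these assumptions the whole speedup factors into "more cores" times "higher clock", with nothing left over for architectural improvements. Real benchmarks also depend on memory bandwidth and sustained clocks, so treat this as an upper-bound estimate only.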

This of course means that all the GPGPU features of Pascal, like native half (16-bit) and double (64-bit) support, more registers, more shared memory, shared virtual memory with the CPU, HBM2 and NVLink, will be exclusive to the premium professional Tesla P100 (GP100). Compute preemption seems to be in the 1080 (GP104), however. And the Pascal Quadro/Tesla GPUs seem to be quite different from the gamer ones, at last.

My guess is that this will remain the case until HBM2 becomes cheaper, which will take a while.
