CUDA and its questions

Some questions were raised during Rupesh’s GPU class today.

CAUTION: Not sure about device_reference is focus!? L290 device_reference.inl L2+42 reference.h L82 reference.inl at operator= L65

Why is the maximum blockDim.z 64, whereas the maximum for x and y is 1024?
That is simply how CUDA GPUs are designed; these are fixed limits of the hardware.
Why is the maximum gridDim.y or gridDim.z not \(2^{16}\) but \(2^{16}-1\)?
Good question! The reason is unknown. Some curious people 1
Some people 2: "Since block IDs begin at zero, shouldn’t the maximum dimension therefore be 65536? Having the limit as 65535 allows for block IDs [0, 65534], so we’re losing one ID."
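The arithmetic behind that question can be sketched in a few lines of host-only C++. This is only an illustration: the names kMaxGridDimYZ, max_block_id and fits_in_16_bits are hypothetical helpers, not part of the CUDA API. The 16-bit observation is a fact about the number 65535, not a confirmed explanation of why the limit was chosen.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Grid limit for y and z quoted in the notes: 2^16 - 1.
constexpr std::uint32_t kMaxGridDimYZ = 65535;

// Hypothetical helper: with grid_dim blocks along an axis, IDs run 0 .. grid_dim - 1.
// Assumes grid_dim >= 1.
constexpr std::uint32_t max_block_id(std::uint32_t grid_dim) {
    return grid_dim - 1;
}

// Hypothetical helper: true if the value can be stored in an unsigned 16-bit integer.
constexpr bool fits_in_16_bits(std::uint32_t value) {
    return value <= std::numeric_limits<std::uint16_t>::max();
}

// Checks the numbers discussed in the question above.
inline void check_grid_examples() {
    assert(max_block_id(kMaxGridDimYZ) == 65534);  // IDs are [0, 65534]
    assert(fits_in_16_bits(kMaxGridDimYZ));        // the count 65535 fits in 16 bits
    assert(!fits_in_16_bits(kMaxGridDimYZ + 1));   // the count 65536 would not
}
```

So a limit of 65535 lets the dimension itself (the count, not the largest ID) be stored in 16 bits, at the cost of the one block ID mentioned in the quote. Whether that is the actual design reason is an assumption.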
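Does the GTX 680 have a limit of 2048 threads per thread block?
I think NO. See this! and Table F2 in Guide v4. From compute capability (CC) 2.0 onward, the limit is 1024 threads per block. The 2048 figure is most likely the maximum number of resident threads per multiprocessor, which is a separate limit.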
Valid limits of kernel launches
There are multiple limits. All must be satisfied.
1. The maximum number of threads in the block is limited to 1024. This is the product of your threadblock dimensions (x*y*z).
2. The maximum x-dimension is 1024. (1024,1,1) is legal. (1025,1,1) is not legal.
3. The maximum y-dimension is 1024. (1,1024,1) is legal. (1,1025,1) is not legal.
4. The maximum z-dimension is 64. (1,1,64) is legal. (2,2,64) is also legal. (1,1,65) is not legal.
Also, a threadblock dimension of 0 in any position is not legal.
Your choice of threadblock dimensions (x,y,z) must satisfy each of the rules 1-4 above.
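The rules above can be sketched as a small host-side check. This is a minimal sketch in plain C++, so it also compiles as CUDA host code. The function name is_valid_block_dim and the constants are hypothetical, and the limits are hard-coded from the notes (CC 2.0+) instead of being queried from the device.

```cpp
#include <cassert>
#include <cstdint>

// Limits as stated in rules 1-4 (assumed to hold for CC 2.0 and later).
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kMaxBlockDimX = 1024;
constexpr std::uint64_t kMaxBlockDimY = 1024;
constexpr std::uint64_t kMaxBlockDimZ = 64;

// Hypothetical helper: true only if (x, y, z) satisfies every rule at once.
// The per-dimension checks come first, so the product cannot overflow.
constexpr bool is_valid_block_dim(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
    return x >= 1 && y >= 1 && z >= 1                  // no dimension may be 0
        && x <= kMaxBlockDimX                          // rule 2
        && y <= kMaxBlockDimY                          // rule 3
        && z <= kMaxBlockDimZ                          // rule 4
        && x * y * z <= kMaxThreadsPerBlock;           // rule 1
}

// Re-checks the examples given in the rules.
inline void check_block_dim_examples() {
    assert(is_valid_block_dim(1024, 1, 1));
    assert(!is_valid_block_dim(1025, 1, 1));
    assert(is_valid_block_dim(1, 1024, 1));
    assert(!is_valid_block_dim(1, 1025, 1));
    assert(is_valid_block_dim(1, 1, 64));
    assert(is_valid_block_dim(2, 2, 64));   // 256 threads in total
    assert(!is_valid_block_dim(1, 1, 65));
    assert(!is_valid_block_dim(0, 1, 1));
}
```

Note that a block such as (32,32,2) passes rules 2-4 but fails rule 1, since 32*32*2 = 2048 threads. In real code the limits would normally be read from the device at run time, not hard-coded.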

★ 2 min read · Rajesh Pandian M · 18-Mar-2020 15:49:43 (IST) ★ cuda