CUDA and its questions
Here are some questions that were raised during Rupesh’s GPU class today.
- What is the sequence of actions for ‘D[1]=10’ in CUDA/Thrust?
- first the index operator
- in device_vector.h, lines 240+290/300; the comment there says: “declare these members for the purpose of Doxygenating them. they actually exist in a derived-from class”
- then vector_base.h L242 L42
- then contiguous_storage.h
- in thrust::detail::contiguous_storage::reference L36
- then operator[] at line 173
CAUTION: Not sure whether device_reference is the focus here!? L290; device_reference.inl L2+42; reference.h L82; reference.inl at operator= L65
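The chain above ends in a reference-like object rather than a plain `int&`: in Thrust, `device_vector::operator[]` returns a `device_reference<T>`, and assigning to that proxy is what moves the value into device memory. Below is a minimal host-only sketch of that proxy pattern. It is a toy analogue, not Thrust's code: `FakeDevice`, `ToyDeviceReference` and `ToyDeviceVector` are hypothetical names, and the "device" is just a byte buffer that is only touched through explicit copies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for device memory: a host buffer that is only
// accessed through explicit copy helpers, mimicking host<->device transfers.
struct FakeDevice {
    std::vector<unsigned char> bytes;
    int copies_to_device = 0;  // counts simulated host-to-device transfers

    explicit FakeDevice(std::size_t n) : bytes(n) {}

    void write(std::size_t offset, const void* src, std::size_t n) {
        assert(offset + n <= bytes.size());
        std::memcpy(bytes.data() + offset, src, n);
        ++copies_to_device;
    }
    void read(std::size_t offset, void* dst, std::size_t n) const {
        assert(offset + n <= bytes.size());
        std::memcpy(dst, bytes.data() + offset, n);
    }
};

// Toy analogue of a device reference: remembers where the element lives
// and turns assignment/conversion into explicit copies.
template <typename T>
class ToyDeviceReference {
public:
    ToyDeviceReference(FakeDevice& dev, std::size_t index)
        : dev_(dev), index_(index) {}

    // D[i] = value  ->  copy one element "to the device".
    ToyDeviceReference& operator=(const T& value) {
        dev_.write(index_ * sizeof(T), &value, sizeof(T));
        return *this;
    }

    // T x = D[i]  ->  copy one element "back to the host".
    operator T() const {
        T value;
        dev_.read(index_ * sizeof(T), &value, sizeof(T));
        return value;
    }

private:
    FakeDevice& dev_;
    std::size_t index_;
};

// Toy analogue of a device vector: operator[] hands out the proxy.
template <typename T>
class ToyDeviceVector {
public:
    explicit ToyDeviceVector(std::size_t n) : dev_(n * sizeof(T)), size_(n) {}

    ToyDeviceReference<T> operator[](std::size_t i) {
        assert(i < size_);
        return ToyDeviceReference<T>(dev_, i);
    }

    int transfers() const { return dev_.copies_to_device; }

private:
    FakeDevice dev_;
    std::size_t size_;
};
```

With this sketch, `ToyDeviceVector<int> D(4); D[1] = 10;` first calls `operator[]` to build the proxy and then the proxy's `operator=`, which performs exactly one simulated transfer. In real Thrust each such element assignment from host code is likewise a separate small copy, which is why it is slow inside loops.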
- Why is the limit on blockDim.z 64 whereas for x and y it is 1024?
That is how it is designed to be in CUDA/GPU.
- Why is the limit on gridDim.y or z not \(2^{16}\) but \(2^{16}-1\)?
Good question! Unknown! Some curious people 1, some people 2: “Since block IDs begin at zero, shouldn’t the maximum dimension therefore be 65536? Having the limit as 65535 allows for block IDs [0, 65534], so we’re losing one ID.”
- Does the GTX 680 have a limit of 2048 threads per thread block?
I think NO. See this! and table F2 in Guide v4. From CC v2.0+ it is 1024.
- Valid limits of kernel launches
There are multiple limits. All must be satisfied.
- The maximum number of threads in the block is limited to 1024. This is the product of your threadblock dimensions (x*y*z).
- The maximum x-dimension is 1024. (1024,1,1) is legal. (1025,1,1) is not legal.
- The maximum y-dimension is 1024. (1,1024,1) is legal. (1,1025,1) is not legal.
- The maximum z-dimension is 64. (1,1,64) is legal. (2,2,64) is also legal. (1,1,65) is not legal. Also, threadblock dimensions of 0 in any position are not legal.
Your choice of threadblock dimensions (x,y,z) must satisfy each of the four rules above.
Source: Devtalk
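The four rules can be written down as a small host-side check. This is only a sketch under the limits quoted above (CC 2.0+): `Dim3`, `isValidBlockDim` and `threadsPerBlock` are hypothetical names, and `Dim3` is a plain C++ mirror of CUDA's `dim3` (unspecified components default to 1) so that the example compiles without nvcc. On a real device the limits should be read from `cudaGetDeviceProperties` rather than hard-coded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-only mirror of CUDA's dim3:
// any component left unspecified is initialized to 1.
struct Dim3 {
    unsigned int x, y, z;
    Dim3(unsigned int x_ = 1, unsigned int y_ = 1, unsigned int z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Limits as quoted in the notes above (assumed, not queried from a device).
const unsigned int kMaxBlockX = 1024;
const unsigned int kMaxBlockY = 1024;
const unsigned int kMaxBlockZ = 64;
const std::uint64_t kMaxThreadsPerBlock = 1024;

// Returns true only if every rule holds at the same time.
inline bool isValidBlockDim(const Dim3& b) {
    // A dimension of 0 in any position is not legal.
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    // Per-dimension limits: x <= 1024, y <= 1024, z <= 64.
    if (b.x > kMaxBlockX || b.y > kMaxBlockY || b.z > kMaxBlockZ) return false;
    // Total threads per block: x*y*z <= 1024 (64-bit product avoids overflow).
    return static_cast<std::uint64_t>(b.x) * b.y * b.z <= kMaxThreadsPerBlock;
}

// Number of threads in a block; only meaningful for a valid configuration.
inline std::uint64_t threadsPerBlock(const Dim3& b) {
    assert(isValidBlockDim(b));
    return static_cast<std::uint64_t>(b.x) * b.y * b.z;
}
```

For example, `isValidBlockDim(Dim3(2, 2, 64))` is true (256 threads), while `isValidBlockDim(Dim3(32, 32, 2))` is false: each dimension is within its own limit, but the product is 2048.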
dim3 is actually a struct based on uint3; any component left unspecified is initialized to 1.
int3 v = make_int3(2, 3, 4);