Thrust library and CUDA API
For a long time, I was under the impression that Thrust functions could be invoked only from host code (for example, from the main function). I just learned that, since Thrust v1.8, Thrust functions can also be invoked inside a kernel:
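#include <cstdio>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>
__global__ void test(float *d_A, int N) {
    // thrust::seq runs the reduction sequentially inside the calling thread
    float sum = thrust::reduce(thrust::seq, d_A, d_A + N);
    printf("Device side result = %f\n", sum);
}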
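To try this out, the kernel needs a host program that allocates the array and launches it. The following is only a minimal sketch under a few assumptions of mine: the array size of 8, the fill value of 1.0f, and the single-thread launch configuration are all hypothetical choices for illustration, not something from the original snippet. It should build with something like `nvcc example.cu -o example`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>

// Every thread that executes this kernel performs its own sequential
// reduction over the whole array, because thrust::seq does not parallelize.
__global__ void test(const float *d_A, int N) {
    float sum = thrust::reduce(thrust::seq, d_A, d_A + N);
    printf("Device side result = %f\n", sum);
}

int main() {
    const int N = 8;  // hypothetical size, chosen for illustration
    float h_A[N];
    for (int i = 0; i < N; ++i) {
        h_A[i] = 1.0f;  // hypothetical data: all ones
    }

    // Copy the input to device memory so the kernel can read it.
    float *d_A = nullptr;
    if (cudaMalloc(&d_A, N * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_A, h_A, N * sizeof(float), cudaMemcpyHostToDevice);

    // One block with one thread, so the reduction runs (and prints) once.
    test<<<1, 1>>>(d_A, N);

    // Wait for the kernel to finish; this also flushes device-side printf.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_A);
    return err == cudaSuccess ? 0 : 1;
}
```

Launching with more than one thread would make each thread repeat the same sequential sum and print it again, which is why the sketch uses `<<<1, 1>>>`. With `thrust::seq`, the call is a convenience for per-thread work rather than a parallel reduction across the grid.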