CUDA - Is a shared memory address passed to a device function still shared memory?
Let's say I have a __device__ function:

    __device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){ return array_sh + params; }

and within a __global__ kernel I use it this way:

    uarray = dev_kernel(uarray, params);

where uarray is an array located in shared memory.
But when I use cuda-gdb to look at the address of uarray within the __global__ kernel, I get:

    (@generic unsigned char * @shared) 0x1000010 "z\377*"

and within the __device__ function I get:

    (unsigned char * @generic) 0x1000010 <error reading address 0x1000010: operation not permitted>

Despite the error, the program runs OK (maybe this is a limitation of cuda-gdb).

So, I want to know: within the __device__ function, is uarray still in shared memory? I'm moving the array from global to shared memory, and the run time is almost the same (with shared memory the time is a little worse).
"So, I want to know: within the __device__ function, is uarray still in shared memory?"
Yes, when you pass a pointer to shared memory to a device function that way, it still points to the same place in shared memory.

In response to the questions posted below, which are perplexing to me, I have elected to show a simple example:
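As a cross-check that does not depend on the debugger, the claim can also be tested from device code. The following is a minimal sketch, not part of the original answer: it assumes a CUDA toolkit recent enough to provide the address space predicate function `__isShared()`, and the names `offset_ptr`, `check_kernel`, and `run_check` are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper with the same shape as dev_kernel in the question.
__device__ unsigned char* offset_ptr(unsigned char* base, int offset) {
    return base + offset;
}

__global__ void check_kernel(unsigned int* result) {
    __shared__ unsigned char buf[256];
    for (int i = 0; i < 256; i++)
        buf[i] = (unsigned char)i;

    // The pointer crosses a __device__ function boundary as a generic pointer.
    unsigned char* p = offset_ptr(buf, 5);

    // __isShared() returns 1 if the generic address lies in shared memory.
    result[0] = __isShared(p);
    // Reading through the returned pointer yields the value stored at buf[5].
    result[1] = *p;
}

// Launches one thread and copies both results back to the host.
// Returns 0 on success, -1 on any CUDA error.
int run_check(unsigned int out[2]) {
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(unsigned int)) != cudaSuccess)
        return -1;
    check_kernel<<<1, 1>>>(d_out);
    cudaError_t err = cudaMemcpy(out, d_out, 2 * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}

int main() {
    unsigned int out[2] = {0, 0};
    if (run_check(out) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("is shared = %u, value = %u\n", out[0], out[1]);
    return 0;
}
```

If the pointer returned by the device function still refers to shared memory, the first number printed is 1, and the second is the 5 that was stored at index 5 of the shared array.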
$ cat t249.cu
#include <stdio.h>

#define ssize 256

__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
   return array_sh + params;
}

__global__ void mykernel(){
  __shared__ unsigned char myshared[ssize];
  __shared__ unsigned char *u_array;
  for (int i = 0; i< ssize; i++)
    myshared[i] = (unsigned char) i;
  unsigned char *loc = dev_kernel(myshared, 5);
  u_array = loc;
  printf("val = %d\n", *loc);
  printf("val = %d\n", *u_array);
}

int main(){
  mykernel<<<1,1>>>();
  cudaDeviceSynchronize();
  return 0;
}
$ nvcc -arch=sm_20 -g -G -o t249 t249.cu
$ cuda-gdb ./t249
NVIDIA (R) CUDA Debugger
5.5 release
....
Reading symbols from /home/user2/misc/t249...done.
(cuda-gdb) break mykernel
Breakpoint 1 at 0x4025dc: file t249.cu, line 9.
(cuda-gdb) run
Starting program: /home/user2/misc/t249
[Thread debugging using libthread_db enabled]

Breakpoint 1, mykernel () at t249.cu:9
9       __global__ void mykernel(){
(cuda-gdb) break 14
Breakpoint 2 at 0x4025e1: file t249.cu, line 14.
(cuda-gdb) continue
Continuing.
[New Thread 0x7ffff725a700 (LWP 26184)]
[Context Create of context 0x67e360 on Device 0]
[Launch of CUDA Kernel 0 (mykernel<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 2, warp 0, lane 0]

Breakpoint 1, mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:12
12        for (int i = 0; i< ssize; i++)
(cuda-gdb) continue
Continuing.

Breakpoint 2, mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:14
14        unsigned char *loc = dev_kernel(myshared, 5);
(cuda-gdb) print &(myshared[0])
$1 = (@shared unsigned char *) 0x8 ""
       ^
       |
       cuda-gdb is telling you that this pointer is defined in a __shared__ statement, and therefore its storage is implicit and unmodifiable.
(cuda-gdb) print &(u_array)
$2 = (@generic unsigned char * @shared *) 0x0
       ^                         ^
       |                         u_array is stored in shared memory.
       u_array is a generic pointer, meaning it can point to anything.
(cuda-gdb) step
dev_kernel(unsigned char * @generic, int) (array_sh=0x1000008 "", params=5) at t249.cu:6
6          return array_sh + params;
(cuda-gdb) print array_sh
$3 = (@generic unsigned char * @register) 0x1000008 ""
       ^                         ^
       |                         array_sh is stored in a register.
       array_sh is a generic pointer, so it can point to anything.
(cuda-gdb) print u_array
No symbol "u_array" in current context.
(Note that I can't access u_array from inside the __device__ function, so I don't understand your comment there.)
(cuda-gdb) step
mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:15
15        u_array = loc;
(cuda-gdb) step
16        printf("val = %d\n", *loc);
(cuda-gdb) print u_array
$4 = ( @generic unsigned char * @shared) 0x100000d ......
        ^                         ^
        |                         u_array is stored in shared memory.
        u_array is a generic pointer, so it can point to anything.
(cuda-gdb)

Although you haven't provided it, I am assuming your definition of u_array is similar to mine, based on the cuda-gdb output you are getting.
Note that indicators like @shared are not telling you what kind of memory a pointer is pointing to. They are telling you either what kind of pointer it is (one defined implicitly in a __shared__ statement) or else where the pointer itself is stored (in shared memory).

If this doesn't sort out your questions, please provide a complete example, along with complete cuda-gdb session output, just as I have.
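The difference between where a pointer is stored and what it points to can be made concrete with a small experiment. This is a hedged sketch rather than anything from the original answer: it assumes a toolkit that provides the address space predicate functions `__isShared()` and `__isGlobal()`, and the names `where_kernel` and `run_where` are hypothetical. The pointer variable is declared `__shared__`, much like u_array above, but is made to point at global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void where_kernel(unsigned char* gdata, unsigned int* result) {
    // The pointer variable itself is stored in shared memory...
    __shared__ unsigned char* ptr_in_shared;
    // ...but here it is made to point at global memory.
    ptr_in_shared = gdata;

    result[0] = __isShared(&ptr_in_shared); // storage of the pointer variable
    result[1] = __isShared(ptr_in_shared);  // target of the pointer
    result[2] = __isGlobal(ptr_in_shared);  // target of the pointer
}

// Launches one thread and copies the three flags back to the host.
// Returns 0 on success, -1 on any CUDA error.
int run_where(unsigned int out[3]) {
    unsigned char* d_data = nullptr;
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_data, 16) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_out, 3 * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }
    where_kernel<<<1, 1>>>(d_data, d_out);
    cudaError_t err = cudaMemcpy(out, d_out, 3 * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : -1;
}

int main() {
    unsigned int out[3] = {0, 0, 0};
    if (run_where(out) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("pointer stored in shared: %u\n", out[0]);
    printf("target in shared: %u\n", out[1]);
    printf("target in global: %u\n", out[2]);
    return 0;
}
```

A pointer that cuda-gdb would describe as stored @shared can therefore have a target outside shared memory, which is why the storage annotation alone says nothing about the memory being pointed to.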