CUDA - Is a shared memory address passed to a device function still shared memory?
Let's say I have this __device__ function:

    __device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){ return array_sh + params; }

and within a __global__ kernel I use it this way:

    uarray = dev_kernel(uarray, params);

where uarray is an array located in shared memory.
But when I use cuda-gdb to see the address of uarray, within the __global__ kernel I get:

    (@generic unsigned char * @shared) 0x1000010 "z\377*"

and within the __device__ function I get:

    (unsigned char * @generic) 0x1000010 <error reading address 0x1000010: operation not permitted>

Despite the error, the program runs OK (maybe this is a limitation of cuda-gdb).
So, I want to know: within the __device__ function, is uarray still shared? I am changing the array from global to shared memory, and the time is about the same (with shared memory the time is a little worse).
Yes, when you pass a pointer to shared memory to a device function this way, it still points to the same place in shared memory.
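One way to check this at run time, independent of the debugger, is to ask the device whether a generic address falls in the shared window. The following is only a sketch and not part of the original session: the kernel name check_kernel and the buffer global_buf are made up for illustration, and it assumes a CUDA toolkit recent enough to provide the __isShared() address space predicate.

```cuda
#include <cstdio>

// Same helper as in the question, plus a diagnostic print.
__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
  // __isShared() returns nonzero if the generic address lies in shared memory.
  printf("inside dev_kernel: __isShared = %u\n", __isShared(array_sh));
  return array_sh + params;
}

// Hypothetical kernel: calls the helper once with a shared pointer
// and once with a global pointer, for comparison.
__global__ void check_kernel(unsigned char* global_buf){
  __shared__ unsigned char myshared[256];
  unsigned char* a = dev_kernel(myshared, 5);
  unsigned char* b = dev_kernel(global_buf, 5);
  printf("returned pointers: from shared = %u, from global = %u\n",
         __isShared(a), __isShared(b));
}

int main(){
  unsigned char* global_buf = nullptr;
  cudaMalloc(&global_buf, 256);      // global memory buffer used as a contrast case
  check_kernel<<<1,1>>>(global_buf);
  cudaDeviceSynchronize();           // wait so the device printf output is flushed
  cudaFree(global_buf);
  return 0;
}
```

If the pointer keeps pointing into shared memory, the predicate should report nonzero both inside dev_kernel and for the returned pointer in the shared case, and zero in the global case.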
In response to the questions posted below, which are perplexing me, I elected to show a simple example:
    $ cat t249.cu
    #include <stdio.h>

    #define ssize 256

    __device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
      return array_sh + params;
    }

    __global__ void mykernel(){
      __shared__ unsigned char myshared[ssize];
      __shared__ unsigned char *u_array;
      for (int i = 0; i< ssize; i++)
        myshared[i] = (unsigned char) i;
      unsigned char *loc = dev_kernel(myshared, 5);
      u_array = loc;
      printf("val = %d\n", *loc);
      printf("val = %d\n", *u_array);
    }

    int main(){
      mykernel<<<1,1>>>();
      cudaDeviceSynchronize();
      return 0;
    }
    $ nvcc -arch=sm_20 -g -G -o t249 t249.cu
    $ cuda-gdb ./t249
    NVIDIA (R) CUDA Debugger
    5.5 release
    ....
    Reading symbols from /home/user2/misc/t249...done.
    (cuda-gdb) break mykernel
    Breakpoint 1 at 0x4025dc: file t249.cu, line 9.
    (cuda-gdb) run
    Starting program: /home/user2/misc/t249
    [Thread debugging using libthread_db enabled]

    Breakpoint 1, mykernel () at t249.cu:9
    9       __global__ void mykernel(){
    (cuda-gdb) break 14
    Breakpoint 2 at 0x4025e1: file t249.cu, line 14.
    (cuda-gdb) continue
    Continuing.
    [New Thread 0x7ffff725a700 (LWP 26184)]
    [Context Create of context 0x67e360 on Device 0]
    [Launch of CUDA Kernel 0 (mykernel<<<(1,1,1),(1,1,1)>>>) on Device 0]
    [Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 2, warp 0, lane 0]

    Breakpoint 1, mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:12
    12        for (int i = 0; i< ssize; i++)
    (cuda-gdb) continue
    Continuing.

    Breakpoint 2, mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:14
    14        unsigned char *loc = dev_kernel(myshared, 5);
    (cuda-gdb) print &(myshared[0])
    $1 = (@shared unsigned char *) 0x8 ""
          ^
          |
          cuda-gdb is telling you that this pointer is defined in a __shared__ statement, and therefore its storage is implicit and unmodifiable.
    (cuda-gdb) print &(u_array)
    $2 = (@generic unsigned char * @shared *) 0x0
          ^                        ^
          |                        u_array is stored in shared memory.
          u_array is a generic pointer, meaning it can point to anything.
    (cuda-gdb) step
    dev_kernel(unsigned char * @generic, int) (array_sh=0x1000008 "", params=5) at t249.cu:6
    6         return array_sh + params;
    (cuda-gdb) print array_sh
    $3 = (@generic unsigned char * @register) 0x1000008 ""
          ^                        ^
          |                        array_sh is stored in a register.
          array_sh is a generic pointer, it can point to anything.
    (cuda-gdb) print u_array
    No symbol "u_array" in current context.
    (Note that you can't access u_array from inside the __device__ function, so I don't understand your comment there.)
    (cuda-gdb) step
    mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:15
    15        u_array = loc;
    (cuda-gdb) step
    16        printf("val = %d\n", *loc);
    (cuda-gdb) print u_array
    $4 = ( @generic unsigned char * @shared) 0x100000d ......
           ^                        ^
           |                        u_array is stored in shared memory.
           u_array is a generic pointer, it can point to anything.
    (cuda-gdb)
Although you haven't provided it, I am assuming your definition of u_array is similar to mine, based on the cuda-gdb output you are getting.
Note that indicators like @shared are not telling you what kind of memory a pointer is pointing to. They are telling you either what kind of pointer it is (one defined implicitly in a __shared__ statement) or else where the pointer itself is stored (in shared memory).
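The difference between where a pointer is stored and what it points to can be seen without the debugger. This is a rough sketch under the same assumptions as before: where_kernel and global_buf are hypothetical names, and __isShared() must be available in your toolkit. The pointer variable u_array lives in shared memory the whole time, while its target switches from global to shared memory.

```cuda
#include <cstdio>

// Hypothetical kernel: u_array is always STORED in shared memory,
// but what it POINTS TO changes between the two prints.
__global__ void where_kernel(unsigned char* global_buf){
  __shared__ unsigned char myshared[256];
  __shared__ unsigned char* u_array;   // the pointer variable itself is in shared memory

  u_array = global_buf + 5;            // target is in global memory
  printf("storage in shared: %u, target in shared: %u\n",
         __isShared(&u_array), __isShared(u_array));

  u_array = myshared + 5;              // target is now in shared memory
  printf("storage in shared: %u, target in shared: %u\n",
         __isShared(&u_array), __isShared(u_array));
}

int main(){
  unsigned char* global_buf = nullptr;
  cudaMalloc(&global_buf, 256);
  where_kernel<<<1,1>>>(global_buf);   // single thread, so no race on u_array
  cudaDeviceSynchronize();
  cudaFree(global_buf);
  return 0;
}
```

The storage answer should be nonzero in both prints, while the target answer should change from zero to nonzero. That storage location is what the trailing @shared in the cuda-gdb type of u_array describes.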
If this doesn't sort out your questions, please provide a complete example, along with the complete cuda-gdb session output, just as I have.