Chapter 5 Exercise
- No. Each thread uses only its own data; there is no data sharing across threads.
- “I did some mental calculation”
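- RAW and WAR hazards can occur and make the results wrong. Without the first __syncthreads(), a thread may read a tile element before another thread has written it (RAW); without the second, a thread may overwrite a tile element that another thread is still reading (WAR).
- Shared memory lets the threads of a block share data; registers are private to each thread.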
- 1/32
- 512*1000=512000
- 1000
- a. N b. N/T
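
The N and N/T counts can be checked by brute force. The sketch below is a host-side model of the access pattern only, not a kernel. It assumes one thread per output element, N divisible by T, and counts requests for the left input matrix only (the right one is symmetric). The function names are made up for illustration.

```cpp
#include <vector>

// Untiled: thread (row, col) requests M[row][k] from global memory for every k.
std::vector<int> count_requests_untiled(int n) {
    std::vector<int> requests(n * n, 0);
    for (int row = 0; row < n; ++row)
        for (int col = 0; col < n; ++col)
            for (int k = 0; k < n; ++k)
                ++requests[row * n + k];
    return requests;
}

// Tiled: in each phase, every thread of a T x T block loads exactly one
// element of M into shared memory; the rest of the block reuses it from there.
// Assumes n is a multiple of t.
std::vector<int> count_requests_tiled(int n, int t) {
    std::vector<int> requests(n * n, 0);
    const int tiles = n / t;
    for (int by = 0; by < tiles; ++by)              // block row
        for (int bx = 0; bx < tiles; ++bx)          // block column
            for (int ph = 0; ph < tiles; ++ph)      // tiling phase
                for (int ty = 0; ty < t; ++ty)
                    for (int tx = 0; tx < t; ++tx) {
                        const int row = by * t + ty;
                        const int col = ph * t + tx;
                        ++requests[row * n + col];
                    }
    return requests;
}
```

With n = 8 and t = 2, every entry of the untiled result is 8 (N) and every entry of the tiled result is 4 (N/T).
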
- 36 FLOP / (7 * 4 B) = 1.29 FLOP/B.
a. 200/100 = 2 FLOP/B; 1.29 < 2, so memory bound
b. 300/250 = 1.2 FLOP/B; 1.29 > 1.2, so compute bound
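
The same comparison as a compile-time check. This is a minimal sketch: the helper names are made up, and the rule used is the simple one from the answer above (memory bound when the kernel's FLOP/B is below the device's peak FLOPS divided by peak bandwidth).

```cpp
// Arithmetic intensity of the kernel in FLOP per byte of global memory traffic.
constexpr double arithmetic_intensity(double flops, double accesses,
                                      double bytes_per_access) {
    return flops / (accesses * bytes_per_access);
}

// FLOP/B a kernel needs before the device's compute peak becomes the limit.
constexpr double machine_balance(double peak_gflops, double peak_gb_per_s) {
    return peak_gflops / peak_gb_per_s;
}

// True when the kernel's intensity is below the device's balance point.
constexpr bool is_memory_bound(double intensity, double peak_gflops,
                               double peak_gb_per_s) {
    return intensity < machine_balance(peak_gflops, peak_gb_per_s);
}

// 36 FLOP over seven 4-byte accesses, as in the answer above.
constexpr double kIntensity = arithmetic_intensity(36.0, 7.0, 4.0);

static_assert(is_memory_bound(kIntensity, 200.0, 100.0), "device a: memory bound");
static_assert(!is_memory_bound(kIntensity, 300.0, 250.0), "device b: compute bound");
```
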
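- a. BLOCK_SIZE is a typo; it should be BLOCK_WIDTH. If BLOCK_WIDTH is 1, the code is correct.
b. The root cause is a missing barrier between writing the shared tile and reading it back transposed. Add __syncthreads() after line 10.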
- a. reg. 1024
b. local mem. 1024
c. shared mem. 8
d. shared mem. 8
e. 129 * 4 = 516 bytes
f. a is accessed 4 times at line 7. 10 FLOP at line 14. b is accessed 1 time at line 14. 10 / ((4 + 1) * 4) = 0.5 FLOP/B
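- a. 2048/64 = 32 blocks. 2048 * 27 = 55296 regs. 4 KB * 32 = 128 KB shared memory, more than the 96 KB per SM given in the exercise, so only 24 blocks (1536 threads) fit. NOT full occupancy; shared memory is the limit
b. 2048/256 = 8 blocks. 2048 * 31 = 63488 regs. 8 KB * 8 = 64 KB shared memory. Full occupancy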
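
The occupancy check can be written as a small helper. This is a rough sketch, not a replacement for the CUDA occupancy API. The struct and function names are made up. The SM limits used in the usage note (2048 threads, 32 blocks, 65,536 registers, 96 KB shared memory) are assumed from the exercise statement, and shared memory is treated as per block, as in the answer above. Real hardware also rounds register and shared memory allocations, which this ignores.

```cpp
#include <algorithm>

struct SmLimits {
    int max_threads;     // resident threads per SM
    int max_blocks;      // resident blocks per SM
    int max_regs;        // registers per SM
    int max_smem_bytes;  // shared memory per SM
};

struct KernelUse {
    int threads_per_block;
    int regs_per_thread;
    int smem_per_block_bytes;
};

// Resident blocks per SM: the tightest of the four limits.
inline int resident_blocks(const SmLimits& sm, const KernelUse& k) {
    const int by_threads = sm.max_threads / k.threads_per_block;
    const int by_regs = sm.max_regs / (k.regs_per_thread * k.threads_per_block);
    const int by_smem = k.smem_per_block_bytes > 0
                            ? sm.max_smem_bytes / k.smem_per_block_bytes
                            : sm.max_blocks;
    return std::min({by_threads, sm.max_blocks, by_regs, by_smem});
}

// Full occupancy means the resident blocks fill every thread slot of the SM.
inline bool full_occupancy(const SmLimits& sm, const KernelUse& k) {
    return resident_blocks(sm, k) * k.threads_per_block == sm.max_threads;
}
```

With `SmLimits{2048, 32, 65536, 96 * 1024}`, `KernelUse{64, 27, 4 * 1024}` gives 24 resident blocks (shared memory is the limit), and `KernelUse{256, 31, 8 * 1024}` gives 8 blocks and full occupancy.
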