Chapter 11 Performance Optimization in CUDA