➀ The Streaming Multiprocessor (SM) is the core compute module of the GPU; the SMs together execute the thread blocks that make up the entire kernel grid. ➁ Each SM consists of a SIMT front end and a SIMD back end and processes instructions in six stages: fetch, decode, issue, operand delivery, execute, and write back. ➂ The SM supports instruction-level parallelism but not out-of-order execution, and all threads of a warp execute the same instruction in SIMD fashion.
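The six stages and the lockstep behaviour of a warp can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not a description of real hardware: the warp width of 32, the tuple instruction format, the register names, and the `run_warp` helper are all hypothetical, and the model walks one instruction at a time through the stages without overlapping them, so it shows in-order execution but not instruction-level parallelism.

```python
# Toy model of one warp passing through the six SM pipeline stages.
# Assumptions (not from the source): 32 lanes per warp, a hypothetical
# (dst, op, src_a, src_b) instruction format, and no stage overlap.

WARP_SIZE = 32
STAGES = ("fetch", "decode", "issue", "operand delivery", "execute", "write back")
OPS = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}


def run_warp(program, regs):
    """Execute `program` in program order; every lane runs the same instruction."""
    regs = {name: list(vals) for name, vals in regs.items()}  # copy the register file
    trace = []
    for pc, (dst, op, src_a, src_b) in enumerate(program):  # in order, never reordered
        for stage in STAGES:
            trace.append((pc, stage))
            if stage == "operand delivery":
                # Read one source value per lane from the register file.
                lhs, rhs = regs[src_a], regs[src_b]
            elif stage == "execute":
                # SIMD back end: the same operation is applied to all lanes.
                result = [OPS[op](x, y) for x, y in zip(lhs, rhs)]
            elif stage == "write back":
                regs[dst] = result
    return regs, trace


regs = {"r0": list(range(WARP_SIZE)), "r1": [2] * WARP_SIZE}
program = [("r2", "add", "r0", "r1"), ("r3", "mul", "r2", "r1")]
out, trace = run_warp(program, regs)
print(out["r3"][:4])  # [4, 6, 8, 10]
print(len(trace))     # 12: two instructions, six stages each
```

In this model the second instruction depends on `r2`, so it can only read correct operands because the first instruction has already written back; a pipelined in-order design would need to stall or track such dependencies at the issue stage.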