Striding stack space for threads by power-of-two risks possibilities of bank conflicts or cache aliasing problems. Add an extra offset of 4 bytes to avoid this.
+ Microarchitecture optimizations + 64-bit support + Xilinx FPGA support + LLVM-16 support + Refactoring and quality control fixes