HGMMA_WAIT instruction stalls at issue when inuse_tensor is set, which
is done by the previous HGMMA insn. Currently inuse_tensor is never set
back to zero.
Define EXT_T_HOPPER that, when EXT_T_ENABLE is defined, distinguishes
whether to instantiate core-coupled Volta-style or decoupled
Hopper-style Tensor Core.
Setting mem_req_mask to all-zero triggers an assertion error in
mem_scheduler. Instead, disallow initialize-by-x in instruction decode
which is the source of x-propagation. Since this seems to only happen
in VCS, define-gate it accordingly.
This reverts commit a15f4fd483.