For unknown reasons, guarding the fence with a tid == 0 branch causes a TL source ID re-used assertion. Just call the fence from all thread/warps as a workaround. At least, all threads in a warp will coalesce into one request.