The GEMM part is disabled for faster debugging; the kernel reads the result of A*B directly from the input binary.