Commit Graph

5 Commits

Author SHA1 Message Date
24f4a45097 Fix macrodef.h include and clean up stale z4c_gpu_rhs_ss.cu
Include macrodef.h (not macrodef.fh) in gpu_rhsSS_mem.h and
bssn_gpu.h so that ABEtype is visible to #if guards in CUDA files.
Remove the separate z4c_gpu_rhs_ss.cu (merged into bssn_gpu_rhs_ss.cu).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 20:02:35 +08:00
f754aa1ec2 Add Z4C Shell-Patch GPU acceleration (Phase 3 complete)
Create z4c_gpu_rhs_ss.cu (reusing BSSN shell FD/chain-rule kernels):
- Uploads trKd = trK + 2*TZ to GPU so existing BSSN algebraic kernels
  compute correct Z4C physical equations without modification
- New kern_z4c_post applies TZ_rhs = alpn1 * Hcon / 2, kappa1/kappa2
  constraint damping, TZ advection (lopsided), and dissipation (kodis)
- Adds TZ/TZ_rhs to Meta struct, alloc/upload/download/free lifecycle

Add cuda_compute_rhs_z4c_ss() wrapper in Z4c_class.C matching the
Fortran f_compute_rhs_Z4c_ss signature, with #define redirection for
Step/SHStep call sites and #undef before analysis functions.

Add z4c_gpu_rhs_ss.o to ABE_CUDA_CFILES and build rule in makefile.
Add kappa1_c/kappa2_c constants to gpu_rhsSS_mem.h.

Build verified with USE_CUDA_Z4C=1 + WithShell — compiles and links
cleanly. All three Shell GPU files now coexist: bssn_gpu_rhs_ss.o
(BSSN), z4c_gpu_rhs_ss.o (Z4C), both sharing FD/chain-rule kernels.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 13:52:48 +08:00
bd4ce3fbf3 GPU-accelerate Shell-Patch BSSN evolution
Phase 1: Enable GPU resident state for Cartesian patches in Shell mode.
- Remove WithShell guard from bssn_cuda_use_resident_sync().
- Add GPU-to-CPU state sync before shell CPU consumers (SHStep,
  CS_Inter, inline shell RHS blocks).

Phase 2: GPU-accelerate BSSN Shell Patch RHS.
- Create bssn_gpu.h with RHS_SS_PARA macro and gpu_rhs_ss declaration.
- Fix compilation bugs in legacy bssn_gpu_rhs_ss.cu (deprecated
  cudaThreadSynchronize, tmp_con2 redeclaration, ijkmin3_h typo,
  CUDA_SAFE_CALL, missing compare_result guard).
- Add bssn_gpu_rhs_ss.o to CFILES_CUDA_BSSN with build rule.
- Write cuda_compute_rhs_bssn_ss() wrapper bridging Fortran and GPU
  parameter conventions, redirect all shell RHS call sites via #define.

Verified: 30-step Shell-Patch GPU run completes without errors/NaN.
Step wall time ~4.4s (step_fn ~2.0s + RP ~0.68s + constraint ~0.70s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 18:50:10 +08:00
86a683de26 Replace legacy ABEGPU stack with ABE_CUDA backend 2026-04-12 21:19:14 +08:00
f2fc9af70e asc26 amss-ncku initialized 2026-01-13 15:01:15 +08:00