Wrap each C kernel in #if (ghost_width == N) blocks matching Fortran stencil
coefficients from diff_new.f90, kodiss.f90, and lopsidediff.f90. Add fast-path
indexing for ord=1,4,5 in share_func.h.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split prolongpointstru into search-only (prolongpointstru_search) and
append-only (prolongpointstru_append) functions. Parallelize shell-point
interpolation table construction with #pragma omp parallel for collapse(3)
and per-thread linked lists. Use static schedule for uniform workloads.
Add OMP_FLAG = -fopenmp to makefile.inc and a ShellPatch.o override rule
to makefile so that ShellPatch.o is compiled with OpenMP; the GCC OpenMP
runtime (-lgomp) is already linked.
Speedup: setupintintstuff ~2.2x faster on multi-core.
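
A rough sketch of the search/append split with per-thread storage, under
stated assumptions: PointEntry, point_search, and build_table are hypothetical
names, std::vector stands in for the per-thread linked lists, and the search
predicate is a placeholder. It uses a parallel region with a collapse(3)
worksharing loop rather than the combined parallel for directive, so that each
thread can own a local container.

```cpp
#include <vector>

struct PointEntry { int i, j, k; };  // hypothetical table entry

// Search-only step: reads its arguments and touches no shared state,
// so it is safe to call from many threads at once (placeholder logic).
inline bool point_search(int i, int j, int k) { return (i + j + k) % 2 == 0; }

std::vector<PointEntry> build_table(int nx, int ny, int nz) {
    std::vector<PointEntry> table;
    #pragma omp parallel
    {
        std::vector<PointEntry> local;  // per-thread list, no locking needed
        #pragma omp for collapse(3) schedule(static) nowait
        for (int k = 0; k < nz; ++k)
            for (int j = 0; j < ny; ++j)
                for (int i = 0; i < nx; ++i)
                    if (point_search(i, j, k))
                        local.push_back({i, j, k});
        // Append-only step: one thread at a time merges its local list.
        #pragma omp critical
        table.insert(table.end(), local.begin(), local.end());
    }
    return table;
}
```

Without -fopenmp the pragmas are ignored and the function runs serially with
the same result. With OpenMP enabled the order of entries in the merged table
depends on thread scheduling, so callers should not rely on it.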
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Export OMPI_ROOT/lib64 in LD_LIBRARY_PATH so mpicxx finds its runtime libs
- Add -Wl,-rpath to embed the OpenMPI lib64 path in executables so the libraries are found at run time
- Replace hardcoded paths with OMPI_ROOT variable
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- TwoPunctures.C: <mkl_cblas.h> → <cblas.h>
- gaussj.C: <mkl_lapacke.h> → <lapacke.h>
- makefile.inc: use -lopenblaso, remove MKLROOT dependency
- makefile: remove -I${MKLROOT}/include from all flag variables
- Add OpenMPI include path to filein (needed since g++ is used for .C
compilation, not the mpicxx wrapper)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bssnEScalar_class::Initialize() already calls setup_transfer_caches(),
but bssnEM_class::Initialize() did not. With USE_TRANSFER_CACHE=1,
the sync_cache pointers therefore remained NULL, causing SIGSEGV in the
wrapper methods that dereference sync_cache_*[lev].
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Seven Parallel::*_cached() calls in RestrictProlong and
RestrictProlong_aux were missed during the transfer-cache refactoring
(commits 9cd3741..8d28c29). When BSSN_USE_TRANSFER_CACHE=0, all
sync_cache pointers are NULL, so dereferencing sync_cache_*[lev]
triggers SIGSEGV.
Replace them with the equivalent wrapper methods (sync_evolution,
restrict_evolution, outbdlow2hi_evolution) that check
use_transfer_cache() and fall back to uncached direct calls.
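
The wrapper pattern can be sketched as below. Only the method names
use_transfer_cache(), sync_evolution, and setup_transfer_caches() come from
the text above; the class EvolutionDriver, the SyncCache type, the helper
bodies, and the boolean return value are hypothetical stand-ins for this
illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct SyncCache { int uses = 0; };  // hypothetical cache object

class EvolutionDriver {  // hypothetical stand-in for the evolution class
public:
    EvolutionDriver(bool use_cache, std::size_t levels)
        : use_cache_(use_cache), sync_cache_(levels) {
        // Caches exist only when the feature is on; otherwise all stay null.
        if (use_cache_) setup_transfer_caches();
    }

    bool use_transfer_cache() const { return use_cache_; }

    // Wrapper: take the cached path only when the cache is enabled and
    // allocated, otherwise fall back to the uncached direct call.
    // Returns true when the cached path was taken.
    bool sync_evolution(std::size_t lev) {
        if (use_transfer_cache() && sync_cache_[lev]) {
            sync_cached(*sync_cache_[lev]);
            return true;
        }
        sync_direct(lev);
        return false;
    }

    int direct_calls() const { return direct_calls_; }

private:
    void setup_transfer_caches() {
        for (auto& c : sync_cache_) c = std::make_unique<SyncCache>();
    }
    void sync_cached(SyncCache& c) { ++c.uses; }      // placeholder work
    void sync_direct(std::size_t) { ++direct_calls_; }  // placeholder work

    bool use_cache_;
    std::vector<std::unique_ptr<SyncCache>> sync_cache_;
    int direct_calls_ = 0;
};
```

Routing every call site through such a wrapper means the null check lives in
one place, so a call site cannot dereference an unallocated cache by accident.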
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- compute fi_min/fi_max from output i-range and derive ii_lo/ii_hi
- replace full ii sweep (-1:extf(1)) with windowed sweep in Z/Y precompute passes
- keep stencil math unchanged; add bounds sanity check for ii window
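
One way the ii window and its sanity check might look, as a sketch only. The
names fi_min, fi_max, ii_lo, and ii_hi follow the list above; the function
ii_window, the pad parameter (an assumed stencil half-width), and the choice
to throw on a bad window are assumptions of this illustration, and the
original code is Fortran rather than C++.

```cpp
#include <stdexcept>

struct Window { int lo, hi; };  // inclusive ii range for the precompute pass

// fi_min/fi_max: smallest and largest fine index needed by the output i-range.
// pad: assumed stencil half-width added on each side.
// full_lo/full_hi: the full sweep limits, e.g. -1 and extf(1).
Window ii_window(int fi_min, int fi_max, int pad, int full_lo, int full_hi) {
    if (fi_min > fi_max)
        throw std::invalid_argument("empty output range");
    Window w{fi_min - pad, fi_max + pad};
    // Bounds sanity check: the window must stay inside the full sweep.
    if (w.lo < full_lo || w.hi > full_hi)
        throw std::out_of_range("ii window exceeds full sweep range");
    return w;
}
```

For example, ii_window(5, 9, 2, -1, 20) gives the window 3..11, so the
precompute pass visits 9 indices instead of the 22 in the full sweep.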