Commit Graph

135 Commits

Author SHA1 Message Date
c4b9bd3788 Add full FD order support (2nd/4th/6th/8th) to C derivative kernels via ghost_width dispatch
Wrap each C kernel in #if (ghost_width == N) blocks matching Fortran stencil
coefficients from diff_new.f90, kodiss.f90, and lopsidediff.f90. Add fast-path
indexing for ord=1,4,5 in share_func.h.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 11:33:52 +08:00
276b36ea25 Add plot-only restart script to skip recomputation when plotting is interrupted
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 15:01:25 +08:00
baf248c3bc Add thread-safe ShellPatch::setupintintstuff with OpenMP
Split prolongpointstru into search-only (prolongpointstru_search) and
append-only (prolongpointstru_append) functions. Parallelize shell-point
interpolation table construction with #pragma omp parallel for collapse(3)
and per-thread linked lists. Use static schedule for uniform workloads.

Add OMP_FLAG = -fopenmp in makefile.inc and ShellPatch.o override rule
in makefile for GCC OpenMP runtime (-lgomp already linked).

Speedup: setupintintstuff ~2.2x faster on multi-core.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 22:31:18 +08:00
70b6496ed3 Accelerate Shell-Patch CPU interpolation 2026-05-08 14:37:16 +08:00
6ca9fece2e Add OpenMPI rpath and LD_LIBRARY_PATH export for reliable linking
- Export OMPI_ROOT/lib64 in LD_LIBRARY_PATH so mpicxx finds its runtime libs
- Add -Wl,-rpath to embed OpenMPI lib64 path in executables for runtime
- Replace hardcoded paths with OMPI_ROOT variable

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 22:43:57 +08:00
516cdea502 Replace MKL with OpenBLAS
- TwoPunctures.C: <mkl_cblas.h> → <cblas.h>
- gaussj.C: <mkl_lapacke.h> → <lapacke.h>
- makefile.inc: use -lopenblaso, remove MKLROOT dependency
- makefile: remove -I${MKLROOT}/include from all flag variables
- Add OpenMPI include path to filein (needed since g++ is used for .C
  compilation, not the mpicxx wrapper)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 22:33:43 +08:00
9687d9a3dd Switch build system from Intel oneAPI to GCC + OpenMPI
- Replace compilers: ifx→gfortran, icx→gcc, icpx→g++, mpiicpx→mpicxx
- Replace flags: -xHost→-march=x86-64-v4, -ipo→-flto, -fpp→-cpp
- Replace flags: -fp-model fast=2→-ffast-math, -fma→-mfma
- Replace flags: -qopenmp→-fopenmp
- Remove Intel-specific: -align array64byte, -liomp5, -lifcore, -limf
- Switch MKL interface: -lmkl_intel_lp64→-lmkl_gf_lp64 (gfortran)
- Replace TBB malloc with optional jemalloc (default off)
- Disable PGO entirely (was already marked negative optimization)
- TwoPunctureABE and ABE both verified to build successfully

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 22:00:58 +08:00
f669180572 Merge branch 'cjy-vitality'
# Conflicts:
#	AMSS_NCKU_source/bssn_class.C
#	AMSS_NCKU_source/makefile.inc
2026-04-28 16:55:47 +08:00
0db537479b Fix BSSN build config selection 2026-04-27 18:42:34 +08:00
1f3fd264c0 Add missing setup_transfer_caches() to bssnEM_class::Initialize()
bssnEScalar_class::Initialize() already calls setup_transfer_caches(),
but bssnEM_class::Initialize() did not. When USE_TRANSFER_CACHE=1,
the sync_cache pointers remain NULL, causing SIGSEGV in wrapper
methods that dereference sync_cache_*[lev].

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:32:27 +08:00
442674cedc Fix direct sync_cache accesses bypassing use_transfer_cache() guard
Seven Parallel::*_cached() calls in RestrictProlong and
RestrictProlong_aux were missed during the transfer-cache refactoring
(commits 9cd3741..8d28c29). When BSSN_USE_TRANSFER_CACHE=0, all
sync_cache pointers are NULL, so dereferencing sync_cache_*[lev]
triggers SIGSEGV.

Replace them with the equivalent wrapper methods (sync_evolution,
restrict_evolution, outbdlow2hi_evolution) that check
use_transfer_cache() and fall back to uncached direct calls.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 14:50:59 +08:00
a13b6901f6 Fix Z4C C++ gauge damping ordering 2026-04-27 12:00:47 +08:00
7e67feddbf Add C++ Z4C RHS path and port some BSSN optimizations 2026-04-25 08:03:19 +08:00
8d28c29a91 Default to safe BSSN-EScalar C kernel 2026-04-25 02:02:01 +08:00
0cf58176d9 Add safe BSSN-EScalar kernel and transfer toggles 2026-04-25 01:41:55 +08:00
0f1d0de1e7 Stabilize and wire BSSN-EScalar C path 2026-04-25 00:08:35 +08:00
c60bc03664 Also disable cached sync for Z4C 2026-04-24 21:06:13 +08:00
b57d80ca61 Disable cached sync for BSSN-EScalar 2026-04-24 01:58:57 +08:00
9cd3741a90 Fallback BSSN-EScalar restrict/prolong path 2026-04-24 01:37:54 +08:00
ac82ebd889 更新精度检查脚本加入图像比对检查 2026-04-15 00:49:46 +08:00
9c31384b2f Add optional BSSN kernel profiling switches 2026-04-13 16:51:06 +08:00
e4e741caa1 Remove dead chi derivative setup in BSSN RHS 2026-04-13 15:55:43 +08:00
65e0f95f40 Localize chi Ricci intermediates in RHS 2026-04-13 15:14:31 +08:00
f9fbf97e64 Elide dead stores in BSSN RHS hot path 2026-04-13 15:10:22 +08:00
968522995b Add fine-grained step timing and trim BH RHS overhead 2026-04-13 14:50:55 +08:00
f3988ac8ca Merge wave and mass extraction interpolation 2026-04-13 13:17:36 +08:00
e4c25eb21f Cache wave extraction angular kernels 2026-04-13 12:40:20 +08:00
4b10519876 Reuse mass integrand across detector radii 2026-04-13 11:55:41 +08:00
3a58273501 Batch constraint norm reductions 2026-04-13 11:48:02 +08:00
5c65cea2f0 Optimize constraint refresh after regrid 2026-04-13 11:39:50 +08:00
8c1f4d8108 迁移C算子的循环融合和临时量消除 2026-03-03 16:20:15 +08:00
d310ef918b bssn_rhs(fortran): migrate C kernel loop-fusion optimizations 2026-03-03 16:20:15 +08:00
b35e1b289f 设置开关关闭内存打印统计 2026-03-03 16:17:47 +08:00
05851b2c59 关闭静态负载 2026-03-03 16:17:47 +08:00
3b39583d67 fix(bssn_rhs) 2026-03-03 16:06:33 +08:00
688bdb6708 Merge pull request 'cjy-dystopia' (#3) from cjy-dystopia into main
Reviewed-on: https://seele.tail3b303.ts.net:3000/64-BitBrainstorm_2026/AMSS-NCKU/pulls/3
2026-03-02 21:36:26 +08:00
5070134857 perf(transfer_cached): 将 per-call new/delete 的 req_node/req_is_recv/completed 数组移入 SyncCache 复用
避免 transfer_cached 每次调用分配释放 3 个临时数组,减少堆操作开销。

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 21:14:35 +08:00
4012e9d068 perf(RestrictProlong): 用 Restrict_cached/OutBdLow2Hi_cached 替换非缓存版本,Sync_finish 改为渐进式解包
- RestrictProlong/RestrictProlong_aux 中的 Restrict() 和 OutBdLow2Hi() 替换为 _cached 版本,
  复用 gridseg 列表和 MPI 缓冲区,避免每次调用重新分配
- 新增 sync_cache_restrict/sync_cache_outbd 两组 per-level 缓存
- Sync_finish 从 MPI_Waitall 改为 MPI_Waitsome 渐进式解包,降低尾延迟
- AsyncSyncState 扩展 req_node/req_is_recv/pending_recv 字段支持渐进解包

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 20:48:38 +08:00
b3c367f15b prolong3 改为先算实际 stencil 窗口;只有窗口触及对称边界时才走全域 symmetry_bd,否则只复制必需窗口。restrict3 同样改成窗口判定,无触边时仅填 ii/jj/kk 必需窗口。 2026-03-02 17:38:56 +08:00
e73911f292 perf(restrict3): shrink X-pass ii sweep to required overlap window
- compute fi_min/fi_max from output i-range and derive ii_lo/ii_hi
 - replace full ii sweep (-1:extf(1)) with windowed sweep in Z/Y precompute passes
 - keep stencil math unchanged; add bounds sanity check for ii window
2026-03-02 17:37:13 +08:00
7543d3e8c7 perf(MPatch): 用空间 bin 索引加速 Interp_Points 的 block 归属查找
- 为 Patch::Interp_Points 三个重载引入 BlockBinIndex(候选筛选 + 全扫回退)
  - 保持原 point-in-block 判定与后续插值/通信流程不变
  - 将逐点线性扫块从 O(N_points*N_blocks) 降为近似 O(N_points*k)
  - 测试:bin 上限如果太大,会引入不必要的索引构建开销。将 bins 上限设为 16。

Co-authored-by: gpt-5.3-codex
2026-03-02 17:37:13 +08:00
42c69fab24 refactor(Parallel): streamline MPI communication by consolidating request handling and memory management 2026-03-02 17:37:13 +08:00
95220a05c8 optimize fdderivs core-region branch elimination for ghost_width=3 2026-03-02 17:33:26 +08:00
466b084a58 fix prolong/restrict index bounds after cherry-pick 12e1f63 2026-03-02 13:59:47 +08:00
61ccef9f97 prolong3: 减少Z-pass 冗余计算 2026-03-02 13:58:52 +08:00
e11363e06e Optimize fdderivs: skip redundant 2nd-order work in 4th-order overlap 2026-03-02 03:21:21 +08:00
f70e90f694 prolong3:提升cache命中率 2026-03-02 03:05:35 +08:00
jaunatisblue
75dd5353b0 修改prolong 2026-03-02 02:25:25 +08:00
jaunatisblue
23a82d063b 对prolong3做访存优化 2026-03-02 02:25:25 +08:00
524d1d1512 Merge pull request 'cjy-dystopia' (#2) from cjy-dystopia into main
Reviewed-on: https://seele.tail3b303.ts.net:3000/64-BitBrainstorm_2026/AMSS-NCKU/pulls/2
2026-03-01 19:22:09 +08:00