39450228f5
Accelerate Shell-Patch interpolation fast paths
2026-05-08 13:26:16 +08:00
063f28b3b4
Add Shell-Patch GPU runtime fast paths
2026-05-08 09:26:36 +08:00
1064a68d16
Optimize BSSN-EM 8th-order AMR transfers
2026-05-07 21:38:16 +08:00
dcc83bafcb
Support 2nd and 8th order CUDA AMR paths
2026-05-07 20:31:26 +08:00
c4d8d41b25
Cover Z4C CUDA AMR restrict prolong
2026-05-07 19:49:09 +08:00
9ff2f065be
Apply BSSN AMR sync default to EScalar
2026-05-07 17:12:33 +08:00
2317e4abde
Fix BSSN GPU resident AMR sync default
2026-05-07 17:11:09 +08:00
96829d0441
Optimize Z4C GPU runtime defaults
2026-05-07 15:37:09 +08:00
83afaf19ce
Skip zero EM resident downloads
2026-05-07 13:04:46 +08:00
cb911dec06
Add EM GPU fast paths and defaults
2026-05-07 12:18:56 +08:00
ffa0d801ed
Default Python GPU runner to EScalar fast path
2026-05-06 00:12:46 +08:00
85fe29cc2e
Optimize BSSN-EScalar CUDA path
2026-05-05 10:47:46 +08:00
06f62dee36
Switch back to Intel toolchain as the default option
...
It seems that Intel MPI also supports CUDA-aware communication when I_MPI_OFFLOAD is set to 1. In addition, I_MPI_OFFLOAD_IPC=0 is needed to avoid segfaults.
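A minimal sketch of how a Python launcher might apply these two settings before starting an MPI job. The helper name `intel_mpi_gpu_env` is hypothetical and not taken from the repository; only the two variable names and values come from the note above.

```python
import os


def intel_mpi_gpu_env(base=None):
    """Return a copy of the environment with the two Intel MPI settings applied.

    `base` defaults to the current process environment; passing a dict
    makes the helper easy to test without touching os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["I_MPI_OFFLOAD"] = "1"      # enable GPU-buffer (CUDA-aware) support
    env["I_MPI_OFFLOAD_IPC"] = "0"  # per the commit note, avoids segfaults
    return env


if __name__ == "__main__":
    env = intel_mpi_gpu_env({"PATH": "/usr/bin"})
    for key in ("I_MPI_OFFLOAD", "I_MPI_OFFLOAD_IPC"):
        print(f"{key}={env[key]}")
```

The returned dict can be passed as the `env` argument of `subprocess.run` when the launcher starts the job, which leaves the parent shell's environment unchanged.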
2026-05-01 21:59:13 +08:00
a9a3809148
Default Python launcher to fast GPU path
2026-04-30 20:15:34 +08:00
e0d0673c8e
Enable optimized GPU runs from Python launcher
2026-04-30 18:31:31 +08:00
c689cc8dc9
[WIP] Add CUDA support for Z4C
...
Rewrite done by Codex.
This still has errors; do not pick this commit for now.
2026-04-27 11:58:43 +08:00
53c55451b3
Update makefile and scripts for CUDA BSSN configuration and build commands
2026-04-25 09:19:50 +08:00
86a683de26
Replace legacy ABEGPU stack with ABE_CUDA backend
2026-04-12 21:19:14 +08:00
44efb2e08c
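Preliminary-round final version v1.0.0: confirmed that PGO and the original load-balancing scheme regress performance in the current version; both have been reverted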
2026-03-01 18:04:25 +08:00
1eba73acbe
Disable core binding for now; speed comparison shows: no core binding + SCX > core binding + SCX
2026-02-28 23:27:44 +08:00
e0b5e012df
Introduce a PGO-style two-pass build flow to make the Interp_Points load-balancing optimization rule-compliant
...
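Background:
The hot-block splitting and rank remapping that a teammate implemented in the
previous commit gave a significant speedup, but it hard-coded the heavy ranks
(27/28/35/36) and the remapping table. That makes it a case-specific
optimization, which violates Rule 6 of the competition rules (no optimizations
specialized to particular parameters or test cases).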
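Goal of this commit:
Borrowing the idea of PGO (Profile-Guided Optimization) from compiler
optimization, turn the case-specific optimization above into a general,
automated two-pass flow that applies to any test case and therefore complies
with the competition rules.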
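Two-pass flow:
Pass 1: profile collection (make INTERP_LB_MODE=profile ABE)
-DINTERP_LB_PROFILE is injected at compile time. On its first call,
Interp_Points in MPatch.C times itself with MPI_Wtime and gathers the per-rank
timings with MPI_Gather, identifies hot ranks whose time exceeds 2.5 times the
mean, and writes the result to interp_lb_profile.bin.
Intermediate step: generate the compile-time header
python3 gen_interp_lb_header.py reads profile.bin, automatically computes the
splitting strategy and the remapping table, and generates
interp_lb_profile_data.h, which contains:
- interp_lb_splits[][3]: (block_id, r_left, r_right) for each hot block
- interp_lb_remaps[][2]: rank remapping for the neighbor blocks that get displaced
Pass 2: optimized build (make INTERP_LB_MODE=optimize ABE)
-DINTERP_LB_OPTIMIZE is injected at compile time. The profile data is baked
into the executable as static const arrays (zero runtime overhead), and
distribute_optimize applies the splits and remaps directly during block creation.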
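The hot-rank test in Pass 1 can be illustrated with a short Python sketch. This is not the code in MPatch.C (which uses MPI_Wtime and MPI_Gather in C++); it only shows the selection rule, assuming the gathered timings are already available as a list indexed by rank. The function name `find_hot_ranks` is hypothetical.

```python
from statistics import mean

# Threshold from the commit message: a rank is "hot" when its
# time exceeds 2.5 times the mean over all ranks.
HOT_FACTOR = 2.5


def find_hot_ranks(rank_times, factor=HOT_FACTOR):
    """Return the indices of ranks whose time exceeds factor * mean."""
    avg = mean(rank_times)
    return [rank for rank, t in enumerate(rank_times) if t > factor * avg]


if __name__ == "__main__":
    # Made-up timings: seven ranks at 1.0 s and one slow rank at 9.0 s.
    # The mean is 2.0 s, so the threshold is 5.0 s.
    times = [1.0] * 7 + [9.0]
    print(find_hot_ranks(times))
```

Because the threshold is relative to the mean, a uniformly slow run yields no hot ranks; only imbalance between ranks triggers a split.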
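Specific changes:
- makefile.inc: add the INTERP_LB_MODE variable (off/profile/optimize)
and the corresponding INTERP_LB_FLAGS preprocessor macro definitions
- makefile: add $(INTERP_LB_FLAGS) to CXXAPPFLAGS and add the
interp_lb_profile.o build target
- gen_interp_lb_header.py: script that automatically converts profile.bin
into interp_lb_profile_data.h
- interp_lb_profile_data.h: auto-generated header of compile-time constants
- interp_lb_profile.bin: binary data produced by the profile collection pass
- AMSS_NCKU_Program.py: automatically copy profile.bin into the run directory at build time
- makefile_and_run.py: switch the default build command to INTERP_LB_MODE=optimize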
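As a rough illustration of the profile.bin to header conversion, the sketch below renders rows of integers as a C array definition. It is not the actual gen_interp_lb_header.py: the helper name `render_c_array`, the `int` element type, and the sample values are assumptions, and the binary layout of profile.bin is not covered.

```python
def render_c_array(name, rows, width):
    """Render rows of ints as a `static const int name[][width]` definition."""
    if not rows:
        # An empty initializer for an unsized array is not valid before C23.
        raise ValueError("at least one row is required")
    for row in rows:
        if len(row) != width:
            raise ValueError(f"row {row!r} does not have {width} entries")
    body = ",\n".join(
        "    {" + ", ".join(str(v) for v in row) + "}" for row in rows
    )
    return f"static const int {name}[][{width}] = {{\n{body}\n}};\n"


if __name__ == "__main__":
    # Hypothetical values: (block_id, r_left, r_right) and (block_id, new_rank).
    print(render_c_array("interp_lb_splits", [(12, 3, 4)], 3))
    print(render_c_array("interp_lb_remaps", [(11, 2), (13, 5)], 2))
```

Emitting the tables as source text is what lets Pass 2 compile them into the executable instead of reading a file at run time.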
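Note on generality:
The whole flow does not depend on any hard-coded rank numbers or test-case
parameters. For a different grid configuration, process count, or physical
problem, re-running Pass 1 to collect a profile is enough to generate the
matching optimization plan automatically. This is fully consistent with the
idea behind PGO: profile first, then optimize, as a general
performance-optimization methodology.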
2026-02-27 15:10:22 +08:00
CGH0S7
e2bc472845
Improve core-binding logic: replace hard-coded values with automatic detection
2026-02-25 10:59:32 +08:00
4bb6c03013
makefile setting updated
2026-02-08 16:14:43 +08:00
03d501db04
Display the runtime of TwoPunctures
2026-02-07 14:45:16 +08:00
223ec17a54
input updated
2026-02-06 13:57:48 +08:00
CGH0S7
3a7bce3af2
Update Intel oneAPI configuration and CPU binding settings
...
- Update makefile.inc with Intel oneAPI compiler flags and oneMKL linking
- Configure taskset CPU binding to use nohz_full cores (4-55, 60-111)
- Set build parallelism to 104 jobs for faster compilation
- Update MPI process count to 48 in input configuration
2026-01-17 20:41:02 +08:00
CGH0S7
7a76cbaafd
Add numactl CPU binding to avoid cores 0-3 and 56-59
...
Bind all computation processes (ABE, ABEGPU, TwoPunctureABE) to
CPU cores 4-55 and 60-111 using numactl --physcpubind to prevent
interference with system processes on reserved cores.
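The binding described above can be sketched in Python as a command prefix built from the allowed core ranges. The helper names and the `./ABE` invocation are hypothetical and not taken from the repository's scripts; `--physcpubind` is the numactl option named in the commit message.

```python
# Core ranges from the commit message: skip 0-3 and 56-59.
ALLOWED_RANGES = [(4, 55), (60, 111)]


def format_core_ranges(ranges):
    """Format inclusive (lo, hi) pairs as a CPU list such as '4-55,60-111'."""
    return ",".join(f"{lo}-{hi}" if lo != hi else str(lo) for lo, hi in ranges)


def count_cores(ranges):
    """Count the cores covered by inclusive (lo, hi) pairs."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def numactl_prefix(ranges):
    """Build the argv prefix that pins a command to the given cores."""
    return ["numactl", f"--physcpubind={format_core_ranges(ranges)}"]


if __name__ == "__main__":
    # "./ABE" stands in for whichever executable the launcher runs.
    cmd = numactl_prefix(ALLOWED_RANGES) + ["./ABE"]
    print(" ".join(cmd))
    print(count_cores(ALLOWED_RANGES), "cores available")
```

Keeping the ranges in one list means the reserved cores are defined in a single place for every executable the launcher starts.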
2026-01-16 10:24:46 +08:00
f2fc9af70e
asc26 amss-ncku initialized
2026-01-13 15:01:15 +08:00