AMSS-NCKU

64-BitBrainstorm_2026/AMSS-NCKU

Author	SHA1	Message	Date
CGH0S7	39450228f5	Accelerate Shell-Patch interpolation fast paths	2026-05-08 13:26:16 +08:00
CGH0S7	063f28b3b4	Add Shell-Patch GPU runtime fast paths	2026-05-08 09:26:36 +08:00
CGH0S7	1064a68d16	Optimize BSSN-EM 8th-order AMR transfers	2026-05-07 21:38:16 +08:00
CGH0S7	dcc83bafcb	Support 2nd and 8th order CUDA AMR paths	2026-05-07 20:31:26 +08:00
CGH0S7	c4d8d41b25	Cover Z4C CUDA AMR restrict prolong	2026-05-07 19:49:09 +08:00
CGH0S7	9ff2f065be	Apply BSSN AMR sync default to EScalar	2026-05-07 17:12:33 +08:00
CGH0S7	2317e4abde	Fix BSSN GPU resident AMR sync default	2026-05-07 17:11:09 +08:00
CGH0S7	96829d0441	Optimize Z4C GPU runtime defaults	2026-05-07 15:37:09 +08:00
CGH0S7	83afaf19ce	Skip zero EM resident downloads	2026-05-07 13:04:46 +08:00
CGH0S7	cb911dec06	Add EM GPU fast paths and defaults	2026-05-07 12:18:56 +08:00
CGH0S7	ffa0d801ed	Default Python GPU runner to EScalar fast path	2026-05-06 00:12:46 +08:00
CGH0S7	85fe29cc2e	Optimize BSSN-EScalar CUDA path	2026-05-05 10:47:46 +08:00
ianchb	06f62dee36	Switch back to Intel toolchain as the default option. Intel MPI also appears to support CUDA-aware communication when I_MPI_OFFLOAD is set to 1. In addition, I_MPI_OFFLOAD_IPC=0 is needed to avoid segfaults.	2026-05-01 21:59:13 +08:00
CGH0S7	a9a3809148	Default Python launcher to fast GPU path	2026-04-30 20:15:34 +08:00
CGH0S7	e0d0673c8e	Enable optimized GPU runs from Python launcher	2026-04-30 18:31:31 +08:00
ianchb	c689cc8dc9	[WIP] Add CUDA support for Z4C. Rewrite done by Codex. This still has errors; do not pick this one for now.	2026-04-27 11:58:43 +08:00
ianchb	53c55451b3	Update makefile and scripts for CUDA BSSN configuration and build commands	2026-04-25 09:19:50 +08:00
ianchb	86a683de26	Replace legacy ABEGPU stack with ABE_CUDA backend	2026-04-12 21:19:14 +08:00
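CGH0S7	44efb2e08c	Final preliminary-round version v1.0.0: confirmed that PGO and the original load-balancing scheme cause a slowdown in the current version; both have been reverted	2026-03-01 18:04:25 +08:00
CGH0S7	1eba73acbe	Disable core binding for now; speed comparison shows: no core binding + SCX > core binding + SCX	2026-02-28 23:27:44 +08:00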
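CGH0S7	e0b5e012df	Introduce a PGO-style two-pass build flow to make the Interp_Points load-balancing optimization rule-compliant. Background: the hot-block splitting and rank remapping that a colleague implemented in the previous commit gave a significant speedup, but it hardcoded the heavy ranks (27/28/35/36) and the remapping table. That makes it an optimization for a specific test case, which violates competition rule 6 (no optimizations specialized to parameters or test cases). Goal of this commit: borrowing the idea of PGO (Profile-Guided Optimization), turn that case-specific optimization into a general, automated two-pass flow that applies to any test case and therefore complies with the competition rules. Two-pass flow: Pass 1, profile collection (make INTERP_LB_MODE=profile ABE): -DINTERP_LB_PROFILE is injected at compile time; on its first call, Interp_Points in MPatch.C times itself with MPI_Wtime, gathers the per-rank times with MPI_Gather, identifies hot ranks whose time exceeds 2.5 times the mean, and writes interp_lb_profile.bin. Intermediate step, generating the compile-time header: python3 gen_interp_lb_header.py reads profile.bin, automatically computes the split strategy and remapping table, and generates interp_lb_profile_data.h, which contains: - interp_lb_splits[][3]: (block_id, r_left, r_right) for each hot block - interp_lb_remaps[][2]: rank remapping for the neighbor blocks that are displaced. Pass 2, optimized build (make INTERP_LB_MODE=optimize ABE): -DINTERP_LB_OPTIMIZE is injected at compile time; the profile data is baked into the executable as static const arrays (zero runtime overhead), and distribute_optimize applies the splits and remaps directly during block creation. Changes: - makefile.inc: add the INTERP_LB_MODE variable (off/profile/optimize) and the corresponding INTERP_LB_FLAGS preprocessor macro definitions - makefile: add $(INTERP_LB_FLAGS) to CXXAPPFLAGS and add the interp_lb_profile.o build target - gen_interp_lb_header.py: script that automatically converts profile.bin to interp_lb_profile_data.h - interp_lb_profile_data.h: auto-generated header of compile-time constants - interp_lb_profile.bin: binary data produced by the profile-collection pass - AMSS_NCKU_Program.py: automatically copy profile.bin to the run directory at build time - makefile_and_run.py: switch the default build command to INTERP_LB_MODE=optimize. Note on generality: the whole flow does not depend on any hardcoded rank numbers or test-case parameters. For a different grid configuration, process count, or physics problem, rerunning Pass 1 to collect a profile is enough to generate the matching optimization automatically. This is fully consistent with the idea behind PGO: profile first, then optimize, as a general performance-optimization methodology.	2026-02-27 15:10:22 +08:00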
CGH0S7	e2bc472845	Optimize core-binding logic: remove hardcoding in favor of automatic detection	2026-02-25 10:59:32 +08:00
CGH0S7	4bb6c03013	makefile setting updated	2026-02-08 16:14:43 +08:00
ianchb	03d501db04	Display the runtime of TwoPunctures	2026-02-07 14:45:16 +08:00
CGH0S7	223ec17a54	input updated	2026-02-06 13:57:48 +08:00
CGH0S7	3a7bce3af2	Update Intel oneAPI configuration and CPU binding settings - Update makefile.inc with Intel oneAPI compiler flags and oneMKL linking - Configure taskset CPU binding to use nohz_full cores (4-55, 60-111) - Set build parallelism to 104 jobs for faster compilation - Update MPI process count to 48 in input configuration	2026-01-17 20:41:02 +08:00
CGH0S7	7a76cbaafd	Add numactl CPU binding to avoid cores 0-3 and 56-59. Bind all computation processes (ABE, ABEGPU, TwoPunctureABE) to CPU cores 4-55 and 60-111 using numactl --physcpubind to prevent interference with system processes on reserved cores.	2026-01-16 10:24:46 +08:00
CGH0S7	f2fc9af70e	asc26 amss-ncku initialized	2026-01-13 15:01:15 +08:00

28 Commits
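
The hot-rank detection step described in commit e0b5e012df (ranks whose Interp_Points time exceeds 2.5 times the mean) can be illustrated with a minimal sketch. This is not the repository's code: per the commit message, the real timing lives in MPatch.C and uses MPI_Wtime and MPI_Gather. The function name `find_hot_ranks`, the constant `HOT_FACTOR`, and the sample timings below are all hypothetical, and only the 2.5x threshold comes from the commit message.

```python
from statistics import mean

# Threshold taken from the commit message: a rank is "hot" when its
# time exceeds 2.5 times the mean over all ranks.
HOT_FACTOR = 2.5


def find_hot_ranks(times_per_rank, factor=HOT_FACTOR):
    """Return indices of ranks whose time exceeds factor * mean.

    times_per_rank is a list of per-rank wall times, indexed by rank.
    (Hypothetical helper, for illustration only.)
    """
    if not times_per_rank:
        return []
    avg = mean(times_per_rank)
    return [rank for rank, t in enumerate(times_per_rank) if t > factor * avg]


if __name__ == "__main__":
    # Made-up timings: rank 3 is far slower than the others.
    timings = [1.0, 1.1, 0.9, 6.0, 1.0, 1.0, 1.0, 1.0]
    print(find_hot_ranks(timings))
```

With the made-up timings above, the mean is 1.625, so the cutoff is 4.0625 and only rank 3 is flagged. Because the threshold is relative to the mean, a uniformly slow run flags nothing, which fits the commit's stated aim of avoiding hardcoded rank numbers.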