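Background:
The hot-block splitting and rank remapping that a colleague implemented
in the previous commit gave a significant speedup, but it hard-coded the
heavy ranks (27/28/35/36) and the remap table. That makes it an
optimization targeted at one specific test case, which violates rule 6
of the competition rules (no optimizations specialized to particular
parameters or test cases).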
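
Goal of this commit:
Borrowing the idea behind PGO (Profile-Guided Optimization) in
compilers, turn the case-specific optimization above into a general,
automated two-pass workflow that applies to any test case and therefore
complies with the competition rules.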
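
Two-pass workflow:
Pass 1: profile collection (make INTERP_LB_MODE=profile ABE)
  Compiles with -DINTERP_LB_PROFILE. On its first call, Interp_Points
  in MPatch.C times itself with MPI_Wtime and uses MPI_Gather to collect
  the per-rank times, flags every rank whose time exceeds 2.5 times the
  mean as a hot rank, and writes the result to interp_lb_profile.bin.
Intermediate step: generate the compile-time header
  python3 gen_interp_lb_header.py reads profile.bin, computes the split
  strategy and the remap table automatically, and generates
  interp_lb_profile_data.h, which contains:
  - interp_lb_splits[][3]: (block_id, r_left, r_right) for each hot block
  - interp_lb_remaps[][2]: rank remapping for the displaced neighbor blocks
Pass 2: optimized build (make INTERP_LB_MODE=optimize ABE)
  Compiles with -DINTERP_LB_OPTIMIZE. The profile data is baked into the
  executable as static const arrays (zero runtime overhead), and
  distribute_optimize applies the splits and remaps directly during
  block creation.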
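
The hot-rank test in Pass 1 can be sketched in Python as follows. This
is an illustration only: the real detection runs in C++ inside MPatch.C,
and the function name find_heavy_ranks and the sample timings are
hypothetical. Only the 2.5 factor comes from the description above.

```python
def find_heavy_ranks(times, factor=2.5):
    """Return the ranks whose time exceeds `factor` times the mean.

    `times` stands in for the per-rank timings gathered on the root
    rank; the name and signature are assumptions for illustration.
    """
    if not times:
        return []
    mean = sum(times) / len(times)
    return [rank for rank, t in enumerate(times) if t > factor * mean]


# Hypothetical timings in seconds for 8 ranks; rank 3 is the outlier.
sample_times = [1.0, 1.1, 0.9, 6.0, 1.0, 1.2, 0.8, 1.0]
print(find_heavy_ranks(sample_times))
```

Because the threshold is relative to the mean, a single very slow rank
also raises the bar for every other rank, so borderline ranks may not be
flagged in the same profile run.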
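
Changes:
- makefile.inc: add the INTERP_LB_MODE variable (off/profile/optimize)
  and the corresponding INTERP_LB_FLAGS preprocessor macro definitions
- makefile: add $(INTERP_LB_FLAGS) to CXXAPPFLAGS and add the
  interp_lb_profile.o build target
- gen_interp_lb_header.py: script that converts profile.bin into
  interp_lb_profile_data.h
- interp_lb_profile_data.h: auto-generated header of compile-time constants
- interp_lb_profile.bin: binary data produced by the profile collection pass
- AMSS_NCKU_Program.py: copy profile.bin into the run directory
  automatically at build time
- makefile_and_run.py: switch the default build command to
  INTERP_LB_MODE=optimize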
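
Note on generality:
The workflow does not depend on any hard-coded rank numbers or test-case
parameters. For a different grid configuration, process count, or
physical problem, re-running Pass 1 to collect a profile is enough to
generate the matching optimization plan automatically. This is the same
idea as PGO in compilers: profile first, then optimize, which is a
general performance-optimization methodology.
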
73 lines
3.0 KiB
Python
#!/usr/bin/env python3
"""Convert interp_lb_profile.bin to a C header for compile-time embedding."""
import struct, sys

if len(sys.argv) < 3:
    print(f"Usage: {sys.argv[0]} <profile.bin> <output.h>")
    sys.exit(1)

with open(sys.argv[1], 'rb') as f:
    magic, version, nprocs, num_heavy = struct.unpack('IIii', f.read(16))
    threshold = struct.unpack('d', f.read(8))[0]
    times = list(struct.unpack(f'{nprocs}d', f.read(nprocs * 8)))
    heavy = list(struct.unpack(f'{num_heavy}i', f.read(num_heavy * 4)))

# For each heavy rank, compute split: left half -> lighter neighbor, right half -> heavy rank
# (or vice versa depending on which neighbor is lighter)
splits = []
for hr in heavy:
    prev_t = times[hr - 1] if hr > 0 else 1e30
    next_t = times[hr + 1] if hr < nprocs - 1 else 1e30
    if prev_t <= next_t:
        splits.append((hr, hr - 1, hr))  # (block_id, r_left, r_right)
    else:
        splits.append((hr, hr, hr + 1))

# Also remap the displaced neighbor blocks
remaps = {}
for hr, r_l, r_r in splits:
    if r_l != hr:
        # We took r_l's slot, so remap block r_l to its other neighbor
        displaced = r_l
        if displaced > 0 and displaced - 1 not in [s[0] for s in splits]:
            remaps[displaced] = displaced - 1
        elif displaced < nprocs - 1:
            remaps[displaced] = displaced + 1
    else:
        displaced = r_r
        if displaced < nprocs - 1 and displaced + 1 not in [s[0] for s in splits]:
            remaps[displaced] = displaced + 1
        elif displaced > 0:
            remaps[displaced] = displaced - 1

with open(sys.argv[2], 'w') as out:
    out.write("/* Auto-generated from interp_lb_profile.bin — do not edit */\n")
    out.write("#ifndef INTERP_LB_PROFILE_DATA_H\n")
    out.write("#define INTERP_LB_PROFILE_DATA_H\n\n")
    out.write(f"#define INTERP_LB_NPROCS {nprocs}\n")
    out.write(f"#define INTERP_LB_NUM_HEAVY {num_heavy}\n\n")
    out.write(f"static const int interp_lb_heavy_blocks[{num_heavy}] = {{")
    out.write(", ".join(str(h) for h in heavy))
    out.write("};\n\n")
    out.write("/* Split table: {block_id, r_left, r_right} */\n")
    out.write(f"static const int interp_lb_splits[{num_heavy}][3] = {{\n")
    for bid, rl, rr in splits:
        out.write(f" {{{bid}, {rl}, {rr}}},\n")
    out.write("};\n\n")
    out.write("/* Rank remap for displaced neighbor blocks */\n")
    out.write(f"static const int interp_lb_num_remaps = {len(remaps)};\n")
    out.write(f"static const int interp_lb_remaps[][2] = {{\n")
    for src, dst in sorted(remaps.items()):
        out.write(f" {{{src}, {dst}}},\n")
    if not remaps:
        out.write(" {-1, -1},\n")
    out.write("};\n\n")
    out.write("#endif /* INTERP_LB_PROFILE_DATA_H */\n")

print(f"Generated {sys.argv[2]}:")
print(f" {num_heavy} heavy blocks to split: {heavy}")
for bid, rl, rr in splits:
    print(f" block {bid}: split -> rank {rl} (left), rank {rr} (right)")
for src, dst in sorted(remaps.items()):
    print(f" block {src}: remap -> rank {dst}")
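
To try the generator without running Pass 1, a synthetic profile can be
packed in the same byte layout the script unpacks: a 16-byte 'IIii'
header, one double, nprocs doubles, then num_heavy ints. The sketch
below is hedged accordingly: MAGIC and VERSION are hypothetical
placeholders (the real values are defined by the C++ writer, which is
not shown here), the helper names are invented for illustration, and the
threshold field is treated as an opaque double because the source does
not say whether it holds the 2.5 factor or an absolute time.

```python
import struct

# Hypothetical header values; the real ones come from the Pass 1 writer.
MAGIC = 0x494C4250
VERSION = 1


def pack_profile(times, heavy, threshold=2.5):
    """Pack per-rank times and heavy ranks in the layout the script reads."""
    data = struct.pack('IIii', MAGIC, VERSION, len(times), len(heavy))
    data += struct.pack('d', threshold)
    data += struct.pack(f'{len(times)}d', *times)
    data += struct.pack(f'{len(heavy)}i', *heavy)
    return data


def unpack_profile(data):
    """Read the fields back using the same offsets as the generator."""
    magic, version, nprocs, num_heavy = struct.unpack_from('IIii', data, 0)
    threshold = struct.unpack_from('d', data, 16)[0]
    times = list(struct.unpack_from(f'{nprocs}d', data, 24))
    heavy = list(struct.unpack_from(f'{num_heavy}i', data, 24 + nprocs * 8))
    return nprocs, threshold, times, heavy


# Four ranks, with rank 2 marked heavy.
blob = pack_profile([1.0, 1.0, 6.0, 1.0], [2])
print(len(blob), unpack_profile(blob))
```

Writing blob to a file (for example synthetic.bin) and passing that file
as the first argument of gen_interp_lb_header.py should exercise the
split and remap logic on a small, known input. The format strings use
native byte order, matching the script, so such a file is only portable
between machines with the same endianness and type sizes.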