a0dab90bcb
Switch to NVIDIA HPC Toolchain
2026-04-29 08:31:49 +08:00

c689cc8dc9
[WIP] Add CUDA support for Z4C
Rewrite done by Codex.
This still has errors; do not pick this one for now.
2026-04-27 11:58:43 +08:00

60fee8f1c1
Fix Z4C C++ gauge damping ordering
2026-04-26 15:38:13 +08:00

843b116954
Add C++ Z4C RHS path and port some BSSN optimizations
2026-04-25 10:39:01 +08:00

c768e1220b
Also disable cached sync for Z4C
2026-04-25 10:25:54 +08:00

02f149e2e3
Disable cached sync for BSSN-EScalar
2026-04-25 10:17:47 +08:00

422e8ec4dc
Fallback BSSN-EScalar restrict/prolong path
2026-04-25 10:10:34 +08:00

c4909b9843
Update the accuracy-check script to add image comparison checks
(cherry picked from commit ac82ebd889)
2026-04-25 09:40:12 +08:00

f521a97563
Fix ABE CPU version build error
2026-04-25 09:39:49 +08:00

53c55451b3
Update makefile and scripts for CUDA BSSN configuration and build commands
2026-04-25 09:19:50 +08:00

768345954f
Add optional BSSN kernel profiling switches
(cherry picked from commit 9c31384b2f)
2026-04-25 08:39:43 +08:00

9a6df6438b
Remove dead chi derivative setup in BSSN RHS
(cherry picked from commit e4e741caa1)
2026-04-25 08:38:01 +08:00

8e9463aa90
Localize chi Ricci intermediates in RHS
(cherry picked from commit 65e0f95f40)
2026-04-25 08:37:41 +08:00

7c6f15002e
Elide dead stores in BSSN RHS hot path
(cherry picked from commit f9fbf97e64)
2026-04-25 08:37:40 +08:00

6410c62e3e
Add fine-grained step timing and trim BH RHS overhead
(cherry picked from commit 968522995b)
2026-04-25 08:37:19 +08:00

11977eb82f
Merge wave and mass extraction interpolation
(cherry picked from commit f3988ac8ca)
2026-04-25 08:25:34 +08:00

cce8a44fc4
Cache wave extraction angular kernels
(cherry picked from commit e4c25eb21f)
2026-04-25 08:24:36 +08:00

c589097618
Reuse mass integrand across detector radii
(cherry picked from commit 4b10519876)
2026-04-25 08:24:11 +08:00

b713e5a9be
Batch constraint norm reductions
(cherry picked from commit 3a58273501)
2026-04-25 08:22:00 +08:00

0396701572
Optimize constraint refresh after regrid
(cherry picked from commit 5c65cea2f0)
2026-04-25 08:18:51 +08:00

bb20c9a876
Fix ADM Constraint Violation Analysis
2026-04-15 19:19:16 +08:00

8fe60ea703
Add zero matter handling and interpolation for resident state in CUDA BSSN
2026-04-15 00:25:53 +08:00

9ab7e7c7f9
Fuse phases 5 and 6 for Gamma_rhs computation and optimize phases 8 and 9 for efficiency
2026-04-14 23:23:04 +08:00

f9119e8a2a
Add resident-GA mode switch and simplify sync logic
2026-04-14 21:09:27 +08:00

726d743376
Fuse Ricci assembly and optimize trK/Aij gauge kernels
2026-04-14 19:20:12 +08:00

af344bf1e5
Add Phase-10 Ricci kernels and batch launch flow
2026-04-14 19:00:22 +08:00

7191fc0b96
Move resident sync comm buffers into StepAllocation pool
2026-04-13 21:04:44 +08:00

b3ec244cf9
Add batched first/second derivative kernels for CUDA RHS
2026-04-13 20:51:08 +08:00

e952ee8e91
Batch GA/BH subset sync with indexed GPU pack/unpack buffers
2026-04-13 20:40:09 +08:00

c5d1268dd1
Batch patch-boundary copy and gate CPU BC in GPU substeps
2026-04-13 11:52:17 +08:00

4bdfc90f22
Pass pointer tables as kernel args and skip redundant symbol uploads
2026-04-13 11:19:00 +08:00

c49a4e00c9
Batch symbd_pack/lopsided/kodiss over all state variables
2026-04-13 11:02:55 +08:00

1b3c0b80d2
Refactor CUDA step buffers to remove loop-time allocations
2026-04-13 10:33:03 +08:00

636e35bfd8
Add direct CUDA resident-state sync path and profiling hooks
2026-04-13 00:57:05 +08:00

7f2a391dd2
Cache matter fields in StepContext across RK4 substeps
2026-04-12 22:19:45 +08:00

4fa12a2009
Integrate CUDA support into RK4 substep execution
2026-04-12 22:11:44 +08:00

86a683de26
Replace legacy ABEGPU stack with ABE_CUDA backend
2026-04-12 21:19:14 +08:00

aaf7bf0a26
Merge remote-tracking branch 'origin/main'
2026-04-12 20:55:42 +08:00

8c1f4d8108
Migrate loop fusion and temporary-variable elimination from the C kernels
2026-03-03 16:20:15 +08:00

d310ef918b
bssn_rhs(fortran): migrate C kernel loop-fusion optimizations
2026-03-03 16:20:15 +08:00

b35e1b289f
Add a switch to disable printing of memory statistics
2026-03-03 16:17:47 +08:00

05851b2c59
Disable static load
2026-03-03 16:17:47 +08:00

3b39583d67
fix(bssn_rhs)
2026-03-03 16:06:33 +08:00

9c44d1c885
fix(bssn_rhs)
2026-03-03 16:00:45 +08:00

4b9de28feb
Make the coarse-level Sync_cached in the Restrict/Prolong path optional (skipped by default)
OutBdLow2Hi_cached reads the coarse owned region (not the coarse ghost/buffer zones).
To restore the old behavior, define RP_SYNC_COARSE_AFTER_RESTRICT=1 at compile time.
2026-03-03 14:25:27 +08:00

4eb5dc4ddb
Remove a duplicated chi first-derivative computation
2026-03-03 14:23:56 +08:00

688bdb6708
Merge pull request 'cjy-dystopia' (#3) from cjy-dystopia into main
Reviewed-on: https://seele.tail3b303.ts.net:3000/64-BitBrainstorm_2026/AMSS-NCKU/pulls/3
2026-03-02 21:36:26 +08:00

5070134857
perf(transfer_cached): move the req_node/req_is_recv/completed arrays, previously allocated with new/delete on every call, into SyncCache for reuse
This avoids allocating and freeing 3 temporary arrays on every transfer_cached call, reducing heap overhead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 21:14:35 +08:00

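A minimal C++ sketch of the reuse pattern described in 5070134857, assuming the three arrays hold one entry per outstanding MPI request. The struct name `RequestScratch` and its `ensure()` method are hypothetical and are not the actual SyncCache members; the idea shown is that the arrays only grow when a call needs more slots than any earlier call, so steady-state calls perform no heap allocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scratch storage owned by a sync cache (illustrative names only).
// Stands in for per-call new[]/delete[] of three temporary arrays.
struct RequestScratch {
    std::vector<int>  req_node;    // node/rank index associated with each request
    std::vector<char> req_is_recv; // 1 for a receive request, 0 for a send request
    std::vector<int>  completed;   // output buffer for completed request indices

    // Make room for n requests. Storage only grows, so once the largest
    // communication pattern has been seen, later calls allocate nothing.
    void ensure(std::size_t n) {
        if (req_node.size() < n) {
            req_node.resize(n);
            req_is_recv.resize(n);
            completed.resize(n);
        }
    }
};
```

A transfer routine would call `scratch.ensure(num_requests)` once per call and then use only the first `num_requests` entries of each array.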
4012e9d068
perf(RestrictProlong): replace the non-cached Restrict/OutBdLow2Hi with Restrict_cached/OutBdLow2Hi_cached; switch Sync_finish to progressive unpacking
- In RestrictProlong/RestrictProlong_aux, replace Restrict() and OutBdLow2Hi() with their _cached versions, reusing the gridseg lists and MPI buffers instead of reallocating them on every call
- Add two sets of per-level caches, sync_cache_restrict and sync_cache_outbd
- Switch Sync_finish from MPI_Waitall to MPI_Waitsome with progressive unpacking, reducing tail latency
- Extend AsyncSyncState with req_node/req_is_recv/pending_recv fields to support progressive unpacking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 20:48:38 +08:00

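b3c367f15b
prolong3 now computes the actual stencil window first; it takes the full-domain symmetry_bd path only when that window touches a symmetry boundary, and otherwise copies only the required window. restrict3 likewise uses a window check, and when the window does not touch the boundary it fills only the required ii/jj/kk window.
2026-03-02 17:38:56 +08:00
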
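A C++ sketch of the window test described in b3c367f15b, under stated assumptions: indices are integer cell indices along each axis, the stencil needs `w` extra cells on each side of the target range, and a symmetry boundary (where an axis has one) lies below index `first_owned`. All names here (`Window`, `stencil_window`, `needs_full_symmetry_bd`, `window_cells`) are hypothetical and do not reproduce the actual prolong3/restrict3 code.

```cpp
#include <array>

// Inclusive index range along one axis (hypothetical helper type).
struct Window {
    int lo;
    int hi;
};

// Source cells needed to fill target cells [tlo, thi] with a stencil of half-width w.
inline Window stencil_window(int tlo, int thi, int w) {
    return Window{tlo - w, thi + w};
}

// True if, on an axis that has a symmetry plane, the window reaches below the
// first owned cell and would therefore read symmetry ghost cells.
inline bool touches_symmetry(const Window& win, int first_owned, bool axis_has_symmetry) {
    return axis_has_symmetry && win.lo < first_owned;
}

// Take the full-domain symmetry fill only when some axis touches the symmetry
// boundary; otherwise only the window itself needs to be copied.
inline bool needs_full_symmetry_bd(const std::array<Window, 3>& win,
                                   const std::array<int, 3>& first_owned,
                                   const std::array<bool, 3>& has_symmetry) {
    for (int d = 0; d < 3; ++d) {
        if (touches_symmetry(win[d], first_owned[d], has_symmetry[d])) {
            return true;
        }
    }
    return false;
}

// Number of cells in the 3D window, i.e. the size of the reduced copy.
inline long window_cells(const std::array<Window, 3>& win) {
    long n = 1;
    for (const Window& w : win) {
        n *= static_cast<long>(w.hi - w.lo + 1);
    }
    return n;
}
```

The check costs only a few integer comparisons per axis, which is why it can pay off when most blocks lie away from the symmetry plane and the full-domain fill would otherwise touch every cell.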