Commit Graph

  • a918dc103e Add SyncBegin/SyncEnd to Parallel for MPI communication-computation overlap copilot-swe-agent[bot] 2026-02-08 08:00:15 +00:00
  • 4bb6c03013 makefile setting updated CGH0S7 2026-02-08 16:14:43 +08:00
  • 38c2c30186 Merge lopsided advection + kodis dissipation to share symmetry_bd buffer copilot-swe-agent[bot] 2026-02-08 06:38:03 +00:00
  • b8e41b2b39 Only enable OpenMP for TwoPunctures ianchb 2026-02-08 13:00:37 +08:00
  • 3f7e20f702 Remove redundant parts of diff_new.f90 to ease subsequent work yx-fmisc jaunatisblue 2026-02-08 00:54:23 +08:00
  • 6796384bf4 taskset setting updated cjy-oneapi-opus-rhs-preview CGH0S7 2026-02-07 22:24:02 +08:00
  • c974a88d6d Pool fh work arrays in compute_rhs_bssn to eliminate allocation churn CGH0S7 2026-02-07 21:49:12 +08:00
  • 133e4f13a2 Use OpenMP's parallel for with schedule(dynamic,1) ianchb 2026-02-07 19:04:51 +08:00
  • 914c4f4791 Optimize memory allocation in JFD_times_dv ianchb 2026-02-07 15:55:45 +08:00
  • f345b0e520 Performance optimization for the TwoPunctures module * Re-enabled OpenMP. ianchb 2026-02-07 14:46:46 +08:00
  • f5ed23d687 Revert "Eliminate hot-path heap allocations in TwoPunctures spectral solver" ianchb 2026-02-07 10:35:05 +08:00
  • 03d501db04 Display the runtime of TwoPunctures ianchb 2026-02-06 21:27:41 +08:00
  • c6e4d4ab71 Add OpenMP parallelization to BSSN RHS hot-path stencil routines cjy-oneapi-opus-preview CGH0S7 2026-02-07 13:58:55 +08:00
  • 673dd20722 Modify polint in fmisc.f90 jaunatisblue 2026-02-07 01:56:44 +08:00
  • 09ffdb553d Eliminate hot-path heap allocations in TwoPunctures spectral solver CGH0S7 2026-02-06 21:20:35 +08:00
  • 699e443c7a Optimize polint/polin2/polin3 interpolation for cache locality CGH0S7 2026-02-06 19:00:35 +08:00
  • 24bfa44911 Disable NaN sanity check in bssn_rhs.f90 for production builds CGH0S7 2026-02-06 18:36:29 +08:00
  • 6738854a9d Compiler-level and hot-path optimizations for GW150914 CGH0S7 2026-02-06 17:13:39 +08:00
  • 4eb698f496 Add MPI+OpenMP hybrid parallelism (48 ranks x 2 threads) for full 96-core utilization cjy-oneapi-opus-openmp CGH0S7 2026-02-06 15:53:15 +08:00
  • 223ec17a54 input updated cjy-oneapi CGH0S7 2026-02-06 13:57:48 +08:00
  • 082f9c3423 feat: Implement hybrid MPI+OpenMP parallelization - Enable -qopenmp in makefile.inc - Add OpenMP directives to 4th order derivatives in diff_new.f90 - Update makefile_and_run.py to dynamically calculate OMP_NUM_THREADS based on 96 cores and remove hardcoded CPU binding cjy-oneapi-openmp CGH0S7 2026-02-06 13:25:07 +08:00
  • 79af79d471 baseline updated baseline CGH0S7 2026-02-05 19:53:55 +08:00
  • 6fffaa13f6 Optimize buffer_width dynamically based on FD order to improve scalability cjy-oneapi-parallel CGH0S7 2026-01-31 19:04:19 +08:00
  • 6684016e8c Optimize MPI domain decomposition min_width calculation to improve scalability CGH0S7 2026-01-31 16:23:16 +08:00
  • 95575d9450 fix: try to fix segfault at 240 steps by adding WithShell guard for writecheck_sh call chb-local ianchb 2026-01-22 14:26:41 +08:00
  • d11eaa2242 Optimize bssn_rhs.f90: Fuse loops for metric inversion and Christoffel symbols to improve cache locality cjy-oneapi-laptop CGH0S7 2026-01-21 11:22:33 +08:00
  • 54600327da fix(build): update makefile.inc for debian 13 ianchb 2026-01-21 09:29:35 +08:00
  • ef96766e22 Optimize the compute_rhs_bssn hot path and add a NaN check switch CGH0S7 2026-01-20 19:37:26 +08:00
  • ae7b77e44c Setup GW150914-mini test case for laptop development CGH0S7 2026-01-20 00:31:40 +08:00
  • 26c81d8e81 makefile updated CGH0S7 2026-01-19 23:53:16 +08:00
  • ed89bc029b Fix potential division by zero in reta_val calculation and enable NaN checks cjy-oneapi-test CGH0S7 2026-01-19 20:29:48 +08:00
  • 19274e93d1 Fix boundary handling in bssn_rhs_opt.f90 to prevent NaNs CGH0S7 2026-01-19 20:03:22 +08:00
  • ae1a474cca Fix compilation errors and complete logic in BSSN RHS optimization CGH0S7 2026-01-19 19:22:52 +08:00
  • cbb8fb3a87 patched last commit CGH0S7 2026-01-19 17:14:28 +08:00
  • 4472d89a9f Optimize bssn_rhs calculation with cache blocking and vectorization CGH0S7 2026-01-19 16:39:24 +08:00
  • 3914659ebb Optimize BSSN RHS and finite difference calculations - Integrate Intel oneMKL VML for efficient Gauge calculation in bssn_rhs.f90 - Refactor fderivs in diff_new.f90 to separate bulk/boundary loops for better vectorization - Add optimization report in docs/optimization_report.md cjy-oneapi-preview CGH0S7 2026-01-19 10:49:14 +08:00
  • 039dce4d65 Add aggressive compiler optimizations and vectorization directives CGH0S7 2026-01-19 10:17:31 +08:00
  • c524228d23 Enable multi-threaded MKL for better resource utilization CGH0S7 2026-01-19 09:31:29 +08:00
  • 9deeda9831 Refactor verification method and optimize numerical kernels with oneMKL BLAS CGH0S7 2026-01-18 14:25:21 +08:00
  • 3a7bce3af2 Update Intel oneAPI configuration and CPU binding settings CGH0S7 2026-01-17 20:41:02 +08:00
  • c6945bb095 Rename verify_accuracy.py to AMSS_NCKU_Verify_ASC26.py and improve visual output CGH0S7 2026-01-17 14:54:33 +08:00
  • 0d24f1503c Add accuracy verification script for GW150914 simulation CGH0S7 2026-01-17 00:37:30 +08:00
  • cb252f5ea2 Optimize numerical algorithms with Intel oneMKL CGH0S7 2026-01-16 10:58:11 +08:00
  • 7a76cbaafd Add numactl CPU binding to avoid cores 0-3 and 56-59 CGH0S7 2026-01-16 10:24:46 +08:00
  • 57a7376044 Switch compiler toolchain from GCC to Intel oneAPI CGH0S7 2026-01-15 16:32:12 +08:00
  • cd5ceaa15f main branch updated CGH0S7 2026-01-14 08:55:53 +08:00
  • 75be0968fc feat: port GPU code to CUDA 13 and enable GPU computation cjy CGH0S7 2026-01-13 18:15:49 +00:00
  • b27e071cde Makefile updated for rocky10 CGH0S7 2026-01-14 01:41:31 +08:00
  • a1125d4c79 try to build gpu version CGH0S7 2026-01-13 23:52:44 +08:00
  • dcc66588fc gitignore updated CGH0S7 2026-01-13 23:45:49 +08:00
  • 950d448edf fix(build): update LDLIBS to use -lmpi and remove hardcoded paths CGH0S7 2026-01-13 23:40:51 +08:00
  • f2fc9af70e asc26 amss-ncku initialized CGH0S7 2026-01-13 15:01:15 +08:00