Fix potential division by zero in reta_val calculation and enable NaN checks

Added a safety check for the denominator in the reta_val calculation to prevent division by zero when chi approaches zero (e.g., at far-field boundaries). Also enabled DEBUG_NAN_CHECK macro to catch invalid inputs early. Initialized output arrays to zero to prevent uninitialized memory access.
Fix boundary handling in bssn_rhs_opt.f90 to prevent NaNs
2026-01-19 20:29:48 +08:00 · 2026-01-19 20:03:22 +08:00 · 2026-01-19 19:22:52 +08:00 · 2026-01-19 17:14:28 +08:00 · 2026-01-19 16:39:24 +08:00
13 changed files with 2380 additions and 1960 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,7 +1,6 @@
 __pycache__
 GW150914
 GW150914-origin
-GW150914-mini
 docs
 *.tmp

--- a/AMSS_NCKU_Input.py
+++ b/AMSS_NCKU_Input.py
@@ -16,14 +16,12 @@ import numpy
 File_directory   = "GW150914"                    ## output file directory
 Output_directory = "binary_output"               ## binary data file directory
                                                 ## The file directory name should not be too long
-MPI_processes    = 8                             ## number of mpi processes used in the simulation
+MPI_processes    = 48                             ## number of mpi processes used in the simulation

 GPU_Calculation  = "no"                          ## Use GPU or not 
                                                 ## (prefer "no" in the current version, because the GPU part may have bugs when integrated in this Python interface)
 CPU_Part         = 1.0
 GPU_Part         = 0.0
-Debug_NaN_Check          = 0                       ## enable NaN checks in compute_rhs_bssn: 0 (off) or 1 (on)
-

 #################################################

--- a/AMSS_NCKU_Input_Mini.py
+++ b/AMSS_NCKU_Input_Mini.py
@@ -1,233 +0,0 @@
-
-#################################################
-##
-## This file provides the input parameters required for numerical relativity.
-## XIAOQU
-## 2024/03/19 --- 2025/09/14
-## Modified for GW150914-mini test case
-##
-#################################################
-
-import numpy    
-
-#################################################
-
-## Setting MPI processes and the output file directory
-
-File_directory   = "GW150914-mini"               ## output file directory
-Output_directory = "binary_output"               ## binary data file directory
-                                                 ## The file directory name should not be too long
-MPI_processes    = 4                             ## number of mpi processes used in the simulation (Reduced for laptop)
-
-GPU_Calculation  = "no"                          ## Use GPU or not 
-                                                 ## (prefer "no" in the current version, because the GPU part may have bugs when integrated in this Python interface)
-CPU_Part         = 1.0
-GPU_Part         = 0.0
-
-#################################################
-
-
-#################################################
-
-## Setting the physical system and numerical method
-
-Symmetry                 = "equatorial-symmetry"   ## Symmetry of System: choose equatorial-symmetry、no-symmetry、octant-symmetry
-Equation_Class           = "BSSN"                  ## Evolution Equation: choose "BSSN", "BSSN-EScalar", "BSSN-EM", "Z4C" 
-                                                   ## If "BSSN-EScalar" is chosen, it is necessary to set other parameters below
-Initial_Data_Method      = "Ansorg-TwoPuncture"    ## initial data method: choose "Ansorg-TwoPuncture", "Lousto-Analytical", "Cao-Analytical", "KerrSchild-Analytical"
-Time_Evolution_Method    = "runge-kutta-45"        ## time evolution method: choose "runge-kutta-45"
-Finite_Diffenence_Method = "4th-order"             ## finite-difference method: choose "2nd-order", "4th-order", "6th-order", "8th-order"
-Debug_NaN_Check          = 0                       ## enable NaN checks in compute_rhs_bssn: 0 (off) or 1 (on)
-
-#################################################
-
-
-#################################################
-
-## Setting the time evolutionary information
-
-Start_Evolution_Time     = 0.0                    ## start evolution time t0
-Final_Evolution_Time     = 100.0                  ## final evolution time t1 (Reduced for quick test)
-Check_Time               = 10.0
-Dump_Time                = 10.0                   ## time inteval dT for dumping binary data
-D2_Dump_Time             = 10.0                   ## dump the ascii data for 2d surface after dT'
-Analysis_Time            = 1.0                    ## dump the puncture position and GW psi4 after dT"
-Evolution_Step_Number    = 10000000               ## stop the calculation after the maximal step number
-Courant_Factor           = 0.5                    ## Courant Factor
-Dissipation              = 0.15                   ## Kreiss-Oliger Dissipation Strength
-
-#################################################
-
-
-#################################################
-
-## Setting the grid structure
-
-basic_grid_set    = "Patch"                          ## grid structure: choose "Patch" or "Shell-Patch"
-grid_center_set   = "Cell"                           ## grid center: chose "Cell" or "Vertex"
-
-grid_level        = 7                                ## total number of AMR grid levels (Reduced from 9)
-static_grid_level = 4                                ## number of AMR static grid levels (Reduced from 5)
-moving_grid_level = grid_level - static_grid_level   ## number of AMR moving grid levels
-
-analysis_level    = 0
-refinement_level  = 3                                ## time refinement start from this grid level
-
-largest_box_xyz_max = [320.0, 320.0, 320.0]          ## scale of the largest box
-                                                     ## not ne cess ary to be cubic for "Patch" grid s tructure
-                                                     ## need to be a cubic box for "Shell-Patch" grid structure
-largest_box_xyz_min = - numpy.array(largest_box_xyz_max)  
-
-static_grid_number = 48                              ## grid points of each static AMR grid (in x direction) (Reduced from 96)
-                                                     ## (grid points in y and z directions are automatically adjusted)
-moving_grid_number = 24                              ## grid points of each moving AMR grid (Reduced from 48)
-shell_grid_number  = [32, 32, 100]                   ## grid points of Shell-Patch grid
-                                                     ## in (phi, theta, r) direction
-devide_factor      = 2.0                             ## resolution between different grid levels dh0/dh1, only support 2.0 now
-                                                     
-
-static_grid_type   = 'Linear'                        ## AMR static grid structure , only supports "Linear"
-moving_grid_type   = 'Linear'                        ## AMR moving grid structure , only supports "Linear"
-
-quarter_sphere_number = 48                           ## grid number of 1/4 s pher ical surface (Reduced from 96)
-                                                     ## (which is needed for evaluating the spherical surface integral)
-
-#################################################
-
-
-#################################################
-
-## Setting the puncture information
-
-puncture_number       = 2                                     
-
-position_BH           = numpy.zeros( (puncture_number, 3) )   
-parameter_BH          = numpy.zeros( (puncture_number, 3) )   
-dimensionless_spin_BH = numpy.zeros( (puncture_number, 3) )   
-momentum_BH           = numpy.zeros( (puncture_number, 3) )   
-
-puncture_data_set     = "Manually"                       ## Method to give Puncture’s positions and momentum
-                                                         ## choose "Manually" or "Automatically-BBH"
-                                                         ## Prefer to choose "Manually", because "Automatically-BBH" is developing now
-
-## initial orbital distance and ellipticity for BBHs system
-## ( needed for "Automatically-BBH" case , not affect the "Manually" case )
-Distance = 10.0
-e0       = 0.0
-
-## black hole parameter (M Q* a*)
-parameter_BH[0] = [ 36.0/(36.0+29.0),  0.0,  +0.31 ]   
-parameter_BH[1] = [ 29.0/(36.0+29.0),  0.0,  -0.46 ]  
-## dimensionless spin in each direction
-dimensionless_spin_BH[0] = [ 0.0,  0.0,  +0.31 ]   
-dimensionless_spin_BH[1] = [ 0.0,  0.0,  -0.46 ]  
-
-## use Brugmann's convention
-##  -----0-----> y
-##   -      +     
-
-#---------------------------------------------
-
-## If puncture_data_set is chosen to be "Manually", it is necessary to set the position and momentum of each puncture manually
-
-## initial position for each puncture
-position_BH[0]  = [  0.0,  10.0*29.0/(36.0+29.0), 0.0 ]  
-position_BH[1]  = [  0.0, -10.0*36.0/(36.0+29.0), 0.0 ] 
-
-## initial mumentum for each puncture
-## (needed for "Manually" case, does not affect the "Automatically-BBH" case)
-momentum_BH[0]  = [ -0.09530152296974252,  -0.00084541526517121,   0.0 ]
-momentum_BH[1]  = [ +0.09530152296974252,  +0.00084541526517121,   0.0 ]
-
-
-#################################################
-
-
-#################################################
-
-## Setting the gravitational wave information
-
-GW_L_max        = 4                      ## maximal L number in gravitational wave
-GW_M_max        = 4                      ## maximal M number in gravitational wave
-Detector_Number = 12                     ## number of dector
-Detector_Rmin   = 50.0                   ## nearest dector distance
-Detector_Rmax   = 160.0                  ## farest dector distance
-
-#################################################
-
-
-#################################################
-
-## Setting the apprent horizon
-
-AHF_Find       = "no"                    ## whether to find the apparent horizon: choose "yes" or "no"
-
-AHF_Find_Every = 24
-AHF_Dump_Time  = 20.0
-
-#################################################
-
-
-#################################################
-
-## Other parameters (testing)
-## Only influence the Equation_Class = "BSSN-EScalar" case
-
-FR_a2     = 3.0        ## f(R) = R + a2 * R^2    
-FR_l2     = 10000.0
-FR_phi0   = 0.00005
-FR_r0     = 120.0
-FR_sigma0 = 8.0
-FR_Choice = 2          ## Choice options: 1 2 3 4 5
-                       ## 1: phi(r) = phi0 * Exp(-(r-r0)**2/sigma0)   
-                       ##    V(r)   = 0
-                       ## 2: phi(r) =  phi0 * a2^2/(1+a2^2)  
-                       ##    V(r)   = Exp(-8*Sqrt(PI/3)*phi(r)) * (1-Exp(4*Sqrt(PI/3)*phi(r)))**2 / (32*PI*a2)
-                       ## 3: Schrodinger-Newton gived by system phi(r) 
-                       ##    V(r)   = Exp(-8*Sqrt(PI/3)*phi(r)) * (1-Exp(4*Sqrt(PI/3)*phi(r)))**2 / (32*PI*a2)
-                       ## 4: phi(r) = phi0 * 0.5 * ( tanh((r+r0)/sigma0) - tanh((r-r0)/sigma0) )  
-                       ##    V(r)   = 0
-                       ##    f(R)   = R + a2*R^2  with a2 = +oo
-                       ## 5: phi(r) = phi0 * Exp(-(r-r0)**2/sigma)   
-                       ##    V(r)   = 0
-
-#################################################
-
-
-#################################################
-
-## Other parameters (testing)
-## (please do not change if not necessary)
-
-boundary_choice = "BAM-choice"     ## Sommerfeld boundary condition : choose "BAM-choice" or "Shibata-choice" 
-                                   ## prefer "BAM-choice"
-
-gauge_choice  = 0                  ## gauge choice
-                                   ## 0: B^i gauge
-                                   ## 1: David's puncture gauge
-                                   ## 2: MB B^i gauge               
-                                   ## 3: RIT B^i gauge
-                                   ## 4: MB beta gauge 
-                                   ## 5: RIT beta gauge 
-                                   ## 6: MGB1 B^i gauge
-                                   ## 7: MGB2 B^i gauge
-                                   ## prefer 0 or 1
-                                   
-tetrad_type  = 2                   ## tetradtype 
-                                   ##  v:r; u: phi; w: theta
-                                   ##      v^a = (x,y,z)
-                                   ## 0: orthonormal order: v,u,w
-                                   ##    v^a = (x,y,z)   
-                                   ##    m = (phi - i theta)/sqrt(2) 
-                                   ##    following Frans, Eq.(8) of  PRD 75, 124018(2007)
-                                   ## 1: orthonormal order: w,u,v
-                                   ##    m = (theta + i phi)/sqrt(2) 
-                                   ##    following Sperhake, Eq.(3.2) of  PRD 85, 124062(2012)    
-                                   ## 2: orthonormal order: v,u,w
-                                   ##    v_a = (x,y,z)
-                                   ##    m = (phi - i theta)/sqrt(2) 
-                                   ##    following Frans, Eq.(8) of  PRD 75, 124018(2007)
-                                   ## this version recommend set to 2
-                                   ## prefer 2
-                                   
-#################################################
--- a/AMSS_NCKU_MiniProgram.py
+++ b/AMSS_NCKU_MiniProgram.py
@@ -1,224 +0,0 @@
-##################################################################
-##
-## AMSS-NCKU Numerical Relativity Mini Test Program
-## Author: Assistant (based on Xiaoqu's code)
-## 2026/01/20
-##
-## This script runs a scaled-down version of the GW150914 test case
-## suitable for laptop testing.
-##
-##################################################################
-
-import os
-import shutil
-import sys
-import time
-
-# --- Context Manager for Input File Swapping ---
-class InputFileSwapper:
-    def __init__(self, mini_file="AMSS_NCKU_Input_Mini.py", target_file="AMSS_NCKU_Input.py"):
-        self.mini_file = mini_file
-        self.target_file = target_file
-        self.backup_file = target_file + ".bak"
-        self.swapped = False
-
-    def __enter__(self):
-        print(f"[MiniProgram] Swapping {self.target_file} with {self.mini_file}...")
-        if os.path.exists(self.target_file):
-            shutil.move(self.target_file, self.backup_file)
-        shutil.copy(self.mini_file, self.target_file)
-        self.swapped = True
-        return self
-
-    def __exit__(self, exc_type, exc_value, traceback):
-        if self.swapped:
-            print(f"[MiniProgram] Restoring original {self.target_file}...")
-            os.remove(self.target_file)
-            if os.path.exists(self.backup_file):
-                shutil.move(self.backup_file, self.target_file)
-
-def main():
-    # Use the swapper to ensure all imported modules see the mini configuration
-    with InputFileSwapper():
-        
-        # Import modules AFTER swapping input file
-        try:
-            import AMSS_NCKU_Input as input_data
-            import print_information
-            import setup
-            import numerical_grid
-            import generate_macrodef
-            import makefile_and_run
-            import generate_TwoPuncture_input
-            import renew_puncture_parameter
-            import plot_xiaoqu
-            import plot_GW_strain_amplitude_xiaoqu
-        except ImportError as e:
-            print(f"Error importing modules: {e}")
-            return
-
-        print_information.print_program_introduction()
-        
-        print("\n" + "#"*60)
-        print(" RUNNING MINI TEST CASE: GW150914-mini")
-        print("#"*60 + "\n")
-
-        # --- Directory Setup ---
-        File_directory = os.path.join(input_data.File_directory)
-        
-        if os.path.exists(File_directory):
-            print(f" Output directory '{File_directory}' exists. Removing for mini test...")
-            shutil.rmtree(File_directory, ignore_errors=True)
-        
-        os.mkdir(File_directory)
-        shutil.copy("AMSS_NCKU_Input.py", File_directory) # Copies the current (mini) input
-
-        output_directory = os.path.join(File_directory, "AMSS_NCKU_output")
-        os.mkdir(output_directory)
-
-        binary_results_directory = os.path.join(output_directory, input_data.Output_directory)
-        os.mkdir(binary_results_directory)
-
-        figure_directory = os.path.join(File_directory, "figure")
-        os.mkdir(figure_directory)
-
-        print(" Output directories generated.\n")
-
-        # --- Setup and Input Generation ---
-        setup.print_input_data(File_directory)
-        setup.generate_AMSSNCKU_input()
-        setup.print_puncture_information()
-
-        print("\n Generating AMSS-NCKU input parfile...")
-        numerical_grid.append_AMSSNCKU_cgh_input()
-        
-        print("\n Plotting initial grid...")
-        numerical_grid.plot_initial_grid()
-
-        print("\n Generating macro files...")
-        generate_macrodef.generate_macrodef_h()
-        generate_macrodef.generate_macrodef_fh()
-        
-        # --- Compilation Preparation ---
-        print("\n Preparing to compile and run...")
-        
-        AMSS_NCKU_source_path = "AMSS_NCKU_source"
-        AMSS_NCKU_source_copy = os.path.join(File_directory, "AMSS_NCKU_source_copy")
-
-        if not os.path.exists(AMSS_NCKU_source_path):
-             print(" Error: AMSS_NCKU_source not found! Please run in the project root.")
-             return
-
-        shutil.copytree(AMSS_NCKU_source_path, AMSS_NCKU_source_copy)
-        
-        macrodef_h_path  = os.path.join(File_directory, "macrodef.h") 
-        macrodef_fh_path = os.path.join(File_directory, "macrodef.fh") 
-
-        shutil.copy2(macrodef_h_path,  AMSS_NCKU_source_copy)
-        shutil.copy2(macrodef_fh_path, AMSS_NCKU_source_copy)
-
-        # --- Compilation ---
-        cwd = os.getcwd()
-        os.chdir(AMSS_NCKU_source_copy)
-        
-        print(" Compiling ABE...")
-        makefile_and_run.makefile_ABE()
-        
-        if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ): 
-            print(" Compiling TwoPunctureABE...")
-            makefile_and_run.makefile_TwoPunctureABE()
-            
-        os.chdir(cwd)
-
-        # --- Copy Executables ---
-        if (input_data.GPU_Calculation == "no"):
-            ABE_file = os.path.join(AMSS_NCKU_source_copy, "ABE")
-        else:
-            ABE_file = os.path.join(AMSS_NCKU_source_copy, "ABEGPU")
-            
-        if not os.path.exists(ABE_file):
-            print(" Error: ABE executable compilation failed.")
-            return
-            
-        shutil.copy2(ABE_file, output_directory)
-
-        TwoPuncture_file = os.path.join(AMSS_NCKU_source_copy, "TwoPunctureABE")
-        if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ):
-            if not os.path.exists(TwoPuncture_file):
-                print(" Error: TwoPunctureABE compilation failed.")
-                return
-            shutil.copy2(TwoPuncture_file, output_directory)
-
-        # --- Execution ---
-        start_time = time.time()
-        
-        if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ):
-             print("\n Generating TwoPuncture input...")
-             generate_TwoPuncture_input.generate_AMSSNCKU_TwoPuncture_input()
-             
-             AMSS_NCKU_TwoPuncture_inputfile = 'AMSS-NCKU-TwoPuncture.input'
-             AMSS_NCKU_TwoPuncture_inputfile_path = os.path.join( File_directory, AMSS_NCKU_TwoPuncture_inputfile )
-             shutil.copy2( AMSS_NCKU_TwoPuncture_inputfile_path, os.path.join(output_directory, 'TwoPunctureinput.par') )
-             
-             print(" Running TwoPunctureABE...")
-             os.chdir(output_directory)
-             makefile_and_run.run_TwoPunctureABE()
-             os.chdir(cwd)
-
-        # Update Puncture Parameter
-        renew_puncture_parameter.append_AMSSNCKU_BSSN_input(File_directory, output_directory)
-        
-        AMSS_NCKU_inputfile = 'AMSS-NCKU.input'
-        AMSS_NCKU_inputfile_path = os.path.join(File_directory, AMSS_NCKU_inputfile)
-        shutil.copy2( AMSS_NCKU_inputfile_path, os.path.join(output_directory, 'input.par') )
-        
-        print("\n Input files ready. Launching ABE...")
-        
-        os.chdir(output_directory)
-        makefile_and_run.run_ABE()
-        os.chdir(cwd)
-        
-        end_time = time.time()
-        elapsed_time = end_time - start_time
-        
-        # --- Post-processing ---
-        print("\n Copying output files for inspection...")
-        AMSS_NCKU_error_file_path = os.path.join(binary_results_directory, "setting.par")
-        if os.path.exists(AMSS_NCKU_error_file_path):
-            shutil.copy( AMSS_NCKU_error_file_path, os.path.join(output_directory, "AMSSNCKU_setting_parameter") )
-
-        AMSS_NCKU_error_file_path = os.path.join(binary_results_directory, "Error.log")
-        if os.path.exists(AMSS_NCKU_error_file_path):
-            shutil.copy( AMSS_NCKU_error_file_path, os.path.join(output_directory, "Error.log") )
-
-        for fname in ["bssn_BH.dat", "bssn_ADMQs.dat", "bssn_psi4.dat", "bssn_constraint.dat"]:
-            fpath = os.path.join(binary_results_directory, fname)
-            if os.path.exists(fpath):
-                shutil.copy(fpath, os.path.join(output_directory, fname))
-
-        # --- Plotting ---
-        print("\n Plotting results...")
-        try:
-            plot_xiaoqu.generate_puncture_orbit_plot(   binary_results_directory, figure_directory )
-            plot_xiaoqu.generate_puncture_orbit_plot3D( binary_results_directory, figure_directory )
-            plot_xiaoqu.generate_puncture_distence_plot( binary_results_directory, figure_directory )
-            
-            for i in range(input_data.Detector_Number):
-                plot_xiaoqu.generate_gravitational_wave_psi4_plot( binary_results_directory, figure_directory, i )
-                plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot( binary_results_directory, figure_directory, i )
-
-            for i in range(input_data.Detector_Number):
-                plot_xiaoqu.generate_ADMmass_plot( binary_results_directory, figure_directory, i )
-
-            for i in range(input_data.grid_level):
-                plot_xiaoqu.generate_constraint_check_plot( binary_results_directory, figure_directory, i )
-                
-            plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory )
-        except Exception as e:
-            print(f"Warning: Plotting failed: {e}")
-
-        print(f"\n Program Cost = {elapsed_time:.2f} Seconds \n")
-        print(" AMSS-NCKU-Python simulation finished (Mini Test).\n")
-
-if __name__ == "__main__":
-    main()
--- a/AMSS_NCKU_source/bssn_rhs.f90
+++ b/AMSS_NCKU_source/bssn_rhs.f90
--- a/AMSS_NCKU_source/bssn_rhs_legacy.f90
+++ b/AMSS_NCKU_source/bssn_rhs_legacy.f90
--- a/AMSS_NCKU_source/bssn_rhs_opt.f90
+++ b/AMSS_NCKU_source/bssn_rhs_opt.f90
--- a/AMSS_NCKU_source/diff_new.f90
+++ b/AMSS_NCKU_source/diff_new.f90
@@ -1939,309 +1939,6 @@
  return

  end subroutine fddyz
-  subroutine fderivs_batch4(ex,f1,f2,f3,f4, &
-                            f1x,f1y,f1z,f2x,f2y,f2z,f3x,f3y,f3z,f4x,f4y,f4z, &
-                            X,Y,Z,SYM1,SYM2,SYM3,symmetry,onoff)
-  implicit none
-
-  integer,                               intent(in ):: ex(1:3),symmetry,onoff
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(in ):: f1,f2,f3,f4
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f1x,f1y,f1z
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f2x,f2y,f2z
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f3x,f3y,f3z
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f4x,f4y,f4z
-  real*8,                                intent(in) :: X(ex(1)),Y(ex(2)),Z(ex(3))
-  real*8,                                intent(in ):: SYM1,SYM2,SYM3
-
-!~~~~~~ other variables
-
-  real*8 :: dX,dY,dZ
-  real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh1,fh2,fh3,fh4
-  real*8, dimension(3) :: SoA
-  integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
-  real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
-  integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
-  real*8,  parameter :: ZEO=0.d0,ONE=1.d0
-  real*8,  parameter :: TWO=2.d0,EIT=8.d0
-  real*8,  parameter :: F12=1.2d1
-
-  dX = X(2)-X(1)
-  dY = Y(2)-Y(1)
-  dZ = Z(2)-Z(1)
-
-  imax = ex(1)
-  jmax = ex(2)
-  kmax = ex(3)
-
-  imin = 1
-  jmin = 1
-  kmin = 1
-  if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
-  if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
-  if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
-
-  SoA(1) = SYM1
-  SoA(2) = SYM2
-  SoA(3) = SYM3
-
-  call symmetry_bd(2,ex,f1,fh1,SoA)
-  call symmetry_bd(2,ex,f2,fh2,SoA)
-  call symmetry_bd(2,ex,f3,fh3,SoA)
-  call symmetry_bd(2,ex,f4,fh4,SoA)
-
-  d12dx = ONE/F12/dX
-  d12dy = ONE/F12/dY
-  d12dz = ONE/F12/dZ
-
-  d2dx = ONE/TWO/dX
-  d2dy = ONE/TWO/dY
-  d2dz = ONE/TWO/dZ
-
-  f1x = ZEO; f1y = ZEO; f1z = ZEO
-  f2x = ZEO; f2y = ZEO; f2z = ZEO
-  f3x = ZEO; f3y = ZEO; f3z = ZEO
-  f4x = ZEO; f4y = ZEO; f4z = ZEO
-
-  do k=1,ex(3)-1
-  do j=1,ex(2)-1
-  do i=1,ex(1)-1
-   if(i+2 <= imax .and. i-2 >= imin .and. &
-      j+2 <= jmax .and. j-2 >= jmin .and. &
-      k+2 <= kmax .and. k-2 >= kmin) then
-      f1x(i,j,k)=d12dx*(fh1(i-2,j,k)-EIT*fh1(i-1,j,k)+EIT*fh1(i+1,j,k)-fh1(i+2,j,k))
-      f1y(i,j,k)=d12dy*(fh1(i,j-2,k)-EIT*fh1(i,j-1,k)+EIT*fh1(i,j+1,k)-fh1(i,j+2,k))
-      f1z(i,j,k)=d12dz*(fh1(i,j,k-2)-EIT*fh1(i,j,k-1)+EIT*fh1(i,j,k+1)-fh1(i,j,k+2))
-
-      f2x(i,j,k)=d12dx*(fh2(i-2,j,k)-EIT*fh2(i-1,j,k)+EIT*fh2(i+1,j,k)-fh2(i+2,j,k))
-      f2y(i,j,k)=d12dy*(fh2(i,j-2,k)-EIT*fh2(i,j-1,k)+EIT*fh2(i,j+1,k)-fh2(i,j+2,k))
-      f2z(i,j,k)=d12dz*(fh2(i,j,k-2)-EIT*fh2(i,j,k-1)+EIT*fh2(i,j,k+1)-fh2(i,j,k+2))
-
-      f3x(i,j,k)=d12dx*(fh3(i-2,j,k)-EIT*fh3(i-1,j,k)+EIT*fh3(i+1,j,k)-fh3(i+2,j,k))
-      f3y(i,j,k)=d12dy*(fh3(i,j-2,k)-EIT*fh3(i,j-1,k)+EIT*fh3(i,j+1,k)-fh3(i,j+2,k))
-      f3z(i,j,k)=d12dz*(fh3(i,j,k-2)-EIT*fh3(i,j,k-1)+EIT*fh3(i,j,k+1)-fh3(i,j,k+2))
-
-      f4x(i,j,k)=d12dx*(fh4(i-2,j,k)-EIT*fh4(i-1,j,k)+EIT*fh4(i+1,j,k)-fh4(i+2,j,k))
-      f4y(i,j,k)=d12dy*(fh4(i,j-2,k)-EIT*fh4(i,j-1,k)+EIT*fh4(i,j+1,k)-fh4(i,j+2,k))
-      f4z(i,j,k)=d12dz*(fh4(i,j,k-2)-EIT*fh4(i,j,k-1)+EIT*fh4(i,j,k+1)-fh4(i,j,k+2))
-   elseif(i+1 <= imax .and. i-1 >= imin .and. &
-          j+1 <= jmax .and. j-1 >= jmin .and. &
-          k+1 <= kmax .and. k-1 >= kmin) then
-      f1x(i,j,k)=d2dx*(-fh1(i-1,j,k)+fh1(i+1,j,k))
-      f1y(i,j,k)=d2dy*(-fh1(i,j-1,k)+fh1(i,j+1,k))
-      f1z(i,j,k)=d2dz*(-fh1(i,j,k-1)+fh1(i,j,k+1))
-
-      f2x(i,j,k)=d2dx*(-fh2(i-1,j,k)+fh2(i+1,j,k))
-      f2y(i,j,k)=d2dy*(-fh2(i,j-1,k)+fh2(i,j+1,k))
-      f2z(i,j,k)=d2dz*(-fh2(i,j,k-1)+fh2(i,j,k+1))
-
-      f3x(i,j,k)=d2dx*(-fh3(i-1,j,k)+fh3(i+1,j,k))
-      f3y(i,j,k)=d2dy*(-fh3(i,j-1,k)+fh3(i,j+1,k))
-      f3z(i,j,k)=d2dz*(-fh3(i,j,k-1)+fh3(i,j,k+1))
-
-      f4x(i,j,k)=d2dx*(-fh4(i-1,j,k)+fh4(i+1,j,k))
-      f4y(i,j,k)=d2dy*(-fh4(i,j-1,k)+fh4(i,j+1,k))
-      f4z(i,j,k)=d2dz*(-fh4(i,j,k-1)+fh4(i,j,k+1))
-   endif
-  enddo
-  enddo
-  enddo
-
-  return
-
-  end subroutine fderivs_batch4
-!-----------------------------------------------------------------------------
-! batch first derivatives (3 fields), same symmetry setup
-!-----------------------------------------------------------------------------
-  subroutine fderivs_batch3(ex,f1,f2,f3, &
-                            f1x,f1y,f1z,f2x,f2y,f2z,f3x,f3y,f3z, &
-                            X,Y,Z,SYM1,SYM2,SYM3,symmetry,onoff)
-  implicit none
-
-  integer,                               intent(in ):: ex(1:3),symmetry,onoff
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(in ):: f1,f2,f3
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f1x,f1y,f1z
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f2x,f2y,f2z
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f3x,f3y,f3z
-  real*8,                                intent(in) :: X(ex(1)),Y(ex(2)),Z(ex(3))
-  real*8,                                intent(in ):: SYM1,SYM2,SYM3
-
-!~~~~~~ other variables
-
-  real*8 :: dX,dY,dZ
-  real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh1,fh2,fh3
-  real*8, dimension(3) :: SoA
-  integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
-  real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
-  integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
-  real*8,  parameter :: ZEO=0.d0,ONE=1.d0
-  real*8,  parameter :: TWO=2.d0,EIT=8.d0
-  real*8,  parameter :: F12=1.2d1
-
-  dX = X(2)-X(1)
-  dY = Y(2)-Y(1)
-  dZ = Z(2)-Z(1)
-
-  imax = ex(1)
-  jmax = ex(2)
-  kmax = ex(3)
-
-  imin = 1
-  jmin = 1
-  kmin = 1
-  if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
-  if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
-  if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
-
-  SoA(1) = SYM1
-  SoA(2) = SYM2
-  SoA(3) = SYM3
-
-  call symmetry_bd(2,ex,f1,fh1,SoA)
-  call symmetry_bd(2,ex,f2,fh2,SoA)
-  call symmetry_bd(2,ex,f3,fh3,SoA)
-
-  d12dx = ONE/F12/dX
-  d12dy = ONE/F12/dY
-  d12dz = ONE/F12/dZ
-
-  d2dx = ONE/TWO/dX
-  d2dy = ONE/TWO/dY
-  d2dz = ONE/TWO/dZ
-
-  f1x = ZEO; f1y = ZEO; f1z = ZEO
-  f2x = ZEO; f2y = ZEO; f2z = ZEO
-  f3x = ZEO; f3y = ZEO; f3z = ZEO
-
-  do k=1,ex(3)-1
-  do j=1,ex(2)-1
-  do i=1,ex(1)-1
-   if(i+2 <= imax .and. i-2 >= imin .and. &
-      j+2 <= jmax .and. j-2 >= jmin .and. &
-      k+2 <= kmax .and. k-2 >= kmin) then
-      f1x(i,j,k)=d12dx*(fh1(i-2,j,k)-EIT*fh1(i-1,j,k)+EIT*fh1(i+1,j,k)-fh1(i+2,j,k))
-      f1y(i,j,k)=d12dy*(fh1(i,j-2,k)-EIT*fh1(i,j-1,k)+EIT*fh1(i,j+1,k)-fh1(i,j+2,k))
-      f1z(i,j,k)=d12dz*(fh1(i,j,k-2)-EIT*fh1(i,j,k-1)+EIT*fh1(i,j,k+1)-fh1(i,j,k+2))
-
-      f2x(i,j,k)=d12dx*(fh2(i-2,j,k)-EIT*fh2(i-1,j,k)+EIT*fh2(i+1,j,k)-fh2(i+2,j,k))
-      f2y(i,j,k)=d12dy*(fh2(i,j-2,k)-EIT*fh2(i,j-1,k)+EIT*fh2(i,j+1,k)-fh2(i,j+2,k))
-      f2z(i,j,k)=d12dz*(fh2(i,j,k-2)-EIT*fh2(i,j,k-1)+EIT*fh2(i,j,k+1)-fh2(i,j,k+2))
-
-      f3x(i,j,k)=d12dx*(fh3(i-2,j,k)-EIT*fh3(i-1,j,k)+EIT*fh3(i+1,j,k)-fh3(i+2,j,k))
-      f3y(i,j,k)=d12dy*(fh3(i,j-2,k)-EIT*fh3(i,j-1,k)+EIT*fh3(i,j+1,k)-fh3(i,j+2,k))
-      f3z(i,j,k)=d12dz*(fh3(i,j,k-2)-EIT*fh3(i,j,k-1)+EIT*fh3(i,j,k+1)-fh3(i,j,k+2))
-   elseif(i+1 <= imax .and. i-1 >= imin .and. &
-          j+1 <= jmax .and. j-1 >= jmin .and. &
-          k+1 <= kmax .and. k-1 >= kmin) then
-      f1x(i,j,k)=d2dx*(-fh1(i-1,j,k)+fh1(i+1,j,k))
-      f1y(i,j,k)=d2dy*(-fh1(i,j-1,k)+fh1(i,j+1,k))
-      f1z(i,j,k)=d2dz*(-fh1(i,j,k-1)+fh1(i,j,k+1))
-
-      f2x(i,j,k)=d2dx*(-fh2(i-1,j,k)+fh2(i+1,j,k))
-      f2y(i,j,k)=d2dy*(-fh2(i,j-1,k)+fh2(i,j+1,k))
-      f2z(i,j,k)=d2dz*(-fh2(i,j,k-1)+fh2(i,j,k+1))
-
-      f3x(i,j,k)=d2dx*(-fh3(i-1,j,k)+fh3(i+1,j,k))
-      f3y(i,j,k)=d2dy*(-fh3(i,j-1,k)+fh3(i,j+1,k))
-      f3z(i,j,k)=d2dz*(-fh3(i,j,k-1)+fh3(i,j,k+1))
-   endif
-  enddo
-  enddo
-  enddo
-
-  return
-
-  end subroutine fderivs_batch3
-!-----------------------------------------------------------------------------
-! batch first derivatives (2 fields), same symmetry setup
-!-----------------------------------------------------------------------------
-  subroutine fderivs_batch2(ex,f1,f2, &
-                            f1x,f1y,f1z,f2x,f2y,f2z, &
-                            X,Y,Z,SYM1,SYM2,SYM3,symmetry,onoff)
-  implicit none
-
-  integer,                               intent(in ):: ex(1:3),symmetry,onoff
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(in ):: f1,f2
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f1x,f1y,f1z
-  real*8,  dimension(ex(1),ex(2),ex(3)), intent(out):: f2x,f2y,f2z
-  real*8,                                intent(in) :: X(ex(1)),Y(ex(2)),Z(ex(3))
-  real*8,                                intent(in ):: SYM1,SYM2,SYM3
-
-!~~~~~~ other variables
-
-  real*8 :: dX,dY,dZ
-  real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh1,fh2
-  real*8, dimension(3) :: SoA
-  integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
-  real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
-  integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
-  real*8,  parameter :: ZEO=0.d0,ONE=1.d0
-  real*8,  parameter :: TWO=2.d0,EIT=8.d0
-  real*8,  parameter :: F12=1.2d1
-
-  dX = X(2)-X(1)
-  dY = Y(2)-Y(1)
-  dZ = Z(2)-Z(1)
-
-  imax = ex(1)
-  jmax = ex(2)
-  kmax = ex(3)
-
-  imin = 1
-  jmin = 1
-  kmin = 1
-  if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
-  if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
-  if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
-
-  SoA(1) = SYM1
-  SoA(2) = SYM2
-  SoA(3) = SYM3
-
-  call symmetry_bd(2,ex,f1,fh1,SoA)
-  call symmetry_bd(2,ex,f2,fh2,SoA)
-
-  d12dx = ONE/F12/dX
-  d12dy = ONE/F12/dY
-  d12dz = ONE/F12/dZ
-
-  d2dx = ONE/TWO/dX
-  d2dy = ONE/TWO/dY
-  d2dz = ONE/TWO/dZ
-
-  f1x = ZEO; f1y = ZEO; f1z = ZEO
-  f2x = ZEO; f2y = ZEO; f2z = ZEO
-
-  do k=1,ex(3)-1
-  do j=1,ex(2)-1
-  do i=1,ex(1)-1
-   if(i+2 <= imax .and. i-2 >= imin .and. &
-      j+2 <= jmax .and. j-2 >= jmin .and. &
-      k+2 <= kmax .and. k-2 >= kmin) then
-      f1x(i,j,k)=d12dx*(fh1(i-2,j,k)-EIT*fh1(i-1,j,k)+EIT*fh1(i+1,j,k)-fh1(i+2,j,k))
-      f1y(i,j,k)=d12dy*(fh1(i,j-2,k)-EIT*fh1(i,j-1,k)+EIT*fh1(i,j+1,k)-fh1(i,j+2,k))
-      f1z(i,j,k)=d12dz*(fh1(i,j,k-2)-EIT*fh1(i,j,k-1)+EIT*fh1(i,j,k+1)-fh1(i,j,k+2))
-
-      f2x(i,j,k)=d12dx*(fh2(i-2,j,k)-EIT*fh2(i-1,j,k)+EIT*fh2(i+1,j,k)-fh2(i+2,j,k))
-      f2y(i,j,k)=d12dy*(fh2(i,j-2,k)-EIT*fh2(i,j-1,k)+EIT*fh2(i,j+1,k)-fh2(i,j+2,k))
-      f2z(i,j,k)=d12dz*(fh2(i,j,k-2)-EIT*fh2(i,j,k-1)+EIT*fh2(i,j,k+1)-fh2(i,j,k+2))
-   elseif(i+1 <= imax .and. i-1 >= imin .and. &
-          j+1 <= jmax .and. j-1 >= jmin .and. &
-          k+1 <= kmax .and. k-1 >= kmin) then
-      f1x(i,j,k)=d2dx*(-fh1(i-1,j,k)+fh1(i+1,j,k))
-      f1y(i,j,k)=d2dy*(-fh1(i,j-1,k)+fh1(i,j+1,k))
-      f1z(i,j,k)=d2dz*(-fh1(i,j,k-1)+fh1(i,j,k+1))
-
-      f2x(i,j,k)=d2dx*(-fh2(i-1,j,k)+fh2(i+1,j,k))
-      f2y(i,j,k)=d2dy*(-fh2(i,j-1,k)+fh2(i,j+1,k))
-      f2z(i,j,k)=d2dz*(-fh2(i,j,k-1)+fh2(i,j,k+1))
-   endif
-  enddo
-  enddo
-  enddo
-
-  return
-
-  end subroutine fderivs_batch2

 #elif (ghost_width == 4)
 ! sixth order code
@@ -2380,9 +2077,6 @@

  end subroutine fderivs
 !-----------------------------------------------------------------------------
-! batch first derivatives (4 fields), same symmetry setup
-!-----------------------------------------------------------------------------
-!-----------------------------------------------------------------------------
 !
 ! single derivatives dx
 !
--- a/AMSS_NCKU_source/makefile
+++ b/AMSS_NCKU_source/makefile
@@ -34,7 +34,7 @@ C++FILES_GPU = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o

 F90FILES = enforce_algebra.o fmisc.o initial_puncture.o prolongrestrict.o\
 	   prolongrestrict_cell.o prolongrestrict_vertex.o\
-	   rungekutta4_rout.o bssn_rhs.o diff_new.o kodiss.o kodiss_sh.o\
+	   rungekutta4_rout.o bssn_rhs_opt.o bssn_rhs.o bssn_rhs_legacy.o diff_new.o kodiss.o kodiss_sh.o\
 	   lopsidediff.o sommerfeld_rout.o getnp4.o diff_new_sh.o\
 	   shellfunctions.o bssn_rhs_ss.o Set_Rho_ADM.o\
           getnp4EScalar.o bssnEScalar_rhs.o bssn_constraint.o ricci_gamma.o\
--- a/AMSS_NCKU_source/makefile.inc
+++ b/AMSS_NCKU_source/makefile.inc
@@ -7,8 +7,9 @@
 filein  = -I/usr/include/ -I${MKLROOT}/include

 ## Using sequential MKL (OpenMP disabled for better single-threaded performance)
-## Added -lifcore for Intel Fortran runtime and -limf for Intel math library
-LDLIBS  = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore -limf -lpthread -lm -ldl
+LDLIBS  = -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -lifcore -limf -lmpi \
+          -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
+          -lpthread -lm -ldl

 ## Aggressive optimization flags:
 ## -O3: Maximum optimization
--- a/generate_macrodef.py
+++ b/generate_macrodef.py
@@ -392,17 +392,6 @@ def generate_macrodef_fh():
        print( "# Finite_Difference_Method #define ghost_width setting error!!!",   file=file1 )
        print(                                                   file=file1 )

-    # Define macro DEBUG_NAN_CHECK
-    # 0: off (default), 1: on
-
-    debug_nan_check = getattr(input_data, "Debug_NaN_Check", 0)
-    if debug_nan_check:
-        print( "#define DEBUG_NAN_CHECK 1", file=file1 )
-        print(                             file=file1 )
-    else:
-        print( "#define DEBUG_NAN_CHECK 0", file=file1 )
-        print(                             file=file1 )
-
    # Whether to use a shell-patch grid
    # use shell or not

@@ -525,9 +514,6 @@ def generate_macrodef_fh():
    print( "    6th order: 4",                                                                      file=file1 )
    print( "    8th order: 5",                                                                      file=file1 )
    print(                                                                                          file=file1 )
-    print( "define DEBUG_NAN_CHECK",                                                                file=file1 )
-    print( "    0: off (default), 1: on",                                                           file=file1 )
-    print(                                                                                          file=file1 )
    print( "define WithShell",                                                                      file=file1 )
    print( "    use shell or not",                                                                  file=file1 )
    print(                                                                                          file=file1 )
--- a/inputfile_example/AMSS_NCKU_Input.py
+++ b/inputfile_example/AMSS_NCKU_Input.py
@@ -35,8 +35,7 @@ Equation_Class           = "BSSN"                  ## Evolution Equation: choose
                                                   ## If "BSSN-EScalar" is chosen, it is necessary to set other parameters below
 Initial_Data_Method      = "Ansorg-TwoPuncture"    ## initial data method: choose "Ansorg-TwoPuncture", "Lousto-Analytical", "Cao-Analytical", "KerrSchild-Analytical"
 Time_Evolution_Method    = "runge-kutta-45"        ## time evolution method: choose "runge-kutta-45"
-Finite_Diffenence_Method = "4th-order"             ## finite-difference method: choose "2nd-order", "4th-order", "6th-order", "8th-order"
-Debug_NaN_Check          = 0                       ## enable NaN checks in compute_rhs_bssn: 0 (off) or 1 (on)
+Finite_Diffenence_Method = "4th-order"             ## finite-difference method: choose "2nd-order", "4th-order", "6th-order", "8th-order"

 #################################################

--- a/makefile_and_run.py
+++ b/makefile_and_run.py
@@ -15,13 +15,12 @@ import subprocess
 ## taskset ensures all child processes inherit the CPU affinity mask
 ## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111)
 ## Format: taskset -c 4-55,60-111 ensures processes only run on these cores
-#NUMACTL_CPU_BIND = "taskset -c 4-55,60-111"
-NUMACTL_CPU_BIND = ""
+NUMACTL_CPU_BIND = "taskset -c 4-55,60-111"

 ## Build parallelism configuration
 ## Use nohz_full cores (4-55, 60-111) for compilation: 52 + 52 = 104 cores
 ## Set make -j to utilize available cores for faster builds
-BUILD_JOBS = 14
+BUILD_JOBS = 104


 ##################################################################
Author	SHA1	Message	Date
CGH0S7	ed89bc029b	Fix potential division by zero in reta_val calculation and enable NaN checks Added a safety check for the denominator in the reta_val calculation to prevent division by zero when chi approaches zero (e.g., at far-field boundaries). Also enabled DEBUG_NAN_CHECK macro to catch invalid inputs early. Initialized output arrays to zero to prevent uninitialized memory access.	2026-01-19 20:29:48 +08:00
CGH0S7	19274e93d1	Fix boundary handling in bssn_rhs_opt.f90 to prevent NaNs Refactored calc_derivs and calc_dderivs to include correct boundary handling logic matching the legacy code. Implemented fallback to 2nd order derivatives when near boundaries where 4th order stencils cannot be used. Added logic to initialize output arrays to zero to avoid uninitialized memory access.	2026-01-19 20:03:22 +08:00
CGH0S7	ae1a474cca	Fix compilation errors and complete logic in BSSN RHS optimization	2026-01-19 19:22:52 +08:00
CGH0S7	cbb8fb3a87	patched last commit	2026-01-19 17:14:28 +08:00
CGH0S7	4472d89a9f	Optimize bssn_rhs calculation with cache blocking and vectorization - Implemented cache blocking (BLK=8) in bssn_rhs_opt.f90 to improve L1/L2 cache hit rate. - Introduced bssn_rhs_opt.f90 module with vectorized derivative and physics kernels. - Renamed original implementation to bssn_rhs_legacy.f90 for fallback. - Updated bssn_rhs.f90 to act as a dispatcher, using the optimized path for ghost_width=3. - Updated makefile to include new source files. - Added DEBUG_NAN_CHECK macro to optionally disable NaN checks in production.	2026-01-19 16:39:24 +08:00