Commit aca56129ab61bb59e7657c9a0aefb2b0b853d3ad
0 parents
Initial commit from NPB 3.3.1
Source: http://www.nas.nasa.gov/assets/npb/NPB3.3.1.tar.gz
Showing
498 changed files
with
105197 additions
and
0 deletions
Too many changes to show.
To preserve performance only 27 of 498 files are displayed.
Changes.log
0 โ 100644
| 1 | +++ a/Changes.log | ||
| 1 | +########################################### | ||
| 2 | +# Modification History of NPB3.x # | ||
| 3 | +# ------------------------------ # | ||
| 4 | +# NPB development team # | ||
| 5 | +# NASA Ames Research Center # | ||
| 6 | +# npb@nas.nasa.gov # | ||
| 7 | +# http://www.nas.nasa.gov/Software/NPB/ # | ||
| 8 | +########################################### | ||
| 9 | + | ||
| 10 | +------------------------------------------------------ | ||
| 11 | +Changes in NPB3.3.1 | ||
| 12 | + (NPB3.3-SER, NPB3.3-OMP, NPB3.3-MPI ) | ||
| 13 | +------------------------------------------------------ | ||
| 14 | +[17-Feb-09] | ||
| 15 | + | ||
| 16 | +This is a bug fixing release of NPB3.3. | ||
| 17 | + | ||
| 18 | +1. All versions | ||
| 19 | + | ||
| 20 | + - sys/setparams.c: fixed a problem in dealing with quoted (") flags | ||
| 21 | + from make.def when producing npbparams.h for C. | ||
| 22 | + | ||
| 23 | + - CG: ensure 'implicit none' used in all subroutines. | ||
| 24 | + | ||
| 25 | +2. MPI version | ||
| 26 | + | ||
| 27 | + - Additional timers can be used for profiling purpose, similar | ||
| 28 | + to those already included in the OMP and SER versions. | ||
| 29 | + | ||
| 30 | + - LU: | ||
| 31 | + * code clean up (suggested by Rob Van der Wijngaart) | ||
| 32 | + > avoid using MPI_ANY_SOURCE in exchange_*.f, which might | ||
| 33 | + alter performance in some cases. | ||
| 34 | + > delete references to sethyper and 'icomm*', which are | ||
| 35 | + no longer used since NPB2.2. | ||
| 36 | + * change the low-bound limit on the sub-domain size in subdomain.f | ||
| 37 | + from 4 to 3 in order to increase allowable process counts. | ||
| 38 | + * allow number of processes other than power of two. | ||
| 39 | + | ||
| 40 | + - FT: fix a non-portable way of broadcasting input parameters | ||
| 41 | + (pointed out by Art Lazanoff) | ||
| 42 | + | ||
| 43 | + - BT: include 'btio_cleanup' as part of the I/O timing | ||
| 44 | + | ||
| 45 | +3. OMP and SER versions | ||
| 46 | + | ||
| 47 | + - DC: fix access to out-of-bound array elements in adc.c | ||
| 48 | + Reported by Per Larsen of Demark <pl@imm.dtu.dk> | ||
| 49 | + | ||
| 50 | + - UA: fix the use of uninitialized array 'sje' in mortar_vertex() by | ||
| 51 | + adding "call nr_init[_omp](sje,4*6*nelt,0)" in the main program. | ||
| 52 | + | ||
| 53 | + - MG, UA: include additional timers for profiling purpose. | ||
| 54 | + | ||
| 55 | + - Executables now use ".x" as a name extension | ||
| 56 | + | ||
| 57 | + | ||
| 58 | +------------------------------------------------------ | ||
| 59 | +Changes in NPB3.3 | ||
| 60 | + (NPB3.3-SER, NPB3.3-OMP, NPB3.3-MPI ) | ||
| 61 | +------------------------------------------------------ | ||
| 62 | +[02-Aug-07] | ||
| 63 | + | ||
| 64 | +1. New and improvements | ||
| 65 | + | ||
| 66 | + - The Class E problem has been introduced in seven of the benchmarks | ||
| 67 | + (BT, SP, LU, CG, MG, FT, and EP) in all three implementations. | ||
| 68 | + | ||
| 69 | + - The Class D problem has been added to the IS benchmark in all | ||
| 70 | + three implementations. It requires the compiler support of | ||
| 71 | + 64-bit "long" type in C. The MPI version of IS now allows runs | ||
| 72 | + up to 1024 processes. | ||
| 73 | + | ||
| 74 | + - The Bucket Sort option (USE_BUCKETS) has been added to | ||
| 75 | + the OpenMP version of IS and made as the default. | ||
| 76 | + | ||
| 77 | + - Introduced the "twiddle" array in the OpenMP FT benchmark, | ||
| 78 | + which has been used in the MPI and SER versions and seems | ||
| 79 | + to improve performance for larger problem sizes. | ||
| 80 | + | ||
| 81 | + - Merged vector codes for the BT and LU benchmarks into | ||
| 82 | + the release. | ||
| 83 | + | ||
| 84 | + - Updates to BTIO (MPI/BT with IO subtypes): | ||
| 85 | + * added I/O stats (I/O timing, data size written, I/O data rate) | ||
| 86 | + * added an option for interleaving reads between writes through | ||
| 87 | + the inputbt.data file. Although the data file size would be | ||
| 88 | + smaller as a result, the total amount of data written is still | ||
| 89 | + the same. | ||
| 90 | + | ||
| 91 | + - Made documents more consistent throughout different versions | ||
| 92 | + (README and README.install). | ||
| 93 | + | ||
| 94 | +2. Bug fixes | ||
| 95 | + | ||
| 96 | + - MPI/FT: fixed a verification failure for cases where NX/=NY | ||
| 97 | + and the 2D decomposition are used. The bug occurred at least | ||
| 98 | + for (Class D, NPROCS=2048) and (Class B, NPROCS=512). | ||
| 99 | + | ||
| 100 | + fixed an output printing format problem occurred when | ||
| 101 | + the number of processes >= 1000. | ||
| 102 | + | ||
| 103 | + - MPI/SP: fixed a performance regression due to improper | ||
| 104 | + padding of array dimensions. | ||
| 105 | + | ||
| 106 | + - MPI/IS: minor fix to support large processor counts (>=512). | ||
| 107 | + | ||
| 108 | + - OMP/UA: fixed a race condition in mason.f, avoided the use | ||
| 109 | + of the LASTPRIVATE directive. | ||
| 110 | + | ||
| 111 | + - OMP/LU: minor fix in data flushing for pipelining. | ||
| 112 | + | ||
| 113 | + - DC: There are a number of fixes - | ||
| 114 | + * fixed segmentation fault in both OMP and SER versions | ||
| 115 | + caused by accessing zero-length array elements. | ||
| 116 | + Reported by Jeff Odom <jodom@cs.umd.edu>. | ||
| 117 | + | ||
| 118 | + * fixed a race in reporting benchmark timing in the OMP version | ||
| 119 | + | ||
| 120 | + * fixed the use of timer in the OMP version, which limited | ||
| 121 | + the number of threads to 64. The number of threads is now | ||
| 122 | + lifted to a maximum of MAX_NUMBER_OF_TASKS (=256). | ||
| 123 | + | ||
| 124 | + * made the benchmark output consistent with other NPBs. | ||
| 125 | + | ||
| 126 | + - fixed a use of uninitialized variable in MPI/sys/setparams.c. | ||
| 127 | + setparams in all three versions was updated to deal with | ||
| 128 | + make.def that contains carriage-return character ('\r'). | ||
| 129 | + | ||
| 130 | + - SER/FT: added 'implicit none' to all missing places. | ||
| 131 | + | ||
| 132 | + - SER/IS: fixed missing variable declarations for the Bucket | ||
| 133 | + Sort option (when USE_BUCKETS is defined). | ||
| 134 | + | ||
| 135 | +3. Others | ||
| 136 | + | ||
| 137 | + - The default value for collbuf_nodes in the BT I/O benchmark | ||
| 138 | + is now set to 0, indicating no file hints will be used. | ||
| 139 | + The setting can be changed by using the "inputbt.data" file. | ||
| 140 | + | ||
| 141 | + - The hyperplane version of LU (LU-HP) is no longer included | ||
| 142 | + in the distribution. | ||
| 143 | + | ||
| 144 | + | ||
| 145 | +------------------------------------------------------ | ||
| 146 | +Changes in NPB3.2.1 | ||
| 147 | + (NPB3.2-SER, NPB3.2-OMP, NPB3.2-MPI ) | ||
| 148 | +------------------------------------------------------ | ||
| 149 | +[27-Jul-05] | ||
| 150 | + | ||
| 151 | +This is a bug fixing release of NPB3.2. | ||
| 152 | + | ||
| 153 | +1. MPI version | ||
| 154 | + - sys/setparams.c: removed a duplicated statement for writing | ||
| 155 | + FT parameters and made invalid SUBTYPE as an error condition. | ||
| 156 | + The 'duplicated statement' problem was fixed in NPB3.2 (See | ||
| 157 | + the note below). However, during the final updating process, | ||
| 158 | + the fix was left out, even though the log file was updated. | ||
| 159 | + | ||
| 160 | + - BT: included SUBTYPE=EPIO in the I/O verification. | ||
| 161 | + | ||
| 162 | + - LU: bcast_inputs.f: fixed wrong data type (dp_type) used for | ||
| 163 | + communicating integers (nx0,ny0,nz0) with the correct type | ||
| 164 | + MPI_INTEGER. | ||
| 165 | + | ||
| 166 | + - MG: fixed a mis-calculation of parameter "nr" in globals.h | ||
| 167 | + that caused run-time failure for NPROCS >= 512 | ||
| 168 | + (reported by Donald Ferry of Cray). Expanded to limit to | ||
| 169 | + 131072 processes and added an error checking code. | ||
| 170 | + | ||
| 171 | + The use of MPI_ANY_SOURCE for MPI_Irecv inside subroutine | ||
| 172 | + ready() could cause MPI_Wait return a message meant for | ||
| 173 | + the wrong k. The problem is fixed with nbr(axis,-dir,k) | ||
| 174 | + in place of MPI_ANY_SOURCE in the call to MPI_Irecv | ||
| 175 | + (reported and suggested by Hideo Saito). | ||
| 176 | + | ||
| 177 | +2. OpenMP version | ||
| 178 | + - EP: use THREADPRIVATE for working array storage. It should not | ||
| 179 | + change performance but made some compiler happier. | ||
| 180 | + | ||
| 181 | + - LU: add variable "v" to FLUSH to ensure solution data properly | ||
| 182 | + flushed for pipeline. This change is needed according to | ||
| 183 | + the OpenMP 2.5 standard. | ||
| 184 | + | ||
| 185 | + - IS: reorganized working buffers so that the count for key | ||
| 186 | + population could be more naturally performed. This version | ||
| 187 | + uses much less stack space. | ||
| 188 | + | ||
| 189 | + - UA: implemented atomic updates with locks in order to achieve | ||
| 190 | + better scaling on those systems that have an inefficient | ||
| 191 | + (or even buggy) ATOMIC implementation. | ||
| 192 | + | ||
| 193 | + | ||
| 194 | +------------------------------------------------------ | ||
| 195 | +Changes in NPB3.2 | ||
| 196 | + (NPB3.2-SER, NPB3.2-OMP, NPB3.2-MPI ) | ||
| 197 | +------------------------------------------------------ | ||
| 198 | +[07-Jan-05] | ||
| 199 | + | ||
| 200 | +1. DC version in NPB3.2-SER was converted to C from C++ | ||
| 201 | + (CLASSES S, W, A, B). | ||
| 202 | + sys/setparams.c file was changed appropriately. | ||
| 203 | + | ||
| 204 | +2. OpenMP version of DC was added to NPB3.2-OMP. | ||
| 205 | + | ||
| 206 | +3. Data Traffic benchmark DT was added to NPB3.2-MPI. | ||
| 207 | + | ||
| 208 | +[24-May-04] | ||
| 209 | + | ||
| 210 | +All versions: | ||
| 211 | + - use assumed shape "(*)" declaration in CG | ||
| 212 | + - fixed the use of an uninitialized variable in EP | ||
| 213 | + - avoid using integer array for assumed shape dimensions in FT | ||
| 214 | + - fix in UA: | ||
| 215 | + * fix the reference to file "inputua.data" | ||
| 216 | + * avoid overindexing | ||
| 217 | + * avoid reference to out-of-bound array elements | ||
| 218 | + * change declaration "real*8" to "double precision" | ||
| 219 | + | ||
| 220 | +OMP version: | ||
| 221 | + - explicitly added "SCHEDULE(STATIC)" to the OMP version | ||
| 222 | + - use the "omp_get_wtime()" function for timer if available | ||
| 223 | + - removed the call to "getenv" for portability | ||
| 224 | + - change in UA: | ||
| 225 | + * implemented an alternative approach for atomic update | ||
| 226 | + | ||
| 227 | +MPI version: | ||
| 228 | + - removed a duplicated declaration in FT (from setparams.c) | ||
| 229 | + - removed a duplicated declaration in BT/full_mpiio.f | ||
| 230 | + - fixed a missing "NPROCS=" in sys/suite.awk | ||
| 231 | + | ||
| 232 | + | ||
| 233 | +------------------------------------------------------ | ||
| 234 | +Changes in NPB3.1 | ||
| 235 | + (NPB3.1-MPI, NPB3.1-SER, NPB3.1-OMP) | ||
| 236 | +------------------------------------------------------ | ||
| 237 | +[22-Apr-04] NPB3.1-MPI | ||
| 238 | + | ||
| 239 | +Merged the NPB2.4-MPI branch into NPB3.1 with the following changes. | ||
| 240 | + | ||
| 241 | + - Optimized the BT memory usage. The new version is about 1/3 of | ||
| 242 | + the memory used in NPB2.x. | ||
| 243 | + - Fixed a bug in CG for running on a large number of processes | ||
| 244 | + - Redefined the Class W size in MG so that the verification value | ||
| 245 | + will not be too small. (see below for SER & OMP versions) | ||
| 246 | + - Use the relative errors for verification in both CG and MG | ||
| 247 | + - Fixed a race in 'make suite' | ||
| 248 | + | ||
| 249 | +[08-Apr-04] NPB3.1-SER and NPB3.1-OMP | ||
| 250 | + | ||
| 251 | +The following changes are made in both NPB3.1-SER and NPB3.1-OMP. | ||
| 252 | + | ||
| 253 | +1. Added the Class D problem | ||
| 254 | + - verification values taken from NPB2.4-MPI | ||
| 255 | + - modified variables to fit in large problem | ||
| 256 | + | ||
| 257 | +2. Improvements for LU and LU-HP: | ||
| 258 | + - reduced the memory usage for the 'tv' variable in LU and LU-HP | ||
| 259 | + - a more efficient memory access for variables "a,b,c,d" in LU-HP | ||
| 260 | + - a dummy iteration added before the time step loop for consistency | ||
| 261 | + with other benchmarks | ||
| 262 | + | ||
| 263 | +3. Improvement and fix in MG: | ||
| 264 | + - verification in MG now uses the relative error | ||
| 265 | + (instead of the absolute error). This will avoid incorrect | ||
| 266 | + verification for small reference values. | ||
| 267 | + - redefined the class size for Class W so that the verification | ||
| 268 | + value will not be too small. | ||
| 269 | + In version 3.0 and earlier: 64x64x64, 40 iters | ||
| 270 | + New size in version 3.1 : 128x128x128, 4 iters | ||
| 271 | + - fixed incorrect verification values for Classes A and C. | ||
| 272 | + | ||
| 273 | +4. CG: | ||
| 274 | + - use relative error for verification | ||
| 275 | + - clean up codes for matrix initialization (makea). | ||
| 276 | + The new code uses about 1/2 memory of the previous version. | ||
| 277 | + | ||
| 278 | +5. Fixed makefile related issues | ||
| 279 | + - fixed dependence on make.def for files in common. | ||
| 280 | + - fixed a race in 'make suite' | ||
| 281 | + - added 'LU-HP' as a valid benchmark option in makefiles | ||
| 282 | + | ||
| 283 | +The following changes are made in NPB3.1-OMP. | ||
| 284 | + | ||
| 285 | +1. Included a hyper-plane version of the LU benchmark: LU-HP | ||
| 286 | + - based on the serial version | ||
| 287 | + | ||
| 288 | +2. The dummy 'omp_lib_dum' library is not longer used for compilation | ||
| 289 | + without an OpenMP compiler. Conditional compilation is now used. | ||
| 290 | + | ||
| 291 | +3. Parallelization of the initialization part of MG. | ||
| 292 | + It improves the turn-around time quite a bit for the larger | ||
| 293 | + classes, such as class D. | ||
| 294 | + | ||
| 295 | +4. Parallelize codes for matrix initialization (makea) in CG. | ||
| 296 | + The new code uses about 2/3 memory of the version in NPB3.0-OMP. | ||
| 297 | + | ||
| 298 | +5. Code clean up in SP so that the structure is more consistent | ||
| 299 | + with the serial version. | ||
| 300 | + | ||
| 301 | + | ||
| 302 | + | ||
| 303 | +------------------------------------------------------ | ||
| 304 | +Changes in NPB2.x MPI version | ||
| 305 | +------------------------------------------------------ | ||
| 306 | + | ||
| 307 | +Changes in 2.4.1 | ||
| 308 | +- fixed error in BT/Makefile (replaced "==" with "=") | ||
| 309 | +- added stub function accumulate_norms in BT/btio.f | ||
| 310 | +- changed type of Class B verification constants in BT/verify.f from | ||
| 311 | + single to double precision | ||
| 312 | + | ||
| 313 | +Changes in 2.4 | ||
| 314 | +- Added I/O benchmark (subtype of BT). | ||
| 315 | +- Added Class D for all benchmarks except IS. | ||
| 316 | +- Reduced size of tabulated exponentials in FT. | ||
| 317 | +- Made minor changes to FT to prevent integer overflow for class D on | ||
| 318 | + systems with 32-bit integers. FT class D will not run on small | ||
| 319 | + numbers of processors anymore. | ||
| 320 | + | ||
| 321 | + | ||
| 322 | +------------------------------------------------------ | ||
| 323 | +Changes in non-MPI versions of NPB (previously PBN3.0) | ||
| 324 | + (NPB3.0-SER, NPB3.0-HPF, NPB3.0-OMP, NPB3.0-JAV) | ||
| 325 | +------------------------------------------------------ | ||
| 326 | + | ||
| 327 | +[01-Mar-99] Initial Beta Release. | ||
| 328 | + | ||
| 329 | +[06-Apr-99] Based on report from Charles Grassl and Ramesh Menon (SGI). | ||
| 330 | + | ||
| 331 | + 1. NPB-SER, FT: file auxfnct.f - | ||
| 332 | + lines 74 and 75 were interchanged: | ||
| 333 | + | ||
| 334 | + double complex u0(d1+1,d2,d3), tmp(maxdim) | ||
| 335 | + integer d1,d2,d3 | ||
| 336 | + | ||
| 337 | + 2. NPB-OMP: The OpenMP standards requires reduction variable be scalars, | ||
| 338 | + thus, changes made to remove the use of array variable for reduction. | ||
| 339 | + Relevant modifications in EP, CG, LU, SP, and BT | ||
| 340 | + | ||
| 341 | + 3. NPB-OMP: Remove compiler warnings of "Referenced scalar variables | ||
| 342 | + use defaults" by declaring explicitly as shared. | ||
| 343 | + Relevant modifications in FT, LU, and BT | ||
| 344 | + | ||
| 345 | + 4. NPB-OMP, README.openmp: Explicitly spell out the requirement of | ||
| 346 | + the static scheduling (setenv OMP_SCHEDULE "static"). | ||
| 347 | + | ||
| 348 | + | ||
| 349 | +[05-Oct-99] NPB3.0-non-MPI Beta Release (02) | ||
| 350 | + | ||
| 351 | +General change to all (NPB-SER, NPB-HPF, NPB-OMP) - | ||
| 352 | + 1. Update header information for all benchmarks. | ||
| 353 | + | ||
| 354 | + 2. Allow continuation lines in 'make.def' (modification done | ||
| 355 | + in sys/setparams.c). | ||
| 356 | + | ||
| 357 | +Change made in NPB-OMP - | ||
| 358 | + 1. 'print_results' now prints Number-Of-Threads and Mflops/s/thread. | ||
| 359 | + The printed number is the activated threads during the run, which | ||
| 360 | + may not be the same as what's requested. | ||
| 361 | + | ||
| 362 | + 2. A initial data touch loop for array A is added in CG. | ||
| 363 | + | ||
| 364 | + 3. 'CRITICAL' section is used for reduction with array. | ||
| 365 | + Relevant changes in EP, CG, LU, SP, and BT. | ||
| 366 | + | ||
| 367 | + 4. Reconfigure 'make.def' such that 'omp_lib_dum' can be activated | ||
| 368 | + from the file for no directive compilation. | ||
| 369 | + | ||
| 370 | + 5. The "!$OMP END DO" seems needed before "!$OMP MASTER" in rhs.f | ||
| 371 | + for both BT and SP for some f90 compilers. | ||
| 372 | + | ||
| 373 | + 6. "SCHEDULE(STATIC)" are used for the pipeline in LU to ensure | ||
| 374 | + compliance with the OMP standard. | ||
| 375 | + | ||
| 376 | +Change made in NPB-HPF - | ||
| 377 | + 1. 'print_results' now prints Number-Of-Processes and Mflops/s/process. | ||
| 378 | + | ||
| 379 | + 2. Use more consistent output format (via print_results). | ||
| 380 | + | ||
| 381 | + 3. More consistent makefiles (via config/make.def). | ||
| 382 | + | ||
| 383 | + | ||
| 384 | +[04-Apr-00] NPB3.0-non-MPI Beta Release (03) | ||
| 385 | + | ||
| 386 | +Change made in NPB-OMP - | ||
| 387 | + 1. The OpenMP-C version of IS has been added, including more timers. | ||
| 388 | + | ||
| 389 | + 2. 'cprint_results' includes Number-Of-Threads and Mflops/s/thread. | ||
| 390 | + | ||
| 391 | +Change made in NPB-SER - | ||
| 392 | + 1. More timers included in IS. | ||
| 393 | + | ||
| 394 | +NPB-JAV has been included in NPB3.0-non-MPI. | ||
| 395 | + | ||
| 396 | + | ||
| 397 | +[31-May-01] NPB3.0-non-MPI Beta Release (04) | ||
| 398 | + | ||
| 399 | +Change made in NPB-OMP - | ||
| 400 | + 1. NPB-OMP/LU: Failure in verification for number of threads greater | ||
| 401 | + than the problem size is now fixed. | ||
| 402 | + | ||
| 403 | + 2. If OMP_NUM_THREADS is unset, the printout will report as "unset" | ||
| 404 | + instead of "1" | ||
| 405 | + | ||
| 406 | + 3. NPB-OMP/IS: Allocating work_buff on the stack seems to cause problem | ||
| 407 | + for large problem size (CLASS C). "work_buff" is now allocated | ||
| 408 | + by "malloc" on the heap for CLASS C. | ||
| 409 | + | ||
| 410 | + 4. NPB-OMP/IS: Reported by <RaeLyn.Crowell@compaq.com> - potential | ||
| 411 | + synchronization problem could arise due to the use of "static" | ||
| 412 | + variables inside "randlc()". Declaration of these static variables | ||
| 413 | + are moved out of randlc() and put in the threadprivate directive. | ||
| 414 | + | ||
| 415 | +General change to all (NPB-SER, NPB-HPF, NPB-OMP) - | ||
| 416 | + 1. Cleanup in makefiles | ||
| 417 | + | ||
| 418 | + | ||
| 419 | +[28-Aug-02] The Official NPB3.0 Release | ||
| 420 | + | ||
| 421 | +Change made in all - | ||
| 422 | + 1. Fixed a bogus verification for "NaN". | ||
| 423 | + | ||
| 424 | + 2. Name change from "PBN3.0" to "NPB3.0". Updated all the banners. | ||
| 425 | + | ||
| 426 | + 3. NPB-SER/FT: use a derived version from NPB2.3-serial. | ||
| 427 | + | ||
| 428 | + 4. NPB-HPF/FT: use a consistent printing format. |
NPB3.3-HPF.README
0 โ 100644
NPB3.3-JAV.README
0 โ 100644
NPB3.3-MPI/BT/Makefile
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/Makefile | ||
| 1 | +SHELL=/bin/sh | ||
| 2 | +BENCHMARK=bt | ||
| 3 | +BENCHMARKU=BT | ||
| 4 | +VEC= | ||
| 5 | + | ||
| 6 | +include ../config/make.def | ||
| 7 | + | ||
| 8 | + | ||
| 9 | +OBJS = bt.o make_set.o initialize.o exact_solution.o exact_rhs.o \ | ||
| 10 | + set_constants.o adi.o define.o copy_faces.o rhs.o solve_subs.o \ | ||
| 11 | + x_solve$(VEC).o y_solve$(VEC).o z_solve$(VEC).o add.o error.o \ | ||
| 12 | + verify.o setup_mpi.o \ | ||
| 13 | + ${COMMON}/print_results.o ${COMMON}/timers.o | ||
| 14 | + | ||
| 15 | +include ../sys/make.common | ||
| 16 | + | ||
| 17 | +# npbparams.h is included by header.h | ||
| 18 | +# The following rule should do the trick but many make programs (not gmake) | ||
| 19 | +# will do the wrong thing and rebuild the world every time (because the | ||
| 20 | +# mod time on header.h is not changed. One solution would be to | ||
| 21 | +# touch header.h but this might cause confusion if someone has | ||
| 22 | +# accidentally deleted it. Instead, make the dependency on npbparams.h | ||
| 23 | +# explicit in all the lines below (even though dependence is indirect). | ||
| 24 | + | ||
| 25 | +# header.h: npbparams.h | ||
| 26 | + | ||
| 27 | +${PROGRAM}: config | ||
| 28 | + @if [ x$(VERSION) = xvec ] ; then \ | ||
| 29 | + ${MAKE} VEC=_vec exec; \ | ||
| 30 | + elif [ x$(VERSION) = xVEC ] ; then \ | ||
| 31 | + ${MAKE} VEC=_vec exec; \ | ||
| 32 | + else \ | ||
| 33 | + ${MAKE} exec; \ | ||
| 34 | + fi | ||
| 35 | + | ||
| 36 | +exec: $(OBJS) | ||
| 37 | + @if [ x$(SUBTYPE) = xfull ] ; then \ | ||
| 38 | + ${MAKE} bt-full; \ | ||
| 39 | + elif [ x$(SUBTYPE) = xFULL ] ; then \ | ||
| 40 | + ${MAKE} bt-full; \ | ||
| 41 | + elif [ x$(SUBTYPE) = xsimple ] ; then \ | ||
| 42 | + ${MAKE} bt-simple; \ | ||
| 43 | + elif [ x$(SUBTYPE) = xSIMPLE ] ; then \ | ||
| 44 | + ${MAKE} bt-simple; \ | ||
| 45 | + elif [ x$(SUBTYPE) = xfortran ] ; then \ | ||
| 46 | + ${MAKE} bt-fortran; \ | ||
| 47 | + elif [ x$(SUBTYPE) = xFORTRAN ] ; then \ | ||
| 48 | + ${MAKE} bt-fortran; \ | ||
| 49 | + elif [ x$(SUBTYPE) = xepio ] ; then \ | ||
| 50 | + ${MAKE} bt-epio; \ | ||
| 51 | + elif [ x$(SUBTYPE) = xEPIO ] ; then \ | ||
| 52 | + ${MAKE} bt-epio; \ | ||
| 53 | + else \ | ||
| 54 | + ${MAKE} bt-bt; \ | ||
| 55 | + fi | ||
| 56 | + | ||
| 57 | +bt-bt: ${OBJS} btio.o | ||
| 58 | + ${FLINK} ${FLINKFLAGS} -o ${PROGRAM} ${OBJS} btio.o ${FMPI_LIB} | ||
| 59 | + | ||
| 60 | +bt-full: ${OBJS} full_mpiio.o btio_common.o | ||
| 61 | + ${FLINK} ${FLINKFLAGS} -o ${PROGRAM}.mpi_io_full ${OBJS} btio_common.o full_mpiio.o ${FMPI_LIB} | ||
| 62 | + | ||
| 63 | +bt-simple: ${OBJS} simple_mpiio.o btio_common.o | ||
| 64 | + ${FLINK} ${FLINKFLAGS} -o ${PROGRAM}.mpi_io_simple ${OBJS} btio_common.o simple_mpiio.o ${FMPI_LIB} | ||
| 65 | + | ||
| 66 | +bt-fortran: ${OBJS} fortran_io.o btio_common.o | ||
| 67 | + ${FLINK} ${FLINKFLAGS} -o ${PROGRAM}.fortran_io ${OBJS} btio_common.o fortran_io.o ${FMPI_LIB} | ||
| 68 | + | ||
| 69 | +bt-epio: ${OBJS} epio.o btio_common.o | ||
| 70 | + ${FLINK} ${FLINKFLAGS} -o ${PROGRAM}.ep_io ${OBJS} btio_common.o epio.o ${FMPI_LIB} | ||
| 71 | + | ||
| 72 | +.f.o: | ||
| 73 | + ${FCOMPILE} $< | ||
| 74 | + | ||
| 75 | +.c.o: | ||
| 76 | + ${CCOMPILE} $< | ||
| 77 | + | ||
| 78 | + | ||
| 79 | +bt.o: bt.f header.h npbparams.h mpinpb.h | ||
| 80 | +make_set.o: make_set.f header.h npbparams.h mpinpb.h | ||
| 81 | +initialize.o: initialize.f header.h npbparams.h | ||
| 82 | +exact_solution.o: exact_solution.f header.h npbparams.h | ||
| 83 | +exact_rhs.o: exact_rhs.f header.h npbparams.h | ||
| 84 | +set_constants.o: set_constants.f header.h npbparams.h | ||
| 85 | +adi.o: adi.f header.h npbparams.h | ||
| 86 | +define.o: define.f header.h npbparams.h | ||
| 87 | +copy_faces.o: copy_faces.f header.h npbparams.h mpinpb.h | ||
| 88 | +rhs.o: rhs.f header.h npbparams.h | ||
| 89 | +x_solve$(VEC).o: x_solve$(VEC).f header.h work_lhs$(VEC).h npbparams.h mpinpb.h | ||
| 90 | +y_solve$(VEC).o: y_solve$(VEC).f header.h work_lhs$(VEC).h npbparams.h mpinpb.h | ||
| 91 | +z_solve$(VEC).o: z_solve$(VEC).f header.h work_lhs$(VEC).h npbparams.h mpinpb.h | ||
| 92 | +solve_subs.o: solve_subs.f npbparams.h | ||
| 93 | +add.o: add.f header.h npbparams.h | ||
| 94 | +error.o: error.f header.h npbparams.h mpinpb.h | ||
| 95 | +verify.o: verify.f header.h npbparams.h mpinpb.h | ||
| 96 | +setup_mpi.o: setup_mpi.f mpinpb.h npbparams.h | ||
| 97 | +btio.o: btio.f header.h npbparams.h | ||
| 98 | +btio_common.o: btio_common.f mpinpb.h npbparams.h | ||
| 99 | +fortran_io.o: fortran_io.f mpinpb.h npbparams.h | ||
| 100 | +simple_mpiio.o: simple_mpiio.f mpinpb.h npbparams.h | ||
| 101 | +full_mpiio.o: full_mpiio.f mpinpb.h npbparams.h | ||
| 102 | +epio.o: epio.f mpinpb.h npbparams.h | ||
| 103 | + | ||
| 104 | +clean: | ||
| 105 | + - rm -f *.o *~ mputil* | ||
| 106 | + - rm -f npbparams.h core |
NPB3.3-MPI/BT/add.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/add.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine add | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | +c--------------------------------------------------------------------- | ||
| 10 | +c addition of update to the vector u | ||
| 11 | +c--------------------------------------------------------------------- | ||
| 12 | + | ||
| 13 | + include 'header.h' | ||
| 14 | + | ||
| 15 | + integer c, i, j, k, m | ||
| 16 | + | ||
| 17 | + do c = 1, ncells | ||
| 18 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 19 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 20 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 21 | + do m = 1, 5 | ||
| 22 | + u(m,i,j,k,c) = u(m,i,j,k,c) + rhs(m,i,j,k,c) | ||
| 23 | + enddo | ||
| 24 | + enddo | ||
| 25 | + enddo | ||
| 26 | + enddo | ||
| 27 | + enddo | ||
| 28 | + | ||
| 29 | + return | ||
| 30 | + end |
NPB3.3-MPI/BT/adi.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/adi.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine adi | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | + call copy_faces | ||
| 10 | + | ||
| 11 | + call x_solve | ||
| 12 | + | ||
| 13 | + call y_solve | ||
| 14 | + | ||
| 15 | + call z_solve | ||
| 16 | + | ||
| 17 | + call add | ||
| 18 | + | ||
| 19 | + return | ||
| 20 | + end | ||
| 21 | + |
NPB3.3-MPI/BT/bt.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/bt.f | ||
| 1 | +!-------------------------------------------------------------------------! | ||
| 2 | +! ! | ||
| 3 | +! N A S P A R A L L E L B E N C H M A R K S 3.3 ! | ||
| 4 | +! ! | ||
| 5 | +! B T ! | ||
| 6 | +! ! | ||
| 7 | +!-------------------------------------------------------------------------! | ||
| 8 | +! ! | ||
| 9 | +! This benchmark is part of the NAS Parallel Benchmark 3.3 suite. ! | ||
| 10 | +! It is described in NAS Technical Reports 95-020 and 02-007. ! | ||
| 11 | +! ! | ||
| 12 | +! Permission to use, copy, distribute and modify this software ! | ||
| 13 | +! for any purpose with or without fee is hereby granted. We ! | ||
| 14 | +! request, however, that all derived work reference the NAS ! | ||
| 15 | +! Parallel Benchmarks 3.3. This software is provided "as is" ! | ||
| 16 | +! without express or implied warranty. ! | ||
| 17 | +! ! | ||
| 18 | +! Information on NPB 3.3, including the technical report, the ! | ||
| 19 | +! original specifications, source code, results and information ! | ||
| 20 | +! on how to submit new results, is available at: ! | ||
| 21 | +! ! | ||
| 22 | +! http://www.nas.nasa.gov/Software/NPB/ ! | ||
| 23 | +! ! | ||
| 24 | +! Send comments or suggestions to npb@nas.nasa.gov ! | ||
| 25 | +! ! | ||
| 26 | +! NAS Parallel Benchmarks Group ! | ||
| 27 | +! NASA Ames Research Center ! | ||
| 28 | +! Mail Stop: T27A-1 ! | ||
| 29 | +! Moffett Field, CA 94035-1000 ! | ||
| 30 | +! ! | ||
| 31 | +! E-mail: npb@nas.nasa.gov ! | ||
| 32 | +! Fax: (650) 604-3957 ! | ||
| 33 | +! ! | ||
| 34 | +!-------------------------------------------------------------------------! | ||
| 35 | + | ||
| 36 | +c--------------------------------------------------------------------- | ||
| 37 | +c | ||
| 38 | +c Authors: R. F. Van der Wijngaart | ||
| 39 | +c T. Harris | ||
| 40 | +c M. Yarrow | ||
| 41 | +c | ||
| 42 | +c--------------------------------------------------------------------- | ||
| 43 | + | ||
| 44 | +c--------------------------------------------------------------------- | ||
| 45 | + program MPBT | ||
| 46 | +c--------------------------------------------------------------------- | ||
| 47 | + | ||
| 48 | + include 'header.h' | ||
| 49 | + include 'mpinpb.h' | ||
| 50 | + | ||
| 51 | + integer i, niter, step, c, error, fstatus | ||
| 52 | + double precision navg, mflops, mbytes, n3 | ||
| 53 | + | ||
| 54 | + external timer_read | ||
| 55 | + double precision t, tmax, tiominv, tpc, timer_read | ||
| 56 | + logical verified | ||
| 57 | + character class, cbuff*40 | ||
| 58 | + double precision t1(t_last+2), tsum(t_last+2), | ||
| 59 | + > tming(t_last+2), tmaxg(t_last+2) | ||
| 60 | + character t_recs(t_last+2)*8 | ||
| 61 | + | ||
| 62 | + integer wr_interval | ||
| 63 | + | ||
| 64 | + data t_recs/'total', 'i/o', 'rhs', 'xsolve', 'ysolve', 'zsolve', | ||
| 65 | + > 'bpack', 'exch', 'xcomm', 'ycomm', 'zcomm', | ||
| 66 | + > ' totcomp', ' totcomm'/ | ||
| 67 | + | ||
| 68 | + call setup_mpi | ||
| 69 | + if (.not. active) goto 999 | ||
| 70 | + | ||
| 71 | +c--------------------------------------------------------------------- | ||
| 72 | +c Root node reads input file (if it exists) else takes | ||
| 73 | +c defaults from parameters | ||
| 74 | +c--------------------------------------------------------------------- | ||
| 75 | + if (node .eq. root) then | ||
| 76 | + | ||
| 77 | + write(*, 1000) | ||
| 78 | + | ||
| 79 | + open (unit=2,file='timer.flag',status='old',iostat=fstatus) | ||
| 80 | + timeron = .false. | ||
| 81 | + if (fstatus .eq. 0) then | ||
| 82 | + timeron = .true. | ||
| 83 | + close(2) | ||
| 84 | + endif | ||
| 85 | + | ||
| 86 | + open (unit=2,file='inputbt.data',status='old', iostat=fstatus) | ||
| 87 | +c | ||
| 88 | + rd_interval = 0 | ||
| 89 | + if (fstatus .eq. 0) then | ||
| 90 | + write(*,233) | ||
| 91 | + 233 format(' Reading from input file inputbt.data') | ||
| 92 | + read (2,*) niter | ||
| 93 | + read (2,*) dt | ||
| 94 | + read (2,*) grid_points(1), grid_points(2), grid_points(3) | ||
| 95 | + if (iotype .ne. 0) then | ||
| 96 | + read (2,'(A)') cbuff | ||
| 97 | + read (cbuff,*,iostat=i) wr_interval, rd_interval | ||
| 98 | + if (i .ne. 0) rd_interval = 0 | ||
| 99 | + if (wr_interval .le. 0) wr_interval = wr_default | ||
| 100 | + endif | ||
| 101 | + if (iotype .eq. 1) then | ||
| 102 | + read (2,*) collbuf_nodes, collbuf_size | ||
| 103 | + write(*,*) 'collbuf_nodes ', collbuf_nodes | ||
| 104 | + write(*,*) 'collbuf_size ', collbuf_size | ||
| 105 | + endif | ||
| 106 | + close(2) | ||
| 107 | + else | ||
| 108 | + write(*,234) | ||
| 109 | + niter = niter_default | ||
| 110 | + dt = dt_default | ||
| 111 | + grid_points(1) = problem_size | ||
| 112 | + grid_points(2) = problem_size | ||
| 113 | + grid_points(3) = problem_size | ||
| 114 | + wr_interval = wr_default | ||
| 115 | + if (iotype .eq. 1) then | ||
| 116 | +c set number of nodes involved in collective buffering to 4, | ||
| 117 | +c unless total number of nodes is smaller than that. | ||
| 118 | +c set buffer size for collective buffering to 1MB per node | ||
| 119 | +c collbuf_nodes = min(4,no_nodes) | ||
| 120 | +c set default to No-File-Hints with a value of 0 | ||
| 121 | + collbuf_nodes = 0 | ||
| 122 | + collbuf_size = 1000000 | ||
| 123 | + endif | ||
| 124 | + endif | ||
| 125 | + 234 format(' No input file inputbt.data. Using compiled defaults') | ||
| 126 | + | ||
| 127 | + write(*, 1001) grid_points(1), grid_points(2), grid_points(3) | ||
| 128 | + write(*, 1002) niter, dt | ||
| 129 | + if (no_nodes .ne. total_nodes) write(*, 1004) total_nodes | ||
| 130 | + if (no_nodes .ne. maxcells*maxcells) | ||
| 131 | + > write(*, 1005) maxcells*maxcells | ||
| 132 | + write(*, 1003) no_nodes | ||
| 133 | + | ||
| 134 | + if (iotype .eq. 1) write(*, 1006) 'FULL MPI-IO', wr_interval | ||
| 135 | + if (iotype .eq. 2) write(*, 1006) 'SIMPLE MPI-IO', wr_interval | ||
| 136 | + if (iotype .eq. 3) write(*, 1006) 'EPIO', wr_interval | ||
| 137 | + if (iotype .eq. 4) write(*, 1006) 'FORTRAN IO', wr_interval | ||
| 138 | + | ||
| 139 | + 1000 format(//, ' NAS Parallel Benchmarks 3.3 -- BT Benchmark ',/) | ||
| 140 | + 1001 format(' Size: ', i4, 'x', i4, 'x', i4) | ||
| 141 | + 1002 format(' Iterations: ', i4, ' dt: ', F11.7) | ||
| 142 | + 1004 format(' Total number of processes: ', i5) | ||
| 143 | + 1005 format(' WARNING: compiled for ', i5, ' processes ') | ||
| 144 | + 1003 format(' Number of active processes: ', i5, /) | ||
| 145 | + 1006 format(' BTIO -- ', A, ' write interval: ', i3 /) | ||
| 146 | + | ||
| 147 | + endif | ||
| 148 | + | ||
| 149 | + call mpi_bcast(niter, 1, MPI_INTEGER, | ||
| 150 | + > root, comm_setup, error) | ||
| 151 | + | ||
| 152 | + call mpi_bcast(dt, 1, dp_type, | ||
| 153 | + > root, comm_setup, error) | ||
| 154 | + | ||
| 155 | + call mpi_bcast(grid_points(1), 3, MPI_INTEGER, | ||
| 156 | + > root, comm_setup, error) | ||
| 157 | + | ||
| 158 | + call mpi_bcast(wr_interval, 1, MPI_INTEGER, | ||
| 159 | + > root, comm_setup, error) | ||
| 160 | + | ||
| 161 | + call mpi_bcast(rd_interval, 1, MPI_INTEGER, | ||
| 162 | + > root, comm_setup, error) | ||
| 163 | + | ||
| 164 | + call mpi_bcast(timeron, 1, MPI_LOGICAL, | ||
| 165 | + > root, comm_setup, error) | ||
| 166 | + | ||
| 167 | + call make_set | ||
| 168 | + | ||
| 169 | + do c = 1, maxcells | ||
| 170 | + if ( (cell_size(1,c) .gt. IMAX) .or. | ||
| 171 | + > (cell_size(2,c) .gt. JMAX) .or. | ||
| 172 | + > (cell_size(3,c) .gt. KMAX) ) then | ||
| 173 | + print *,node, c, (cell_size(i,c),i=1,3) | ||
| 174 | + print *,' Problem size too big for compiled array sizes' | ||
| 175 | + goto 999 | ||
| 176 | + endif | ||
| 177 | + end do | ||
| 178 | + | ||
| 179 | + do i = 1, t_last | ||
| 180 | + call timer_clear(i) | ||
| 181 | + end do | ||
| 182 | + | ||
| 183 | + call set_constants | ||
| 184 | + | ||
| 185 | + call initialize | ||
| 186 | + | ||
| 187 | + call setup_btio | ||
| 188 | + idump = 0 | ||
| 189 | + | ||
| 190 | + call lhsinit | ||
| 191 | + | ||
| 192 | + call exact_rhs | ||
| 193 | + | ||
| 194 | + call compute_buffer_size(5) | ||
| 195 | + | ||
| 196 | +c--------------------------------------------------------------------- | ||
| 197 | +c do one time step to touch all code, and reinitialize | ||
| 198 | +c--------------------------------------------------------------------- | ||
| 199 | + call adi | ||
| 200 | + call initialize | ||
| 201 | + | ||
| 202 | +c--------------------------------------------------------------------- | ||
| 203 | +c Synchronize before placing time stamp | ||
| 204 | +c--------------------------------------------------------------------- | ||
| 205 | + do i = 1, t_last | ||
| 206 | + call timer_clear(i) | ||
| 207 | + end do | ||
| 208 | + call mpi_barrier(comm_setup, error) | ||
| 209 | + | ||
| 210 | + call timer_start(1) | ||
| 211 | + | ||
| 212 | + do step = 1, niter | ||
| 213 | + | ||
| 214 | + if (node .eq. root) then | ||
| 215 | + if (mod(step, 20) .eq. 0 .or. step .eq. niter .or. | ||
| 216 | + > step .eq. 1) then | ||
| 217 | + write(*, 200) step | ||
| 218 | + 200 format(' Time step ', i4) | ||
| 219 | + endif | ||
| 220 | + endif | ||
| 221 | + | ||
| 222 | + call adi | ||
| 223 | + | ||
| 224 | + if (iotype .ne. 0) then | ||
| 225 | + if (mod(step, wr_interval).eq.0 .or. step .eq. niter) then | ||
| 226 | + if (node .eq. root) then | ||
| 227 | + print *, 'Writing data set, time step', step | ||
| 228 | + endif | ||
| 229 | + if (step .eq. niter .and. rd_interval .gt. 1) then | ||
| 230 | + rd_interval = 1 | ||
| 231 | + endif | ||
| 232 | + call timer_start(2) | ||
| 233 | + call output_timestep | ||
| 234 | + call timer_stop(2) | ||
| 235 | + idump = idump + 1 | ||
| 236 | + endif | ||
| 237 | + endif | ||
| 238 | + end do | ||
| 239 | + | ||
| 240 | + call timer_start(2) | ||
| 241 | + call btio_cleanup | ||
| 242 | + call timer_stop(2) | ||
| 243 | + | ||
| 244 | + call timer_stop(1) | ||
| 245 | + t = timer_read(1) | ||
| 246 | + | ||
| 247 | + call verify(niter, class, verified) | ||
| 248 | + | ||
| 249 | + call mpi_reduce(t, tmax, 1, | ||
| 250 | + > dp_type, MPI_MAX, | ||
| 251 | + > root, comm_setup, error) | ||
| 252 | + | ||
| 253 | + if (iotype .ne. 0) then | ||
| 254 | + t = timer_read(2) | ||
| 255 | + if (t .ne. 0.d0) t = 1.0d0 / t | ||
| 256 | + call mpi_reduce(t, tiominv, 1, | ||
| 257 | + > dp_type, MPI_SUM, | ||
| 258 | + > root, comm_setup, error) | ||
| 259 | + endif | ||
| 260 | + | ||
| 261 | + if( node .eq. root ) then | ||
| 262 | + n3 = 1.0d0*grid_points(1)*grid_points(2)*grid_points(3) | ||
| 263 | + navg = (grid_points(1)+grid_points(2)+grid_points(3))/3.0 | ||
| 264 | + if( tmax .ne. 0. ) then | ||
| 265 | + mflops = 1.0e-6*float(niter)* | ||
| 266 | + > (3478.8*n3-17655.7*navg**2+28023.7*navg) | ||
| 267 | + > / tmax | ||
| 268 | + else | ||
| 269 | + mflops = 0.0 | ||
| 270 | + endif | ||
| 271 | + | ||
| 272 | + if (iotype .ne. 0) then | ||
| 273 | + mbytes = n3 * 40.0 * idump * 1.0d-6 | ||
| 274 | + tiominv = tiominv / no_nodes | ||
| 275 | + t = 0.0 | ||
| 276 | + if (tiominv .ne. 0.) t = 1.d0 / tiominv | ||
| 277 | + tpc = 0.0 | ||
| 278 | + if (tmax .ne. 0.) tpc = t * 100.0 / tmax | ||
| 279 | + write(*,1100) t, tpc, mbytes, mbytes*tiominv | ||
| 280 | + 1100 format(/' BTIO -- statistics:'/ | ||
| 281 | + > ' I/O timing in seconds : ', f14.2/ | ||
| 282 | + > ' I/O timing percentage : ', f14.2/ | ||
| 283 | + > ' Total data written (MB) : ', f14.2/ | ||
| 284 | + > ' I/O data rate (MB/sec) : ', f14.2) | ||
| 285 | + endif | ||
| 286 | + | ||
| 287 | + call print_results('BT', class, grid_points(1), | ||
| 288 | + > grid_points(2), grid_points(3), niter, maxcells*maxcells, | ||
| 289 | + > total_nodes, tmax, mflops, ' floating point', | ||
| 290 | + > verified, npbversion,compiletime, cs1, cs2, cs3, cs4, cs5, | ||
| 291 | + > cs6, '(none)') | ||
| 292 | + endif | ||
| 293 | + | ||
| 294 | + if (.not.timeron) goto 999 | ||
| 295 | + | ||
| 296 | + do i = 1, t_last | ||
| 297 | + t1(i) = timer_read(i) | ||
| 298 | + end do | ||
| 299 | + t1(t_xsolve) = t1(t_xsolve) - t1(t_xcomm) | ||
| 300 | + t1(t_ysolve) = t1(t_ysolve) - t1(t_ycomm) | ||
| 301 | + t1(t_zsolve) = t1(t_zsolve) - t1(t_zcomm) | ||
| 302 | + t1(t_last+2) = t1(t_xcomm)+t1(t_ycomm)+t1(t_zcomm)+t1(t_exch) | ||
| 303 | + t1(t_last+1) = t1(t_total) - t1(t_last+2) | ||
| 304 | + | ||
| 305 | + call MPI_Reduce(t1, tsum, t_last+2, dp_type, MPI_SUM, | ||
| 306 | + > 0, comm_setup, error) | ||
| 307 | + call MPI_Reduce(t1, tming, t_last+2, dp_type, MPI_MIN, | ||
| 308 | + > 0, comm_setup, error) | ||
| 309 | + call MPI_Reduce(t1, tmaxg, t_last+2, dp_type, MPI_MAX, | ||
| 310 | + > 0, comm_setup, error) | ||
| 311 | + | ||
| 312 | + if (node .eq. 0) then | ||
| 313 | + write(*, 800) total_nodes | ||
| 314 | + do i = 1, t_last+2 | ||
| 315 | + tsum(i) = tsum(i) / total_nodes | ||
| 316 | + write(*, 810) i, t_recs(i), tming(i), tmaxg(i), tsum(i) | ||
| 317 | + end do | ||
| 318 | + endif | ||
| 319 | + 800 format(' nprocs =', i6, 11x, 'minimum', 5x, 'maximum', | ||
| 320 | + > 5x, 'average') | ||
| 321 | + 810 format(' timer ', i2, '(', A8, ') :', 3(2x,f10.4)) | ||
| 322 | + | ||
| 323 | + 999 continue | ||
| 324 | + call mpi_barrier(MPI_COMM_WORLD, error) | ||
| 325 | + call mpi_finalize(error) | ||
| 326 | + | ||
| 327 | + end | ||
| 328 | + |
NPB3.3-MPI/BT/btio.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/btio.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine setup_btio | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | + return | ||
| 11 | + end | ||
| 12 | + | ||
| 13 | +c--------------------------------------------------------------------- | ||
| 14 | +c--------------------------------------------------------------------- | ||
| 15 | + | ||
| 16 | + subroutine output_timestep | ||
| 17 | + | ||
| 18 | +c--------------------------------------------------------------------- | ||
| 19 | +c--------------------------------------------------------------------- | ||
| 20 | + | ||
| 21 | + return | ||
| 22 | + end | ||
| 23 | + | ||
| 24 | +c--------------------------------------------------------------------- | ||
| 25 | +c--------------------------------------------------------------------- | ||
| 26 | + | ||
| 27 | + subroutine btio_cleanup | ||
| 28 | + | ||
| 29 | +c--------------------------------------------------------------------- | ||
| 30 | +c--------------------------------------------------------------------- | ||
| 31 | + | ||
| 32 | + return | ||
| 33 | + end | ||
| 34 | + | ||
| 35 | +c--------------------------------------------------------------------- | ||
| 36 | +c--------------------------------------------------------------------- | ||
| 37 | + | ||
| 38 | + subroutine btio_verify(verified) | ||
| 39 | + | ||
| 40 | +c--------------------------------------------------------------------- | ||
| 41 | +c--------------------------------------------------------------------- | ||
| 42 | + | ||
| 43 | + logical verified | ||
| 44 | + | ||
| 45 | + verified = .true. | ||
| 46 | + | ||
| 47 | + return | ||
| 48 | + end | ||
| 49 | + | ||
| 50 | +c--------------------------------------------------------------------- | ||
| 51 | +c--------------------------------------------------------------------- | ||
| 52 | + | ||
| 53 | + subroutine accumulate_norms(xce_acc) | ||
| 54 | + | ||
| 55 | +c--------------------------------------------------------------------- | ||
| 56 | +c--------------------------------------------------------------------- | ||
| 57 | + | ||
| 58 | + double precision xce_acc(5) | ||
| 59 | + | ||
| 60 | + return | ||
| 61 | + end | ||
| 62 | + | ||
| 63 | +c--------------------------------------------------------------------- | ||
| 64 | +c--------------------------------------------------------------------- | ||
| 65 | + | ||
| 66 | + subroutine checksum_timestep | ||
| 67 | + | ||
| 68 | +c--------------------------------------------------------------------- | ||
| 69 | +c--------------------------------------------------------------------- | ||
| 70 | + | ||
| 71 | + return | ||
| 72 | + end |
NPB3.3-MPI/BT/btio_common.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/btio_common.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine clear_timestep | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | + include 'header.h' | ||
| 10 | + include 'mpinpb.h' | ||
| 11 | + | ||
| 12 | + integer cio, kio, jio, ix | ||
| 13 | + | ||
| 14 | + do cio=1,ncells | ||
| 15 | + do kio=0, cell_size(3,cio)-1 | ||
| 16 | + do jio=0, cell_size(2,cio)-1 | ||
| 17 | + do ix=0,cell_size(1,cio)-1 | ||
| 18 | + u(1,ix, jio,kio,cio) = 0 | ||
| 19 | + u(2,ix, jio,kio,cio) = 0 | ||
| 20 | + u(3,ix, jio,kio,cio) = 0 | ||
| 21 | + u(4,ix, jio,kio,cio) = 0 | ||
| 22 | + u(5,ix, jio,kio,cio) = 0 | ||
| 23 | + enddo | ||
| 24 | + enddo | ||
| 25 | + enddo | ||
| 26 | + enddo | ||
| 27 | + | ||
| 28 | + return | ||
| 29 | + end | ||
| 30 | + |
NPB3.3-MPI/BT/copy_faces.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/copy_faces.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine copy_faces | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | +c--------------------------------------------------------------------- | ||
| 10 | +c | ||
| 11 | +c This function copies the face values of a variable defined on a set | ||
| 12 | +c of cells to the overlap locations of the adjacent sets of cells. | ||
| 13 | +c Because a set of cells interfaces in each direction with exactly one | ||
| 14 | +c other set, we only need to fill six different buffers. We could try to | ||
| 15 | +c overlap communication with computation, by computing | ||
| 16 | +c some internal values while communicating boundary values, but this | ||
| 17 | +c adds so much overhead that it's not clearly useful. | ||
| 18 | +c--------------------------------------------------------------------- | ||
| 19 | + | ||
| 20 | + include 'header.h' | ||
| 21 | + include 'mpinpb.h' | ||
| 22 | + | ||
| 23 | + integer i, j, k, c, m, requests(0:11), p0, p1, | ||
| 24 | + > p2, p3, p4, p5, b_size(0:5), ss(0:5), | ||
| 25 | + > sr(0:5), error, statuses(MPI_STATUS_SIZE, 0:11) | ||
| 26 | + | ||
| 27 | +c--------------------------------------------------------------------- | ||
| 28 | +c exit immediately if there are no faces to be copied | ||
| 29 | +c--------------------------------------------------------------------- | ||
| 30 | + if (no_nodes .eq. 1) then | ||
| 31 | + call compute_rhs | ||
| 32 | + return | ||
| 33 | + endif | ||
| 34 | + | ||
| 35 | + ss(0) = start_send_east | ||
| 36 | + ss(1) = start_send_west | ||
| 37 | + ss(2) = start_send_north | ||
| 38 | + ss(3) = start_send_south | ||
| 39 | + ss(4) = start_send_top | ||
| 40 | + ss(5) = start_send_bottom | ||
| 41 | + | ||
| 42 | + sr(0) = start_recv_east | ||
| 43 | + sr(1) = start_recv_west | ||
| 44 | + sr(2) = start_recv_north | ||
| 45 | + sr(3) = start_recv_south | ||
| 46 | + sr(4) = start_recv_top | ||
| 47 | + sr(5) = start_recv_bottom | ||
| 48 | + | ||
| 49 | + b_size(0) = east_size | ||
| 50 | + b_size(1) = west_size | ||
| 51 | + b_size(2) = north_size | ||
| 52 | + b_size(3) = south_size | ||
| 53 | + b_size(4) = top_size | ||
| 54 | + b_size(5) = bottom_size | ||
| 55 | + | ||
| 56 | +c--------------------------------------------------------------------- | ||
| 57 | +c because the difference stencil for the diagonalized scheme is | ||
| 58 | +c orthogonal, we do not have to perform the staged copying of faces, | ||
| 59 | +c but can send all face information simultaneously to the neighboring | ||
| 60 | +c cells in all directions | ||
| 61 | +c--------------------------------------------------------------------- | ||
| 62 | + if (timeron) call timer_start(t_bpack) | ||
| 63 | + p0 = 0 | ||
| 64 | + p1 = 0 | ||
| 65 | + p2 = 0 | ||
| 66 | + p3 = 0 | ||
| 67 | + p4 = 0 | ||
| 68 | + p5 = 0 | ||
| 69 | + | ||
| 70 | + do c = 1, ncells | ||
| 71 | + | ||
| 72 | +c--------------------------------------------------------------------- | ||
| 73 | +c fill the buffer to be sent to eastern neighbors (i-dir) | ||
| 74 | +c--------------------------------------------------------------------- | ||
| 75 | + if (cell_coord(1,c) .ne. ncells) then | ||
| 76 | + do k = 0, cell_size(3,c)-1 | ||
| 77 | + do j = 0, cell_size(2,c)-1 | ||
| 78 | + do i = cell_size(1,c)-2, cell_size(1,c)-1 | ||
| 79 | + do m = 1, 5 | ||
| 80 | + out_buffer(ss(0)+p0) = u(m,i,j,k,c) | ||
| 81 | + p0 = p0 + 1 | ||
| 82 | + end do | ||
| 83 | + end do | ||
| 84 | + end do | ||
| 85 | + end do | ||
| 86 | + endif | ||
| 87 | + | ||
| 88 | +c--------------------------------------------------------------------- | ||
| 89 | +c fill the buffer to be sent to western neighbors | ||
| 90 | +c--------------------------------------------------------------------- | ||
| 91 | + if (cell_coord(1,c) .ne. 1) then | ||
| 92 | + do k = 0, cell_size(3,c)-1 | ||
| 93 | + do j = 0, cell_size(2,c)-1 | ||
| 94 | + do i = 0, 1 | ||
| 95 | + do m = 1, 5 | ||
| 96 | + out_buffer(ss(1)+p1) = u(m,i,j,k,c) | ||
| 97 | + p1 = p1 + 1 | ||
| 98 | + end do | ||
| 99 | + end do | ||
| 100 | + end do | ||
| 101 | + end do | ||
| 102 | + | ||
| 103 | + endif | ||
| 104 | + | ||
| 105 | +c--------------------------------------------------------------------- | ||
| 106 | +c fill the buffer to be sent to northern neighbors (j_dir) | ||
| 107 | +c--------------------------------------------------------------------- | ||
| 108 | + if (cell_coord(2,c) .ne. ncells) then | ||
| 109 | + do k = 0, cell_size(3,c)-1 | ||
| 110 | + do j = cell_size(2,c)-2, cell_size(2,c)-1 | ||
| 111 | + do i = 0, cell_size(1,c)-1 | ||
| 112 | + do m = 1, 5 | ||
| 113 | + out_buffer(ss(2)+p2) = u(m,i,j,k,c) | ||
| 114 | + p2 = p2 + 1 | ||
| 115 | + end do | ||
| 116 | + end do | ||
| 117 | + end do | ||
| 118 | + end do | ||
| 119 | + endif | ||
| 120 | + | ||
| 121 | +c--------------------------------------------------------------------- | ||
| 122 | +c fill the buffer to be sent to southern neighbors | ||
| 123 | +c--------------------------------------------------------------------- | ||
| 124 | + if (cell_coord(2,c).ne. 1) then | ||
| 125 | + do k = 0, cell_size(3,c)-1 | ||
| 126 | + do j = 0, 1 | ||
| 127 | + do i = 0, cell_size(1,c)-1 | ||
| 128 | + do m = 1, 5 | ||
| 129 | + out_buffer(ss(3)+p3) = u(m,i,j,k,c) | ||
| 130 | + p3 = p3 + 1 | ||
| 131 | + end do | ||
| 132 | + end do | ||
| 133 | + end do | ||
| 134 | + end do | ||
| 135 | + endif | ||
| 136 | + | ||
| 137 | +c--------------------------------------------------------------------- | ||
| 138 | +c fill the buffer to be sent to top neighbors (k-dir) | ||
| 139 | +c--------------------------------------------------------------------- | ||
| 140 | + if (cell_coord(3,c) .ne. ncells) then | ||
| 141 | + do k = cell_size(3,c)-2, cell_size(3,c)-1 | ||
| 142 | + do j = 0, cell_size(2,c)-1 | ||
| 143 | + do i = 0, cell_size(1,c)-1 | ||
| 144 | + do m = 1, 5 | ||
| 145 | + out_buffer(ss(4)+p4) = u(m,i,j,k,c) | ||
| 146 | + p4 = p4 + 1 | ||
| 147 | + end do | ||
| 148 | + end do | ||
| 149 | + end do | ||
| 150 | + end do | ||
| 151 | + endif | ||
| 152 | + | ||
| 153 | +c--------------------------------------------------------------------- | ||
| 154 | +c fill the buffer to be sent to bottom neighbors | ||
| 155 | +c--------------------------------------------------------------------- | ||
| 156 | + if (cell_coord(3,c).ne. 1) then | ||
| 157 | + do k=0, 1 | ||
| 158 | + do j = 0, cell_size(2,c)-1 | ||
| 159 | + do i = 0, cell_size(1,c)-1 | ||
| 160 | + do m = 1, 5 | ||
| 161 | + out_buffer(ss(5)+p5) = u(m,i,j,k,c) | ||
| 162 | + p5 = p5 + 1 | ||
| 163 | + end do | ||
| 164 | + end do | ||
| 165 | + end do | ||
| 166 | + end do | ||
| 167 | + endif | ||
| 168 | + | ||
| 169 | +c--------------------------------------------------------------------- | ||
| 170 | +c cell loop | ||
| 171 | +c--------------------------------------------------------------------- | ||
| 172 | + end do | ||
| 173 | + if (timeron) call timer_stop(t_bpack) | ||
| 174 | + | ||
| 175 | + if (timeron) call timer_start(t_exch) | ||
| 176 | + call mpi_irecv(in_buffer(sr(0)), b_size(0), | ||
| 177 | + > dp_type, successor(1), WEST, | ||
| 178 | + > comm_rhs, requests(0), error) | ||
| 179 | + call mpi_irecv(in_buffer(sr(1)), b_size(1), | ||
| 180 | + > dp_type, predecessor(1), EAST, | ||
| 181 | + > comm_rhs, requests(1), error) | ||
| 182 | + call mpi_irecv(in_buffer(sr(2)), b_size(2), | ||
| 183 | + > dp_type, successor(2), SOUTH, | ||
| 184 | + > comm_rhs, requests(2), error) | ||
| 185 | + call mpi_irecv(in_buffer(sr(3)), b_size(3), | ||
| 186 | + > dp_type, predecessor(2), NORTH, | ||
| 187 | + > comm_rhs, requests(3), error) | ||
| 188 | + call mpi_irecv(in_buffer(sr(4)), b_size(4), | ||
| 189 | + > dp_type, successor(3), BOTTOM, | ||
| 190 | + > comm_rhs, requests(4), error) | ||
| 191 | + call mpi_irecv(in_buffer(sr(5)), b_size(5), | ||
| 192 | + > dp_type, predecessor(3), TOP, | ||
| 193 | + > comm_rhs, requests(5), error) | ||
| 194 | + | ||
| 195 | + call mpi_isend(out_buffer(ss(0)), b_size(0), | ||
| 196 | + > dp_type, successor(1), EAST, | ||
| 197 | + > comm_rhs, requests(6), error) | ||
| 198 | + call mpi_isend(out_buffer(ss(1)), b_size(1), | ||
| 199 | + > dp_type, predecessor(1), WEST, | ||
| 200 | + > comm_rhs, requests(7), error) | ||
| 201 | + call mpi_isend(out_buffer(ss(2)), b_size(2), | ||
| 202 | + > dp_type,successor(2), NORTH, | ||
| 203 | + > comm_rhs, requests(8), error) | ||
| 204 | + call mpi_isend(out_buffer(ss(3)), b_size(3), | ||
| 205 | + > dp_type,predecessor(2), SOUTH, | ||
| 206 | + > comm_rhs, requests(9), error) | ||
| 207 | + call mpi_isend(out_buffer(ss(4)), b_size(4), | ||
| 208 | + > dp_type,successor(3), TOP, | ||
| 209 | + > comm_rhs, requests(10), error) | ||
| 210 | + call mpi_isend(out_buffer(ss(5)), b_size(5), | ||
| 211 | + > dp_type,predecessor(3), BOTTOM, | ||
| 212 | + > comm_rhs,requests(11), error) | ||
| 213 | + | ||
| 214 | + | ||
| 215 | + call mpi_waitall(12, requests, statuses, error) | ||
| 216 | + if (timeron) call timer_stop(t_exch) | ||
| 217 | + | ||
| 218 | +c--------------------------------------------------------------------- | ||
| 219 | +c unpack the data that has just been received; | ||
| 220 | +c--------------------------------------------------------------------- | ||
| 221 | + if (timeron) call timer_start(t_bpack) | ||
| 222 | + p0 = 0 | ||
| 223 | + p1 = 0 | ||
| 224 | + p2 = 0 | ||
| 225 | + p3 = 0 | ||
| 226 | + p4 = 0 | ||
| 227 | + p5 = 0 | ||
| 228 | + | ||
| 229 | + do c = 1, ncells | ||
| 230 | + | ||
| 231 | + if (cell_coord(1,c) .ne. 1) then | ||
| 232 | + do k = 0, cell_size(3,c)-1 | ||
| 233 | + do j = 0, cell_size(2,c)-1 | ||
| 234 | + do i = -2, -1 | ||
| 235 | + do m = 1, 5 | ||
| 236 | + u(m,i,j,k,c) = in_buffer(sr(1)+p0) | ||
| 237 | + p0 = p0 + 1 | ||
| 238 | + end do | ||
| 239 | + end do | ||
| 240 | + end do | ||
| 241 | + end do | ||
| 242 | + endif | ||
| 243 | + | ||
| 244 | + if (cell_coord(1,c) .ne. ncells) then | ||
| 245 | + do k = 0, cell_size(3,c)-1 | ||
| 246 | + do j = 0, cell_size(2,c)-1 | ||
| 247 | + do i = cell_size(1,c), cell_size(1,c)+1 | ||
| 248 | + do m = 1, 5 | ||
| 249 | + u(m,i,j,k,c) = in_buffer(sr(0)+p1) | ||
| 250 | + p1 = p1 + 1 | ||
| 251 | + end do | ||
| 252 | + end do | ||
| 253 | + end do | ||
| 254 | + end do | ||
| 255 | + end if | ||
| 256 | + | ||
| 257 | + if (cell_coord(2,c) .ne. 1) then | ||
| 258 | + do k = 0, cell_size(3,c)-1 | ||
| 259 | + do j = -2, -1 | ||
| 260 | + do i = 0, cell_size(1,c)-1 | ||
| 261 | + do m = 1, 5 | ||
| 262 | + u(m,i,j,k,c) = in_buffer(sr(3)+p2) | ||
| 263 | + p2 = p2 + 1 | ||
| 264 | + end do | ||
| 265 | + end do | ||
| 266 | + end do | ||
| 267 | + end do | ||
| 268 | + | ||
| 269 | + endif | ||
| 270 | + | ||
| 271 | + if (cell_coord(2,c) .ne. ncells) then | ||
| 272 | + do k = 0, cell_size(3,c)-1 | ||
| 273 | + do j = cell_size(2,c), cell_size(2,c)+1 | ||
| 274 | + do i = 0, cell_size(1,c)-1 | ||
| 275 | + do m = 1, 5 | ||
| 276 | + u(m,i,j,k,c) = in_buffer(sr(2)+p3) | ||
| 277 | + p3 = p3 + 1 | ||
| 278 | + end do | ||
| 279 | + end do | ||
| 280 | + end do | ||
| 281 | + end do | ||
| 282 | + endif | ||
| 283 | + | ||
| 284 | + if (cell_coord(3,c) .ne. 1) then | ||
| 285 | + do k = -2, -1 | ||
| 286 | + do j = 0, cell_size(2,c)-1 | ||
| 287 | + do i = 0, cell_size(1,c)-1 | ||
| 288 | + do m = 1, 5 | ||
| 289 | + u(m,i,j,k,c) = in_buffer(sr(5)+p4) | ||
| 290 | + p4 = p4 + 1 | ||
| 291 | + end do | ||
| 292 | + end do | ||
| 293 | + end do | ||
| 294 | + end do | ||
| 295 | + endif | ||
| 296 | + | ||
| 297 | + if (cell_coord(3,c) .ne. ncells) then | ||
| 298 | + do k = cell_size(3,c), cell_size(3,c)+1 | ||
| 299 | + do j = 0, cell_size(2,c)-1 | ||
| 300 | + do i = 0, cell_size(1,c)-1 | ||
| 301 | + do m = 1, 5 | ||
| 302 | + u(m,i,j,k,c) = in_buffer(sr(4)+p5) | ||
| 303 | + p5 = p5 + 1 | ||
| 304 | + end do | ||
| 305 | + end do | ||
| 306 | + end do | ||
| 307 | + end do | ||
| 308 | + endif | ||
| 309 | + | ||
| 310 | +c--------------------------------------------------------------------- | ||
| 311 | +c cells loop | ||
| 312 | +c--------------------------------------------------------------------- | ||
| 313 | + end do | ||
| 314 | + if (timeron) call timer_stop(t_bpack) | ||
| 315 | + | ||
| 316 | +c--------------------------------------------------------------------- | ||
| 317 | +c do the rest of the rhs that uses the copied face values | ||
| 318 | +c--------------------------------------------------------------------- | ||
| 319 | + call compute_rhs | ||
| 320 | + | ||
| 321 | + return | ||
| 322 | + end |
NPB3.3-MPI/BT/define.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/define.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine compute_buffer_size(dim) | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | + include 'header.h' | ||
| 10 | + | ||
| 11 | + integer c, dim, face_size | ||
| 12 | + | ||
| 13 | + if (ncells .eq. 1) return | ||
| 14 | + | ||
| 15 | +c--------------------------------------------------------------------- | ||
| 16 | +c compute the actual sizes of the buffers; note that there is | ||
| 17 | +c always one cell face that doesn't need buffer space, because it | ||
| 18 | +c is at the boundary of the grid | ||
| 19 | +c--------------------------------------------------------------------- | ||
| 20 | + west_size = 0 | ||
| 21 | + east_size = 0 | ||
| 22 | + | ||
| 23 | + do c = 1, ncells | ||
| 24 | + face_size = cell_size(2,c) * cell_size(3,c) * dim * 2 | ||
| 25 | + if (cell_coord(1,c).ne.1) west_size = west_size + face_size | ||
| 26 | + if (cell_coord(1,c).ne.ncells) east_size = east_size + | ||
| 27 | + > face_size | ||
| 28 | + end do | ||
| 29 | + | ||
| 30 | + north_size = 0 | ||
| 31 | + south_size = 0 | ||
| 32 | + do c = 1, ncells | ||
| 33 | + face_size = cell_size(1,c)*cell_size(3,c) * dim * 2 | ||
| 34 | + if (cell_coord(2,c).ne.1) south_size = south_size + face_size | ||
| 35 | + if (cell_coord(2,c).ne.ncells) north_size = north_size + | ||
| 36 | + > face_size | ||
| 37 | + end do | ||
| 38 | + | ||
| 39 | + top_size = 0 | ||
| 40 | + bottom_size = 0 | ||
| 41 | + do c = 1, ncells | ||
| 42 | + face_size = cell_size(1,c) * cell_size(2,c) * dim * 2 | ||
| 43 | + if (cell_coord(3,c).ne.1) bottom_size = bottom_size + | ||
| 44 | + > face_size | ||
| 45 | + if (cell_coord(3,c).ne.ncells) top_size = top_size + | ||
| 46 | + > face_size | ||
| 47 | + end do | ||
| 48 | + | ||
| 49 | + start_send_west = 1 | ||
| 50 | + start_send_east = start_send_west + west_size | ||
| 51 | + start_send_south = start_send_east + east_size | ||
| 52 | + start_send_north = start_send_south + south_size | ||
| 53 | + start_send_bottom = start_send_north + north_size | ||
| 54 | + start_send_top = start_send_bottom + bottom_size | ||
| 55 | + start_recv_west = 1 | ||
| 56 | + start_recv_east = start_recv_west + west_size | ||
| 57 | + start_recv_south = start_recv_east + east_size | ||
| 58 | + start_recv_north = start_recv_south + south_size | ||
| 59 | + start_recv_bottom = start_recv_north + north_size | ||
| 60 | + start_recv_top = start_recv_bottom + bottom_size | ||
| 61 | + | ||
| 62 | + return | ||
| 63 | + end | ||
| 64 | + |
NPB3.3-MPI/BT/epio.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/epio.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine setup_btio | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | + include 'header.h' | ||
| 11 | + include 'mpinpb.h' | ||
| 12 | + | ||
| 13 | + character*(128) newfilenm | ||
| 14 | + integer m | ||
| 15 | + | ||
| 16 | + if (node .lt. 10000) then | ||
| 17 | + write (newfilenm, 996) filenm,node | ||
| 18 | + else | ||
| 19 | + print *, 'error generating file names (> 10000 nodes)' | ||
| 20 | + stop | ||
| 21 | + endif | ||
| 22 | + | ||
| 23 | +996 format (a,'.',i4.4) | ||
| 24 | + | ||
| 25 | + open (unit=99, file=newfilenm, form='unformatted', | ||
| 26 | + $ status='unknown') | ||
| 27 | + | ||
| 28 | + do m = 1, 5 | ||
| 29 | + xce_sub(m) = 0.d0 | ||
| 30 | + end do | ||
| 31 | + | ||
| 32 | + idump_sub = 0 | ||
| 33 | + | ||
| 34 | + return | ||
| 35 | + end | ||
| 36 | + | ||
| 37 | +c--------------------------------------------------------------------- | ||
| 38 | +c--------------------------------------------------------------------- | ||
| 39 | + | ||
| 40 | + subroutine output_timestep | ||
| 41 | + | ||
| 42 | +c--------------------------------------------------------------------- | ||
| 43 | +c--------------------------------------------------------------------- | ||
| 44 | + include 'header.h' | ||
| 45 | + include 'mpinpb.h' | ||
| 46 | + | ||
| 47 | + integer ix, iio, jio, kio, cio, aio | ||
| 48 | + | ||
| 49 | + do cio=1,ncells | ||
| 50 | + write(99) | ||
| 51 | + $ ((((u(aio,ix, jio,kio,cio),aio=1,5), | ||
| 52 | + $ ix=0, cell_size(1,cio)-1), | ||
| 53 | + $ jio=0, cell_size(2,cio)-1), | ||
| 54 | + $ kio=0, cell_size(3,cio)-1) | ||
| 55 | + enddo | ||
| 56 | + | ||
| 57 | + idump_sub = idump_sub + 1 | ||
| 58 | + if (rd_interval .gt. 0) then | ||
| 59 | + if (idump_sub .ge. rd_interval) then | ||
| 60 | + | ||
| 61 | + rewind(99) | ||
| 62 | + call acc_sub_norms(idump+1) | ||
| 63 | + | ||
| 64 | + rewind(99) | ||
| 65 | + idump_sub = 0 | ||
| 66 | + endif | ||
| 67 | + endif | ||
| 68 | + | ||
| 69 | + return | ||
| 70 | + end | ||
| 71 | + | ||
| 72 | +c--------------------------------------------------------------------- | ||
| 73 | +c--------------------------------------------------------------------- | ||
| 74 | + | ||
| 75 | + subroutine acc_sub_norms(idump_cur) | ||
| 76 | + | ||
| 77 | + include 'header.h' | ||
| 78 | + include 'mpinpb.h' | ||
| 79 | + | ||
| 80 | + integer idump_cur | ||
| 81 | + | ||
| 82 | + integer ix, jio, kio, cio, ii, m, ichunk | ||
| 83 | + double precision xce_single(5) | ||
| 84 | + | ||
| 85 | + ichunk = idump_cur - idump_sub + 1 | ||
| 86 | + do ii=0, idump_sub-1 | ||
| 87 | + do cio=1,ncells | ||
| 88 | + read(99) | ||
| 89 | + $ ((((u(m,ix, jio,kio,cio),m=1,5), | ||
| 90 | + $ ix=0, cell_size(1,cio)-1), | ||
| 91 | + $ jio=0, cell_size(2,cio)-1), | ||
| 92 | + $ kio=0, cell_size(3,cio)-1) | ||
| 93 | + enddo | ||
| 94 | + | ||
| 95 | + if (node .eq. root) print *, 'Reading data set ', ii+ichunk | ||
| 96 | + | ||
| 97 | + call error_norm(xce_single) | ||
| 98 | + do m = 1, 5 | ||
| 99 | + xce_sub(m) = xce_sub(m) + xce_single(m) | ||
| 100 | + end do | ||
| 101 | + enddo | ||
| 102 | + | ||
| 103 | + return | ||
| 104 | + end | ||
| 105 | + | ||
| 106 | +c--------------------------------------------------------------------- | ||
| 107 | +c--------------------------------------------------------------------- | ||
| 108 | + | ||
| 109 | + subroutine btio_cleanup | ||
| 110 | + | ||
| 111 | +c--------------------------------------------------------------------- | ||
| 112 | +c--------------------------------------------------------------------- | ||
| 113 | + | ||
| 114 | + close(unit=99) | ||
| 115 | + | ||
| 116 | + return | ||
| 117 | + end | ||
| 118 | + | ||
| 119 | +c--------------------------------------------------------------------- | ||
| 120 | +c--------------------------------------------------------------------- | ||
| 121 | + | ||
| 122 | + subroutine accumulate_norms(xce_acc) | ||
| 123 | + | ||
| 124 | +c--------------------------------------------------------------------- | ||
| 125 | +c--------------------------------------------------------------------- | ||
| 126 | + | ||
| 127 | + include 'header.h' | ||
| 128 | + include 'mpinpb.h' | ||
| 129 | + | ||
| 130 | + double precision xce_acc(5) | ||
| 131 | + | ||
| 132 | + character*(128) newfilenm | ||
| 133 | + integer m | ||
| 134 | + | ||
| 135 | + if (rd_interval .gt. 0) goto 20 | ||
| 136 | + | ||
| 137 | + if (node .lt. 10000) then | ||
| 138 | + write (newfilenm, 996) filenm,node | ||
| 139 | + else | ||
| 140 | + print *, 'error generating file names (> 10000 nodes)' | ||
| 141 | + stop | ||
| 142 | + endif | ||
| 143 | + | ||
| 144 | +996 format (a,'.',i4.4) | ||
| 145 | + | ||
| 146 | + open (unit=99, file=newfilenm, | ||
| 147 | + $ form='unformatted') | ||
| 148 | + | ||
| 149 | +c clear the last time step | ||
| 150 | + | ||
| 151 | + call clear_timestep | ||
| 152 | + | ||
| 153 | +c read back the time steps and accumulate norms | ||
| 154 | + | ||
| 155 | + call acc_sub_norms(idump) | ||
| 156 | + | ||
| 157 | + close(unit=99) | ||
| 158 | + | ||
| 159 | + 20 continue | ||
| 160 | + do m = 1, 5 | ||
| 161 | + xce_acc(m) = xce_sub(m) / dble(idump) | ||
| 162 | + end do | ||
| 163 | + | ||
| 164 | + return | ||
| 165 | + end |
NPB3.3-MPI/BT/error.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/error.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine error_norm(rms) | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | +c--------------------------------------------------------------------- | ||
| 10 | +c this function computes the norm of the difference between the | ||
| 11 | +c computed solution and the exact solution | ||
| 12 | +c--------------------------------------------------------------------- | ||
| 13 | + | ||
| 14 | + include 'header.h' | ||
| 15 | + include 'mpinpb.h' | ||
| 16 | + | ||
| 17 | + integer c, i, j, k, m, ii, jj, kk, d, error | ||
| 18 | + double precision xi, eta, zeta, u_exact(5), rms(5), rms_work(5), | ||
| 19 | + > add | ||
| 20 | + | ||
| 21 | + do m = 1, 5 | ||
| 22 | + rms_work(m) = 0.0d0 | ||
| 23 | + enddo | ||
| 24 | + | ||
| 25 | + do c = 1, ncells | ||
| 26 | + kk = 0 | ||
| 27 | + do k = cell_low(3,c), cell_high(3,c) | ||
| 28 | + zeta = dble(k) * dnzm1 | ||
| 29 | + jj = 0 | ||
| 30 | + do j = cell_low(2,c), cell_high(2,c) | ||
| 31 | + eta = dble(j) * dnym1 | ||
| 32 | + ii = 0 | ||
| 33 | + do i = cell_low(1,c), cell_high(1,c) | ||
| 34 | + xi = dble(i) * dnxm1 | ||
| 35 | + call exact_solution(xi, eta, zeta, u_exact) | ||
| 36 | + | ||
| 37 | + do m = 1, 5 | ||
| 38 | + add = u(m,ii,jj,kk,c)-u_exact(m) | ||
| 39 | + rms_work(m) = rms_work(m) + add*add | ||
| 40 | + enddo | ||
| 41 | + ii = ii + 1 | ||
| 42 | + enddo | ||
| 43 | + jj = jj + 1 | ||
| 44 | + enddo | ||
| 45 | + kk = kk + 1 | ||
| 46 | + enddo | ||
| 47 | + enddo | ||
| 48 | + | ||
| 49 | + call mpi_allreduce(rms_work, rms, 5, dp_type, | ||
| 50 | + > MPI_SUM, comm_setup, error) | ||
| 51 | + | ||
| 52 | + do m = 1, 5 | ||
| 53 | + do d = 1, 3 | ||
| 54 | + rms(m) = rms(m) / dble(grid_points(d)-2) | ||
| 55 | + enddo | ||
| 56 | + rms(m) = dsqrt(rms(m)) | ||
| 57 | + enddo | ||
| 58 | + | ||
| 59 | + return | ||
| 60 | + end | ||
| 61 | + | ||
| 62 | + | ||
| 63 | +c--------------------------------------------------------------------- | ||
| 64 | +c--------------------------------------------------------------------- | ||
| 65 | + | ||
| 66 | + subroutine rhs_norm(rms) | ||
| 67 | + | ||
| 68 | +c--------------------------------------------------------------------- | ||
| 69 | +c--------------------------------------------------------------------- | ||
| 70 | + | ||
| 71 | + include 'header.h' | ||
| 72 | + include 'mpinpb.h' | ||
| 73 | + | ||
| 74 | + integer c, i, j, k, d, m, error | ||
| 75 | + double precision rms(5), rms_work(5), add | ||
| 76 | + | ||
| 77 | + do m = 1, 5 | ||
| 78 | + rms_work(m) = 0.0d0 | ||
| 79 | + enddo | ||
| 80 | + | ||
| 81 | + do c = 1, ncells | ||
| 82 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 83 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 84 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 85 | + do m = 1, 5 | ||
| 86 | + add = rhs(m,i,j,k,c) | ||
| 87 | + rms_work(m) = rms_work(m) + add*add | ||
| 88 | + enddo | ||
| 89 | + enddo | ||
| 90 | + enddo | ||
| 91 | + enddo | ||
| 92 | + enddo | ||
| 93 | + | ||
| 94 | + call mpi_allreduce(rms_work, rms, 5, dp_type, | ||
| 95 | + > MPI_SUM, comm_setup, error) | ||
| 96 | + | ||
| 97 | + do m = 1, 5 | ||
| 98 | + do d = 1, 3 | ||
| 99 | + rms(m) = rms(m) / dble(grid_points(d)-2) | ||
| 100 | + enddo | ||
| 101 | + rms(m) = dsqrt(rms(m)) | ||
| 102 | + enddo | ||
| 103 | + | ||
| 104 | + return | ||
| 105 | + end | ||
| 106 | + |
NPB3.3-MPI/BT/exact_rhs.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/exact_rhs.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine exact_rhs | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | +c--------------------------------------------------------------------- | ||
| 11 | +c compute the right hand side based on exact solution | ||
| 12 | +c--------------------------------------------------------------------- | ||
| 13 | + | ||
| 14 | + include 'header.h' | ||
| 15 | + | ||
| 16 | + double precision dtemp(5), xi, eta, zeta, dtpp | ||
| 17 | + integer c, m, i, j, k, ip1, im1, jp1, | ||
| 18 | + > jm1, km1, kp1 | ||
| 19 | + | ||
| 20 | + | ||
| 21 | +c--------------------------------------------------------------------- | ||
| 22 | +c loop over all cells owned by this node | ||
| 23 | +c--------------------------------------------------------------------- | ||
| 24 | + do c = 1, ncells | ||
| 25 | + | ||
| 26 | +c--------------------------------------------------------------------- | ||
| 27 | +c initialize | ||
| 28 | +c--------------------------------------------------------------------- | ||
| 29 | + do k= 0, cell_size(3,c)-1 | ||
| 30 | + do j = 0, cell_size(2,c)-1 | ||
| 31 | + do i = 0, cell_size(1,c)-1 | ||
| 32 | + do m = 1, 5 | ||
| 33 | + forcing(m,i,j,k,c) = 0.0d0 | ||
| 34 | + enddo | ||
| 35 | + enddo | ||
| 36 | + enddo | ||
| 37 | + enddo | ||
| 38 | + | ||
| 39 | +c--------------------------------------------------------------------- | ||
| 40 | +c xi-direction flux differences | ||
| 41 | +c--------------------------------------------------------------------- | ||
| 42 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 43 | + zeta = dble(k+cell_low(3,c)) * dnzm1 | ||
| 44 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 45 | + eta = dble(j+cell_low(2,c)) * dnym1 | ||
| 46 | + | ||
| 47 | + do i=-2*(1-start(1,c)), cell_size(1,c)+1-2*end(1,c) | ||
| 48 | + xi = dble(i+cell_low(1,c)) * dnxm1 | ||
| 49 | + | ||
| 50 | + call exact_solution(xi, eta, zeta, dtemp) | ||
| 51 | + do m = 1, 5 | ||
| 52 | + ue(i,m) = dtemp(m) | ||
| 53 | + enddo | ||
| 54 | + | ||
| 55 | + dtpp = 1.0d0 / dtemp(1) | ||
| 56 | + | ||
| 57 | + do m = 2, 5 | ||
| 58 | + buf(i,m) = dtpp * dtemp(m) | ||
| 59 | + enddo | ||
| 60 | + | ||
| 61 | + cuf(i) = buf(i,2) * buf(i,2) | ||
| 62 | + buf(i,1) = cuf(i) + buf(i,3) * buf(i,3) + | ||
| 63 | + > buf(i,4) * buf(i,4) | ||
| 64 | + q(i) = 0.5d0*(buf(i,2)*ue(i,2) + buf(i,3)*ue(i,3) + | ||
| 65 | + > buf(i,4)*ue(i,4)) | ||
| 66 | + | ||
| 67 | + enddo | ||
| 68 | + | ||
| 69 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 70 | + im1 = i-1 | ||
| 71 | + ip1 = i+1 | ||
| 72 | + | ||
| 73 | + forcing(1,i,j,k,c) = forcing(1,i,j,k,c) - | ||
| 74 | + > tx2*( ue(ip1,2)-ue(im1,2) )+ | ||
| 75 | + > dx1tx1*(ue(ip1,1)-2.0d0*ue(i,1)+ue(im1,1)) | ||
| 76 | + | ||
| 77 | + forcing(2,i,j,k,c) = forcing(2,i,j,k,c) - tx2 * ( | ||
| 78 | + > (ue(ip1,2)*buf(ip1,2)+c2*(ue(ip1,5)-q(ip1)))- | ||
| 79 | + > (ue(im1,2)*buf(im1,2)+c2*(ue(im1,5)-q(im1))))+ | ||
| 80 | + > xxcon1*(buf(ip1,2)-2.0d0*buf(i,2)+buf(im1,2))+ | ||
| 81 | + > dx2tx1*( ue(ip1,2)-2.0d0* ue(i,2)+ue(im1,2)) | ||
| 82 | + | ||
| 83 | + forcing(3,i,j,k,c) = forcing(3,i,j,k,c) - tx2 * ( | ||
| 84 | + > ue(ip1,3)*buf(ip1,2)-ue(im1,3)*buf(im1,2))+ | ||
| 85 | + > xxcon2*(buf(ip1,3)-2.0d0*buf(i,3)+buf(im1,3))+ | ||
| 86 | + > dx3tx1*( ue(ip1,3)-2.0d0*ue(i,3) +ue(im1,3)) | ||
| 87 | + | ||
| 88 | + forcing(4,i,j,k,c) = forcing(4,i,j,k,c) - tx2*( | ||
| 89 | + > ue(ip1,4)*buf(ip1,2)-ue(im1,4)*buf(im1,2))+ | ||
| 90 | + > xxcon2*(buf(ip1,4)-2.0d0*buf(i,4)+buf(im1,4))+ | ||
| 91 | + > dx4tx1*( ue(ip1,4)-2.0d0* ue(i,4)+ ue(im1,4)) | ||
| 92 | + | ||
| 93 | + forcing(5,i,j,k,c) = forcing(5,i,j,k,c) - tx2*( | ||
| 94 | + > buf(ip1,2)*(c1*ue(ip1,5)-c2*q(ip1))- | ||
| 95 | + > buf(im1,2)*(c1*ue(im1,5)-c2*q(im1)))+ | ||
| 96 | + > 0.5d0*xxcon3*(buf(ip1,1)-2.0d0*buf(i,1)+ | ||
| 97 | + > buf(im1,1))+ | ||
| 98 | + > xxcon4*(cuf(ip1)-2.0d0*cuf(i)+cuf(im1))+ | ||
| 99 | + > xxcon5*(buf(ip1,5)-2.0d0*buf(i,5)+buf(im1,5))+ | ||
| 100 | + > dx5tx1*( ue(ip1,5)-2.0d0* ue(i,5)+ ue(im1,5)) | ||
| 101 | + enddo | ||
| 102 | + | ||
| 103 | +c--------------------------------------------------------------------- | ||
| 104 | +c Fourth-order dissipation | ||
| 105 | +c--------------------------------------------------------------------- | ||
| 106 | + if (start(1,c) .gt. 0) then | ||
| 107 | + do m = 1, 5 | ||
| 108 | + i = 1 | ||
| 109 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 110 | + > (5.0d0*ue(i,m) - 4.0d0*ue(i+1,m) +ue(i+2,m)) | ||
| 111 | + i = 2 | ||
| 112 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 113 | + > (-4.0d0*ue(i-1,m) + 6.0d0*ue(i,m) - | ||
| 114 | + > 4.0d0*ue(i+1,m) + ue(i+2,m)) | ||
| 115 | + enddo | ||
| 116 | + endif | ||
| 117 | + | ||
| 118 | + do i = start(1,c)*3, cell_size(1,c)-3*end(1,c)-1 | ||
| 119 | + do m = 1, 5 | ||
| 120 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp* | ||
| 121 | + > (ue(i-2,m) - 4.0d0*ue(i-1,m) + | ||
| 122 | + > 6.0d0*ue(i,m) - 4.0d0*ue(i+1,m) + ue(i+2,m)) | ||
| 123 | + enddo | ||
| 124 | + enddo | ||
| 125 | + | ||
| 126 | + if (end(1,c) .gt. 0) then | ||
| 127 | + do m = 1, 5 | ||
| 128 | + i = cell_size(1,c)-3 | ||
| 129 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 130 | + > (ue(i-2,m) - 4.0d0*ue(i-1,m) + | ||
| 131 | + > 6.0d0*ue(i,m) - 4.0d0*ue(i+1,m)) | ||
| 132 | + i = cell_size(1,c)-2 | ||
| 133 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 134 | + > (ue(i-2,m) - 4.0d0*ue(i-1,m) + 5.0d0*ue(i,m)) | ||
| 135 | + enddo | ||
| 136 | + endif | ||
| 137 | + | ||
| 138 | + enddo | ||
| 139 | + enddo | ||
| 140 | + | ||
| 141 | +c--------------------------------------------------------------------- | ||
| 142 | +c eta-direction flux differences | ||
| 143 | +c--------------------------------------------------------------------- | ||
| 144 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 145 | + zeta = dble(k+cell_low(3,c)) * dnzm1 | ||
| 146 | + do i=start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 147 | + xi = dble(i+cell_low(1,c)) * dnxm1 | ||
| 148 | + | ||
| 149 | + do j=-2*(1-start(2,c)), cell_size(2,c)+1-2*end(2,c) | ||
| 150 | + eta = dble(j+cell_low(2,c)) * dnym1 | ||
| 151 | + | ||
| 152 | + call exact_solution(xi, eta, zeta, dtemp) | ||
| 153 | + do m = 1, 5 | ||
| 154 | + ue(j,m) = dtemp(m) | ||
| 155 | + enddo | ||
| 156 | + | ||
| 157 | + dtpp = 1.0d0/dtemp(1) | ||
| 158 | + | ||
| 159 | + do m = 2, 5 | ||
| 160 | + buf(j,m) = dtpp * dtemp(m) | ||
| 161 | + enddo | ||
| 162 | + | ||
| 163 | + cuf(j) = buf(j,3) * buf(j,3) | ||
| 164 | + buf(j,1) = cuf(j) + buf(j,2) * buf(j,2) + | ||
| 165 | + > buf(j,4) * buf(j,4) | ||
| 166 | + q(j) = 0.5d0*(buf(j,2)*ue(j,2) + buf(j,3)*ue(j,3) + | ||
| 167 | + > buf(j,4)*ue(j,4)) | ||
| 168 | + enddo | ||
| 169 | + | ||
| 170 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 171 | + jm1 = j-1 | ||
| 172 | + jp1 = j+1 | ||
| 173 | + | ||
| 174 | + forcing(1,i,j,k,c) = forcing(1,i,j,k,c) - | ||
| 175 | + > ty2*( ue(jp1,3)-ue(jm1,3) )+ | ||
| 176 | + > dy1ty1*(ue(jp1,1)-2.0d0*ue(j,1)+ue(jm1,1)) | ||
| 177 | + | ||
| 178 | + forcing(2,i,j,k,c) = forcing(2,i,j,k,c) - ty2*( | ||
| 179 | + > ue(jp1,2)*buf(jp1,3)-ue(jm1,2)*buf(jm1,3))+ | ||
| 180 | + > yycon2*(buf(jp1,2)-2.0d0*buf(j,2)+buf(jm1,2))+ | ||
| 181 | + > dy2ty1*( ue(jp1,2)-2.0* ue(j,2)+ ue(jm1,2)) | ||
| 182 | + | ||
| 183 | + forcing(3,i,j,k,c) = forcing(3,i,j,k,c) - ty2*( | ||
| 184 | + > (ue(jp1,3)*buf(jp1,3)+c2*(ue(jp1,5)-q(jp1)))- | ||
| 185 | + > (ue(jm1,3)*buf(jm1,3)+c2*(ue(jm1,5)-q(jm1))))+ | ||
| 186 | + > yycon1*(buf(jp1,3)-2.0d0*buf(j,3)+buf(jm1,3))+ | ||
| 187 | + > dy3ty1*( ue(jp1,3)-2.0d0*ue(j,3) +ue(jm1,3)) | ||
| 188 | + | ||
| 189 | + forcing(4,i,j,k,c) = forcing(4,i,j,k,c) - ty2*( | ||
| 190 | + > ue(jp1,4)*buf(jp1,3)-ue(jm1,4)*buf(jm1,3))+ | ||
| 191 | + > yycon2*(buf(jp1,4)-2.0d0*buf(j,4)+buf(jm1,4))+ | ||
| 192 | + > dy4ty1*( ue(jp1,4)-2.0d0*ue(j,4)+ ue(jm1,4)) | ||
| 193 | + | ||
| 194 | + forcing(5,i,j,k,c) = forcing(5,i,j,k,c) - ty2*( | ||
| 195 | + > buf(jp1,3)*(c1*ue(jp1,5)-c2*q(jp1))- | ||
| 196 | + > buf(jm1,3)*(c1*ue(jm1,5)-c2*q(jm1)))+ | ||
| 197 | + > 0.5d0*yycon3*(buf(jp1,1)-2.0d0*buf(j,1)+ | ||
| 198 | + > buf(jm1,1))+ | ||
| 199 | + > yycon4*(cuf(jp1)-2.0d0*cuf(j)+cuf(jm1))+ | ||
| 200 | + > yycon5*(buf(jp1,5)-2.0d0*buf(j,5)+buf(jm1,5))+ | ||
| 201 | + > dy5ty1*(ue(jp1,5)-2.0d0*ue(j,5)+ue(jm1,5)) | ||
| 202 | + enddo | ||
| 203 | + | ||
| 204 | +c--------------------------------------------------------------------- | ||
| 205 | +c Fourth-order dissipation | ||
| 206 | +c--------------------------------------------------------------------- | ||
| 207 | + if (start(2,c) .gt. 0) then | ||
| 208 | + do m = 1, 5 | ||
| 209 | + j = 1 | ||
| 210 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 211 | + > (5.0d0*ue(j,m) - 4.0d0*ue(j+1,m) +ue(j+2,m)) | ||
| 212 | + j = 2 | ||
| 213 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 214 | + > (-4.0d0*ue(j-1,m) + 6.0d0*ue(j,m) - | ||
| 215 | + > 4.0d0*ue(j+1,m) + ue(j+2,m)) | ||
| 216 | + enddo | ||
| 217 | + endif | ||
| 218 | + | ||
| 219 | + do j = start(2,c)*3, cell_size(2,c)-3*end(2,c)-1 | ||
| 220 | + do m = 1, 5 | ||
| 221 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp* | ||
| 222 | + > (ue(j-2,m) - 4.0d0*ue(j-1,m) + | ||
| 223 | + > 6.0d0*ue(j,m) - 4.0d0*ue(j+1,m) + ue(j+2,m)) | ||
| 224 | + enddo | ||
| 225 | + enddo | ||
| 226 | + | ||
| 227 | + if (end(2,c) .gt. 0) then | ||
| 228 | + do m = 1, 5 | ||
| 229 | + j = cell_size(2,c)-3 | ||
| 230 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 231 | + > (ue(j-2,m) - 4.0d0*ue(j-1,m) + | ||
| 232 | + > 6.0d0*ue(j,m) - 4.0d0*ue(j+1,m)) | ||
| 233 | + j = cell_size(2,c)-2 | ||
| 234 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 235 | + > (ue(j-2,m) - 4.0d0*ue(j-1,m) + 5.0d0*ue(j,m)) | ||
| 236 | + | ||
| 237 | + enddo | ||
| 238 | + endif | ||
| 239 | + | ||
| 240 | + enddo | ||
| 241 | + enddo | ||
| 242 | + | ||
| 243 | +c--------------------------------------------------------------------- | ||
| 244 | +c zeta-direction flux differences | ||
| 245 | +c--------------------------------------------------------------------- | ||
| 246 | + do j=start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 247 | + eta = dble(j+cell_low(2,c)) * dnym1 | ||
| 248 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 249 | + xi = dble(i+cell_low(1,c)) * dnxm1 | ||
| 250 | + | ||
| 251 | + do k=-2*(1-start(3,c)), cell_size(3,c)+1-2*end(3,c) | ||
| 252 | + zeta = dble(k+cell_low(3,c)) * dnzm1 | ||
| 253 | + | ||
| 254 | + call exact_solution(xi, eta, zeta, dtemp) | ||
| 255 | + do m = 1, 5 | ||
| 256 | + ue(k,m) = dtemp(m) | ||
| 257 | + enddo | ||
| 258 | + | ||
| 259 | + dtpp = 1.0d0/dtemp(1) | ||
| 260 | + | ||
| 261 | + do m = 2, 5 | ||
| 262 | + buf(k,m) = dtpp * dtemp(m) | ||
| 263 | + enddo | ||
| 264 | + | ||
| 265 | + cuf(k) = buf(k,4) * buf(k,4) | ||
| 266 | + buf(k,1) = cuf(k) + buf(k,2) * buf(k,2) + | ||
| 267 | + > buf(k,3) * buf(k,3) | ||
| 268 | + q(k) = 0.5d0*(buf(k,2)*ue(k,2) + buf(k,3)*ue(k,3) + | ||
| 269 | + > buf(k,4)*ue(k,4)) | ||
| 270 | + enddo | ||
| 271 | + | ||
| 272 | + do k=start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 273 | + km1 = k-1 | ||
| 274 | + kp1 = k+1 | ||
| 275 | + | ||
| 276 | + forcing(1,i,j,k,c) = forcing(1,i,j,k,c) - | ||
| 277 | + > tz2*( ue(kp1,4)-ue(km1,4) )+ | ||
| 278 | + > dz1tz1*(ue(kp1,1)-2.0d0*ue(k,1)+ue(km1,1)) | ||
| 279 | + | ||
| 280 | + forcing(2,i,j,k,c) = forcing(2,i,j,k,c) - tz2 * ( | ||
| 281 | + > ue(kp1,2)*buf(kp1,4)-ue(km1,2)*buf(km1,4))+ | ||
| 282 | + > zzcon2*(buf(kp1,2)-2.0d0*buf(k,2)+buf(km1,2))+ | ||
| 283 | + > dz2tz1*( ue(kp1,2)-2.0d0* ue(k,2)+ ue(km1,2)) | ||
| 284 | + | ||
| 285 | + forcing(3,i,j,k,c) = forcing(3,i,j,k,c) - tz2 * ( | ||
| 286 | + > ue(kp1,3)*buf(kp1,4)-ue(km1,3)*buf(km1,4))+ | ||
| 287 | + > zzcon2*(buf(kp1,3)-2.0d0*buf(k,3)+buf(km1,3))+ | ||
| 288 | + > dz3tz1*(ue(kp1,3)-2.0d0*ue(k,3)+ue(km1,3)) | ||
| 289 | + | ||
| 290 | + forcing(4,i,j,k,c) = forcing(4,i,j,k,c) - tz2 * ( | ||
| 291 | + > (ue(kp1,4)*buf(kp1,4)+c2*(ue(kp1,5)-q(kp1)))- | ||
| 292 | + > (ue(km1,4)*buf(km1,4)+c2*(ue(km1,5)-q(km1))))+ | ||
| 293 | + > zzcon1*(buf(kp1,4)-2.0d0*buf(k,4)+buf(km1,4))+ | ||
| 294 | + > dz4tz1*( ue(kp1,4)-2.0d0*ue(k,4) +ue(km1,4)) | ||
| 295 | + | ||
| 296 | + forcing(5,i,j,k,c) = forcing(5,i,j,k,c) - tz2 * ( | ||
| 297 | + > buf(kp1,4)*(c1*ue(kp1,5)-c2*q(kp1))- | ||
| 298 | + > buf(km1,4)*(c1*ue(km1,5)-c2*q(km1)))+ | ||
| 299 | + > 0.5d0*zzcon3*(buf(kp1,1)-2.0d0*buf(k,1) | ||
| 300 | + > +buf(km1,1))+ | ||
| 301 | + > zzcon4*(cuf(kp1)-2.0d0*cuf(k)+cuf(km1))+ | ||
| 302 | + > zzcon5*(buf(kp1,5)-2.0d0*buf(k,5)+buf(km1,5))+ | ||
| 303 | + > dz5tz1*( ue(kp1,5)-2.0d0*ue(k,5)+ ue(km1,5)) | ||
| 304 | + enddo | ||
| 305 | + | ||
| 306 | +c--------------------------------------------------------------------- | ||
| 307 | +c Fourth-order dissipation | ||
| 308 | +c--------------------------------------------------------------------- | ||
| 309 | + if (start(3,c) .gt. 0) then | ||
| 310 | + do m = 1, 5 | ||
| 311 | + k = 1 | ||
| 312 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 313 | + > (5.0d0*ue(k,m) - 4.0d0*ue(k+1,m) +ue(k+2,m)) | ||
| 314 | + k = 2 | ||
| 315 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 316 | + > (-4.0d0*ue(k-1,m) + 6.0d0*ue(k,m) - | ||
| 317 | + > 4.0d0*ue(k+1,m) + ue(k+2,m)) | ||
| 318 | + enddo | ||
| 319 | + endif | ||
| 320 | + | ||
| 321 | + do k = start(3,c)*3, cell_size(3,c)-3*end(3,c)-1 | ||
| 322 | + do m = 1, 5 | ||
| 323 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp* | ||
| 324 | + > (ue(k-2,m) - 4.0d0*ue(k-1,m) + | ||
| 325 | + > 6.0d0*ue(k,m) - 4.0d0*ue(k+1,m) + ue(k+2,m)) | ||
| 326 | + enddo | ||
| 327 | + enddo | ||
| 328 | + | ||
| 329 | + if (end(3,c) .gt. 0) then | ||
| 330 | + do m = 1, 5 | ||
| 331 | + k = cell_size(3,c)-3 | ||
| 332 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 333 | + > (ue(k-2,m) - 4.0d0*ue(k-1,m) + | ||
| 334 | + > 6.0d0*ue(k,m) - 4.0d0*ue(k+1,m)) | ||
| 335 | + k = cell_size(3,c)-2 | ||
| 336 | + forcing(m,i,j,k,c) = forcing(m,i,j,k,c) - dssp * | ||
| 337 | + > (ue(k-2,m) - 4.0d0*ue(k-1,m) + 5.0d0*ue(k,m)) | ||
| 338 | + enddo | ||
| 339 | + endif | ||
| 340 | + | ||
| 341 | + enddo | ||
| 342 | + enddo | ||
| 343 | + | ||
| 344 | +c--------------------------------------------------------------------- | ||
| 345 | +c now change the sign of the forcing function, | ||
| 346 | +c--------------------------------------------------------------------- | ||
| 347 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 348 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 349 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 350 | + do m = 1, 5 | ||
| 351 | + forcing(m,i,j,k,c) = -1.d0 * forcing(m,i,j,k,c) | ||
| 352 | + enddo | ||
| 353 | + enddo | ||
| 354 | + enddo | ||
| 355 | + enddo | ||
| 356 | + | ||
| 357 | + enddo | ||
| 358 | + | ||
| 359 | + return | ||
| 360 | + end |
NPB3.3-MPI/BT/exact_solution.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/exact_solution.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine exact_solution(xi,eta,zeta,dtemp) | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | +c--------------------------------------------------------------------- | ||
| 10 | +c this function returns the exact solution at point xi, eta, zeta | ||
| 11 | +c--------------------------------------------------------------------- | ||
| 12 | + | ||
| 13 | + include 'header.h' | ||
| 14 | + | ||
| 15 | + double precision xi, eta, zeta, dtemp(5) | ||
| 16 | + integer m | ||
| 17 | + | ||
| 18 | + do m = 1, 5 | ||
| 19 | + dtemp(m) = ce(m,1) + | ||
| 20 | + > xi*(ce(m,2) + xi*(ce(m,5) + xi*(ce(m,8) + xi*ce(m,11)))) + | ||
| 21 | + > eta*(ce(m,3) + eta*(ce(m,6) + eta*(ce(m,9) + eta*ce(m,12))))+ | ||
| 22 | + > zeta*(ce(m,4) + zeta*(ce(m,7) + zeta*(ce(m,10) + | ||
| 23 | + > zeta*ce(m,13)))) | ||
| 24 | + enddo | ||
| 25 | + | ||
| 26 | + return | ||
| 27 | + end | ||
| 28 | + | ||
| 29 | + |
NPB3.3-MPI/BT/fortran_io.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/fortran_io.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine setup_btio | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | + include 'header.h' | ||
| 11 | + include 'mpinpb.h' | ||
| 12 | + | ||
| 13 | + character*(128) newfilenm | ||
| 14 | + integer m, ierr | ||
| 15 | + | ||
| 16 | + if (node.eq.root) record_length = 40/fortran_rec_sz | ||
| 17 | + call mpi_bcast(record_length, 1, MPI_INTEGER, | ||
| 18 | + > root, comm_setup, ierr) | ||
| 19 | + | ||
| 20 | + open (unit=99, file=filenm, | ||
| 21 | + $ form='unformatted', access='direct', | ||
| 22 | + $ recl=record_length) | ||
| 23 | + | ||
| 24 | + do m = 1, 5 | ||
| 25 | + xce_sub(m) = 0.d0 | ||
| 26 | + end do | ||
| 27 | + | ||
| 28 | + idump_sub = 0 | ||
| 29 | + | ||
| 30 | + return | ||
| 31 | + end | ||
| 32 | + | ||
| 33 | + | ||
| 34 | +c--------------------------------------------------------------------- | ||
| 35 | +c--------------------------------------------------------------------- | ||
| 36 | + | ||
| 37 | + subroutine output_timestep | ||
| 38 | + | ||
| 39 | +c--------------------------------------------------------------------- | ||
| 40 | +c--------------------------------------------------------------------- | ||
| 41 | + include 'header.h' | ||
| 42 | + include 'mpinpb.h' | ||
| 43 | + | ||
| 44 | + integer ix, jio, kio, cio | ||
| 45 | + | ||
| 46 | + do cio=1,ncells | ||
| 47 | + do kio=0, cell_size(3,cio)-1 | ||
| 48 | + do jio=0, cell_size(2,cio)-1 | ||
| 49 | + iseek=(cell_low(1,cio) + | ||
| 50 | + $ PROBLEM_SIZE*((cell_low(2,cio)+jio) + | ||
| 51 | + $ PROBLEM_SIZE*((cell_low(3,cio)+kio) + | ||
| 52 | + $ PROBLEM_SIZE*idump_sub))) | ||
| 53 | + | ||
| 54 | + do ix=0,cell_size(1,cio)-1 | ||
| 55 | + write(99, rec=iseek+ix+1) | ||
| 56 | + $ u(1,ix, jio,kio,cio), | ||
| 57 | + $ u(2,ix, jio,kio,cio), | ||
| 58 | + $ u(3,ix, jio,kio,cio), | ||
| 59 | + $ u(4,ix, jio,kio,cio), | ||
| 60 | + $ u(5,ix, jio,kio,cio) | ||
| 61 | + enddo | ||
| 62 | + enddo | ||
| 63 | + enddo | ||
| 64 | + enddo | ||
| 65 | + | ||
| 66 | + idump_sub = idump_sub + 1 | ||
| 67 | + if (rd_interval .gt. 0) then | ||
| 68 | + if (idump_sub .ge. rd_interval) then | ||
| 69 | + | ||
| 70 | + call acc_sub_norms(idump+1) | ||
| 71 | + | ||
| 72 | + idump_sub = 0 | ||
| 73 | + endif | ||
| 74 | + endif | ||
| 75 | + | ||
| 76 | + return | ||
| 77 | + end | ||
| 78 | + | ||
| 79 | +c--------------------------------------------------------------------- | ||
| 80 | +c--------------------------------------------------------------------- | ||
| 81 | + | ||
| 82 | + subroutine acc_sub_norms(idump_cur) | ||
| 83 | + | ||
| 84 | + include 'header.h' | ||
| 85 | + include 'mpinpb.h' | ||
| 86 | + | ||
| 87 | + integer idump_cur | ||
| 88 | + | ||
| 89 | + integer ix, jio, kio, cio, ii, m, ichunk | ||
| 90 | + double precision xce_single(5) | ||
| 91 | + | ||
| 92 | + ichunk = idump_cur - idump_sub + 1 | ||
| 93 | + do ii=0, idump_sub-1 | ||
| 94 | + do cio=1,ncells | ||
| 95 | + do kio=0, cell_size(3,cio)-1 | ||
| 96 | + do jio=0, cell_size(2,cio)-1 | ||
| 97 | + iseek=(cell_low(1,cio) + | ||
| 98 | + $ PROBLEM_SIZE*((cell_low(2,cio)+jio) + | ||
| 99 | + $ PROBLEM_SIZE*((cell_low(3,cio)+kio) + | ||
| 100 | + $ PROBLEM_SIZE*ii))) | ||
| 101 | + | ||
| 102 | + | ||
| 103 | + do ix=0,cell_size(1,cio)-1 | ||
| 104 | + read(99, rec=iseek+ix+1) | ||
| 105 | + $ u(1,ix, jio,kio,cio), | ||
| 106 | + $ u(2,ix, jio,kio,cio), | ||
| 107 | + $ u(3,ix, jio,kio,cio), | ||
| 108 | + $ u(4,ix, jio,kio,cio), | ||
| 109 | + $ u(5,ix, jio,kio,cio) | ||
| 110 | + enddo | ||
| 111 | + enddo | ||
| 112 | + enddo | ||
| 113 | + enddo | ||
| 114 | + | ||
| 115 | + if (node .eq. root) print *, 'Reading data set ', ii+ichunk | ||
| 116 | + | ||
| 117 | + call error_norm(xce_single) | ||
| 118 | + do m = 1, 5 | ||
| 119 | + xce_sub(m) = xce_sub(m) + xce_single(m) | ||
| 120 | + end do | ||
| 121 | + enddo | ||
| 122 | + | ||
| 123 | + return | ||
| 124 | + end | ||
| 125 | + | ||
| 126 | +c--------------------------------------------------------------------- | ||
| 127 | +c--------------------------------------------------------------------- | ||
| 128 | + | ||
| 129 | + subroutine btio_cleanup | ||
| 130 | + | ||
| 131 | +c--------------------------------------------------------------------- | ||
| 132 | +c--------------------------------------------------------------------- | ||
| 133 | + | ||
| 134 | + close(unit=99) | ||
| 135 | + | ||
| 136 | + return | ||
| 137 | + end | ||
| 138 | + | ||
| 139 | +c--------------------------------------------------------------------- | ||
| 140 | +c--------------------------------------------------------------------- | ||
| 141 | + | ||
| 142 | + subroutine accumulate_norms(xce_acc) | ||
| 143 | + | ||
| 144 | +c--------------------------------------------------------------------- | ||
| 145 | +c--------------------------------------------------------------------- | ||
| 146 | + include 'header.h' | ||
| 147 | + include 'mpinpb.h' | ||
| 148 | + | ||
| 149 | + double precision xce_acc(5) | ||
| 150 | + integer m | ||
| 151 | + | ||
| 152 | + if (rd_interval .gt. 0) goto 20 | ||
| 153 | + | ||
| 154 | + open (unit=99, file=filenm, | ||
| 155 | + $ form='unformatted', access='direct', | ||
| 156 | + $ recl=record_length) | ||
| 157 | + | ||
| 158 | +c clear the last time step | ||
| 159 | + | ||
| 160 | + call clear_timestep | ||
| 161 | + | ||
| 162 | +c read back the time steps and accumulate norms | ||
| 163 | + | ||
| 164 | + call acc_sub_norms(idump) | ||
| 165 | + | ||
| 166 | + close(unit=99) | ||
| 167 | + | ||
| 168 | + 20 continue | ||
| 169 | + do m = 1, 5 | ||
| 170 | + xce_acc(m) = xce_sub(m) / dble(idump) | ||
| 171 | + end do | ||
| 172 | + | ||
| 173 | + return | ||
| 174 | + end |
NPB3.3-MPI/BT/full_mpiio.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/full_mpiio.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine setup_btio | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | + include 'header.h' | ||
| 11 | + include 'mpinpb.h' | ||
| 12 | + | ||
| 13 | + integer ierr | ||
| 14 | + integer mstatus(MPI_STATUS_SIZE) | ||
| 15 | + integer sizes(4), starts(4), subsizes(4) | ||
| 16 | + integer cell_btype(maxcells), cell_ftype(maxcells) | ||
| 17 | + integer cell_blength(maxcells) | ||
| 18 | + integer info | ||
| 19 | + character*20 cb_nodes, cb_size | ||
| 20 | + integer c, m | ||
| 21 | + integer cell_disp(maxcells) | ||
| 22 | + | ||
| 23 | + call mpi_bcast(collbuf_nodes, 1, MPI_INTEGER, | ||
| 24 | + > root, comm_setup, ierr) | ||
| 25 | + | ||
| 26 | + call mpi_bcast(collbuf_size, 1, MPI_INTEGER, | ||
| 27 | + > root, comm_setup, ierr) | ||
| 28 | + | ||
| 29 | + if (collbuf_nodes .eq. 0) then | ||
| 30 | + info = MPI_INFO_NULL | ||
| 31 | + else | ||
| 32 | + write (cb_nodes,*) collbuf_nodes | ||
| 33 | + write (cb_size,*) collbuf_size | ||
| 34 | + call MPI_Info_create(info, ierr) | ||
| 35 | + call MPI_Info_set(info, 'cb_nodes', cb_nodes, ierr) | ||
| 36 | + call MPI_Info_set(info, 'cb_buffer_size', cb_size, ierr) | ||
| 37 | + call MPI_Info_set(info, 'collective_buffering', 'true', ierr) | ||
| 38 | + endif | ||
| 39 | + | ||
| 40 | + call MPI_Type_contiguous(5, MPI_DOUBLE_PRECISION, | ||
| 41 | + $ element, ierr) | ||
| 42 | + call MPI_Type_commit(element, ierr) | ||
| 43 | + call MPI_Type_extent(element, eltext, ierr) | ||
| 44 | + | ||
| 45 | + do c = 1, ncells | ||
| 46 | +c | ||
| 47 | +c Outer array dimensions ar same for every cell | ||
| 48 | +c | ||
| 49 | + sizes(1) = IMAX+4 | ||
| 50 | + sizes(2) = JMAX+4 | ||
| 51 | + sizes(3) = KMAX+4 | ||
| 52 | +c | ||
| 53 | +c 4th dimension is cell number, total of maxcells cells | ||
| 54 | +c | ||
| 55 | + sizes(4) = maxcells | ||
| 56 | +c | ||
| 57 | +c Internal dimensions of cells can differ slightly between cells | ||
| 58 | +c | ||
| 59 | + subsizes(1) = cell_size(1, c) | ||
| 60 | + subsizes(2) = cell_size(2, c) | ||
| 61 | + subsizes(3) = cell_size(3, c) | ||
| 62 | +c | ||
| 63 | +c Cell is 4th dimension, 1 cell per cell type to handle varying | ||
| 64 | +c cell sub-array sizes | ||
| 65 | +c | ||
| 66 | + subsizes(4) = 1 | ||
| 67 | + | ||
| 68 | +c | ||
| 69 | +c type constructors use 0-based start addresses | ||
| 70 | +c | ||
| 71 | + starts(1) = 2 | ||
| 72 | + starts(2) = 2 | ||
| 73 | + starts(3) = 2 | ||
| 74 | + starts(4) = c-1 | ||
| 75 | + | ||
| 76 | +c | ||
| 77 | +c Create buftype for a cell | ||
| 78 | +c | ||
| 79 | + call MPI_Type_create_subarray(4, sizes, subsizes, | ||
| 80 | + $ starts, MPI_ORDER_FORTRAN, element, | ||
| 81 | + $ cell_btype(c), ierr) | ||
| 82 | +c | ||
| 83 | +c block length and displacement for joining cells - | ||
| 84 | +c 1 cell buftype per block, cell buftypes have own displacment | ||
| 85 | +c generated from cell number (4th array dimension) | ||
| 86 | +c | ||
| 87 | + cell_blength(c) = 1 | ||
| 88 | + cell_disp(c) = 0 | ||
| 89 | + | ||
| 90 | + enddo | ||
| 91 | +c | ||
| 92 | +c Create combined buftype for all cells | ||
| 93 | +c | ||
| 94 | + call MPI_Type_struct(ncells, cell_blength, cell_disp, | ||
| 95 | + $ cell_btype, combined_btype, ierr) | ||
| 96 | + call MPI_Type_commit(combined_btype, ierr) | ||
| 97 | + | ||
| 98 | + do c = 1, ncells | ||
| 99 | +c | ||
| 100 | +c Entire array size | ||
| 101 | +c | ||
| 102 | + sizes(1) = PROBLEM_SIZE | ||
| 103 | + sizes(2) = PROBLEM_SIZE | ||
| 104 | + sizes(3) = PROBLEM_SIZE | ||
| 105 | + | ||
| 106 | +c | ||
| 107 | +c Size of c'th cell | ||
| 108 | +c | ||
| 109 | + subsizes(1) = cell_size(1, c) | ||
| 110 | + subsizes(2) = cell_size(2, c) | ||
| 111 | + subsizes(3) = cell_size(3, c) | ||
| 112 | + | ||
| 113 | +c | ||
| 114 | +c Starting point in full array of c'th cell | ||
| 115 | +c | ||
| 116 | + starts(1) = cell_low(1,c) | ||
| 117 | + starts(2) = cell_low(2,c) | ||
| 118 | + starts(3) = cell_low(3,c) | ||
| 119 | + | ||
| 120 | + call MPI_Type_create_subarray(3, sizes, subsizes, | ||
| 121 | + $ starts, MPI_ORDER_FORTRAN, | ||
| 122 | + $ element, cell_ftype(c), ierr) | ||
| 123 | + cell_blength(c) = 1 | ||
| 124 | + cell_disp(c) = 0 | ||
| 125 | + enddo | ||
| 126 | + | ||
| 127 | + call MPI_Type_struct(ncells, cell_blength, cell_disp, | ||
| 128 | + $ cell_ftype, combined_ftype, ierr) | ||
| 129 | + call MPI_Type_commit(combined_ftype, ierr) | ||
| 130 | + | ||
| 131 | + iseek=0 | ||
| 132 | + if (node .eq. root) then | ||
| 133 | + call MPI_File_delete(filenm, MPI_INFO_NULL, ierr) | ||
| 134 | + endif | ||
| 135 | + | ||
| 136 | + | ||
| 137 | + call MPI_Barrier(comm_solve, ierr) | ||
| 138 | + | ||
| 139 | + call MPI_File_open(comm_solve, | ||
| 140 | + $ filenm, | ||
| 141 | + $ MPI_MODE_RDWR+MPI_MODE_CREATE, | ||
| 142 | + $ MPI_INFO_NULL, fp, ierr) | ||
| 143 | + | ||
| 144 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 145 | + print *, 'Error opening file' | ||
| 146 | + stop | ||
| 147 | + endif | ||
| 148 | + | ||
| 149 | + call MPI_File_set_view(fp, iseek, element, | ||
| 150 | + $ combined_ftype, 'native', info, ierr) | ||
| 151 | + | ||
| 152 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 153 | + print *, 'Error setting file view' | ||
| 154 | + stop | ||
| 155 | + endif | ||
| 156 | + | ||
| 157 | + do m = 1, 5 | ||
| 158 | + xce_sub(m) = 0.d0 | ||
| 159 | + end do | ||
| 160 | + | ||
| 161 | + idump_sub = 0 | ||
| 162 | + | ||
| 163 | + | ||
| 164 | + return | ||
| 165 | + end | ||
| 166 | + | ||
| 167 | +c--------------------------------------------------------------------- | ||
| 168 | +c--------------------------------------------------------------------- | ||
| 169 | + | ||
| 170 | + subroutine output_timestep | ||
| 171 | + | ||
| 172 | +c--------------------------------------------------------------------- | ||
| 173 | +c--------------------------------------------------------------------- | ||
| 174 | + include 'header.h' | ||
| 175 | + include 'mpinpb.h' | ||
| 176 | + | ||
| 177 | + integer mstatus(MPI_STATUS_SIZE) | ||
| 178 | + integer ierr | ||
| 179 | + | ||
| 180 | + call MPI_File_write_at_all(fp, iseek, u, | ||
| 181 | + $ 1, combined_btype, mstatus, ierr) | ||
| 182 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 183 | + print *, 'Error writing to file' | ||
| 184 | + stop | ||
| 185 | + endif | ||
| 186 | + | ||
| 187 | + call MPI_Type_size(combined_btype, iosize, ierr) | ||
| 188 | + iseek = iseek + iosize/eltext | ||
| 189 | + | ||
| 190 | + idump_sub = idump_sub + 1 | ||
| 191 | + if (rd_interval .gt. 0) then | ||
| 192 | + if (idump_sub .ge. rd_interval) then | ||
| 193 | + | ||
| 194 | + iseek = 0 | ||
| 195 | + call acc_sub_norms(idump+1) | ||
| 196 | + | ||
| 197 | + iseek = 0 | ||
| 198 | + idump_sub = 0 | ||
| 199 | + endif | ||
| 200 | + endif | ||
| 201 | + | ||
| 202 | + return | ||
| 203 | + end | ||
| 204 | + | ||
| 205 | +c--------------------------------------------------------------------- | ||
| 206 | +c--------------------------------------------------------------------- | ||
| 207 | + | ||
| 208 | + subroutine acc_sub_norms(idump_cur) | ||
| 209 | + | ||
| 210 | + include 'header.h' | ||
| 211 | + include 'mpinpb.h' | ||
| 212 | + | ||
| 213 | + integer idump_cur | ||
| 214 | + | ||
| 215 | + integer ii, m, ichunk | ||
| 216 | + integer ierr | ||
| 217 | + integer mstatus(MPI_STATUS_SIZE) | ||
| 218 | + double precision xce_single(5) | ||
| 219 | + | ||
| 220 | + ichunk = idump_cur - idump_sub + 1 | ||
| 221 | + do ii=0, idump_sub-1 | ||
| 222 | + | ||
| 223 | + call MPI_File_read_at_all(fp, iseek, u, | ||
| 224 | + $ 1, combined_btype, mstatus, ierr) | ||
| 225 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 226 | + print *, 'Error reading back file' | ||
| 227 | + call MPI_File_close(fp, ierr) | ||
| 228 | + stop | ||
| 229 | + endif | ||
| 230 | + | ||
| 231 | + call MPI_Type_size(combined_btype, iosize, ierr) | ||
| 232 | + iseek = iseek + iosize/eltext | ||
| 233 | + | ||
| 234 | + if (node .eq. root) print *, 'Reading data set ', ii+ichunk | ||
| 235 | + | ||
| 236 | + call error_norm(xce_single) | ||
| 237 | + do m = 1, 5 | ||
| 238 | + xce_sub(m) = xce_sub(m) + xce_single(m) | ||
| 239 | + end do | ||
| 240 | + enddo | ||
| 241 | + | ||
| 242 | + return | ||
| 243 | + end | ||
| 244 | + | ||
| 245 | +c--------------------------------------------------------------------- | ||
| 246 | +c--------------------------------------------------------------------- | ||
| 247 | + | ||
| 248 | + subroutine btio_cleanup | ||
| 249 | + | ||
| 250 | +c--------------------------------------------------------------------- | ||
| 251 | +c--------------------------------------------------------------------- | ||
| 252 | + include 'header.h' | ||
| 253 | + include 'mpinpb.h' | ||
| 254 | + | ||
| 255 | + integer ierr | ||
| 256 | + | ||
| 257 | + call MPI_File_close(fp, ierr) | ||
| 258 | + | ||
| 259 | + return | ||
| 260 | + end | ||
| 261 | + | ||
| 262 | +c--------------------------------------------------------------------- | ||
| 263 | +c--------------------------------------------------------------------- | ||
| 264 | + | ||
| 265 | + | ||
| 266 | + subroutine accumulate_norms(xce_acc) | ||
| 267 | + | ||
| 268 | +c--------------------------------------------------------------------- | ||
| 269 | +c--------------------------------------------------------------------- | ||
| 270 | + | ||
| 271 | + include 'header.h' | ||
| 272 | + include 'mpinpb.h' | ||
| 273 | + | ||
| 274 | + double precision xce_acc(5) | ||
| 275 | + integer m, ierr | ||
| 276 | + | ||
| 277 | + if (rd_interval .gt. 0) goto 20 | ||
| 278 | + | ||
| 279 | + call MPI_File_open(comm_solve, | ||
| 280 | + $ filenm, | ||
| 281 | + $ MPI_MODE_RDONLY, | ||
| 282 | + $ MPI_INFO_NULL, | ||
| 283 | + $ fp, | ||
| 284 | + $ ierr) | ||
| 285 | + | ||
| 286 | + iseek = 0 | ||
| 287 | + call MPI_File_set_view(fp, iseek, element, combined_ftype, | ||
| 288 | + $ 'native', MPI_INFO_NULL, ierr) | ||
| 289 | + | ||
| 290 | +c clear the last time step | ||
| 291 | + | ||
| 292 | + call clear_timestep | ||
| 293 | + | ||
| 294 | +c read back the time steps and accumulate norms | ||
| 295 | + | ||
| 296 | + call acc_sub_norms(idump) | ||
| 297 | + | ||
| 298 | + call MPI_File_close(fp, ierr) | ||
| 299 | + | ||
| 300 | + 20 continue | ||
| 301 | + do m = 1, 5 | ||
| 302 | + xce_acc(m) = xce_sub(m) / dble(idump) | ||
| 303 | + end do | ||
| 304 | + | ||
| 305 | + return | ||
| 306 | + end | ||
| 307 | + |
NPB3.3-MPI/BT/header.h
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/header.h | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c | ||
| 4 | +c header.h | ||
| 5 | +c | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | + implicit none | ||
| 10 | + | ||
| 11 | +c--------------------------------------------------------------------- | ||
| 12 | +c The following include file is generated automatically by the | ||
| 13 | +c "setparams" utility. It defines | ||
| 14 | +c maxcells: the square root of the maximum number of processors | ||
| 15 | +c problem_size: 12, 64, 102, 162 (for class T, A, B, C) | ||
| 16 | +c dt_default: default time step for this problem size if no | ||
| 17 | +c config file | ||
| 18 | +c niter_default: default number of iterations for this problem size | ||
| 19 | +c--------------------------------------------------------------------- | ||
| 20 | + | ||
| 21 | + include 'npbparams.h' | ||
| 22 | + | ||
| 23 | + integer aa, bb, cc, BLOCK_SIZE | ||
| 24 | + parameter (aa=1, bb=2, cc=3, BLOCK_SIZE=5) | ||
| 25 | + | ||
| 26 | + integer ncells, grid_points(3) | ||
| 27 | + double precision elapsed_time | ||
| 28 | + common /global/ elapsed_time, ncells, grid_points | ||
| 29 | + | ||
| 30 | + double precision tx1, tx2, tx3, ty1, ty2, ty3, tz1, tz2, tz3, | ||
| 31 | + > dx1, dx2, dx3, dx4, dx5, dy1, dy2, dy3, dy4, | ||
| 32 | + > dy5, dz1, dz2, dz3, dz4, dz5, dssp, dt, | ||
| 33 | + > ce(5,13), dxmax, dymax, dzmax, xxcon1, xxcon2, | ||
| 34 | + > xxcon3, xxcon4, xxcon5, dx1tx1, dx2tx1, dx3tx1, | ||
| 35 | + > dx4tx1, dx5tx1, yycon1, yycon2, yycon3, yycon4, | ||
| 36 | + > yycon5, dy1ty1, dy2ty1, dy3ty1, dy4ty1, dy5ty1, | ||
| 37 | + > zzcon1, zzcon2, zzcon3, zzcon4, zzcon5, dz1tz1, | ||
| 38 | + > dz2tz1, dz3tz1, dz4tz1, dz5tz1, dnxm1, dnym1, | ||
| 39 | + > dnzm1, c1c2, c1c5, c3c4, c1345, conz1, c1, c2, | ||
| 40 | + > c3, c4, c5, c4dssp, c5dssp, dtdssp, dttx1, bt, | ||
| 41 | + > dttx2, dtty1, dtty2, dttz1, dttz2, c2dttx1, | ||
| 42 | + > c2dtty1, c2dttz1, comz1, comz4, comz5, comz6, | ||
| 43 | + > c3c4tx3, c3c4ty3, c3c4tz3, c2iv, con43, con16 | ||
| 44 | + | ||
| 45 | + common /constants/ tx1, tx2, tx3, ty1, ty2, ty3, tz1, tz2, tz3, | ||
| 46 | + > dx1, dx2, dx3, dx4, dx5, dy1, dy2, dy3, dy4, | ||
| 47 | + > dy5, dz1, dz2, dz3, dz4, dz5, dssp, dt, | ||
| 48 | + > ce, dxmax, dymax, dzmax, xxcon1, xxcon2, | ||
| 49 | + > xxcon3, xxcon4, xxcon5, dx1tx1, dx2tx1, dx3tx1, | ||
| 50 | + > dx4tx1, dx5tx1, yycon1, yycon2, yycon3, yycon4, | ||
| 51 | + > yycon5, dy1ty1, dy2ty1, dy3ty1, dy4ty1, dy5ty1, | ||
| 52 | + > zzcon1, zzcon2, zzcon3, zzcon4, zzcon5, dz1tz1, | ||
| 53 | + > dz2tz1, dz3tz1, dz4tz1, dz5tz1, dnxm1, dnym1, | ||
| 54 | + > dnzm1, c1c2, c1c5, c3c4, c1345, conz1, c1, c2, | ||
| 55 | + > c3, c4, c5, c4dssp, c5dssp, dtdssp, dttx1, bt, | ||
| 56 | + > dttx2, dtty1, dtty2, dttz1, dttz2, c2dttx1, | ||
| 57 | + > c2dtty1, c2dttz1, comz1, comz4, comz5, comz6, | ||
| 58 | + > c3c4tx3, c3c4ty3, c3c4tz3, c2iv, con43, con16 | ||
| 59 | + | ||
| 60 | + integer EAST, WEST, NORTH, SOUTH, | ||
| 61 | + > BOTTOM, TOP | ||
| 62 | + | ||
| 63 | + parameter (EAST=2000, WEST=3000, NORTH=4000, SOUTH=5000, | ||
| 64 | + > BOTTOM=6000, TOP=7000) | ||
| 65 | + | ||
| 66 | + integer cell_coord (3,maxcells), cell_low (3,maxcells), | ||
| 67 | + > cell_high (3,maxcells), cell_size(3,maxcells), | ||
| 68 | + > predecessor(3), slice (3,maxcells), | ||
| 69 | + > grid_size (3), successor(3) , | ||
| 70 | + > start (3,maxcells), end (3,maxcells) | ||
| 71 | + common /partition/ cell_coord, cell_low, cell_high, cell_size, | ||
| 72 | + > grid_size, successor, predecessor, slice, | ||
| 73 | + > start, end | ||
| 74 | + | ||
| 75 | + integer IMAX, JMAX, KMAX, MAX_CELL_DIM, BUF_SIZE | ||
| 76 | + | ||
| 77 | + parameter (MAX_CELL_DIM = (problem_size/maxcells)+1) | ||
| 78 | + | ||
| 79 | + parameter (IMAX=MAX_CELL_DIM,JMAX=MAX_CELL_DIM,KMAX=MAX_CELL_DIM) | ||
| 80 | + | ||
| 81 | + parameter (BUF_SIZE=MAX_CELL_DIM*MAX_CELL_DIM*(maxcells-1)*60+1) | ||
| 82 | + | ||
| 83 | + double precision | ||
| 84 | + > us ( -1:IMAX, -1:JMAX, -1:KMAX, maxcells), | ||
| 85 | + > vs ( -1:IMAX, -1:JMAX, -1:KMAX, maxcells), | ||
| 86 | + > ws ( -1:IMAX, -1:JMAX, -1:KMAX, maxcells), | ||
| 87 | + > qs ( -1:IMAX, -1:JMAX, -1:KMAX, maxcells), | ||
| 88 | + > rho_i ( -1:IMAX, -1:JMAX, -1:KMAX, maxcells), | ||
| 89 | + > square ( -1:IMAX, -1:JMAX, -1:KMAX, maxcells), | ||
| 90 | + > forcing (5, 0:IMAX-1, 0:JMAX-1, 0:KMAX-1, maxcells), | ||
| 91 | + > u (5, -2:IMAX+1,-2:JMAX+1,-2:KMAX+1, maxcells), | ||
| 92 | + > rhs (5, -1:IMAX-1,-1:JMAX-1,-1:KMAX-1, maxcells), | ||
| 93 | + > lhsc (5,5,-1:IMAX-1,-1:JMAX-1,-1:KMAX-1, maxcells), | ||
| 94 | + > backsub_info (5, 0:MAX_CELL_DIM, 0:MAX_CELL_DIM, maxcells), | ||
| 95 | + > in_buffer(BUF_SIZE), out_buffer(BUF_SIZE) | ||
| 96 | + common /fields/ u, us, vs, ws, qs, rho_i, square, | ||
| 97 | + > rhs, forcing, lhsc, in_buffer, out_buffer, | ||
| 98 | + > backsub_info | ||
| 99 | + | ||
| 100 | + double precision cv(-2:MAX_CELL_DIM+1), rhon(-2:MAX_CELL_DIM+1), | ||
| 101 | + > rhos(-2:MAX_CELL_DIM+1), rhoq(-2:MAX_CELL_DIM+1), | ||
| 102 | + > cuf(-2:MAX_CELL_DIM+1), q(-2:MAX_CELL_DIM+1), | ||
| 103 | + > ue(-2:MAX_CELL_DIM+1,5), buf(-2:MAX_CELL_DIM+1,5) | ||
| 104 | + common /work_1d/ cv, rhon, rhos, rhoq, cuf, q, ue, buf | ||
| 105 | + | ||
| 106 | + integer west_size, east_size, bottom_size, top_size, | ||
| 107 | + > north_size, south_size, start_send_west, | ||
| 108 | + > start_send_east, start_send_south, start_send_north, | ||
| 109 | + > start_send_bottom, start_send_top, start_recv_west, | ||
| 110 | + > start_recv_east, start_recv_south, start_recv_north, | ||
| 111 | + > start_recv_bottom, start_recv_top | ||
| 112 | + common /box/ west_size, east_size, bottom_size, | ||
| 113 | + > top_size, north_size, south_size, | ||
| 114 | + > start_send_west, start_send_east, start_send_south, | ||
| 115 | + > start_send_north, start_send_bottom, start_send_top, | ||
| 116 | + > start_recv_west, start_recv_east, start_recv_south, | ||
| 117 | + > start_recv_north, start_recv_bottom, start_recv_top | ||
| 118 | + | ||
| 119 | + double precision tmp_block(5,5), b_inverse(5,5), tmp_vec(5) | ||
| 120 | + common /work_solve/ tmp_block, b_inverse, tmp_vec | ||
| 121 | + | ||
| 122 | +c | ||
| 123 | +c These are used by btio | ||
| 124 | +c | ||
| 125 | + integer collbuf_nodes, collbuf_size, iosize, eltext, | ||
| 126 | + $ combined_btype, fp, idump, record_length, element, | ||
| 127 | + $ combined_ftype, idump_sub, rd_interval | ||
| 128 | + common /btio/ collbuf_nodes, collbuf_size, iosize, eltext, | ||
| 129 | + $ combined_btype, fp, idump, record_length, | ||
| 130 | + $ idump_sub, rd_interval | ||
| 131 | + double precision sum(niter_default), xce_sub(5) | ||
| 132 | + common /btio/ sum, xce_sub | ||
| 133 | + integer*8 iseek | ||
| 134 | + common /btio/ iseek, element, combined_ftype | ||
| 135 | + | ||
| 136 | + | ||
| 137 | + integer t_total, t_io, t_rhs, t_xsolve, t_ysolve, t_zsolve, | ||
| 138 | + > t_bpack, t_exch, t_xcomm, t_ycomm, t_zcomm, t_last | ||
| 139 | + parameter (t_total=1, t_io=2, t_rhs=3, t_xsolve=4, t_ysolve=5, | ||
| 140 | + > t_zsolve=6, t_bpack=7, t_exch=8, t_xcomm=9, | ||
| 141 | + > t_ycomm=10, t_zcomm=11, t_last=11) | ||
| 142 | + logical timeron | ||
| 143 | + common /tflags/ timeron | ||
| 144 | + | ||
| 145 | + | ||
| 146 | + |
NPB3.3-MPI/BT/initialize.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/initialize.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine initialize | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | +c--------------------------------------------------------------------- | ||
| 10 | +c This subroutine initializes the field variable u using | ||
| 11 | +c tri-linear transfinite interpolation of the boundary values | ||
| 12 | +c--------------------------------------------------------------------- | ||
| 13 | + | ||
| 14 | + include 'header.h' | ||
| 15 | + | ||
| 16 | + integer c, i, j, k, m, ii, jj, kk, ix, iy, iz | ||
| 17 | + double precision xi, eta, zeta, Pface(5,3,2), Pxi, Peta, | ||
| 18 | + > Pzeta, temp(5) | ||
| 19 | + | ||
| 20 | +c--------------------------------------------------------------------- | ||
| 21 | +c Later (in compute_rhs) we compute 1/u for every element. A few of | ||
| 22 | +c the corner elements are not used, but it convenient (and faster) | ||
| 23 | +c to compute the whole thing with a simple loop. Make sure those | ||
| 24 | +c values are nonzero by initializing the whole thing here. | ||
| 25 | +c--------------------------------------------------------------------- | ||
| 26 | + do c = 1, ncells | ||
| 27 | + do kk = -1, KMAX | ||
| 28 | + do jj = -1, JMAX | ||
| 29 | + do ii = -1, IMAX | ||
| 30 | + do m = 1, 5 | ||
| 31 | + u(m, ii, jj, kk, c) = 1.0 | ||
| 32 | + end do | ||
| 33 | + end do | ||
| 34 | + end do | ||
| 35 | + end do | ||
| 36 | + end do | ||
| 37 | +c--------------------------------------------------------------------- | ||
| 38 | + | ||
| 39 | + | ||
| 40 | + | ||
| 41 | +c--------------------------------------------------------------------- | ||
| 42 | +c first store the "interpolated" values everywhere on the grid | ||
| 43 | +c--------------------------------------------------------------------- | ||
| 44 | + do c=1, ncells | ||
| 45 | + kk = 0 | ||
| 46 | + do k = cell_low(3,c), cell_high(3,c) | ||
| 47 | + zeta = dble(k) * dnzm1 | ||
| 48 | + jj = 0 | ||
| 49 | + do j = cell_low(2,c), cell_high(2,c) | ||
| 50 | + eta = dble(j) * dnym1 | ||
| 51 | + ii = 0 | ||
| 52 | + do i = cell_low(1,c), cell_high(1,c) | ||
| 53 | + xi = dble(i) * dnxm1 | ||
| 54 | + | ||
| 55 | + do ix = 1, 2 | ||
| 56 | + call exact_solution(dble(ix-1), eta, zeta, | ||
| 57 | + > Pface(1,1,ix)) | ||
| 58 | + enddo | ||
| 59 | + | ||
| 60 | + do iy = 1, 2 | ||
| 61 | + call exact_solution(xi, dble(iy-1) , zeta, | ||
| 62 | + > Pface(1,2,iy)) | ||
| 63 | + enddo | ||
| 64 | + | ||
| 65 | + do iz = 1, 2 | ||
| 66 | + call exact_solution(xi, eta, dble(iz-1), | ||
| 67 | + > Pface(1,3,iz)) | ||
| 68 | + enddo | ||
| 69 | + | ||
| 70 | + do m = 1, 5 | ||
| 71 | + Pxi = xi * Pface(m,1,2) + | ||
| 72 | + > (1.0d0-xi) * Pface(m,1,1) | ||
| 73 | + Peta = eta * Pface(m,2,2) + | ||
| 74 | + > (1.0d0-eta) * Pface(m,2,1) | ||
| 75 | + Pzeta = zeta * Pface(m,3,2) + | ||
| 76 | + > (1.0d0-zeta) * Pface(m,3,1) | ||
| 77 | + | ||
| 78 | + u(m,ii,jj,kk,c) = Pxi + Peta + Pzeta - | ||
| 79 | + > Pxi*Peta - Pxi*Pzeta - Peta*Pzeta + | ||
| 80 | + > Pxi*Peta*Pzeta | ||
| 81 | + | ||
| 82 | + enddo | ||
| 83 | + ii = ii + 1 | ||
| 84 | + enddo | ||
| 85 | + jj = jj + 1 | ||
| 86 | + enddo | ||
| 87 | + kk = kk+1 | ||
| 88 | + enddo | ||
| 89 | + enddo | ||
| 90 | + | ||
| 91 | +c--------------------------------------------------------------------- | ||
| 92 | +c now store the exact values on the boundaries | ||
| 93 | +c--------------------------------------------------------------------- | ||
| 94 | + | ||
| 95 | +c--------------------------------------------------------------------- | ||
| 96 | +c west face | ||
| 97 | +c--------------------------------------------------------------------- | ||
| 98 | + c = slice(1,1) | ||
| 99 | + ii = 0 | ||
| 100 | + xi = 0.0d0 | ||
| 101 | + kk = 0 | ||
| 102 | + do k = cell_low(3,c), cell_high(3,c) | ||
| 103 | + zeta = dble(k) * dnzm1 | ||
| 104 | + jj = 0 | ||
| 105 | + do j = cell_low(2,c), cell_high(2,c) | ||
| 106 | + eta = dble(j) * dnym1 | ||
| 107 | + call exact_solution(xi, eta, zeta, temp) | ||
| 108 | + do m = 1, 5 | ||
| 109 | + u(m,ii,jj,kk,c) = temp(m) | ||
| 110 | + enddo | ||
| 111 | + jj = jj + 1 | ||
| 112 | + enddo | ||
| 113 | + kk = kk + 1 | ||
| 114 | + enddo | ||
| 115 | + | ||
| 116 | +c--------------------------------------------------------------------- | ||
| 117 | +c east face | ||
| 118 | +c--------------------------------------------------------------------- | ||
| 119 | + c = slice(1,ncells) | ||
| 120 | + ii = cell_size(1,c)-1 | ||
| 121 | + xi = 1.0d0 | ||
| 122 | + kk = 0 | ||
| 123 | + do k = cell_low(3,c), cell_high(3,c) | ||
| 124 | + zeta = dble(k) * dnzm1 | ||
| 125 | + jj = 0 | ||
| 126 | + do j = cell_low(2,c), cell_high(2,c) | ||
| 127 | + eta = dble(j) * dnym1 | ||
| 128 | + call exact_solution(xi, eta, zeta, temp) | ||
| 129 | + do m = 1, 5 | ||
| 130 | + u(m,ii,jj,kk,c) = temp(m) | ||
| 131 | + enddo | ||
| 132 | + jj = jj + 1 | ||
| 133 | + enddo | ||
| 134 | + kk = kk + 1 | ||
| 135 | + enddo | ||
| 136 | + | ||
| 137 | +c--------------------------------------------------------------------- | ||
| 138 | +c south face | ||
| 139 | +c--------------------------------------------------------------------- | ||
| 140 | + c = slice(2,1) | ||
| 141 | + jj = 0 | ||
| 142 | + eta = 0.0d0 | ||
| 143 | + kk = 0 | ||
| 144 | + do k = cell_low(3,c), cell_high(3,c) | ||
| 145 | + zeta = dble(k) * dnzm1 | ||
| 146 | + ii = 0 | ||
| 147 | + do i = cell_low(1,c), cell_high(1,c) | ||
| 148 | + xi = dble(i) * dnxm1 | ||
| 149 | + call exact_solution(xi, eta, zeta, temp) | ||
| 150 | + do m = 1, 5 | ||
| 151 | + u(m,ii,jj,kk,c) = temp(m) | ||
| 152 | + enddo | ||
| 153 | + ii = ii + 1 | ||
| 154 | + enddo | ||
| 155 | + kk = kk + 1 | ||
| 156 | + enddo | ||
| 157 | + | ||
| 158 | + | ||
| 159 | +c--------------------------------------------------------------------- | ||
| 160 | +c north face | ||
| 161 | +c--------------------------------------------------------------------- | ||
| 162 | + c = slice(2,ncells) | ||
| 163 | + jj = cell_size(2,c)-1 | ||
| 164 | + eta = 1.0d0 | ||
| 165 | + kk = 0 | ||
| 166 | + do k = cell_low(3,c), cell_high(3,c) | ||
| 167 | + zeta = dble(k) * dnzm1 | ||
| 168 | + ii = 0 | ||
| 169 | + do i = cell_low(1,c), cell_high(1,c) | ||
| 170 | + xi = dble(i) * dnxm1 | ||
| 171 | + call exact_solution(xi, eta, zeta, temp) | ||
| 172 | + do m = 1, 5 | ||
| 173 | + u(m,ii,jj,kk,c) = temp(m) | ||
| 174 | + enddo | ||
| 175 | + ii = ii + 1 | ||
| 176 | + enddo | ||
| 177 | + kk = kk + 1 | ||
| 178 | + enddo | ||
| 179 | + | ||
| 180 | +c--------------------------------------------------------------------- | ||
| 181 | +c bottom face | ||
| 182 | +c--------------------------------------------------------------------- | ||
| 183 | + c = slice(3,1) | ||
| 184 | + kk = 0 | ||
| 185 | + zeta = 0.0d0 | ||
| 186 | + jj = 0 | ||
| 187 | + do j = cell_low(2,c), cell_high(2,c) | ||
| 188 | + eta = dble(j) * dnym1 | ||
| 189 | + ii = 0 | ||
| 190 | + do i =cell_low(1,c), cell_high(1,c) | ||
| 191 | + xi = dble(i) *dnxm1 | ||
| 192 | + call exact_solution(xi, eta, zeta, temp) | ||
| 193 | + do m = 1, 5 | ||
| 194 | + u(m,ii,jj,kk,c) = temp(m) | ||
| 195 | + enddo | ||
| 196 | + ii = ii + 1 | ||
| 197 | + enddo | ||
| 198 | + jj = jj + 1 | ||
| 199 | + enddo | ||
| 200 | + | ||
| 201 | +c--------------------------------------------------------------------- | ||
| 202 | +c top face | ||
| 203 | +c--------------------------------------------------------------------- | ||
| 204 | + c = slice(3,ncells) | ||
| 205 | + kk = cell_size(3,c)-1 | ||
| 206 | + zeta = 1.0d0 | ||
| 207 | + jj = 0 | ||
| 208 | + do j = cell_low(2,c), cell_high(2,c) | ||
| 209 | + eta = dble(j) * dnym1 | ||
| 210 | + ii = 0 | ||
| 211 | + do i =cell_low(1,c), cell_high(1,c) | ||
| 212 | + xi = dble(i) * dnxm1 | ||
| 213 | + call exact_solution(xi, eta, zeta, temp) | ||
| 214 | + do m = 1, 5 | ||
| 215 | + u(m,ii,jj,kk,c) = temp(m) | ||
| 216 | + enddo | ||
| 217 | + ii = ii + 1 | ||
| 218 | + enddo | ||
| 219 | + jj = jj + 1 | ||
| 220 | + enddo | ||
| 221 | + | ||
| 222 | + return | ||
| 223 | + end | ||
| 224 | + | ||
| 225 | + | ||
| 226 | +c--------------------------------------------------------------------- | ||
| 227 | +c--------------------------------------------------------------------- | ||
| 228 | + | ||
| 229 | + subroutine lhsinit | ||
| 230 | + | ||
| 231 | +c--------------------------------------------------------------------- | ||
| 232 | +c--------------------------------------------------------------------- | ||
| 233 | + | ||
| 234 | + include 'header.h' | ||
| 235 | + | ||
| 236 | + integer i, j, k, d, c, m, n | ||
| 237 | + | ||
| 238 | +c--------------------------------------------------------------------- | ||
| 239 | +c loop over all cells | ||
| 240 | +c--------------------------------------------------------------------- | ||
| 241 | + do c = 1, ncells | ||
| 242 | + | ||
| 243 | +c--------------------------------------------------------------------- | ||
| 244 | +c first, initialize the start and end arrays | ||
| 245 | +c--------------------------------------------------------------------- | ||
| 246 | + do d = 1, 3 | ||
| 247 | + if (cell_coord(d,c) .eq. 1) then | ||
| 248 | + start(d,c) = 1 | ||
| 249 | + else | ||
| 250 | + start(d,c) = 0 | ||
| 251 | + endif | ||
| 252 | + if (cell_coord(d,c) .eq. ncells) then | ||
| 253 | + end(d,c) = 1 | ||
| 254 | + else | ||
| 255 | + end(d,c) = 0 | ||
| 256 | + endif | ||
| 257 | + enddo | ||
| 258 | + | ||
| 259 | +c--------------------------------------------------------------------- | ||
| 260 | +c zero the whole left hand side for starters | ||
| 261 | +c--------------------------------------------------------------------- | ||
| 262 | + do k = 0, cell_size(3,c)-1 | ||
| 263 | + do j = 0, cell_size(2,c)-1 | ||
| 264 | + do i = 0, cell_size(1,c)-1 | ||
| 265 | + do m = 1,5 | ||
| 266 | + do n = 1, 5 | ||
| 267 | + lhsc(m,n,i,j,k,c) = 0.0d0 | ||
| 268 | + enddo | ||
| 269 | + enddo | ||
| 270 | + enddo | ||
| 271 | + enddo | ||
| 272 | + enddo | ||
| 273 | + | ||
| 274 | + enddo | ||
| 275 | + | ||
| 276 | + return | ||
| 277 | + end | ||
| 278 | + | ||
| 279 | + | ||
| 280 | +c--------------------------------------------------------------------- | ||
| 281 | +c--------------------------------------------------------------------- | ||
| 282 | + | ||
| 283 | + subroutine lhsabinit(lhsa, lhsb, size) | ||
| 284 | + implicit none | ||
| 285 | + | ||
| 286 | + integer size | ||
| 287 | + double precision lhsa(5, 5, -1:size), lhsb(5, 5, -1:size) | ||
| 288 | + | ||
| 289 | + integer i, m, n | ||
| 290 | + | ||
| 291 | +c--------------------------------------------------------------------- | ||
| 292 | +c next, set all diagonal values to 1. This is overkill, but convenient | ||
| 293 | +c--------------------------------------------------------------------- | ||
| 294 | + do i = 0, size | ||
| 295 | + do m = 1, 5 | ||
| 296 | + do n = 1, 5 | ||
| 297 | + lhsa(m,n,i) = 0.0d0 | ||
| 298 | + lhsb(m,n,i) = 0.0d0 | ||
| 299 | + enddo | ||
| 300 | + lhsb(m,m,i) = 1.0d0 | ||
| 301 | + enddo | ||
| 302 | + enddo | ||
| 303 | + | ||
| 304 | + return | ||
| 305 | + end | ||
| 306 | + | ||
| 307 | + | ||
| 308 | + |
NPB3.3-MPI/BT/inputbt.data.sample
0 โ 100644
NPB3.3-MPI/BT/make_set.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/make_set.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine make_set | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | +c--------------------------------------------------------------------- | ||
| 10 | +c This function allocates space for a set of cells and fills the set | ||
| 11 | +c such that communication between cells on different nodes is only | ||
| 12 | +c nearest neighbor | ||
| 13 | +c--------------------------------------------------------------------- | ||
| 14 | + | ||
| 15 | + include 'header.h' | ||
| 16 | + include 'mpinpb.h' | ||
| 17 | + | ||
| 18 | + | ||
| 19 | + integer p, i, j, c, dir, size, excess, ierr,ierrcode | ||
| 20 | + | ||
| 21 | +c--------------------------------------------------------------------- | ||
| 22 | +c compute square root; add small number to allow for roundoff | ||
| 23 | +c (note: this is computed in setup_mpi.f also, but prefer to do | ||
| 24 | +c it twice because of some include file problems). | ||
| 25 | +c--------------------------------------------------------------------- | ||
| 26 | + ncells = dint(dsqrt(dble(no_nodes) + 0.00001d0)) | ||
| 27 | + | ||
| 28 | +c--------------------------------------------------------------------- | ||
| 29 | +c this makes coding easier | ||
| 30 | +c--------------------------------------------------------------------- | ||
| 31 | + p = ncells | ||
| 32 | + | ||
| 33 | +c--------------------------------------------------------------------- | ||
| 34 | +c determine the location of the cell at the bottom of the 3D | ||
| 35 | +c array of cells | ||
| 36 | +c--------------------------------------------------------------------- | ||
| 37 | + cell_coord(1,1) = mod(node,p) | ||
| 38 | + cell_coord(2,1) = node/p | ||
| 39 | + cell_coord(3,1) = 0 | ||
| 40 | + | ||
| 41 | +c--------------------------------------------------------------------- | ||
| 42 | +c set the cell_coords for cells in the rest of the z-layers; | ||
| 43 | +c this comes down to a simple linear numbering in the z-direct- | ||
| 44 | +c ion, and to the doubly-cyclic numbering in the other dirs | ||
| 45 | +c--------------------------------------------------------------------- | ||
| 46 | + do c=2, p | ||
| 47 | + cell_coord(1,c) = mod(cell_coord(1,c-1)+1,p) | ||
| 48 | + cell_coord(2,c) = mod(cell_coord(2,c-1)-1+p,p) | ||
| 49 | + cell_coord(3,c) = c-1 | ||
| 50 | + end do | ||
| 51 | + | ||
| 52 | +c--------------------------------------------------------------------- | ||
| 53 | +c offset all the coordinates by 1 to adjust for Fortran arrays | ||
| 54 | +c--------------------------------------------------------------------- | ||
| 55 | + do dir = 1, 3 | ||
| 56 | + do c = 1, p | ||
| 57 | + cell_coord(dir,c) = cell_coord(dir,c) + 1 | ||
| 58 | + end do | ||
| 59 | + end do | ||
| 60 | + | ||
| 61 | +c--------------------------------------------------------------------- | ||
| 62 | +c slice(dir,n) contains the sequence number of the cell that is in | ||
| 63 | +c coordinate plane n in the dir direction | ||
| 64 | +c--------------------------------------------------------------------- | ||
| 65 | + do dir = 1, 3 | ||
| 66 | + do c = 1, p | ||
| 67 | + slice(dir,cell_coord(dir,c)) = c | ||
| 68 | + end do | ||
| 69 | + end do | ||
| 70 | + | ||
| 71 | + | ||
| 72 | +c--------------------------------------------------------------------- | ||
| 73 | +c fill the predecessor and successor entries, using the indices | ||
| 74 | +c of the bottom cells (they are the same at each level of k | ||
| 75 | +c anyway) acting as if full periodicity pertains; note that p is | ||
| 76 | +c added to those arguments to the mod functions that might | ||
| 77 | +c otherwise return wrong values when using the modulo function | ||
| 78 | +c--------------------------------------------------------------------- | ||
| 79 | + i = cell_coord(1,1)-1 | ||
| 80 | + j = cell_coord(2,1)-1 | ||
| 81 | + | ||
| 82 | + predecessor(1) = mod(i-1+p,p) + p*j | ||
| 83 | + predecessor(2) = i + p*mod(j-1+p,p) | ||
| 84 | + predecessor(3) = mod(i+1,p) + p*mod(j-1+p,p) | ||
| 85 | + successor(1) = mod(i+1,p) + p*j | ||
| 86 | + successor(2) = i + p*mod(j+1,p) | ||
| 87 | + successor(3) = mod(i-1+p,p) + p*mod(j+1,p) | ||
| 88 | + | ||
| 89 | +c--------------------------------------------------------------------- | ||
| 90 | +c now compute the sizes of the cells | ||
| 91 | +c--------------------------------------------------------------------- | ||
| 92 | + do dir= 1, 3 | ||
| 93 | +c--------------------------------------------------------------------- | ||
| 94 | +c set cell_coord range for each direction | ||
| 95 | +c--------------------------------------------------------------------- | ||
| 96 | + size = grid_points(dir)/p | ||
| 97 | + excess = mod(grid_points(dir),p) | ||
| 98 | + do c=1, ncells | ||
| 99 | + if (cell_coord(dir,c) .le. excess) then | ||
| 100 | + cell_size(dir,c) = size+1 | ||
| 101 | + cell_low(dir,c) = (cell_coord(dir,c)-1)*(size+1) | ||
| 102 | + cell_high(dir,c) = cell_low(dir,c)+size | ||
| 103 | + else | ||
| 104 | + cell_size(dir,c) = size | ||
| 105 | + cell_low(dir,c) = excess*(size+1)+ | ||
| 106 | + > (cell_coord(dir,c)-excess-1)*size | ||
| 107 | + cell_high(dir,c) = cell_low(dir,c)+size-1 | ||
| 108 | + endif | ||
| 109 | + if (cell_size(dir, c) .le. 2) then | ||
| 110 | + write(*,50) | ||
| 111 | + 50 format(' Error: Cell size too small. Min size is 3') | ||
| 112 | + ierrcode = 1 | ||
| 113 | + call MPI_Abort(mpi_comm_world,ierrcode,ierr) | ||
| 114 | + stop | ||
| 115 | + endif | ||
| 116 | + end do | ||
| 117 | + end do | ||
| 118 | + | ||
| 119 | + return | ||
| 120 | + end | ||
| 121 | + | ||
| 122 | +c--------------------------------------------------------------------- | ||
| 123 | +c--------------------------------------------------------------------- | ||
| 124 | + | ||
| 125 | + |
NPB3.3-MPI/BT/mpinpb.h
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/mpinpb.h | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + include 'mpif.h' | ||
| 6 | + | ||
| 7 | + integer node, no_nodes, total_nodes, root, comm_setup, | ||
| 8 | + > comm_solve, comm_rhs, dp_type | ||
| 9 | + logical active | ||
| 10 | + common /mpistuff/ node, no_nodes, total_nodes, root, comm_setup, | ||
| 11 | + > comm_solve, comm_rhs, dp_type, active | ||
| 12 | + |
NPB3.3-MPI/BT/rhs.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/rhs.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine compute_rhs | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | + include 'header.h' | ||
| 10 | + | ||
| 11 | + integer c, i, j, k, m | ||
| 12 | + double precision rho_inv, uijk, up1, um1, vijk, vp1, vm1, | ||
| 13 | + > wijk, wp1, wm1 | ||
| 14 | + | ||
| 15 | + | ||
| 16 | + if (timeron) call timer_start(t_rhs) | ||
| 17 | +c--------------------------------------------------------------------- | ||
| 18 | +c loop over all cells owned by this node | ||
| 19 | +c--------------------------------------------------------------------- | ||
| 20 | + do c = 1, ncells | ||
| 21 | + | ||
| 22 | +c--------------------------------------------------------------------- | ||
| 23 | +c compute the reciprocal of density, and the kinetic energy, | ||
| 24 | +c and the speed of sound. | ||
| 25 | +c--------------------------------------------------------------------- | ||
| 26 | + do k = -1, cell_size(3,c) | ||
| 27 | + do j = -1, cell_size(2,c) | ||
| 28 | + do i = -1, cell_size(1,c) | ||
| 29 | + rho_inv = 1.0d0/u(1,i,j,k,c) | ||
| 30 | + rho_i(i,j,k,c) = rho_inv | ||
| 31 | + us(i,j,k,c) = u(2,i,j,k,c) * rho_inv | ||
| 32 | + vs(i,j,k,c) = u(3,i,j,k,c) * rho_inv | ||
| 33 | + ws(i,j,k,c) = u(4,i,j,k,c) * rho_inv | ||
| 34 | + square(i,j,k,c) = 0.5d0* ( | ||
| 35 | + > u(2,i,j,k,c)*u(2,i,j,k,c) + | ||
| 36 | + > u(3,i,j,k,c)*u(3,i,j,k,c) + | ||
| 37 | + > u(4,i,j,k,c)*u(4,i,j,k,c) ) * rho_inv | ||
| 38 | + qs(i,j,k,c) = square(i,j,k,c) * rho_inv | ||
| 39 | + enddo | ||
| 40 | + enddo | ||
| 41 | + enddo | ||
| 42 | + | ||
| 43 | +c--------------------------------------------------------------------- | ||
| 44 | +c copy the exact forcing term to the right hand side; because | ||
| 45 | +c this forcing term is known, we can store it on the whole of every | ||
| 46 | +c cell, including the boundary | ||
| 47 | +c--------------------------------------------------------------------- | ||
| 48 | + | ||
| 49 | + do k = 0, cell_size(3,c)-1 | ||
| 50 | + do j = 0, cell_size(2,c)-1 | ||
| 51 | + do i = 0, cell_size(1,c)-1 | ||
| 52 | + do m = 1, 5 | ||
| 53 | + rhs(m,i,j,k,c) = forcing(m,i,j,k,c) | ||
| 54 | + enddo | ||
| 55 | + enddo | ||
| 56 | + enddo | ||
| 57 | + enddo | ||
| 58 | + | ||
| 59 | + | ||
| 60 | +c--------------------------------------------------------------------- | ||
| 61 | +c compute xi-direction fluxes | ||
| 62 | +c--------------------------------------------------------------------- | ||
| 63 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 64 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 65 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 66 | + uijk = us(i,j,k,c) | ||
| 67 | + up1 = us(i+1,j,k,c) | ||
| 68 | + um1 = us(i-1,j,k,c) | ||
| 69 | + | ||
| 70 | + rhs(1,i,j,k,c) = rhs(1,i,j,k,c) + dx1tx1 * | ||
| 71 | + > (u(1,i+1,j,k,c) - 2.0d0*u(1,i,j,k,c) + | ||
| 72 | + > u(1,i-1,j,k,c)) - | ||
| 73 | + > tx2 * (u(2,i+1,j,k,c) - u(2,i-1,j,k,c)) | ||
| 74 | + | ||
| 75 | + rhs(2,i,j,k,c) = rhs(2,i,j,k,c) + dx2tx1 * | ||
| 76 | + > (u(2,i+1,j,k,c) - 2.0d0*u(2,i,j,k,c) + | ||
| 77 | + > u(2,i-1,j,k,c)) + | ||
| 78 | + > xxcon2*con43 * (up1 - 2.0d0*uijk + um1) - | ||
| 79 | + > tx2 * (u(2,i+1,j,k,c)*up1 - | ||
| 80 | + > u(2,i-1,j,k,c)*um1 + | ||
| 81 | + > (u(5,i+1,j,k,c)- square(i+1,j,k,c)- | ||
| 82 | + > u(5,i-1,j,k,c)+ square(i-1,j,k,c))* | ||
| 83 | + > c2) | ||
| 84 | + | ||
| 85 | + rhs(3,i,j,k,c) = rhs(3,i,j,k,c) + dx3tx1 * | ||
| 86 | + > (u(3,i+1,j,k,c) - 2.0d0*u(3,i,j,k,c) + | ||
| 87 | + > u(3,i-1,j,k,c)) + | ||
| 88 | + > xxcon2 * (vs(i+1,j,k,c) - 2.0d0*vs(i,j,k,c) + | ||
| 89 | + > vs(i-1,j,k,c)) - | ||
| 90 | + > tx2 * (u(3,i+1,j,k,c)*up1 - | ||
| 91 | + > u(3,i-1,j,k,c)*um1) | ||
| 92 | + | ||
| 93 | + rhs(4,i,j,k,c) = rhs(4,i,j,k,c) + dx4tx1 * | ||
| 94 | + > (u(4,i+1,j,k,c) - 2.0d0*u(4,i,j,k,c) + | ||
| 95 | + > u(4,i-1,j,k,c)) + | ||
| 96 | + > xxcon2 * (ws(i+1,j,k,c) - 2.0d0*ws(i,j,k,c) + | ||
| 97 | + > ws(i-1,j,k,c)) - | ||
| 98 | + > tx2 * (u(4,i+1,j,k,c)*up1 - | ||
| 99 | + > u(4,i-1,j,k,c)*um1) | ||
| 100 | + | ||
| 101 | + rhs(5,i,j,k,c) = rhs(5,i,j,k,c) + dx5tx1 * | ||
| 102 | + > (u(5,i+1,j,k,c) - 2.0d0*u(5,i,j,k,c) + | ||
| 103 | + > u(5,i-1,j,k,c)) + | ||
| 104 | + > xxcon3 * (qs(i+1,j,k,c) - 2.0d0*qs(i,j,k,c) + | ||
| 105 | + > qs(i-1,j,k,c)) + | ||
| 106 | + > xxcon4 * (up1*up1 - 2.0d0*uijk*uijk + | ||
| 107 | + > um1*um1) + | ||
| 108 | + > xxcon5 * (u(5,i+1,j,k,c)*rho_i(i+1,j,k,c) - | ||
| 109 | + > 2.0d0*u(5,i,j,k,c)*rho_i(i,j,k,c) + | ||
| 110 | + > u(5,i-1,j,k,c)*rho_i(i-1,j,k,c)) - | ||
| 111 | + > tx2 * ( (c1*u(5,i+1,j,k,c) - | ||
| 112 | + > c2*square(i+1,j,k,c))*up1 - | ||
| 113 | + > (c1*u(5,i-1,j,k,c) - | ||
| 114 | + > c2*square(i-1,j,k,c))*um1 ) | ||
| 115 | + enddo | ||
| 116 | + enddo | ||
| 117 | + enddo | ||
| 118 | + | ||
| 119 | +c--------------------------------------------------------------------- | ||
| 120 | +c add fourth order xi-direction dissipation | ||
| 121 | +c--------------------------------------------------------------------- | ||
| 122 | + if (start(1,c) .gt. 0) then | ||
| 123 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 124 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 125 | + i = 1 | ||
| 126 | + do m = 1, 5 | ||
| 127 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c)- dssp * | ||
| 128 | + > ( 5.0d0*u(m,i,j,k,c) - 4.0d0*u(m,i+1,j,k,c) + | ||
| 129 | + > u(m,i+2,j,k,c)) | ||
| 130 | + enddo | ||
| 131 | + | ||
| 132 | + i = 2 | ||
| 133 | + do m = 1, 5 | ||
| 134 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 135 | + > (-4.0d0*u(m,i-1,j,k,c) + 6.0d0*u(m,i,j,k,c) - | ||
| 136 | + > 4.0d0*u(m,i+1,j,k,c) + u(m,i+2,j,k,c)) | ||
| 137 | + enddo | ||
| 138 | + enddo | ||
| 139 | + enddo | ||
| 140 | + endif | ||
| 141 | + | ||
| 142 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 143 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 144 | + do i = 3*start(1,c),cell_size(1,c)-3*end(1,c)-1 | ||
| 145 | + do m = 1, 5 | ||
| 146 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 147 | + > ( u(m,i-2,j,k,c) - 4.0d0*u(m,i-1,j,k,c) + | ||
| 148 | + > 6.0*u(m,i,j,k,c) - 4.0d0*u(m,i+1,j,k,c) + | ||
| 149 | + > u(m,i+2,j,k,c) ) | ||
| 150 | + enddo | ||
| 151 | + enddo | ||
| 152 | + enddo | ||
| 153 | + enddo | ||
| 154 | + | ||
| 155 | + | ||
| 156 | + if (end(1,c) .gt. 0) then | ||
| 157 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 158 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 159 | + i = cell_size(1,c)-3 | ||
| 160 | + do m = 1, 5 | ||
| 161 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 162 | + > ( u(m,i-2,j,k,c) - 4.0d0*u(m,i-1,j,k,c) + | ||
| 163 | + > 6.0d0*u(m,i,j,k,c) - 4.0d0*u(m,i+1,j,k,c) ) | ||
| 164 | + enddo | ||
| 165 | + | ||
| 166 | + i = cell_size(1,c)-2 | ||
| 167 | + do m = 1, 5 | ||
| 168 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 169 | + > ( u(m,i-2,j,k,c) - 4.d0*u(m,i-1,j,k,c) + | ||
| 170 | + > 5.d0*u(m,i,j,k,c) ) | ||
| 171 | + enddo | ||
| 172 | + enddo | ||
| 173 | + enddo | ||
| 174 | + endif | ||
| 175 | + | ||
| 176 | +c--------------------------------------------------------------------- | ||
| 177 | +c compute eta-direction fluxes | ||
| 178 | +c--------------------------------------------------------------------- | ||
| 179 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 180 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 181 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 182 | + vijk = vs(i,j,k,c) | ||
| 183 | + vp1 = vs(i,j+1,k,c) | ||
| 184 | + vm1 = vs(i,j-1,k,c) | ||
| 185 | + rhs(1,i,j,k,c) = rhs(1,i,j,k,c) + dy1ty1 * | ||
| 186 | + > (u(1,i,j+1,k,c) - 2.0d0*u(1,i,j,k,c) + | ||
| 187 | + > u(1,i,j-1,k,c)) - | ||
| 188 | + > ty2 * (u(3,i,j+1,k,c) - u(3,i,j-1,k,c)) | ||
| 189 | + rhs(2,i,j,k,c) = rhs(2,i,j,k,c) + dy2ty1 * | ||
| 190 | + > (u(2,i,j+1,k,c) - 2.0d0*u(2,i,j,k,c) + | ||
| 191 | + > u(2,i,j-1,k,c)) + | ||
| 192 | + > yycon2 * (us(i,j+1,k,c) - 2.0d0*us(i,j,k,c) + | ||
| 193 | + > us(i,j-1,k,c)) - | ||
| 194 | + > ty2 * (u(2,i,j+1,k,c)*vp1 - | ||
| 195 | + > u(2,i,j-1,k,c)*vm1) | ||
| 196 | + rhs(3,i,j,k,c) = rhs(3,i,j,k,c) + dy3ty1 * | ||
| 197 | + > (u(3,i,j+1,k,c) - 2.0d0*u(3,i,j,k,c) + | ||
| 198 | + > u(3,i,j-1,k,c)) + | ||
| 199 | + > yycon2*con43 * (vp1 - 2.0d0*vijk + vm1) - | ||
| 200 | + > ty2 * (u(3,i,j+1,k,c)*vp1 - | ||
| 201 | + > u(3,i,j-1,k,c)*vm1 + | ||
| 202 | + > (u(5,i,j+1,k,c) - square(i,j+1,k,c) - | ||
| 203 | + > u(5,i,j-1,k,c) + square(i,j-1,k,c)) | ||
| 204 | + > *c2) | ||
| 205 | + rhs(4,i,j,k,c) = rhs(4,i,j,k,c) + dy4ty1 * | ||
| 206 | + > (u(4,i,j+1,k,c) - 2.0d0*u(4,i,j,k,c) + | ||
| 207 | + > u(4,i,j-1,k,c)) + | ||
| 208 | + > yycon2 * (ws(i,j+1,k,c) - 2.0d0*ws(i,j,k,c) + | ||
| 209 | + > ws(i,j-1,k,c)) - | ||
| 210 | + > ty2 * (u(4,i,j+1,k,c)*vp1 - | ||
| 211 | + > u(4,i,j-1,k,c)*vm1) | ||
| 212 | + rhs(5,i,j,k,c) = rhs(5,i,j,k,c) + dy5ty1 * | ||
| 213 | + > (u(5,i,j+1,k,c) - 2.0d0*u(5,i,j,k,c) + | ||
| 214 | + > u(5,i,j-1,k,c)) + | ||
| 215 | + > yycon3 * (qs(i,j+1,k,c) - 2.0d0*qs(i,j,k,c) + | ||
| 216 | + > qs(i,j-1,k,c)) + | ||
| 217 | + > yycon4 * (vp1*vp1 - 2.0d0*vijk*vijk + | ||
| 218 | + > vm1*vm1) + | ||
| 219 | + > yycon5 * (u(5,i,j+1,k,c)*rho_i(i,j+1,k,c) - | ||
| 220 | + > 2.0d0*u(5,i,j,k,c)*rho_i(i,j,k,c) + | ||
| 221 | + > u(5,i,j-1,k,c)*rho_i(i,j-1,k,c)) - | ||
| 222 | + > ty2 * ((c1*u(5,i,j+1,k,c) - | ||
| 223 | + > c2*square(i,j+1,k,c)) * vp1 - | ||
| 224 | + > (c1*u(5,i,j-1,k,c) - | ||
| 225 | + > c2*square(i,j-1,k,c)) * vm1) | ||
| 226 | + enddo | ||
| 227 | + enddo | ||
| 228 | + enddo | ||
| 229 | + | ||
| 230 | +c--------------------------------------------------------------------- | ||
| 231 | +c add fourth order eta-direction dissipation | ||
| 232 | +c--------------------------------------------------------------------- | ||
| 233 | + if (start(2,c) .gt. 0) then | ||
| 234 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 235 | + j = 1 | ||
| 236 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 237 | + do m = 1, 5 | ||
| 238 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c)- dssp * | ||
| 239 | + > ( 5.0d0*u(m,i,j,k,c) - 4.0d0*u(m,i,j+1,k,c) + | ||
| 240 | + > u(m,i,j+2,k,c)) | ||
| 241 | + enddo | ||
| 242 | + enddo | ||
| 243 | + | ||
| 244 | + j = 2 | ||
| 245 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 246 | + do m = 1, 5 | ||
| 247 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 248 | + > (-4.0d0*u(m,i,j-1,k,c) + 6.0d0*u(m,i,j,k,c) - | ||
| 249 | + > 4.0d0*u(m,i,j+1,k,c) + u(m,i,j+2,k,c)) | ||
| 250 | + enddo | ||
| 251 | + enddo | ||
| 252 | + enddo | ||
| 253 | + endif | ||
| 254 | + | ||
| 255 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 256 | + do j = 3*start(2,c), cell_size(2,c)-3*end(2,c)-1 | ||
| 257 | + do i = start(1,c),cell_size(1,c)-end(1,c)-1 | ||
| 258 | + do m = 1, 5 | ||
| 259 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 260 | + > ( u(m,i,j-2,k,c) - 4.0d0*u(m,i,j-1,k,c) + | ||
| 261 | + > 6.0*u(m,i,j,k,c) - 4.0d0*u(m,i,j+1,k,c) + | ||
| 262 | + > u(m,i,j+2,k,c) ) | ||
| 263 | + enddo | ||
| 264 | + enddo | ||
| 265 | + enddo | ||
| 266 | + enddo | ||
| 267 | + | ||
| 268 | + if (end(2,c) .gt. 0) then | ||
| 269 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 270 | + j = cell_size(2,c)-3 | ||
| 271 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 272 | + do m = 1, 5 | ||
| 273 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 274 | + > ( u(m,i,j-2,k,c) - 4.0d0*u(m,i,j-1,k,c) + | ||
| 275 | + > 6.0d0*u(m,i,j,k,c) - 4.0d0*u(m,i,j+1,k,c) ) | ||
| 276 | + enddo | ||
| 277 | + enddo | ||
| 278 | + | ||
| 279 | + j = cell_size(2,c)-2 | ||
| 280 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 281 | + do m = 1, 5 | ||
| 282 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 283 | + > ( u(m,i,j-2,k,c) - 4.d0*u(m,i,j-1,k,c) + | ||
| 284 | + > 5.d0*u(m,i,j,k,c) ) | ||
| 285 | + enddo | ||
| 286 | + enddo | ||
| 287 | + enddo | ||
| 288 | + endif | ||
| 289 | + | ||
| 290 | +c--------------------------------------------------------------------- | ||
| 291 | +c compute zeta-direction fluxes | ||
| 292 | +c--------------------------------------------------------------------- | ||
| 293 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 294 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 295 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 296 | + wijk = ws(i,j,k,c) | ||
| 297 | + wp1 = ws(i,j,k+1,c) | ||
| 298 | + wm1 = ws(i,j,k-1,c) | ||
| 299 | + | ||
| 300 | + rhs(1,i,j,k,c) = rhs(1,i,j,k,c) + dz1tz1 * | ||
| 301 | + > (u(1,i,j,k+1,c) - 2.0d0*u(1,i,j,k,c) + | ||
| 302 | + > u(1,i,j,k-1,c)) - | ||
| 303 | + > tz2 * (u(4,i,j,k+1,c) - u(4,i,j,k-1,c)) | ||
| 304 | + rhs(2,i,j,k,c) = rhs(2,i,j,k,c) + dz2tz1 * | ||
| 305 | + > (u(2,i,j,k+1,c) - 2.0d0*u(2,i,j,k,c) + | ||
| 306 | + > u(2,i,j,k-1,c)) + | ||
| 307 | + > zzcon2 * (us(i,j,k+1,c) - 2.0d0*us(i,j,k,c) + | ||
| 308 | + > us(i,j,k-1,c)) - | ||
| 309 | + > tz2 * (u(2,i,j,k+1,c)*wp1 - | ||
| 310 | + > u(2,i,j,k-1,c)*wm1) | ||
| 311 | + rhs(3,i,j,k,c) = rhs(3,i,j,k,c) + dz3tz1 * | ||
| 312 | + > (u(3,i,j,k+1,c) - 2.0d0*u(3,i,j,k,c) + | ||
| 313 | + > u(3,i,j,k-1,c)) + | ||
| 314 | + > zzcon2 * (vs(i,j,k+1,c) - 2.0d0*vs(i,j,k,c) + | ||
| 315 | + > vs(i,j,k-1,c)) - | ||
| 316 | + > tz2 * (u(3,i,j,k+1,c)*wp1 - | ||
| 317 | + > u(3,i,j,k-1,c)*wm1) | ||
| 318 | + rhs(4,i,j,k,c) = rhs(4,i,j,k,c) + dz4tz1 * | ||
| 319 | + > (u(4,i,j,k+1,c) - 2.0d0*u(4,i,j,k,c) + | ||
| 320 | + > u(4,i,j,k-1,c)) + | ||
| 321 | + > zzcon2*con43 * (wp1 - 2.0d0*wijk + wm1) - | ||
| 322 | + > tz2 * (u(4,i,j,k+1,c)*wp1 - | ||
| 323 | + > u(4,i,j,k-1,c)*wm1 + | ||
| 324 | + > (u(5,i,j,k+1,c) - square(i,j,k+1,c) - | ||
| 325 | + > u(5,i,j,k-1,c) + square(i,j,k-1,c)) | ||
| 326 | + > *c2) | ||
| 327 | + rhs(5,i,j,k,c) = rhs(5,i,j,k,c) + dz5tz1 * | ||
| 328 | + > (u(5,i,j,k+1,c) - 2.0d0*u(5,i,j,k,c) + | ||
| 329 | + > u(5,i,j,k-1,c)) + | ||
| 330 | + > zzcon3 * (qs(i,j,k+1,c) - 2.0d0*qs(i,j,k,c) + | ||
| 331 | + > qs(i,j,k-1,c)) + | ||
| 332 | + > zzcon4 * (wp1*wp1 - 2.0d0*wijk*wijk + | ||
| 333 | + > wm1*wm1) + | ||
| 334 | + > zzcon5 * (u(5,i,j,k+1,c)*rho_i(i,j,k+1,c) - | ||
| 335 | + > 2.0d0*u(5,i,j,k,c)*rho_i(i,j,k,c) + | ||
| 336 | + > u(5,i,j,k-1,c)*rho_i(i,j,k-1,c)) - | ||
| 337 | + > tz2 * ( (c1*u(5,i,j,k+1,c) - | ||
| 338 | + > c2*square(i,j,k+1,c))*wp1 - | ||
| 339 | + > (c1*u(5,i,j,k-1,c) - | ||
| 340 | + > c2*square(i,j,k-1,c))*wm1) | ||
| 341 | + enddo | ||
| 342 | + enddo | ||
| 343 | + enddo | ||
| 344 | + | ||
| 345 | +c--------------------------------------------------------------------- | ||
| 346 | +c add fourth order zeta-direction dissipation | ||
| 347 | +c--------------------------------------------------------------------- | ||
| 348 | + if (start(3,c) .gt. 0) then | ||
| 349 | + k = 1 | ||
| 350 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 351 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 352 | + do m = 1, 5 | ||
| 353 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c)- dssp * | ||
| 354 | + > ( 5.0d0*u(m,i,j,k,c) - 4.0d0*u(m,i,j,k+1,c) + | ||
| 355 | + > u(m,i,j,k+2,c)) | ||
| 356 | + enddo | ||
| 357 | + enddo | ||
| 358 | + enddo | ||
| 359 | + | ||
| 360 | + k = 2 | ||
| 361 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 362 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 363 | + do m = 1, 5 | ||
| 364 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 365 | + > (-4.0d0*u(m,i,j,k-1,c) + 6.0d0*u(m,i,j,k,c) - | ||
| 366 | + > 4.0d0*u(m,i,j,k+1,c) + u(m,i,j,k+2,c)) | ||
| 367 | + enddo | ||
| 368 | + enddo | ||
| 369 | + enddo | ||
| 370 | + endif | ||
| 371 | + | ||
| 372 | + do k = 3*start(3,c), cell_size(3,c)-3*end(3,c)-1 | ||
| 373 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 374 | + do i = start(1,c),cell_size(1,c)-end(1,c)-1 | ||
| 375 | + do m = 1, 5 | ||
| 376 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 377 | + > ( u(m,i,j,k-2,c) - 4.0d0*u(m,i,j,k-1,c) + | ||
| 378 | + > 6.0*u(m,i,j,k,c) - 4.0d0*u(m,i,j,k+1,c) + | ||
| 379 | + > u(m,i,j,k+2,c) ) | ||
| 380 | + enddo | ||
| 381 | + enddo | ||
| 382 | + enddo | ||
| 383 | + enddo | ||
| 384 | + | ||
| 385 | + if (end(3,c) .gt. 0) then | ||
| 386 | + k = cell_size(3,c)-3 | ||
| 387 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 388 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 389 | + do m = 1, 5 | ||
| 390 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 391 | + > ( u(m,i,j,k-2,c) - 4.0d0*u(m,i,j,k-1,c) + | ||
| 392 | + > 6.0d0*u(m,i,j,k,c) - 4.0d0*u(m,i,j,k+1,c) ) | ||
| 393 | + enddo | ||
| 394 | + enddo | ||
| 395 | + enddo | ||
| 396 | + | ||
| 397 | + k = cell_size(3,c)-2 | ||
| 398 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 399 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 400 | + do m = 1, 5 | ||
| 401 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) - dssp * | ||
| 402 | + > ( u(m,i,j,k-2,c) - 4.d0*u(m,i,j,k-1,c) + | ||
| 403 | + > 5.d0*u(m,i,j,k,c) ) | ||
| 404 | + enddo | ||
| 405 | + enddo | ||
| 406 | + enddo | ||
| 407 | + endif | ||
| 408 | + | ||
| 409 | + do k = start(3,c), cell_size(3,c)-end(3,c)-1 | ||
| 410 | + do j = start(2,c), cell_size(2,c)-end(2,c)-1 | ||
| 411 | + do i = start(1,c), cell_size(1,c)-end(1,c)-1 | ||
| 412 | + do m = 1, 5 | ||
| 413 | + rhs(m,i,j,k,c) = rhs(m,i,j,k,c) * dt | ||
| 414 | + enddo | ||
| 415 | + enddo | ||
| 416 | + enddo | ||
| 417 | + enddo | ||
| 418 | + | ||
| 419 | + enddo | ||
| 420 | + | ||
| 421 | + if (timeron) call timer_stop(t_rhs) | ||
| 422 | + | ||
| 423 | + return | ||
| 424 | + end | ||
| 425 | + | ||
| 426 | + | ||
| 427 | + | ||
| 428 | + |
NPB3.3-MPI/BT/set_constants.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/set_constants.f | ||
| 1 | +c--------------------------------------------------------------------- | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | + | ||
| 4 | + subroutine set_constants | ||
| 5 | + | ||
| 6 | +c--------------------------------------------------------------------- | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | + | ||
| 9 | + include 'header.h' | ||
| 10 | + | ||
| 11 | + ce(1,1) = 2.0d0 | ||
| 12 | + ce(1,2) = 0.0d0 | ||
| 13 | + ce(1,3) = 0.0d0 | ||
| 14 | + ce(1,4) = 4.0d0 | ||
| 15 | + ce(1,5) = 5.0d0 | ||
| 16 | + ce(1,6) = 3.0d0 | ||
| 17 | + ce(1,7) = 0.5d0 | ||
| 18 | + ce(1,8) = 0.02d0 | ||
| 19 | + ce(1,9) = 0.01d0 | ||
| 20 | + ce(1,10) = 0.03d0 | ||
| 21 | + ce(1,11) = 0.5d0 | ||
| 22 | + ce(1,12) = 0.4d0 | ||
| 23 | + ce(1,13) = 0.3d0 | ||
| 24 | + | ||
| 25 | + ce(2,1) = 1.0d0 | ||
| 26 | + ce(2,2) = 0.0d0 | ||
| 27 | + ce(2,3) = 0.0d0 | ||
| 28 | + ce(2,4) = 0.0d0 | ||
| 29 | + ce(2,5) = 1.0d0 | ||
| 30 | + ce(2,6) = 2.0d0 | ||
| 31 | + ce(2,7) = 3.0d0 | ||
| 32 | + ce(2,8) = 0.01d0 | ||
| 33 | + ce(2,9) = 0.03d0 | ||
| 34 | + ce(2,10) = 0.02d0 | ||
| 35 | + ce(2,11) = 0.4d0 | ||
| 36 | + ce(2,12) = 0.3d0 | ||
| 37 | + ce(2,13) = 0.5d0 | ||
| 38 | + | ||
| 39 | + ce(3,1) = 2.0d0 | ||
| 40 | + ce(3,2) = 2.0d0 | ||
| 41 | + ce(3,3) = 0.0d0 | ||
| 42 | + ce(3,4) = 0.0d0 | ||
| 43 | + ce(3,5) = 0.0d0 | ||
| 44 | + ce(3,6) = 2.0d0 | ||
| 45 | + ce(3,7) = 3.0d0 | ||
| 46 | + ce(3,8) = 0.04d0 | ||
| 47 | + ce(3,9) = 0.03d0 | ||
| 48 | + ce(3,10) = 0.05d0 | ||
| 49 | + ce(3,11) = 0.3d0 | ||
| 50 | + ce(3,12) = 0.5d0 | ||
| 51 | + ce(3,13) = 0.4d0 | ||
| 52 | + | ||
| 53 | + ce(4,1) = 2.0d0 | ||
| 54 | + ce(4,2) = 2.0d0 | ||
| 55 | + ce(4,3) = 0.0d0 | ||
| 56 | + ce(4,4) = 0.0d0 | ||
| 57 | + ce(4,5) = 0.0d0 | ||
| 58 | + ce(4,6) = 2.0d0 | ||
| 59 | + ce(4,7) = 3.0d0 | ||
| 60 | + ce(4,8) = 0.03d0 | ||
| 61 | + ce(4,9) = 0.05d0 | ||
| 62 | + ce(4,10) = 0.04d0 | ||
| 63 | + ce(4,11) = 0.2d0 | ||
| 64 | + ce(4,12) = 0.1d0 | ||
| 65 | + ce(4,13) = 0.3d0 | ||
| 66 | + | ||
| 67 | + ce(5,1) = 5.0d0 | ||
| 68 | + ce(5,2) = 4.0d0 | ||
| 69 | + ce(5,3) = 3.0d0 | ||
| 70 | + ce(5,4) = 2.0d0 | ||
| 71 | + ce(5,5) = 0.1d0 | ||
| 72 | + ce(5,6) = 0.4d0 | ||
| 73 | + ce(5,7) = 0.3d0 | ||
| 74 | + ce(5,8) = 0.05d0 | ||
| 75 | + ce(5,9) = 0.04d0 | ||
| 76 | + ce(5,10) = 0.03d0 | ||
| 77 | + ce(5,11) = 0.1d0 | ||
| 78 | + ce(5,12) = 0.3d0 | ||
| 79 | + ce(5,13) = 0.2d0 | ||
| 80 | + | ||
| 81 | + c1 = 1.4d0 | ||
| 82 | + c2 = 0.4d0 | ||
| 83 | + c3 = 0.1d0 | ||
| 84 | + c4 = 1.0d0 | ||
| 85 | + c5 = 1.4d0 | ||
| 86 | + | ||
| 87 | + bt = dsqrt(0.5d0) | ||
| 88 | + | ||
| 89 | + dnxm1 = 1.0d0 / dble(grid_points(1)-1) | ||
| 90 | + dnym1 = 1.0d0 / dble(grid_points(2)-1) | ||
| 91 | + dnzm1 = 1.0d0 / dble(grid_points(3)-1) | ||
| 92 | + | ||
| 93 | + c1c2 = c1 * c2 | ||
| 94 | + c1c5 = c1 * c5 | ||
| 95 | + c3c4 = c3 * c4 | ||
| 96 | + c1345 = c1c5 * c3c4 | ||
| 97 | + | ||
| 98 | + conz1 = (1.0d0-c1c5) | ||
| 99 | + | ||
| 100 | + tx1 = 1.0d0 / (dnxm1 * dnxm1) | ||
| 101 | + tx2 = 1.0d0 / (2.0d0 * dnxm1) | ||
| 102 | + tx3 = 1.0d0 / dnxm1 | ||
| 103 | + | ||
| 104 | + ty1 = 1.0d0 / (dnym1 * dnym1) | ||
| 105 | + ty2 = 1.0d0 / (2.0d0 * dnym1) | ||
| 106 | + ty3 = 1.0d0 / dnym1 | ||
| 107 | + | ||
| 108 | + tz1 = 1.0d0 / (dnzm1 * dnzm1) | ||
| 109 | + tz2 = 1.0d0 / (2.0d0 * dnzm1) | ||
| 110 | + tz3 = 1.0d0 / dnzm1 | ||
| 111 | + | ||
| 112 | + dx1 = 0.75d0 | ||
| 113 | + dx2 = 0.75d0 | ||
| 114 | + dx3 = 0.75d0 | ||
| 115 | + dx4 = 0.75d0 | ||
| 116 | + dx5 = 0.75d0 | ||
| 117 | + | ||
| 118 | + dy1 = 0.75d0 | ||
| 119 | + dy2 = 0.75d0 | ||
| 120 | + dy3 = 0.75d0 | ||
| 121 | + dy4 = 0.75d0 | ||
| 122 | + dy5 = 0.75d0 | ||
| 123 | + | ||
| 124 | + dz1 = 1.0d0 | ||
| 125 | + dz2 = 1.0d0 | ||
| 126 | + dz3 = 1.0d0 | ||
| 127 | + dz4 = 1.0d0 | ||
| 128 | + dz5 = 1.0d0 | ||
| 129 | + | ||
| 130 | + dxmax = dmax1(dx3, dx4) | ||
| 131 | + dymax = dmax1(dy2, dy4) | ||
| 132 | + dzmax = dmax1(dz2, dz3) | ||
| 133 | + | ||
| 134 | + dssp = 0.25d0 * dmax1(dx1, dmax1(dy1, dz1) ) | ||
| 135 | + | ||
| 136 | + c4dssp = 4.0d0 * dssp | ||
| 137 | + c5dssp = 5.0d0 * dssp | ||
| 138 | + | ||
| 139 | + dttx1 = dt*tx1 | ||
| 140 | + dttx2 = dt*tx2 | ||
| 141 | + dtty1 = dt*ty1 | ||
| 142 | + dtty2 = dt*ty2 | ||
| 143 | + dttz1 = dt*tz1 | ||
| 144 | + dttz2 = dt*tz2 | ||
| 145 | + | ||
| 146 | + c2dttx1 = 2.0d0*dttx1 | ||
| 147 | + c2dtty1 = 2.0d0*dtty1 | ||
| 148 | + c2dttz1 = 2.0d0*dttz1 | ||
| 149 | + | ||
| 150 | + dtdssp = dt*dssp | ||
| 151 | + | ||
| 152 | + comz1 = dtdssp | ||
| 153 | + comz4 = 4.0d0*dtdssp | ||
| 154 | + comz5 = 5.0d0*dtdssp | ||
| 155 | + comz6 = 6.0d0*dtdssp | ||
| 156 | + | ||
| 157 | + c3c4tx3 = c3c4*tx3 | ||
| 158 | + c3c4ty3 = c3c4*ty3 | ||
| 159 | + c3c4tz3 = c3c4*tz3 | ||
| 160 | + | ||
| 161 | + dx1tx1 = dx1*tx1 | ||
| 162 | + dx2tx1 = dx2*tx1 | ||
| 163 | + dx3tx1 = dx3*tx1 | ||
| 164 | + dx4tx1 = dx4*tx1 | ||
| 165 | + dx5tx1 = dx5*tx1 | ||
| 166 | + | ||
| 167 | + dy1ty1 = dy1*ty1 | ||
| 168 | + dy2ty1 = dy2*ty1 | ||
| 169 | + dy3ty1 = dy3*ty1 | ||
| 170 | + dy4ty1 = dy4*ty1 | ||
| 171 | + dy5ty1 = dy5*ty1 | ||
| 172 | + | ||
| 173 | + dz1tz1 = dz1*tz1 | ||
| 174 | + dz2tz1 = dz2*tz1 | ||
| 175 | + dz3tz1 = dz3*tz1 | ||
| 176 | + dz4tz1 = dz4*tz1 | ||
| 177 | + dz5tz1 = dz5*tz1 | ||
| 178 | + | ||
| 179 | + c2iv = 2.5d0 | ||
| 180 | + con43 = 4.0d0/3.0d0 | ||
| 181 | + con16 = 1.0d0/6.0d0 | ||
| 182 | + | ||
| 183 | + xxcon1 = c3c4tx3*con43*tx3 | ||
| 184 | + xxcon2 = c3c4tx3*tx3 | ||
| 185 | + xxcon3 = c3c4tx3*conz1*tx3 | ||
| 186 | + xxcon4 = c3c4tx3*con16*tx3 | ||
| 187 | + xxcon5 = c3c4tx3*c1c5*tx3 | ||
| 188 | + | ||
| 189 | + yycon1 = c3c4ty3*con43*ty3 | ||
| 190 | + yycon2 = c3c4ty3*ty3 | ||
| 191 | + yycon3 = c3c4ty3*conz1*ty3 | ||
| 192 | + yycon4 = c3c4ty3*con16*ty3 | ||
| 193 | + yycon5 = c3c4ty3*c1c5*ty3 | ||
| 194 | + | ||
| 195 | + zzcon1 = c3c4tz3*con43*tz3 | ||
| 196 | + zzcon2 = c3c4tz3*tz3 | ||
| 197 | + zzcon3 = c3c4tz3*conz1*tz3 | ||
| 198 | + zzcon4 = c3c4tz3*con16*tz3 | ||
| 199 | + zzcon5 = c3c4tz3*c1c5*tz3 | ||
| 200 | + | ||
| 201 | + return | ||
| 202 | + end |
NPB3.3-MPI/BT/setup_mpi.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/setup_mpi.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine setup_mpi | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | +c--------------------------------------------------------------------- | ||
| 11 | +c set up MPI stuff | ||
| 12 | +c--------------------------------------------------------------------- | ||
| 13 | + | ||
| 14 | + implicit none | ||
| 15 | + include 'mpinpb.h' | ||
| 16 | + include 'npbparams.h' | ||
| 17 | + integer error, color, nc | ||
| 18 | + | ||
| 19 | + call mpi_init(error) | ||
| 20 | + | ||
| 21 | + call mpi_comm_size(MPI_COMM_WORLD, total_nodes, error) | ||
| 22 | + call mpi_comm_rank(MPI_COMM_WORLD, node, error) | ||
| 23 | + | ||
| 24 | + if (.not. convertdouble) then | ||
| 25 | + dp_type = MPI_DOUBLE_PRECISION | ||
| 26 | + else | ||
| 27 | + dp_type = MPI_REAL | ||
| 28 | + endif | ||
| 29 | + | ||
| 30 | +c--------------------------------------------------------------------- | ||
| 31 | +c compute square root; add small number to allow for roundoff | ||
| 32 | +c--------------------------------------------------------------------- | ||
| 33 | + nc = dint(dsqrt(dble(total_nodes) + 0.00001d0)) | ||
| 34 | + | ||
| 35 | +c--------------------------------------------------------------------- | ||
| 36 | +c We handle a non-square number of nodes by making the excess nodes | ||
| 37 | +c inactive. However, we can never handle more cells than were compiled | ||
| 38 | +c in. | ||
| 39 | +c--------------------------------------------------------------------- | ||
| 40 | + | ||
| 41 | + if (nc .gt. maxcells) nc = maxcells | ||
| 42 | + if (node .ge. nc*nc) then | ||
| 43 | + active = .false. | ||
| 44 | + color = 1 | ||
| 45 | + else | ||
| 46 | + active = .true. | ||
| 47 | + color = 0 | ||
| 48 | + end if | ||
| 49 | + | ||
| 50 | + call mpi_comm_split(MPI_COMM_WORLD,color,node,comm_setup,error) | ||
| 51 | + if (.not. active) return | ||
| 52 | + | ||
| 53 | + call mpi_comm_size(comm_setup, no_nodes, error) | ||
| 54 | + call mpi_comm_dup(comm_setup, comm_solve, error) | ||
| 55 | + call mpi_comm_dup(comm_setup, comm_rhs, error) | ||
| 56 | + | ||
| 57 | +c--------------------------------------------------------------------- | ||
| 58 | +c let node 0 be the root for the group (there is only one) | ||
| 59 | +c--------------------------------------------------------------------- | ||
| 60 | + root = 0 | ||
| 61 | + | ||
| 62 | + return | ||
| 63 | + end | ||
| 64 | + |
NPB3.3-MPI/BT/simple_mpiio.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/simple_mpiio.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine setup_btio | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | + include 'header.h' | ||
| 11 | + include 'mpinpb.h' | ||
| 12 | + | ||
| 13 | + integer m, ierr | ||
| 14 | + | ||
| 15 | + iseek=0 | ||
| 16 | + | ||
| 17 | + if (node .eq. root) then | ||
| 18 | + call MPI_File_delete(filenm, MPI_INFO_NULL, ierr) | ||
| 19 | + endif | ||
| 20 | + | ||
| 21 | + call MPI_Barrier(comm_solve, ierr) | ||
| 22 | + | ||
| 23 | + call MPI_File_open(comm_solve, | ||
| 24 | + $ filenm, | ||
| 25 | + $ MPI_MODE_RDWR + MPI_MODE_CREATE, | ||
| 26 | + $ MPI_INFO_NULL, | ||
| 27 | + $ fp, | ||
| 28 | + $ ierr) | ||
| 29 | + | ||
| 30 | + call MPI_File_set_view(fp, | ||
| 31 | + $ iseek, MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION, | ||
| 32 | + $ 'native', MPI_INFO_NULL, ierr) | ||
| 33 | + | ||
| 34 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 35 | + print *, 'Error opening file' | ||
| 36 | + stop | ||
| 37 | + endif | ||
| 38 | + | ||
| 39 | + do m = 1, 5 | ||
| 40 | + xce_sub(m) = 0.d0 | ||
| 41 | + end do | ||
| 42 | + | ||
| 43 | + idump_sub = 0 | ||
| 44 | + | ||
| 45 | + return | ||
| 46 | + end | ||
| 47 | + | ||
| 48 | +c--------------------------------------------------------------------- | ||
| 49 | +c--------------------------------------------------------------------- | ||
| 50 | + | ||
| 51 | + subroutine output_timestep | ||
| 52 | + | ||
| 53 | +c--------------------------------------------------------------------- | ||
| 54 | +c--------------------------------------------------------------------- | ||
| 55 | + include 'header.h' | ||
| 56 | + include 'mpinpb.h' | ||
| 57 | + | ||
| 58 | + integer count, jio, kio, cio, aio | ||
| 59 | + integer ierr | ||
| 60 | + integer mstatus(MPI_STATUS_SIZE) | ||
| 61 | + | ||
| 62 | + do cio=1,ncells | ||
| 63 | + do kio=0, cell_size(3,cio)-1 | ||
| 64 | + do jio=0, cell_size(2,cio)-1 | ||
| 65 | + iseek=5*(cell_low(1,cio) + | ||
| 66 | + $ PROBLEM_SIZE*((cell_low(2,cio)+jio) + | ||
| 67 | + $ PROBLEM_SIZE*((cell_low(3,cio)+kio) + | ||
| 68 | + $ PROBLEM_SIZE*idump_sub))) | ||
| 69 | + | ||
| 70 | + count=5*cell_size(1,cio) | ||
| 71 | + | ||
| 72 | + call MPI_File_write_at(fp, iseek, | ||
| 73 | + $ u(1,0,jio,kio,cio), | ||
| 74 | + $ count, MPI_DOUBLE_PRECISION, | ||
| 75 | + $ mstatus, ierr) | ||
| 76 | + | ||
| 77 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 78 | + print *, 'Error writing to file' | ||
| 79 | + stop | ||
| 80 | + endif | ||
| 81 | + enddo | ||
| 82 | + enddo | ||
| 83 | + enddo | ||
| 84 | + | ||
| 85 | + idump_sub = idump_sub + 1 | ||
| 86 | + if (rd_interval .gt. 0) then | ||
| 87 | + if (idump_sub .ge. rd_interval) then | ||
| 88 | + | ||
| 89 | + call acc_sub_norms(idump+1) | ||
| 90 | + | ||
| 91 | + idump_sub = 0 | ||
| 92 | + endif | ||
| 93 | + endif | ||
| 94 | + | ||
| 95 | + return | ||
| 96 | + end | ||
| 97 | + | ||
| 98 | +c--------------------------------------------------------------------- | ||
| 99 | +c--------------------------------------------------------------------- | ||
| 100 | + | ||
| 101 | + subroutine acc_sub_norms(idump_cur) | ||
| 102 | + | ||
| 103 | + include 'header.h' | ||
| 104 | + include 'mpinpb.h' | ||
| 105 | + | ||
| 106 | + integer idump_cur | ||
| 107 | + | ||
| 108 | + integer count, jio, kio, cio, ii, m, ichunk | ||
| 109 | + integer ierr | ||
| 110 | + integer mstatus(MPI_STATUS_SIZE) | ||
| 111 | + double precision xce_single(5) | ||
| 112 | + | ||
| 113 | + ichunk = idump_cur - idump_sub + 1 | ||
| 114 | + do ii=0, idump_sub-1 | ||
| 115 | + do cio=1,ncells | ||
| 116 | + do kio=0, cell_size(3,cio)-1 | ||
| 117 | + do jio=0, cell_size(2,cio)-1 | ||
| 118 | + iseek=5*(cell_low(1,cio) + | ||
| 119 | + $ PROBLEM_SIZE*((cell_low(2,cio)+jio) + | ||
| 120 | + $ PROBLEM_SIZE*((cell_low(3,cio)+kio) + | ||
| 121 | + $ PROBLEM_SIZE*ii))) | ||
| 122 | + | ||
| 123 | + count=5*cell_size(1,cio) | ||
| 124 | + | ||
| 125 | + call MPI_File_read_at(fp, iseek, | ||
| 126 | + $ u(1,0,jio,kio,cio), | ||
| 127 | + $ count, MPI_DOUBLE_PRECISION, | ||
| 128 | + $ mstatus, ierr) | ||
| 129 | + | ||
| 130 | + if (ierr .ne. MPI_SUCCESS) then | ||
| 131 | + print *, 'Error reading back file' | ||
| 132 | + call MPI_File_close(fp, ierr) | ||
| 133 | + stop | ||
| 134 | + endif | ||
| 135 | + enddo | ||
| 136 | + enddo | ||
| 137 | + enddo | ||
| 138 | + | ||
| 139 | + if (node .eq. root) print *, 'Reading data set ', ii+ichunk | ||
| 140 | + | ||
| 141 | + call error_norm(xce_single) | ||
| 142 | + do m = 1, 5 | ||
| 143 | + xce_sub(m) = xce_sub(m) + xce_single(m) | ||
| 144 | + end do | ||
| 145 | + enddo | ||
| 146 | + | ||
| 147 | + return | ||
| 148 | + end | ||
| 149 | + | ||
| 150 | +c--------------------------------------------------------------------- | ||
| 151 | +c--------------------------------------------------------------------- | ||
| 152 | + | ||
| 153 | + subroutine btio_cleanup | ||
| 154 | + | ||
| 155 | +c--------------------------------------------------------------------- | ||
| 156 | +c--------------------------------------------------------------------- | ||
| 157 | + | ||
| 158 | + include 'header.h' | ||
| 159 | + include 'mpinpb.h' | ||
| 160 | + | ||
| 161 | + integer ierr | ||
| 162 | + | ||
| 163 | + call MPI_File_close(fp, ierr) | ||
| 164 | + | ||
| 165 | + return | ||
| 166 | + end | ||
| 167 | + | ||
| 168 | +c--------------------------------------------------------------------- | ||
| 169 | +c--------------------------------------------------------------------- | ||
| 170 | + | ||
| 171 | + subroutine accumulate_norms(xce_acc) | ||
| 172 | + | ||
| 173 | +c--------------------------------------------------------------------- | ||
| 174 | +c--------------------------------------------------------------------- | ||
| 175 | + | ||
| 176 | + include 'header.h' | ||
| 177 | + include 'mpinpb.h' | ||
| 178 | + | ||
| 179 | + double precision xce_acc(5) | ||
| 180 | + integer m, ierr | ||
| 181 | + | ||
| 182 | + if (rd_interval .gt. 0) goto 20 | ||
| 183 | + | ||
| 184 | + call MPI_File_open(comm_solve, | ||
| 185 | + $ filenm, | ||
| 186 | + $ MPI_MODE_RDONLY, | ||
| 187 | + $ MPI_INFO_NULL, | ||
| 188 | + $ fp, | ||
| 189 | + $ ierr) | ||
| 190 | + | ||
| 191 | + iseek = 0 | ||
| 192 | + call MPI_File_set_view(fp, | ||
| 193 | + $ iseek, MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION, | ||
| 194 | + $ 'native', MPI_INFO_NULL, ierr) | ||
| 195 | + | ||
| 196 | +c clear the last time step | ||
| 197 | + | ||
| 198 | + call clear_timestep | ||
| 199 | + | ||
| 200 | +c read back the time steps and accumulate norms | ||
| 201 | + | ||
| 202 | + call acc_sub_norms(idump) | ||
| 203 | + | ||
| 204 | + call MPI_File_close(fp, ierr) | ||
| 205 | + | ||
| 206 | + 20 continue | ||
| 207 | + do m = 1, 5 | ||
| 208 | + xce_acc(m) = xce_sub(m) / dble(idump) | ||
| 209 | + end do | ||
| 210 | + | ||
| 211 | + return | ||
| 212 | + end | ||
| 213 | + |
NPB3.3-MPI/BT/solve_subs.f
0 โ 100644
| 1 | +++ a/NPB3.3-MPI/BT/solve_subs.f | ||
| 1 | + | ||
| 2 | +c--------------------------------------------------------------------- | ||
| 3 | +c--------------------------------------------------------------------- | ||
| 4 | + | ||
| 5 | + subroutine matvec_sub(ablock,avec,bvec) | ||
| 6 | + | ||
| 7 | +c--------------------------------------------------------------------- | ||
| 8 | +c--------------------------------------------------------------------- | ||
| 9 | + | ||
| 10 | +c--------------------------------------------------------------------- | ||
| 11 | +c subtracts bvec=bvec - ablock*avec | ||
| 12 | +c--------------------------------------------------------------------- | ||
| 13 | + | ||
| 14 | + implicit none | ||
| 15 | + | ||
| 16 | + double precision ablock,avec,bvec | ||
| 17 | + dimension ablock(5,5),avec(5),bvec(5) | ||
| 18 | + | ||
| 19 | +c--------------------------------------------------------------------- | ||
| 20 | +c rhs(i,ic,jc,kc,ccell) = rhs(i,ic,jc,kc,ccell) | ||
| 21 | +c $ - lhs(i,1,ablock,ia,ja,ka,acell)* | ||
| 22 | +c--------------------------------------------------------------------- | ||
| 23 | + bvec(1) = bvec(1) - ablock(1,1)*avec(1) | ||
| 24 | + > - ablock(1,2)*avec(2) | ||
| 25 | + > - ablock(1,3)*avec(3) | ||
| 26 | + > - ablock(1,4)*avec(4) | ||
| 27 | + > - ablock(1,5)*avec(5) | ||
| 28 | + bvec(2) = bvec(2) - ablock(2,1)*avec(1) | ||
| 29 | + > - ablock(2,2)*avec(2) | ||
| 30 | + > - ablock(2,3)*avec(3) | ||
| 31 | + > - ablock(2,4)*avec(4) | ||
| 32 | + > - ablock(2,5)*avec(5) | ||
| 33 | + bvec(3) = bvec(3) - ablock(3,1)*avec(1) | ||
| 34 | + > - ablock(3,2)*avec(2) | ||
| 35 | + > - ablock(3,3)*avec(3) | ||
| 36 | + > - ablock(3,4)*avec(4) | ||
| 37 | + > - ablock(3,5)*avec(5) | ||
| 38 | + bvec(4) = bvec(4) - ablock(4,1)*avec(1) | ||
| 39 | + > - ablock(4,2)*avec(2) | ||
| 40 | + > - ablock(4,3)*avec(3) | ||
| 41 | + > - ablock(4,4)*avec(4) | ||
| 42 | + > - ablock(4,5)*avec(5) | ||
| 43 | + bvec(5) = bvec(5) - ablock(5,1)*avec(1) | ||
| 44 | + > - ablock(5,2)*avec(2) | ||
| 45 | + > - ablock(5,3)*avec(3) | ||
| 46 | + > - ablock(5,4)*avec(4) | ||
| 47 | + > - ablock(5,5)*avec(5) | ||
| 48 | + | ||
| 49 | + | ||
| 50 | + return | ||
| 51 | + end | ||
| 52 | + | ||
| 53 | +c--------------------------------------------------------------------- | ||
| 54 | +c--------------------------------------------------------------------- | ||
| 55 | + | ||
| 56 | + subroutine matmul_sub(ablock, bblock, cblock) | ||
| 57 | + | ||
| 58 | +c--------------------------------------------------------------------- | ||
| 59 | +c--------------------------------------------------------------------- | ||
| 60 | + | ||
| 61 | +c--------------------------------------------------------------------- | ||
| 62 | +c subtracts a(i,j,k) X b(i,j,k) from c(i,j,k) | ||
| 63 | +c--------------------------------------------------------------------- | ||
| 64 | + | ||
| 65 | + implicit none | ||
| 66 | + | ||
| 67 | + double precision ablock, bblock, cblock | ||
| 68 | + dimension ablock(5,5), bblock(5,5), cblock(5,5) | ||
| 69 | + | ||
| 70 | + | ||
| 71 | + cblock(1,1) = cblock(1,1) - ablock(1,1)*bblock(1,1) | ||
| 72 | + > - ablock(1,2)*bblock(2,1) | ||
| 73 | + > - ablock(1,3)*bblock(3,1) | ||
| 74 | + > - ablock(1,4)*bblock(4,1) | ||
| 75 | + > - ablock(1,5)*bblock(5,1) | ||
| 76 | + cblock(2,1) = cblock(2,1) - ablock(2,1)*bblock(1,1) | ||
| 77 | + > - ablock(2,2)*bblock(2,1) | ||
| 78 | + > - ablock(2,3)*bblock(3,1) | ||
| 79 | + > - ablock(2,4)*bblock(4,1) | ||
| 80 | + > - ablock(2,5)*bblock(5,1) | ||
| 81 | + cblock(3,1) = cblock(3,1) - ablock(3,1)*bblock(1,1) | ||
| 82 | + > - ablock(3,2)*bblock(2,1) | ||
| 83 | + > - ablock(3,3)*bblock(3,1) | ||
| 84 | + > - ablock(3,4)*bblock(4,1) | ||
| 85 | + > - ablock(3,5)*bblock(5,1) | ||
| 86 | + cblock(4,1) = cblock(4,1) - ablock(4,1)*bblock(1,1) | ||
| 87 | + > - ablock(4,2)*bblock(2,1) | ||
| 88 | + > - ablock(4,3)*bblock(3,1) | ||
| 89 | + > - ablock(4,4)*bblock(4,1) | ||
| 90 | + > - ablock(4,5)*bblock(5,1) | ||
| 91 | + cblock(5,1) = cblock(5,1) - ablock(5,1)*bblock(1,1) | ||
| 92 | + > - ablock(5,2)*bblock(2,1) | ||
| 93 | + > - ablock(5,3)*bblock(3,1) | ||
| 94 | + > - ablock(5,4)*bblock(4,1) | ||
| 95 | + > - ablock(5,5)*bblock(5,1) | ||
| 96 | + cblock(1,2) = cblock(1,2) - ablock(1,1)*bblock(1,2) | ||
| 97 | + > - ablock(1,2)*bblock(2,2) | ||
| 98 | + > - ablock(1,3)*bblock(3,2) | ||
| 99 | + > - ablock(1,4)*bblock(4,2) | ||
| 100 | + > - ablock(1,5)*bblock(5,2) | ||
| 101 | + cblock(2,2) = cblock(2,2) - ablock(2,1)*bblock(1,2) | ||
| 102 | + > - ablock(2,2)*bblock(2,2) | ||
| 103 | + > - ablock(2,3)*bblock(3,2) | ||
| 104 | + > - ablock(2,4)*bblock(4,2) | ||
| 105 | + > - ablock(2,5)*bblock(5,2) | ||
| 106 | + cblock(3,2) = cblock(3,2) - ablock(3,1)*bblock(1,2) | ||
| 107 | + > - ablock(3,2)*bblock(2,2) | ||
| 108 | + > - ablock(3,3)*bblock(3,2) | ||
| 109 | + > - ablock(3,4)*bblock(4,2) | ||
| 110 | + > - ablock(3,5)*bblock(5,2) | ||
| 111 | + cblock(4,2) = cblock(4,2) - ablock(4,1)*bblock(1,2) | ||
| 112 | + > - ablock(4,2)*bblock(2,2) | ||
| 113 | + > - ablock(4,3)*bblock(3,2) | ||
| 114 | + > - ablock(4,4)*bblock(4,2) | ||
| 115 | + > - ablock(4,5)*bblock(5,2) | ||
| 116 | + cblock(5,2) = cblock(5,2) - ablock(5,1)*bblock(1,2) | ||
| 117 | + > - ablock(5,2)*bblock(2,2) | ||
| 118 | + > - ablock(5,3)*bblock(3,2) | ||
| 119 | + > - ablock(5,4)*bblock(4,2) | ||
| 120 | + > - ablock(5,5)*bblock(5,2) | ||
| 121 | + cblock(1,3) = cblock(1,3) - ablock(1,1)*bblock(1,3) | ||
| 122 | + > - ablock(1,2)*bblock(2,3) | ||
| 123 | + > - ablock(1,3)*bblock(3,3) | ||
| 124 | + > - ablock(1,4)*bblock(4,3) | ||
| 125 | + > - ablock(1,5)*bblock(5,3) | ||
| 126 | + cblock(2,3) = cblock(2,3) - ablock(2,1)*bblock(1,3) | ||
| 127 | + > - ablock(2,2)*bblock(2,3) | ||
| 128 | + > - ablock(2,3)*bblock(3,3) | ||
| 129 | + > - ablock(2,4)*bblock(4,3) | ||
| 130 | + > - ablock(2,5)*bblock(5,3) | ||
| 131 | + cblock(3,3) = cblock(3,3) - ablock(3,1)*bblock(1,3) | ||
| 132 | + > - ablock(3,2)*bblock(2,3) | ||
| 133 | + > - ablock(3,3)*bblock(3,3) | ||
| 134 | + > - ablock(3,4)*bblock(4,3) | ||
| 135 | + > - ablock(3,5)*bblock(5,3) | ||
| 136 | + cblock(4,3) = cblock(4,3) - ablock(4,1)*bblock(1,3) | ||
| 137 | + > - ablock(4,2)*bblock(2,3) | ||
| 138 | + > - ablock(4,3)*bblock(3,3) | ||
| 139 | + > - ablock(4,4)*bblock(4,3) | ||
| 140 | + > - ablock(4,5)*bblock(5,3) | ||
| 141 | + cblock(5,3) = cblock(5,3) - ablock(5,1)*bblock(1,3) | ||
| 142 | + > - ablock(5,2)*bblock(2,3) | ||
| 143 | + > - ablock(5,3)*bblock(3,3) | ||
| 144 | + > - ablock(5,4)*bblock(4,3) | ||
| 145 | + > - ablock(5,5)*bblock(5,3) | ||
| 146 | + cblock(1,4) = cblock(1,4) - ablock(1,1)*bblock(1,4) | ||
| 147 | + > - ablock(1,2)*bblock(2,4) | ||
| 148 | + > - ablock(1,3)*bblock(3,4) | ||
| 149 | + > - ablock(1,4)*bblock(4,4) | ||
| 150 | + > - ablock(1,5)*bblock(5,4) | ||
| 151 | + cblock(2,4) = cblock(2,4) - ablock(2,1)*bblock(1,4) | ||
| 152 | + > - ablock(2,2)*bblock(2,4) | ||
| 153 | + > - ablock(2,3)*bblock(3,4) | ||
| 154 | + > - ablock(2,4)*bblock(4,4) | ||
| 155 | + > - ablock(2,5)*bblock(5,4) | ||
| 156 | + cblock(3,4) = cblock(3,4) - ablock(3,1)*bblock(1,4) | ||
| 157 | + > - ablock(3,2)*bblock(2,4) | ||
| 158 | + > - ablock(3,3)*bblock(3,4) | ||
| 159 | + > - ablock(3,4)*bblock(4,4) | ||
| 160 | + > - ablock(3,5)*bblock(5,4) | ||
| 161 | + cblock(4,4) = cblock(4,4) - ablock(4,1)*bblock(1,4) | ||
| 162 | + > - ablock(4,2)*bblock(2,4) | ||
| 163 | + > - ablock(4,3)*bblock(3,4) | ||
| 164 | + > - ablock(4,4)*bblock(4,4) | ||
| 165 | + > - ablock(4,5)*bblock(5,4) | ||
| 166 | + cblock(5,4) = cblock(5,4) - ablock(5,1)*bblock(1,4) | ||
| 167 | + > - ablock(5,2)*bblock(2,4) | ||
| 168 | + > - ablock(5,3)*bblock(3,4) | ||
| 169 | + > - ablock(5,4)*bblock(4,4) | ||
| 170 | + > - ablock(5,5)*bblock(5,4) | ||
| 171 | + cblock(1,5) = cblock(1,5) - ablock(1,1)*bblock(1,5) | ||
| 172 | + > - ablock(1,2)*bblock(2,5) | ||
| 173 | + > - ablock(1,3)*bblock(3,5) | ||
| 174 | + > - ablock(1,4)*bblock(4,5) | ||
| 175 | + > - ablock(1,5)*bblock(5,5) | ||
| 176 | + cblock(2,5) = cblock(2,5) - ablock(2,1)*bblock(1,5) | ||
| 177 | + > - ablock(2,2)*bblock(2,5) | ||
| 178 | + > - ablock(2,3)*bblock(3,5) | ||
| 179 | + > - ablock(2,4)*bblock(4,5) | ||
| 180 | + > - ablock(2,5)*bblock(5,5) | ||
| 181 | + cblock(3,5) = cblock(3,5) - ablock(3,1)*bblock(1,5) | ||
| 182 | + > - ablock(3,2)*bblock(2,5) | ||
| 183 | + > - ablock(3,3)*bblock(3,5) | ||
| 184 | + > - ablock(3,4)*bblock(4,5) | ||
| 185 | + > - ablock(3,5)*bblock(5,5) | ||
| 186 | + cblock(4,5) = cblock(4,5) - ablock(4,1)*bblock(1,5) | ||
| 187 | + > - ablock(4,2)*bblock(2,5) | ||
| 188 | + > - ablock(4,3)*bblock(3,5) | ||
| 189 | + > - ablock(4,4)*bblock(4,5) | ||
| 190 | + > - ablock(4,5)*bblock(5,5) | ||
| 191 | + cblock(5,5) = cblock(5,5) - ablock(5,1)*bblock(1,5) | ||
| 192 | + > - ablock(5,2)*bblock(2,5) | ||
| 193 | + > - ablock(5,3)*bblock(3,5) | ||
| 194 | + > - ablock(5,4)*bblock(4,5) | ||
| 195 | + > - ablock(5,5)*bblock(5,5) | ||
| 196 | + | ||
| 197 | + | ||
| 198 | + return | ||
| 199 | + end | ||
| 200 | + | ||
| 201 | + | ||
| 202 | + | ||
| 203 | +c--------------------------------------------------------------------- | ||
| 204 | +c--------------------------------------------------------------------- | ||
| 205 | + | ||
| 206 | + subroutine binvcrhs( lhs,c,r ) | ||
| 207 | + | ||
| 208 | +c--------------------------------------------------------------------- | ||
| 209 | +c--------------------------------------------------------------------- | ||
| 210 | + | ||
| 211 | +c--------------------------------------------------------------------- | ||
| 212 | +c | ||
| 213 | +c--------------------------------------------------------------------- | ||
| 214 | + | ||
| 215 | + implicit none | ||
| 216 | + | ||
| 217 | + double precision pivot, coeff, lhs | ||
| 218 | + dimension lhs(5,5) | ||
| 219 | + double precision c(5,5), r(5) | ||
| 220 | + | ||
| 221 | +c--------------------------------------------------------------------- | ||
| 222 | +c | ||
| 223 | +c--------------------------------------------------------------------- | ||
| 224 | + | ||
| 225 | + pivot = 1.00d0/lhs(1,1) | ||
| 226 | + lhs(1,2) = lhs(1,2)*pivot | ||
| 227 | + lhs(1,3) = lhs(1,3)*pivot | ||
| 228 | + lhs(1,4) = lhs(1,4)*pivot | ||
| 229 | + lhs(1,5) = lhs(1,5)*pivot | ||
| 230 | + c(1,1) = c(1,1)*pivot | ||
| 231 | + c(1,2) = c(1,2)*pivot | ||
| 232 | + c(1,3) = c(1,3)*pivot | ||
| 233 | + c(1,4) = c(1,4)*pivot | ||
| 234 | + c(1,5) = c(1,5)*pivot | ||
| 235 | + r(1) = r(1) *pivot | ||
| 236 | + | ||
| 237 | + coeff = lhs(2,1) | ||
| 238 | + lhs(2,2)= lhs(2,2) - coeff*lhs(1,2) | ||
| 239 | + lhs(2,3)= lhs(2,3) - coeff*lhs(1,3) | ||
| 240 | + lhs(2,4)= lhs(2,4) - coeff*lhs(1,4) | ||
| 241 | + lhs(2,5)= lhs(2,5) - coeff*lhs(1,5) | ||
| 242 | + c(2,1) = c(2,1) - coeff*c(1,1) | ||
| 243 | + c(2,2) = c(2,2) - coeff*c(1,2) | ||
| 244 | + c(2,3) = c(2,3) - coeff*c(1,3) | ||
| 245 | + c(2,4) = c(2,4) - coeff*c(1,4) | ||
| 246 | + c(2,5) = c(2,5) - coeff*c(1,5) | ||
| 247 | + r(2) = r(2) - coeff*r(1) | ||
| 248 | + | ||
| 249 | + coeff = lhs(3,1) | ||
| 250 | + lhs(3,2)= lhs(3,2) - coeff*lhs(1,2) | ||
| 251 | + lhs(3,3)= lhs(3,3) - coeff*lhs(1,3) | ||
| 252 | + lhs(3,4)= lhs(3,4) - coeff*lhs(1,4) | ||
| 253 | + lhs(3,5)= lhs(3,5) - coeff*lhs(1,5) | ||
| 254 | + c(3,1) = c(3,1) - coeff*c(1,1) | ||
| 255 | + c(3,2) = c(3,2) - coeff*c(1,2) | ||
| 256 | + c(3,3) = c(3,3) - coeff*c(1,3) | ||
| 257 | + c(3,4) = c(3,4) - coeff*c(1,4) | ||
| 258 | + c(3,5) = c(3,5) - coeff*c(1,5) | ||
| 259 | + r(3) = r(3) - coeff*r(1) | ||
| 260 | + | ||
| 261 | + coeff = lhs(4,1) | ||
| 262 | + lhs(4,2)= lhs(4,2) - coeff*lhs(1,2) | ||
| 263 | + lhs(4,3)= lhs(4,3) - coeff*lhs(1,3) | ||
| 264 | + lhs(4,4)= lhs(4,4) - coeff*lhs(1,4) | ||
| 265 | + lhs(4,5)= lhs(4,5) - coeff*lhs(1,5) | ||
| 266 | + c(4,1) = c(4,1) - coeff*c(1,1) | ||
| 267 | + c(4,2) = c(4,2) - coeff*c(1,2) | ||
| 268 | + c(4,3) = c(4,3) - coeff*c(1,3) | ||
| 269 | + c(4,4) = c(4,4) - coeff*c(1,4) | ||
| 270 | + c(4,5) = c(4,5) - coeff*c(1,5) | ||
| 271 | + r(4) = r(4) - coeff*r(1) | ||
| 272 | + | ||
| 273 | + coeff = lhs(5,1) | ||
| 274 | + lhs(5,2)= lhs(5,2) - coeff*lhs(1,2) | ||
| 275 | + lhs(5,3)= lhs(5,3) - coeff*lhs(1,3) | ||
| 276 | + lhs(5,4)= lhs(5,4) - coeff*lhs(1,4) | ||
| 277 | + lhs(5,5)= lhs(5,5) - coeff*lhs(1,5) | ||
| 278 | + c(5,1) = c(5,1) - coeff*c(1,1) | ||
| 279 | + c(5,2) = c(5,2) - coeff*c(1,2) | ||
| 280 | + c(5,3) = c(5,3) - coeff*c(1,3) | ||
| 281 | + c(5,4) = c(5,4) - coeff*c(1,4) | ||
| 282 | + c(5,5) = c(5,5) - coeff*c(1,5) | ||
| 283 | + r(5) = r(5) - coeff*r(1) | ||
| 284 | + | ||
| 285 | + | ||
| 286 | + pivot = 1.00d0/lhs(2,2) | ||
| 287 | + lhs(2,3) = lhs(2,3)*pivot | ||
| 288 | + lhs(2,4) = lhs(2,4)*pivot | ||
| 289 | + lhs(2,5) = lhs(2,5)*pivot | ||
| 290 | + c(2,1) = c(2,1)*pivot | ||
| 291 | + c(2,2) = c(2,2)*pivot | ||
| 292 | + c(2,3) = c(2,3)*pivot | ||
| 293 | + c(2,4) = c(2,4)*pivot | ||
| 294 | + c(2,5) = c(2,5)*pivot | ||
| 295 | + r(2) = r(2) *pivot | ||
| 296 | + | ||
| 297 | + coeff = lhs(1,2) | ||
| 298 | + lhs(1,3)= lhs(1,3) - coeff*lhs(2,3) | ||
| 299 | + lhs(1,4)= lhs(1,4) - coeff*lhs(2,4) | ||
| 300 | + lhs(1,5)= lhs(1,5) - coeff*lhs(2,5) | ||
| 301 | + c(1,1) = c(1,1) - coeff*c(2,1) | ||
| 302 | + c(1,2) = c(1,2) - coeff*c(2,2) | ||
| 303 | + c(1,3) = c(1,3) - coeff*c(2,3) | ||
| 304 | + c(1,4) = c(1,4) - coeff*c(2,4) | ||
| 305 | + c(1,5) = c(1,5) - coeff*c(2,5) | ||
| 306 | + r(1) = r(1) - coeff*r(2) | ||
| 307 | + | ||
| 308 | + coeff = lhs(3,2) | ||
| 309 | + lhs(3,3)= lhs(3,3) - coeff*lhs(2,3) | ||
| 310 | + lhs(3,4)= lhs(3,4) - coeff*lhs(2,4) | ||
| 311 | + lhs(3,5)= lhs(3,5) - coeff*lhs(2,5) | ||
| 312 | + c(3,1) = c(3,1) - coeff*c(2,1) | ||
| 313 | + c(3,2) = c(3,2) - coeff*c(2,2) | ||
| 314 | + c(3,3) = c(3,3) - coeff*c(2,3) | ||
| 315 | + c(3,4) = c(3,4) - coeff*c(2,4) | ||
| 316 | + c(3,5) = c(3,5) - coeff*c(2,5) | ||
| 317 | + r(3) = r(3) - coeff*r(2) | ||
| 318 | + | ||
| 319 | + coeff = lhs(4,2) | ||
| 320 | + lhs(4,3)= lhs(4,3) - coeff*lhs(2,3) | ||
| 321 | + lhs(4,4)= lhs(4,4) - coeff*lhs(2,4) | ||
| 322 | + lhs(4,5)= lhs(4,5) - coeff*lhs(2,5) | ||
| 323 | + c(4,1) = c(4,1) - coeff*c(2,1) | ||
| 324 | + c(4,2) = c(4,2) - coeff*c(2,2) | ||
| 325 | + c(4,3) = c(4,3) - coeff*c(2,3) | ||
| 326 | + c(4,4) = c(4,4) - coeff*c(2,4) | ||
| 327 | + c(4,5) = c(4,5) - coeff*c(2,5) | ||
| 328 | + r(4) = r(4) - coeff*r(2) | ||
| 329 | + | ||
| 330 | + coeff = lhs(5,2) | ||
| 331 | + lhs(5,3)= lhs(5,3) - coeff*lhs(2,3) | ||
| 332 | + lhs(5,4)= lhs(5,4) - coeff*lhs(2,4) | ||
| 333 | + lhs(5,5)= lhs(5,5) - coeff*lhs(2,5) | ||
| 334 | + c(5,1) = c(5,1) - coeff*c(2,1) | ||
| 335 | + c(5,2) = c(5,2) - coeff*c(2,2) | ||
| 336 | + c(5,3) = c(5,3) - coeff*c(2,3) | ||
| 337 | + c(5,4) = c(5,4) - coeff*c(2,4) | ||
| 338 | + c(5,5) = c(5,5) - coeff*c(2,5) | ||
| 339 | + r(5) = r(5) - coeff*r(2) | ||
| 340 | + | ||
| 341 | + | ||
| 342 | + pivot = 1.00d0/lhs(3,3) | ||
| 343 | + lhs(3,4) = lhs(3,4)*pivot | ||
| 344 | + lhs(3,5) = lhs(3,5)*pivot | ||
| 345 | + c(3,1) = c(3,1)*pivot | ||
| 346 | + c(3,2) = c(3,2)*pivot | ||
| 347 | + c(3,3) = c(3,3)*pivot | ||
| 348 | + c(3,4) = c(3,4)*pivot | ||
| 349 | + c(3,5) = c(3,5)*pivot | ||
| 350 | + r(3) = r(3) *pivot | ||
| 351 | + | ||
| 352 | + coeff = lhs(1,3) | ||
| 353 | + lhs(1,4)= lhs(1,4) - coeff*lhs(3,4) | ||
| 354 | + lhs(1,5)= lhs(1,5) - coeff*lhs(3,5) | ||
| 355 | + c(1,1) = c(1,1) - coeff*c(3,1) | ||
| 356 | + c(1,2) = c(1,2) - coeff*c(3,2) | ||
| 357 | + c(1,3) = c(1,3) - coeff*c(3,3) | ||
| 358 | + c(1,4) = c(1,4) - coeff*c(3,4) | ||
| 359 | + c(1,5) = c(1,5) - coeff*c(3,5) | ||
| 360 | + r(1) = r(1) - coeff*r(3) | ||
| 361 | + | ||
| 362 | + coeff = lhs(2,3) | ||
| 363 | + lhs(2,4)= lhs(2,4) - coeff*lhs(3,4) | ||
| 364 | + lhs(2,5)= lhs(2,5) - coeff*lhs(3,5) | ||
| 365 | + c(2,1) = c(2,1) - coeff*c(3,1) | ||
| 366 | + c(2,2) = c(2,2) - coeff*c(3,2) | ||
| 367 | + c(2,3) = c(2,3) - coeff*c(3,3) | ||
| 368 | + c(2,4) = c(2,4) - coeff*c(3,4) | ||
| 369 | + c(2,5) = c(2,5) - coeff*c(3,5) | ||
| 370 | + r(2) = r(2) - coeff*r(3) | ||
| 371 | + | ||
| 372 | + coeff = lhs(4,3) | ||
| 373 | + lhs(4,4)= lhs(4,4) - coeff*lhs(3,4) | ||
| 374 | + lhs(4,5)= lhs(4,5) - coeff*lhs(3,5) | ||
| 375 | + c(4,1) = c(4,1) - coeff*c(3,1) | ||
| 376 | + c(4,2) = c(4,2) - coeff*c(3,2) | ||
| 377 | + c(4,3) = c(4,3) - coeff*c(3,3) | ||
| 378 | + c(4,4) = c(4,4) - coeff*c(3,4) | ||
| 379 | + c(4,5) = c(4,5) - coeff*c(3,5) | ||
| 380 | + r(4) = r(4) - coeff*r(3) | ||
| 381 | + | ||
| 382 | + coeff = lhs(5,3) | ||
| 383 | + lhs(5,4)= lhs(5,4) - coeff*lhs(3,4) | ||
| 384 | + lhs(5,5)= lhs(5,5) - coeff*lhs(3,5) | ||
| 385 | + c(5,1) = c(5,1) - coeff*c(3,1) | ||
| 386 | + c(5,2) = c(5,2) - coeff*c(3,2) | ||
| 387 | + c(5,3) = c(5,3) - coeff*c(3,3) | ||
| 388 | + c(5,4) = c(5,4) - coeff*c(3,4) | ||
| 389 | + c(5,5) = c(5,5) - coeff*c(3,5) | ||
| 390 | + r(5) = r(5) - coeff*r(3) | ||
| 391 | + | ||
| 392 | + | ||
| 393 | + pivot = 1.00d0/lhs(4,4) | ||
| 394 | + lhs(4,5) = lhs(4,5)*pivot | ||
| 395 | + c(4,1) = c(4,1)*pivot | ||
| 396 | + c(4,2) = c(4,2)*pivot | ||
| 397 | + c(4,3) = c(4,3)*pivot | ||
| 398 | + c(4,4) = c(4,4)*pivot | ||
| 399 | + c(4,5) = c(4,5)*pivot | ||
| 400 | + r(4) = r(4) *pivot | ||
| 401 | + | ||
| 402 | + coeff = lhs(1,4) | ||
| 403 | + lhs(1,5)= lhs(1,5) - coeff*lhs(4,5) | ||
| 404 | + c(1,1) = c(1,1) - coeff*c(4,1) | ||
| 405 | + c(1,2) = c(1,2) - coeff*c(4,2) | ||
| 406 | + c(1,3) = c(1,3) - coeff*c(4,3) | ||
| 407 | + c(1,4) = c(1,4) - coeff*c(4,4) | ||
| 408 | + c(1,5) = c(1,5) - coeff*c(4,5) | ||
| 409 | + r(1) = r(1) - coeff*r(4) | ||
| 410 | + | ||
| 411 | + coeff = lhs(2,4) | ||
| 412 | + lhs(2,5)= lhs(2,5) - coeff*lhs(4,5) | ||
| 413 | + c(2,1) = c(2,1) - coeff*c(4,1) | ||
| 414 | + c(2,2) = c(2,2) - coeff*c(4,2) | ||
| 415 | + c(2,3) = c(2,3) - coeff*c(4,3) | ||
| 416 | + c(2,4) = c(2,4) - coeff*c(4,4) | ||
| 417 | + c(2,5) = c(2,5) - coeff*c(4,5) | ||
| 418 | + r(2) = r(2) - coeff*r(4) | ||
| 419 | + | ||
| 420 | + coeff = lhs(3,4) | ||
| 421 | + lhs(3,5)= lhs(3,5) - coeff*lhs(4,5) | ||
| 422 | + c(3,1) = c(3,1) - coeff*c(4,1) | ||
| 423 | + c(3,2) = c(3,2) - coeff*c(4,2) | ||
| 424 | + c(3,3) = c(3,3) - coeff*c(4,3) | ||
| 425 | + c(3,4) = c(3,4) - coeff*c(4,4) | ||
| 426 | + c(3,5) = c(3,5) - coeff*c(4,5) | ||
| 427 | + r(3) = r(3) - coeff*r(4) | ||
| 428 | + | ||
| 429 | + coeff = lhs(5,4) | ||
| 430 | + lhs(5,5)= lhs(5,5) - coeff*lhs(4,5) | ||
| 431 | + c(5,1) = c(5,1) - coeff*c(4,1) | ||
| 432 | + c(5,2) = c(5,2) - coeff*c(4,2) | ||
| 433 | + c(5,3) = c(5,3) - coeff*c(4,3) | ||
| 434 | + c(5,4) = c(5,4) - coeff*c(4,4) | ||
| 435 | + c(5,5) = c(5,5) - coeff*c(4,5) | ||
| 436 | + r(5) = r(5) - coeff*r(4) | ||
| 437 | + | ||
| 438 | + | ||
| 439 | + pivot = 1.00d0/lhs(5,5) | ||
| 440 | + c(5,1) = c(5,1)*pivot | ||
| 441 | + c(5,2) = c(5,2)*pivot | ||
| 442 | + c(5,3) = c(5,3)*pivot | ||
| 443 | + c(5,4) = c(5,4)*pivot | ||
| 444 | + c(5,5) = c(5,5)*pivot | ||
| 445 | + r(5) = r(5) *pivot | ||
| 446 | + | ||
| 447 | + coeff = lhs(1,5) | ||
| 448 | + c(1,1) = c(1,1) - coeff*c(5,1) | ||
| 449 | + c(1,2) = c(1,2) - coeff*c(5,2) | ||
| 450 | + c(1,3) = c(1,3) - coeff*c(5,3) | ||
| 451 | + c(1,4) = c(1,4) - coeff*c(5,4) | ||
| 452 | + c(1,5) = c(1,5) - coeff*c(5,5) | ||
| 453 | + r(1) = r(1) - coeff*r(5) | ||
| 454 | + | ||
| 455 | + coeff = lhs(2,5) | ||
| 456 | + c(2,1) = c(2,1) - coeff*c(5,1) | ||
| 457 | + c(2,2) = c(2,2) - coeff*c(5,2) | ||
| 458 | + c(2,3) = c(2,3) - coeff*c(5,3) | ||
| 459 | + c(2,4) = c(2,4) - coeff*c(5,4) | ||
| 460 | + c(2,5) = c(2,5) - coeff*c(5,5) | ||
| 461 | + r(2) = r(2) - coeff*r(5) | ||
| 462 | + | ||
| 463 | + coeff = lhs(3,5) | ||
| 464 | + c(3,1) = c(3,1) - coeff*c(5,1) | ||
| 465 | + c(3,2) = c(3,2) - coeff*c(5,2) | ||
| 466 | + c(3,3) = c(3,3) - coeff*c(5,3) | ||
| 467 | + c(3,4) = c(3,4) - coeff*c(5,4) | ||
| 468 | + c(3,5) = c(3,5) - coeff*c(5,5) | ||
| 469 | + r(3) = r(3) - coeff*r(5) | ||
| 470 | + | ||
| 471 | + coeff = lhs(4,5) | ||
| 472 | + c(4,1) = c(4,1) - coeff*c(5,1) | ||
| 473 | + c(4,2) = c(4,2) - coeff*c(5,2) | ||
| 474 | + c(4,3) = c(4,3) - coeff*c(5,3) | ||
| 475 | + c(4,4) = c(4,4) - coeff*c(5,4) | ||
| 476 | + c(4,5) = c(4,5) - coeff*c(5,5) | ||
| 477 | + r(4) = r(4) - coeff*r(5) | ||
| 478 | + | ||
| 479 | + | ||
| 480 | + return | ||
| 481 | + end | ||
| 482 | + | ||
| 483 | + | ||
| 484 | + | ||
| 485 | +c--------------------------------------------------------------------- | ||
| 486 | +c--------------------------------------------------------------------- | ||
| 487 | + | ||
| 488 | + subroutine binvrhs( lhs,r ) | ||
| 489 | + | ||
| 490 | +c--------------------------------------------------------------------- | ||
| 491 | +c--------------------------------------------------------------------- | ||
| 492 | + | ||
| 493 | +c--------------------------------------------------------------------- | ||
| 494 | +c | ||
| 495 | +c--------------------------------------------------------------------- | ||
| 496 | + | ||
| 497 | + implicit none | ||
| 498 | + | ||
| 499 | + double precision pivot, coeff, lhs | ||
| 500 | + dimension lhs(5,5) | ||
| 501 | + double precision r(5) | ||
| 502 | + | ||
| 503 | +c--------------------------------------------------------------------- | ||
| 504 | +c | ||
| 505 | +c--------------------------------------------------------------------- | ||
| 506 | + | ||
| 507 | + | ||
| 508 | + pivot = 1.00d0/lhs(1,1) | ||
| 509 | + lhs(1,2) = lhs(1,2)*pivot | ||
| 510 | + lhs(1,3) = lhs(1,3)*pivot | ||
| 511 | + lhs(1,4) = lhs(1,4)*pivot | ||
| 512 | + lhs(1,5) = lhs(1,5)*pivot | ||
| 513 | + r(1) = r(1) *pivot | ||
| 514 | + | ||
| 515 | + coeff = lhs(2,1) | ||
| 516 | + lhs(2,2)= lhs(2,2) - coeff*lhs(1,2) | ||
| 517 | + lhs(2,3)= lhs(2,3) - coeff*lhs(1,3) | ||
| 518 | + lhs(2,4)= lhs(2,4) - coeff*lhs(1,4) | ||
| 519 | + lhs(2,5)= lhs(2,5) - coeff*lhs(1,5) | ||
| 520 | + r(2) = r(2) - coeff*r(1) | ||
| 521 | + | ||
| 522 | + coeff = lhs(3,1) | ||
| 523 | + lhs(3,2)= lhs(3,2) - coeff*lhs(1,2) | ||
| 524 | + lhs(3,3)= lhs(3,3) - coeff*lhs(1,3) | ||
| 525 | + lhs(3,4)= lhs(3,4) - coeff*lhs(1,4) | ||
| 526 | + lhs(3,5)= lhs(3,5) - coeff*lhs(1,5) | ||
| 527 | + r(3) = r(3) - coeff*r(1) | ||
| 528 | + | ||
| 529 | + coeff = lhs(4,1) | ||
| 530 | + lhs(4,2)= lhs(4,2) - coeff*lhs(1,2) | ||
| 531 | + lhs(4,3)= lhs(4,3) - coeff*lhs(1,3) | ||
| 532 | + lhs(4,4)= lhs(4,4) - coeff*lhs(1,4) | ||
| 533 | + lhs(4,5)= lhs(4,5) - coeff*lhs(1,5) | ||
| 534 | + r(4) = r(4) - coeff*r(1) | ||
| 535 | + | ||
| 536 | + coeff = lhs(5,1) | ||
| 537 | + lhs(5,2)= lhs(5,2) - coeff*lhs(1,2) | ||
| 538 | + lhs(5,3)= lhs(5,3) - coeff*lhs(1,3) | ||
| 539 | + lhs(5,4)= lhs(5,4) - coeff*lhs(1,4) | ||
| 540 | + lhs(5,5)= lhs(5,5) - coeff*lhs(1,5) | ||
| 541 | + r(5) = r(5) - coeff*r(1) | ||
| 542 | + | ||
| 543 | + | ||
| 544 | + pivot = 1.00d0/lhs(2,2) | ||
| 545 | + lhs(2,3) = lhs(2,3)*pivot | ||
| 546 | + lhs(2,4) = lhs(2,4)*pivot | ||
| 547 | + lhs(2,5) = lhs(2,5)*pivot | ||
| 548 | + r(2) = r(2) *pivot | ||
| 549 | + | ||
| 550 | + coeff = lhs(1,2) | ||
| 551 | + lhs(1,3)= lhs(1,3) - coeff*lhs(2,3) | ||
| 552 | + lhs(1,4)= lhs(1,4) - coeff*lhs(2,4) | ||
| 553 | + lhs(1,5)= lhs(1,5) - coeff*lhs(2,5) | ||
| 554 | + r(1) = r(1) - coeff*r(2) | ||
| 555 | + | ||
| 556 | + coeff = lhs(3,2) | ||
| 557 | + lhs(3,3)= lhs(3,3) - coeff*lhs(2,3) | ||
| 558 | + lhs(3,4)= lhs(3,4) - coeff*lhs(2,4) | ||
| 559 | + lhs(3,5)= lhs(3,5) - coeff*lhs(2,5) | ||
| 560 | + r(3) = r(3) - coeff*r(2) | ||
| 561 | + | ||
| 562 | + coeff = lhs(4,2) | ||
| 563 | + lhs(4,3)= lhs(4,3) - coeff*lhs(2,3) | ||
| 564 | + lhs(4,4)= lhs(4,4) - coeff*lhs(2,4) | ||
| 565 | + lhs(4,5)= lhs(4,5) - coeff*lhs(2,5) | ||
| 566 | + r(4) = r(4) - coeff*r(2) | ||
| 567 | + | ||
| 568 | + coeff = lhs(5,2) | ||
| 569 | + lhs(5,3)= lhs(5,3) - coeff*lhs(2,3) | ||
| 570 | + lhs(5,4)= lhs(5,4) - coeff*lhs(2,4) | ||
| 571 | + lhs(5,5)= lhs(5,5) - coeff*lhs(2,5) | ||
| 572 | + r(5) = r(5) - coeff*r(2) | ||
| 573 | + | ||
| 574 | + | ||
| 575 | + pivot = 1.00d0/lhs(3,3) | ||
| 576 | + lhs(3,4) = lhs(3,4)*pivot | ||
| 577 | + lhs(3,5) = lhs(3,5)*pivot | ||
| 578 | + r(3) = r(3) *pivot | ||
| 579 | + | ||
| 580 | + coeff = lhs(1,3) | ||
| 581 | + lhs(1,4)= lhs(1,4) - coeff*lhs(3,4) | ||
| 582 | + lhs(1,5)= lhs(1,5) - coeff*lhs(3,5) | ||
| 583 | + r(1) = r(1) - coeff*r(3) | ||
| 584 | + | ||
| 585 | + coeff = lhs(2,3) | ||
| 586 | + lhs(2,4)= lhs(2,4) - coeff*lhs(3,4) | ||
| 587 | + lhs(2,5)= lhs(2,5) - coeff*lhs(3,5) | ||
| 588 | + r(2) = r(2) - coeff*r(3) | ||
| 589 | + | ||
| 590 | + coeff = lhs(4,3) | ||
| 591 | + lhs(4,4)= lhs(4,4) - coeff*lhs(3,4) | ||
| 592 | + lhs(4,5)= lhs(4,5) - coeff*lhs(3,5) | ||
| 593 | + r(4) = r(4) - coeff*r(3) | ||
| 594 | + | ||
| 595 | + coeff = lhs(5,3) | ||
| 596 | + lhs(5,4)= lhs(5,4) - coeff*lhs(3,4) | ||
| 597 | + lhs(5,5)= lhs(5,5) - coeff*lhs(3,5) | ||
| 598 | + r(5) = r(5) - coeff*r(3) | ||
| 599 | + | ||
| 600 | + | ||
| 601 | + pivot = 1.00d0/lhs(4,4) | ||
| 602 | + lhs(4,5) = lhs(4,5)*pivot | ||
| 603 | + r(4) = r(4) *pivot | ||
| 604 | + | ||
| 605 | + coeff = lhs(1,4) | ||
| 606 | + lhs(1,5)= lhs(1,5) - coeff*lhs(4,5) | ||
| 607 | + r(1) = r(1) - coeff*r(4) | ||
| 608 | + | ||
| 609 | + coeff = lhs(2,4) | ||
| 610 | + lhs(2,5)= lhs(2,5) - coeff*lhs(4,5) | ||
| 611 | + r(2) = r(2) - coeff*r(4) | ||
| 612 | + | ||
| 613 | + coeff = lhs(3,4) | ||
| 614 | + lhs(3,5)= lhs(3,5) - coeff*lhs(4,5) | ||
| 615 | + r(3) = r(3) - coeff*r(4) | ||
| 616 | + | ||
| 617 | + coeff = lhs(5,4) | ||
| 618 | + lhs(5,5)= lhs(5,5) - coeff*lhs(4,5) | ||
| 619 | + r(5) = r(5) - coeff*r(4) | ||
| 620 | + | ||
| 621 | + | ||
| 622 | + pivot = 1.00d0/lhs(5,5) | ||
| 623 | + r(5) = r(5) *pivot | ||
| 624 | + | ||
| 625 | + coeff = lhs(1,5) | ||
| 626 | + r(1) = r(1) - coeff*r(5) | ||
| 627 | + | ||
| 628 | + coeff = lhs(2,5) | ||
| 629 | + r(2) = r(2) - coeff*r(5) | ||
| 630 | + | ||
| 631 | + coeff = lhs(3,5) | ||
| 632 | + r(3) = r(3) - coeff*r(5) | ||
| 633 | + | ||
| 634 | + coeff = lhs(4,5) | ||
| 635 | + r(4) = r(4) - coeff*r(5) | ||
| 636 | + | ||
| 637 | + | ||
| 638 | + return | ||
| 639 | + end | ||
| 640 | + | ||
| 641 | + | ||
| 642 | + |