###########################################
# Modification History of NPB3.x          #
# ------------------------------          #
#   NPB development team                  #
#   NASA Ames Research Center             #
#   npb@nas.nasa.gov                      #
#   http://www.nas.nasa.gov/Software/NPB/ #
###########################################

------------------------------------------------------
Changes in NPB3.3.1
      (NPB3.3-SER, NPB3.3-OMP, NPB3.3-MPI )
------------------------------------------------------
[17-Feb-09]

This is a bug-fix release of NPB3.3.

1. All versions

 - sys/setparams.c: fixed a problem in dealing with quoted (") flags
   from make.def when producing npbparams.h for C.

 - CG: ensure 'implicit none' used in all subroutines.

2. MPI version

 - Additional timers can be used for profiling purposes, similar
   to those already included in the OMP and SER versions.

 - LU:
   * code clean up (suggested by Rob Van der Wijngaart)
      > avoid using MPI_ANY_SOURCE in exchange_*.f, which might 
        alter performance in some cases.
      > delete references to sethyper and 'icomm*', which are 
        no longer used since NPB2.2.
   * change the low-bound limit on the sub-domain size in subdomain.f
     from 4 to 3 in order to increase allowable process counts.
   * allow number of processes other than power of two.

 - FT: fix a non-portable way of broadcasting input parameters
      (pointed out by Art Lazanoff)

 - BT: include 'btio_cleanup' as part of the I/O timing

3. OMP and SER versions

 - DC: fix access to out-of-bound array elements in adc.c
      Reported by Per Larsen of Denmark <pl@imm.dtu.dk>

 - UA: fix the use of uninitialized array 'sje' in mortar_vertex() by
      adding "call nr_init[_omp](sje,4*6*nelt,0)" in the main program.

 - MG, UA: include additional timers for profiling purposes.

 - Executables now use ".x" as a name extension


------------------------------------------------------
Changes in NPB3.3
      (NPB3.3-SER, NPB3.3-OMP, NPB3.3-MPI )
------------------------------------------------------
[02-Aug-07]

1. New features and improvements

 - The Class E problem has been introduced in seven of the benchmarks
   (BT, SP, LU, CG, MG, FT, and EP) in all three implementations.

 - The Class D problem has been added to the IS benchmark in all
   three implementations.  It requires compiler support for a
   64-bit "long" type in C.  The MPI version of IS now allows runs
   up to 1024 processes.

 - The Bucket Sort option (USE_BUCKETS) has been added to
   the OpenMP version of IS and made the default.

 - Introduced the "twiddle" array in the OpenMP FT benchmark,
   which has been used in the MPI and SER versions and seems 
   to improve performance for larger problem sizes.

 - Merged vector codes for the BT and LU benchmarks into
   the release.

 - Updates to BTIO (MPI/BT with IO subtypes):
    * added I/O stats (I/O timing, data size written, I/O data rate)
    * added an option for interleaving reads between writes through
      the inputbt.data file.  Although the data file size would be
      smaller as a result, the total amount of data written is still
      the same.

 - Made documents more consistent throughout different versions
   (README and README.install).

2. Bug fixes

 - MPI/FT: fixed a verification failure for cases where NX/=NY
   and the 2D decomposition is used.  The bug occurred at least
   for (Class D, NPROCS=2048) and (Class B, NPROCS=512).

   fixed an output printing format problem that occurred when
   the number of processes >= 1000.

 - MPI/SP: fixed a performance regression due to improper
   padding of array dimensions.

 - MPI/IS: minor fix to support large processor counts (>=512).

 - OMP/UA: fixed a race condition in mason.f, avoided the use 
   of the LASTPRIVATE directive.

 - OMP/LU: minor fix in data flushing for pipelining.

 - DC: There are a number of fixes -
   * fixed segmentation fault in both OMP and SER versions
     caused by accessing zero-length array elements.
     Reported by Jeff Odom <jodom@cs.umd.edu>.

   * fixed a race in reporting benchmark timing in the OMP version

   * fixed the use of the timer in the OMP version, which limited
     the number of threads to 64.  The limit on the number of
     threads is now raised to MAX_NUMBER_OF_TASKS (=256).

   * made the benchmark output consistent with other NPBs.

 - fixed the use of an uninitialized variable in MPI/sys/setparams.c.
   setparams in all three versions was updated to deal with a
   make.def that contains carriage-return characters ('\r').

 - SER/FT: added 'implicit none' wherever it was missing.

 - SER/IS: fixed missing variable declarations for the Bucket 
   Sort option (when USE_BUCKETS is defined).
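
The carriage-return handling in setparams noted above can be
illustrated with a short sketch.  This is not the code from
sys/setparams.c (which is written in C); it is a minimal Python
illustration, and the function names and sample values are
hypothetical.  It only shows why a trailing '\r' has to be removed
before a make.def value is used.

```python
def clean_makedef_line(raw: str) -> str:
    """Remove the line terminator, tolerating CRLF endings."""
    return raw.rstrip("\r\n")


def parse_makedef(text: str) -> dict:
    """Collect NAME = value pairs, skipping blank lines and '#' comments.

    Hypothetical helper for illustration only.
    """
    settings = {}
    for raw in text.split("\n"):
        # Only spaces and tabs are trimmed here, so the '\r' removal
        # in clean_makedef_line() is what protects the values.
        line = clean_makedef_line(raw).strip(" \t")
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip(" \t")] = value.strip(" \t")
    return settings
```

Without the cleanup step, a value read from a file with CRLF line
endings would carry a hidden '\r' into any generated header or
command line.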

3. Others

 - The default value for collbuf_nodes in the BT I/O benchmark
   is now set to 0, indicating no file hints will be used.
   The setting can be changed by using the "inputbt.data" file.

 - The hyperplane version of LU (LU-HP) is no longer included 
   in the distribution.


------------------------------------------------------
Changes in NPB3.2.1
      (NPB3.2-SER, NPB3.2-OMP, NPB3.2-MPI )
------------------------------------------------------
[27-Jul-05]

This is a bug-fix release of NPB3.2.

1. MPI version
  - sys/setparams.c: removed a duplicated statement for writing
      FT parameters and made an invalid SUBTYPE an error condition.
      The 'duplicated statement' problem was fixed in NPB3.2 (see
      the note below).  However, during the final updating process,
      the fix was left out, even though the log file was updated.

  - BT: included SUBTYPE=EPIO in the I/O verification.

  - LU: bcast_inputs.f: replaced the wrong data type (dp_type) used
      for communicating integers (nx0,ny0,nz0) with the correct type
      MPI_INTEGER.

  - MG: fixed a miscalculation of parameter "nr" in globals.h
      that caused run-time failure for NPROCS >= 512
      (reported by Donald Ferry of Cray).  Expanded the limit to
      131072 processes and added error-checking code.

      The use of MPI_ANY_SOURCE for MPI_Irecv inside subroutine
      ready() could cause MPI_Wait to return a message meant for
      the wrong k.  The problem is fixed by using nbr(axis,-dir,k)
      in place of MPI_ANY_SOURCE in the call to MPI_Irecv
      (reported and suggested by Hideo Saito).

2. OpenMP version
  - EP: use THREADPRIVATE for working array storage. It should not
      change performance but makes some compilers happier.

  - LU: add variable "v" to FLUSH to ensure that solution data is
      properly flushed for the pipeline.  This change is needed
      according to the OpenMP 2.5 standard.

  - IS: reorganized working buffers so that the count for key 
      population could be more naturally performed.  This version
      uses much less stack space.

  - UA: implemented atomic updates with locks in order to achieve
      better scaling on those systems that have an inefficient
      (or even buggy) ATOMIC implementation.


------------------------------------------------------
Changes in NPB3.2
      (NPB3.2-SER, NPB3.2-OMP, NPB3.2-MPI )
------------------------------------------------------
[07-Jan-05]

1. DC version in NPB3.2-SER was converted to C from C++
   (CLASSES S, W, A, B). 
   sys/setparams.c file was changed appropriately.
   
2. OpenMP version of DC was added to NPB3.2-OMP.

3. Data Traffic benchmark DT was added to NPB3.2-MPI.

[24-May-04]

All versions:
   - use assumed shape "(*)" declaration in CG
   - fixed the use of an uninitialized variable in EP
   - avoid using integer array for assumed shape dimensions in FT
   - fix in UA:
      * fix the reference to file "inputua.data"
      * avoid overindexing
      * avoid reference to out-of-bound array elements
      * change declaration "real*8" to "double precision"

OMP version:
   - explicitly added "SCHEDULE(STATIC)" to the OMP version
   - use the "omp_get_wtime()" function for timer if available
   - removed the call to "getenv" for portability
   - change in UA:
      * implemented an alternative approach for atomic update

MPI version:
   - removed a duplicated declaration in FT (from setparams.c)
   - removed a duplicated declaration in BT/full_mpiio.f
   - fixed a missing "NPROCS=" in sys/suite.awk


------------------------------------------------------
Changes in NPB3.1
      (NPB3.1-MPI, NPB3.1-SER, NPB3.1-OMP)
------------------------------------------------------
[22-Apr-04] NPB3.1-MPI

Merged the NPB2.4-MPI branch into NPB3.1 with the following changes.

  - Optimized the BT memory usage.  The new version uses about 1/3 of
    the memory used in NPB2.x.
  - Fixed a bug in CG for running on a large number of processes
  - Redefined the Class W size in MG so that the verification value
    will not be too small. (see below for SER & OMP versions)
  - Use the relative errors for verification in both CG and MG
  - Fixed a race in 'make suite'

[08-Apr-04] NPB3.1-SER and NPB3.1-OMP

The following changes are made in both NPB3.1-SER and NPB3.1-OMP.

1. Added the Class D problem
   - verification values taken from NPB2.4-MPI
   - modified variables to fit in large problem

2. Improvements for LU and LU-HP:
   - reduced the memory usage for the 'tv' variable in LU and LU-HP
   - a more efficient memory access for variables "a,b,c,d" in LU-HP
   - a dummy iteration added before the time step loop for consistency
     with other benchmarks

3. Improvement and fix in MG:
   - verification in MG now uses the relative error
     (instead of the absolute error).  This will avoid incorrect
     verification for small reference values.
   - redefined the class size for Class W so that the verification
     value will not be too small.
     In version 3.0 and earlier: 64x64x64,    40 iters
     New size in version 3.1   : 128x128x128, 4 iters
   - fixed incorrect verification values for Classes A and C.

4. CG:
   - use relative error for verification
   - clean up codes for matrix initialization (makea).
     The new code uses about 1/2 memory of the previous version.

5. Fixed makefile related issues
   - fixed dependence on make.def for files in common.
   - fixed a race in 'make suite'
   - added 'LU-HP' as a valid benchmark option in makefiles
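
The move from absolute to relative error in the MG and CG
verification, listed above, can be illustrated as follows.  This is
a minimal Python sketch rather than the benchmark code; the function
names, the tolerance, and the sample values are all hypothetical.

```python
def verify_absolute(computed: float, reference: float, epsilon: float) -> bool:
    """Accept when the absolute difference is within epsilon."""
    return abs(computed - reference) <= epsilon


def verify_relative(computed: float, reference: float, epsilon: float) -> bool:
    """Accept when the difference, scaled by the reference, is within epsilon."""
    return abs((computed - reference) / reference) <= epsilon


# Hypothetical values: the reference is tiny and the computed
# result is wrong by a factor of two.
reference = 1.0e-10
computed = 2.0e-10
epsilon = 1.0e-8

absolute_ok = verify_absolute(computed, reference, epsilon)  # wrongly accepted
relative_ok = verify_relative(computed, reference, epsilon)  # rejected
```

When the reference value is much smaller than the tolerance, the
absolute test accepts almost any small number, which is the kind of
incorrect verification the relative test avoids.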

The following changes are made in NPB3.1-OMP.

1. Included a hyper-plane version of the LU benchmark: LU-HP
   - based on the serial version

2. The dummy 'omp_lib_dum' library is no longer used for compilation
   without an OpenMP compiler. Conditional compilation is now used.

3. Parallelization of the initialization part of MG.
   It improves the turn-around time quite a bit for the larger
   classes, such as class D.

4. Parallelize codes for matrix initialization (makea) in CG.
   The new code uses about 2/3 memory of the version in NPB3.0-OMP.

5. Code clean up in SP so that the structure is more consistent
   with the serial version.



------------------------------------------------------
Changes in NPB2.x MPI version
------------------------------------------------------

Changes in 2.4.1
- fixed error in BT/Makefile (replaced "==" with "=")
- added stub function accumulate_norms in BT/btio.f
- changed type of Class B verification constants in BT/verify.f from 
  single to double precision
                                                       
Changes in 2.4
- Added I/O benchmark (subtype of BT).
- Added Class D for all benchmarks except IS.
- Reduced size of tabulated exponentials in FT.
- Made minor changes to FT to prevent integer overflow for class D on 
  systems with 32-bit integers. FT class D will not run on small 
  numbers of processors anymore.
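
The 32-bit overflow concern for FT Class D can be made concrete
with a small calculation.  This log does not say which quantity
overflowed, so the sketch below only checks one plausible candidate,
the total number of grid points.  It assumes a Class D grid of
2048 x 1024 x 1024 and a Class C grid of 512 x 512 x 512; those
dimensions are not stated in this log, and the function names are
hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def total_points(nx: int, ny: int, nz: int) -> int:
    """Total number of grid points (Python integers do not overflow)."""
    return nx * ny * nz


def fits_int32(value: int) -> bool:
    """True when the value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX


class_c_points = total_points(512, 512, 512)      # assumed Class C grid
class_d_points = total_points(2048, 1024, 1024)   # assumed Class D grid
```

Under these assumptions the Class D total is 2**31, one more than
the largest signed 32-bit value, while the Class C total still fits.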


------------------------------------------------------
Changes in non-MPI versions of NPB (previously PBN3.0)
      (NPB3.0-SER, NPB3.0-HPF, NPB3.0-OMP, NPB3.0-JAV)
------------------------------------------------------

[01-Mar-99] Initial Beta Release.

[06-Apr-99] Based on report from Charles Grassl and Ramesh Menon (SGI).

   1. NPB-SER, FT: file auxfnct.f -
      lines 74 and 75 were interchanged:

      double complex u0(d1+1,d2,d3), tmp(maxdim)
      integer d1,d2,d3

   2. NPB-OMP: The OpenMP standard requires reduction variables to be
      scalars; changes were therefore made to remove the use of array
      variables for reduction.
      Relevant modifications in EP, CG, LU, SP, and BT

   3. NPB-OMP: Remove compiler warnings of "Referenced scalar variables 
      use defaults" by declaring explicitly as shared.
      Relevant modifications in FT, LU, and BT

   4. NPB-OMP, README.openmp: Explicitly spell out the requirement of
      the static scheduling (setenv OMP_SCHEDULE "static").


[05-Oct-99] NPB3.0-non-MPI Beta Release (02)

General change to all (NPB-SER, NPB-HPF, NPB-OMP) -
   1. Update header information for all benchmarks.

   2. Allow continuation lines in 'make.def' (modification done
      in sys/setparams.c).
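
Support for continuation lines in 'make.def' can be pictured with
the sketch below.  It assumes the make-style convention of a
trailing backslash, which this log does not spell out, and it is a
Python illustration with a hypothetical function name, not the code
in sys/setparams.c.

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line that follows."""
    joined = []
    pending = ""
    for line in lines:
        # Leading whitespace on a continued line is not significant.
        piece = line.lstrip() if pending else line
        if piece.endswith("\\"):
            # Drop the backslash and keep a single separating space.
            pending += piece[:-1].rstrip() + " "
        else:
            joined.append(pending + piece)
            pending = ""
    if pending:
        # The input ended with a dangling continuation.
        joined.append(pending.rstrip())
    return joined
```

Each logical setting then occupies exactly one entry in the result,
so it can be parsed the same way as a single-line definition.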

Change made in NPB-OMP -
   1. 'print_results' now prints Number-Of-Threads and Mflops/s/thread.
      The printed number is the number of threads active during the
      run, which may not be the same as the number requested.

   2. An initial data touch loop for array A is added in CG.

   3. 'CRITICAL' section is used for reduction with array.
      Relevant changes in EP, CG, LU, SP, and BT.

   4. Reconfigure 'make.def' such that 'omp_lib_dum' can be activated
      from the file for no directive compilation.

   5. The "!$OMP END DO" seems needed before "!$OMP MASTER" in rhs.f
      for both BT and SP for some f90 compilers.

   6. "SCHEDULE(STATIC)" is used for the pipeline in LU to ensure
      compliance with the OMP standard.

Change made in NPB-HPF -
   1. 'print_results' now prints Number-Of-Processes and Mflops/s/process.

   2. Use more consistent output format (via print_results).

   3. More consistent makefiles (via config/make.def).


[04-Apr-00] NPB3.0-non-MPI Beta Release (03)

Change made in NPB-OMP -
   1. The OpenMP-C version of IS has been added, including more timers.

   2. 'cprint_results' includes Number-Of-Threads and Mflops/s/thread.

Change made in NPB-SER -
   1. More timers included in IS.

NPB-JAV has been included in NPB3.0-non-MPI.


[31-May-01] NPB3.0-non-MPI Beta Release (04)

Change made in NPB-OMP -
   1. NPB-OMP/LU: Failure in verification for number of threads greater 
      than the problem size is now fixed.

   2. If OMP_NUM_THREADS is unset, the printout will report it as
      "unset" instead of "1"

   3. NPB-OMP/IS: Allocating work_buff on the stack seems to cause
      problems for large problem sizes (CLASS C).  "work_buff" is now
      allocated by "malloc" on the heap for CLASS C.

   4. NPB-OMP/IS: Reported by <RaeLyn.Crowell@compaq.com> - a potential
      synchronization problem could arise due to the use of "static"
      variables inside "randlc()".  The declarations of these static
      variables have been moved out of randlc() and placed in a
      threadprivate directive.

General change to all (NPB-SER, NPB-HPF, NPB-OMP) -
   1. Cleanup in makefiles


[28-Aug-02] The Official NPB3.0 Release

Change made in all -
   1. Fixed a bogus verification for "NaN".

   2. Name change from "PBN3.0" to "NPB3.0". Updated all the banners.

   3. NPB-SER/FT: use a derived version from NPB2.3-serial.

   4. NPB-HPF/FT: use a consistent printing format.