Scalasca
(Scalasca 2.2.2, revision 13327)
Scalable Performance Analysis of Large-Scale Applications
As a first step of every performance analysis, a reference execution using an uninstrumented executable should be performed. This step verifies that the code executes cleanly and produces correct results, and it later serves as the baseline for assessing the overhead introduced by instrumentation and measurement. At this stage an appropriate test configuration should be chosen, such that it is both repeatable and long enough to be representative. (Note that excessively long execution durations can make measurement analysis inconvenient or even prohibitive, and should therefore be avoided.)
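The overhead comparison mentioned above amounts to a relative difference between two wall-clock times. The following is a minimal sketch, not part of Scalasca: the function name overhead_pct is hypothetical, and it assumes both times are given in seconds.

```shell
#!/bin/sh
# overhead_pct REF_SECONDS MEASURED_SECONDS   (hypothetical helper)
# Prints the runtime dilation of a measured (instrumented) run relative
# to the uninstrumented reference run, as a percentage with one decimal.
overhead_pct() {
    awk -v ref="$1" -v meas="$2" 'BEGIN {
        if (ref <= 0) {
            print "reference time must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.1f\n", (meas - ref) / ref * 100
    }'
}
```

For instance, `overhead_pct 100 105` compares a 100-second reference run with a 105-second instrumented run.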
After unpacking the NPB-MPI source archive, the build system has to be adjusted to the respective environment. For the NAS benchmarks, this is accomplished by a Makefile snippet defining a number of variables used by a generic Makefile. This snippet is called make.def and has to reside in the config/ subdirectory, which already contains a template file that can be copied and adjusted appropriately. In particular, the MPI Fortran compiler wrapper and flags need to be specified, for example:
    MPIF77     = mpif77
    FFLAGS     = -O2
    FLINKFLAGS = -O2
Note that the MPI C compiler wrapper and flags are not used for building BT, but may also be set accordingly to experiment with other NPB benchmarks.
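Adjusting make.def can also be scripted. The sketch below is only an illustration: the helper name set_make_var is hypothetical, and the usage note assumes the template in config/ is called make.def.template, which should be checked against the unpacked archive.

```shell
#!/bin/sh
# set_make_var FILE NAME VALUE   (hypothetical helper)
# Rewrites the line "NAME = ..." in FILE to "NAME = VALUE", or appends
# the assignment if FILE has no such line. Commented-out assignments are
# left untouched. NAME is assumed to be a plain identifier.
set_make_var() {
    file=$1 name=$2 value=$3
    tmp="${file}.tmp.$$"
    awk -v name="$name" -v value="$value" '
        # An assignment starts the line with the variable name.
        $0 ~ "^" name "[ \t]*=" { print name " = " value; done = 1; next }
        { print }
        END { if (!done) print name " = " value }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

A possible use, under the naming assumption above, is to run `cp config/make.def.template config/make.def` and then call `set_make_var config/make.def MPIF77 mpif77`, followed by the same call for FFLAGS and FLINKFLAGS with the value -O2.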
Next, the benchmark can be built from the top-level directory by running make, specifying the number of MPI ranks to use (for BT, this must be a square number) as well as the problem size on the command line:
    % make bt NPROCS=64 CLASS=D
       =========================================
       =      NAS Parallel Benchmarks 3.3      =
       =      MPI/F77/C                        =
       =========================================

    cd BT; make NPROCS=64 CLASS=D SUBTYPE= VERSION=
    make[1]: Entering directory `/tmp/NPB3.3-MPI/BT'
    make[2]: Entering directory `/tmp/NPB3.3-MPI/sys'
    cc -g -o setparams setparams.c
    make[2]: Leaving directory `/tmp/NPB3.3-MPI/sys'
    ../sys/setparams bt 64 D
    make[2]: Entering directory `/tmp/NPB3.3-MPI/BT'
    mpif77 -c -O2 bt.f
    mpif77 -c -O2 make_set.f
    mpif77 -c -O2 initialize.f
    mpif77 -c -O2 exact_solution.f
    mpif77 -c -O2 exact_rhs.f
    mpif77 -c -O2 set_constants.f
    mpif77 -c -O2 adi.f
    mpif77 -c -O2 define.f
    mpif77 -c -O2 copy_faces.f
    mpif77 -c -O2 rhs.f
    mpif77 -c -O2 solve_subs.f
    mpif77 -c -O2 x_solve.f
    mpif77 -c -O2 y_solve.f
    mpif77 -c -O2 z_solve.f
    mpif77 -c -O2 add.f
    mpif77 -c -O2 error.f
    mpif77 -c -O2 verify.f
    mpif77 -c -O2 setup_mpi.f
    cd ../common; mpif77 -c -O2 print_results.f
    cd ../common; mpif77 -c -O2 timers.f
    make[3]: Entering directory `/tmp/NPB3.3-MPI/BT'
    mpif77 -c -O2 btio.f
    mpif77 -O2 -o ../bin/bt.D.64 bt.o make_set.o initialize.o exact_solution.o exact_rhs.o set_constants.o adi.o define.o copy_faces.o rhs.o solve_subs.o x_solve.o y_solve.o z_solve.o add.o error.o verify.o setup_mpi.o ../common/print_results.o ../common/timers.o btio.o
    make[3]: Leaving directory `/tmp/NPB3.3-MPI/BT'
    make[2]: Leaving directory `/tmp/NPB3.3-MPI/BT'
    make[1]: Leaving directory `/tmp/NPB3.3-MPI/BT'
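Because BT requires a square number of MPI ranks, it can be convenient to check the intended rank count before starting a build. The following is a minimal sketch in plain POSIX shell; the function name is_square is hypothetical and not part of NPB or Scalasca.

```shell
#!/bin/sh
# is_square N   (hypothetical helper)
# Exit status 0 if N is a perfect square >= 1, 1 if it is not,
# and 2 if N is not a positive decimal integer.
is_square() {
    n=$1
    case $n in ''|*[!0-9]*) return 2 ;; esac   # reject non-numeric input
    [ "$n" -ge 1 ] || return 2                 # a rank count must be positive
    i=1
    # Advance i until i*i reaches or passes n.
    while [ $((i * i)) -lt "$n" ]; do
        i=$((i + 1))
    done
    [ $((i * i)) -eq "$n" ]
}
```

It could guard the build command, for example `is_square 64 && make bt NPROCS=64 CLASS=D`.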
Valid problem classes (of increasing size) are S, W, A, B, C, D and E, and can be used to adjust the benchmark runtime to the execution environment. For example, class S or W is appropriate for execution on a single-core laptop with 4 MPI ranks, while the other problem sizes are more suitable for "real" configurations.
The resulting executable encodes the benchmark configuration in its name and is placed into the bin/ subdirectory. For the example make command above, it is named bt.D.64. This binary can now be executed, either by submitting an appropriate batch job (which is beyond the scope of this user guide) or directly in an interactive session.
    % cd bin
    % mpiexec -n 64 ./bt.D.64

     NAS Parallel Benchmarks 3.3 -- BT Benchmark

     No input file inputbt.data. Using compiled defaults
     Size: 408x 408x 408
     Iterations: 250    dt: 0.0000200
     Number of active processes: 64

     Time step 1
     Time step 20
     Time step 40
     Time step 60
     Time step 80
     Time step 100
     Time step 120
     Time step 140
     Time step 160
     Time step 180
     Time step 200
     Time step 220
     Time step 240
     Time step 250
     Verification being performed for class D
     accuracy setting for epsilon = 0.1000000000000E-07
     Comparison of RMS-norms of residual
       1 0.2533188551738E+05 0.2533188551738E+05 0.1479210131727E-12
       2 0.2346393716980E+04 0.2346393716980E+04 0.8488743310506E-13
       3 0.6294554366904E+04 0.6294554366904E+04 0.3034271788588E-14
       4 0.5352565376030E+04 0.5352565376030E+04 0.8597827149538E-13
       5 0.3905864038618E+05 0.3905864038618E+05 0.6650300273080E-13
     Comparison of RMS-norms of solution error
       1 0.3100009377557E+03 0.3100009377557E+03 0.1373406191445E-12
       2 0.2424086324913E+02 0.2424086324913E+02 0.1582835864248E-12
       3 0.7782212022645E+02 0.7782212022645E+02 0.4053872777553E-13
       4 0.6835623860116E+02 0.6835623860116E+02 0.3762882153975E-13
       5 0.6065737200368E+03 0.6065737200368E+03 0.2474004739002E-13
     Verification Successful

     BT Benchmark Completed.
     Class           = D
     Size            = 408x 408x 408
     Iterations      = 250
     Time in seconds = 462.95
     Total processes = 64
     Compiled procs  = 64
     Mop/s total     = 126009.74
     Mop/s/process   = 1968.90
     Operation type  = floating point
     Verification    = SUCCESSFUL
     Version         = 3.3
     Compile date    = 29 Jan 2015

     Compile options:
        MPIF77       = mpif77
        FLINK        = $(MPIF77)
        FMPI_LIB     = (none)
        FMPI_INC     = (none)
        FFLAGS       = -O2
        FLINKFLAGS   = -O2
        RAND         = (none)

     Please send the results of this run to:

     NPB Development Team
     Internet: npb@nas.nasa.gov

     If email is not available, send this to:

     MS T27A-1
     NASA Ames Research Center
     Moffett Field, CA 94035-1000

     Fax: 650-604-3957
Note that this application verified its successful calculation and reported the associated wall-clock execution time for the core computation.
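To keep the reference result for later comparison, the verification status and the reported time can be pulled out of a saved copy of the benchmark output. This is a sketch only: the function name bt_summary and the log file name are hypothetical, and it assumes the summary lines have the form "Time in seconds = ..." and "Verification = ..." with arbitrary spacing, as in the listing above.

```shell
#!/bin/sh
# bt_summary LOGFILE   (hypothetical helper)
# Prints "<verification> <seconds>" taken from the summary section of a
# saved NPB run log. Fails with status 1 if either value is missing.
bt_summary() {
    awk -F'=' '
        /Time in seconds/ { gsub(/[ \t]/, "", $2); secs = $2 }
        # Only the summary line has "=" directly after the keyword; the
        # earlier "Verification being performed" lines do not match.
        /^[ \t]*Verification[ \t]*=/ { gsub(/[ \t]/, "", $2); verif = $2 }
        END {
            if (secs == "" || verif == "") exit 1
            print verif, secs
        }
    ' "$1"
}
```

The log could be captured with, for example, `mpiexec -n 64 ./bt.D.64 | tee bt.D.64.log` and then summarized with `bt_summary bt.D.64.log`.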
Copyright © 1998–2015 Forschungszentrum Jülich GmbH,
Jülich Supercomputing Centre
Copyright © 2009–2015 German Research School for Simulation Sciences GmbH, Laboratory for Parallel Programming