Scalasca
(Scalasca 2.2.2, revision 13327)
Scalable Performance Analysis of Large-Scale Applications
The instrumented executable prepared in the previous step can now be executed under the control of the scalasca -analyze (or, for short, scan) convenience command to perform an initial summary measurement:
% cd bin
% scalasca -analyze mpiexec -n 64 ./bt.D.64
S=C=A=N: Scalasca 2.2 runtime summarization
S=C=A=N: ./scorep_bt_64_sum experiment archive
S=C=A=N: Thu Jan 29 22:05:09 2015: Collect start
mpiexec -n 64 [...] ./bt.D.64
NAS Parallel Benchmarks 3.3 -- BT Benchmark
No input file inputbt.data. Using compiled defaults
Size: 408x 408x 408
Iterations: 250 dt: 0.0000200
Number of active processes: 64
Time step 1
Time step 20
Time step 40
Time step 60
Time step 80
Time step 100
Time step 120
Time step 140
Time step 160
Time step 180
Time step 200
Time step 220
Time step 240
Time step 250
Verification being performed for class D
accuracy setting for epsilon = 0.1000000000000E-07
Comparison of RMS-norms of residual
1 0.2533188551738E+05 0.2533188551738E+05 0.1479210131727E-12
2 0.2346393716980E+04 0.2346393716980E+04 0.8488743310506E-13
3 0.6294554366904E+04 0.6294554366904E+04 0.3034271788588E-14
4 0.5352565376030E+04 0.5352565376030E+04 0.8597827149538E-13
5 0.3905864038618E+05 0.3905864038618E+05 0.6650300273080E-13
Comparison of RMS-norms of solution error
1 0.3100009377557E+03 0.3100009377557E+03 0.1373406191445E-12
2 0.2424086324913E+02 0.2424086324913E+02 0.1582835864248E-12
3 0.7782212022645E+02 0.7782212022645E+02 0.4053872777553E-13
4 0.6835623860116E+02 0.6835623860116E+02 0.3762882153975E-13
5 0.6065737200368E+03 0.6065737200368E+03 0.2474004739002E-13
Verification Successful
BT Benchmark Completed.
Class = D
Size = 408x 408x 408
Iterations = 250
Time in seconds = 940.81
Total processes = 64
Compiled procs = 64
Mop/s total = 62006.00
Mop/s/process = 968.84
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3
Compile date = 29 Jan 2015
Compile options:
MPIF77 = scorep mpif77
FLINK = $(MPIF77)
FMPI_LIB = (none)
FMPI_INC = (none)
FFLAGS = -O2
FLINKFLAGS = -O2
RAND = (none)
Please send the results of this run to:
NPB Development Team
Internet: npb@nas.nasa.gov
If email is not available, send this to:
MS T27A-1
NASA Ames Research Center
Moffett Field, CA 94035-1000
Fax: 650-604-3957
S=C=A=N: Thu Jan 29 22:20:59 2015: Collect done (status=0) 950s
S=C=A=N: ./scorep_bt_64_sum complete.
% ls scorep_bt_64_sum
profile.cubex scorep.cfg scorep.log
As can be seen, the measurement run successfully produced an experiment directory scorep_bt_64_sum containing the runtime summary result file profile.cubex, a copy of the measurement configuration in scorep.cfg, and the measurement log file scorep.log. However, application execution took about twice as long as the reference run (940.81 vs. 462.95 seconds, a factor of roughly 2.03). That is, instrumentation and the associated measurements introduced a non-negligible amount of overhead. While it is possible to interactively examine the generated summary result file using the Cube report browser, this should only be done with great caution, since the substantial overhead negatively impacts the accuracy of the measurement.
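To put a number on this overhead, the dilation factor can be computed from the "Time in seconds" line that the NAS benchmark prints. The following Python sketch is only an illustration: it assumes the console output of the run was saved as text, the helper names npb_time_seconds and overhead are hypothetical (not part of Scalasca or Score-P), and the 462.95 s value is the uninstrumented reference time quoted above.

```python
import re
from typing import Tuple

# Hypothetical helper: matches the NPB summary line "Time in seconds = <value>",
# tolerating the column padding the benchmark uses.
_TIME_RE = re.compile(r"^\s*Time in seconds\s*=\s*([0-9.]+)", re.MULTILINE)


def npb_time_seconds(output: str) -> float:
    """Return the wall-clock time reported in captured NPB console output."""
    match = _TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'Time in seconds' line found")
    return float(match.group(1))


def overhead(measured: float, reference: float) -> Tuple[float, float]:
    """Return (dilation factor, relative overhead in percent)."""
    dilation = measured / reference
    return dilation, (dilation - 1.0) * 100.0


if __name__ == "__main__":
    # Excerpt of the instrumented run's output shown above.
    measured_output = """\
 BT Benchmark Completed.
 Class           =                        D
 Time in seconds =                   940.81
"""
    measured = npb_time_seconds(measured_output)
    reference = 462.95  # uninstrumented reference run, taken from the text
    factor, percent = overhead(measured, reference)
    print(f"dilation {factor:.2f}x, overhead {percent:.0f}%")
    # prints "dilation 2.03x, overhead 103%"
```

A dilation this close to 2x is a strong hint that the measurement configuration should be refined (for example, by reducing instrumentation) before the summary profile is trusted for detailed analysis.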
Copyright © 1998–2015 Forschungszentrum Jülich GmbH,
Jülich Supercomputing Centre
Copyright © 2009–2015 German Research School for Simulation Sciences GmbH, Laboratory for Parallel Programming