Scalasca  (Scalasca 2.2.2, revision 13327)
Scalable Performance Analysis of Large-Scale Applications
Initial summary measurement

The instrumented executable prepared in the previous step can now be run under the control of the scalasca -analyze convenience command (scan for short) to perform an initial summary measurement:

  % cd bin
  % scalasca -analyze mpiexec -n 64 ./bt.D.64
  S=C=A=N: Scalasca 2.2 runtime summarization
  S=C=A=N: ./scorep_bt_64_sum experiment archive
  S=C=A=N: Thu Jan 29 22:05:09 2015: Collect start
  mpiexec -n 64  [...]  ./bt.D.64

   NAS Parallel Benchmarks 3.3 -- BT Benchmark

   No input file Using compiled defaults
   Size:  408x 408x 408
   Iterations:  250    dt:   0.0000200
   Number of active processes:    64

   Time step    1
   Time step   20
   Time step   40
   Time step   60
   Time step   80
   Time step  100
   Time step  120
   Time step  140
   Time step  160
   Time step  180
   Time step  200
   Time step  220
   Time step  240
   Time step  250
   Verification being performed for class D
   accuracy setting for epsilon =  0.1000000000000E-07
   Comparison of RMS-norms of residual
             1 0.2533188551738E+05 0.2533188551738E+05 0.1479210131727E-12
             2 0.2346393716980E+04 0.2346393716980E+04 0.8488743310506E-13
             3 0.6294554366904E+04 0.6294554366904E+04 0.3034271788588E-14
             4 0.5352565376030E+04 0.5352565376030E+04 0.8597827149538E-13
             5 0.3905864038618E+05 0.3905864038618E+05 0.6650300273080E-13
   Comparison of RMS-norms of solution error
             1 0.3100009377557E+03 0.3100009377557E+03 0.1373406191445E-12
             2 0.2424086324913E+02 0.2424086324913E+02 0.1582835864248E-12
             3 0.7782212022645E+02 0.7782212022645E+02 0.4053872777553E-13
             4 0.6835623860116E+02 0.6835623860116E+02 0.3762882153975E-13
             5 0.6065737200368E+03 0.6065737200368E+03 0.2474004739002E-13
   Verification Successful

   BT Benchmark Completed.
   Class           =                        D
   Size            =            408x 408x 408
   Iterations      =                      250
   Time in seconds =                   940.81
   Total processes =                       64
   Compiled procs  =                       64
   Mop/s total     =                 62006.00
   Mop/s/process   =                   968.84
   Operation type  =           floating point
   Verification    =               SUCCESSFUL
   Version         =                      3.3
   Compile date    =              29 Jan 2015

   Compile options:
      MPIF77       = scorep mpif77
      FLINK        = $(MPIF77)
      FMPI_LIB     = (none)
      FMPI_INC     = (none)
      FFLAGS       = -O2
      FLINKFLAGS   = -O2
      RAND         = (none)

   Please send the results of this run to:

   NPB Development Team

   If email is not available, send this to:

   MS T27A-1
   NASA Ames Research Center
   Moffett Field, CA  94035-1000

   Fax: 650-604-3957

  S=C=A=N: Thu Jan 29 22:20:59 2015: Collect done (status=0) 950s
  S=C=A=N: ./scorep_bt_64_sum complete.

  % ls scorep_bt_64_sum
  profile.cubex  scorep.cfg  scorep.log

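As can be seen, the measurement run successfully produced an experiment directory scorep_bt_64_sum containing the runtime summary result file profile.cubex, a copy of the measurement configuration in scorep.cfg, and the measurement log file scorep.log.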

However, application execution took about twice as long as the reference run (940.81 vs. 462.95 seconds). That is, instrumentation and the associated measurement introduced substantial overhead. The generated summary result file can be examined interactively with the Cube report browser, but the results should be interpreted with great caution, since overhead of this size distorts the measured behavior.
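
To make the size of the overhead explicit, the two timings quoted above can be turned into a dilation factor and a relative overhead. The following is only a minimal sketch: the function name dilation is hypothetical and not part of Scalasca, and awk is assumed to be available because POSIX shell arithmetic is integer-only.

```shell
# Hypothetical helper (not a Scalasca command): compare an instrumented
# run time against a reference run time, both given in seconds.
dilation() {
    # $1 = time of the instrumented run, $2 = time of the reference run
    awk -v m="$1" -v r="$2" 'BEGIN {
        # m / r is the slowdown factor; (m - r) / r is the relative overhead
        printf "%.2fx slower, %.1f%% overhead\n", m / r, (m - r) / r * 100
    }'
}

# "Time in seconds" of the instrumented run and of the reference run
dilation 940.81 462.95
```

With these inputs the sketch reports a slowdown of 2.03x, that is, roughly 103% overhead, which matches the "about twice as long" observation.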

Copyright © 1998–2015 Forschungszentrum Jülich GmbH, Jülich Supercomputing Centre
Copyright © 2009–2015 German Research School for Simulation Sciences GmbH, Laboratory for Parallel Programming