SCALASCA v2.6.1 OPEN ISSUES
===========================
                                                   (Status: December 2022)

This file lists known limitations and unimplemented features of various
Scalasca components.

-----------------------------------------------------------------------------

* Platform support

  - Scalasca has been tested on the following platforms:

      + IBM BladeCenter & iDataPlex clusters
      + Cray XC series (x86_64, AArch64)
      + various Linux-based clusters (x86_64, Power8/9, armhf, AArch64)
      + Intel Xeon Phi (KNL)

    In addition, the provided configure options (see INSTALL) may provide
    a good basis for building and testing the toolset on other systems.
    Please report success/failure on other platforms to the Scalasca
    development team.

  - The following platforms have not been tested recently; however, the
    supplied build system might still work on those systems:

      + IBM Blue Gene/Q
      + IBM Blue Gene/P
      + AIX-based clusters
      + Cray XT, XE, XK series
      + Fujitsu FX10, FX100, and K computer
      + Intel Xeon Phi (KNC)
      + Oracle/Sun Solaris/SPARC-based clusters

  - On the Intel Xeon Phi platform, only Intel compilers and Intel MPI
    are currently supported.

-----------------------------------------------------------------------------

* SCOUT parallel trace analysis

  - The OpenMP and hybrid MPI/OpenMP versions of the SCOUT parallel trace
    analyzer (and its associated libraries) have been found to either fail
    to build or to execute incorrectly with old PGI pgCC versions
    (verified with, e.g., v10.5, v13.9, v14.1).  Consequently, Scalasca
    should be configured using '--disable-openmp' to skip building the
    corresponding scout.omp and scout.hyb executables (see the sketch
    further below).  Pathscale and other compilers may have similar
    problems and may also need the same treatment.

  - If it is not possible to build the required versions of SCOUT, or if
    they fail to run reliably, it may be possible to substitute a version
    built with a different compiler (such as GCC) when doing measurement
    collection & analysis (e.g., in a batch job).

  - The MPI and hybrid MPI/OpenMP versions of the SCOUT parallel analyzer
    must be run as an MPI program with exactly the same number of
    processes as contained in the experiment to analyze: typically it
    will be convenient to launch SCOUT immediately following the
    measurement in a single batch script, so that the MPI launch command
    can be configured similarly for both steps.  The SCAN nexus executes
    SCOUT with the appropriate launch configuration when automatic trace
    analysis is specified.

  - If the appropriate variant of SCOUT (e.g., scout.hyb for hybrid
    OpenMP/MPI) is not located by SCAN, it attempts to substitute an
    alternative variant, which will generally result in only partial
    trace analysis (e.g., scout.mpi will ignore OpenMP events in hybrid
    traces).

  - SCOUT is unable to analyze traces of applications using MPI
    inter-communicators.

  - SCOUT is unable to analyze hybrid OpenMP/MPI traces of applications
    using MPI_THREAD_MULTIPLE, and is generally unable to handle
    MPI_THREAD_SERIALIZED; it is therefore necessary to enforce use of
    MPI_THREAD_FUNNELED.

  - SCOUT is unable to analyze traces of OpenMP or hybrid OpenMP/MPI
    applications using nested parallelism, untied tasks, or varying
    numbers of threads for parallel regions (e.g., due to
    "num_threads(#)" or "if" clauses).

  - Although SCOUT is able to handle traces including OpenMP tasking
    events, no tasking-specific analysis is performed yet.  Also,
    calculated wait states in OpenMP barriers as well as the results of
    the root-cause and critical-path analyses are currently incorrect in
    the presence of OpenMP tasks.
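    For illustration, a minimal sketch of the configure workaround and of
    a manual SCOUT launch as discussed above (archive name, process
    count, and launcher are placeholders; passing the experiment's
    'traces.otf2' anchor file as argument is an assumption about the
    SCOUT command line on a given installation):

        ./configure --disable-openmp ...   # skip scout.omp / scout.hyb
        make && make install

        # analyze a 32-process experiment with a (e.g., GCC-built) scout.mpi
        mpiexec -np 32 scout.mpi scorep_app_32_trace/traces.otf2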
  - For traces including OpenMP tasking events, the CubeGUI and CubeLib
    command-line tools up to and including version 4.5 show incorrect
    metric values for the whole-program call path (which exists for
    traces generated by Score-P v5.0 and above).  This is due to
    accounting tasks twice in the inclusify/exclusify calculations, and
    it also applies to derived metrics, for example, those calculated
    during report post-processing.  This issue will be addressed in
    future versions of CubeLib/CubeGUI.

  - Analysis of traces using POSIX threads is performed using the OpenMP
    and OpenMP/MPI versions of SCOUT, which will try to launch one
    analysis thread for every POSIX thread recorded in the trace.  That
    is, if the application repeatedly creates and joins POSIX threads,
    the analysis is likely to oversubscribe the system (thus potentially
    increasing analysis times significantly), and the OpenMP runtime may
    not even be able to create the required number of analysis threads.

  - SCOUT is unable to analyze traces of applications using CUDA or
    OpenCL that contain events for compute kernels and/or memory
    transfers recorded on separate locations.  However, analysis of
    traces including only host-side events from CUDA, OpenCL, or OpenACC
    is possible, also when used in combination with MPI and/or OpenMP.
    Care has to be taken, though, when interpreting analysis results of
    such experiments, as the critical-path and root-cause analyses only
    consider host-side events.  For example, SCOUT may identify a GPU
    synchronization operation as the delay causing some wait states,
    while the real culprits are the compute kernels causing the
    synchronization operation to block.

  - SCOUT is unable to analyze traces of applications using SHMEM, even
    when used in combination with MPI and/or OpenMP.

  - SCOUT is unable to analyze traces that include per-process metric
    events on separate locations.  In particular, this applies to memory
    tracking events measured using, for example,
    "SCOREP_MEMORY_RECORDING=true" or "SCOREP_MPI_MEMORY_RECORDING=true"
    (see the sketch further below).

  - SCOUT is unable to analyze incomplete traces or traces that it is
    unable to load entirely into memory.  Experiment archives are
    portable to other systems where sufficient processors with additional
    memory are available and a compatible version of SCOUT is installed;
    however, the size of such experiment archives typically prohibits
    this.

  - In rare cases, the SCOUT analysis of traces generated with Score-P
    v6.0 or earlier built against OTF2 v2.2 or higher may produce
    negative time values.  This is due to a code inconsistency in the
    given versions, which is fixed in Score-P v7.0 (using OTF2 v2.2 or
    higher).

  - SCOUT requires user-specified instrumentation blocks to correctly
    nest and match for it to be able to analyze the resulting measurement
    traces.  Similarly, collective operations must be recorded by all
    participating processes, and messages recorded as sent (or received)
    by one process must be recorded as received (or sent) by another
    process; otherwise SCOUT can be expected to deadlock during trace
    analysis.

  - SCOUT ignores hardware counter measurements recorded in traces.  If
    the measurement included simultaneous runtime summarization and
    tracing, the two reports are automatically combined during experiment
    post-processing.

  - SCOUT is unable to handle old EPILOG trace files stored in SIONlib
    containers.  Also, the lock contention analysis is not performed for
    EPILOG traces.
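    For example, to keep a trace analyzable with respect to the
    per-process metric limitation above, memory tracking can be left
    disabled for the tracing run.  A minimal sketch (launcher, process
    count, and target are placeholders):

        export SCOREP_MEMORY_RECORDING=false
        export SCOREP_MPI_MEMORY_RECORDING=false
        scan -t mpirun -np 4 target.exe arglist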
  - SCOUT may deadlock and be unable to analyze measurement experiments:
    should you suspect this to be the case, please save the experiment
    archive and contact the Scalasca development team for it to be
    investigated (see the instructions in the User Guide for details on
    which information to include in bug reports).

  - Traces that SCOUT is unable to analyze may still be visualized and
    interactively analyzed by 3rd-party tools such as VAMPIR.

  - Issues related to Score-P's sampling/unwinding feature:

    - The current implementation of handling traces generated with
      sampling/unwinding is known to be memory inefficient.  That is, it
      may only be possible to analyze small trace files.

    - Obviously, limitations of the Score-P measurement system also
      propagate to Scalasca via the generated trace data.  In particular,
      analysis results of programs using OpenMP are likely to include
      unexpected call paths for worker threads.

    - The aforementioned limitation also leads to the calculation of
      bogus OpenMP thread management times.

-----------------------------------------------------------------------------

* SCAN collection & analysis launcher

  This utility attempts to parse MPI launcher commands to be able to
  launch measurement collection along with subsequent trace analysis when
  appropriate.  It also attempts to determine whether measurement and
  analysis are likely to be blocked by various configuration issues
  before performing the actual launch(es).  Such basic checks might be
  invalid in certain circumstances and inhibit legitimate measurement and
  analysis launches.

  While SCAN has been tested with a selection of MPI launchers (on
  different systems, interactively and via batch systems), it is not
  possible to test all versions, combinations, and configuration/launch
  arguments; if the current support is inadequate for a particular setup,
  details should be sent to the developers for investigation.

  In general, launcher flags that require one or more arguments can be
  ignored by SCAN if they are quoted, e.g.,
  $MPIEXEC -np 32 "-ignore arg1 arg2" target arglist
  would ignore the "-ignore arg1 arg2" flag and its arguments.

  Although SCAN parses launcher arguments from the given command-line
  (and in certain cases also launcher environment variables), it does not
  parse launcher configurations from command-files (regardless of whether
  they are specified on the command-line or otherwise).  Since the part
  of the launcher configuration specified in this way is ignored by SCAN,
  but will be used for the measurement and analysis steps launched, this
  may lead to undesirable discrepancies.  If command-files are used for
  launcher configuration, it may therefore be necessary or desirable to
  repeat some of their specifications on the command-line to make them
  visible to SCAN.

  SCAN only parses the command-line as far as the target executable,
  assuming that subsequent flags/parameters are intended solely for the
  target itself.  Unfortunately, some launchers (notably POE) allow MPI
  configuration options after the target executable, where SCAN won't
  find them and therefore won't use them when launching the parallel
  trace analyzer.  A workaround is to specify POE configuration options
  via environment variables instead, e.g., specify MP_PROCS instead of
  -procs, as shown below.
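  For example (a sketch of the POE workaround just described; the process
  count is a placeholder):

      export MP_PROCS=32
      scan -t poe target.exe arglist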
  SCAN uses getopt_long_only() (either from the system's C library or GNU
  libiberty) to parse launcher options.  Older versions seem to have a
  bug that fails to stop parsing when the first non-option (typically the
  target executable) is encountered; a workaround in such cases is to
  insert "--" in the command-line before the target executable, e.g.,
  scan -t mpirun -np 4 -- target.exe arglist.

  If an MPI launcher is used that is not recognized by SCAN, such as one
  that has been locally customized, it can be specified via an
  environment variable, e.g., SCAN_MPI_LAUNCHER=mympirun, to have SCAN
  accept it.  Warning: in such a case, SCAN's parsing of the launcher's
  arguments may fail.

  Some MPI launchers result in some or all program output being buffered
  until execution terminates.  In such cases, SCAN_MPI_REDIRECT can be
  set to redirect program standard and error output to separate files in
  the experiment archive.

  If necessary, or preferred, measurement and analysis launches can be
  performed without using SCAN, resulting in "default" measurement
  collection or explicit trace analysis (based on the effective Score-P
  configuration).

  SCAN automatic trace analysis of hybrid MPI/OpenMP applications is
  primarily done with an MPI/OpenMP version of the SCOUT trace analyzer
  (scout.hyb).  When this is not available, or when the MPI-only version
  of the trace analyzer (scout.mpi) is specified, analysis results are
  provided for the master threads only.

-----------------------------------------------------------------------------

* Other known issues

  - In contrast to Score-P v6.0 and above, the Cube metric hierarchy
    remapping specification file shipped with Scalasca currently
    classifies all MPI request finalization calls (i.e.,
    MPI_Test[all|any|some] and MPI_Wait[all|any|some]) as point-to-point,
    regardless of whether they actually finalize a point-to-point
    request, a file I/O request, or a non-blocking collective operation.
    This causes incorrect results when merging already post-processed
    runtime summary and trace analysis reports.  However, this is not an
    issue when post-processing reports after merging (see the note
    below).  This inconsistency will be addressed in a future release.
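    For reference, the standard post-processing workflow, which (per the
    notes in the SCOUT section above) combines the reports during
    post-processing rather than merging already post-processed reports,
    is invoked on the experiment archive.  A minimal sketch, with a
    placeholder archive name:

        scalasca -examine scorep_app_32_trace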