/****************************************************************************
**  SCALASCA    http://www.scalasca.org/                                   **
*****************************************************************************
**  Copyright (c) 1998-2013                                                **
**  Forschungszentrum Juelich GmbH, Juelich Supercomputing Centre          **
**                                                                         **
**  Copyright (c) 2009-2013                                                **
**  German Research School for Simulation Sciences GmbH                    **
**                                                                         **
**  Copyright (c) 2003-2008                                                **
**  University of Tennessee, Innovative Computing Laboratory               **
**                                                                         **
**  See the file COPYRIGHT in the package base directory for details       **
****************************************************************************/

SCALASCA v1.4.3 OPEN ISSUES
===========================

Status: March 2013

This file lists known limitations and unimplemented features of various SCALASCA and KOJAK components.

--------------------------------------------------------------------------------
* Platform support

  - SCALASCA has been tested on the following platforms:
    + IBM Blue Gene/P & Blue Gene/Q
    + IBM SP & BladeCenter clusters
    + Cray XT series (including XE & XK)
    + SGI Altix/ICE
    + NEC SX-9
    + Fujitsu FX10 & K computer
    + various Linux/Intel (x86/x64) clusters (including Intel Xeon Phi)
    The supplied Makefile definition files may provide a good basis for building and testing the toolset on other systems.

  - The following platforms have not been tested recently:
    + IBM Blue Gene/L
    + Cray XT3/4
    + Sun Solaris/SPARC-based clusters
    + SiCortex systems
    + other NEC SX systems
    However, the supplied Makefile definition files might still work on these systems.

  - Only the IBM XL compilers are currently supported on Blue Gene systems. Using the GCC compilers is not supported on this platform.

  - CCE/Cray Fortran compiler (crayftn) limitations may preclude using OPARI2 to instrument OpenMP (and POMP-annotated) sources. In such cases, measurement of OpenMP and hybrid OpenMP/MPI applications is not possible.

  - Automatic hardware topology recording is currently only implemented for IBM Blue Gene and Cray XT systems.

  - Each toolset installation can only support one MPI implementation (because MPI is only source-code but not binary compatible). If your system supports more than one MPI implementation (as Linux clusters often do), a separate build has to be installed for each MPI implementation (see the sketch below).

  - The same is true if your system features more than one compiler supporting automatic function instrumentation (see also the next section).

  - When using IBM XL Fortran compilers (on AIX or Linux PPC): as the IBM XL Fortran compilers encode subroutine names in lower case without additional underscores, SCALASCA/KOJAK measurement (which is implemented in C) of Fortran applications will fail if the application uses Fortran subroutine names that are the same as common C standard routines (e.g., open, close, fopen, fclose, mkdir, rename).

  - On platforms where wrappers for the MPI Fortran routines need to be generated, such as IBM POE, SGI MPT and Intel MPI, the wrappers may corrupt the values of LOGICAL parameters.

  - SCALASCA provides only static measurement libraries, and instrumentation or measurement may fail, be incomplete or otherwise unreliable if used with shared objects or dynamically-linked application libraries. Static linking improves the efficiency and reliability of performance measurements in general, and is therefore highly recommended.
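  As a purely illustrative sketch of maintaining one installation per MPI implementation and compiler, a build can be selected via the shell environment (the installation paths shown are hypothetical, not part of the toolset):

      % export PATH=/opt/scalasca-1.4.3-openmpi-gcc/bin:$PATH   # pick the Open MPI + GCC build
      % which scalasca skin scan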
--------------------------------------------------------------------------------
* SKIN application executable instrumenter

  Automatic instrumentation via "skin" or "kinst" is based on (often undocumented) compiler switches:
    - GNU      : tested with GCC 3 and higher
    - PGI      : on Cray XT, cannot handle compiling and linking in one step
    - CCE      : on Cray XT, tested with version 7.1 and higher
    - Sun      : only works for Fortran (not C/C++)
    - IBM      : only works for xlc/xlC version 7.0 and xlf version 9.1 and higher, and the corresponding bgxl compilers on Blue Gene systems
    - Intel    : only works with Intel icc/ifort version 10 and higher compilers
    - NEC      : tested on NEC SX-8/9
    - Fujitsu  : tested on Fujitsu FX10 & K computer
    - Pathscale: works as for GCC4

  Support for Intel 10+ and PGI 8+ compilers is based on (older) vendor-specific interfaces, which are configured by default. These newer compilers also support the GNU instrumentation interface, which can be configured manually by copying the compiler interface configuration section of mf/Makefile.defs.linux-gnu into the generated Makefile.defs. (Intel 9 compilers may be able to use the Intel interface, but would be restricted to a single compilation unit.)

  IBM XL compilers have two incompatible instrumentation interfaces. When Scalasca is configured using XLC/XLF < 11/13, the old undocumented interface is selected, which works with any XLC/XLF > 7.1.1/9.1.1 but has significantly higher overhead. If Scalasca is configured & installed using XLC/XLF >= 11/13, a new interface with significantly lower runtime overhead is used; however, it is incompatible and cannot be used with older versions of the compilers.

  Measurements of instrumented C++ applications will show decorated/mangled rather than demangled routine names if demangling with libiberty is disabled.

  Measurement filtering can only be applied to functions instrumented by the CCE, Fujitsu, IBM, GNU/Pathscale, Intel and PGI compilers. (Filtering of MPI functions, OpenMP and user-instrumented regions is always ineffective; however, EPK_MPI_ENABLED can be set to the desired categories of MPI operations.)

  Function instrumentation based on the GNU interface has the limitation that instrumented functions in dynamically loaded (shared) libraries are not measured (i.e., implicitly filtered). When using the Intel interface, instrumented functions in dynamically loaded (shared) libraries are measured, but they cannot be filtered (i.e., filters for them are ineffective).

  The GNU Fortran compiler versions 4.6.0 and 4.6.1 have a bug which leads to an internal compiler error when using automatic function instrumentation. It is therefore recommended to either use an older/newer version of the compiler or to work around this issue by using manual instrumentation or automatic source-code instrumentation based on PDToolkit.

  Because not all compilers support function instrumentation, and most of them have various limitations, an alternative is to use "skin -pomp" together with "POMP directives" for function instrumentation (see the Scalasca User Guide, instrumentation section), which works portably on most supported platforms.

  "skin -pomp" (or "skin -comp=none") can also be used when automatic function instrumentation is not desired, such as for measurement of only MPI routines. (In this case, only the final link command needs to be prefixed.) For MPI+OpenMP codes, link-only instrumentation will result in measurements containing only MPI routines: the OpenMP threads are included without any measurement data. Generally, it will be preferable to ignore the OpenMP threads entirely by explicitly using "skin -comp=none -mode=MPI", as sketched below.
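  For example, to measure only the MPI routines of an MPI+OpenMP application, only the final link command needs to be prefixed (compiler name, options and file names are illustrative):

      % mpicc -O2 -fopenmp -c foo.c bar.c                              # compile as usual, uninstrumented
      % skin -comp=none -mode=MPI mpicc -fopenmp -o app foo.o bar.o    # prefix only the link step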
  The instrumenter utilities ("skin", "kinst" & "kinst-pomp") attempt to determine the appropriate EPIK measurement library to link by parsing the compilation/link commands. If the compiler/linker is not recognised as an MPI compiler front-end and no MPI library is explicitly linked, an EPIK measurement library without MPI support is used, and measurement will appear to consist of independent (non-MPI) processes. A workaround in this case may be to explicitly specify a redundant "-lmpi" or "-lmpich" when linking (which needs to be placed on the link line before a second one implicitly inserted by the linker). This workaround is not required on Cray XT systems, as these are explicitly configured to *always* link an EPIK measurement library with MPI support. For pure OpenMP (without MPI) on Cray systems it may therefore be necessary to explicitly specify "-mode=omp".

  For compilers which enable OpenMP by default (such as CCE compilers), it is necessary to explicitly include the appropriate OpenMP activation switch (e.g., -homp) so that the instrumenter also enables OpenMP instrumentation.

  During installation, a default build mode (32- or 64-bit) is determined, and this is then implied when a build mode is not explicitly specified during compilation and linking. On systems with different defaults for MPI and non-MPI builds, the build mode determined by the instrumenter may be wrong and linking will fail with incompatible object formats. A workaround in this case is to explicitly specify the build mode when linking to ensure that the correct measurement library version is chosen.

  Using SKIN as a compiler/linker preposition is sometimes not supported by application build systems, e.g., CMake. In such cases, it may be appropriate to create scripts which invoke the compiler/linker with the SKIN preposition defined. For example, specify skin_ftn as the compiler and linker during application configuration, where this is a script executing your Fortran compiler as "skin ftn $*".

--------------------------------------------------------------------------------
* SCAN collection & analysis launcher

  This utility attempts to parse MPI launcher commands to be able to launch measurement collection along with subsequent trace analysis when appropriate. It also attempts to determine whether measurement and analysis are likely to be blocked by various configuration issues, before performing the actual launch(es). Such basic checks might be invalid in certain circumstances, and inhibit legitimate measurement and analysis launches. While it has been tested with a selection of MPI launchers (on different systems, interactively and via batch systems), it is not possible to test all versions, combinations and configuration/launch arguments, and if the current support is inadequate for a particular setup, details should be sent to the developers for investigation.

  In general, launcher flags that require one or more arguments can be ignored by SCAN if they are quoted, e.g.,
      $MPIEXEC -np 32 "-ignore arg1 arg2" target arglist
  would ignore the "-ignore arg1 arg2" flag and its arguments.

  Although SCAN parses launcher arguments from the given command-line (and in certain cases also launcher environment variables), it does not parse launcher configurations from command-files (regardless of whether they are specified on the command-line or otherwise). Since the part of the launcher configuration specified in this way is ignored by SCAN, but will be used for the measurement and analysis steps launched, this may lead to undesirable discrepancies. If command-files are used for launcher configuration, it may therefore be necessary or desirable to repeat some of their specifications on the command-line to make them visible to SCAN.

  SCAN only parses the command-line as far as the target executable, assuming that subsequent flags/parameters are intended solely for the target itself. Unfortunately, some launchers (notably POE) allow MPI configuration options after the target executable, where SCAN won't find them and therefore won't use them when launching the parallel trace analyzer. A workaround is to specify POE configuration options via environment variables instead, e.g., specify MP_PROCS instead of -procs, as in the example below.
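  A sketch of this workaround (executable name and process count are illustrative):

      # options placed after the target executable are invisible to SCAN:
      % scalasca -analyze poe ./target.exe arglist -procs 64
      # using the equivalent POE environment variable instead makes the setting
      # effective for both the measurement and the trace analysis launch:
      % export MP_PROCS=64
      % scalasca -analyze poe ./target.exe arglist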
  SCAN uses getopt_long_only, typically from "liberty", to parse launcher options. Older versions seem to have a bug that fails to stop parsing when the first non-option (typically the target executable) is encountered: a workaround in such cases is to insert "--" in the command line before the target executable, e.g., "scan -t mpirun -np 4 -- target.exe arglist".

  If an MPI launcher is used that is not recognised by SCAN, such as one that has been locally customized, it can be specified via an environment variable, e.g., SCAN_MPI_LAUNCHER=mympirun, to have SCAN accept it. Warning: in such a case, SCAN's parsing of the launcher's arguments may fail.

  Some MPI launchers result in some or all program output being buffered until execution terminates. In such cases, SCAN_MPI_REDIRECT can be set to redirect program standard and error output to separate files in the experiment archive.

  If necessary, or preferred, measurement and analysis launches can be performed without using SCAN, resulting in "default" measurement collection or explicit trace analysis (based on the effective EPIK configuration). If environment variables are neither automatically nor explicitly exported to MPI processes, the EPIK.CONF configuration file needs to be used.

--------------------------------------------------------------------------------
* EPIK measurement system

  - The EPIK runtime measurement system produces experiment archives that can only be analyzed with the SCOUT parallel analyzer (by default). For sequential analysis with EXPERT, the generated per-process traces first have to be merged using the "elg_merge" utility.

  - The EPK_GDIR configuration variable specifies the directory containing the EPIK measurement archive (epik_<title>). An additional variable, EPK_LDIR, allows a temporary location to be used as intermediate storage before the data is finally archived in EPK_GDIR. Generally, the file I/O overhead of transferring data from the intermediate storage location is best avoided by leaving EPK_LDIR set to the same location as EPK_GDIR, so that files are written directly into the experiment archive.

  - The buffers used by EPIK for definition records are sized according to the configuration variable ESD_BUFFER_SIZE. If any of these buffers fill during measurement, the resulting experiment archive will not be analyzable: in such cases, it will be necessary to repeat the measurement having configured a larger ESD_BUFFER_SIZE, as indicated by the associated EPIK message during measurement finalization. (In some cases, even this larger ESD_BUFFER_SIZE may need to be increased a second time.)
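    For example, if the finalization message indicates that a larger definitions buffer is required, the value can be raised in EPIK.CONF (or exported in the job environment) before repeating the measurement; the value shown is purely illustrative:

        # EPIK.CONF (or exported in the job environment)
        ESD_BUFFER_SIZE=4000000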
  - The storage capacity for call-path tracking and associated summary measurement data is controlled via the configuration variable ESD_PATHS. If additional call-paths are encountered during measurement, these are distinguished with an "unknown" marker; however, the precise number of such call-paths cannot be determined. If unknown call-paths are reported at measurement finalization or appear in the resulting analysis report, it is advisable to increase (e.g., double) ESD_PATHS and re-measure. After a successful measurement (with no unknown call-paths), ESD_PATHS can be reduced to the actual number of unique call-paths to reduce memory requirements for subsequent measurements.

  - Note that function filtering (where supported) or selective function instrumentation often significantly reduces the number of unique call-paths. It is therefore often advisable to examine measurement reports containing unknown call-paths for undesirable functions. Highly-recursive functions are particularly undesirable since they result in significant measurement bloat.

  - If call-path tracking inconsistencies are reported during measurement, these may need careful examination by one of the toolset developers, as they can indicate problems with the compiler-generated instrumentation. On the other hand, applications which abort or explicitly exit prematurely (without returning from "main") will also result in measurement warnings which can often be ignored.

  - No measurement is possible for MPI applications which abort or otherwise fail to call MPI_Finalize on all processes.

  - Measurement of MPI application processes which do not call MPI_Init (or MPI_Init_thread), or where the EPIK library linkage has not allowed interposition on MPI calls, will abort with the message "MPI_Init interposition unsuccessful!"

  - The C++ compilers on Cray XT systems fork additional processes which also result in the above abort messages (which can be ignored if the measurement otherwise appears complete).

  - The EPIK MPI adapter does not support the MPI C++ language bindings (deprecated since MPI 2.2) directly. Measurement of applications using the MPI C++ bindings will only work if the MPI library implements the C++ bindings as a lightweight wrapper on top of the MPI C bindings. Failure is often reported as "MPI_Init interposition unsuccessful!"

  - The EPIK MPI adapter can only handle precompiled numbers of simultaneous communicators, windows and window accesses (set to 50 in the sources), and measurement will be aborted if this limit is exceeded.

  - The EPIK MPI adapter does not distinguish time in MPI_Wait*/Test* operations for non-blocking file I/O from communication requests. In summary and trace analysis reports, all such time is attributed to MPI Point-to-point Communication time (and none to MPI File I/O time).

  - The EPIK MPI adapter is not able to disable the MINI routines at runtime if they were enabled during Scalasca configuration: they are not enabled unless --enable-all-mpi-wrappers is used when configuring, and can then only be disabled by adding --disable-mpi-wrappers=MINI.

  - PAPI native counter names which include colon characters need to be escaped ("\:") to distinguish them from the colons used to separate counter names in EPIK metric specifications (see the example below).

  - EPIK and EPILOG library interfaces and file/directory formats are *UNSTABLE* and very likely to change.
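  As an illustration of the escaping described above, a metric specification combining a PAPI preset with a native counter whose name contains a colon might look as follows (the second counter name is hypothetical, for illustration only):

      EPK_METRICS=PAPI_TOT_INS:NATIVE_EVENT\:SUBEVENT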
--------------------------------------------------------------------------------
* EPILOG trace library and tools

  - The EPILOG trace tools (e.g., elg_print, elg_stat, elg_timecorrect, elg_merge, etc.) don't yet support EPIK experiment archives. KOJAK components (i.e., EARL and EXPERT) also lack this support.

  - Care is required when selecting an appropriate buffer size for tracing. The default trace buffer size (ELG_BUFFER_SIZE) for each process is rather small and typically only adequate for very short traces. It is therefore recommended to set the trace buffer size as large as available memory permits: if too large a size is specified, the application will be unable to run or will fail to acquire memory.

  - Whenever a process fills its trace buffer, the buffer contents are automatically flushed to file and the buffer emptied to allow tracing to continue. While this flushing is marked as Overhead in subsequent analysis, generally all other processes will block at their next synchronisation, and this is not distinguished in the analysis. Furthermore, processes typically don't all fill their trace buffers simultaneously, but their behaviour is often sufficiently similar that, immediately following one flush, a chain reaction of (sequential) flushes occurs as each process in turn fills, flushes and empties its trace buffer, yet must subsequently block on a synchronisation with a process that is flushing. This exponential perturbation typically compromises all timings in the resulting measurement/analysis, though it may still help to identify excessively visited call-paths and/or a more appropriate buffer size for subsequent instrumentation/measurement configuration.

  - elg_timecorrect attempts to correct logical time inconsistencies in EPILOG trace files; however, it currently only recognizes point-to-point communication. Traces containing collective communication and/or (OpenMP) multithreading may therefore be corrupted by elg_timecorrect. Measurements made on systems with accurately synchronised high-resolution timers should not need post-measurement time correction.

  - By default, separate trace files are created for every thread and process. Even with a SIONlib-enabled version of Scalasca, separate files are used, equivalent to ELG_SION_FILES=0. ELG_SION_FILES should be set to specify the number of SION files to be used. If more SION files are requested than there are MPI processes, the number is reduced to the total number of MPI processes. If -1 is specified, an appropriate number of SION files may be determined automatically (e.g., one file per I/O node or bridge on Blue Gene systems, or one file per MPI process for MPI+OpenMP executions). See the configuration sketch at the end of this section.

  - SIONlib is not used by Scalasca for serial (non-OpenMP, non-MPI) traces. Event trace data for several MPI processes is typically also stored in separate SION files. For OpenMP and hybrid OpenMP+MPI traces, SIONlib stores trace data for each OpenMP thread (of an MPI process) in a single SION file: in such cases, merging of events from each thread and re-writing trace files is avoided when using SIONlib, resulting in much better scalability and I/O efficiency even at small scale.

  - EPILOG utilities (elg_print, elg_merge, etc.) do not work with SION files. As a consequence, EXPERT can also not be used to analyze SION traces.

  - The Vampir interactive trace visualizer [www.vampir.eu] is not yet able to handle event traces in SION files. When desired, SION files can be unpacked using sionsplit for analysis with recent versions of Vampir.
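  As an illustration of the trace buffer and SION file settings discussed above, a trace measurement might be configured as follows (the values are illustrative and must be adapted to the available memory and machine):

      # EPIK.CONF (or exported in the job environment)
      ELG_BUFFER_SIZE=100000000     # per-process trace buffer; large enough to avoid intermediate flushes
      ELG_SION_FILES=-1             # let Scalasca choose an appropriate number of SION files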
--------------------------------------------------------------------------------
* SCOUT parallel trace analysis

  - The OpenMP and hybrid MPI/OpenMP versions of the SCOUT parallel trace analyzer (and its associated libraries) have been found not to build with PGI pgCC versions 10.5 and later, and while earlier versions can be used to build SCOUT, they are unlikely to execute correctly. Consequently, OMPCXX and HYBCXX are commented out in Makefile.defs and the corresponding scout.omp and scout.hyb executables are not built. Pathscale and other compilers may have similar problems and also need the same treatment.

  - If it is not possible to build the required versions of SCOUT, or it fails to run reliably, it may be possible to substitute a version built with a different compiler (such as GCC) when doing measurement collection & analysis (e.g., in a batch job).

  - The MPI and hybrid MPI/OpenMP versions of the SCOUT parallel analyzer must be run as an MPI program with exactly the same number of processes as contained in the experiment to analyse: typically it will be convenient to launch SCOUT immediately following the measurement in a single batch script so that the MPI launch command can be configured similarly for both steps (see the sketch at the end of this section). The SCAN nexus executes SCOUT with the appropriate launch configuration when automatic trace analysis is specified.

  - If the appropriate variant of SCOUT (e.g., scout.hyb for hybrid OpenMP/MPI) is not located by SCAN, it attempts to substitute an alternative variant, which will generally result in only partial trace analysis (e.g., scout.mpi will ignore OpenMP events in hybrid traces).

  - SCOUT is unable to analyze hybrid OpenMP/MPI traces of applications using MPI_THREAD_MULTIPLE and generally unable to handle MPI_THREAD_SERIALIZED; therefore it is typically necessary to enforce use of MPI_THREAD_FUNNELED.

  - SCOUT is unable to analyse incomplete traces or traces that it is unable to load entirely into memory. Experiment archives are portable to other systems where sufficient processors with additional memory are available and a compatible version of SCOUT is installed; however, the size of such experiment archives typically prohibits this.

  - SCOUT requires user-specified instrumentation blocks to correctly nest and match for it to be able to analyse the resulting measurement traces. Similarly, collective operations must be recorded by all participating processes, and messages recorded as sent (or received) by one process must be recorded as received (or sent) by another process, otherwise SCOUT can be expected to deadlock during trace analysis.

  - SCOUT ignores hardware counter measurements recorded in traces. If measurement included simultaneous runtime summarization and tracing, the two reports are automatically combined during experiment post-processing.

  - SCOUT does not calculate "MPI File Bytes Transferred" metrics: these are only available in runtime summary measurements.

  - SCOUT may deadlock and be unable to analyse measurement experiments: should you suspect this to be the case, please save the experiment archive and contact the Scalasca development team for it to be investigated.

  - SCOUT can only handle SION files in the same mode (OpenMP, OpenMP+MPI or MPI): in particular, scout.mpi is unable to handle OpenMP+MPI SION files. Traces from OpenMP+MPI applications with only MPI (linking) instrumentation are also not handled and may result in scout.hyb crashes or deadlocks. Other utilities (e.g., clc_synchronize) are similarly limited to handling only MPI SION files in MPI mode. (Where there is a separate SION file for each MPI process, hybrid OpenMP+MPI is also expected to work.)
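  A sketch of launching measurement and explicit trace analysis from the same batch script, so that both steps use the same MPI launch configuration (the SCOUT invocation and archive name are assumptions and may differ for your installation and experiment title):

      % mpirun -np 128 ./app.x                       # tracing measurement, producing epik_<title>
      % mpirun -np 128 scout.mpi epik_<title>        # analyze with the same number of processes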
  - Scalasca traces that SCOUT is unable to analyse may still be visualized and interactively analyzed by 3rd-party tools such as VAMPIR (7.3 or later).

--------------------------------------------------------------------------------
* CUBE3 analysis report explorer

  - CUBE3 consists of libraries for producing analysis reports (cubefiles) and an (optional) GUI for analysis report browsing. These library APIs and the resulting cubefiles are incompatible with previous CUBE versions, i.e., earlier versions of the GUI and tools cannot read the cubefiles produced by the CUBE3-based tools in this release. However, the CUBE3 GUI and algebra tools provided in this release are able to process cubefiles produced by the previous version CUBE2.

  - It may be necessary to explicitly set a locale (e.g., LANG=C) and/or run "dbus-launch" when using the Qt-based GUI.

  - Only one topology can be visible at a time: to switch to another topology in the WX-based GUI, return to the System Tree and then select Topology View again.

  - On Cray XT systems, the physical machine topology is generated during the remapping of intermediate cube reports. The number of cores per node is automatically determined, but can be modified by setting the environment variable XT_NODE_CORES as appropriate.

  - CUBE4 is able to read CUBE3 files (albeit much more slowly than CUBE3); however, CUBE4 files are incompatible with CUBE3.

  Library interfaces and file formats are *UNSTABLE* and likely to change.

--------------------------------------------------------------------------------
* Multi-executable MPMD analysis

  - Measurement and analysis of applications consisting of multiple executables is possible but typically requires following a few rules and providing additional assistance to Scalasca. Symmetric executions on Intel Xeon hosts and Xeon Phi coprocessors are a special case of heterogeneous multi-executable measurements.

  - If MPI is not used to launch the executables, separate experiments will be produced for each instrumented executable. If they produce experiments in the same directory, they will need to have unique experiment titles.

  - If executables are launched on hosts that don't share a common filesystem, trace experiments require that appropriate paths to the experiment archive (or working) directory are provided for each MPI process.

  - If MPI is used to launch and connect a set of executables, then each must be instrumented by (the same version of) Scalasca. If any of the executables additionally uses OpenMP, then all of the executables should be instrumented specifying MPI+OMP mode when linking. (If the executable that provides MPI rank 0 uses OpenMP+MPI, it may be sufficient.)

  - While each executable may be provided with a distinct EPIK measurement configuration, they must all be configured for the same type of experiment, i.e., runtime summarization and/or tracing. If hardware counter metrics are specified, EPK_METRICS should be identical for all processes.

  - EPIK reports of filtered routines and required buffer sizes may not be accurate (as they are likely to differ for each executable).

  - The SCAN measurement collection and analysis nexus typically only considers the first executable when checking for Scalasca instrumentation and setting the default experiment title. The total number of processes in the title may not be accurate: SCAN_MPI_RANKS can be set to specify the actual number when performing a measurement experiment (see the example below).
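    A sketch of overriding the process count recorded in the experiment title for an MPMD measurement (launcher arguments and executable names are illustrative):

        % export SCAN_MPI_RANKS=512
        % scalasca -analyze mpiexec -np 256 master.exe : -np 256 worker.exe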
  - When a command file is used to specify the executables to be launched (e.g., POE -cmdfile), its contents are not examined by SCAN. It may therefore be necessary to specify one of the instrumented executables (which will title the experiment) on the command line after a double-dash, e.g.,
      % poe -pgmmodel mpmd -cmdfile exe.lst
      % scalasca -analyze poe -pgmmodel mpmd -cmdfile exe.lst -- master.exe

  - Scalasca analysis reports containing parts from multiple executables can be partitioned using the cube3_part utility.

--------------------------------------------------------------------------------
* SHMEM communication analysis

  - Support for SHMEM communication analysis is based on the serial EXPERT trace analyzer.

  - SHMEM is currently only supported with IBM TurboSHMEM. In particular, Cray SHMEM support is NOT implemented yet.

  - Due to the lack of freely available applications using the one-sided communication paradigm, these toolset components tend to be less well tested than others.

  It is planned to address these limitations in future releases.

--------------------------------------------------------------------------------
* OpenMP analysis

  - The measurement and analysis components cannot handle OpenMP programs which use nested or task parallelism. Even disabling nesting may not help.

  - The same team of threads is expected to be used throughout execution, i.e., OMP_NUM_THREADS threads. If a larger number is used for any parallel region (e.g., via the "num_threads(#)" clause or the "omp_set_num_threads()" runtime routine), these are not included in the measurement by default. In such cases, ESD_MAX_THREADS may be used to specify the maximum number of threads (on each process) for which measurements will be recorded (see the sketch at the end of this section). Automatic trace analysis will not be possible even in this case, nor if smaller teams are used or regions are not executed in parallel due to an "if" clause.

  - SCAN automatic trace analysis of hybrid MPI/OpenMP applications is primarily done with an MPI/OpenMP version of the SCOUT trace analyzer (scout.hyb). When this is not available, or when the MPI-only version of the trace analyzer (scout.mpi) is specified, analysis results are provided for the master threads only. Alternatively, if the trace files can be merged (via elg_merge), the EXPERT trace analyzer may be used, and its report includes analysis results for all threads and additional OpenMP-specific performance metrics.

  - OpenMP 3.0 is only partially supported. TASK directives in the code lead to an error during instrumentation and/or measurement.

  - The OPARI2 preprocessor is used for instrumenting OpenMP applications. OPARI2, being a simple source-to-source transformation tool, has several OpenMP-related restrictions. See the next section.

  It is planned to address these limitations in future releases.
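  For example, if some parallel regions request more threads than OMP_NUM_THREADS, measurement for the extra threads can be enabled as follows (the value is illustrative):

      # EPIK.CONF (or exported in the job environment)
      ESD_MAX_THREADS=16      # record measurements for up to 16 threads per process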
--------------------------------------------------------------------------------
* OPARI2

  This section provides some background on how OPARI2 works, so you can better understand how to use it for instrumenting "real" applications. Unfortunately, in its current state, it does not always work as automatically as would be desirable, and various workarounds are required.

  NOTE: In the following description "kinst" is used as a synonym for "scalasca -instrument" or "skin", and "kinst-pomp" as a synonym for "scalasca -instrument -pomp" or "skin -pomp".

  OPARI2 Basic Description
  ------------------------

  OPARI2 is used for two purposes:
    1) Instrumentation of OpenMP constructs
    2) Activation of manual instrumentation using "POMP directives"

  For each source file "<file>.<suffix>", OPARI2 as called by the instrumenter scripts does the following:
    1) Create a modified (instrumented) file "<file>.mod.<suffix>".
    2) Create an instrumentation descriptor file named "<file>.<suffix>.opari.inc" which contains the corresponding instrumentation descriptor definitions.
    3) All temporary intermediate files are automatically removed when they are no longer needed (unless the verbose flag was specified when instrumenting).

  Note to users of earlier versions of Scalasca
  ---------------------------------------------

  In the previous version, OPARI ran into problems when the source files were distributed over multiple directories or multiple targets were built inside a directory. The options -rcfile and -table, which were used to work around these limitations, are now deprecated.

  Known issues
  ------------

  - All languages

    + OPARI2 processes source files before the compiler preprocessor, so macros and included files are not processed. Conditionally compiled source code is also not resolved and can therefore result in erroneous instrumentation of partial OpenMP constructs.

    + Sources containing OpenMP used within the scope of Intel "offload" pragmas/directives are not instrumented correctly and will fail to compile: this may be fixed with OPARI2-1.0.8 or a later release.

    + The instrumented source files generated by OPARI2 may confuse automatic dependency tracking by "make", "autotools", etc. For autotools, configure with "--disable-dependency-tracking".

    + Literal file-filter rules like "INCLUDE bt.f" for files that will be processed by OPARI2 do not work, as OPARI2 changes the file name (e.g., to bt.mod.F).

    + Some OpenMP compilers (e.g., PGI) are non-standard-conforming in the way they process OpenMP directives, by not allowing macro replacement of OpenMP directive parameters. This results in error messages containing references to POMP_DLIST_##### where ##### is a five-digit number. In this case, try the OPARI2 option "--nodecl". This is unfortunately not a perfect workaround, as it can trigger other errors in some rare cases.

    + Sometimes instrumentation of OpenMP source files works, but the traces get enormously large because the application uses large numbers (millions) of small OpenMP synchronisation operations like atomic, locks or flushes, which are instrumented by default; the instrumentation overhead might also become excessive. In that case, you can tell OPARI2 not to instrument these constructs by using the "--disable=<construct>[,<construct>]..." option (see the example at the end of this section). Valid values for constructs are: atomic, critical, flush, locks, master, ordered, single, or "sync", which disables all of the above. Of course, these constructs are then not measured, so keep in mind when analyzing the results that, although they do not show up in the analysis report, the application might still have a performance problem because of too many OpenMP synchronisation calls!

  - Fortran

    + The !$OMP END DO and !$OMP END PARALLEL DO directives are required (and are not optional as described in the OpenMP specification).

    + The atomic expression controlled by a !$OMP ATOMIC directive has to be on a line all by itself.

    + The Fortran95 statement terminator (";") is not handled correctly when it is used within parallel loops.
    + Identifiers containing Fortran keywords (such as "do", "function", "module" or "subroutine") may result in incorrect instrumentation.

    + If an #ifdef block is used at the beginning of the variable definition part, instrumentation is incorrectly inserted within the block and is not compiled when the condition evaluates to false. (A workaround is to move the conditional variable definition after some unconditional definitions.)

    + Some Fortran compilers (e.g., Sun) don't fully support C preprocessor commands, especially the "#line" commands. In case you track a compilation error in an OPARI2-modified/instrumented file down to such a statement, try using "--nosrc", as this suppresses the generation of "#line" statements. (With the Sun Fortran compiler, using "-xpp=cpp" is a better workaround.)

    + Some Fortran compilers (e.g., IBM XLF) don't disable C trigraph substitution when preprocessing, which can lead to compilation errors. (With IBM XLF compilers, adding "-d -WF,-qlanglvl=classic" can be used as a workaround; however, line numbers will be inaccurate.)

    + The first SECTION directive inside a SECTIONS workshare directive is required (and is not optional as described in the OpenMP specification).

    + OPARI instrumentation includes declarations of unnecessary variables (particularly in routines which don't use OpenMP) which result in numerous warnings if -Wunused or -Wall are used, or in compilation errors in conjunction with -Werror.

  - C/C++

    + Structured blocks describing the extent of an OpenMP pragma need to be either compound statements {....}, while loops, or simple statements. In addition, for loops are supported after omp for and omp parallel for pragmas. Complex statements like if-then-else or do-while need to be enclosed in a block ( {....} ).

    + C99 6.10.9 _Pragma operators are not supported.

  It is planned to address these limitations in future releases.
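  As referenced in the "All languages" item above, a hedged sketch of disabling instrumentation of frequent synchronisation constructs; how the option reaches OPARI2 depends on your build setup, and the source file name is illustrative:

      % opari2 --disable=atomic,flush,locks compute.f90    # or --disable=sync to disable all of them

--------------------------------------------------------------------------------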