Cube GUI User Guide  (CubeGUI 4.5, revision release-4.5)
Introduction in Cube GUI and its usage
KNL Memory usage analysis

With Score-P, we measure the bandwidth values per code region outside of OpenMP parallel regions, due to the given uncore counter restrictions. Depending on the application, there might be a lot of code regions that show a high bandwidth value. To find the most bandwidth-sensitive candidates among these regions, we need to sort them by their last-level cache (LLC) misses. This gives us the MCDRAM candidate metric per code region, as shown in Figure 4. We derive the MCDRAM candidate metric, i.e., we sort the high-bandwidth callpaths by their last-level cache misses, in the Cube plugin KNL advisor (see also 5.2). As input we use the PAPI-measured access counts for each DDR4 memory channel and the LLC counts measured with PAPI and the SCIPHI Score-P and Cube extensions for Intel Phi. We take care to measure the memory accesses only per process while running exclusively on a single KNL node.

As Score-P and Cube work purely on code regions, the MCDRAM candidates are also code regions. As a drawback, if a candidate code region accesses several data structures, we cannot point to the most bandwidth-sensitive structure. VTune [1], HPCToolkit [3][12], or ScaAnalyzer [13] might provide more detailed insight. In addition to this drawback, the above approach is not generally applicable for tools, because accessing counters from the uncore requires privileged access to a machine, either by setting the paranoia flag or by providing a special kernel module. On production machines, this access is often not granted for security reasons. This applies not only to memory accesses, but to all uncore counters.
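The candidate selection described above can be illustrated with a minimal Python sketch. It is not the KNL advisor implementation. The RegionSample container, the function names, the bandwidth threshold, and the assumed 64 bytes per DDR4 access are all hypothetical and serve only to show the idea: keep the regions with high bandwidth, then sort them by their LLC misses.

```python
from dataclasses import dataclass

# Assumption: each counted DDR4 access transfers one 64-byte cache line.
CACHE_LINE_BYTES = 64


@dataclass
class RegionSample:
    """Per-region measurements (hypothetical container, not a Cube type)."""
    name: str
    ddr_accesses: int   # access counts summed over all DDR4 channels
    llc_misses: int     # last-level cache misses
    time_s: float       # time spent in the region, in seconds


def bandwidth_gb_s(sample: RegionSample) -> float:
    """Estimate bandwidth in GB/s as accesses * line size / time."""
    return sample.ddr_accesses * CACHE_LINE_BYTES / sample.time_s / 1e9


def mcdram_candidates(samples, min_bandwidth_gb_s):
    """Keep regions at or above the threshold, sorted by LLC misses (descending)."""
    high = [
        s for s in samples
        if s.time_s > 0 and bandwidth_gb_s(s) >= min_bandwidth_gb_s
    ]
    return sorted(high, key=lambda s: s.llc_misses, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    samples = [
        RegionSample("solve", 2_000_000_000, 5_000_000, 2.0),
        RegionSample("init", 10_000_000, 9_000_000, 1.0),
        RegionSample("exchange", 1_500_000_000, 8_000_000, 1.5),
    ]
    for s in mcdram_candidates(samples, min_bandwidth_gb_s=50.0):
        print(s.name, round(bandwidth_gb_s(s), 1), s.llc_misses)
```

In this sketch a region with many LLC misses but low bandwidth (here "init") is dropped before sorting, which mirrors the two-step selection in the text: bandwidth first, LLC misses second.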


Cube Writer Library    Copyright © 1998–2020 Forschungszentrum Jülich GmbH, Jülich Supercomputing Centre
Copyright © 2009–2015 German Research School for Simulation Sciences GmbH, Laboratory for Parallel Programming