Frequently Asked Questions

Parameter groups

Within JUBE you can define parameter groups to allow only specific parameter combinations.

For example, assume you have two parameters:

<parameter name="foo">10,100</parameter>
<parameter name="bar">20,200</parameter>
parameter:
  - { name: foo,  _: '10,100' }
  - { name: bar,  _: '20,200' }

Without any additional change, JUBE will run four parameter combinations (foo=10,bar=20; foo=100,bar=20; foo=10,bar=200; foo=100,bar=200). But maybe only foo=10,bar=20 and foo=100,bar=200 make sense within your configuration. In that case you can use the parameter dependencies feature together with small Python snippets to restrict the four combinations to the two valid pairs, by using a dummy index value:

<parameter name="i">0,1</parameter>
<parameter name="foo" mode="python">[10,100][$i]</parameter>
<parameter name="bar" mode="python">[20,200][$i]</parameter>
parameter:
  - { name: i,                 _: '0,1' }
  - { name: foo, mode: python, _: '[10,100][$i]' }
  - { name: bar, mode: python, _: '[20,200][$i]' }
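
The effect of the index parameter can be reproduced in plain Python. The following sketch is only an illustration and not JUBE code: it contrasts the full cross product of two independent value lists with the index-based selection used above. The variable names are chosen for this example only.

```python
from itertools import product

foo_values = [10, 100]
bar_values = [20, 200]

# Independent parameters: every foo value is combined with every bar value.
all_combinations = list(product(foo_values, bar_values))

# Index-based selection: foo and bar both depend on the same index i,
# so only values at matching positions are paired.
grouped = [(foo_values[i], bar_values[i]) for i in (0, 1)]

print(all_combinations)  # four combinations
print(grouped)           # two combinations
```

Because foo and bar are both derived from i, only i is expanded into separate runs, which is why two combinations remain instead of four.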

Instead of using a numerical index, you can also use a string value for selection:

<parameter name="key">tick,tock</parameter>
<parameter name="foo" mode="python">
   {"tick" : 10,
    "tock" : 100}["${key}"]
</parameter>
<parameter name="bar" mode="python">
   {"tick" : 20,
    "tock" : 200}["${key}"]
</parameter>
parameter:
  - { name: key, _: 'tick,tock' }
  - name: foo
    mode: python
    _: |
        {
          "tick" : 10,
          "tock" : 100
        }["${key}"]
  - name: bar
    mode: python
    _: |
        {
          "tick" : 20,
          "tock" : 200
        }["${key}"]

Default values are also possible:

<parameter name="foo" mode="python">
   {"tick" : 10,
    "tock" : 100}.get("${key}",0)
</parameter>
parameter:
  - name: foo
    mode: python
    _: |
        {
          "tick" : 10,
          "tock" : 100
        }.get("${key}",0)
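
The expressions above are ordinary Python dictionary lookups. The following sketch shows the difference between the two variants outside of JUBE; the function names and the key "other" are hypothetical and only serve the illustration.

```python
foo_by_key = {"tick": 10, "tock": 100}


def select_foo(key):
    # Direct indexing: an unknown key raises a KeyError.
    return foo_by_key[key]


def select_foo_with_default(key, default=0):
    # dict.get returns the default instead of raising an error.
    return foo_by_key.get(key, default)


print(select_foo("tock"))
print(select_foo_with_default("other"))
```

The .get variant is useful when the selecting parameter may take values that are not listed in the dictionary.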

Workdir change

Sometimes you want to execute a step outside of the normal JUBE directory structure. This can be done by using the work_dir attribute inside the <step> tag. If you use work_dir, JUBE does not create a unique directory structure. If you need unique directories, you have to create this structure on your own, e.g. by using the jube_variables:

<step name="a_step" work_dir="path_to_dir/${jube_benchmark_padid}/${jube_wp_padid}_${jube_step_name}">
   ...
</step>
step:
  name: a_step
  work_dir: "bench_run/${jube_benchmark_padid}/${jube_wp_padid}_${jube_step_name}"

Using the *_padid variables will help to create a sorted directory structure.
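
Why padding helps can be seen in a small Python sketch. It is an illustration only: the helper padded_id and the padding width of 6 are assumptions made for this example, not taken from the JUBE configuration above.

```python
def padded_id(number, width=6):
    """Return the id as a zero-padded string (width=6 is an assumption)."""
    return f"{number:0{width}d}"


ids = [0, 2, 10, 1]

# Unpadded ids sort lexicographically, so "10" ends up before "2".
plain = sorted(str(i) for i in ids)

# Zero-padded ids keep their numerical order when sorted as strings.
padded = sorted(padded_id(i) for i in ids)

print(plain)
print(padded)
```

Directory listings sort names as strings, so padded ids keep the directories in numerical order.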

JUBE does not create any symbolic links inside the changed work directories. If you want to access files from a dependent step, you can use a <fileset> and the rel_path_ref attribute.

<fileset name="needed_files">
   <link rel_path_ref="internal">dependent_step_name/a_file</link>
</fileset>
fileset:
  name: needed_files
  link:
    - {rel_path_ref: internal, _: dependent_step_name/a_file}

This will create a link inside your alternative working directory, and the link target path is interpreted relative to the original JUBE directory structure. So here you can use the normal, automatically created link to access all dependent files.

To access files from an alternative working directory in a following step, and if you created this working directory by using the jube_variables, you can use jube_wp_parent_<parent_name>_id to get the id of the parent step and use it within a path definition.

XML character handling

The JUBE XML based input format is based on the general XML rules. Here are some hints for typical XML problems:

Line breaks are not allowed inside a tag-option (e.g. <sub ... dest="...\n..."> is not possible). Inside a tag, multiple lines are no problem (e.g. inside of <parameter>...</parameter>). Often multiple lines are also needed inside a <sub>. Line breaks are possible for the dest="" part by switching to the alternative <sub> syntax:

<sub source="...">
...
</sub>

Whitespace is only removed at the beginning and at the end of the whole string. The indentation of the inner lines of a multiline string is kept and can therefore create some problems.
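
The following Python sketch illustrates the consequence. It assumes that the stripping behaves like Python's str.strip(), which is an assumption made for this illustration and not a statement about the JUBE implementation.

```python
value = """
   {"tick" : 10,
    "tock" : 100}
"""

# Only the leading and trailing whitespace of the whole string is removed.
stripped = value.strip()

# The indentation of the second line is still part of the string.
lines = stripped.splitlines()
print(repr(lines[0]))
print(repr(lines[1]))
```

For a Python expression inside brackets this remaining indentation is harmless, but for text that is substituted verbatim into a file it ends up in the result.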

Some characters are not allowed inside an XML script or at least not inside a tag-option. Here are some of the typical replacements:

  • < : &lt;

  • > : &gt;

  • & : &amp;

  • " : &quot;

  • ' : &apos;
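
If such strings are generated by a script, the replacements can be applied automatically. The following sketch uses escape from the Python standard library module xml.sax.saxutils; the wrapper xml_escape and the sample command are hypothetical and only serve the illustration.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are passed as extra entities.
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def xml_escape(text):
    """Replace the five special characters listed above by their entities."""
    return escape(text, EXTRA_ENTITIES)


print(xml_escape('test "$a" -lt 5 && echo \'ok\' > out.txt'))
```

The ampersand is replaced first, so the entities produced for the other characters are not escaped a second time.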

YAML character handling

The JUBE YAML based input format is based on the general YAML rules.

Instead of the tags of the XML format, the YAML format uses keys whose values are a list of elements or other keys.

The files used for this example can be found inside examples/yaml.

The input file hello_world.yaml:

benchmark: # having only a single benchmark, this key is optional
  name: hello_world
  outpath: bench_run
  comment: A simple hello world in yaml

  #Configuration
  parameterset:
    name: hello_parameter
    parameter: {name: hello_str,  _: Hello World}

  #Operation
  step:
    name: say_hello
    use: hello_parameter # special key _ can be skipped
    do:
      - _: echo $hello_str # - is optional in this case, as there is only one do entry
        active: true

You can use different styles of writing key value pairs: In the example, the parameter is declared in one line using {}. Multiple key value pairs can be stored per element. The main content attribute is marked by using _. As an alternative, you can write the key value pairs across multiple lines using the same indent as the preceding line, like the key do in the example. If a key like use has only a value, you can write it in one line without using the special _ key.

A list of elements can be specified by using [] or by using - across multiple lines (always keeping the same indent).

YAML also has a number of special characters, which can be used inside values by enclosing the value in quotation marks:

The input file special_values.yaml:

name: special values
outpath: bench_run
comment: An example for values that need to be in quotations

parameterset:
  name: special_parameters
  parameter:
    - {name: integer, type: int,  _: "1,2,4"} #comma separated values need to be quoted
    - {name: "NUMBER", _: "#3"} #values with # need to be quoted

patternset:
  name: special_pattern
  pattern:
    - {name: result, type: int, _: "Result: test"} #values with : need to be quoted
    - {name: integers, type: int, _: "Integers = {$integer}"} #values with {} need to be quoted
    - {name: integer, type: int, _: "'Integer' = $NUMBER"} #values with ' need to be quoted

Anytime you have a symbol like #, ', ,, : or {} you have to enclose the entire value in quotation marks.

Analyse multiple output files

This FAQ entry is only relevant for JUBE versions prior to version 2.2. Since version 2.2 JUBE automatically creates a combined result table.

Within an <analyser> you can analyse multiple files. Each <analyser> <analyse> combination will create independent result entries:

<analyser name="analyse">
   <use>a_patternset</use>
   <analyse step="step_A">
      <file>stdout</file>
   </analyse>
   <analyse step="step_B">
      <file>stdout</file>
   </analyse>
</analyser>

In this example the <patternset> a_patternset will be used for both files. This is ok if there are only patterns which match either the step_A stdout file or the step_B stdout file.

If you want to use a file dependent patternset you can move the use to a <file> attribute instead:

<analyser name="analyse">
   <analyse step="step_A">
      <file use="a_patternset_A">stdout</file>
   </analyse>
   <analyse step="step_B">
      <file use="a_patternset_B">stdout</file>
   </analyse>
</analyser>

This avoids the generation of incorrect result entries. A from=... option is not available in this case. Instead you can copy the patternset first to your local file by using the init_with attribute.

Due to the independent result entries, you will end up with the following result table if you mix the extracted patterns:

| pattern1_of_A | pattern2_of_A | pattern1_of_B |
|---------------+---------------+---------------|
|             1 |             A |               |
|             2 |             B |               |
|               |               |            10 |
|               |               |            11 |
|               |               |            12 |
|               |               |            13 |

The different <analyse> entries were not combined, so you end up with independent result lines for each workpackage. JUBE does not see possible step dependencies at this point; the user has to set the dependencies manually:

<analyser name="analyse">
   <analyse step="step_B">
      <file use="a_patternset_B">stdout</file>
      <file use="a_patternset_A">step_A/stdout</file>
   </analyse>
</analyser>

Now we only have one <analyse> and we are using the autogenerated link to access the dependent step. This will create the correct result:

| pattern1_of_A | pattern2_of_A | pattern1_of_B |
|---------------+---------------+---------------|
|             1 |             A |            10 |
|             2 |             B |            11 |
|             1 |             A |            12 |
|             2 |             B |            13 |

Extract data from a specific text block

In many cases the standard program output is structured into multiple blocks:

blockA:
...
time=20

blockB:
...
time=30

A simple <pattern> like time=$jube_pat_int will match all time= lines (by default the first match is used; statistical pattern values are available as well). However, in many cases a specific value from a specific block should be extracted. This is possible by using \s within the pattern for each individual newline character within the block, or by using the dotall option:

<pattern name="a_pattern" dotall="true">blockB:.*?time=$jube_pat_int</pattern>
pattern:
  - {name: a_pattern, dotall: true, _: 'blockB:.*?time=$jube_pat_int'}

This only extracts 30 from blockB. Setting dotall="true" allows the . to also match all newline characters in between (by default newline characters are not matched by .).
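
The matching behaviour can be tried out with Python's re module. The sketch below is an illustration outside of JUBE; it assumes that $jube_pat_int expands to the group ([+-]?\d+), and the variable names are chosen for this example only.

```python
import re

output = """blockA:
...
time=20

blockB:
...
time=30
"""

# Assumed expansion of $jube_pat_int for this illustration.
PAT_INT = r"([+-]?\d+)"

# A plain pattern finds the first time= line, which belongs to blockA.
first = re.search(r"time=" + PAT_INT, output)

# Without DOTALL the . does not match newlines, so nothing is found.
no_dotall = re.search(r"blockB:.*?time=" + PAT_INT, output)

# With DOTALL the .*? may span the lines between blockB: and time=.
with_dotall = re.search(r"blockB:.*?time=" + PAT_INT, output, re.DOTALL)

# Alternative without DOTALL: one \s for each newline inside the block.
with_s = re.search(r"blockB:\s.*\stime=" + PAT_INT, output)

print(first.group(1), no_dotall, with_dotall.group(1), with_s.group(1))
```

The non-greedy .*? keeps the match as short as possible, so the first time= line after blockB: is used.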

Restart a workpackage execution

If a problem occurs outside of the general JUBE handling (e.g. a crashed HPC job or a broken dependency) it might be necessary to restart a specific workpackage. JUBE allows this restart by removing the problematic workpackage entry and using the jube continue command afterwards:

jube remove benchmark_directory --id <id> --workpackage <workpackage_id>
...
jube continue benchmark_directory

This will rerun the specific workpackage. The JUBE configuration will stay unchanged. It is not possible to change the <parameter> or <step> configuration later on. Shared <do> operations (shared=true) will be ignored within such a rerun scenario, except if all workpackages of a specific step were removed and the full step is re-executed.