Advanced tutorial¶
This tutorial demonstrates more detailed functions and tools of JUBE. If you want a basic overview you should read the general JUBE tutorial first.
Schema validation¶
To validate your YAML based input files you can use schema validation. You can find jube.json in the contrib/schema folder.
To validate your XML based input files you can use DTD or schema validation. You can find jube.dtd, jube.xsd and jube.rnc in the contrib/schema folder.
You have to add this schema information to the input files you want to validate.
DTD usage:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jube SYSTEM "<jube.dtd path>">
<jube>
...
Schema usage:
<?xml version="1.0" encoding="UTF-8"?>
<jube xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="<jube.xsd path>">
...
RELAX NG Compact Syntax (RNC for emacs nxml-mode) usage:
In order to use the provided rnc schema file schema/jube.rnc in emacs, open an xml file and use C-c C-s C-f or M-x rng-set-schema-file-and-validate to choose the rnc file. You can also use M-x customize-variable rng-schema-locating-files after you have loaded nxml-mode to customize the default search paths to include jube.rnc. After successful parsing emacs offers to automatically create a schema.xml file which looks like
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <uri resource="jube-file.xml" uri="../schema/jube.rnc"/>
</locatingRules>
The next time you open the same xml file emacs will find the correct rnc for the validation based on schema.xml.
Example validation tools:
eclipse (using DTD or schema)
emacs (using RELAX NG)
xmllint:
For validation (using the DTD):
>>> xmllint --noout --valid <xml input file>
For validation (using the DTD and Schema):
>>> xmllint --noout --valid --schema <schema file> <xml input file>
Scripting parameter¶
In some cases you need a parameter whose value is based on the value of another parameter. In this case you can use a scripting parameter.
The files used for this example can be found inside examples/scripting_parameter.
The input file scripting_parameter.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="scripting_parameter" outpath="bench_run">
    <comment>A scripting parameter example</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <!-- Normal template -->
      <parameter name="number" type="int">1,2,4</parameter>
      <!-- A template created by a scripting parameter-->
      <parameter name="additional_number" mode="python" type="int">
        ",".join(str(a*${number}) for a in [1,2])
      </parameter>
      <!-- A scripting parameter -->
      <parameter name="number_mult" mode="python" type="float">
        ${number}*${additional_number}
      </parameter>
      <!-- Reuse another parameter -->
      <parameter name="text">Number: $number</parameter>
    </parameterset>

    <!-- Operation -->
    <step name="operation">
      <use>param_set</use> <!-- use existing parameterset -->
      <!-- shell commands -->
      <do>echo "number: $number, additional_number: $additional_number"</do>
      <do>echo "number_mult: $number_mult, text: $text"</do>
    </step>
  </benchmark>
</jube>
The input file scripting_parameter.yaml:
name: scripting_parameter
outpath: bench_run
comment: A scripting parameter example

#Configuration
parameterset:
  name: param_set
  parameter:
    #Normal template
    - {name: number, type: int, _: "1,2,4"}
    #A template created by a scripting parameter
    - {name: additional_number, mode: python, type: int, _: '",".join(str(a*${number}) for a in [1,2])'}
    #A scripting parameter
    - {name: number_mult, mode: python, type: float, _: "${number}*${additional_number}"}
    #Reuse another parameter
    - {name: text, _: "Number: $number"}

#Operation
step:
  name: operation
  use: param_set #use existing parameterset
  do:
    - 'echo "number: $number, additional_number: $additional_number"'
    - 'echo "number_mult: $number_mult, text: $text"'
In this example we see four different parameters.
number is a normal template which will be expanded to three different workpackages.
additional_number is a scripting parameter which creates a new template and is based on number. The mode is set to the scripting language (python, perl and shell are allowed). The additional type is optional and declares the result type after evaluating the expression. The type is only used by the sort algorithm in the result step. It is not possible to create a template of different scripting parameters. Because of this second template we will get six different workpackages.
number_mult is a small calculation. You can use any other existing parameters (which are used inside the same step).
text is a normal parameter which uses the content of another parameter. For a simple concatenation parameter you do not need a scripting parameter.
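The expansion described above can be re-created in a few lines of plain Python. This is only a sketch of the observable behaviour, not JUBE's implementation: the loop structure and the dictionary layout are assumptions made for illustration, while the expression itself is taken from the additional_number parameter.

```python
# Hypothetical re-creation of the parameter expansion, not JUBE code.
numbers = [1, 2, 4]  # the "number" template

workpackages = []
for number in numbers:
    # additional_number: the Python expression yields a comma-separated
    # string, which is then treated as a template again
    expression_result = ",".join(str(a * number) for a in [1, 2])
    for additional_number in (int(v) for v in expression_result.split(",")):
        workpackages.append({
            "number": number,
            "additional_number": additional_number,
            # number_mult: evaluated once per workpackage
            "number_mult": number * additional_number,
            # text: simple concatenation, no scripting needed
            "text": "Number: {}".format(number),
        })

for wp in workpackages:
    print(wp)
```

Three values of number combined with two values of additional_number each give the six workpackages mentioned above.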
For this example we will find the following output inside the run.log-file:
====== operation ======
>>> echo "number: 1, additional_number: 1"
>>> echo "number_mult: 1, text: Number: 1"
====== operation ======
>>> echo "number: 1, additional_number: 2"
>>> echo "number_mult: 2, text: Number: 1"
====== operation ======
>>> echo "number: 2, additional_number: 2"
>>> echo "number_mult: 4, text: Number: 2"
====== operation ======
>>> echo "number: 2, additional_number: 4"
>>> echo "number_mult: 8, text: Number: 2"
====== operation ======
>>> echo "number: 4, additional_number: 4"
>>> echo "number_mult: 16, text: Number: 4"
====== operation ======
>>> echo "number: 4, additional_number: 8"
>>> echo "number_mult: 32, text: Number: 4"
Implicit Perl or Python scripting inside the <do> or any other position is not possible. If you want to use some scripting expressions you have to create a new parameter.
Scripting pattern¶
Similar to the Scripting parameter, different patterns, or patterns and parameters, can also be combined. For this a scripting pattern can be created by using the mode= attribute in the same way as it is used for the Scripting parameter.
All scripting patterns are evaluated at the end of the analyse part. Each scripting pattern is evaluated once. If there are multiple matches as described in the Statistic pattern values section, only the resulting statistical pattern is available (not each individual value). Scripting patterns do not create statistic values by themselves.
In addition the default= attribute can be used to set a default pattern value, if the value can't be found during the analysis.
The files used for this example can be found inside examples/scripting_pattern.
The input file scripting_pattern.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="scripting_pattern" outpath="bench_run">
    <comment>A scripting_pattern example</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="value" type="int">0,1,2</parameter>
    </parameterset>

    <!-- Operation -->
    <step name="operation">
      <use>param_set</use>
      <do>echo "$value"</do>
    </step>

    <!-- Pattern to extract -->
    <patternset name="pattern_set">
      <!-- A normal pattern -->
      <pattern name="value_pat" type="int">$jube_pat_int</pattern>
      <!-- A combination of a pattern and a parameter -->
      <pattern name="dep_pat" type="int" mode="python">$value_pat+$value</pattern>
      <!-- This pattern is not available -->
      <pattern name="missing_pat" type="int">
        pattern_not_available: $jube_pat_int
      </pattern>
      <!-- The combination will fail (create NaN) -->
      <pattern name="missing_dep_pat" type="int" mode="python">
        $missing_pat*$value
      </pattern>
      <!-- Default value for missing pattern -->
      <pattern name="missing_pat_def" type="int" default="0">
        pattern_not_available: $jube_pat_int
      </pattern>
      <!-- Combination of default value and parameter -->
      <pattern name="missing_def_dep_pat" type="int" mode="python">
        $missing_pat_def*$value
      </pattern>
    </patternset>

    <analyser name="analyse">
      <use>pattern_set</use>
      <analyse step="operation">
        <file>stdout</file>
      </analyse>
    </analyser>

    <!-- result table creation -->
    <result>
      <use>analyse</use>
      <table name="result" style="pretty">
        <column>value</column>
        <column>value_pat</column>
        <column>dep_pat</column>
        <column>missing_pat</column>
        <column>missing_dep_pat</column>
        <column>missing_pat_def</column>
        <column>missing_def_dep_pat</column>
      </table>
    </result>
  </benchmark>
</jube>
The input file scripting_pattern.yaml:
name: scripting_pattern
outpath: bench_run
comment: A scripting_pattern example

#Configuration
parameterset:
  name: param_set
  parameter: {name: value, type: int, _: "0,1,2"}

#Operation
step:
  name: operation
  use: param_set
  do: echo "$value"

#Pattern to extract
patternset:
  name: pattern_set
  pattern:
    #A normal pattern
    - {name: value_pat, type: int, _: $jube_pat_int}
    #A combination of a pattern and a parameter
    - {name: dep_pat, type: int, mode: python, _: $value_pat+$value}
    #This pattern is not available
    - {name: missing_pat, type: int, _: "pattern_not_available: $jube_pat_int"}
    #The combination will fail (create NaN)
    - {name: missing_dep_pat, type: int, mode: python, _: $missing_pat*$value}
    #Default value for missing pattern
    - {name: missing_pat_def, type: int, default: 0, _: "pattern_not_available: $jube_pat_int"}
    #Combination of default value and parameter
    - {name: missing_def_dep_pat, type: int, mode: python, _: $missing_pat_def*$value}

analyser:
  name: analyse
  use: pattern_set
  analyse:
    step: operation
    file: stdout

#result table creation
result:
  use: analyse
  table:
    name: result
    style: pretty
    column: [value,value_pat,dep_pat,missing_pat,missing_dep_pat,missing_pat_def,missing_def_dep_pat]
It will create the following output:
| value | value_pat | dep_pat | missing_pat | missing_dep_pat | missing_pat_def | missing_def_dep_pat |
|-------|-----------|---------|-------------|-----------------|-----------------|---------------------|
| 0 | 0 | 0 | | nan | 0 | 0 |
| 1 | 1 | 2 | | nan | 0 | 0 |
| 2 | 2 | 4 | | nan | 0 | 0 |
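The table can be reproduced with a small Python sketch. It only mirrors the visible result: the helper evaluate and the use of None for an unmatched pattern are assumptions for illustration, and the way JUBE produces the nan internally is not described in this tutorial.

```python
import math

def evaluate(expression, values):
    """Evaluate a Python-mode expression; hypothetical helper, not a JUBE function.
    Returns NaN as soon as one of the inputs has no value."""
    if any(value is None for value in values.values()):
        return math.nan
    return eval(expression, {"__builtins__": {}}, dict(values))

rows = []
for value in [0, 1, 2]:
    value_pat = value      # matched from the stdout of: echo "$value"
    missing_pat = None     # never matches and has no default
    missing_pat_def = 0    # never matches, falls back to default="0"
    rows.append({
        "value": value,
        "dep_pat": evaluate("value_pat + value",
                            {"value_pat": value_pat, "value": value}),
        "missing_dep_pat": evaluate("missing_pat * value",
                                    {"missing_pat": missing_pat, "value": value}),
        "missing_def_dep_pat": evaluate("missing_pat_def * value",
                                        {"missing_pat_def": missing_pat_def, "value": value}),
    })

for row in rows:
    print(row)
```

The default value is what keeps missing_def_dep_pat numeric, while the pattern without a default propagates nan into every expression that uses it.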
Statistic pattern values¶
Normally a pattern should only match a single entry in your result files. But sometimes there are multiple similar entries (e.g. if the benchmark uses some iteration feature).
JUBE will create the statistical values first, last, min, max, avg, std, cnt and sum automatically. To use these values, the user has to specify the pattern name followed by _<statistic_option>, e.g. pattern_name_last (the pattern_name itself will always be the first match).
An example for multiple matches and the statistic values can be found in examples/statistic.
The input file statistic.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="reduce_example" outpath="bench_run">
    <comment>A result reduce example</comment>

    <!-- Regex pattern -->
    <patternset name="pattern">
      <pattern name="number_pat" type="int">$jube_pat_int</pattern>
    </patternset>

    <!-- Operation -->
    <step name="write_some_numbers">
      <do>echo "1 2 3 4 5 6 7 8 9 10"</do> <!-- shell command -->
    </step>

    <!-- Analyse -->
    <analyser name="analyse">
      <use>pattern</use> <!-- use existing patternset -->
      <analyse step="write_some_numbers">
        <file>stdout</file> <!-- file which should be scanned -->
      </analyse>
    </analyser>

    <!-- Create result table -->
    <result>
      <use>analyse</use> <!-- use existing analyser -->
      <table name="result" style="pretty">
        <column>number_pat</column> <!-- first match -->
        <column>number_pat_first</column> <!-- first match -->
        <column>number_pat_last</column> <!-- last match -->
        <column>number_pat_min</column> <!-- min of all matches -->
        <column>number_pat_max</column> <!-- max of all matches -->
        <column>number_pat_sum</column> <!-- sum of all matches -->
        <column>number_pat_cnt</column> <!-- number of matches -->
        <column>number_pat_avg</column> <!-- avg of all matches -->
        <column format=".2f">number_pat_std</column> <!-- std of all matches -->
      </table>
    </result>
  </benchmark>
</jube>
The input file statistic.yaml:
name: reduce_example
outpath: bench_run
comment: A result reduce example

#Regex pattern
patternset:
  name: pattern
  pattern: {name: number_pat, type: int, _: $jube_pat_int}

#Operation
step:
  name: write_some_numbers
  do: echo "1 2 3 4 5 6 7 8 9 10" #shell command

#Analyse
analyser:
  name: analyse
  use: pattern #use existing patternset
  analyse:
    step: write_some_numbers
    file: stdout #file which should be scanned

#Create result table
result:
  use: analyse #use existing analyser
  table:
    name: result
    style: pretty
    column:
      - number_pat #first match
      - number_pat_first #first match
      - number_pat_last #last match
      - number_pat_min #min of all matches
      - number_pat_max #max of all matches
      - number_pat_sum #sum of all matches
      - number_pat_cnt #number of matches
      - number_pat_avg #avg of all matches
      - {_: number_pat_std, format: .2f} #std of all matches
It will create the following output:
| number_pat | number_pat_last | number_pat_min | number_pat_max | number_pat_sum | number_pat_cnt | number_pat_avg | number_pat_std |
|------------|-----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| 1 | 10 | 1 | 10 | 55 | 10 | 5.5 | 3.03 |
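The numbers in this table can be checked with Python's standard library. The sketch below is not JUBE code. It uses the sample standard deviation (statistics.stdev) because that is the estimator which reproduces the 3.03 shown above; that JUBE uses the same estimator is an inference from the output, not something stated in this tutorial.

```python
import statistics

# all $jube_pat_int matches found in the stdout of the example step
matches = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

stats = {
    "first": matches[0],
    "last": matches[-1],
    "min": min(matches),
    "max": max(matches),
    "sum": sum(matches),
    "cnt": len(matches),
    "avg": statistics.mean(matches),
    # sample standard deviation (divides by n - 1)
    "std": statistics.stdev(matches),
}

for name, value in stats.items():
    print("number_pat_{}: {}".format(name, value))
```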
Jobsystem¶
In most cases you want to submit jobs by JUBE to your local jobsystem. You can use the normal file access and substitution system to prepare your jobfile and send it to the jobsystem. JUBE also provides some additional features.
The files used for this example can be found inside examples/jobsystem.
The input jobsystem file job.run.in for Torque/Moab (you can easily adapt your personal jobscript):
#!/bin/bash -x
#MSUB -l nodes=#NODES#:ppn=#PROCS_PER_NODE#
#MSUB -l walltime=#WALLTIME#
#MSUB -e #ERROR_FILEPATH#
#MSUB -o #OUT_FILEPATH#
#MSUB -M #MAIL_ADDRESS#
#MSUB -m #MAIL_MODE#

### start of jobscript

#EXEC#
touch #READY#
The JUBE input file jobsystem.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="jobsystem" outpath="bench_run">
    <comment>A jobsystem example</comment>

    <!-- benchmark configuration -->
    <parameterset name="param_set">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>

    <!-- Job configuration -->
    <parameterset name="executeset">
      <parameter name="submit_cmd">msub</parameter>
      <parameter name="job_file">job.run</parameter>
      <parameter name="nodes" type="int">1</parameter>
      <parameter name="walltime">00:01:00</parameter>
      <parameter name="ppn" type="int">4</parameter>
      <parameter name="ready_file">ready</parameter>
      <parameter name="mail_mode">abe</parameter>
      <parameter name="mail_address"></parameter>
      <parameter name="err_file">stderr</parameter>
      <parameter name="out_file">stdout</parameter>
      <parameter name="exec">echo $number</parameter>
    </parameterset>

    <!-- Load jobfile -->
    <fileset name="files">
      <copy>${job_file}.in</copy>
    </fileset>

    <!-- Substitute jobfile -->
    <substituteset name="sub_job">
      <iofile in="${job_file}.in" out="$job_file" />
      <sub source="#NODES#" dest="$nodes" />
      <sub source="#PROCS_PER_NODE#" dest="$ppn" />
      <sub source="#WALLTIME#" dest="$walltime" />
      <sub source="#ERROR_FILEPATH#" dest="$err_file" />
      <sub source="#OUT_FILEPATH#" dest="$out_file" />
      <sub source="#MAIL_ADDRESS#" dest="$mail_address" />
      <sub source="#MAIL_MODE#" dest="$mail_mode" />
      <sub source="#EXEC#" dest="$exec" />
      <sub source="#READY#" dest="$ready_file" />
    </substituteset>

    <!-- Operation -->
    <step name="submit" work_dir="$$SCRATCH/jobsystem_bench_${jube_benchmark_id}_${jube_wp_id}" >
      <use>param_set</use>
      <use>executeset</use>
      <use>files,sub_job</use>
      <do done_file="$ready_file">$submit_cmd $job_file</do> <!-- shell command -->
    </step>
  </benchmark>
</jube>
The JUBE input file jobsystem.yaml:
name: jobsystem
outpath: bench_run
comment: A jobsystem example

parameterset:
  #benchmark configuration
  - name: param_set
    parameter: {name: number, type: int, _: "1,2,4"} #comma separated integer must be quoted
  #Job configuration
  - name: executeset
    parameter:
      - {name: submit_cmd, "_": msub}
      - {name: job_file, "_": job.run}
      - {name: nodes, type: int, "_": 1}
      - {name: walltime, "_": "00:01:00"} #: must be quoted
      - {name: ppn, type: int, "_": 4}
      - {name: ready_file, "_": ready}
      - {name: mail_mode, "_": abe}
      - {name: mail_address}
      - {name: err_file, "_": stderr}
      - {name: out_file, "_": stdout}
      - {name: exec, "_": echo $number}

#Load jobfile
fileset:
  name: files
  copy: ${job_file}.in

substituteset:
  name: sub_job
  iofile: {in: "${job_file}.in", out: $job_file} #attributes with {} must be quoted
  sub:
    - {source: "#NODES#", dest: $nodes}
    - {source: "#PROCS_PER_NODE#", dest: $ppn}
    - {source: "#WALLTIME#", dest: $walltime}
    - {source: "#ERROR_FILEPATH#", dest: $err_file}
    - {source: "#OUT_FILEPATH#", dest: $out_file}
    - {source: "#MAIL_ADDRESS#", dest: $mail_address}
    - {source: "#MAIL_MODE#", dest: $mail_mode}
    - {source: "#EXEC#", dest: $exec}
    - {source: "#READY#", _: $ready_file } # _ can be used here as well instead of dest (should be used for multiline output)

#Operation
step:
  name: submit
  work_dir: "$$WORK/jobsystem_bench_${jube_benchmark_id}_${jube_wp_id}"
  use: [param_set,executeset,files,sub_job]
  do:
    done_file: $ready_file
    _: $submit_cmd $job_file #shell command
As you can see the jobfile is very general and several parameters will be used for replacement. By using a general jobfile and the substitution mechanism you can control your jobsystem directly out of your JUBE input file. $$ is used for Shell substitutions instead of JUBE substitution (see Environment handling).
The submit command is a normal Shell command so there are no special JUBE tags to submit a job.
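To make the substitution step concrete, here is a minimal Python sketch that performs the same kind of textual replacement on a shortened copy of job.run.in. The function substitute and the dictionary are hypothetical stand-ins for the <substituteset>, not JUBE's implementation; the values are the ones from the executeset above, with number set to 2 as an example.

```python
# Shortened copy of job.run.in, used only for this illustration
template = """#!/bin/bash -x
#MSUB -l nodes=#NODES#:ppn=#PROCS_PER_NODE#
#MSUB -l walltime=#WALLTIME#

#EXEC#
touch #READY#
"""

# source -> dest pairs, as in the <sub> entries (parameters already expanded)
substitutions = {
    "#NODES#": "1",
    "#PROCS_PER_NODE#": "4",
    "#WALLTIME#": "00:01:00",
    "#EXEC#": "echo 2",
    "#READY#": "ready",
}

def substitute(text, subs):
    """Replace every source string by its dest string (plain text replacement)."""
    for source, dest in subs.items():
        text = text.replace(source, dest)
    return text

job_script = substitute(template, substitutions)
print(job_script)
```

Because the job file only contains placeholders, the same template can be reused for every workpackage and every platform.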
There are two new attributes:
done_file inside the <do> allows you to set a filename/path to a file which should be used by the jobfile to mark the end of execution. JUBE does not know when the job ends. Normally it will return when the Shell command has finished. When using a jobsystem the user usually has to wait until the jobfile is executed. If JUBE finds a <do> containing a done_file attribute, JUBE will return directly and will not continue automatically until the done_file exists. If you want to check the current status of your running steps and continue the benchmark process if possible you can type:
>>> jube continue bench_run
This will continue your benchmark execution (bench_run is the benchmarks directory in this example). The position of the done_file is relative to the work directory.
work_dir can be used to change the sandbox work directory of a step. In normal cases JUBE checks that every work directory gets a unique name. When changing the directory the user must select a unique name on their own. For example, they can use $jube_benchmark_id and $jube_wp_id, which are JUBE internal parameters and will be expanded to the current benchmark and workpackage ids. Files and directories out of a given <fileset> will be copied into the new work directory. Other automatic links, like the dependency links, will not be created!
You will see this output after running the benchmark:
| stepname | all | open | wait | error | done |
|----------|-----|------|------|-------|------|
| submit | 3 | 0 | 3 | 0 | 0 |
and this output after running the continue command (after the jobs were executed):
| stepname | all | open | wait | error | done |
|----------|-----|------|------|-------|------|
| submit | 3 | 0 | 0 | 0 | 3 |
You have to run continue multiple times if not all done_file files had been written when running continue for the first time.
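The wait/done bookkeeping can be pictured with a short Python sketch. It is a simplified model under stated assumptions: the function step_status and the directory names are hypothetical, and the only rule taken from the text is that a workpackage counts as done once its done_file exists inside the work directory.

```python
import os
import tempfile

def step_status(work_dirs, done_file="ready"):
    """Count a workpackage as 'done' when its done_file exists, else as 'wait'."""
    status = {"wait": 0, "done": 0}
    for work_dir in work_dirs:
        if os.path.exists(os.path.join(work_dir, done_file)):
            status["done"] += 1
        else:
            status["wait"] += 1
    return status

with tempfile.TemporaryDirectory() as root:
    # three work directories, one per workpackage
    dirs = [os.path.join(root, "wp_{}".format(i)) for i in range(3)]
    for directory in dirs:
        os.mkdir(directory)
    before = step_status(dirs)
    # simulate two of the three jobs reaching the final "touch ready" line
    for directory in dirs[:2]:
        open(os.path.join(directory, "ready"), "w").close()
    after = step_status(dirs)

print(before)
print(after)
```

In this model a further check is needed until the last job has written its file, which corresponds to running continue more than once.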
Include external data¶
As you have seen in the example before a benchmark can become very long. To structure your benchmark you can use multiple files and reuse existing sets. There are three different include features available.
The files used for this example can be found inside examples/include.
The include file include_data.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <parameterset name="param_set">
    <parameter name="number" type="int">1,2,4</parameter>
  </parameterset>

  <parameterset name="param_set2">
    <parameter name="text">Hello</parameter>
  </parameterset>

  <dos>
    <do>echo Test</do>
    <do>echo $number</do>
  </dos>
</jube>
The include file include_data.yaml:
parameterset:
  - name: param_set
    parameter: {name: number, type: int, _: "1,2,4"}
  - name: param_set2
    parameter: {name: text, _: Hello}

dos:
  - echo Test
  - echo $number
All files which contain data to be included must use the XML or YAML format. The include files can have a user specific structure (they may contain tags which are not valid JUBE tags, like <dos>), but the structure must be allowed by the searching mechanism (see below). The resulting file must have a valid JUBE structure.
The main file main.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="include" outpath="bench_run">
    <comment>A include example</comment>

    <!-- use parameterset out of an external file and add an additional parameter -->
    <parameterset name="param_set" init_with="include_data.xml">
      <parameter name="foo">bar</parameter>
    </parameterset>

    <!-- Operation -->
    <step name="say_hello">
      <use>param_set</use> <!-- use existing parameterset -->
      <use from="include_data.xml">param_set2</use> <!-- out of an external file -->
      <do>echo $foo</do> <!-- shell command -->
      <include from="include_data.xml" path="dos/do" /> <!-- include all available tag -->
    </step>
  </benchmark>
</jube>
The main file main.yaml:
name: include
outpath: bench_run
comment: A include example

#use parameterset out of an external file and add an additional parameter
parameterset:
  name: param_set
  init_with: include_data.yaml
  parameter: {name: foo, _: bar}

#Operation
step:
  name: say_hello
  use:
    - param_set #use existing parameterset
    - from: include_data.yaml
      _: param_set2 #out of an external file
  do:
    - echo $foo
    - !include include_data.yaml:["dos"] #include all available tag
In these files there are three different include types:
The init_with attribute can be used inside any set definition. Inside the given file the search mechanism will search for the same set (same type, same name), will parse its structure (this must be JUBE valid) and copy the content to main.xml. Inside main.xml you can add additional values or overwrite existing ones. If your include-set uses a different name inside your include file you can use init_with="filename.xml:old_name". It is possible to mix YAML based include files with XML files and vice versa.
The second method is the <use from="...">. This is mostly the same as the init_with structure, but in this case you are not able to add or overwrite any values. The external set will be used directly. There is no set-type inside the <use>; because of that, the set's name must be unique inside the include-file. The remote file can use the YAML or the XML format.
The last method is the most generic include. The include mechanism is the only element in JUBE which works slightly differently in YAML and XML based files.
In XML based files by using <include /> you can copy any XML-nodes you want to your main-XML file. The included file can provide tags which are not JUBE-conform but it must be a valid XML-file (e.g. only one root node allowed). The resulting main configuration file must be completely JUBE valid. The path is optional and can be used to select a specific node set (otherwise the root-node itself will be included). The <include /> is the only include-method that can be used to include any tag you want. The <include /> will copy all parts without any changes. The other include types will update path names, which were relative to the include-file position.
In YAML based files the prefix !include is used followed by the file name. The file must be a YAML file, which will be opened and parsed. The second block :["dos"] can be used to select any subset of data of the full dictionary; any Python syntax is allowed for this selection. Finally it is possible to also specify a third block which allows full Python list comprehensions. _ is the match of the selection before, e.g.: !include include_data.yaml:["dos"]:[i for i in _ if "Test" in i]. In contrast to the XML based include it isn't possible to mix lists or dictionaries out of different files; each key can only handle a single include.
To run the benchmark you can use the normal command:
>>> jube run main.xml
It will search for the files to include inside four different positions, in the following order:
inside a directory given over the command line interface:
>>> jube run --include-path some_path another_path -- main.xml
inside any path given by an <include-path>-tag or include-path:-tag in XML or YAML, respectively:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <include-path>
    <path>some_path</path>
    <path>another_path</path>
  </include-path>
  ...
</jube>
...
include-path:
  path:
    - "some path"
    - "another path"
...
inside any path given with the JUBE_INCLUDE_PATH environment variable (see Configuration):
>>> export JUBE_INCLUDE_PATH=some_path:another_path
inside the same directory of your main.xml
JUBE stops searching as soon as it finds the file to include, or gives an error if the file is not found.
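The search order can be summarised as a first-match lookup over an ordered list of directories. The following Python sketch illustrates that idea only; the function find_include and its arguments are hypothetical and do not correspond to JUBE's internal API.

```python
import os
import tempfile

def find_include(filename, cli_paths, file_paths, env_value, main_dir):
    """Return the first existing candidate, searching in the documented order:
    command line, include-path tag, JUBE_INCLUDE_PATH, directory of the main file."""
    env_paths = [path for path in env_value.split(":") if path]
    for directory in list(cli_paths) + list(file_paths) + env_paths + [main_dir]:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate  # stop at the first hit
    raise FileNotFoundError(filename)

with tempfile.TemporaryDirectory() as root:
    cli_dir = os.path.join(root, "cli")
    main_dir = os.path.join(root, "main")
    for directory in (cli_dir, main_dir):
        os.mkdir(directory)
        open(os.path.join(directory, "include_data.xml"), "w").close()

    # the file exists in both places: the command line path wins
    found = find_include("include_data.xml", [cli_dir], [], "", main_dir)
    found_is_cli = os.path.dirname(found) == cli_dir

    # without any extra path the directory of the main file is used
    fallback = find_include("include_data.xml", [], [], "", main_dir)
    fallback_is_main = os.path.dirname(fallback) == main_dir

print(found_is_cli, fallback_is_main)
```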
Tagging¶
Tagging is an easy way to hide selected parts of your input file.
The files used for this example can be found inside examples/tagging.
The input file tagging.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <tags forced="true">
    <check_tags>deu|eng</check_tags>
    <tag name="deu">For german strings</tag>
    <tag name="eng">For english strings</tag>
  </tags>

  <benchmark name="tagging" outpath="bench_run">
    <comment>Tags as logical combination</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="hello_str" tag="!deu+eng">Hello</parameter>
      <parameter name="hello_str" tag="deu|!eng">Hallo</parameter>
      <parameter name="world_str" tag="eng">World</parameter>
    </parameterset>

    <!-- Operation -->
    <step name="say_hello">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo '$hello_str $world_str'</do> <!-- shell command -->
    </step>
  </benchmark>
</jube>
The input file tagging.yaml:
tags:
  check_tags: deu|eng #check if tag deu or eng was set
  forced: True
  tag:
    - {name: deu, _: For german strings}
    - {name: eng, _: For english strings}

name: tagging
outpath: bench_run
comment: Tags as logical combination

#Configuration
parameterset:
  name: param_set
  parameter:
    - {name: hello_str, tag: "!deu+eng", _: Hello}
    - {name: hello_str, tag: deu|!eng, _: Hallo}
    - {name: world_str, tag: eng, _: World}

#Operation
step:
  name: say_hello
  use: param_set #use existing parameterset
  do: echo '$hello_str $world_str' #shell command
The tag attribute and the check_tags tag allow you to define more complex boolean expressions. For example:
! can be used for negation (!deu stands for not deu).
+ can be used as an AND operator to combine tag values (e.g. XML: tag="!deu+eng"; YAML: tag: "!deu+eng").
| can be used as an OR operator to combine tag values (e.g. tag1 | tag2 means that one of the two (or both) tags must be set).
^ can be used as an XOR operator to combine tag values (e.g. tag1 ^ tag2 means that one of the two tags must be set, but not both).
Parentheses are also allowed.
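A small Python sketch shows how such an expression could be evaluated against the tags given on the command line. It is an illustration, not JUBE's implementation: the function name is hypothetical, and it assumes that operator precedence follows Python's rules after the operators have been translated.

```python
import re

def tag_expression_is_true(expression, active_tags):
    """Evaluate a tag expression for a given set of active tags.

    Assumption: ! -> not, + -> and, | -> or, ^ stays a boolean xor,
    with Python's operator precedence.
    """
    # replace every tag name by True/False before translating the operators
    translated = re.sub(r"\w+",
                        lambda match: str(match.group(0) in active_tags),
                        expression)
    translated = (translated.replace("!", " not ")
                            .replace("+", " and ")
                            .replace("|", " or "))
    return bool(eval(translated, {"__builtins__": {}}, {}))

active = {"eng"}  # corresponds to: --tag eng
hello_en = tag_expression_is_true("!deu+eng", active)   # hello_str = Hello
hello_de = tag_expression_is_true("deu|!eng", active)   # hello_str = Hallo
world = tag_expression_is_true("eng", active)           # world_str = World
check = tag_expression_is_true("deu|eng", active)       # check_tags
print(hello_en, hello_de, world, check)
```

With only eng set, the English hello_str and world_str stay visible and the German hello_str is hidden, which matches the Hello World output of the example.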
The tag attribute can be used within any <element> within the input file (except the <jube>).
If several different tag attribute values are used in a script, they can be specified as a list separated by spaces from the command line.
To keep track of all tags, a description for each tag can be defined within the <tags>, where <check_tags> is also defined, using <tag name="...">Description</tag>. To force a description of all possible tags, the forced attribute can be set to true, so that an error is thrown if a tag is not documented.
All <elements> which contain a special tag="..." attribute will be hidden if the value of the tag evaluates to false. This means that JUBE will ignore the elements with these tags in its internal processing.
Caution: This can lead to erroneous execution if you forget to set the necessary tags for execution, as JUBE will ignore e.g. a <parameter> provided with the corresponding tag attribute that evaluates to false.
To ensure that the user of the script specifies the necessary tag values that the script needs for successful execution, the check_tags element (added with JUBE version 2.6.0) can be used. It allows you to define tag values that must be specified when the script is called in order for it to run successfully. If none of the required tag combinations defined by check_tags are set by the user, an error message is displayed and the run is aborted.
In the example above, check_tags: deu|eng indicates that deu or eng must be set.
When running the example using one of the specific tag values in <check_tags> (in this case --tag eng):
>>> jube run tagging.xml --tag eng
this results in the following output in the stdout file:
Hello World
Platform independent benchmarking¶
If you want to create platform independent benchmarks you can use the include features inside of JUBE.
All platform related sets must be declared in an includable file e.g. platform.xml. There can be multiple platform.xml in different directories to allow different platforms. By changing the include-path the benchmark changes its platform specific data.
An example benchmark structure is based on three include files:
The main benchmark include file which contains all benchmark specific but platform independent data
A mostly generic platform include file which contains benchmark independent but platform specific data (this can be created once and placed somewhere central on the system; it can be easily accessed using the JUBE_INCLUDE_PATH environment variable)
A platform specific and benchmark specific include file which must be placed in a unique directory to allow include-path usage
Inside the platform directory you will find some example benchmark independent platform configuration files for the supercomputers at Forschungszentrum Jülich.
To avoid writing long include-paths every time you run a platform independent benchmark, you can store the include-path inside your input file. This can be combined with the tagging-feature:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <include-path>
    <path tag="plat1">some path</path>
    <path tag="plat2">another path</path>
    ...
  </include-path>
  ...
</jube>
Or in YAML:
...
include-path:
  path:
    - {tag: plat1, _: "some path"}
    - {tag: plat2, _: "another path"}
...
Now you can run your benchmark using:
>>> jube run filename.xml --tag plat1
Multiple benchmarks¶
Often you only have one benchmark inside your input file. But it is also possible to store multiple benchmarks inside the same input file:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="a" outpath="bench_runs">...</benchmark>
  <benchmark name="b" outpath="bench_runs">...</benchmark>
  ...
</jube>
Or in YAML:
- name: a
  # data for benchmark a
- name: b
  # data for benchmark b
All benchmarks can use the same global (as a child of <jube>) declared sets. Often it might be better to use an include feature instead.
JUBE will run every benchmark in the given order. Every benchmark gets a unique benchmark id.
To select only one benchmark you can use:
>>> jube run filename.xml --only-bench a
or:
>>> jube run filename.xml --not-bench b
This information can also be stored inside the input file:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <selection>
    <only>a</only>
    <not>b</not>
  </selection>
  ...
</jube>
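The effect of the only and not selections can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not JUBE's implementation; the function name select_benchmarks and the rule for combining both lists (a benchmark runs if it passes the only filter and is not excluded) are assumptions.

```python
def select_benchmarks(names, only=(), not_=()):
    """Return the benchmark names that would be run, in the given order.

    Assumption: an empty `only` list means "no restriction", and a name
    listed in `not_` is always skipped.
    """
    return [n for n in names if (not only or n in only) and n not in not_]


# Two benchmarks "a" and "b" as in the example input file
print(select_benchmarks(["a", "b"], only=["a"]))   # like --only-bench a
print(select_benchmarks(["a", "b"], not_=["b"]))   # like --not-bench b
```

Under these assumptions both calls select only benchmark a.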
Environment handling
Shell environment handling can be very important for configuring paths or parameters of your program.
The files used for this example can be found inside examples/environment.
The input file environment.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="environment" outpath="bench_run">
    <comment>An environment handling example</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="EXPORT_ME" export="true">VALUE</parameter>
    </parameterset>

    <!-- Operations -->
    <step name="first_step" export="true">
      <do>export SHELL_VAR=Hello</do> <!-- export a Shell var -->
      <do>echo "$$SHELL_VAR world"</do><!-- use exported Shell var -->
    </step>

    <!-- Create a dependency between both steps -->
    <step name="second_step" depend="first_step">
      <use>param_set</use>
      <do>echo $$EXPORT_ME</do>
      <do>echo "$$SHELL_VAR again"</do> <!-- use exported Shell var out of previous step -->
    </step>
  </benchmark>
</jube>
The input file environment.yaml:
name: environment
outpath: bench_run
comment: An environment handling example

#Configuration
parameterset:
  name: param_set
  parameter: {name: EXPORT_ME, export: true, _: VALUE}

step:
  #Operation
  - name: first_step
    export: true
    do:
      - export SHELL_VAR=Hello #export a Shell var
      - echo "$$SHELL_VAR world" #use exported Shell var

  #Create a dependency between both steps
  - name: second_step
    depend: first_step
    use: param_set
    do:
      - echo $$EXPORT_ME
      - echo "$$SHELL_VAR again" #use exported Shell var out of previous step
Normally, all <do> commands within one <step> share the same environment. All variables exported by one <do> are available inside the next <do> within the same <step>.
By using export="true" inside a <parameter> you can export additional variables to your Shell environment. Be aware that this example uses $$ to explicitly use Shell substitution instead of JUBE substitution.
You can also export the complete environment of a step to a dependent step by using export="true" inside <step>.
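The role of $$ can be illustrated with Python's string.Template, which happens to use the same escaping convention: $name is substituted, while $$ yields a literal $ that is left for the shell. This is a stand-in for illustration only; JUBE's own substitution may differ in detail, and the parameter number as well as the helper jube_like_substitute are hypothetical.

```python
from string import Template


def jube_like_substitute(command, parameters):
    """Substitute $name placeholders and turn $$ into a literal $.

    Uses string.Template as a stand-in; this is not JUBE's implementation.
    """
    return Template(command).safe_substitute(parameters)


# "number" is a hypothetical parameter; SHELL_VAR is left for the shell.
print(jube_like_substitute('echo "$$SHELL_VAR again" $number', {"number": "3"}))
# prints: echo "$SHELL_VAR again" 3
```

The resulting command line still contains $SHELL_VAR, so the shell resolves it at execution time.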
Parameter dependencies
Sometimes you need parameters that are based on other parameters, or only specific parameter combinations make sense while others are useless or wrong. JUBE offers several techniques to create such a more complex workflow.
The files used for this example can be found inside examples/parameter_dependencies.
The input file parameter_dependencies.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="parameter_dependencies" outpath="bench_run">
    <comment>A parameter_dependencies example</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="index" type="int">0,1</parameter>
      <parameter name="text" mode="python">["hello","world"][$index]</parameter>
    </parameterset>

    <parameterset name="depend_param_set0">
      <parameter name="number" type="int">3,5</parameter>
    </parameterset>

    <parameterset name="depend_param_set1">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>

    <!-- Operation -->
    <step name="operation">
      <use>param_set</use> <!-- use basic parameterset -->
      <use>depend_param_set$index</use> <!-- use dependent parameterset -->
      <use from="include_file.xml:depend_param_set0:depend_param_set1">
        depend_param_set$index
      </use>
      <do>echo "$text $number $number2"</do>
    </step>
  </benchmark>
</jube>
The input file parameter_dependencies.yaml:
name: parameter_dependencies
outpath: bench_run
comment: A parameter_dependencies example

#Configuration
parameterset:
  - name: param_set
    parameter:
      - {name: index, type: int, _: "0,1"} #comma separated integer must be in quotations
      - {name: text, mode: python, _: '["hello","world"][$index]'} #attributes with " and [] must be in quotations
  - name: depend_param_set0
    parameter: {name: number, type: int, _: "3,5"} #comma separated integer must be in quotations
  - name: depend_param_set1
    parameter: {name: number, type: int, _: "1,2,4"} #comma separated integer must be in quotations

#Operation
step:
  name: operation
  use:
    - param_set #use basic parameterset
    - depend_param_set$index #use dependent parameterset
    - {from: 'include_file.yaml:depend_param_set0:depend_param_set1', _: depend_param_set$index}
  do: echo "$text $number $number2"
The include file include_file.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <parameterset name="depend_param_set0">
    <parameter name="number2" type="int">10</parameter>
  </parameterset>

  <parameterset name="depend_param_set1">
    <parameter name="number2" type="int">20</parameter>
  </parameterset>
</jube>
The include file include_file.yaml:
parameterset:
  - name: depend_param_set0
    parameter: {name: number2, type: int, _: 10}
  - name: depend_param_set1
    parameter: {name: number2, type: int, _: 20}
The easiest way to handle dependencies is to define an index parameter which can be used in other scripting parameters to combine all dependent parameter combinations.
Complete sets can also be marked as dependent on a specific parameter by using this parameter in the <use> tag. When using parametersets from another file, the correct set names must be given within the from attribute, because these sets are loaded in a pre-processing step before the corresponding parameter is evaluated. Sets from different files can also be combined within the same <use> by using the file1:set1,file2:set2 syntax. The set names must be unique.
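To see which parameter combinations the example produces, the expansion can be mimicked in plain Python. This is a sketch of the resulting combinations only, not of how JUBE evaluates parametersets; the values are copied from the example files, while the dictionary layout and the function name expand_combinations are illustrative assumptions.

```python
from itertools import product

# Values copied from the example input and include files
texts = ["hello", "world"]                  # ["hello","world"][$index]
number_sets = {0: [3, 5], 1: [1, 2, 4]}     # depend_param_set0 / depend_param_set1
number2_sets = {0: [10], 1: [20]}           # sets of the same names in the include file


def expand_combinations():
    """Return every (text, number, number2) tuple the step would be run with."""
    combos = []
    for index in (0, 1):
        text = texts[index]
        # Only the sets selected by the current index are combined
        for number, number2 in product(number_sets[index], number2_sets[index]):
            combos.append((text, number, number2))
    return combos


for combo in expand_combinations():
    print(*combo)
```

The sketch yields five combinations (two for index 0, three for index 1) instead of the full cross product of all values, which is the point of selecting the sets through $index.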
Parameter update
Once a parameter is specified and evaluated for the first time, its value will not change. Sometimes this can produce unwanted behaviour:
<parameter name="foo">$jube_wp_id</parameter>
In this example foo should hold the $jube_wp_id. If you have two steps, where one step depends on the other, foo will be available in both, but it will only be evaluated in the first one.
You can change the update behaviour of a parameter by using the attribute update_mode:
update_mode="never": No update (default behaviour)
update_mode="use": Re-evaluate the parameter if the parameterset is explicitly used
update_mode="step": Re-evaluate the parameter for each new step
update_mode="cycle": Re-evaluate the parameter for each new cycle loop, but not at the beginning of a new step
update_mode="always": Combine step and cycle
Within a cycle loop no new workpackages can be created. Templates will be re-evaluated, but they cannot increase the number of existing workpackages within a cycle.
Within the result generation, the parameter value presented in the result table is the value of the selected analysed step. If another parameter representation is needed as well, all other steps can be reached by using <parameter_name>_<step_name>.
The files used for this example can be found inside examples/parameter_update.
The input file parameter_update.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="parameter_updates" outpath="bench_run">
    <comment>A parameter_update example</comment>

    <!-- Configuration -->
    <parameterset name="foo">
      <parameter name="bar_never" mode="text" update_mode="never">
        iter_never: $jube_wp_id
      </parameter>
      <parameter name="bar_use" mode="text" update_mode="use">
        iter_use: $jube_wp_id
      </parameter>
      <parameter name="bar_step" mode="text" update_mode="step">
        iter_step: $jube_wp_id
      </parameter>
    </parameterset>

    <!-- Operation -->
    <step name="step1">
      <use>foo</use>
      <do>echo $bar_never</do>
      <do>echo $bar_use</do>
      <do>echo $bar_step</do>
    </step>

    <step name="step2" depend="step1">
      <use>foo</use>
      <do>echo $bar_never</do>
      <do>echo $bar_use</do>
      <do>echo $bar_step</do>
    </step>

    <step name="step3" depend="step2">
      <do>echo $bar_never</do>
      <do>echo $bar_use</do>
      <do>echo $bar_step</do>
    </step>
  </benchmark>
</jube>
The input file parameter_update.yaml:
name: parameter_updates
outpath: bench_run
comment: A parameter_update example

#Configuration
parameterset:
  name: foo
  parameter:
    - {name: bar_never, mode: text, update_mode: never, _: "iter_never: $jube_wp_id"}
    - {name: bar_use, mode: text, update_mode: use, _: "iter_use: $jube_wp_id"}
    - {name: bar_step, mode: text, update_mode: step, _: "iter_step: $jube_wp_id"}

#Operation
step:
  - name: step1
    use: foo
    do:
      - echo $bar_never
      - echo $bar_use
      - echo $bar_step
  - name: step2
    depend: step1
    use: foo
    do:
      - echo $bar_never
      - echo $bar_use
      - echo $bar_step
  - name: step3
    depend: step2
    do:
      - echo $bar_never
      - echo $bar_use
      - echo $bar_step
This example shows the use and influence of the three update modes update_mode="never", update_mode="use" and update_mode="step". Keep in mind that the steps have to depend on each other; otherwise the outputs would be identical.
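The difference between the three modes can be modelled with a small Python function. This is a simplified model derived only from the mode descriptions above, not JUBE code; it assumes that step1, step2 and step3 receive the workpackage ids 0, 1 and 2, and the function name evaluate is hypothetical.

```python
def evaluate(update_mode, steps):
    """Return the value a parameter holding $jube_wp_id has in each step.

    steps: list of (wp_id, uses_parameterset) tuples in dependency order.
    Simplified model: cycles are ignored, only never/use/step are handled.
    """
    values = []
    current = None
    for wp_id, uses_set in steps:
        if current is None:
            current = wp_id                       # first evaluation
        elif update_mode == "step":
            current = wp_id                       # re-evaluated for each new step
        elif update_mode == "use" and uses_set:
            current = wp_id                       # re-evaluated only if the set is used
        values.append(current)
    return values


# step1 and step2 use the parameterset foo, step3 does not (assumed wp ids 0, 1, 2)
steps = [(0, True), (1, True), (2, False)]
for mode in ("never", "use", "step"):
    print(mode, evaluate(mode, steps))
```

In this model never keeps the value of the first step everywhere, use stops updating in step3 because foo is not used there, and step follows the workpackage id of every step.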
Step iteration
Especially in the context of benchmarking, an application should be executed multiple times to generate meaningful statistical values. The handling of statistical values is described in Statistic pattern values. This allows you to aggregate multiple result lines if your application automatically supports running multiple times.
In addition, JUBE offers an iteration feature to run a specific step and its parametrisation multiple times.
The files used for this example can be found inside examples/iterations.
The input file iterations.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="iterations" outpath="bench_run">
    <comment>An iteration example</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="foo" type="int">1,2,4</parameter>
      <parameter name="bar" mode="text" update_mode="step">$foo iter:$jube_wp_iteration</parameter>
    </parameterset>

    <step name="first_step" iterations="2">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo $bar</do> <!-- shell command -->
    </step>

    <step name="second_step" depend="first_step" iterations="2">
      <do>echo $bar</do> <!-- shell command -->
    </step>

    <!-- analyse without reduce -->
    <analyser name="analyse_no_reduce" reduce="false">
      <analyse step="second_step" />
    </analyser>

    <!-- Analyse with reduce -->
    <analyser name="analyse" reduce="true">
      <analyse step="second_step" />
    </analyser>

    <result>
      <use>analyse</use>
      <use>analyse_no_reduce</use>
      <table name="result" style="pretty">
        <column>jube_res_analyser</column>
        <column>jube_wp_id_first_step</column>
        <column>jube_wp_id</column>
        <column>jube_wp_iteration_first_step</column>
        <column>jube_wp_iteration</column>
        <column>foo</column>
      </table>
    </result>
  </benchmark>
</jube>
The input file iterations.yaml:
name: iterations
outpath: bench_run
comment: An iteration example

#Configuration
parameterset:
  name: param_set
  parameter:
    - {name: foo, type: int, _: "1,2,4"}
    - {name: bar, update_mode: step, _: '$foo iter:$jube_wp_iteration'}

step:
  - name: first_step
    iterations: 2
    use: param_set #use existing parameterset
    do: echo $bar #shell command
  - name: second_step
    depend: first_step
    iterations: 2
    do: echo $bar #shell command

analyser:
  #analyse without reduce
  - name: analyse_no_reduce
    reduce: false
    analyse:
      step: second_step
  #analyse with reduce
  - name: analyse
    reduce: true
    analyse:
      step: second_step

result:
  use: [analyse,analyse_no_reduce]
  table:
    name: result
    style: pretty
    column:
      - jube_res_analyser
      - jube_wp_id_first_step
      - jube_wp_id
      - jube_wp_iteration_first_step
      - jube_wp_iteration
      - foo
In this example, both steps 1 and 2 are executed 2 times for each parameter and dependency configuration. Because of the given parameter, step 1 is executed 6 times in total (3 parameter combinations x 2). Step 2 is executed 12 times (6 from the dependent step x 2). Each run is executed in the normal way using its individual sandbox folder.
$jube_wp_iteration holds the individual iteration id. The update_mode is needed here to re-evaluate the parameter bar in step 2.
In the analyser, reduce=true or reduce=false can be set to either aggregate all results of the same parameter combination for the given step or to see all individual results. If reduce=true is enabled (the default behaviour), the output of the individual runs which use the same parametrisation is treated like one big continuous file before applying the statistical patterns.
| jube_res_analyser | jube_wp_id_first_step | jube_wp_id | jube_wp_iteration_first_step | jube_wp_iteration | foo |
|-------------------|-----------------------|------------|------------------------------|-------------------|-----|
| analyse_no_reduce | 0                     | 6          | 0                            | 0                 | 1   |
| analyse_no_reduce | 0                     | 7          | 0                            | 1                 | 1   |
| analyse_no_reduce | 1                     | 8          | 1                            | 2                 | 1   |
| analyse_no_reduce | 1                     | 9          | 1                            | 3                 | 1   |
| analyse_no_reduce | 2                     | 10         | 0                            | 0                 | 2   |
| analyse_no_reduce | 2                     | 11         | 0                            | 1                 | 2   |
| analyse_no_reduce | 3                     | 12         | 1                            | 2                 | 2   |
| analyse_no_reduce | 3                     | 13         | 1                            | 3                 | 2   |
| analyse_no_reduce | 4                     | 14         | 0                            | 0                 | 4   |
| analyse_no_reduce | 4                     | 15         | 0                            | 1                 | 4   |
| analyse_no_reduce | 5                     | 16         | 1                            | 2                 | 4   |
| analyse_no_reduce | 5                     | 17         | 1                            | 3                 | 4   |
| analyse           | 5                     | 16         | 1                            | 2                 | 4   |
| analyse           | 0                     | 7          | 0                            | 1                 | 1   |
| analyse           | 1                     | 8          | 1                            | 2                 | 1   |
| analyse           | 2                     | 10         | 0                            | 0                 | 2   |
| analyse           | 3                     | 12         | 1                            | 2                 | 2   |
| analyse           | 4                     | 15         | 0                            | 1                 | 4   |
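The counts of 6 and 12 executions and the ids in the analyse_no_reduce rows can be reproduced with a short enumeration. The numbering scheme below (first-step workpackages numbered first, and the second-step iteration computed as parent iteration times 2 plus the local iteration) is an assumption read off the table, not a description of JUBE internals; the function name enumerate_workpackages is hypothetical.

```python
def enumerate_workpackages(foo_values=(1, 2, 4), iterations=2):
    """Enumerate workpackages of first_step and second_step.

    Returns two lists:
      first_step:  (wp_id, iteration, foo)
      second_step: (parent_wp_id, wp_id, parent_iteration, iteration, foo)
    """
    first_step = []
    wp_id = 0
    for foo in foo_values:
        for it in range(iterations):
            first_step.append((wp_id, it, foo))
            wp_id += 1

    second_step = []
    for parent_id, parent_it, foo in first_step:
        for it in range(iterations):
            # Assumed numbering: iteration ids continue across the parent's iterations
            second_step.append((parent_id, wp_id, parent_it, parent_it * iterations + it, foo))
            wp_id += 1
    return first_step, second_step


first, second = enumerate_workpackages()
print(len(first), len(second))   # 6 executions of step 1, 12 of step 2
for row in second:
    print(*row)
```

Each printed row corresponds to the columns jube_wp_id_first_step, jube_wp_id, jube_wp_iteration_first_step, jube_wp_iteration and foo of one analyse_no_reduce line.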
Step cycle
Instead of creating a new workpackage, you can also repeat the <do> commands inside a step using the cycle feature. In contrast to iterations, all executions of the cycle feature take place inside the same folder.
The files used for this example can be found inside examples/cycle.
The input file cycle.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="cycle" outpath="bench_run">
    <comment>A cycle example</comment>

    <step name="a_step" cycles="5">
      <do break_file="done">echo $jube_wp_cycle</do>
      <do active="$jube_wp_cycle==2">touch done</do>
    </step>

  </benchmark>
</jube>
The input file cycle.yaml:
---
benchmark:
  - name: cycle
    outpath: bench_run
    comment: A cycle example

    step:
      - name: a_step
        cycles: 5
        do:
          - _: echo $jube_wp_cycle
            break_file: done
          - _: touch done
            active: $jube_wp_cycle==2
The cycles attribute allows you to repeat all <do> commands within a step multiple times. The break_file attribute can be used to cancel the loop and all following commands in the current cycle (the command itself is still executed). In the given example the output will be:
0
1
2
3
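Why the output stops at 3 although the break file is created in cycle 2 can be traced with a small simulation. This is a sketch of the described behaviour only, with a Python set standing in for the working directory; the function name run_cycles and its parameters are hypothetical.

```python
def run_cycles(cycles=5, touch_at=2, break_file="done"):
    """Simulate the two <do> commands of the cycle example.

    Returns the list of values echoed by the first command.
    """
    files = set()      # stands in for the files in the step's folder
    output = []
    for cycle in range(cycles):
        output.append(cycle)        # echo $jube_wp_cycle (always executed)
        if break_file in files:
            break                   # break file found: cancel the rest of the loop
        if cycle == touch_at:       # active only if $jube_wp_cycle==2
            files.add(break_file)   # touch done
    return output


print(run_cycles())   # prints [0, 1, 2, 3]
```

The break file is created after the echo of cycle 2, so it is first detected in cycle 3, after that cycle's echo command has already run.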
Parallel workpackages
In a standard jube run a queue is filled with workpackages and then processed serially. To enable parallel execution of independent workpackages which belong to the expansions of a step, the argument procs of <step> can be used.
The files used for this example can be found inside examples/parallel_workpackages.
The input file parallel_workpackages.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="parallel_workpackages" outpath="bench_run">
    <comment>A parallel workpackages demo</comment>

    <parameterset name="param_set">
      <parameter name="i" type="int" mode="python">",".join([ str(i) for i in range(0,10)])</parameter>
    </parameterset>

    <step name="parallel_execution" suffix="${i}" procs="4">
      <use>param_set</use>
      <do>echo "${i}"</do>
      <do>N=1000000 ; a=1; k=0; while [ "$k" -lt $N ]; do echo $(( 2*k + 1 + $a )) ; k=$(( k + 1 )) ; done</do>
    </step>
  </benchmark>
</jube>
The input file parallel_workpackages.yaml:
name: parallel_workpackages
outpath: bench_run
comment: A parallel workpackages demo

parameterset:
  name: param_set
  parameter: {name: i, type: int, mode: python, _: "\",\".join([ str(i) for i in range(0,10)])"}
step:
  name: parallel_execution
  suffix: ${i}
  procs: 4
  use: param_set
  do:
    - "echo \"${i}\""
    - "N=1000000 ; a=1; k=0; while [ \"$k\" -lt $N ]; do echo $(( 2*k + 1 + $a )) ; k=$(( k + 1 )) ; done"
In the example above, the expansion of the parameter i leads to the creation of 10 workpackages of the step parallel_execution. Due to the given argument procs="4", JUBE starts 4 worker processes which distribute the execution of the workpackages among themselves. N within the JUBE script represents the number of computation iterations used to simulate a computational workload. The parameters N, procs and the upper bound of range within this prototypical example can be varied to study runtime, memory usage and CPU load.
Important hints:
<do shared="true"> is not supported if procs is set for the corresponding step.
If <step shared="..."> is set, the user is responsible for avoiding data races within the shared directory.
Switching to an alternative work_dir for a step can also lead to data races if all expansions of the step access the same work_dir. Recommendation: Don’t use a shared work_dir in combination with procs.
This feature is implemented based on the Python package multiprocessing and doesn’t support inter-node communication. That’s why the parallelisation is limited to a single shared memory compute node.
Be considerate when working on a multi-user system with shared resources. The parallel feature of JUBE can easily exploit a whole compute node.
Parallel execution of a JUBE script can lead to much higher memory demand compared to serial execution with procs=1. In this case it is advised to reduce procs, leading to reduced memory usage.
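The general pattern of distributing independent work items over a fixed number of worker processes can be sketched with the multiprocessing package mentioned above. This is not JUBE's implementation; the functions workload and run_parallel are hypothetical stand-ins for the shell loop and the step expansion of the example, with a much smaller default N.

```python
from multiprocessing import Pool


def workload(i, n=1000):
    """Stand-in for one workpackage: sum the first n odd numbers."""
    total = 0
    for k in range(n):
        total += 2 * k + 1
    return i, total


def run_parallel(procs=4, count=10):
    """Process `count` independent work items with `procs` worker processes."""
    with Pool(processes=procs) as pool:
        return pool.map(workload, range(count))


if __name__ == "__main__":
    # 10 work items shared among 4 workers, as in the example with procs="4"
    for i, total in run_parallel():
        print(i, total)
```

As with procs in JUBE, all workers run on the same node, and every additional worker adds its own memory footprint.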
Result database
Results can also be stored in a database to simplify result management.
The files used for this example can be found inside examples/result_database.
The input file result_database.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="result_database" outpath="bench_run">
    <comment>result database creation</comment>

    <parameterset name="param_set">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>

    <patternset name="pattern">
      <pattern name="number_pat" type="int">Number: $jube_pat_int</pattern>
    </patternset>

    <step name="write_number">
      <use>param_set</use>
      <do>echo "Number: $number"</do>
    </step>

    <analyser name="analyse">
      <use>pattern</use>
      <analyse step="write_number">
        <file>stdout</file>
      </analyse>
    </analyser>

    <result>
      <use>analyse</use>
      <!-- creating a database containing the columns "number" and "number_pat" -->
      <!-- one table of the name "results" is created within the database -->
      <!-- optionally, you can use the "file" attribute to specify an alternative storage location for the database -->
      <database name="results" primekeys="NUM">
        <key primekey="true">number</key>
        <key title="NUM">number_pat</key>
      </database>
    </result>
  </benchmark>
</jube>
The input file result_database.yaml:
name: result_database
outpath: bench_run
comment: result database creation

parameterset:
  name: param_set
  parameter: {name: number, type: int, _: "1,2,4"}

patternset:
  name: pattern
  pattern: {name: number_pat, type: int, _: "Number: $jube_pat_int"}

step:
  name: write_number
  use: param_set
  do: "echo \"Number: $number\""

analyser:
  name: analyse
  use: pattern
  analyse:
    step: write_number
    file: stdout

result:
  use: analyse
  database:
    # creating a database containing the columns "number" and "number_pat"
    # one table of the name "results" is created within the database
    # optionally, you can use the "file" attribute to specify an alternative storage location for the database
    name: results
    primekeys: "NUM"
    key:
      - {primekey: true, _: number}
      - {title: NUM, _: number_pat}
By default, the database file is named after the name attribute of the database tag (here results) with the suffix .dat appended, and is located as follows:
bench_run
|
+- 000000
   |
   +- result
      |
      +- results.dat
The database tag takes the argument name. name is also the name of the table created within the database. If sqlite3 is installed, the contents of the database can be shown with the following command line.
>>> sqlite3 -header -table bench_run/000000/result/results.dat 'SELECT * FROM results'
+--------+------------+
| number | number_pat |
+--------+------------+
| 1      | 1          |
| 2      | 2          |
| 4      | 4          |
+--------+------------+
The key tag adds columns to the database table that have the same type as the corresponding parameter or pattern. Information about the columns of the database table results can be shown as follows.
>>> sqlite3 -header -table bench_run/000000/result/results.dat 'PRAGMA table_info(results)'
+-----+------------+------+---------+------------+----+
| cid | name       | type | notnull | dflt_value | pk |
+-----+------------+------+---------+------------+----+
| 0   | number     | int  | 0       |            | 1  |
| 1   | number_pat | int  | 0       |            | 2  |
+-----+------------+------+---------+------------+----+
The file argument takes a relative (to the current working directory) or absolute path to an alternative, user-defined location for the database file. Assuming that file="result_database.dat" was set in the above example, a file named result_database.dat would be created in the current working directory where jube result was invoked, containing a database named results, and the file bench_run/000000/result/results.dat would no longer contain the database but the path specified in the file attribute.
Invoking jube result a second time will update the database specified by the file parameter. Without the primekey attribute of the key tag, three additional rows, identical to the three existing rows, would be added to the results table. Setting the primekey attribute of the key tag to true adds the key to the primekeys, ensuring that a new row is only added to the database table if the column values specified in the primekeys do not exactly match an existing row. In this example, no new rows would be added. The use of the database attribute primekeys is deprecated. Instead, use the primekey attribute of the key tag. Updating primekeys is not supported.
To inspect a database from within a Python script, the Python modules sqlalchemy or pandas can be used.
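The effect of prime keys on repeated result generation can also be tried out with Python's built-in sqlite3 module. The sketch below works on an in-memory database with the table layout shown by the PRAGMA output above; using a composite PRIMARY KEY together with INSERT OR IGNORE is an assumption made for illustration and not necessarily how JUBE writes its database. The function name store_results is hypothetical.

```python
import sqlite3


def store_results(conn, rows):
    """Insert result rows, skipping rows whose prime key columns already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "number int, number_pat int, PRIMARY KEY (number, number_pat))"
    )
    conn.executemany("INSERT OR IGNORE INTO results VALUES (?, ?)", rows)
    conn.commit()


# In-memory database for the sketch; a real file path such as
# bench_run/000000/result/results.dat could be opened the same way.
conn = sqlite3.connect(":memory:")
rows = [(1, 1), (2, 2), (4, 4)]
store_results(conn, rows)
store_results(conn, rows)   # like a second "jube result" call: no new rows

print(conn.execute("SELECT * FROM results ORDER BY number").fetchall())
# prints [(1, 1), (2, 2), (4, 4)]
```

Without the PRIMARY KEY clause the second call would append three duplicate rows, which mirrors the behaviour described for a database without prime keys.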
Creating a do log
To increase the reproducibility of the do statements within a workpackage of a step and to archive the environment during execution, a do log can be written. A do log tries to mimic an executable script that recreates the environment at execution time. The files used for this example can be found inside examples/do_log.
The input file do_log.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="do_log_example" outpath="bench_run">

    <parameterset name="param_set">
      <parameter name="number">1,2,3,4,5</parameter>
    </parameterset>

    <step name="execute" shared="shared" do_log_file="do_log">
      <use>param_set</use>
      <do>cp ../../../../loreipsum${number} shared</do>
      <do shared="true">grep -r -l "Hidden!" loreipsum*</do>
    </step>

  </benchmark>
</jube>
The input file do_log.yaml:
- name: do_log_example
  outpath: bench_run

  parameterset:
    - name: "param_set"
      parameter:
        - {name: "number", _: "1,2,3,4,5"}

  step:
    name: execute
    use:
      - param_set
    do_log_file: "do_log"
    shared: "shared"
    do:
      - cp ../../../../loreipsum${number} shared
      - {shared: "true", _: "grep -r -l \"Hidden!\" loreipsum*"}
In this example a hidden string is searched for within 5 files and the name of the file containing the hidden string is printed.
After the initial execution of this example, a do_log file can be found in each of the directories bench_run/000000/00000[0-4]_execute. These files can be executed manually by passing them to /bin/sh. The scripts will reproduce the environment at execution time, the execution and the result output. Keep in mind that the shared grep will be executed by the workpackage with id 4 only.
The duplicate option
To simplify advanced tagging and parameter concatenation, the duplicate option can be set within parametersets or parameters.
The input file duplicate.xml:
<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="parameter_duplicate_example" outpath="bench_run">
    <comment>parameter duplicate example</comment>

    <parameterset name="options" duplicate="concat">
      <parameter name="iterations" >1</parameter>
      <parameter name="iterations" tag="few" >2,3,4</parameter>
      <parameter name="iterations" tag="many">20,30,40</parameter>
    </parameterset>

    <parameterset name="result">
      <parameter name="sum" mode="python">int(${iterations}*(${iterations}+1)/2)</parameter>
    </parameterset>

    <step name="perform_iterations">
      <use>options,result</use>
      <do>echo $sum</do>
    </step>

  </benchmark>
</jube>
The input file duplicate.yaml:
name: parameter_duplicate_example
outpath: bench_run
comment: parameter duplicate example

parameterset:
  - name: options
    duplicate: concat
    parameter:
      - {name: iterations, _: "1"}
      - {name: iterations, tag: few, _: "2,3,4"}
      - {name: iterations, tag: many, _: "20,30,40"}
  - name: result
    parameter:
      - {name: sum, mode: "python", _: "int(${iterations}*(${iterations}+1)/2)"}

step:
  name: perform_iterations
  use: "options,result"
  do: "echo $sum"
In this example the duplicate option with the value concat is set for a parameterset. This leads to a concatenation of the values of parameters with the same name. In combination with the tagging option for parameters, the user can specify which values are included in the parameter. If the user states the tags few and many, the parameter iterations takes the values 1,2,3,4,20,30,40.
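The concatenation can be retraced with a few lines of Python. This sketch only mirrors the outcome described above; it assumes that a parameter definition without a tag is always active, and the function name concat_values and the tuple layout are hypothetical.

```python
def concat_values(definitions, active_tags):
    """Concatenate the values of all active definitions of one parameter.

    definitions: list of (tag, comma-separated values); tag None means untagged.
    Assumption: untagged definitions are always active.
    """
    values = []
    for tag, raw in definitions:
        if tag is None or tag in active_tags:
            values.extend(int(v) for v in raw.split(","))
    return values


# The three definitions of "iterations" from the example
definitions = [(None, "1"), ("few", "2,3,4"), ("many", "20,30,40")]

iterations = concat_values(definitions, {"few", "many"})
print(iterations)                                      # [1, 2, 3, 4, 20, 30, 40]

# The dependent parameter "sum" is evaluated once per value
print([int(n * (n + 1) / 2) for n in iterations])      # [1, 3, 6, 10, 210, 465, 820]
```

With only the tag few set, the same function returns 1,2,3,4, and without any tag just the untagged value 1.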
The default value of duplicate for parametersets is replace, which leads to parameters being replaced if they are defined more than once. A third value of the duplicate option for parametersets is error. In this case the execution is aborted if a parameter is defined more than once.
The duplicate option can also be set for individual parameters. In this case the parameter’s duplicate option takes priority over that of the parameterset. The possible values of the parameter’s duplicate option are none, replace, concat and error. none is the default value and leads to the duplicate option being ignored for this parameter, so that the parameterset’s duplicate option takes precedence. The other three values have the same effect as in the parameterset.