JUBE tutorial

This tutorial is meant to give you an overview of the basic usage of JUBE.

Installation

Requirements: JUBE needs Python 2.7 or Python 3.2 (or any higher version)

You can also use Python 2.6 to run JUBE. In this case you have to add the argparse module to your Python module library yourself.

If you plan to use YAML based JUBE input files, you have to add the pyyaml-module to your Python module library.

To use the JUBE command line tool, the PYTHONPATH must contain the location of the JUBE package. This can be achieved in three different ways:

  • You can use the installation script to copy all files to the right position (preferred):

    >>> python setup.py install --user
    

    This will install the JUBE package files and executables to your $HOME/.local directory. Instead of --user, a user-specific --prefix option is also available. With --prefix you might have to set the PYTHONPATH environment variable first (this will be mentioned during the install process).

  • You can add the parent folder path of the JUBE package-folder (jube2 directory) to the PYTHONPATH environment variable:

    >>> export PYTHONPATH=<parent folder path>:$PYTHONPATH
    
  • You can move the JUBE package by hand to an existing Python package folder like site-packages

To call the JUBE command line tool like any other command, you can add its location to the PATH environment variable:

>>> export PATH=$HOME/.local/bin:$PATH

To check your final installation, you can use

>>> jube --version

which should print the current version number.
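If jube --version fails, it can help to check whether Python can locate the package at all. The following is a minimal sketch, assuming Python 3.4 or newer and assuming the package is importable under the name jube2 (the package directory name mentioned above). The helper is_importable is a hypothetical name introduced for illustration and is not part of JUBE.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "jube2" is the package directory name mentioned in the installation notes.
if is_importable("jube2"):
    print("jube2 package found on the Python path")
else:
    print("jube2 package not found; check PYTHONPATH")
```

If the package is reported as missing, the PYTHONPATH setting from the installation options above is the first thing to verify.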

Configuration

The main JUBE configuration is based on the given input configuration file. In addition, some shell environment variables are available which can be used to set system-specific options:

  • JUBE_INCLUDE_PATH: Can contain a list of paths (separated by :) pointing to directories which contain system-relevant include configuration files. This technique can be used to store platform-specific parameters in a platform-specific directory.
  • JUBE_EXEC_SHELL: JUBE normally uses /bin/sh to execute the given shell commands. This default shell can be changed by using this environment variable.
  • JUBE_GROUP_NAME: JUBE will use the given UNIX groupname to share benchmarks between different users. The group must exist and the JUBE user must be part of this group. The given group will be the owner of new benchmark runs. By default (without setting the environment variable) all file and directory permissions are defined by the normal UNIX rules.
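To make the meaning of the three variables concrete, here is a small Python sketch that reads them from an environment mapping and applies the defaults described above. It only illustrates the documented behaviour and is not JUBE's own code; the function name jube_environment and the dictionary keys are hypothetical, and dropping empty path entries is an assumption.

```python
import os


def jube_environment(environ=None):
    """Collect the three JUBE-related variables from an environment mapping."""
    env = os.environ if environ is None else environ
    raw_paths = env.get("JUBE_INCLUDE_PATH", "")
    return {
        # ':'-separated list of include directories (empty entries dropped;
        # this is an assumption of the sketch)
        "include_path": [p for p in raw_paths.split(":") if p],
        # /bin/sh is the default shell named in the text above
        "exec_shell": env.get("JUBE_EXEC_SHELL", "/bin/sh"),
        # None means: fall back to the normal UNIX permission rules
        "group_name": env.get("JUBE_GROUP_NAME"),
    }


print(jube_environment({"JUBE_INCLUDE_PATH": "/opt/platform:/opt/common"}))
```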

BASH autocompletion can be enabled by using the eval "$(jube complete)" command. You can store the command in your bash profile settings if needed.

Input format

JUBE supports two different types of input formats: XML based files and YAML based files. Both formats support the same set of JUBE features, so you can select whichever input format you prefer.

The following sections will always show all examples using both formats. However, the explanations will mostly stick to the XML format, but they can easily be transferred to the YAML solution.

Both formats depend on a specific handling of special characters. More details can be found in the corresponding FAQ sections.

Internally JUBE always uses the XML based format, by converting YAML based configuration files into XML if necessary. This is why parsing error messages might point to XML errors even if the YAML format was used.

Hello World

In this example we will show you the basic structure of a JUBE input file and the basic command line options.

The files used for this example can be found inside examples/hello_world.

The input file hello_world.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="hello_world" outpath="bench_run">
    <comment>A simple hello world</comment>

    <!-- Configuration -->
    <parameterset name="hello_parameter">
      <parameter name="hello_str">Hello World</parameter>
    </parameterset>
    
    <!-- Operation -->
    <step name="say_hello">
      <use>hello_parameter</use> <!-- use existing parameterset -->
      <do>echo $hello_str</do> <!-- shell command -->
    </step>    
  </benchmark>
</jube>

The input file hello_world.yaml:

name: hello_world
outpath: bench_run
comment: A simple hello world

#Configuration
parameterset:
  name: hello_parameter
  parameter: {name: hello_str,  _: Hello World}

#Operation
step:
  name: say_hello
  use: hello_parameter #use existing parameterset
  do: echo $hello_str #shell command

Every JUBE XML based input file starts (after the general XML header line) with the root tag <jube>. This root tag must be unique. XML does not allow multiple root tags.

The first tag which contains benchmark specific information is <benchmark>. hello_world is the benchmark name, which can be used to identify the benchmark (e.g. when there are multiple benchmarks inside a single input file, or when different benchmarks use the same run directory).

The outpath describes the benchmark run directory (relative to the position of the input file). This directory will be managed by JUBE and will be automatically created if it does not exist. The directory name and position are very important, because they are the main interface to communicate with your benchmark, after it was submitted.

Using the <comment> you can store some benchmark related comments inside the benchmark directory. You can also use normal XML-comments to structure your input-file:

<!-- your comment -->

In this benchmark a <parameterset> is used to store the single <parameter name="hello_str">. The name of the parameter should contain only letters, numbers (should not be the first character) or the _ (like a normal Python identifier). The name of the parameterset must be unique (relative to the current benchmark). In further examples we will see that there are more types of sets, which can be distinguished by their names. Also the name of the parameter must be unique (relative to the parameterset).
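The naming recommendation above can be checked mechanically. The following sketch, assuming ASCII letters only, tests a name against the rule "letters, digits and _, with no digit in first position". The function is_valid_name is a hypothetical helper and not part of JUBE; JUBE's own validation may differ.

```python
import re

# Letters, digits and "_" only; a digit may not come first.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if the name follows the recommended naming rule."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ("hello_str", "1st_param", "hello-str"):
    print(candidate, is_valid_name(candidate))
```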

The <step> contains the operation tasks. The name must be unique. It can use different types of existing sets. All used sets must be given by name using the <use>. There can be multiple <use> inside the same <step>, and multiple names within the same <use> are also allowed (separated by ,). Only sets which are explicitly used are available inside the step! The <do> contains a single shell command. This command will run inside of a sandbox directory environment (inside the outpath directory tree). The step and its corresponding parameter space is called a workpackage.

Available parameters can be used inside the shell commands. To use a parameter you have to write

$parametername

or

${parametername}

The braces must be used if you want variable concatenation. $hello_strtest will not be replaced, ${hello_str}test will be replaced. If a parameter does not exist or isn’t available the variable will not be replaced! If you want to use $ inside your command, you have to write $$ to mask the symbol. Parameter substitution will be done before the normal shell substitution!
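These substitution rules can be tried out without running JUBE. Python's standard string.Template class happens to follow the same conventions ($name, ${name}, $$ as an escape, and unknown names left untouched when safe_substitute is used), so the sketch below uses it as a stand-in. It is not JUBE's own substitution code, and the wrapper function substitute is a hypothetical name.

```python
from string import Template


def substitute(command, parameters):
    """Replace $name and ${name}; leave unknown names alone; turn $$ into $."""
    return Template(command).safe_substitute(parameters)


params = {"hello_str": "Hello World"}
print(substitute("echo $hello_str", params))        # known parameter
print(substitute("echo ${hello_str}test", params))  # concatenation
print(substitute("echo $hello_strtest", params))    # unknown name, unchanged
print(substitute("echo $$HOME", params))            # masked $ for the shell
```

The last call shows why masking matters: the resulting command still contains $HOME, which is then left for the shell to expand.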

To run the benchmark just type:

>>> jube run hello_world.xml

This benchmark will produce the following output:

######################################################################
# benchmark: hello_world
# id: 0
#
# A simple hello world
######################################################################

Running workpackages (#=done, 0=wait, E=error):
############################################################ (  1/  1)

  |  stepname | all | open | wait | error | done |
  |-----------|-----|------|------|-------|------|
  | say_hello |   1 |    0 |    0 |     0 |    1 |

>>>> Benchmark information and further useful commands:
>>>>       id: 0
>>>>   handle: bench_run
>>>>      dir: bench_run/000000
>>>>  analyse: jube analyse bench_run --id 0
>>>>   result: jube result bench_run --id 0
>>>>     info: jube info bench_run --id 0
>>>>      log: jube log bench_run --id 0
######################################################################

As you can see, there was a single step say_hello, which runs one shell command echo $hello_str that will be expanded to echo Hello World.

The id is (in addition to the benchmark directory handle) an important number. Every benchmark run will get a new unique id inside the benchmark directory.

Inside the benchmark directory you will see the following structure:

bench_run               # the given outpath
|
+- 000000               # the benchmark id
   |
   +- configuration.xml # the stored benchmark configuration
   +- workpackages.xml  # workpackage information
   +- run.log           # log information
   +- 000000_say_hello  # the workpackage
      |
      +- done           # workpackage finished marker
      +- work           # user sandbox folder
         |
         +- stderr      # standard error messages of used shell commands
         +- stdout      # standard output of used shell commands

stdout will contain Hello World in this example case.
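When post-processing results with your own scripts, it can be handy to build the path to a workpackage's stdout file. The sketch below derives the path from the directory listing above. The six-digit zero padding and the <id>_<stepname> naming are inferred from that listing only, and stdout_path is a hypothetical helper, not a JUBE function.

```python
import posixpath


def stdout_path(outpath, bench_id, wp_id, stepname):
    """Build the path to a workpackage's stdout file, following the tree above."""
    bench_dir = "{:06d}".format(bench_id)           # e.g. 000000
    wp_dir = "{:06d}_{}".format(wp_id, stepname)    # e.g. 000000_say_hello
    return posixpath.join(outpath, bench_dir, wp_dir, "work", "stdout")


print(stdout_path("bench_run", 0, 0, "say_hello"))
```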

Help

JUBE contains a command line based help functionality:

>>> jube help <keyword>

By using this command you will have direct access to all keywords inside the glossary.

Another useful command is the info command. It will show you information concerning your existing benchmarks:

# display a list of existing benchmarks
>>> jube info <benchmark-directory>
# display information about given benchmark
>>> jube info <benchmark-directory> --id <id>
# display information about a step inside the given benchmark
>>> jube info <benchmark-directory> --id <id> --step <stepname>

The third, also very important, functionality is the logger. Every run, continue, analyse and result execution will produce log information inside your benchmark directory. These log files contain a lot of useful debugging output.

You can easily access these log files by using the JUBE log viewer command:

>>> jube log [benchmark-directory] [--id id] [--command cmd]

e.g.:

>>> jube log bench_runs --command run

will display the run.log of the last benchmark found inside of bench_runs.
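The notion of the "last benchmark" can be illustrated with a short sketch. It assumes that "last" means the run with the highest numeric id among the run directories; this is an assumption of the sketch, and last_benchmark_id is a hypothetical helper rather than a JUBE function.

```python
def last_benchmark_id(entries):
    """Return the highest numeric benchmark id among directory names, or None."""
    ids = [int(name) for name in entries if name.isdecimal()]
    return max(ids) if ids else None


# In practice the names could come from os.listdir("bench_runs").
print(last_benchmark_id(["000000", "000002", "000001", "notes.txt"]))
```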

Log output can also be displayed during runtime by using the verbose output:

>>> jube -v run <input-file>

-vv can be used to display stdout output during runtime and -vvv will display the stdout output as well as the log output at the same time.

Since the parsing step is done before creating the benchmark directory, there will be a jube-parse.log inside your current working directory, which contains the parser log information.

Errors within a <do> command will create a log entry and stop further execution of the corresponding parameter combination. Other parameter combinations will still be executed by default. JUBE can also automatically stop any further execution if you use the -e option:

>>> jube run -e <input-file>

There is also a debugging mode integrated in JUBE:

>>> jube --debug <command> [other-args]

This mode avoids any shell execution but will generate a single log file (jube-debug.log) in your current working directory.

Parameter space creation

In this example we will show you an important feature of JUBE: the automatic parameter space generation.

The files used for this example can be found inside examples/parameterspace.

The input file parameterspace.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="parameterspace" outpath="bench_run">
    <comment>A parameterspace example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <!-- Create a parameterspace out of two template parameters -->
      <parameter name="number" type="int">1,2,4</parameter>
      <parameter name="text" separator=";">Hello;World</parameter>
    </parameterset>
    
    <!-- Operation -->
    <step name="say_hello">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo "$text $number"</do> <!-- shell command -->
    </step>    
  </benchmark>
</jube>

The input file parameterspace.yaml:

name: parameterspace
outpath: bench_run
comment: A parameterspace example

#Configuration
parameterset:
  name: param_set
  #Create a parameterspace out of two template parameters
  parameter:
    - {name: number, type: int, _: "1,2,4"} #comma separated integer must be quoted
    - {name: text, separator: ;, _: Hello;World}

#Operation
step:
  name: say_hello
  use: param_set #use existing parameterset
  do: echo "$text $number" #shell command

Whenever a parameter contains a , (this can be changed using the separator attribute) this parameter becomes a template. A step which uses the parameterset containing this parameter will run multiple times to iterate over all possible parameter combinations. In this example the step say_hello will run 6 times:

|  stepname | all | open | wait | error | done |
|-----------|-----|------|------|-------|------|
| say_hello |   6 |    0 |    0 |     0 |    6 |

Every parameter combination will run in its own sandbox directory.
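The expansion into six runs is a Cartesian product of the two value lists. The following Python sketch reproduces that idea for the example above; it illustrates the concept and is not JUBE's implementation. The function expand and its input layout (name mapped to a value string and separator) are hypothetical.

```python
from itertools import product


def expand(parameters):
    """Yield one dict per combination of the given template parameters.

    `parameters` maps a name to a (value_string, separator) pair.
    """
    names = list(parameters)
    value_lists = [text.split(sep) for text, sep in parameters.values()]
    for combination in product(*value_lists):
        yield dict(zip(names, combination))


space = list(expand({"number": ("1,2,4", ","), "text": ("Hello;World", ";")}))
for point in space:
    # One command line per parameter combination, as in the <do> above
    print('echo "{text} {number}"'.format(**point))
```

Three values for number times two values for text give the six workpackages reported in the table.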

Another new keyword is the type attribute. The parameter type is not used inside the substitution process, but for sorting operations inside the result creation. The default type is string. Possible basic types are string, int and float.

Step dependencies

If you start writing a complex benchmark structure, you might want to have dependencies between different steps, for example between a compile and the execution step. JUBE can handle these dependencies and will also preserve the given parameter space.

The files used for this example can be found inside examples/dependencies.

The input file dependencies.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="dependencies" outpath="bench_run">
    <comment>A Dependency example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>
    
    <!-- Operations -->
    <step name="first_step">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo $number</do> <!-- shell command -->
    </step>
    
    <!-- Create a dependency between both steps -->
    <step name="second_step" depend="first_step">
      <do>cat first_step/stdout</do> <!-- shell command -->
    </step>    
  </benchmark>
</jube>

The input file dependencies.yaml:

name: dependencies
outpath: bench_run
comment: A Dependency example

#Configuration
parameterset:
  name: param_set
  parameter: {name: number, type: int,  _: "1,2,4" } #comma separated integers must be quoted

#Operation
step:
  - name: first_step
    use: param_set #use existing parameterset
    do: echo $number #shell command
  - name: second_step
    depend: first_step #Create a dependency between both steps
    do: cat first_step/stdout #shell command

In this example we create a dependency between first_step and second_step. After first_step is finished, the corresponding second_step will start. Steps can also have multiple dependencies (separated by , in the definition), but circular definitions will not be resolved. A dependency is a unidirectional link!

To communicate between a step and its dependency there is a link inside the work directory pointing to the corresponding dependency step work directory. In this example we use

cat first_step/stdout

to write the stdout-file content of the dependency step into the stdout-file of the current step.

Because the first_step uses a template parameter which creates three execution runs, there will also be three second_step runs each pointing to different first_step-directories:

|    stepname | all | open | wait | error | done |
|-------------|-----|------|------|-------|------|
|  first_step |   3 |    0 |    0 |     0 |    3 |
| second_step |   3 |    0 |    0 |     0 |    3 |
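The ordering rule described in this section (a step starts after the steps it depends on, and circular definitions cannot be resolved) corresponds to a topological sort. The sketch below shows this with Python's standard graphlib module, assuming Python 3.9 or newer. It is an illustration of the concept only and says nothing about how JUBE schedules steps internally; execution_order is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def execution_order(dependencies):
    """Return step names so that every step comes after the steps it depends on.

    `dependencies` maps a step name to the set of step names it depends on.
    Raises ValueError if the dependencies are circular.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as error:
        # The second element of args holds the detected cycle
        raise ValueError("circular step dependency: {}".format(error.args[1]))


print(execution_order({"first_step": set(), "second_step": {"first_step"}}))
```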

Loading files and substitution

Every step runs inside a unique sandbox directory. Usually, you will need to have external files inside this directory (e.g. the source files) and in some cases you want to change a parameter inside the file based on your current parameter space. There are two additional set-types which handle this behaviour inside of JUBE.

The files used for this example can be found inside examples/files_and_sub.

The input file files_and_sub.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="files_and_sub" outpath="bench_run">
    <comment>A file copy and substitution example</comment>

    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>

    <!-- Files -->
    <fileset name="files">
      <copy>file.in</copy>
    </fileset>

    <!-- Substitute -->
    <substituteset name="substitute">
      <!-- Substitute files -->
      <iofile in="file.in" out="file.out" />
      <!-- Substitute commands -->
      <sub source="#NUMBER#" dest="$number" />
    </substituteset>

    <!-- Operation -->
    <step name="sub_step">
      <use>param_set</use> <!-- use existing parameterset -->
      <use>files</use>        <!-- use existing fileset -->
      <use>substitute</use>   <!-- use existing substituteset -->
      <do>cat file.out</do>   <!-- shell command -->
    </step>
  </benchmark>
</jube>

The input file files_and_sub.yaml:

name: files_and_sub
outpath: bench_run
comment: A file copy and substitution example

#Configuration
parameterset:
  name: param_set
  parameter: {name: number, type: int,  _: "1,2,4"} #comma separated integers must be quoted

#Files
fileset:
  name: files
  copy: file.in

#Substitute
substituteset:
  name: substitute
  iofile: {in: file.in, out: file.out}
  sub: {source: "#NUMBER#", dest: $number} #"#" must be quoted

#Operation
step:
  name: sub_step
  use: 
    - param_set #use existing parameterset
    - files #use existing fileset
    - substitute #use existing substituteset
  do: cat file.out #shell command

The content of file file.in:

Number: #NUMBER#

Inside the <fileset> the current location (relative to the current input file; also absolute paths are allowed) of files is defined. <copy> specifies that the file should be copied to the sandbox directory when the fileset is used. Also a <link> option is available to create a symbolic link to the given file inside the sandbox directory.

If there are additional operations needed to prepare your files (e.g. expanding a tar-file), you can use the <prepare>-tag inside your <fileset>.

The <substituteset> describes the substitution process. The <iofile> contains the input and output filename. The path is relative to the sandbox directory. Because we do not (and should not need to) know that location, we use the fileset to copy file.in into this directory.

The <sub> specifies the substitution. All occurrences of source will be substituted by dest. As you can see, you can use parameters inside the substitution.
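The effect of the <sub> on file.in can be reproduced with a plain string replacement. This is a minimal sketch of the idea, assuming a literal (non-regex) replacement of every occurrence; it is not JUBE's own code and apply_substitutions is a hypothetical helper.

```python
def apply_substitutions(text, substitutions):
    """Replace every occurrence of each source string by its dest string."""
    for source, dest in substitutions:
        text = text.replace(source, dest)
    return text


template = "Number: #NUMBER#\n"          # content of file.in
for number in ("1", "2", "4"):           # the expanded values of $number
    # Each result corresponds to the file.out of one workpackage
    print(apply_substitutions(template, [("#NUMBER#", number)]), end="")
```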

There is no <use> inside any set. The combination of all sets will be done inside the <step>. So if you use a parameter inside a <sub> you must also add the corresponding <parameterset> inside the <step> where you use the <substituteset>!

In the sub_step we use all available sets. The use order is not relevant. The normal execution process will be:

  1. Parameter space expansion
  2. Copy/link files
  3. Prepare operations
  4. File substitution
  5. Run shell operations

The resulting directory-tree will be:

bench_run               # the given outpath
|
+- 000000               # the benchmark id
   |
   +- configuration.xml # the stored benchmark configuration
   +- workpackages.xml  # workpackage information
   +- 000000_sub_step   # the workpackage ($number = 1)
      |
      +- done           # workpackage finished marker
      +- work           # user sandbox folder
         |
         +- stderr      # standard error messages of used shell commands
         +- stdout      # standard output of used shell commands (Number: 1)
         +- file.in     # the file copy
         +- file.out    # the substituted file
   +- 000001_sub_step   # the workpackage ($number = 2)
      |
      +- ...
   +- ...

Creating a result table

Finally, after running the benchmark, you will get several directories. JUBE allows you to parse your result files distributed over these directories to extract relevant data (e.g. walltime information) and create a result table.

The files used for this example can be found inside examples/result_creation.

The input file result_creation.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="result_creation" outpath="bench_run">
    <comment>A result creation example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <!-- Create a parameterspace with one template parameter -->
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>
    
    <!-- Regex pattern -->
    <patternset name="pattern">
      <pattern name="number_pat" type="int">Number: $jube_pat_int</pattern>
    </patternset>
    
    <!-- Operation -->
    <step name="write_number">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo "Number: $number"</do> <!-- shell command -->
    </step>
    
    <!-- Analyse -->
    <analyser name="analyse">
      <use>pattern</use> <!-- use existing patternset -->
      <analyse step="write_number">
        <file>stdout</file> <!-- file which should be scanned -->
      </analyse>
    </analyser>
    
    <!-- Create result table -->
    <result>
      <use>analyse</use> <!-- use existing analyser -->
      <table name="result" style="pretty" sort="number">
        <column>number</column>
        <column>number_pat</column>
      </table>
    </result>
  </benchmark>
</jube>

The input file result_creation.yaml:

name: result_creation
outpath: bench_run
comment: A result creation example

#Configuration
parameterset:
  name: param_set
  #Create a parameterspace with one template parameter
  parameter: {name: number, type: int, _: "1,2,4"} #comma separated integer must be quoted

#Regex pattern
patternset:
  name: pattern
  pattern: {name: number_pat, type: int, _: "Number: $jube_pat_int"} # ":" must be quoted

#Operation
step:
  name: write_number
  use: param_set #use existing parameterset
  do: 'echo "Number: $number"' #shell command

#Analyse
analyser:
  name: analyse
  use: pattern #use existing patternset
  analyse:
    step: write_number
    file: stdout #file which should be scanned

#Create result table
result:
  use: analyse #use existing analyser
  table:
    name: result
    style: pretty
    sort: number
    column: [number,number_pat]

Using <parameterset> and <step> we create three workpackages, each writing Number: $number to stdout.

Now we want to parse these stdout files to extract information (in this example case the written number). First of all we have to declare a <patternset>. Here we can describe a set of <pattern>. A <pattern> is a regular expression which will be used to parse your result files and search for a given string. In this example we only have the <pattern> number_pat. The name of the pattern must be unique (based on the usage of the <patternset>). The type is optional. It is used when the extracted data will be sorted. The regular expression can contain other patterns or parameters. The example uses $jube_pat_int, which is a JUBE default pattern matching integer values. The pattern can contain a group, given by parentheses (...), to declare the extraction part ($jube_pat_int already contains these parentheses).

E.g. $jube_pat_int and $jube_pat_fp are defined in the following way:

<pattern name="jube_pat_int" type="int">([+-]?\d+)</pattern>
<pattern name="jube_pat_fp" type="float">([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)</pattern>

If there are multiple matches inside a single file you can add a reduce option. By default, only the first match will be extracted.
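The two default patterns shown above are ordinary regular expressions, so their behaviour can be checked directly with Python's re module. The sketch below builds the number_pat expression by hand and extracts the first match only, mirroring the default described above. The helper first_match is hypothetical and JUBE's own parsing code may work differently.

```python
import re

# The two default patterns, copied from the definitions above
JUBE_PAT_INT = r"([+-]?\d+)"
JUBE_PAT_FP = r"([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)"


def first_match(pattern, text, convert):
    """Return the converted first group of the first match, or None."""
    match = re.search(pattern, text)
    return convert(match.group(1)) if match else None


# "Number: $jube_pat_int" after the pattern has been inserted
number_pat = "Number: " + JUBE_PAT_INT
print(first_match(number_pat, "Number: 4\nNumber: 7\n", int))
```

The second match (7) is ignored here, which corresponds to the behaviour without a reduce option.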

To use your <patternset> you have to specify the files which should be parsed. This can be done using the <analyser>. It uses relevant patternsets. Inside the <analyse> a step-name and a file inside this step is given. Every workpackage file combination will create its own result entry.

The analyser automatically knows all parameters which were used in the given step and in depending steps. There is no <use> option to include additional <parameterset> that have not been already used within the analysed <step>.

To run the analysis you have to write:

>>> jube analyse bench_run

The analyse data will be stored inside the benchmark directory.

The last part is the result table creation. Here you have to use an existing analyser. The <column> contains a pattern or a parameter name. sort is the optional sorting order (separated by ,). The style attribute can be csv, pretty or aligned to get different ASCII representations.

To create the result table you have to write:

>>> jube result bench_run -i last

When you run the result command for the first time, the analyse step is executed automatically if it has not been executed before, so it is not necessary to run the separate analyse step every time. However, you need the separate analyse command if you want to force a re-run of the analyse step; otherwise only the stored values of the first analyse will be used in the result step.

The result table will be written to STDOUT and into a result.dat file inside bench_run/<id>/result. The value last is the default option and can also be replaced by a specific benchmark id. If the id selection is missing, a combined result table of all available benchmark runs from the bench_run directory will be created.

Output of the given example:

| number | number_pat |
|--------|------------|
|      1 |          1 |
|      2 |          2 |
|      4 |          4 |
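To show how sorting and column layout interact, here is a small Python sketch that formats a list of result rows in the same layout as the table above. It is only an illustration: pretty_table is a hypothetical helper, and the right-aligned layout is inferred from the tables shown in this tutorial rather than taken from JUBE's code.

```python
def pretty_table(columns, rows, sort_by):
    """Format rows (dicts) as a right-aligned ASCII table sorted by one column."""
    rows = sorted(rows, key=lambda row: row[sort_by])
    cells = [[str(row[c]) for c in columns] for row in rows]
    widths = [max([len(c)] + [len(line[i]) for line in cells])
              for i, c in enumerate(columns)]

    def fmt(values):
        return "| " + " | ".join(v.rjust(w) for v, w in zip(values, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(columns), separator] + [fmt(line) for line in cells])


# Rows deliberately out of order; values are integers, matching type="int"
results = [{"number": 4, "number_pat": 4},
           {"number": 1, "number_pat": 1},
           {"number": 2, "number_pat": 2}]
print(pretty_table(["number", "number_pat"], results, sort_by="number"))
```

Sorting on integer values rather than strings is what the type attribute is for: as strings, a value such as 10 would sort before 2.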

The analyse and result instructions can be combined within one single command:

>>> jube result bench_run -a

This was the last example of the basic JUBE tutorial. Next you can start the advanced tutorial to get more information about including external sets, jobsystem representation and scripting parameters.