JUBE tutorial

This tutorial gives you an overview of the basic usage of JUBE.

Installation

Requirements: JUBE needs Python 2.7 or Python 3.2 (or any higher version)

You can also use Python 2.6 to run JUBE. In this case you have to add the argparse module to your Python module library yourself.

To use the JUBE command line tool, the PYTHONPATH must contain the location of the JUBE package. This can be achieved in three different ways:

  • You can use the installation tool to copy all files to the right position (preferred):

    >>> python setup.py install --user
    

    This will install the JUBE package and the binary to your $HOME/.local directory. Instead of --user you can also pass a user-specific --prefix option. In that case you might have to set the PYTHONPATH environment variable first (this will be mentioned during the install process).

  • You can add the parent folder path of the JUBE package-folder (jube2 directory) to the PYTHONPATH environment variable:

    >>> export PYTHONPATH=<parent folder path>:$PYTHONPATH
    
  • You can move the JUBE package by hand to an existing Python package folder like site-packages

To use the JUBE command line tool like a normal command line command you can add it to the PATH environment variable:

>>> export PATH=$HOME/.local/bin:$PATH

To check your final installation, you can use

>>> jube --version

which should display the current version number.

Configuration

The main JUBE configuration is based on the given input configuration file. In addition, some shell environment variables are available which can be used to set system-specific options:

  • JUBE_INCLUDE_PATH: Can contain a list of paths (separated by :) pointing to directories which contain system-relevant include configuration files. This technique can be used to store platform-specific parameters in a platform-specific directory.
  • JUBE_EXEC_SHELL: JUBE normally uses /bin/sh to execute the given shell commands. This default shell can be changed by using this environment variable.
  • JUBE_GROUP_NAME: JUBE will use the given UNIX group name to share benchmarks between different users. The group must exist and the JUBE user must be part of this group. The given group will be the owner of new benchmark runs. By default (without setting the environment variable) all file and directory permissions are defined by the normal UNIX rules.

BASH autocompletion can be enabled by using the eval "$(jube complete)" command. You can store the command in your bash profile settings if needed.

Hello World

In this example we will show you the basic structure of a JUBE input file and the basic command line options.

The files used for this example can be found inside examples/hello_world.

The input file hello_world.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="hello_world" outpath="bench_run">
    <comment>A simple hello world</comment>

    <!-- Configuration -->
    <parameterset name="hello_parameter">
      <parameter name="hello_str">Hello World</parameter>
    </parameterset>
    
    <!-- Operation -->
    <step name="say_hello">
      <use>hello_parameter</use> <!-- use existing parameterset -->
      <do>echo $hello_str</do> <!-- shell command -->
    </step>    
  </benchmark>
</jube>

Every JUBE input file starts (after the general XML header line) with the root tag <jube>. This root tag must be unique. XML does not allow multiple root tags.

The first tag which contains benchmark-specific information is <benchmark>. hello_world is the benchmark name, which can be used to identify the benchmark (e.g. when there are multiple benchmarks inside a single input file, or when different benchmarks use the same run directory).

The outpath describes the benchmark run directory (relative to the position of the input file). This directory will be managed by JUBE and will be automatically created if it does not exist. The directory name and position are very important, because they are the main interface to communicate with your benchmark after it has been submitted.

Using the <comment> you can store some benchmark related comments inside the benchmark directory. You can also use normal XML-comments to structure your input-file:

<!-- your comment -->

In this benchmark a <parameterset> is used to store the single <parameter name="hello_str">. The name of the parameter should contain only letters, digits (not as the first character) and the underscore _ (like a normal Python identifier). The name of the parameterset must be unique (relative to the current benchmark). In further examples we will see that there are more types of sets, which can be distinguished by their names. The name of the parameter must also be unique (relative to the parameterset).

The <step> contains the operation tasks. The name must be unique. It can use different types of existing sets. All used sets must be given by name using the <use>. There can be multiple <use> inside the same <step>, and multiple names within the same <use> are also allowed (separated by ,). Only sets which are explicitly used are available inside the step! The <do> contains a single shell command. This command will run inside of a sandbox directory environment (inside the outpath directory tree). A step together with its corresponding parameter space is called a workpackage.

Available parameters can be used inside the shell commands. To use a parameter you have to write

$parametername

or

${parametername}

The curly brackets must be used if you want variable concatenation: $hello_strtest will not be replaced, whereas ${hello_str}test will be replaced. If a parameter does not exist or isn't available, the variable will not be replaced! If you want to use $ inside your command, you have to write $$ to mask the symbol. Parameter substitution will be done before the normal shell substitution!
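The substitution rules described above closely resemble the behaviour of Python's string.Template. The following is only a minimal sketch of these rules, not JUBE's actual implementation; the helper name substitute is hypothetical.

```python
from string import Template


def substitute(command, parameters):
    """Replace $name and ${name}; leave unknown names untouched (hypothetical helper)."""
    # safe_substitute keeps unknown placeholders as they are and turns $$ into $
    return Template(command).safe_substitute(parameters)


parameters = {"hello_str": "Hello World"}

print(substitute("echo $hello_str", parameters))        # echo Hello World
print(substitute("echo ${hello_str}test", parameters))  # echo Hello Worldtest
print(substitute("echo $hello_strtest", parameters))    # echo $hello_strtest
print(substitute("echo $$HOME", parameters))            # echo $HOME
```

In the last line the shell, not the sketch, would expand $HOME afterwards, which matches the rule that parameter substitution happens before the normal shell substitution.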

To run the benchmark just type:

>>> jube run hello_world.xml

This benchmark will produce the following output:

######################################################################
# benchmark: hello_world

A simple hello world
######################################################################

Running workpackages (#=done, 0=wait, E=error):
############################################################ (  1/  1)

   stepname | all | open | wait | error | done
  ----------+-----+------+------+-------+-----
  say_hello |   1 |    0 |    0 |     0 |    1

>>>> Benchmark information and further useful commands:
>>>>       id: 0
>>>>   handle: bench_run
>>>>      dir: bench_run/000000
>>>>  analyse: jube analyse bench_run --id 0
>>>>   result: jube result bench_run --id 0
>>>>     info: jube info bench_run --id 0
######################################################################

As you can see, there was a single step say_hello, which runs one shell command echo $hello_str that will be expanded to echo Hello World.

The id is (in addition to the benchmark directory handle) an important number. Every benchmark run will get a new unique id inside the benchmark directory.

Inside the benchmark directory you will see the following structure:

bench_run               # the given outpath
|
+- 000000               # the benchmark id
   |
   +- configuration.xml # the stored benchmark configuration
   +- workpackages.xml  # workpackage information
   +- run.log           # log information
   +- 000000_say_hello  # the workpackage
      |
      +- done           # workpackage finished marker
      +- work           # user sandbox folder
         |
         +- stderr      # standard error messages of used shell commands
         +- stdout      # standard output of used shell commands

stdout will contain Hello World in this example case.

Help

JUBE contains a command line based help functionality:

>>> jube help <keyword>

By using this command you will have direct access to all keywords inside the glossary.

Another useful command is the info command. It will show you information concerning your existing benchmarks:

# display a list of existing benchmarks
>>> jube info <benchmark-directory>
# display information about given benchmark
>>> jube info <benchmark-directory> --id <id>
# display information about a step inside the given benchmark
>>> jube info <benchmark-directory> --id <id> --step <stepname>

The third, also very important, functionality is the logger. Every run, continue, analyse and result execution will produce log information inside your benchmark directory. These log files contain a lot of useful debugging output.

You can easily access these log files by using the JUBE log viewer command:

>>> jube log [benchmark-directory] [--id id] [--command cmd]

e.g.:

>>> jube log bench_runs --command run

will display the run.log of the last benchmark found inside of bench_runs.

Log output can also be displayed during runtime by using the verbose output:

>>> jube -v run <input-file>

-vv can be used to display stdout output during runtime and -vvv will display the stdout output as well as the log output at the same time.

Since the parsing step is done before creating the benchmark directory, there will be a jube-parse.log inside your current working directory, which contains the parser log information.

Errors within a <do> command will create a log entry and stop further execution of the corresponding parameter combination. Other parameter combinations will still be executed by default. JUBE can also automatically stop any further execution if you use the -e option:

>>> jube run -e <input-file>

There is also a debugging mode integrated in JUBE:

>>> jube --debug <command> [other-args]

This mode avoids any shell execution but will generate a single log file (jube-debug.log) in your current working directory.

Parameter space creation

In this example we will show you an important feature of JUBE: The automatic parameter space generation.

The files used for this example can be found inside examples/parameterspace.

The input file parameterspace.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="parameterspace" outpath="bench_run">
    <comment>A parameterspace creation example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <!-- Create a parameterspace out of two template parameter -->
      <parameter name="number" type="int">1,2,4</parameter>
      <parameter name="text" separator=";">Hello;World</parameter>
    </parameterset>
    
    <!-- Operation -->
    <step name="say_hello">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo "$text $number"</do> <!-- shell command -->
    </step>    
  </benchmark>
</jube>

Whenever a parameter contains a , (this can be changed using the separator attribute) this parameter becomes a template. A step which uses the parameterset containing this parameter will run multiple times to iterate over all possible parameter combinations. In this example the step say_hello will run 6 times:

 stepname | all | open | wait | error | done
----------+-----+------+------+-------+-----
say_hello |   6 |    0 |    0 |     0 |    6

Every parameter combination will run in its own sandbox directory.
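The six runs are the Cartesian product of the three number values and the two text values (3 × 2 = 6). The following is a minimal sketch of such an expansion, assuming each parameter is given as a (value, separator) pair; the function name expand is hypothetical and this is not JUBE's internal code.

```python
from itertools import product


def expand(parameters):
    """Yield one dict per parameter combination (hypothetical helper).

    `parameters` maps a parameter name to a (value, separator) pair.
    """
    names = list(parameters)
    value_lists = [value.split(sep) for value, sep in parameters.values()]
    for combination in product(*value_lists):
        yield dict(zip(names, combination))


param_set = {"number": ("1,2,4", ","), "text": ("Hello;World", ";")}
combinations = list(expand(param_set))

for c in combinations:
    # the command each workpackage would run after substitution
    print('echo "{text} {number}"'.format(**c))
```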

Another new keyword is the type attribute. The parameter type is not used inside the substitution process, but for sorting operations inside the result creation. The default type is string. Possible basic types are string, int and float.
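To illustrate why the type matters for sorting, the following sketch (plain Python, independent of JUBE) compares sorting the same values as strings and as integers:

```python
values = ["10", "4", "1", "2"]

as_string = sorted(values)        # lexicographic order
as_int = sorted(values, key=int)  # numeric order

print(as_string)  # ['1', '10', '2', '4']
print(as_int)     # ['1', '2', '4', '10']
```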

Step dependencies

If you start writing a complex benchmark structure, you might want to have dependencies between different steps, for example between a compile and the execution step. JUBE can handle these dependencies and will also preserve the given parameter space.

The files used for this example can be found inside examples/dependencies.

The input file dependencies.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="dependencies" outpath="bench_run">
    <comment>A Dependency example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>
    
    <!-- Operations -->
    <step name="first_step">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo $number</do> <!-- shell command -->
    </step>
    
    <!-- Create a dependency between both steps -->
    <step name="second_step" depend="first_step">
      <do>cat first_step/stdout</do> <!-- shell command -->
    </step>    
  </benchmark>
</jube>

In this example we create a dependency between first_step and second_step. After first_step has finished, the corresponding second_step will start. Steps can also have multiple dependencies (separated by , in the definition), but circular definitions will not be resolved. A dependency is a unidirectional link!

To communicate between a step and its dependency there is a link inside the work directory pointing to the corresponding dependency step work directory. In this example we use

cat first_step/stdout

to write the stdout-file content of the dependency step into the stdout-file of the current step.

Because the first_step uses a template parameter which creates three execution runs, there will also be three second_step runs each pointing to different first_step-directories:

   stepname | all | open | wait | error | done
------------+-----+------+------+-------+-----
 first_step |   3 |    0 |    0 |     0 |    3
second_step |   3 |    0 |    0 |     0 |    3
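The ordering rule for dependencies (a step runs after all steps it depends on, and circular definitions cannot be resolved) can be sketched with the graphlib module from the Python standard library. This is only an illustration under the assumption of Python 3.9 or newer; it does not show how JUBE schedules its workpackages, and the function name execution_order is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return step names so that every step comes after its dependencies."""
    # `dependencies` maps a step name to the set of steps it depends on
    return list(TopologicalSorter(dependencies).static_order())


steps = {"first_step": set(), "second_step": {"first_step"}}
order = execution_order(steps)
print(order)  # ['first_step', 'second_step']

# A circular definition cannot be ordered
try:
    execution_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)  # True
```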

Loading files and substitution

Every step runs inside a unique sandbox directory. Usually, you will need to have external files inside this directory (e.g. the source files) and in some cases you want to change a parameter inside the file based on your current parameter space. There are two additional set-types which handle this behaviour inside of JUBE.

The files used for this example can be found inside examples/files_and_sub.

The input file files_and_sub.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="files_and_sub" outpath="bench_run">
    <comment>A file copy and substitution example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>
    
    <!-- Files -->
    <fileset name="files">
      <copy>file.in</copy>
    </fileset>
    
    <!-- Substitute -->
    <substituteset name="substitute">
      <!-- Substitute files -->
      <iofile in="file.in" out="file.out" />
      <!-- Substitute commands -->
      <sub source="#NUMBER#" dest="$number" />
    </substituteset>
            
    <!-- Operation -->
    <step name="sub_step">
      <use>param_set</use> <!-- use existing parameterset -->
      <use>files</use>        <!-- use existing fileset -->
      <use>substitute</use>   <!-- use existing substituteset -->
      <do>cat file.out</do>   <!-- shell command -->
    </step>    
  </benchmark>
</jube>

The content of file file.in:

Number: #NUMBER#

Inside the <fileset> the current location of files is defined (relative to the current input file; absolute paths are also allowed). <copy> specifies that the file should be copied to the sandbox directory when the fileset is used. A <link> option is also available to create a symbolic link to the given file inside the sandbox directory.

If additional operations are needed to prepare your files (e.g. expanding a tar file), you can use the <prepare>-tag inside your <fileset>.

The <substituteset> describes the substitution process. The <iofile> contains the input and output filename. The path is relative to the sandbox directory. Because we do/should not know that location, we use the fileset to copy file.in to this directory.

The <sub> specifies the substitution. All occurrences of source will be substituted by dest. As you can see, you can use parameters inside the substitution.
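The effect of the <sub> in this example can be sketched as a plain string replacement. The following is a minimal illustration that works on strings instead of real files; the function name apply_substitutions is hypothetical and the code is not taken from JUBE.

```python
def apply_substitutions(text, substitutions):
    """Replace all occurrences of each source string by its dest string."""
    for source, dest in substitutions:
        text = text.replace(source, dest)
    return text


file_in = "Number: #NUMBER#\n"  # content of file.in

for number in ["1", "2", "4"]:  # one workpackage per value of $number
    file_out = apply_substitutions(file_in, [("#NUMBER#", number)])
    print(file_out, end="")
```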

There is no <use> inside any set. The combination of all sets will be done inside the <step>. So if you use a parameter inside a <sub> you must also add the corresponding <parameterset> inside the <step> where you use the <substituteset>!

In the sub_step we use all available sets. The use order is not relevant. The normal execution process will be:

  1. Parameter space expansion
  2. Copy/link files
  3. Prepare operations
  4. File substitution
  5. Run shell operations

The resulting directory-tree will be:

bench_run               # the given outpath
|
+- 000000               # the benchmark id
   |
   +- configuration.xml # the stored benchmark configuration
   +- workpackages.xml  # workpackage information
   +- 000000_sub_step   # the workpackage ($number = 1)
      |
      +- done           # workpackage finished marker
      +- work           # user sandbox folder
         |
         +- stderr      # standard error messages of used shell commands
         +- stdout      # standard output of used shell commands (Number: 1)
         +- file.in     # the file copy
         +- file.out    # the substituted file
   +- 000001_sub_step   # the workpackage ($number = 2)
      |
      +- ...
   +- ...

Creating a result table

Finally, after running the benchmark, you will get several directories. JUBE allows you to parse your result files distributed over these directories to extract relevant data (e.g. walltime information) and create a result table.

The files used for this example can be found inside examples/result_creation.

The input file result_creation.xml:

<?xml version="1.0" encoding="UTF-8"?>
<jube>
  <benchmark name="result_creation" outpath="bench_run">
    <comment>A result creation example</comment>
    
    <!-- Configuration -->
    <parameterset name="param_set">
      <!-- Create a parameterspace with one template parameter -->
      <parameter name="number" type="int">1,2,4</parameter>
    </parameterset>
    
    <!-- Regex pattern -->
    <patternset name="pattern">
      <pattern name="number_pat" type="int">Number: $jube_pat_int</pattern>
    </patternset>
    
    <!-- Operation -->
    <step name="write_number">
      <use>param_set</use> <!-- use existing parameterset -->
      <do>echo "Number: $number"</do> <!-- shell command -->
    </step>
    
    <!-- Analyse -->
    <analyser name="analyse">
      <use>pattern</use> <!-- use existing patternset -->
      <analyse step="write_number">
        <file>stdout</file> <!-- file which should be scanned -->
      </analyse>
    </analyser>
    
    <!-- Create result table -->
    <result>
      <use>analyse</use> <!-- use existing analyser -->
      <table name="result" style="pretty" sort="number">
        <column>number</column>
        <column>number_pat</column>
      </table>
    </result>
  </benchmark>
</jube>

Using <parameterset> and <step> we create three workpackages, each writing Number: $number to stdout.

Now we want to parse these stdout files to extract information (in this example the written number). First of all we have to declare a <patternset>. Here we can describe a set of <pattern>. A <pattern> is a regular expression which will be used to parse your result files and search for a given string. In this example we only have the <pattern> number_pat. The name of the pattern must be unique (based on the usage of the <patternset>). The type is optional. It is used when the extracted data is sorted. The regular expression can contain other patterns or parameters. The example uses $jube_pat_int, which is a JUBE default pattern matching integer values. The pattern can contain a group, given by parentheses (...), to declare the extraction part ($jube_pat_int already contains these parentheses).

E.g. $jube_pat_int and $jube_pat_fp are defined in the following way:

<pattern name="jube_pat_int" type="int">([+-]?\d+)</pattern>
<pattern name="jube_pat_fp" type="float">([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)</pattern>

If there are multiple matches inside a single file you can add a reduce option. Normally only the first match will be extracted.
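The pattern of this example can be tried out with Python's re module. The sketch below inserts the definition of jube_pat_int by hand and returns only the first match, as described above; the names JUBE_PAT_INT and first_match are hypothetical and the code is not JUBE's analyser.

```python
import re

# definition of jube_pat_int as given above
JUBE_PAT_INT = r"([+-]?\d+)"

# number_pat after $jube_pat_int has been replaced by its definition
number_pat = re.compile("Number: " + JUBE_PAT_INT)


def first_match(pattern, text):
    """Return the first extracted group as int, or None if nothing matches."""
    match = pattern.search(text)
    return int(match.group(1)) if match else None


print(first_match(number_pat, "Number: 4\n"))            # 4
print(first_match(number_pat, "Number: 1\nNumber: 2\n"))  # 1
```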

To use your <patternset> you have to specify the files which should be parsed. This can be done using the <analyser>. It uses relevant patternsets. Inside the <analyse> a step-name and a file inside this step is given. Every workpackage file combination will create its own result entry.

The analyser automatically knows all parameters which were used in the given step and in depending steps. There is no <use> option to add additional, completely new parametersets.

To run the analysis you have to write:

>>> jube analyse bench_run

The analysis data will be stored inside the benchmark directory.

The last part is the result table creation. Here you have to use an existing analyser. The <column> contains a pattern or a parameter name. sort is the optional sorting order (separated by ,). The style attribute can be csv or pretty to get different ASCII representations.

To create the result table you have to write:

>>> jube result bench_run -i last

The result table will be written to STDOUT and into a result.dat file inside bench_run/<id>/result. The keyword last can also be replaced by a specific benchmark id. If the id selection is missing, a combined result table of all available benchmark runs from the bench_run directory will be created.

Output of the given example:

number | number_pat
-------+-----------
     1 |          1
     2 |          2
     4 |          4
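A layout like the table above can be reproduced with a few lines of Python. This is only a sketch of the formatting, assuming right-aligned columns and rows given as dictionaries; the function name pretty_table is hypothetical and the code is not JUBE's result writer.

```python
def pretty_table(columns, rows, sort=None):
    """Format rows (a list of dicts) as an ASCII table with right-aligned columns."""
    if sort is not None:
        rows = sorted(rows, key=lambda row: row[sort])
    cells = [[str(row[c]) for c in columns] for row in rows]
    # each column is as wide as its longest entry (header included)
    widths = [max([len(c)] + [len(line[i]) for line in cells])
              for i, c in enumerate(columns)]
    lines = [" | ".join(c.rjust(w) for c, w in zip(columns, widths)),
             "-+-".join("-" * w for w in widths)]
    lines += [" | ".join(v.rjust(w) for v, w in zip(line, widths))
              for line in cells]
    return "\n".join(lines)


rows = [{"number": 4, "number_pat": 4},
        {"number": 1, "number_pat": 1},
        {"number": 2, "number_pat": 2}]
table = pretty_table(["number", "number_pat"], rows, sort="number")
print(table)
```

Because the number values are integers here, the rows are sorted numerically, which corresponds to declaring type="int" for the parameter.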

This was the last example of the basic JUBE tutorial. Next you can start the advanced tutorial to get more information about including external sets, jobsystem representation and scripting parameters.