Complete workflows

A workflow is a list of calls to workflow jobs, one call per line. Each line starts with the job name, followed by the arguments to that job. Based on the two jobs PLOT and ECL_HIST we can create a small workflow example:

PLOT      WWCT:OP_1   WWCT:OP_3  PRESSURE:10,10,10
PLOT      FGPT        FOPT
ECL_HIST  <RUNPATH_FILE>   <QC_PATH>/<ERTCASE>/wwct_hist   WWCT:OP_1  WWCT:OP_2

In this workflow we create plots of the nodes WWCT:OP_1, WWCT:OP_3, PRESSURE:10,10,10, FGPT and FOPT. The PLOT job we have created in this example is general; had we limited ourselves to ECLIPSE summary variables, we could have supported wildcards. We then invoke the ECL_HIST example job to create a histogram. See the documentation of RUNPATH_FILE and ERTCASE for the meaning of these keys.
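To make the line structure concrete, here is a minimal Python sketch that splits the workflow above into job names and argument lists. It is an illustration only and not how ERT parses workflows: the function name parse_workflow is hypothetical, and the sketch assumes whitespace-separated tokens with no comment or quoting support.

```python
def parse_workflow(text):
    """Split workflow text into (job_name, arguments) pairs, one per line."""
    calls = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # The first token is the job name; the rest are its arguments.
        calls.append((tokens[0], tokens[1:]))
    return calls


WORKFLOW = """\
PLOT      WWCT:OP_1   WWCT:OP_3  PRESSURE:10,10,10
PLOT      FGPT        FOPT
ECL_HIST  <RUNPATH_FILE>   <QC_PATH>/<ERTCASE>/wwct_hist   WWCT:OP_1  WWCT:OP_2
"""

for job, args in parse_workflow(WORKFLOW):
    print(job, args)
```

Note that PRESSURE:10,10,10 stays a single argument because the split is on whitespace only.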

DEFINE usage in workflows

Variables within workflows can be defined using the DEFINE keyword. If a DEFINE is already set in the ERT config and then re-specified within a workflow, the definition in the workflow overrides the one from the ERT config. A DEFINE within a workflow sets the value of that variable only within the scope of that workflow; it does not alter the value outside the workflow.
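The scoping rule can be modelled with a layered lookup. The following Python sketch uses collections.ChainMap to mimic the behaviour described here; it is a model of the rule, not ERT's implementation, and the keys and values (<QC_PATH>, <USER>, the paths) are hypothetical.

```python
from collections import ChainMap


def workflow_scope(config_defines, workflow_defines):
    """Return a lookup where workflow DEFINEs take precedence over config DEFINEs.

    Lookups hit the workflow layer first and fall back to the config layer.
    Writes go to the workflow layer only, so the config layer is never changed.
    """
    return ChainMap(dict(workflow_defines), config_defines)


# Hypothetical DEFINEs from the ERT config.
config_defines = {"<QC_PATH>": "/qc/from_config", "<USER>": "someone"}

# Hypothetical DEFINE inside one workflow, re-specifying <QC_PATH>.
scope = workflow_scope(config_defines, {"<QC_PATH>": "/qc/from_workflow"})

print(scope["<QC_PATH>"])           # /qc/from_workflow (workflow value wins)
print(scope["<USER>"])              # someone (falls back to the config)
print(config_defines["<QC_PATH>"])  # /qc/from_config (unchanged outside)
```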

Loading workflows

Workflows are loaded with the configuration option LOAD_WORKFLOW:

LOAD_WORKFLOW  /path/to/workflow/WFLOW1
LOAD_WORKFLOW  /path/to/workflow/workflow2  WFLOW2

LOAD_WORKFLOW takes the path to a workflow file as its first argument. By default the workflow is labeled internally in ERT with the filename, but you can optionally supply a second argument, which is then used as the name of the workflow. Alternatively, you can load a workflow interactively.

Automatically run workflows

With the keyword HOOK_WORKFLOW you can configure workflow ‘hooks’, meaning workflows which are run automatically at certain points during ERT's execution. Currently there are seven points in ERT's flow of execution where you can hook in a workflow:

  • before the experiment starts, using PRE_EXPERIMENT,

  • before the simulations (all forward models for a realization) start, using PRE_SIMULATION,

  • after all the simulations have completed, using POST_SIMULATION,

  • before the update step, using PRE_UPDATE,

  • after the update step, using POST_UPDATE,

  • only before the first update, using PRE_FIRST_UPDATE, and

  • after the experiment has completed, using POST_EXPERIMENT.

For non-iterative algorithms, PRE_FIRST_UPDATE is equivalent to PRE_UPDATE. The POST_SIMULATION hook is typically used to trigger QC workflows.

HOOK_WORKFLOW preExperimentWFLOW        PRE_EXPERIMENT
HOOK_WORKFLOW initWFLOW                 PRE_SIMULATION
HOOK_WORKFLOW preUpdateWFLOW            PRE_UPDATE
HOOK_WORKFLOW postUpdateWFLOW           POST_UPDATE
HOOK_WORKFLOW QC_WFLOW1                 POST_SIMULATION
HOOK_WORKFLOW QC_WFLOW2                 POST_SIMULATION
HOOK_WORKFLOW postExperimentWFLOW       POST_EXPERIMENT

In this example, preExperimentWFLOW runs once before the experiment starts. initWFLOW then runs at the start of every iteration, after the simulation directories have been created and just before the forward model is submitted to the queue. preUpdateWFLOW runs before the update step and postUpdateWFLOW runs after it. Each time the simulations have completed, the two workflows QC_WFLOW1 and QC_WFLOW2 run. After all iterations are complete, postExperimentWFLOW runs.

Note that workflows ‘hooked in’ with HOOK_WORKFLOW must also be loaded with the LOAD_WORKFLOW keyword.
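The relationship between LOAD_WORKFLOW and HOOK_WORKFLOW can be checked mechanically. The Python sketch below scans config text and reports hooks that name an unloaded workflow or an unknown hook point. It is not part of ERT: the function check_hooks is hypothetical, the sketch assumes whitespace-separated tokens, and it assumes that the default workflow name is the final component of the path.

```python
from pathlib import PurePosixPath

# The seven hook points listed above.
HOOK_POINTS = {
    "PRE_EXPERIMENT", "PRE_SIMULATION", "POST_SIMULATION",
    "PRE_UPDATE", "POST_UPDATE", "PRE_FIRST_UPDATE", "POST_EXPERIMENT",
}


def check_hooks(config_text):
    """Return a list of problems found in HOOK_WORKFLOW lines."""
    lines = [line.split() for line in config_text.splitlines()]

    # First pass: collect the names of all loaded workflows.
    loaded = set()
    for tokens in lines:
        if len(tokens) >= 2 and tokens[0] == "LOAD_WORKFLOW":
            # An optional second argument overrides the default name
            # (assumed here to be the last path component).
            name = tokens[2] if len(tokens) > 2 else PurePosixPath(tokens[1]).name
            loaded.add(name)

    # Second pass: check every hook against the loaded names and hook points.
    problems = []
    for tokens in lines:
        if len(tokens) >= 3 and tokens[0] == "HOOK_WORKFLOW":
            name, point = tokens[1], tokens[2]
            if name not in loaded:
                problems.append(f"{name} is hooked but not loaded")
            if point not in HOOK_POINTS:
                problems.append(f"{point} is not a known hook point")
    return problems


CONFIG = """\
LOAD_WORKFLOW  /path/to/workflow/WFLOW1
LOAD_WORKFLOW  /path/to/workflow/workflow2  WFLOW2
HOOK_WORKFLOW  WFLOW1  PRE_SIMULATION
HOOK_WORKFLOW  WFLOW3  POST_SIMULATION
HOOK_WORKFLOW  WFLOW2  POST_EVERYTHING
"""

for problem in check_hooks(CONFIG):
    print(problem)
```

Here WFLOW3 is reported because it is never loaded, and POST_EVERYTHING is reported because it is not one of the hook points.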

Locating the realisations: <RUNPATH_FILE>

Context must be passed between the main ERT process and the script through string substitution. In particular, the ‘magic’ key <RUNPATH_FILE> has been introduced for this purpose.

Many of the external workflow jobs involve looping over all the realisations in a construction like this:

for each realisation:
    // Do something for realisation
summarize()

When running an external job in a workflow, there is no direct transfer of information between the main ERT process and the external script. We therefore need a convention for communicating which realisations have been simulated and where they are located in the filesystem. This is done through a file which looks like this:

0   /path/to/real0  CASE_0000
1   /path/to/real1  CASE_0001
...
9   /path/to/real9  CASE_0009

The name and location of this file are available through the magic key <RUNPATH_FILE>, which is typically used as the first argument to external workflow jobs that should iterate over all realisations. The realisations listed in the <RUNPATH_FILE> are those of the last simulations you have run, since the file is updated every time you run simulations. This implies that it is (currently) not so convenient to alter which directories should be used when running a workflow.
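An external workflow job can turn this file into the loop shown earlier. The Python sketch below reads the three columns shown in the example above. The names Realisation and read_runpath_file are hypothetical, and the sketch assumes whitespace-separated columns (so paths must not contain spaces). Lines with fewer than three columns are skipped, and any additional columns are ignored.

```python
from typing import NamedTuple


class Realisation(NamedTuple):
    number: int    # realisation number (first column)
    runpath: str   # directory of the realisation (second column)
    basename: str  # case name, for example CASE_0000 (third column)


def read_runpath_file(path):
    """Read the file named by <RUNPATH_FILE> into a list of Realisation tuples."""
    realisations = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or incomplete lines
            realisations.append(Realisation(int(fields[0]), fields[1], fields[2]))
    return realisations


def main(argv):
    """Loop over all realisations; argv[1] is the substituted <RUNPATH_FILE>."""
    for real in read_runpath_file(argv[1]):
        # Do something for the realisation; here we only report it.
        print(f"realisation {real.number}: {real.runpath} ({real.basename})")
```

A script built on this sketch would call main(sys.argv) after importing sys, with <RUNPATH_FILE> given as the first argument of the job in the workflow file, as in the ECL_HIST example.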