Data types¶
In essence, the purpose of ERT is to pass uncertain parameter values to a forward model and then store the resulting outputs. Forward models include all necessary pre-processing and post-processing steps, as well as the computational model (e.g., a physics simulator like ECLIPSE) that produces the predictions. Consequently, ERT must be able to read and write data in a format compatible with the forward model.
Data managed by ERT are organized into distinct data types, each of which will be detailed in this chapter. The data types in ERT can be classified based on two main criteria:
Dynamic behaviour: whether the data type is a static input to the simulator, such as porosity or permeability, or an output of the simulation.
Implementation: this includes the type of files it can read and write, how it is configured, and so forth.
Note: All data types share a common namespace, meaning that each keyword must be globally unique.
Scalar parameters with a template: GEN_KW¶
This section describes the distributions built into ERT that can be used as priors.
For detailed description and examples on how to use the GEN_KW keyword, see here.
The algorithms used for updating parameters expect normally distributed variables. ERT supports other types of distributions by transforming normal variables as outlined next.
ERT samples a random variable
x ~ N(0,1)- before outputing to the forward model this is transformed toy ~ F(Y)where the distributionF(Y)is the correct prior distribution.When the prior simulations are complete ERT calculates misfits between simulated and observed values and updates the parameters; hence the variables
xnow represent samples from a posterior distribution which is Normal with mean and standard deviation different from (0,1).
The transformation prescribed by F(y) still “works” - but it no longer maps
to a distribution in the same family as initially specified by the prior. A
consequence of this is that the update process can not give you a posterior
with updated parameters in the same distribution family as the Prior.
Reproducibility¶
When ERT samples values there is a seed for each parameter. This means that if ERT is started with a fixed RANDOM_SEED each prior that is sampled will be identical. When running without a random seed ERT will output which random seed was used, so it is possible to reproduce results as long as that is kept.
- This section only applies if a fixed seed is used:
If the ensemble size is increased from N -> N+1 the N first realizations will be identical to before
Parameter order is irrelevant
Parameter names are case sensitive, PARAM:MY_PARAM != PARAM:myParam
NORMAL: Gaussian Distribution¶
The NORMAL keyword allows assigning a Gaussian (or normal) prior to a variable.
It requires two arguments: a mean value and a standard deviation.
Syntax¶
VAR NORMAL <mean_value> <standard_deviation>
Parameters¶
<mean_value>: The mean of the normal distribution.
<standard_deviation>: The standard deviation of the normal distribution.
Example¶
For a Gaussian distribution with mean 0 and standard deviation 1 assigned to the variable VAR:
VAR NORMAL 0 1
Notes¶
The NORMAL keyword is integral for scenarios demanding priors that reflect typical real-world data patterns, as the Gaussian distribution is prevalent in many natural phenomena.
LOGNORMAL: Log Normal Distribution¶
The LOGNORMAL keyword is used to assign a log normal prior to a variable. A variable is considered log normally distributed if the logarithm of that variable follows a normal distribution.
The logarithm in question is the natural logarithm. If \(X\) is normally distributed, then \(Y = e^X\) is log normally distributed.
Log normal priors are especially suitable for modeling positive values that exhibit a heavy tail, indicating a tendency for the quantity to occasionally take large values.
Syntax¶
VAR LOGNORMAL <log_mean> <log_standard_deviation>
Parameters¶
<log_mean>: The mean of the logarithm of the variable.
<log_standard_deviation>: The standard deviation of the logarithm of the variable.
Example¶
Histogram from values sampled from a lognormal variable specified with log-mean of 0 and log standard deviation 1.
VAR LOGNORMAL 0 1
TRUNCATED_NORMAL: Truncated Normal Distribution¶
The TRUNCATED_NORMAL keyword is utilized to assign a truncated normal distribution to a variable.
This distribution works as follows:
Draw a random variable \(X \sim N(\mu,\sigma)\).
Clamp \(X\) to the interval [min, max].
Syntax¶
VAR TRUNCATED_NORMAL <mean> <standard_deviation> <min> <max>
Parameters¶
<mean>: The mean of the normal distribution prior to truncation.
<standard_deviation>: The standard deviation of the distribution before truncation.
<min>: The lower truncation limit.
<max>: The upper truncation limit.
Example¶
VAR TRUNCATED_NORMAL 2 0.7 0 4
UNIFORM: Uniform Distribution¶
The UNIFORM keyword is used to assign a uniform distribution to a variable.
A variable is considered uniformly distributed when it has a constant probability density over a closed interval.
Thus, the uniform distribution is fully characterized by it’s minimum and maximum values.
Syntax¶
VAR UNIFORM <min_value> <max_value>
Parameters¶
<min_value>: The lower bound of the uniform distribution.
<max_value>: The upper bound of the uniform distribution.
Example¶
To assign a uniform distribution spanning between 0 and 1 to a variable named VAR:
VAR UNIFORM 0 1
Notes¶
It can be shown that among all distributions bounded below by \(a\) and above by \(b\), the uniform distribution with parameters \(a\) and \(b\) has the maximal entropy (contains the least information).
LOGUNIF: Log Uniform Distribution¶
The LOGUNIF keyword is used to assign a log uniform distribution to a variable.
A variable is said to be log uniformly distributed when its logarithm displays a uniform distribution over a specified interval, [a, b].
Syntax¶
VAR LOGUNIF <min_value> <max_value>
Parameters¶
<min_value>: The lower bound of the log uniform distribution.
<max_value>: The upper bound of the log uniform distribution.
Example¶
To assign a log uniform distribution ranging from 0.00001 to 1 to a variable:
VAR LOGUNIF 0.00001 1
Notes¶
The log uniform dstribution is useful when modeling positve variables that are heavily skewed towards a boundary.
CONST: Dirac Delta Distribution¶
The CONST keyword ensures that a variable always takes a specific, unchanging value.
Syntax¶
VAR CONST <value>
Parameters¶
<value>: The fixed value to be assigned to the variable.
Example¶
To assign a value of 1.0 to a variable:
VAR CONST 1.0
DUNIF: Discrete Uniform Distribution¶
The DUNIF keyword assigns a discrete uniform distribution to a variable over a specified range and number of bins.
Syntax¶
VAR DUNIF <nbins> <min_value> <max_value>
Parameters¶
<nbins>: Number of discrete bins or possible values.
<min_value>: The minimum value in the range.
<max_value>: The maximum value in the range.
Example¶
To create a discrete uniform distribution with possible values of 1, 2, 3, 4, and 5:
VAR DUNIF 5 1 5
Notes¶
Values are derived based on the formula: \(\text{min} + i \times (\text{max} - \text{min}) / (\text{nbins} - 1)\) Where \(i\) ranges from 0 to \(\text{nbins} - 1\).
ERRF: Error Function-Based Prior¶
The ERRF keyword allows creating prior distributions derived from applying the normal CDF (involving the error function) to a standard normal variable.
Note that the CDF is not necessarily the standard normal, as SKEWNESS and WIDTH corresponds to its negative mean and standard deviation respectively.
This allows flexibility in creating distributions of diverse shapes and symmetries.
Syntax¶
VAR8 ERRF MIN MAX SKEWNESS WIDTH
Parameters¶
MIN: The minimum value of the transform.
MAX: The maximum value of the transform.
SKEWNESS: The asymmetry of the distribution.
SKEWNESS < 0: Shifts the distribution towards the left.SKEWNESS = 0: Results in a symmetric distribution.SKEWNESS > 0: Shifts the distribution towards the right.
WIDTH: The peakedness of the distribution.
WIDTH = 1: Generates a uniform distribution.WIDTH > 1: Creates a unimodal, peaked distribution.WIDTH < 1: Forms a bimodal distribution with peaks.
Examples¶
For a symmetric, uniform distribution:
VAR ERRF -1 1 0 1
For a right-skewed, unimodal distribution:
VAR ERRF -1 1 2 1.5
Notes¶
Keep in mind the interactions between the parameters, especially when both SKEWNESS and WIDTH are adjusted.
Their combination can result in a wide range of distribution shapes.
DERRF: Discrete Error Function-Based Distribution¶
The DERRF keyword is a discrete version of the ERRF keyword.
It is designed for creating distributions based on the error function but with discrete output values.
This keyword facilitates sampling from discrete distributions with various shapes and asymmetries.
Syntax¶
VAR DERRF NBINS MIN MAX SKEWNESS WIDTH
Parameters¶
NBINS: The number of discrete bins or possible values.
MIN: The minimum value of the distribution.
MAX: The maximum value of the distribution.
SKEWNESS: The asymmetry of the distribution.
SKEWNESS < 0: Shifts the distribution towards the left.SKEWNESS = 0: Produces a symmetric distribution.SKEWNESS > 0: Shifts the distribution towards the right.
WIDTH: The shape of the distribution.
WIDTH close to zero, for exampe 0.01: Generates a uniform distribution.WIDTH > 1: Leads to a unimodal, peaked distribution.WIDTH < 1: Forms a bimodal distribution with peaks.
Examples¶
For a discrete symmetric, uniform distribution with five bins:
VAR_DERRF1 DERRF 5 -1 1 0 1
For a discrete right-skewed, unimodal distribution with five bins:
VAR_DERRF2 DERRF 5 -1 1 2 1.5
TRIANGULAR: Triangular Distribution¶
The TRIANGULAR keyword is used to define a triangular distribution, which is shaped as a triangle and is determined by three parameters: minimum, mode (peak), and maximum.
Syntax¶
VAR TRIANGULAR MIN MODE MAX
Parameters¶
XMIN: The minimum value of the distribution.
XMODE: The location (value) where the distribution reaches its maximum (or peak).
XMAX: The maximum value of the distribution.
Description¶
The triangular distribution is a continuous probability distribution with a probability density function
that is zero outside the interval [XMIN, XMAX], and is linearly increasing from XMIN to XMODE and decreasing from XMODE to XMAX.
Example¶
To define a triangular distribution with a minimum of 1, mode (peak) of 3, and maximum of 5:
VAR_TRIANGULAR TRIANGULAR 1 3 5
Loading GEN_KW values from an external file¶
The default use of the GEN_KW keyword is to let ert sample
random values for the elements in the GEN_KW instance, but it is also possible
to tell ert to load a precreated set of data files. This can for instance be
used as a component in an experimental design based workflow. When using external
files to initialize the GEN_KW instances you supply an extra keyword
INIT_FILE:/path/to/priors/files%d which tells where the prior files are:
GEN_KW MY-FAULTS MULTFLT.tmpl MULTFLT.INC MULTFLT.txt INIT_FILES:priors/multflt/faults%d
In the example above you must prepare files priors/multflt/faults0, priors/multflt/faults1, … priors/multflt/faultsn which ert will load when you initialize the ensemble. The format of the GEN_KW input files can be of two varieties:
The files can be plain ASCII text files with a list of numbers:
1.25
2.67
The numbers will be assigned to parameters in the order found in the MULTFLT.txt file.
Alternatively values and keywords can be interleaved as in:
FAULT1 1.25
FAULT2 2.56
in this case the ordering can differ in the init files and the parameter file.
The heritage of the ERT program is based on the EnKF algorithm, and the EnKF algorithm evolves around Gaussian variables - internally the GEN_KW variables are assumed to be samples from the N(0,1) distribution, and the distributions specified in the parameters file are based on transformations starting with a N(0,1) distributed variable. The slightly awkward consequence of this is that to let your sampled values pass through ERT unmodified you must configure the distribution NORMAL 0 1 in the parameter file; alternatively if you do not intend to update the GEN_KW variable you can use the distribution RAW.
3D field parameters: FIELD¶
The FIELD keyword is used to parametrize quantities that span the entire grid,
with porosity and permeability being the most common examples.
For detailed description and examples see here.
2D Surface parameters: SURFACE¶
The SURFACE keyword can be used to work with surface from RMS in the irap format. For detailed description and examples see here.
Simulated data¶
The datatypes in the Simulated data chapter correspond to datatypes which are used to load results from a forward model simulation and into ERT. In a model updating workflow instances of these datatypes are compared with observed values and that is used as basis for the update process. Also post processing tasks like plotting and QC is typically based on these data types.
Summary: SUMMARY¶
The SUMMARY keyword is used to configure which summary vectors you want to
load from the (Eclipse) reservoir simulation. In its simplest form, the
SUMMARY keyword just lists the vectors you wish to load. You can have
multiple SUMMARY keywords in your config file, and each keyword can mention
multiple vectors:
SUMMARY WWCT:OP_1 WWCT:OP_2 WWCT:OP_3
SUMMARY FOPT FOPR FWPR
SUMMARY GGPR:NORTH GOPR:SOUTH
If you in the observation use the SUMMARY_OBSERVATION or
HISTORY_OBSERVATION keyword to compare simulations and observations for a
particular summary vector you need to add this vector after SUMMARY in the ERT
configuration to have it plotted.
You can use wildcard notation to all summary vectors matching a pattern, i.e. this:
SUMMARY WWCT*:* WGOR*:*
SUMMARY F*
SUMMARY G*:NORTH
will load the WWCT and WWCTH, as well as WGOR and WGORH vectors
for all wells, all field related vectors and all group vectors from the NORTH
group.
General data: GEN_DATA¶
The GEN_DATA keyword is used to load text files which have been generated
by the forward model.
For detailed description and examples see here.
EnKF heritage¶
With regards to the datatypes in ERT this is a part of the application where the EnKF heritage shows through quite clearly, the datetypes offered by ERT would probably be different if ERT was made for Ensemble Smoother from the outset. Pecularites of EnKF heritage include:
The FIELD implementation can behave both as a dynamic quantity, i.e. pressure and saturation, and static property like porosity. In ERT it is currently only used as a parameter.
The parameter types have an internal pseudo time dependence corresponding to the “update time” induced by the EnKF scheme. This pseudo time dependence is not directly exposed to the user, but it is still part of the implementation and e.g. when writing plugins which work with parameter data managed by ERT you must relate to it.
The time dependence of the GEN_DATA implementation. This is just too complex, there have been numerous problems with people who configure the GEN_DATA keywords incorrectly.