ERT Storage Server#
The ERT Storage Server API has two implementations. The centralised server, backed by PostgreSQL (and optionally Azure Blob Storage), is implemented by ERT Storage Server, while the Block Storage [3]-based one is implemented in the ERT repository as “Dark Storage” [4].
Entities highlighted in green represent concepts that are part of the public API. That is, they’re not just implementation details but concepts that we expose to the user.
- Experiment
A specific ERT configuration, reservoir model and observations.
- Ensemble
A single ERT ensemble evaluation.
- Observation
A named group of observations.
Indirectly connected to ‘response’ records by name.
- Prior
The statistical distribution used to generate the parameters of a given ensemble. The function field is really an enum representing the ERT statistical function used, e.g. uniform, std_norm, etc.
- ObservationTransformation
ERT might have a workflow that scales observations or deactivates them between ensemble evaluations in an iterative ensemble smoother run.
- Update
The relationship between ensembles can be described with a directed acyclic graph (DAG). In graph theory terms, Ensembles are the nodes and Updates are the edges.
- Record
A piece of data generated by either a forward-model or a workflow.
Indirectly connected to an observation by name.
- RecordInfo
A utility for keeping track of record meta information. It helps when, e.g., validating that all records with the same name have the same “structure” (i.e. numerical vs blob data) across realisations. If realisation 1 for “FOPR” is numeric, then all others are required to be numeric.
- F64Matrix
Container for numeric record data.
- File
Container for blob record data. The “Azure Blob Storage” fields are used when running in conjunction with an Azure Blob Storage instance rather than storing the data in the PostgreSQL database.
- FileBlock
Utility entity for supporting Azure Blob Storage chunked blob uploads.
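The Ensemble/Update relationship described above can be sketched as a small in-memory graph. This is only an illustration of the DAG idea, not the server's schema: the names `Update`, `EnsembleGraph`, `add_update` and `descendants`, and the use of integer ids, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    """Hypothetical edge: links a parent ensemble to the ensemble derived from it."""
    parent_id: int
    child_id: int


@dataclass
class EnsembleGraph:
    """Hypothetical in-memory DAG: ensembles are nodes, updates are edges."""
    ensemble_ids: set = field(default_factory=set)
    updates: list = field(default_factory=list)

    def add_ensemble(self, ensemble_id: int) -> None:
        self.ensemble_ids.add(ensemble_id)

    def add_update(self, parent_id: int, child_id: int) -> None:
        if parent_id not in self.ensemble_ids or child_id not in self.ensemble_ids:
            raise KeyError("both ensembles must exist before they are linked")
        # Reject self-loops and edges that would close a cycle.
        if parent_id == child_id or parent_id in self.descendants(child_id):
            raise ValueError("update would introduce a cycle")
        self.updates.append(Update(parent_id, child_id))

    def children(self, ensemble_id: int) -> list:
        return [u.child_id for u in self.updates if u.parent_id == ensemble_id]

    def descendants(self, ensemble_id: int) -> set:
        """All ensembles reachable from ensemble_id by following updates."""
        seen = set()
        stack = [ensemble_id]
        while stack:
            for child in self.children(stack.pop()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


# Usage: a three-step iterative run, 0 -> 1 -> 2.
graph = EnsembleGraph()
for ensemble_id in (0, 1, 2):
    graph.add_ensemble(ensemble_id)
graph.add_update(0, 1)
graph.add_update(1, 2)
```

Keeping the acyclicity check in `add_update` means the graph can never end up in a state where an ensemble is its own ancestor.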
Experiment#
An experiment represents the concept of a single ERT configuration tied together with the input data (ie, reservoir model, observations, etc). At the time of writing, only observations are saved.
This table also has the required [1] and unique [2] field “name”. This is used by ERT 3. [Note: This field should be removed. There can be many experiments and forcing users to come up with new names is bad UX. This value isn’t used in webviz-ert]
Observation#
An observation represents a real-world sample that has some value and an error.
In ERT, observations are grouped into logical units and given names. For
example, this could be “FOPR” – field oil production rate – which is a
collection of different rate values and errors sampled at different times. Thus,
confusingly, each instance of the Observation entity in ERT Storage is
actually a collection of observations under a common name. See issue #68 for
discussion on this wording.
Records of class ‘response’ can optionally be related to this entity, in
which case it’s possible to query those records and get data that is useful in
data assimilation algorithms.
[Note: This ought to become a record that is attached to experiments]
Ensemble#
An ensemble represents a single full ensemble evaluation. This object is created when the user presses the “Evaluate” button, and the evaluation ends once all of the ensemble realisations have either completed or failed. This object is mutable while the evaluation is running and immutable after it has completed. [Note: There is not yet any way to inform the storage that an ensemble has completed]
The ensemble object is meant to be created as late as possible: after the user decides to evaluate, and not before. That is, creating an ensemble ahead of time is an anti-pattern and to be avoided. This is due to ERT Storage’s immutability requirement (creating the object too early might require us to allow it to be modified) and due to unpredictable object lifetime. If the user can create the ensemble a week ahead of time, for example, then it’ll be impossible for us to determine whether we can automatically remove this object.
It is assumed that ERT has constructed the job graph (i.e. which
forward-models/workflows are to be run and in which order) ahead of creating the
Ensemble entity, and it’s therefore possible to specify which Records
will be of class ‘parameter’ or ‘response’ (i.e. the inputs and outputs of the
ensemble evaluation).
It is also assumed that the values of parameters are all known at this point.
However, to keep our program simple, these are uploaded after the Ensemble
entity is created. The alternative would be to submit this data together
with JSON data such as ensemble size, but this would risk the uploads being too
large and complicated.
Iterative runs that evaluate multiple ensembles will create multiple ensemble objects that all have the same experiment object as parent. It is also possible to have a sequence of connected ensembles with different parent experiments. This would happen when the user runs an iterative ensemble smoother, changes some experiment inputs (eg. observations) and starts a new ensemble based on data from a previous ensemble.
Record#
A record represents a single piece of data. A record can be “ensemble-wide”, in which case it is uniquely identified by its parent ensemble and a unique name (i.e. a key-value store), or “forward-model”, in which case it has a realisation index in addition to the parent ensemble and name.
Additionally, each record has a “type” (‘numeric’ or ‘blob’) and a “class”
(‘none’, ‘parameter’ or ‘response’). A blob record [referred to as ``file`` in
the code] contains binary data and is equivalent to an ERT3 BlobRecord,
while a numeric record [referred to as ``f64_matrix`` in the code] is similar
but not equivalent to an ERT3 NumericalRecord.
ERT Storage’s version of a numeric record is essentially an arbitrary-dimensional matrix with arbitrary labels on each dimension. The labels are strings and may represent integers, 3D space positions, ISO8601 dates or anything else the user may want.
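To make the idea of string labels on every dimension concrete, here is a minimal sketch. It is not the f64_matrix implementation: the class name `LabelledMatrix`, its fields, and the choice to store values in a dict keyed by label tuples are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class LabelledMatrix:
    """Hypothetical stand-in for a numeric record: values plus string labels per dimension."""
    labels: list  # one list of string labels per dimension
    values: dict  # maps a tuple of labels (one per dimension) to a float

    def __post_init__(self):
        # Every combination of labels must have a value, and nothing else may be present.
        expected = set(product(*self.labels))
        if set(self.values) != expected:
            raise ValueError("values must cover every combination of labels")

    @property
    def shape(self):
        return tuple(len(dimension) for dimension in self.labels)

    def at(self, *key):
        """Look up one value by giving one label per dimension."""
        return self.values[tuple(key)]


# Usage: a 2 x 3 matrix labelled by ISO8601 dates and by stringified integers.
dates = ["2020-01-01", "2020-02-01"]
indices = ["0", "1", "2"]
matrix = LabelledMatrix(
    labels=[dates, indices],
    values={(d, i): float(int(i)) for d in dates for i in indices},
)
```

Because labels are plain strings, the same container works whether a dimension represents dates, integer indices or positions; interpreting the labels is left to the caller.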
The record’s class distinguishes the intention behind its creation. If one
thinks of a forward model as a function with input and output, then a
‘parameter’ record is the input, ‘response’ is the output and ‘none’ is any
additional data that is used by other forward-models, or isn’t used by anything
else but is of interest to the user. Because ‘parameter’ and ‘response’
represent data that ERT has to interpret, records defined with these classes
must be of type ‘numeric’. Records with class ‘none’ are the only ones allowed
to be ‘blob’ type. A record’s class is determined by the list of names given to
the Ensemble entity. That is, if the ensemble’s parameter_names contains
the name "foo", then any record with that same name will be assumed to be a
‘parameter’ class.
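The class and type rules above can be sketched as two small functions. This is a hedged illustration, not the server's code: the text only names `parameter_names`, so `response_names`, the function names, and the choice to check parameters before responses are assumptions.

```python
NUMERIC = "numeric"
BLOB = "blob"


def record_class(name, parameter_names, response_names):
    """Derive a record's class from the name lists held by its ensemble."""
    if name in parameter_names:
        return "parameter"
    if name in response_names:  # response_names is an assumed counterpart field
        return "response"
    return "none"


def check_record(name, record_type, parameter_names, response_names):
    """Return the record's class, or raise if its type is not allowed for that class."""
    if record_type not in (NUMERIC, BLOB):
        raise ValueError(f"unknown record type: {record_type!r}")
    cls = record_class(name, parameter_names, response_names)
    # Only class 'none' may hold blob data; parameters and responses must be numeric.
    if cls != "none" and record_type != NUMERIC:
        raise ValueError(f"{cls!r} record {name!r} must be of type 'numeric'")
    return cls


# Usage: "foo" is listed as a parameter, "FOPR" as a response.
parameters = ["foo"]
responses = ["FOPR"]
foo_class = check_record("foo", NUMERIC, parameters, responses)
log_class = check_record("run.log", BLOB, parameters, responses)
```

Deriving the class from the ensemble's name lists, rather than storing it on each record, keeps the rule in one place.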
It’s sometimes possible to access a given record either with a given
realization_index or to get all of the realisations at once. This distinction
is made with “ensemble-wide” vs “forward-model” access. The currently
implemented, buggy access method works for ‘parameter’ records. One can
either POST parameters with a separate request for each realization_index, or
POST a table where each column is a realization_index. Likewise, it’s
possible to access parameters either by specifying each realization_index or
by fetching all realisations at once, regardless of how this data was
submitted. However, there are major issues in how this feature is designed,
which are discussed in issue #128.
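The two ways of submitting and reading parameters can be illustrated with a toy in-memory store. This sketch does not model the HTTP API or the server's actual (and, as noted, problematic) implementation; `ParameterStore` and its method names are hypothetical.

```python
class ParameterStore:
    """Hypothetical in-memory store keyed by (record name, realization_index)."""

    def __init__(self):
        self._data = {}

    def post_realisation(self, name, realization_index, values):
        """'Forward-model' style: submit one realisation at a time."""
        self._data[(name, realization_index)] = list(values)

    def post_table(self, name, table):
        """'Ensemble-wide' style: table maps each realization_index to its column."""
        for realization_index, column in table.items():
            self.post_realisation(name, realization_index, column)

    def get_realisation(self, name, realization_index):
        return self._data[(name, realization_index)]

    def get_all(self, name):
        """Fetch every realisation of a record, however it was submitted."""
        return {
            index: values
            for (record_name, index), values in sorted(self._data.items())
            if record_name == name
        }


# Usage: mix both submission styles, then read the data back either way.
store = ParameterStore()
store.post_table("foo", {0: [1.0, 2.0], 1: [3.0, 4.0]})
store.post_realisation("foo", 2, [5.0, 6.0])
everything = store.get_all("foo")
single = store.get_realisation("foo", 1)
```

Storing everything per realisation and assembling the table on read is one way to make the two access styles interchangeable; it is a design choice of this sketch only.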