ERT Storage Server

The ERT Storage Server API has two implementations. The centralised server, backed by PostgreSQL (and optionally Azure Blob Storage), is implemented by ERT Storage Server, while the Block Storage-based [3] implementation lives in the ERT repository as “Dark Storage” [4].

Entities highlighted in green represent concepts that are part of the public API. That is, they’re not just implementation details but concepts that we expose to the user.

Experiment

A specific ERT configuration, reservoir model and observations.

Ensemble

A single ERT ensemble evaluation.

Observation

A named group of observations.

Indirectly connected to ‘response’ records by name.

Prior

The statistical distribution used to generate the parameters of a given ensemble. The function field is effectively an enum representing the ERT statistical function used, e.g. uniform, std_norm, etc.

ObservationTransformation

ERT might have a workflow that scales observations or deactivates them between ensemble evaluations in an iterative ensemble smoother run.

Update

The relationship between ensembles can be described with a directed acyclic graph (DAG). In graph-theory terms, Ensembles are the nodes and Updates are the edges.

Record

A piece of data generated by either a forward-model or a workflow.

Indirectly connected to an observation by name.

RecordInfo

A utility for keeping track of record meta information. It helps when, e.g., validating that all records with the same name have the same “structure” (i.e. numerical vs blob data) across realisations: if realisation 1 of “FOPR” is numeric, then all others are required to be numeric.

F64Matrix

Container for numeric record data.

File

Container for blob record data. The “Azure Blob Storage” fields are used when running in conjunction with an Azure Blob Storage instance rather than storing the data in the PostgreSQL database.

FileBlock

Utility entity for supporting Azure Blob Storage chunked blob uploads.
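
The Update entry above describes ensembles and updates as the nodes and edges of a DAG. As a minimal sketch, assuming a hypothetical Update record with parent_ensemble and child_ensemble fields (the real column names may differ), Python's standard-library graphlib can order ensembles so that parents come first and reject any set of updates that forms a cycle:

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Update:
    # Hypothetical field names: the ensemble whose data fed the update
    # step, and the ensemble that the update step produced.
    parent_ensemble: str
    child_ensemble: str


def evaluation_order(ensembles: list[str], updates: list[Update]) -> list[str]:
    """Order ensembles so that every parent precedes its children.

    Raises graphlib.CycleError (a ValueError subclass) if the updates
    do not form a directed acyclic graph.
    """
    predecessors = {name: set() for name in ensembles}
    for update in updates:
        predecessors.setdefault(update.child_ensemble, set()).add(update.parent_ensemble)
    return list(TopologicalSorter(predecessors).static_order())


def ancestors(ensemble: str, updates: list[Update]) -> set[str]:
    """Every ensemble that `ensemble` was (transitively) updated from."""
    parents: dict[str, list[str]] = {}
    for update in updates:
        parents.setdefault(update.child_ensemble, []).append(update.parent_ensemble)
    seen, stack = set(), [ensemble]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# An iterative smoother run: prior -> iter-1 -> iter-2.
updates = [Update("prior", "iter-1"), Update("iter-1", "iter-2")]
order = evaluation_order(["prior", "iter-1", "iter-2"], updates)  # ['prior', 'iter-1', 'iter-2']
```

Using a DAG rather than a simple chain leaves room for several ensembles branching from the same parent, for example when a new run starts from a previous ensemble under a different experiment.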

Experiment

An experiment represents a single ERT configuration tied together with its input data (i.e. reservoir model, observations, etc.). At the time of writing, only observations are saved.

This table also has the required [1] and unique [2] field “name”, which is used by ERT 3. [Note: This field should be removed. There can be many experiments, and forcing users to come up with new names is bad UX. This value isn’t used in webviz-ert.]

Observation

An observation represents a real-world sample that has a value and an error. In ERT, observations are grouped into logical units and given names. For example, this could be “FOPR” (field oil production rate), which is a collection of rate values and errors sampled at different times. Thus, confusingly, each instance of the Observation entity in ERT Storage is actually a collection of observations under a common name. See issue #68 for a discussion of this wording.

Records of class ‘response’ can optionally be related to this entity, in which case it is possible to query those records and get data that is useful in data assimilation algorithms.

[Note: This ought to become a record that is attached to experiments.]
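
To illustrate the name-based link between an Observation group and ‘response’ records, here is a minimal sketch. The field names (x_axis, values, errors), the pair_with_response helper and the representation of responses as a mapping from record name to {label: simulated value} are all assumptions for illustration, not ERT Storage’s schema:

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One named group of observations, e.g. "FOPR" sampled at several dates."""
    name: str
    x_axis: list[str]     # hypothetical: one label (e.g. an ISO 8601 date) per sample
    values: list[float]
    errors: list[float]


def pair_with_response(
    obs: Observation, responses: dict[str, dict[str, float]]
) -> list[tuple[str, float, float, float]]:
    """Join an observation group with the response record of the same name.

    `responses` maps record name -> {label: simulated value}. The two are
    linked by name only. Returns (label, observed, error, simulated) for
    every observed label that the response also contains.
    """
    response = responses.get(obs.name, {})
    return [
        (label, value, error, response[label])
        for label, value, error in zip(obs.x_axis, obs.values, obs.errors)
        if label in response
    ]


fopr = Observation("FOPR", ["2020-01-01", "2020-02-01"], [10.0, 12.0], [1.0, 1.5])
responses = {"FOPR": {"2020-01-01": 9.5, "2020-02-01": 12.4, "2020-03-01": 13.0}}
pairs = pair_with_response(fopr, responses)
```

Simulated values with no matching observation (here 2020-03-01) are dropped, since a data assimilation step only needs the observed points together with their errors.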

Ensemble

An ensemble represents a single full ensemble evaluation. This object is created when the user presses the “Evaluate” button, and the evaluation it represents ends once all of the ensemble realisations have either completed or failed. The object is mutable while the evaluation is running and immutable after it has completed. [Note: There is currently no way to inform the storage that an ensemble has completed.]

The ensemble object is meant to be created as late as possible: after the user decides to evaluate, and not before. Creating an ensemble ahead of time is an anti-pattern and should be avoided, for two reasons. First, ERT Storage has an immutability requirement, and creating the object too early might force us to allow it to be modified. Second, the object’s lifetime becomes unpredictable: if the user can create the ensemble a week ahead of time, for example, it becomes impossible for us to determine whether we can automatically remove the object.
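
The intended “mutable while running, immutable once completed” lifecycle can be sketched as a two-state machine. The Ensemble class, EnsembleState enum and complete() method below are hypothetical; in particular, the note above says ERT Storage has no way to mark an ensemble as completed yet:

```python
import enum
from typing import Optional


class EnsembleState(enum.Enum):
    RUNNING = "running"
    COMPLETED = "completed"


class Ensemble:
    """Accepts records while running; rejects all changes once completed."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.state = EnsembleState.RUNNING
        self._records: dict[tuple[str, Optional[int]], object] = {}

    def add_record(self, name: str, data: object, realization_index: Optional[int] = None) -> None:
        if self.state is EnsembleState.COMPLETED:
            raise PermissionError(f"cannot add {name!r}: ensemble is completed and immutable")
        self._records[(name, realization_index)] = data

    def complete(self) -> None:
        # Hypothetical: no such completion call exists in ERT Storage yet.
        self.state = EnsembleState.COMPLETED


ensemble = Ensemble(size=2)
ensemble.add_record("FOPR", [1.0, 2.0], realization_index=0)
ensemble.complete()
# Any further ensemble.add_record(...) now raises PermissionError.
```

Because the object only ever moves forward from RUNNING to COMPLETED, creating it at the moment evaluation starts means the mutable window matches the actual evaluation time.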

It is assumed that ERT has constructed the job graph (i.e. which forward-models/workflows are to be run, and in which order) before creating the Ensemble entity, and that it is therefore possible to specify which Records will be of class ‘parameter’ or ‘response’ (i.e. the inputs and outputs of the ensemble evaluation).

It is also assumed that the values of all parameters are known at this point; however, to keep the program simple, they are uploaded after the Ensemble entity is created. The alternative would be to submit this data together with JSON data such as the ensemble size, but this would risk the uploads becoming too large and complicated.
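
A minimal sketch of this two-step pattern, assuming hypothetical JSON keys ("size", "parameter_names", "response_names") and hypothetical helper names: the creation request carries only the small description of the ensemble, and the parameter values follow as separate uploads, one per parameter name:

```python
import json


def ensemble_creation_body(size: int, parameter_names: list[str], response_names: list[str]) -> str:
    """Small JSON body announcing the ensemble; parameter *values* are left out."""
    return json.dumps({
        "size": size,
        "parameter_names": parameter_names,
        "response_names": response_names,
    })


def parameter_uploads(parameters: dict[str, list[float]], size: int) -> list[tuple[str, list[float]]]:
    """Split parameter values into one follow-up upload per parameter name.

    Each parameter must provide exactly one value per realisation.
    """
    uploads = []
    for name, values in parameters.items():
        if len(values) != size:
            raise ValueError(f"{name!r} has {len(values)} values, expected {size}")
        uploads.append((name, values))
    return uploads


body = ensemble_creation_body(3, ["foo"], ["FOPR"])
uploads = parameter_uploads({"foo": [0.5, 1.0, 1.5]}, size=3)
```

Keeping the creation body small means its size does not grow with the ensemble, while each parameter upload can be retried independently.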

Iterative runs that evaluate multiple ensembles will create multiple ensemble objects that all have the same experiment object as parent. It is also possible to have a sequence of connected ensembles with different parent experiments. This would happen when the user runs an iterative ensemble smoother, changes some experiment inputs (e.g. observations) and starts a new ensemble based on data from a previous ensemble.

Record

A record represents a single piece of data. A record can be either “ensemble-wide”, in which case it is uniquely identified by its parent ensemble and a unique name (i.e. a key-value store), or “forward-model”, in which case it is identified by a realisation index in addition to the parent ensemble and name.
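
The two identity schemes can be captured in a single hashable key whose realisation index is optional. This is a sketch only; RecordKey and its field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordKey:
    """Identity of a record within the store (field names are hypothetical)."""
    ensemble_id: str
    name: str
    # None for an "ensemble-wide" record; an index for a "forward-model" record.
    realization_index: Optional[int] = None

    @property
    def is_ensemble_wide(self) -> bool:
        return self.realization_index is None


store: dict[RecordKey, object] = {}
store[RecordKey("ens-1", "report")] = b"raw bytes"              # ensemble-wide
store[RecordKey("ens-1", "FOPR", realization_index=0)] = [1.0]  # forward-model, realisation 0
store[RecordKey("ens-1", "FOPR", realization_index=1)] = [2.0]  # same name, different realisation
```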

Additionally, each record has a “type” (‘numeric’ or ‘blob’) and a “class” (‘none’, ‘parameter’ or ‘response’). A blob record [referred to as ``file`` in the code] contains binary data and is equivalent to an ERT3 BlobRecord, while a numeric record [referred to as ``f64_matrix`` in the code] is similar, but not equivalent, to an ERT3 NumericalRecord.
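
The RecordInfo entity exists to keep a record name’s type consistent across realisations. A minimal sketch of that check, reusing the ``f64_matrix``/``file`` names from the code; the RecordInfoRegistry class, its register() method and the choice to scope names per ensemble are assumptions:

```python
import enum


class RecordType(enum.Enum):
    F64_MATRIX = "f64_matrix"  # numeric record
    FILE = "file"              # blob record


class RecordInfoRegistry:
    """Remembers the type first seen for each (ensemble, record name) pair."""

    def __init__(self) -> None:
        self._types: dict[tuple[str, str], RecordType] = {}

    def register(self, ensemble_id: str, name: str, realization_index: int, record_type: RecordType) -> None:
        # The first realisation to arrive fixes the type for that name.
        expected = self._types.setdefault((ensemble_id, name), record_type)
        if expected is not record_type:
            raise TypeError(
                f"{name!r} realisation {realization_index} is {record_type.value}, "
                f"but earlier realisations are {expected.value}"
            )


registry = RecordInfoRegistry()
registry.register("ens-1", "FOPR", 1, RecordType.F64_MATRIX)
registry.register("ens-1", "FOPR", 2, RecordType.F64_MATRIX)  # accepted
# registry.register("ens-1", "FOPR", 3, RecordType.FILE) would raise TypeError
```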

ERT Storage’s numeric records are essentially arbitrary-dimensional matrices with arbitrary labels on each dimension. The labels are strings and may represent integers, 3D spatial positions, ISO 8601 dates or anything else the user may want.
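
As an illustration of “arbitrary-dimensional matrix with string labels per dimension”, here is a minimal pure-Python sketch. The LabeledMatrix class, its row-major flat storage and the well names are assumptions, not the f64_matrix storage format:

```python
import math
from dataclasses import dataclass


@dataclass
class LabeledMatrix:
    """N-dimensional float data with one list of string labels per axis."""
    labels: list[list[str]]  # labels[axis][i] is the label of index i on that axis
    data: list[float]        # flattened in row-major order

    def __post_init__(self) -> None:
        expected = math.prod(len(axis) for axis in self.labels)
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} values, got {len(self.data)}")

    def get(self, *keys: str) -> float:
        """Look up one value using exactly one label per axis."""
        if len(keys) != len(self.labels):
            raise KeyError(f"need {len(self.labels)} labels, got {len(keys)}")
        offset = 0
        for axis_labels, key in zip(self.labels, keys):
            # list.index raises ValueError for an unknown label.
            offset = offset * len(axis_labels) + axis_labels.index(key)
        return self.data[offset]


# Rows labelled by ISO 8601 dates, columns by (hypothetical) well names.
rates = LabeledMatrix(
    labels=[["2020-01-01", "2020-02-01"], ["OP1", "OP2", "OP3"]],
    data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
)
value = rates.get("2020-02-01", "OP2")  # 5.0
```

Because labels are plain strings, the same structure holds dates, integer indices or encoded 3D positions without the storage needing to understand them.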

The record’s class distinguishes the intention behind its creation. If one thinks of a forward model as a function with input and output, then a ‘parameter’ record is the input, a ‘response’ record is the output, and ‘none’ is any additional data that is either used by other forward-models or not used by anything but of interest to the user. Because ‘parameter’ and ‘response’ records represent data that ERT has to interpret, they must be of type ‘numeric’; records with class ‘none’ are the only ones allowed to be of type ‘blob’. A record’s class is determined by the lists of names given to the Ensemble entity. That is, if the ensemble’s parameter_names contains the name "foo", then any record with that same name is assumed to be of class ‘parameter’.
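
The two rules in this paragraph (class inferred from the ensemble’s name lists, and only class ‘none’ may be ‘blob’) fit in a few lines. This sketch assumes a response_names list alongside parameter_names; that field and the helper names are hypothetical:

```python
import enum


class RecordClass(enum.Enum):
    NONE = "none"
    PARAMETER = "parameter"
    RESPONSE = "response"


def record_class(name: str, parameter_names: set[str], response_names: set[str]) -> RecordClass:
    """Infer a record's class purely from the name lists given to the ensemble."""
    if name in parameter_names:
        return RecordClass.PARAMETER
    if name in response_names:
        return RecordClass.RESPONSE
    return RecordClass.NONE


def validate_record(name: str, record_type: str, parameter_names: set[str], response_names: set[str]) -> RecordClass:
    """Enforce that only class 'none' records may hold 'blob' data."""
    cls = record_class(name, parameter_names, response_names)
    if cls is not RecordClass.NONE and record_type != "numeric":
        raise TypeError(f"{cls.value} record {name!r} must be numeric, got {record_type!r}")
    return cls


params, responses = {"foo"}, {"FOPR"}
foo_class = validate_record("foo", "numeric", params, responses)   # PARAMETER
log_class = validate_record("run.log", "blob", params, responses)  # NONE
```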

It is sometimes possible to access a given record either by a specific realization_index or by fetching all realisations at once. This distinction is made with “ensemble-wide” vs “forward-model” access. The current (buggy) implementation supports this for ‘parameter’ records. One can either POST parameters with a separate request for each realization_index, or POST a table where each column is a realization_index. Likewise, parameters can be accessed either by specifying each realization_index or by fetching all realisations at once, regardless of how the data was submitted. However, there are major issues with the design of this feature, which are discussed in issue #128.
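
Serving both access styles regardless of how parameters were submitted amounts to transposing between a table (one column per realization_index) and per-realisation vectors. A minimal sketch, assuming that each table row holds one element of the parameter; split_table and join_table are hypothetical names, not ERT Storage functions:

```python
def split_table(rows: list[list[float]]) -> dict[int, list[float]]:
    """Table with one column per realization_index -> per-realisation vectors."""
    if not rows:
        return {}
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have one value per realisation")
    return {index: [row[index] for row in rows] for index in range(width)}


def join_table(vectors: dict[int, list[float]]) -> list[list[float]]:
    """Per-realisation vectors -> table where column i is realization_index i."""
    indices = sorted(vectors)
    if indices != list(range(len(indices))):
        raise ValueError(f"realisations must be numbered 0..n-1, got {indices}")
    columns = [vectors[index] for index in indices]
    if len({len(column) for column in columns}) > 1:
        raise ValueError("all realisations must have the same length")
    return [list(row) for row in zip(*columns)]


# Two parameter elements (rows) for three realisations (columns).
table = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
per_realisation = split_table(table)  # {0: [1.0, 4.0], 1: [2.0, 5.0], 2: [3.0, 6.0]}
```

The round trip only works when every realisation is present and has the same length, which is one place where mixing the two submission styles can go wrong.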