Further discussion led us to finalize the headers above for the calculations table. To repeat them here, the final headers are:
- `calculationID`: varchar - mandatory - the primary key (PK) for this table, a composite key (CK) of the container ID and step ID (now `pipelineID` and `treatmentID`). Formerly `provisional PK (CK)`.
- `pipelineID`: varchar - mandatory - the shorthand for the data transformation pipeline used; links to a single measure (the same ID may be used multiple times). Formerly `containerID`, a data pipeline identifier.
- `treatmentID`: varchar - mandatory - an identifier for a single calculation/data treatment. Formerly `stepID`.
- `name`: varchar - optional - free-text, human-readable name of a single calculation/data treatment.
- `summary`: varchar - optional - free-text, human-readable summary of a single calculation/data treatment. This should also explain the terms used in the `equation` field.
- `calcType`: categorical - recommended - explains the purpose/nature of a single calculation/data treatment. Possible inputs are `normalization`, `standardization`, `smoothing`, and `predictiveModelling`. Formerly `purpose`.
- `standard`: varchar - mandatoryIf - records categorically the standard to which a value is standardized (e.g. PMMoV, Crassphage, Flow) or the method by which it is smoothed (e.g. Bayesian smoothing, central average smoothing, 7-days, time).
- `order`: integer - recommended - the position a single calculation/data treatment takes within the full container pipeline/workflow.
- `equation`: varchar - recommended - the equation used in the calculation/data treatment.
- `refLink`: varchar - recommended - a reference link for the calculation/data treatment.
- `sourceCode`: varchar - recommended - the source code for the calculation/data treatment, most applicable to algorithms and more complex steps. The full code can be recorded as text, or a URL to where the code is stored can be recorded instead (likely a different URL than the one in the `refLink` field).
- `lastEditted`: dateTime - optional - when the row was last updated/changed.
- `notes`: varchar - optional - free text; notes on the row and the calculation/data treatment.
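As a rough illustration of the headers above, one row of the calculations table could be modelled as below. This is a minimal sketch, not part of ODM: the class `CalculationRow`, the helpers `make_calculation_id` and `pipeline_steps`, the `"_"` separator in the composite key, and the use of `None` for unreported fields are all assumptions. It also shows how `order` could be used to put the treatments of one pipeline back in sequence.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def make_calculation_id(pipeline_id: str, treatment_id: str, sep: str = "_") -> str:
    """Hypothetical composite key: pipelineID and treatmentID joined by `sep`."""
    return f"{pipeline_id}{sep}{treatment_id}"


@dataclass
class CalculationRow:
    # Mandatory fields
    pipelineID: str
    treatmentID: str
    # Optional and recommended fields; None means "not reported" (assumption)
    name: Optional[str] = None
    summary: Optional[str] = None
    calcType: Optional[str] = None
    standard: Optional[str] = None
    order: Optional[int] = None
    equation: Optional[str] = None
    refLink: Optional[str] = None
    sourceCode: Optional[str] = None
    lastEditted: Optional[datetime] = None
    notes: Optional[str] = None

    @property
    def calculationID(self) -> str:
        # Derived from the two mandatory IDs so the key cannot drift out of sync.
        return make_calculation_id(self.pipelineID, self.treatmentID)


def pipeline_steps(rows: list[CalculationRow], pipeline_id: str) -> list[CalculationRow]:
    """Return the treatments of one pipeline sorted by `order`; unordered rows go last."""
    steps = [r for r in rows if r.pipelineID == pipeline_id]
    return sorted(steps, key=lambda r: (r.order is None, r.order or 0))


# Usage: two treatments in one pipeline, entered out of order.
rows = [
    CalculationRow("pipeA", "smooth7d", calcType="smoothing", standard="7-days", order=2),
    CalculationRow("pipeA", "pmmovStd", calcType="standardization", standard="PMMoV", order=1),
    CalculationRow("pipeB", "rawCopy", order=1),
]
print([r.calculationID for r in pipeline_steps(rows, "pipeA")])
# ['pipeA_pmmovStd', 'pipeA_smooth7d']
```

Deriving `calculationID` from the two mandatory IDs, rather than storing it separately, is one way to keep the CK consistent; a real implementation might instead store it and check it.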
There was some discussion around the `standard` field and its naming. Because `value` has a rather specific meaning in the measures and methods context in ODM, it was decided that using it here would be inconsistent. Given that this field will (we anticipate) mostly be used for reporting standardization details, we opted for the name `standard`.
Additional ODM infrastructure to be added, beyond this new table and its fields:
- A category set for the `calcType` field, with the options `normalization`, `standardization`, `smoothing`, and `predictiveModels`.
- Conditional sets for the `standard` field, depending on the input value for `calcType`.
- A new field in the `measures` table called `valTreat` ("value treatment"), which specifies the kinds of data treatment applied to, or the nature of, the measurement value.
- A category set for the `valTreat` field, with the options `raw`, `derived`, `estimate`, and `predicted`.
- Adding `calculationID` as a FK to the `measures` table.
- Adding more generic units to the ODM so that unit/aggregation metadata and standardization metadata can begin to be stored separately, e.g. PMMoV standardized mean → standardized mean + `calcType` of standardization + `standard` of PMMoV.
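The category sets, conditional sets, and FK listed above could be checked with simple lookups, sketched below under several assumptions. The `calcType` and `valTreat` options are taken from the bullets; the members of the conditional `standard` sets are only the examples given for the `standard` field, not an agreed vocabulary; `mandatoryIf` is assumed to mean "required whenever a conditional set exists for the chosen `calcType`"; and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Category set for calcType (options as listed in the bullets).
CALC_TYPES = {"normalization", "standardization", "smoothing", "predictiveModels"}

# Category set for the proposed valTreat field in the measures table.
VAL_TREATS = {"raw", "derived", "estimate", "predicted"}

# Hypothetical conditional sets for `standard`, keyed by calcType. The members
# are only the examples given for the `standard` field, not an agreed vocabulary.
STANDARD_SETS = {
    "standardization": {"PMMoV", "Crassphage", "Flow"},
    "smoothing": {"bayesian smoothing", "central average smoothing", "7-days", "time"},
}


def validate_calculation(calc_type: Optional[str], standard: Optional[str]) -> list[str]:
    """Check a calculations row's calcType/standard pair; return a list of problems."""
    if calc_type is None:
        return []  # calcType is only recommended, so it may be absent
    if calc_type not in CALC_TYPES:
        return [f"unknown calcType: {calc_type!r}"]
    allowed = STANDARD_SETS.get(calc_type)
    if allowed is None:
        return []  # no conditional set defined for this calcType (assumption)
    if standard is None:
        # Assumed reading of "mandatoryIf": required when a conditional set exists.
        return [f"standard is required when calcType is {calc_type!r}"]
    if standard not in allowed:
        return [f"standard {standard!r} is not allowed for calcType {calc_type!r}"]
    return []


def validate_measure(val_treat: str, calculation_id: Optional[str],
                     known_calculation_ids: set[str]) -> list[str]:
    """Check a measures row's valTreat value and its calculationID foreign key."""
    problems = []
    if val_treat not in VAL_TREATS:
        problems.append(f"unknown valTreat: {val_treat!r}")
    if calculation_id is not None and calculation_id not in known_calculation_ids:
        problems.append(f"calculationID {calculation_id!r} not found in calculations table")
    return problems


print(validate_calculation("standardization", "PMMoV"))  # []
print(validate_calculation("smoothing", None))
# ["standard is required when calcType is 'smoothing'"]
print(validate_measure("derived", "pipeA_pmmovStd", {"pipeA_pmmovStd"}))  # []
```

Returning a list of problems instead of raising lets a data-entry tool report every issue in a row at once.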
This last bullet feeds into the discussion about unit and aggregation metadata and charts a path toward eventually splitting them. We are aware that this may require some changes to the wide-name rules for measures; the calculations table may already require some additional attention on this point.
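To make the unit split concrete, the sketch below separates a combined unit such as "PMMoV standardized mean" into a generic unit plus calculation metadata. The lookup table, the generic unit strings, and the names `SplitUnit` and `split_unit` are hypothetical illustrations, not ODM vocabulary or wide-name rules.

```python
from typing import NamedTuple, Optional


class SplitUnit(NamedTuple):
    unit: str                # generic unit, e.g. "standardized mean"
    calcType: Optional[str]  # calculations-table category, if any
    standard: Optional[str]  # what the value was standardized or smoothed to


# Hypothetical lookup from a combined legacy unit to its split representation.
LEGACY_UNITS = {
    "pmmov standardized mean": SplitUnit("standardized mean", "standardization", "PMMoV"),
}


def split_unit(legacy_unit: str) -> SplitUnit:
    """Separate standardization metadata from a combined unit string."""
    key = legacy_unit.strip().lower()
    if key in LEGACY_UNITS:
        return LEGACY_UNITS[key]
    # Units with no data treatment baked in pass through unchanged.
    return SplitUnit(legacy_unit, None, None)


print(split_unit("PMMoV standardized mean"))
# SplitUnit(unit='standardized mean', calcType='standardization', standard='PMMoV')
```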