Another discussion led us to finalize the above headers for the calculations table. To repeat them here, the final headers are:
- calculationID: varchar - mandatory - the PK for this table, a CK of container ID and step ID (now pipelineID and treatmentID). Formerly provisional PK (CK).
- pipelineID: varchar - mandatory - a data pipeline identifier; the shorthand for the data transformation pipeline used. It is linked to a single measure, but may be used multiple times. Formerly containerID.
- treatmentID: varchar - mandatory - an ID for a single calculation/data treatment; a single data treatment identifier. Formerly stepID.
- name: varchar - optional - free text, human-readable name of a single calculation/data treatment.
- summary: varchar - optional - free text, human-readable summary of a single calculation/data treatment. This should also explain terms used in the equation field.
- calcType: categorical - recommended - explains the purpose/nature of a single calculation/data treatment. Possible inputs are: normalization, standardization, smoothing, and predictiveModelling. Formerly purpose.
- standard: varchar - mandatoryIf - field where one can categorically record the standard to which something is being standardized (e.g. PMMoV, Crassphage, Flow) or smoothed (e.g. bayesian smoothing, central average smoothing, 7-days, time).
- order: integer - recommended - the order a single calculation/data treatment takes within the full container pipeline/workflow.
- equation: varchar - recommended - the equation used in the calculation/data treatment.
- reflink: varchar - recommended - a reference link to the calculation/data treatment.
- sourceCode: varchar - recommended - the source code for the calculation/data treatment, more applicable for algorithms and more complex steps. It is possible to record the full code as text, or to record a URL to where the code is stored. This is likely a different URL than the one in the refLink field.
- lastEditted: dateTime - optional - when the row was last updated/changed.
- notes: varchar - optional - free text notes on the row and the calculation/data treatment.
There was some discussion around the standard field and its naming, but it was decided that because value has a rather specific meaning in the measures and methods context in ODM, it would be inconsistent to use it here. Given that the field will mostly (we anticipate) be used for reporting standardization details, we opted for the name standard.
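To make the field list above concrete, here is a minimal Python sketch of a single calculations row. Several details are assumptions rather than anything decided in the discussion: the composite key is formed by joining pipelineID and treatmentID with an underscore, and the mandatoryIf rule for standard is taken to apply when calcType is standardization or smoothing. The class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Options for calcType as given in the field list above.
CALC_TYPES = {"normalization", "standardization", "smoothing", "predictiveModelling"}

# Assumption: `standard` becomes mandatory for these calcType values.
STANDARD_REQUIRED_FOR = {"standardization", "smoothing"}


@dataclass
class Calculation:
    """One row of the calculations table (hypothetical in-memory form)."""

    pipelineID: str
    treatmentID: str
    name: Optional[str] = None
    summary: Optional[str] = None
    calcType: Optional[str] = None
    standard: Optional[str] = None
    order: Optional[int] = None
    equation: Optional[str] = None
    reflink: Optional[str] = None
    sourceCode: Optional[str] = None
    lastEditted: Optional[str] = None
    notes: Optional[str] = None

    @property
    def calculationID(self) -> str:
        # Assumption: the composite key is the two IDs joined by "_".
        return f"{self.pipelineID}_{self.treatmentID}"

    def problems(self) -> List[str]:
        """Return a list of rule violations; an empty list means the row passes."""
        found = []
        if not self.pipelineID or not self.treatmentID:
            found.append("pipelineID and treatmentID are mandatory")
        if self.calcType is not None and self.calcType not in CALC_TYPES:
            found.append(f"unknown calcType: {self.calcType}")
        if self.calcType in STANDARD_REQUIRED_FOR and not self.standard:
            found.append("standard is required for this calcType")
        return found


step = Calculation(
    pipelineID="pipe1",
    treatmentID="step2",
    calcType="standardization",
    standard="PMMoV",
    order=2,
)
print(step.calculationID)
print(step.problems())
```

Deriving calculationID from the two IDs, instead of storing it separately, keeps the key and its components from drifting apart; a real implementation may well store it as its own column.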
Additional ODM infrastructure to be added, beyond this new table and its fields:
- A category set for the calcType field, with the options of normalization, standardization, smoothing, and predictiveModels.
- Conditional sets for the standard field, depending on what the input value for calcType is.
- A new field in the measures table called valTreat, or "value treatment", which specifies what kinds of data treatments were applied or the nature of the measurement value.
- A category set for the valTreat field, with the options of raw, derived, estimate, and predicted.
- Adding calculationID as a FK to the measures table.
- Adding more generic units to the ODM so that the unit/aggregation and standardization metadata can begin to be stored separately, e.g. pmmov standardized mean → standardized mean, plus a calcType of standardization and a standard of PMMoV.
This last bullet factors into the discussion about unit and aggregation metadata, and charts a path toward eventually splitting them. We are cognizant that this may require some changes to the wide names rules for measures. Calculations as a table may already require some additional attention on this point.
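The additions listed above could be checked along the following lines. This is a rough Python sketch under stated assumptions: the contents of the conditional sets are taken from the examples given for the standard field and are not a finalized list, the legacy-unit lookup holds only the one example from the last bullet, and all function and variable names are hypothetical.

```python
# Category set for valTreat, as listed above.
VAL_TREAT = {"raw", "derived", "estimate", "predicted"}

# Assumption: conditional sets for `standard`, keyed by calcType and
# filled with the example values mentioned for the standard field.
STANDARD_SETS = {
    "standardization": {"PMMoV", "Crassphage", "Flow"},
    "smoothing": {"bayesian smoothing", "central average smoothing", "7-days", "time"},
}

# Assumption: lookup from a combined legacy unit label to
# (generic unit, calcType, standard). Only the example above is included.
LEGACY_UNITS = {
    "pmmov standardized mean": ("standardized mean", "standardization", "PMMoV"),
}


def standard_allowed(calc_type, standard):
    """Check `standard` against the conditional set for `calc_type`."""
    allowed = STANDARD_SETS.get(calc_type)
    if allowed is None:
        return True  # no conditional set defined, so nothing to check
    return standard in allowed


def split_legacy_unit(label):
    """Return (unit, calcType, standard) for a known legacy label, else None."""
    return LEGACY_UNITS.get(label.strip().lower())


def measure_problems(measure, known_calculation_ids):
    """Check the valTreat value and calculationID FK on one measures row (a dict)."""
    found = []
    val_treat = measure.get("valTreat")
    if val_treat is not None and val_treat not in VAL_TREAT:
        found.append(f"unknown valTreat: {val_treat}")
    calc_id = measure.get("calculationID")
    if calc_id is not None and calc_id not in known_calculation_ids:
        found.append(f"calculationID not found in calculations table: {calc_id}")
    return found


print(split_legacy_unit("PMMoV standardized mean"))
print(standard_allowed("smoothing", "PMMoV"))
print(measure_problems({"valTreat": "derived", "calculationID": "pipe1_step2"}, {"pipe1_step2"}))
```

Keeping the unit split in a lookup table means existing combined units can be migrated one at a time as their generic equivalents are added.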