Calculations, Normalizations, and Math Protocols

A further discussion led us to finalize the headers above for the calculations table. To repeat them here, the final headers are:

  • calculationID: - varchar - mandatory - the PK for this table, a CK of pipelineID and treatmentID (formerly container ID and step ID). Formerly the provisional PK (CK).
  • pipelineID: - varchar - mandatory - the shorthand for the data transformation pipeline used, i.e. a data pipeline identifier. Linked to a single measure (may be used multiple times). Formerly containerID.
  • treatmentID: - varchar - mandatory - the identifier for a single calculation/data treatment. Formerly stepID.
  • name: - varchar - optional - free text human-readable name of a single calculation/data treatment.
  • summary: - varchar - optional - free text human-readable summary of a single calculation/data treatment. This should also explain terms used in the equation field.
  • calcType: - categorical - recommended - explains the purpose/nature of a single calculation/data treatment. Possible inputs are: normalization, standardization, smoothing, and predictiveModelling. Formerly purpose.
  • standard: - varchar - mandatoryIf - field where one can categorically record the standard to which something is being standardized (e.g. PMMoV, Crassphage, Flow) or the way in which it is smoothed (e.g. Bayesian smoothing, central average smoothing, 7-days, time).
  • order: - integer - recommended - the order a single calculation/data treatment takes within the full container pipeline/workflow.
  • equation: - varchar - recommended - the equation used in the calculation/data treatment.
  • refLink: - varchar - recommended - a reference link to the calculation/data treatment.
  • sourceCode: - varchar - recommended - the source code for the calculation/data treatment, more applicable for algorithms/more complex steps. It is possible either to record the full code as text or to record a URL to where the code is stored. This is likely a different URL than the one in the refLink field.
  • lastEditted: - dateTime - optional - when the row was last updated/changed.
  • notes: - varchar - optional - free text notes on the row and the calculation/data treatment.
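
To make the key structure concrete, here is a minimal Python sketch of a calculations row, under stated assumptions. It covers only a subset of the fields above. The underscore separator used to build calculationID, the helper pipeline_steps, and all example values (including the equation text) are hypothetical and were not part of the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Calculation:
    # Mandatory fields: together they form the composite key.
    pipelineID: str
    treatmentID: str
    # Recommended / conditional fields (subset only).
    calcType: Optional[str] = None
    standard: Optional[str] = None
    order: Optional[int] = None
    equation: Optional[str] = None

    @property
    def calculationID(self) -> str:
        # Assumption: the CK is rendered by joining its two parts with "_".
        return f"{self.pipelineID}_{self.treatmentID}"


def pipeline_steps(rows: List[Calculation], pipeline_id: str) -> List[Calculation]:
    """Return the treatments of one pipeline, sorted by the order field.

    Rows without an order value are placed last.
    """
    steps = [r for r in rows if r.pipelineID == pipeline_id]
    return sorted(steps, key=lambda r: (r.order is None, r.order or 0))


# Hypothetical example rows for a two-step pipeline.
rows = [
    Calculation("pipeA", "smooth7d", "smoothing", "7-days", 2),
    Calculation("pipeA", "pmmovStd", "standardization", "PMMoV", 1, "covN1 / pmmov"),
]

for step in pipeline_steps(rows, "pipeA"):
    print(step.order, step.calculationID, step.calcType, step.standard)
```

Deriving calculationID from the two mandatory fields, rather than storing it independently, is one way to keep the CK and its parts from drifting apart; whether the ODM stores or derives it is left open here.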

There was some discussion around the naming of the standard field. The name value was considered, but because value has a rather specific meaning in the measures and methods context in ODM, it was decided that using it here would be inconsistent. Given that we anticipate the field will mostly be used for reporting standardization details, we opted for the name standard.

Additional ODM infrastructure to be added, beyond this new table and its fields:

  • A category set for the calcType field, with the options of normalization, standardization, smoothing, and predictiveModels.
  • Conditional sets for the standard field, depending on what the input value for calcType is.
  • A new field in the measures table called valTreat or “value treatment”, which specifies what kinds of data treatments were applied to the measurement value, or the nature of that value.
  • A category set for the valTreat field, with the options of raw, derived, estimate, and predicted.
  • Adding calculationID as a FK to the measures table.
  • Adding more generic units to the ODM so that the unit/aggregation and standardization metadata can begin to be stored separately, e.g. PMMoV standardized mean → standardized mean + calcType standardization, standard PMMoV.
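
The conditional sets and the unit split described in the bullets above could be sketched roughly as follows. This is an illustration only: the contents of STANDARD_SETS are taken from the examples given for the standard field and are not a finalized category list, and the function names check_standard and split_unit are hypothetical.

```python
# Assumed conditional sets: allowed standard values per calcType.
# Values come from the examples in the notes, not from a finalized ODM list.
STANDARD_SETS = {
    "standardization": {"PMMoV", "Crassphage", "Flow"},
    "smoothing": {"Bayesian smoothing", "central average smoothing", "7-days", "time"},
}


def check_standard(calc_type, standard):
    """Return True if standard is acceptable for the given calcType.

    Assumption: a calcType without a conditional set leaves standard unconstrained.
    """
    allowed = STANDARD_SETS.get(calc_type)
    if allowed is None:
        return True
    return standard in allowed


def split_unit(legacy_unit):
    """Split a combined legacy unit into a generic unit plus calculation metadata."""
    for std in STANDARD_SETS["standardization"]:
        prefix = std.lower() + " standardized "
        if legacy_unit.lower().startswith(prefix):
            return {
                "unit": "standardized " + legacy_unit[len(prefix):],
                "calcType": "standardization",
                "standard": std,
            }
    # Not a recognised combined unit: leave it untouched.
    return {"unit": legacy_unit, "calcType": None, "standard": None}


print(check_standard("standardization", "PMMoV"))
print(check_standard("smoothing", "PMMoV"))
print(split_unit("pmmov standardized mean"))
```

With this split, the standardization detail lives in the calculations table (calcType and standard), while the measures table keeps only the generic unit.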

This last bullet factors into the discussion about unit and aggregation metadata, and charts a path toward eventually splitting them. We are cognizant that this may require some changes to the wide names rules for measures. The calculations table may already require some additional attention on this point.