On yesterday’s working group call, there was discussion around how best to report normalization of single measures. This revived previous discussions about the possibility of recording calculation methods (likely separate from protocols) to report with aggregated measures. This was something we discussed previously as being a good idea, but we said we would hold off until version 3. This new conversation, however, sparked renewed interest in this possibility.
An example use case is the 7-day moving average. This is an (arbitrary) type of mean used across the field, but there are differences in how it’s calculated: front-weighting, end-weighting, the number of observations used to calculate it, etc. Having a different aggregation value for each option seems ungainly and inefficient. Using a single 7-day moving average aggregation value allows us to group similar measures, while a calculations table would allow for more detailed reporting on how reporters arrived at that number.
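
To make the ambiguity concrete, here is a rough Python sketch. It assumes the variants differ in where the window sits relative to the reported date (trailing vs. centred) and in how many observations it spans; that reading of "front-weighting" and "end-weighting" is my own, and the function name, parameters, and daily values are all hypothetical.

```python
from statistics import fmean


def moving_average(values, window=7, alignment="trailing"):
    """Return one mean per position, or None where a full window is unavailable."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if alignment not in ("trailing", "centred"):
        raise ValueError("alignment must be 'trailing' or 'centred'")

    averages = []
    for i in range(len(values)):
        if alignment == "trailing":
            # Window ends on the reported day.
            start, stop = i - window + 1, i + 1
        else:
            # Window is centred on the reported day.
            half = window // 2
            start, stop = i - half, i - half + window
        if start < 0 or stop > len(values):
            averages.append(None)
        else:
            averages.append(fmean(values[start:stop]))
    return averages


# Hypothetical daily values, for illustration only.
daily = [10, 12, 11, 15, 14, 13, 16, 18, 17]
trailing = moving_average(daily, window=7, alignment="trailing")
centred = moving_average(daily, window=7, alignment="centred")
```

Both results would be reported as a "7-day moving average", yet the same mean is attached to day 7 in one series and day 4 in the other. That is the kind of detail a calculation record could capture.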
The main reason for not recording this with lab protocols in the `protocols`, `protocolRelationships`, and `protocolSteps` tables is that each measure can only be linked to a single protocol ID, so a separate calculation protocol would be difficult to record. Furthermore, we have received some feedback that there are already a lot of steps in a single protocol ID, making changes, updates, and deviations from the protocol challenging to manage in model administration. Adding all the calculations to protocols would be efficient when nothing changes, but would become ungainly whenever a calculation or protocol is updated.
What I’m proposing is a `calculations` table, with the following headers:

- `calculationID`: the primary key of the table, used to refer to the calculation.
- `name`: the name of the calculation or series of calculations.
- `summ`: a plain-language summary of the calculation, why it is done, and how.
- `equation`: the mathematical equation/expression being used.
- `lastEdited`: when the field was last edited or updated.
- `notes`: any additional notes on the calculation.

`calculationID` would also be added as a header in the `measures` table, where it would be a foreign key.
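
As a rough sketch of that relationship, the snippet below builds both tables in an in-memory SQLite database from Python. The column types and the trimmed-down `measures` columns are assumptions for illustration, not part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE calculations (
    calculationID TEXT PRIMARY KEY,
    name          TEXT,
    summ          TEXT,
    equation      TEXT,
    lastEdited    TEXT,
    notes         TEXT
);
-- Reduced set of columns; the real measures table has many more.
CREATE TABLE measures (
    measureRepID  TEXT PRIMARY KEY,
    calculationID TEXT REFERENCES calculations (calculationID),
    measure       TEXT,
    value         REAL
);
""")

conn.execute(
    "INSERT INTO calculations (calculationID, name) VALUES (?, ?)",
    ("nml_floPopNorm", "Flow and population normalization at NML"),
)
conn.execute(
    "INSERT INTO measures VALUES (?, ?, ?, ?)",
    ("measureA", "nml_floPopNorm", "covN1", 0.345),
)

# A measure pointing at an unknown calculationID should be rejected.
try:
    conn.execute(
        "INSERT INTO measures VALUES (?, ?, ?, ?)",
        ("measureB", "notARealCalc", "covN1", 1.0),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

measure_count = conn.execute("SELECT COUNT(*) FROM measures").fetchone()[0]
```

Because `calculationID` is nullable in this sketch, measures with no recorded calculation could still be stored. Whether the field should be mandatory is an open design question.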
For example, when recording a single flow- and population-normalized measure of the N1 region of the SARS-CoV-2 virus at the National Microbiology Lab (NML), the `calculations` table would look like this:
calculationID | name | summ | equation | lastEdited | notes |
---|---|---|---|---|---|
nml_floPopNorm | Flow and population normalization at NML | Measured levels of viral DNA or RNA, in gene copies, are normalized to the flow volume measured at the wastewater treatment plant that day, as well as to the population served by the plant, expressed per 100 000 people. | (genome copies / day / 100 000 people) = gcmL * 10^9 * flowVol / (population / 100 000) | 24-09-2024 | NA |
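
The equation in that row can be worked through in a few lines of Python. The units of `flowVol` are not stated above; the 10^9 factor suggests megalitres per day (1 ML = 10^9 mL), so the sketch assumes that. The function name and input values are hypothetical.

```python
def flow_pop_normalize(gc_per_ml, flow_vol, population):
    """Apply gcmL * 10^9 * flowVol / (population / 100 000).

    Assumes flow_vol is in megalitres per day, so that the 10^9 factor
    converts megalitres to millilitres.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return gc_per_ml * 10**9 * flow_vol / (population / 100_000)


# Hypothetical inputs: 2 gc/mL, 50 ML/day of flow, 500 000 people served.
normalized = flow_pop_normalize(gc_per_ml=2.0, flow_vol=50.0, population=500_000)
# 2 * 10^9 * 50 = 10^11 genome copies per day, spread over 5 units of
# 100 000 people, gives 2 * 10^10 genome copies / day / 100 000 people.
```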
The `measures` table would then look like this:
measureRepID | protocolID | sampleID | calculationID | … | measure | value | unit | aggregation | … | notes |
---|---|---|---|---|---|---|---|---|---|---|
measureA | nmlProtocol | sampleA | nml_floPopNorm | … | covN1 | 0.345 | gcDay100k | single | … | lorem ipsum |
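
For reporting, the foreign key lets a measure be resolved back to the method behind it. Here is a minimal sketch using plain Python dictionaries populated with the example rows above; the `describe` helper is hypothetical.

```python
# Example rows from the two tables, keyed the way the foreign key implies.
calculations = {
    "nml_floPopNorm": {
        "name": "Flow and population normalization at NML",
        "equation": "gcmL * 10^9 * flowVol / (population / 100 000)",
    },
}

measures = [
    {
        "measureRepID": "measureA",
        "calculationID": "nml_floPopNorm",
        "measure": "covN1",
        "value": 0.345,
        "unit": "gcDay100k",
        "aggregation": "single",
    },
]


def describe(measure_row, calculations):
    """Return a one-line description of a measure and how it was calculated."""
    calc = calculations.get(measure_row["calculationID"])
    method = calc["name"] if calc else "no calculation recorded"
    return (
        f"{measure_row['measure']} = {measure_row['value']} "
        f"{measure_row['unit']} ({method})"
    )


report = [describe(row, calculations) for row in measures]
```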
Obviously the naming of parts here is entirely in draft, so happy to hear feedback on that. Also very happy to hear feedback on any other recommended part of the structure proposed here.
@Sorin @dmanuel @jeandavidt @NHizon - anxious to hear your thoughts.