NWSS to ODM Mapping - Issues & Questions

I now have a good chunk of the planning and work done for mapping NWSS to ODM, though acquiring NWSS-formatted data to test it is still on my to-do list.

With that said, here is a (non-exhaustive) summary of the challenges/decisions that need to be addressed before we can move forward:

  1. hum_frac_target_chem & hum_frac_target_mic - severity: low:
    In NWSS they have different fields for chemical normalization targets and microbial normalization targets. In the ODM, all of these would just be captured as a measure of the given target. Do we want to try and capture the microbial vs. chemical distinction? My instinct is to say no.

  2. other_norm_name - severity: low:
    This field seems to be used in instances where two targets are used for normalization, so that the second target can be specified here. This is still (I think) just a measure, as with 1, and the structure is not important to maintain, though a “measure report ID” will need to be programmatically generated to link these measures together.

  3. vs_mic_chem_units - severity: medium:
    NWSS lists the usual units (copies per litre, milligrams per litre, milligrams per gram, etc.), but it also includes log10 copies/L wastewater, etc. Do we want to transform these log values into a normal format? They also specify the material within the units: log10 copies/g dry sludge, micrograms/L wastewater, log10 micrograms/L wastewater, etc. Do we want to capture these materials in our units as well? Or should this go elsewhere? Please note that the sample material is still recorded in a different column, and that column does not distinguish dry/wet, etc.

  4. num_no_target_control - severity: low:
    This asks users to specify the number of NTCs (non-template controls) run on a plate. I think this can be captured as a new measure, but again would need to be connected to other items via a machine-generated measure report ID.

  5. MHV (PREvalence), BCoV (GT-Digital) - severity: low:
    In the list of spike targets, murine hepatitis virus and bovine coronavirus (MHV and BCoV, respectively) are already listed, but these two additional, tweaked categories are also included. It’s not clear what the difference between MHV and MHV (PREvalence) is, but do we think this differentiation is worth preserving?

  6. vs_reporting_jurisdiction & vs_wwtp_jurisdiction - severity: low:
    I think these would both be organization IDs, but the reporting jurisdiction is the sites table’s repOrg1, and the wwtp jurisdiction is the sites table’s orgID. Are folks in agreement with that mapping?

  7. vs_sample_type - severity: medium:
    For sample type, NWSS has 195 categories that are actually all just grab samples, passive samples, manual composite samples, time-weighted composite samples, and flow-weighted composite samples, just with different hourly length options. In ODM, the type of sample would be in collType and the period in hours would be recorded in collPer.
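
For what it’s worth, the split described in item 7 could be sketched roughly like this in Python. It assumes NWSS values look like “24-hr flow-weighted composite” or “grab”. The collType labels on the right-hand side are placeholders I made up, not confirmed ODM category IDs, and split_sample_type is a hypothetical helper name.

```python
import re

# Hypothetical labels: the real ODM collType category IDs may differ.
COLL_TYPES = {
    "grab": "grab",
    "passive": "passive",
    "manual composite": "manualComposite",
    "time-weighted composite": "timeWeightedComposite",
    "flow-weighted composite": "flowWeightedComposite",
}

# Optional "<hours>-hr " prefix followed by the sample kind.
_PATTERN = re.compile(r"^(?:(\d+)-hr\s+)?(.+)$")


def split_sample_type(value):
    """Split an NWSS-style sample type into (collType, collPer in hours)."""
    match = _PATTERN.match(value.strip().lower())
    if match is None or match.group(2) not in COLL_TYPES:
        raise ValueError(f"unrecognized sample type: {value!r}")
    hours, kind = match.group(1), match.group(2)
    # collPer stays empty (None) when no duration is given, e.g. for grab samples.
    return COLL_TYPES[kind], int(hours) if hours else None


print(split_sample_type("24-hr flow-weighted composite"))
print(split_sample_type("grab"))
```

Raising on unrecognized values is deliberate here, so that any of the 195 categories that does not fit the assumed pattern gets noticed rather than silently dropped.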

@dmanuel @jeandavidt @ysequeira

ODM and NWSS take a similar approach; the dictionaries and models need to stay harmonized and learn from each other.

Specific comments below:
1 - I, too, am not clear on the distinction. My instinct is this is a low priority issue, but we can reach out to NWSS folks to ask for clarity.

2 - It is not uncommon to have normalization consider two targets—for example, PPMoV and flow. When we identify these more complex normalization methods, let’s add these as new units. Sometimes, the more complex normalization approaches require a description of the calculations, which should be encouraged through protocols.

We’ve tried to reduce the use of ‘other’, and I am inclined not to introduce other units or aggregations. If we did, we could use the approach that was adopted for other mutations or organisms.

You are correct that a better approach is to record normalization metrics separately, but NWSS and ODM have tried to accommodate a range of units and aggregations.

3- There are a few considerations in this point.
3.1 - I suggest adding ‘dry sludge’ as a material to the ODM. That is easy.
3.2 - I can see the merit of log10 as a unit. I’d say to add them. I sometimes wonder if dictionaries should report numerators and denominators as separate unit attributes, but I’ve never seen it done and ODM’s ship has sailed on this topic.
3.3 - Whether to add log10 copies/g dry sludge or L wastewater, rather than log10 copies/g, is an interesting question. We are currently non-specific about what g or L refers to. The lack of specificity is an issue; however, I suggest that we clarify in the instructions what substance the g or L refers to. We need to be careful with the distinction between the fraction analysed and the sample material.

4- I agree. However, we could also add num_no_target_control as a new measure. We have several situations where we have both an easy-to-fill summary measure and the provision for more details, for example, reportable and quality measures. I suggest we do the same here and add this new measure. Regardless, this is a minor issue, as you stated.

5- I think it is worth preserving, if for no other reason than maintaining harmony with NWSS.

6- In ODM, reporting jurisdiction (vs_reporting_jurisdiction) is identified by site_reportOrg1 and _reportOrg2. I interpret vs_wwtp_jurisdiction1 as relating to the polygons_organizationID. If so, we have a nice mapping.

7- That is right. We designed those attributes to encompass all 195 categories (and more). We should be good to go.
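
To make the points in 3 concrete, here is a rough Python sketch of how an NWSS-style unit string could be pulled apart into a log10 flag, numerator, denominator, and the substance the denominator refers to. The unit strings are the ones quoted in this thread; the function names and the dictionary keys are hypothetical, not existing ODM attributes.

```python
def parse_unit(unit):
    """Split e.g. 'log10 copies/g dry sludge' into its parts (illustrative only)."""
    text = unit.strip()
    is_log10 = text.lower().startswith("log10 ")
    if is_log10:
        text = text[len("log10 "):]
    numerator, _, rest = text.partition("/")
    # Everything after the first space in the denominator is treated as the substance.
    denominator, _, substance = rest.strip().partition(" ")
    return {
        "log10": is_log10,
        "numerator": numerator.strip(),
        "denominator": denominator.strip(),
        "substance": substance.strip() or None,
    }


def to_linear(value, parsed):
    """Undo the log10 transform when the unit is flagged as log10."""
    return 10 ** value if parsed["log10"] else value


parsed = parse_unit("log10 copies/g dry sludge")
print(parsed)
print(to_linear(3.0, parsed))
```

Keeping the substance as its own piece leaves open whether it ends up in the unit, in the material, or in the fraction analysed.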

  1. hum_frac_target_chem & hum_frac_target_mic - severity: low:
    I agree with you, Mathew. I think normalization targets are just measures, and the idea that these measures are used for normalization probably fits better inside a protocol than in the measure report.

  2. other_norm_name - severity: low:
    Agreed on the automatic ID generation (though I think you might have meant measure set ID?)

  3. vs_mic_chem_units - severity: medium:
    I agree with Doug on adding log10 units. Also agree that the wastewater or dry sludge component of the unit is linked to the sample material.

  4. num_no_target_control - severity: low:
    Agreed that it could be a measure - but it could also be part of the protocol, no?

  5. MHV (PREvalence), BCoV (GT-Digital) - severity: low:
    The least we can say is that they thought the distinction was worth reporting. It’d be good to ask for more details so we can decide for ourselves.

  6. vs_reporting_jurisdiction & vs_wwtp_jurisdiction - severity: low:
    Funny that the three of us instinctively picked 3 different mappings. For me, reporting_jurisdiction is the sample and measure report’s orgIDs, and the wwtp jurisdiction is the site_reportOrg1. But I can see arguments on either side.

  7. vs_sample_type - severity: medium:
    Agreed with both of you :slight_smile:
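
On the automatic ID generation from items 2 and 4, something along these lines might be enough: one generated ID per NWSS row, shared by every measure derived from that row. This is only a sketch. The measureSetID key, the measures_from_row helper, and the example row values are all made up for illustration, and the real ODM column names may differ.

```python
import uuid


def measures_from_row(row, fields):
    """Turn selected NWSS columns of one row into measures sharing one set ID."""
    set_id = str(uuid.uuid4())  # one machine-generated ID per source row
    return [
        {"measureSetID": set_id, "measure": name, "value": row[name]}
        for name in fields
        if row.get(name) not in (None, "")  # skip columns left empty
    ]


# Made-up example row; the values are placeholders, not real NWSS data.
row = {
    "hum_frac_target_mic": "pepper mild mottle virus",
    "other_norm_name": "",
    "num_no_target_control": "3",
}
fields = ["hum_frac_target_mic", "other_norm_name", "num_no_target_control"]
for measure in measures_from_row(row, fields):
    print(measure)
```

A random UUID changes on every run; if the mapping needs to be repeatable, a deterministic ID derived from the row’s own keys would be an alternative.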

Thanks for the thoughtful replies, folks. Overall, I think it sounds like these are all sorted then, though with one exception.

  1. We will add these as measures, easy fix. Likely measures for a protocol steps table.

  2. Keep just as measures, with the need for an automatic measure set ID generation (thanks for the correction, JD). I hear you too, Doug, but this is separate from a units issue, I think, because there are no reported aggregations or units that mention normalization. Maybe something we can look at more closely though once we get some example data.

  3. Okay, we’ll add the log10 units, another easy fix. This item is still outstanding, though. I agree that intuitively you would think that dry sludge, wet sludge, etc. is something captured in sample material instead, but NWSS also has a separate sample material field that is similar to ours, and these various sludge types aren’t mentioned there. A look into dry sludge, etc. seems to show that they’re like a secondary sample type: the sample was collected as one type, but was then baked in an oven to make the dry sludge that the analysis was run on. I’m not quite sure how to capture that detail.

  4. Sounds good - I’ll add num_no_target_control as a measure, but it will be one for protocol steps, by the sounds of it. Unless I misunderstood you, Doug, and you think it should be added twice, once as a measure and once as a method?

  5. Okay, an easy add then. I don’t have a close enough contact at NWSS to really follow up on it any deeper, unfortunately.

  6. Okay, in their metadata NWSS says that the wwtp_jurisdiction is “State, DC, US territory, or Freely Associated State jurisdiction name (2-letter abbreviation) in which the wastewater treatment plant provided in ‘wwtp_name’ is located” and the reporting_jurisdiction is “The CDC Epidemiology and Laboratory Capacity (ELC) jurisdiction, most frequently a state, reporting these data (2-letter abbreviation)”. Does that change people’s interpretations at all?

  7. Good to go!

Let me know your further thoughts on 3 and 6 when you get a chance. Thanks again!