Protocol Relationship: V1 <--> V2

@martinwellman has been working hard on harnessing some of the existing infrastructure for linkml and their “linkml-map” library to do mapping from ODM to and from other data models. It has been quite successful, and has required some creative coding.

Right now there is an issue in mapping between version 1 and version 2 where certain IDs are not found in the same tables, and so mapping them to the final ID can be troublesome.

For example:

  • V2 Instruments table:
    PKs/IDs are:

    • instrumentID
    • datasetID
    • contactID
    • organizationID
  • V1 Instruments table:
    PKs/IDs are:

    • instrumentID

In version 1, samples and measures have instrumentID as a column header, but in version 2 they don’t; the instrument is only reachable via protocolSteps. So to link the version 1 instrumentID to samples and measures, we’d need to follow the chain instruments → protocolSteps → protocolRelationships → protocols → samples or measures.
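To make that chain concrete, here is a rough Python sketch of the lookup using plain dictionaries. The field names (`stepID`, `protocolID`) and the example rows are assumptions made up for illustration, not the actual ODM v2 column names, and it only covers samples (measures would follow the same pattern).

```python
# Hypothetical rows; field names are illustrative, not the real ODM v2 columns.
protocol_steps = [
    {"stepID": "step-1", "instrumentID": "inst-A"},
]
protocol_relationships = [
    {"protocolID": "prot-1", "stepID": "step-1"},
]
samples = [
    {"sampleID": "s-1", "protocolID": "prot-1"},
    {"sampleID": "s-2", "protocolID": "prot-2"},
]


def samples_for_instrument(instrument_id):
    """Follow instruments -> protocolSteps -> protocolRelationships -> protocols -> samples."""
    # Steps that use the instrument.
    step_ids = {s["stepID"] for s in protocol_steps if s["instrumentID"] == instrument_id}
    # Protocols that are related to any of those steps.
    protocol_ids = {
        r["protocolID"] for r in protocol_relationships if r["stepID"] in step_ids
    }
    # Samples collected under any of those protocols.
    return [s["sampleID"] for s in samples if s["protocolID"] in protocol_ids]
```

If protocolRelationships is empty, the second hop returns nothing and the instrument can never reach a sample, which is exactly the gap in the v1 to v2 mapping.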

Except there are no rules at present for how to populate protocolRelationships when mapping from ODM v1 to v2.

This is because assayMethods (protocols in v1) didn’t use this same structure, and protocolRelationships are quite specific. I can think of two ways to potentially approach this, though:

  1. We set a rough standard for which assay methods in version 1 follow one another, and then enforce that relationship on data that is mapped into version 2. This would create a rough road map and conform to the V2 structure, but the enforced mapping might not be true in all cases, and so could potentially introduce erroneous data.
  2. We create a new relationship ID called “not-reported”. This allows us to link things together without necessarily creating a more detailed relationship structure where there isn’t one. It also means a blank protocol step could be linked as “not-reported”, then linked to protocols, and then to measures and samples.
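As a rough sketch of what option 2 could look like for a single v1 sample row, here is some Python. Everything in it is an assumption for illustration: the `map_v1_sample` helper, the field names (`stepID`, `protocolID`, `relationshipID`), and the way placeholder IDs are generated are all hypothetical, and this is not how linkml-map is actually configured.

```python
NOT_REPORTED = "not-reported"  # the proposed placeholder relationship ID


def map_v1_sample(v1_sample):
    """Build placeholder v2 rows that keep the v1 instrumentID reachable.

    Hypothetical helper: field names and ID formats are illustrative only.
    """
    instrument_id = v1_sample["instrumentID"]
    step_id = f"step-{instrument_id}"          # blank placeholder step
    protocol_id = f"protocol-{instrument_id}"  # placeholder protocol
    return {
        "protocolSteps": [
            {"stepID": step_id, "instrumentID": instrument_id},
        ],
        "protocolRelationships": [
            {
                "protocolID": protocol_id,
                "stepID": step_id,
                "relationshipID": NOT_REPORTED,
            },
        ],
        "protocols": [{"protocolID": protocol_id}],
        "samples": [
            {"sampleID": v1_sample["sampleID"], "protocolID": protocol_id},
        ],
    }
```

The point is that the only thing asserted about the step and the protocol is that they are linked; nothing about ordering or assay structure is invented.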

I’m not sure if this is fully clear to folks, but I would be keen to hear people’s thoughts. @dmanuel @jeandavidt @Sorin

I think I prefer option 2! Given how most ODM1 data was collected at a time when assays were not really standardized, it feels wrong to enforce a structure post-hoc.

I agree with @jeandavidt on this one - you can always tell which option I think is best because I present it last, haha. Any additional thoughts on this one, @dmanuel @Sorin @NHizon ?

I agree, option 2 makes no assumptions about the data, and it was the Wild West.

I don’t think there were many people that used instrumentID in version 1. So, it is probably OK to say that it is a breaking change and we can’t map, or put this aside for later.

That said, addressing it later could be even harder, if we need to do that, and option 2 is a thumbs up from me.
