Hierarchy for enumerations

In ODM v2/v3 there is currently no hierarchy for the values that a categorical/enum slot can take on. Below is an example of a hierarchy for the enum values in the “collection device” slot in PHA4GE, which is categorical:

Grab sampler
      Core sampling device
      Vacuum sludge sampling device
      Cone-shaped sampling device
      Horizontal grab sampling device
      Vertical grab sampling device
Composite sampler
      Passive (trap) sampler
            Moore swab
      Automatic composite sampler
            Automatic flow-proportional sampler
            Automatic sequential (time-proportional) sampler
Bag filtration device

For data entry purposes, some people will enter the full hierarchy for certain slots (we discussed this with the PHA4GE people, who said that this is sometimes done). For example, instead of entering just ‘Moore swab’, some people might enter 3 values to specify the full hierarchy: [‘Composite sampler’, ‘Passive (trap) sampler’, ‘Moore swab’]. In this example we would typically want to convert it to a single value by removing the values higher up in the hierarchy, so that we only have ‘Moore swab’ (since in ODM we only allow single-valued slots, instead of multivalued slots).

For the PHES-ODM-Mapper I’ve written code to only keep the deepest enum values in the hierarchy. In other words, if any value in the list of values for the slot is a parent of one of the other values, then it gets removed (in the above case we just keep ‘Moore swab’ from the list of 3 values).
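
To make the rule concrete, here is a minimal sketch of the "keep only the deepest values" idea. It is not the actual PHES-ODM-Mapper code: the `PARENT` map and the `ancestors` and `keep_deepest` names are hypothetical and chosen just for illustration, and the map simply encodes the example hierarchy above as child -> parent.

```python
# Hypothetical child -> parent map for the "collection device" example above.
# Top-level values (e.g. "Grab sampler") have no entry.
PARENT = {
    "Core sampling device": "Grab sampler",
    "Vacuum sludge sampling device": "Grab sampler",
    "Cone-shaped sampling device": "Grab sampler",
    "Horizontal grab sampling device": "Grab sampler",
    "Vertical grab sampling device": "Grab sampler",
    "Passive (trap) sampler": "Composite sampler",
    "Moore swab": "Passive (trap) sampler",
    "Automatic composite sampler": "Composite sampler",
    "Automatic flow-proportional sampler": "Automatic composite sampler",
    "Automatic sequential (time-proportional) sampler": "Automatic composite sampler",
}


def ancestors(value, parent=PARENT):
    """Return every value above `value` in the hierarchy."""
    found = set()
    current = parent.get(value)
    # The `not in found` check stops the walk if the map contains a cycle.
    while current is not None and current not in found:
        found.add(current)
        current = parent.get(current)
    return found


def keep_deepest(values, parent=PARENT):
    """Drop any value that is an ancestor of another value in the list."""
    to_drop = set()
    for value in values:
        to_drop |= ancestors(value, parent)
    return [value for value in values if value not in to_drop]


print(keep_deepest(["Composite sampler", "Passive (trap) sampler", "Moore swab"]))
# ['Moore swab']
```

Note that values from unrelated branches are both kept (for example `["Grab sampler", "Moore swab"]` stays as is), so a list can still end up with more than one value and would need separate handling for a single-valued slot.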

There might be other reasons to include a hierarchy. For example, at data entry a hierarchical pick-list might make it easier for the user to find the right value.

LinkML schemas support this kind of hierarchy. Each permissible value of an enum can carry an is_a attribute that names its parent value. For example, Moore swab would have an is_a value of Passive (trap) sampler, and Passive (trap) sampler would have an is_a value of Composite sampler.
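
As a rough illustration of how such is_a links could be consumed, the sketch below uses a plain Python dict that mirrors the shape of an enum's permissible values. The dict name `COLLECTION_DEVICE_ENUM` and the helpers `parent_map` and `path_to_root` are hypothetical, and the dict is hand-written here rather than loaded from a real ODM or PHA4GE schema.

```python
# Hypothetical, hand-written stand-in for an enum definition: each permissible
# value may name its parent through "is_a". Not loaded from a real schema.
COLLECTION_DEVICE_ENUM = {
    "permissible_values": {
        "Composite sampler": {},
        "Passive (trap) sampler": {"is_a": "Composite sampler"},
        "Moore swab": {"is_a": "Passive (trap) sampler"},
    }
}


def parent_map(enum_def):
    """Build a child -> parent map from the is_a links, checking each parent exists."""
    values = enum_def["permissible_values"]
    parents = {}
    for name, attrs in values.items():
        parent = (attrs or {}).get("is_a")
        if parent is None:
            continue  # top-level value
        if parent not in values:
            raise ValueError(f"{name!r} has unknown parent {parent!r}")
        parents[name] = parent
    return parents


def path_to_root(value, parents):
    """Return the full hierarchy for a value, from the top level down."""
    path = [value]
    while path[-1] in parents:
        nxt = parents[path[-1]]
        if nxt in path:
            raise ValueError(f"cycle detected at {nxt!r}")
        path.append(nxt)
    return path[::-1]


parents = parent_map(COLLECTION_DEVICE_ENUM)
print(path_to_root("Moore swab", parents))
# ['Composite sampler', 'Passive (trap) sampler', 'Moore swab']
```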

Adding a hierarchy to ODM should not break anything. I think it would be useful to discuss if we should include this type of hierarchy in ODM.

Martin

Thanks, Martin, for bringing this up.

I think the structure is really interesting and could be useful, particularly for data entry and pick lists. That said, the pick lists we do have are relatively short, and the items on them are specific enough, that I’m not sure the work to set up this structure would be worth the return on investment.

The other use case that I can see is in data analysis, where we could collapse values like Cone-shaped sampling device, Horizontal grab sampling device, and Vertical grab sampling device into one larger Grab sampler category for analysis. That would be easier with hierarchical enums, but it is not impossible with our current structure.
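
For what it's worth, that kind of roll-up could look something like the sketch below. The `PARENT` map, the `top_level` helper, and the sample `records` list are all made up for illustration (the map just encodes part of Martin's example hierarchy); this is not existing ODM tooling.

```python
from collections import Counter

# Hypothetical child -> parent map covering part of the example hierarchy.
PARENT = {
    "Cone-shaped sampling device": "Grab sampler",
    "Horizontal grab sampling device": "Grab sampler",
    "Vertical grab sampling device": "Grab sampler",
    "Passive (trap) sampler": "Composite sampler",
    "Moore swab": "Passive (trap) sampler",
}


def top_level(value, parent=PARENT):
    """Walk up the hierarchy and return the top-level category for a value."""
    seen = {value}
    while value in parent:
        value = parent[value]
        if value in seen:
            raise ValueError(f"cycle detected at {value!r}")
        seen.add(value)
    return value


# Made-up sample of recorded collection devices, for illustration only.
records = [
    "Cone-shaped sampling device",
    "Horizontal grab sampling device",
    "Vertical grab sampling device",
    "Moore swab",
]

counts = Counter(top_level(value) for value in records)
print(counts)
# Counter({'Grab sampler': 3, 'Composite sampler': 1})
```

Without a hierarchy in the schema, the same result needs a hand-maintained lookup like `PARENT` kept somewhere outside the dictionary, which is the "not impossible" route.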

Also, if the mapping is able to just pull the last enum value, it sounds like there’s no larger structural need for it from an interoperability standpoint… but I’m curious to hear @dmanuel and @jeandavidt’s thoughts, if they have any!