Tidying Groups/Classes in line with Organism Information

Carrying over some discussion that started on this post.

It sounds like we might want to keep organism information exclusively in groups. If so, however, we would need to tidy up some measures and classes, because right now organism information (e.g., SARS-CoV-2, RSV, Influenza) is spread across groups, classes, and even measurements. We also discussed the possibility, once things are cleaned up, of adding the 70-100 organisms that are currently detectable in wastewater (as per a systematic review) as groups, and then asking people to request new ones as needed.

We like the idea of keeping this information in groups, because groups are then something like “organisms++” where we have organism information and/or physical measurement information for things like site details, water quality, etc.

@dmanuel - you mentioned a work-around using classes, could you elaborate more?

Currently, here are the groups in ODM:

The classes in ODM:

And a selection of measurements that are also organisms:

I think that there are valid reasons why this information is scattered across levels. As a measure, these organisms correspond to a binary “yes/no” or “detected/undetected” testing value, which makes sense. This is also why I initially thought that organism might be best mapped to measurements. But when looking at variants, mutations, or specific gene regions, it makes a lot more sense to have this information in the groups column as well, with classes helping to delineate whether we’re talking about variants, mutations, etc.

Curious to hear other people’s impressions, and thoughts about how best to tidy these up. @Sorin @dmanuel @jeandavidt

Dears,

I think that having group and parent group would allow for better maintenance of organism mutations/variants, as it would allow for deeper hierarchies.

Contemplate for a while capturing this in ODM:

Source: https://coronavirusexplained.ukri.org/en/article/und0001/

Kind regards,

Sorin

As Mathew stated, we need some tidying, but the overall structure looks good. @sorin does the following meet your needs? I don’t think we need or want to recreate lineage maps, like the photo you sent, because that information is available elsewhere. We do, however, need to represent the data points on those lineage maps.

Suggestions:

  1. Add ‘organism’ to class
  2. Add the known list of organisms tested in WW to the list (about 70, I’ll separately send a list). There will be more organisms for environmental testing, but the WW list is a good start.
  3. Have NCBI as our main reference link/taxonomy. We may already be doing this. We are currently using the NCI for ontology links - that is an ontology that includes NCBI and a few other agencies. The link to NCI has the NCBI reference ID. PHA4GE and NCBI have been trying to use the NCBI listing, but folks in the related fields use different taxonomies (and there are alternative taxonomies in the OBO ontology).

We could use the approach that we have for SARS-CoV-2 for all organisms (meaning tidy up and be consistent), but with the addition of class = ‘organism’ when we define the organism using ‘group’. That would be a non-breaking change that makes it a bit easier to create lists and aligns with the “organism” terminology used by PHA4GE.

There are currently about 100 SARS-CoV-2 measures. I’ve taken an example from each class.

| partID | partLabel | partType | domain | group | class | nomenclature | ontologyRef |
|---|---|---|---|---|---|---|---|
| sarsCov2 | SARS-CoV-2 | groups | bio | sarsCov2 | currently ‘naClass’; propose to change to ‘organism’ | naNomenclature | http://purl.obolibrary.org/obo/NCIT_C169076 |
| a1306s | a1306s delta-variant gene target | measurements | bio | sarsCov2 | mutation | naNomenclature | NA |
| beta | Beta | measurements | bio | sarsCov2 | variant | who | NA |
| cov | Covid-19 | measurements | bio | sarsCov2 | disease | ICD | NA |
| covN1 | SARS-CoV-2-N1 | measurements | bio | sarsCov2 | allele | naNomenclature | NA |

Consistent use of the approach above would have a few implications.
A) Currently, we have a few parts like partID ‘virusMisc’ (Miscellaneous viruses group). We would change these to individual groups with class = ‘organism’.
We would then have just a list of organisms, without the ability to quickly list all the viruses, bacteria, or phages. We can’t robustly do that now, and I don’t think other models can either. There are a few options if we wanted to keep the concept of different types of organisms. Probably the easiest approach is to use class: instead of ‘organism’ (which is a general term already captured by ‘domain’ = ‘bio’), we would have more specific terms that represent the kingdom or a similar taxonomy level (virus, bacteria, phage, fungi, etc.). That said, I think it may be easiest to keep it simple and just use class = ‘organism’.
B) We would have quite a few groups, but I think that is okay. Using group and class for a high-level classification of measures seems robust and intuitive.
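
To make the group/class idea concrete, here is a minimal Python sketch using the example SARS-CoV-2 rows from this post. The dict layout and the helper names (`list_organisms`, `measures_by_class`) are made up for illustration and are not part of ODM; the group row uses the proposed class ‘organism’ rather than the current ‘naClass’.

```python
# Illustrative only: rows taken from the example in this post,
# trimmed to the columns needed here.
parts = [
    # Group part, shown with the *proposed* class 'organism'.
    {"partID": "sarsCov2", "partType": "groups", "group": "sarsCov2", "class": "organism"},
    {"partID": "a1306s", "partType": "measurements", "group": "sarsCov2", "class": "mutation"},
    {"partID": "beta", "partType": "measurements", "group": "sarsCov2", "class": "variant"},
    {"partID": "cov", "partType": "measurements", "group": "sarsCov2", "class": "disease"},
    {"partID": "covN1", "partType": "measurements", "group": "sarsCov2", "class": "allele"},
]


def list_organisms(parts):
    """Return the partIDs of group parts whose class is 'organism'."""
    return [
        p["partID"]
        for p in parts
        if p["partType"] == "groups" and p["class"] == "organism"
    ]


def measures_by_class(parts, group):
    """Map each class to the measurement partIDs found in one group."""
    result = {}
    for p in parts:
        if p["partType"] == "measurements" and p["group"] == group:
            result.setdefault(p["class"], []).append(p["partID"])
    return result


organisms = list_organisms(parts)
sars_measures = measures_by_class(parts, "sarsCov2")
print(organisms)
print(sars_measures)
```

With class = ‘organism’ on every organism group, the organism list becomes a simple filter, and the measures under each organism can be broken down by class without any extra columns.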

Indeed, if we don’t aim to recreate the lineages (and I agree that there is no pressure to do that) your proposed approach could work quite well. Let’s do it this way, thanks!

Here are a few references that list organisms and targets examined in WW.

Kilaru P, Hill D, Anderson K, Collins MB, Green H, Kmush BL, et al. Wastewater Surveillance for Infectious Disease: A Systematic Review. Am J Epidemiol. 2023 Feb 1;192(2):305–22.

Santiago M, Olesen SW. Pathogen biomarkers in wastewater, stool, and urine: an informal literature survey [Internet]. BioBot; 2023 [cited 2023 Apr 21]. Available from: http://biobot.io/wp-content/uploads/2022/05/2022-04-28-Pathogen-lit-survey-combined.pdf

Tiwari A, Kurittu P, Al-Mustapha AI, Heljanko V, Johansson V, Thakali O, et al. Wastewater surveillance of antibiotic-resistant bacterial pathogens: A systematic review. Frontiers in microbiology. 2022;13:977106–977106.

Sorry - I like where this is going but I have some clarifying questions to help me understand.

1 - so we are changing all organisms to groups? i.e. the virusMisc group will be deprecated, and then we will add RSV, Flu, etc. as groups?

2 - then every group that is an organism will have a class that specifies organism? This is just for internal organization though, I assume?

Otherwise, I think this looks great, and the organism list looks great to me. This is the ontology that PHA4GE requires users to submit their organisms as: Home - Taxonomy - NCBI

So we are changing all organisms to groups?

Yes.

i.e. the virusMisc group will be deprecated, and then we will add RSV, Flu, etc. as groups?

Yes.

then every group that is an organism will have a class that specifies organism?

Yes.

This is just for internal organization though, I assume?

Yes. This is just to help the organization. I suggest, though, that we create a few additional lists for the documentation, including a list of groups (or maybe organisms), and then, below the list of groups, the specific measures by group. Similar to the lists of tables and sets.

This is the ontology that PHA4GE requires users to submit their organisms as: Home - Taxonomy - NCBI

Let’s use that taxonomy as the default. Currently, we are using NCI, but the NCI entry includes a reference to the NCBI taxonomy. See SARS-CoV-2 below.

http://purl.obolibrary.org/obo/NCIT_C169076

Taxonomy browser (Severe acute respiratory syndrome coronavirus 2) (uottawa.ca)

This is the workplan. @matthomson and I will take it on.

  1. Add class = ‘organism’
  2. Deprecate class = ‘misc…’ (miscVirus, etc.).
  3. Mutate/clean up existing organisms to the new format – focus on RSV, influenza and other organisms that are in high use.
    • Make/mutate existing listing to ‘group/organism’ format.
    • Ensure all measures have the appropriate group. i.e. all RSV measures should have group = ‘rsv’.
  4. Review existing classes and measures to ensure everything is updated.
  5. Ensure links and names correspond to NCBI.
  6. Ensure all updates are documented in the revision tab.
  7. Write a changelog with a good description of this list.
  8. Create new lists for the documentation. Follow the approach of table lists.
  9. Create and publish a new 0.1 version update.
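
For step 3 (ensuring all measures have the appropriate group), a check along these lines might help during clean-up. This is only a sketch: the sample rows and the `find_group_problems` helper are hypothetical, and it assumes every ‘bio’ measurement should point at a group part that has class = ‘organism’.

```python
# Hypothetical sample rows; field names follow the parts-list columns
# used earlier in this thread. The values are for illustration only.
parts = [
    {"partID": "rsv", "partType": "groups", "domain": "bio",
     "group": "rsv", "class": "organism"},
    {"partID": "rsvA", "partType": "measurements", "domain": "bio",
     "group": "rsv", "class": "allele"},
    # Not yet migrated: still points at the old miscellaneous group.
    {"partID": "rsvB", "partType": "measurements", "domain": "bio",
     "group": "virusMisc", "class": "allele"},
]


def find_group_problems(parts):
    """Return (partID, group) pairs for bio measurements whose group
    is not a group part with class 'organism'."""
    organism_groups = {
        p["partID"]
        for p in parts
        if p["partType"] == "groups" and p["class"] == "organism"
    }
    return [
        (p["partID"], p["group"])
        for p in parts
        if p["partType"] == "measurements"
        and p["domain"] == "bio"
        and p["group"] not in organism_groups
    ]


problems = find_group_problems(parts)
print(problems)
```

An empty result would suggest that every organism measure has been moved to its new group; anything listed still needs attention.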

A final update on this one, after a meeting @dmanuel and I had to hash this out completely:

1 - Every organism will have a high-level group created in line with the NCBI taxonomy ontology.
2 - Bearing in mind item 1 above, the “high-level” part means that there will be a group for RSV, and a group for Hepatitis, but not individual groups for RSVA, RSVB, HepA, HepB, etc.
3 - We will keep the measurement part-type parts for general viruses, like rsv, SARS-CoV-2, rsvA, etc. These will be nested within the associated group. ex: rsvA, rsvB are both general measurements, within the rsv group.
4 - The virusMisc group will be changed to a class instead: a “miscellaneous organism measurement” (or orgMisc) class. This is to correct the fact that an RSV-A measure is not an allele measure, but a general measure of RSV-A. So now we have a general RSV-A measurement, within the RSV group and the orgMisc class.
5 - So now when we create or add a new measurement for a virus or bacterium, we create an associated group for that organism, and a general measurement within that same group and the orgMisc class. Additional measures can be added to the group within other classes, like allele, mutation, variant, etc.

EX: A lab starts testing for the pizza virus. Currently though they’re only reporting whether or not a sample tests positive for the virus. To the ODM parts list we add a new group (pizzaVirusGrp), and a generic measurement for pizza virus (pizzaMe). The pizzaMe part has pizzaVirusGrp as the group, and orgMisc as the class.

Later, two new types of pizza virus are detected, with pepperoni pizza virus being dominant in the spring, and Hawaiian pizza virus being dominant in the fall. These are also added as generic measurements - pepPizzaMe and HawaPizzaMe - both in the pizzaVirusGrp group, and the orgMisc class.

Later still, we begin to be more specific and test for the pineapple gene region/allele and the mozzarella gene region/allele when testing and reporting on these viruses. So now we add the measurements pizzaPA and pizzaMozza, which are also in the pizzaVirusGrp group, but in the allele class.

The part for pizzaVirusGrp group has organism as the class. When mapping from other dictionaries, the organism field will most often map to the group field for an organism entry.
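
The pizza virus example can be sketched in Python as follows. The `add_organism` and `add_measure` helpers and the dict layout are hypothetical, not an ODM tool. The sketch assumes the subtype measurements (pepPizzaMe, HawaPizzaMe) take the orgMisc class, in line with item 4, and that the class for gene regions is named ‘allele’.

```python
def add_organism(parts, group_id, measure_id):
    """Add a group part (class 'organism') plus its generic
    measurement (class 'orgMisc') in that group."""
    if any(p["partID"] in (group_id, measure_id) for p in parts):
        raise ValueError("partID already exists")
    parts.append({"partID": group_id, "partType": "groups",
                  "group": group_id, "class": "organism"})
    parts.append({"partID": measure_id, "partType": "measurements",
                  "group": group_id, "class": "orgMisc"})


def add_measure(parts, group_id, measure_id, part_class):
    """Add a further measurement to an existing organism group."""
    has_group = any(
        p["partID"] == group_id and p["partType"] == "groups" for p in parts
    )
    if not has_group:
        raise ValueError(f"unknown group: {group_id}")
    parts.append({"partID": measure_id, "partType": "measurements",
                  "group": group_id, "class": part_class})


parts = []
# A lab starts reporting positive/negative results for pizza virus.
add_organism(parts, "pizzaVirusGrp", "pizzaMe")
# Two subtypes are detected later; both are general measurements.
add_measure(parts, "pizzaVirusGrp", "pepPizzaMe", "orgMisc")
add_measure(parts, "pizzaVirusGrp", "HawaPizzaMe", "orgMisc")
# Later still, specific gene regions are reported.
add_measure(parts, "pizzaVirusGrp", "pizzaPA", "allele")
add_measure(parts, "pizzaVirusGrp", "pizzaMozza", "allele")

classes = {p["partID"]: p["class"] for p in parts}
print(classes)
```

The point of routing every addition through the group is that a measurement cannot be added for an organism that has no group yet, which keeps the list of groups usable as the list of organisms.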
