Version 3 Ultra-Minimal Structure and Template

For the launch of version 3 of the ODM, we are planning on also launching an ultra-minimal structure and template. The goal for this is to cut away everything that’s not essential so that users can get started in under a minute and understand the structure of the model.

To this end, I have started the process and cut away the reference tables. While they are essential, I think they clutter the ERD in a way that is confusing to users freshly approaching the ODM. I have also eliminated all other tables except for measures, samples, and sites. I am of the opinion that while it is ideal to capture additional information on methods/protocols, instruments, organizations, etc., it is not strictly essential. I also trimmed the majority of the fields in each of these tables, but I left some that I think could still be removed. I’ve attached an ERD of this ultra-minimal structure here for your review:

Curious to hear the perspectives of users and other developers @Sorin @NHizon @dmanuel @jeandavidt

Hi Mathew,

I think the absolute minimum ERD could get rid of the samples table as well. The mandatory fields in the measures table would be measureRepID, siteID, aDateEnd, measure, value, unit, and aggregation; in the sites table, just the siteID and the site name (which is missing from your ERD). This is also supported by the way some of the EU countries are presenting data in their dashboards.
What do you think?
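
As a rough sketch of what that two-table minimum might look like (Python is used only for illustration; the `name` field for the site name, the `missing_fields` helper, and all example values are assumptions, not ODM definitions):

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Site:
    siteID: str
    name: str  # assumed field name for the "site name"


@dataclass
class Measure:
    measureRepID: str
    siteID: str       # links back to Site.siteID
    aDateEnd: date
    measure: str
    value: float
    unit: str
    aggregation: str


def missing_fields(row: dict, table) -> list:
    """Return the mandatory fields that are absent or empty in a raw row."""
    return [f.name for f in fields(table) if row.get(f.name) in (None, "")]


# Hypothetical example values, for illustration only.
site = Site(siteID="site01", name="Example treatment plant")
row = {"measureRepID": "m001", "siteID": "site01", "measure": "covN1", "value": 1200.0}
print(missing_fields(row, Measure))  # ['aDateEnd', 'unit', 'aggregation']
```

A check like this would let a new user see right away which of the mandatory fields their data is still missing.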

I agree that a lot of labs don’t report sample details. But I’m kinda of the opinion that there are two approaches here:

1- a true minimal template with basically just the measure, value, and the site name.

2- a more opinionated minimal template where we opine (somewhat) on what should be considered the minimal data. And for that I think we should include some sample details.

I think that we should maybe be trying to do the second option, but I’m open to moving on that depending on feedback from others.

Hi Matthew,

The minimal version sounds like a great idea. It’ll definitely make the ODM less intimidating and easier to get started with.

The question seems, to me, to be who the data is “minimal” for.

If an organisation always uses the same sampling protocol, then the sample data is sort of implied and known to them. However, as a potential user of open ODM datasets, I could not use datasets that don’t at least report some sampling details (was it sludge or wastewater? was it flow proportional or time proportional?). The fields you kept in the table look like a very good minimal set for that purpose.

Cheers!

A great point! Thanks, JD. I think - of course - you’re right, that for internal data management and tracking, much of this additional metadata may be moot. Specifically the sampling metadata. I think we should be aiming for data sharing/interoperability standards, even if data is only internal.

I think we could potentially trim this further by removing the reportable fields. We could also potentially drop compartment. I think we could even trim off some of the date fields, for example, and keep only collDT in samples and only aDateEnd in measures. Do we think we should keep fraction still? And for sites, do we think that latitude, longitude, and EPSG are all mandatory? Should we drop EPSG as a piece of the minimal standard?

I think fraction is probably still required, as it’s a very basic piece of data that can be informative about the analysis method. compartment is required if the minimal template is supposed to be usable outside of just wastewater. By removing the different dt’s, you get fewer fields, but you might create more ambiguity for the users, which is also not ideal.
And if geoEPSG is removed, harmonising datasets in GIS will be impossible (unless you impose an EPSG in the minimal structure).
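
To illustrate the EPSG point, here is a minimal sketch of how a consumer of the data might resolve the coordinate system per site. The `resolve_epsg` helper and the `geoLat`/`geoLong` field names are assumptions made for this example; EPSG:4326 (WGS 84 latitude/longitude) is used only as one possible imposed default.

```python
def resolve_epsg(site: dict, imposed_epsg=None) -> int:
    """Return the EPSG code to use for a site's coordinates.

    Uses the site's own geoEPSG if present, otherwise falls back to a code
    imposed by the minimal structure. Without either, the coordinates
    cannot be interpreted safely, so an error is raised.
    """
    code = site.get("geoEPSG")
    if code not in (None, ""):
        return int(code)
    if imposed_epsg is not None:
        return imposed_epsg
    raise ValueError(f"site {site.get('siteID')!r} has coordinates but no EPSG code")


# Hypothetical site row with geoEPSG dropped from the template.
site = {"siteID": "site01", "geoLat": 45.42, "geoLong": -75.69}
print(resolve_epsg(site, imposed_epsg=4326))  # 4326
```

If the field is dropped and no EPSG is imposed, the same call raises an error, which is the ambiguity I would want to avoid.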

Great comments.

  • The minimal example could/should be WW.

  • I wouldn’t have thought of not having sample, but I agree with the comments.

  • A key for the minimal example is to start with a wide format table, IMO.
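
To make the wide-format idea concrete, here is a small sketch that pivots long-format measure rows into one row per site and date. The `to_wide` helper, the `measure_unit` column naming, and the example values are all hypothetical, not an agreed ODM convention.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per measurement.
long_rows = [
    {"siteID": "site01", "aDateEnd": "2022-01-10", "measure": "covN1", "unit": "gcL", "value": 1200.0},
    {"siteID": "site01", "aDateEnd": "2022-01-10", "measure": "covN2", "unit": "gcL", "value": 950.0},
    {"siteID": "site01", "aDateEnd": "2022-01-11", "measure": "covN1", "unit": "gcL", "value": 1430.0},
]


def to_wide(rows):
    """Pivot long rows into one row per (siteID, aDateEnd)."""
    wide = defaultdict(dict)
    for r in rows:
        key = (r["siteID"], r["aDateEnd"])
        column = f'{r["measure"]}_{r["unit"]}'  # assumed column naming
        wide[key][column] = r["value"]
    return [
        {"siteID": site_id, "aDateEnd": day, **cols}
        for (site_id, day), cols in sorted(wide.items())
    ]


for row in to_wide(long_rows):
    print(row)
```

Each wide row then reads like a spreadsheet line (site, date, one column per measure), which is probably closer to how many labs already keep their data.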