I have a few comments/questions about the polygon table:
If the polygons reported are WWTP catchment areas, are the values reported for the variable polypop the same as those for popServ in the site table?
We get an update of the equivalent inhabitants behind each WWTP roughly every year (equivalent inhabitants can change slightly over time depending on population movements and modifications to the sewage network).
polypop and popServ only allow a single value to be reported, but it would be better to report the evolution of the equivalent inhabitants over the years. How should all the updates of the equivalent inhabitants be reported in the ODM (keeping in mind that we use the inhabitant counts to calculate the viral load, so each COVID measure should be easily linkable to its corresponding yearly inhabitant value)? By creating a new entry for a WWTP in the site table (and for its catchment area in the polygon table) each time the inhabitant count is updated? Or, more complex but more efficient, by creating a separate table for population data, at the cost of making it harder to link a measure with a specific year's inhabitant value?
Why not use a geography/geometry variable type for geoWKT in the polygon table and geoLat/geoLong in the site table instead of varchar? Databases have a set of tools for querying geographical/geometric data.
And finally, a more long-term thought about the polygon table: I would include in this table all the geographical data that describe a site, not only polygons. That would cover the location of the site (a point object in GIS systems), polygons (a second GIS object type) for catchment areas, and even lines (the third GIS object type) for a sewage network. I would also rename the table Geometry or something similar.
The polygons can really be anything (i.e. administrative areas, health zones, etc.), not necessarily a catchment area. But yes, if it's a catchment area served by a single WWTP/site, then polypop and popServ could be the same.
Any time a parameter is changed in the "yellow" tables (e.g. you get updated values for the equivalent population), an updated dataset with the changed values should be provided. The philosophy here is that the known (old) values remain valid until they are changed by an update (no need to provide a full dataset with each set of, say, weekly measurements). That's what the lastEdited dates are used for.
Geometry can be problematic when the projection is missing, so we insist on using geography and specifying the EPSG code. Also, different countries have different conventions for reporting geographies and coordinates. We believe that using varchar allows for maximum flexibility and inclusiveness. In any case, the ODM ERD that you see is, if you want, the customer-facing interface, not the database implementation, which comes after a set of ETL processes that will, of course, validate and convert all the inputs to proper database data types.
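To make the ETL point concrete, here is a minimal sketch of the kind of validation/conversion step such a pipeline might apply to the varchar fields. The function names and the range/WKT checks are illustrative assumptions, not part of the ODM itself:

```python
import re

def parse_coord(value: str, lo: float, hi: float) -> float:
    """Convert a varchar coordinate (e.g. geoLat/geoLong) to a float
    and range-check it before loading into a typed column."""
    coord = float(value)
    if not lo <= coord <= hi:
        raise ValueError(f"coordinate {coord} outside [{lo}, {hi}]")
    return coord

def validate_wkt_polygon(wkt: str) -> str:
    """Minimal structural check of a varchar geoWKT value: it must
    look like a WKT POLYGON before conversion to a geography type."""
    if not re.fullmatch(r"\s*POLYGON\s*\(\(.+\)\)\s*", wkt, re.IGNORECASE):
        raise ValueError("not a WKT POLYGON string")
    return wkt.strip()

lat = parse_coord("50.8503", -90, 90)     # geoLat as reported in varchar
lon = parse_coord("4.3517", -180, 180)    # geoLong as reported in varchar
poly = validate_wkt_polygon("POLYGON((4.30 50.80, 4.40 50.80, 4.40 50.90, 4.30 50.80))")
```

A real implementation would of course also check the declared EPSG code and hand the validated string to the database's own geography parser.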
Each point/site can be associated with multiple polygons and, moreover, can be a parent for other points/sites (think of all the manholes in a sewershed). Lines are just degenerate polygons, so yes, we can even store the network if we really want to. Regarding the use of Geometry, again, I'd avoid it at all costs. It might work at a local level but not when dealing with international reports involving multiple projections.
Regarding the second point, my question was rather how to handle this in a database that would use the ODM for its structure. We could of course modify it on our own to take multiple population values across time into account, but wouldn't it be better to address this limitation in a next version and provide a solution in the ODM itself?
These population values across time can be important for end users if, for instance, they want to calculate the viral load (viral load = conc * daily flow / (pop * 100.000)). In Belgium we always use the viral load for reporting, decisions, etc., never the concentration alone, and the population values used in the calculation are those for the year of the sampling day on which the concentration was measured.
If we only provide the latest update of the population value, end users will never be able to reuse and reproduce calculations like ours.
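The calculation above can be sketched as a small function. This follows the formula exactly as written (conc * daily flow / (pop * 100.000)); the parameter names and units in the comments are assumptions for illustration:

```python
def viral_load(conc: float, daily_flow: float, pop: int) -> float:
    """Viral load per the formula quoted above.

    conc       -- measured concentration in the sample (e.g. gene copies/L)
    daily_flow -- daily influent flow at the WWTP (e.g. L/day)
    pop        -- equivalent inhabitants for the sampling year
    """
    return conc * daily_flow / (pop * 100_000)

# Same sample, two different yearly population values: the result differs,
# which is why the historical population value matters for reproducibility.
load_2021 = viral_load(1_000.0, 2_000_000.0, 100_000)
load_2022 = viral_load(1_000.0, 2_000_000.0, 103_000)
```

The two calls illustrate the reproducibility concern: with only the latest population value, `load_2021` could no longer be recomputed.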
The only possibility I see with the current ODM is to add an extra line for each WWTP in the sites table each time the population value is updated.
I don’t really see a problem, but maybe I don’t properly understand your concern.
As I see it, for a particular site/polygon you will build up a history of popServ/polypop values every time you provide a new dataset with this value updated. We won’t erase the old value and overwrite it with the new one. Of course, the lastEdited date should be updated with each addition or change of any value in the site or polygon tables.
Your measurements come from samples with a colDTstart and colDTend (collection start and end dates), which will always fall either between two lastEdited dates of the popServ/polypop history or between the last lastEdited value and the present date. With this, you can associate the measurements made on the samples with the proper popServ/polypop.
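The lookup described above can be sketched as follows: keep the popServ history as (lastEdited, popServ) pairs and, for a given sample, take the value whose lastEdited date most recently precedes the collection start. The history values and the helper name are hypothetical, for illustration only:

```python
from datetime import date

# Hypothetical popServ history for one WWTP: (lastEdited, popServ).
history = [
    (date(2020, 1, 15), 98_000),
    (date(2021, 2, 1), 101_500),
    (date(2022, 1, 20), 103_000),
]

def pop_for_sample(col_dt_start: date, history: list[tuple[date, int]]) -> int:
    """Return the popServ value in force on the sample's collection start:
    the entry with the latest lastEdited not after col_dt_start."""
    valid = [pop for edited, pop in sorted(history) if edited <= col_dt_start]
    if not valid:
        raise ValueError("sample predates the first popServ record")
    return valid[-1]

pop_for_sample(date(2021, 6, 10), history)  # the 2021 value applies here
```

The same pattern works for polypop in the polygon table; only the history source changes.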
Does this answer your concern?
Yes, I think this addresses my concern.
The only downside, I believe, is that the sites table entry will have to be duplicated each time popServ is updated. Maybe it would make more sense to store popServ in a separate table with links to siteID and measureID; this way each WWTP would be reported only once in the sites table.
But that would of course be more of a long-term improvement to the ODM.