We would like support for ingesting epi week and epi year average data into the measures table.
The data looks like the following:
| Location (site/city or PT) | EpiYear | EpiWeek | Week Start | Measure | Value |
| --- | --- | --- | --- | --- | --- |
| Toronto Ashbridges Bay (TAB) | 2025 | 1 | 2024-12-29 | covN2 | 24.9311116078494 |
| Toronto Highland Creek (THC) | 2025 | 1 | 2024-12-29 | covN2 | 189.486292492068 |
| Toronto Humber (THU) | 2025 | 1 | 2024-12-29 | covN2 | 176.4997421 |
| Toronto North Toronto (TNT) | 2025 | 1 | 2024-12-29 | covN2 | 423.6073144 |
In the current setup, we believe this data can feed into the measures table. However, we have an issue linking each value to the samples table: a value in the measures table should be linked to only one sampleID, but an average reported by epi week and year is based on more than one sample from that week.
Any thoughts?
Thanks
FYI @NHizon: feel free to add any additional details you can think of.
In a previous chat with @NHizon we added a new aggregation option, “epi week mean”. At the time, we did not provide a way to record which epi week a value applied to, nor did we create the infrastructure or guidance needed to derive it.
Looking forward, we see that more people are using epi weeks, so being able to better ingest or record epi-week-associated data, and having better infrastructure for epi week averages, is becoming more important.
As a result, in discussion with @Sorin, @asmabahamyirou, and @nikho this morning, I floated the idea of adding epiWeek, epiYear, and epiWeekStart to both the measures and the samples tables. These would join the other date fields as optional fields, to allow a more diverse range of options when recording date information. This does err on the side of redundancy, since each way of recording the date lets you infer parts of the others, but not always the whole. It also clarifies, for epi-week-level measures, which epi week a value specifically refers to.
There are two possible redundancies to address:
Having the epi week reported in both samples and measures might be overkill, as the epi week of one is the epi week of the other. It might make more sense to record it only in measures (i.e. noting the epi week to which the measure applies).
The epi year is mostly obvious from the epi week start date (with the possible exception of the first week of the year). The epi week start date, though, is essential, since even with a standardized week people still calculate the start and end times differently. Do we keep all three? Or just epiWeek and the epi week start date? Curious for your quick thoughts, @jeandavidt and @dmanuel.
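To make the redundancy question concrete, here is a rough Python sketch of how the three values relate under one common convention: weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the calendar year. That convention is an assumption on my part (it is consistent with the 2024-12-29 start for week 1 of 2025 in the table above), and the function names are hypothetical. Sites that define their weeks differently would get different results, which is the argument for recording the start date explicitly.

```python
from datetime import date, timedelta


def epi_week_start(d: date) -> date:
    """Return the Sunday on or before d (assumes Sunday-to-Saturday weeks)."""
    # date.weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7
    # is the number of days elapsed since the most recent Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def epi_year_week(d: date) -> tuple[int, int, date]:
    """Return (epiYear, epiWeek, epiWeekStart) for a calendar date."""
    start = epi_week_start(d)
    # A week belongs to the year that holds at least four of its days,
    # which is the year of its Wednesday (start + 3 days).
    epi_year = (start + timedelta(days=3)).year
    # Under this rule, week 1 is always the week containing January 4.
    week1_start = epi_week_start(date(epi_year, 1, 4))
    epi_week = (start - week1_start).days // 7 + 1
    return epi_year, epi_week, start


print(epi_year_week(date(2024, 12, 31)))
```

Note that the epi year and epi week can both be derived from the start date once the convention is fixed, but the start date cannot be recovered from the week and year alone unless the convention is known.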
Finally, in re-reading your post here @asmabahamyirou, I think I see what you were trying to say this morning. Basically: “how do I link one measure based on 7 (or more) samples to the samples table?” For this, I think we turn to the sample relationships table. You would set up a sample ID that is missing most of the metadata, because it actually represents a series of samples, and then define in the sample relationships table that each individual sample is part of that series. Let me know if this makes sense, and whether I understood correctly. I can also put together a better worked example to showcase what I mean.
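In the meantime, here is a minimal sketch of the idea using plain Python dictionaries as stand-ins for table rows. The sampleIDSubject and sampleIDObject fields and the series-TAB-01 ID are the ones used in this thread; the member sample IDs and the helper function are hypothetical, and real rows would carry more columns.

```python
# Samples table: two individual samples plus one "series" sample
# that stands for the whole week (member IDs are made up).
samples = [
    {"sampleID": "TAB-s1"},
    {"sampleID": "TAB-s2"},
    {"sampleID": "series-TAB-01"},
]

# Sample relationships table: each individual sample (subject)
# is declared to be part of the series (object).
sample_relationships = [
    {"sampleIDSubject": "TAB-s1", "sampleIDObject": "series-TAB-01"},
    {"sampleIDSubject": "TAB-s2", "sampleIDObject": "series-TAB-01"},
]

# Measures table: the weekly mean points at the series sample only,
# so the "one value, one sampleID" rule still holds.
measures = [
    {"sampleID": "series-TAB-01", "measure": "covN2", "value": 24.9311116078494},
]


def members_of(series_id: str, relationships: list[dict]) -> list[str]:
    """List the individual samples that make up a series sample."""
    return [
        r["sampleIDSubject"]
        for r in relationships
        if r["sampleIDObject"] == series_id
    ]


# Every ID used in a relationship must exist in the samples table.
known_ids = {s["sampleID"] for s in samples}
all_ids_known = all(
    r["sampleIDSubject"] in known_ids and r["sampleIDObject"] in known_ids
    for r in sample_relationships
)

print(members_of(measures[0]["sampleID"], sample_relationships))
```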
Looking forward to hearing other folks’ thoughts!
1- Yes! The sampleIDObject and sampleIDSubject fields both have to be sampleIDs from the samples table. In this case both relationships feature the same object, because the two subjects are part of that series. The series-TAB-01 sample is then defined as a series sample in the samples table, and referenced in the measures table.
2- No particular reason. It made sense to me to use the date of the first sample as the collection date for the series. But it would perhaps be more accurate to use collDTStart and collDTEnd to capture the full span of collection times for the samples in the series. Does that make sense?
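As a rough illustration of that second point, the series window could be derived from the member samples like this. The collDTStart and collDTEnd names are the fields mentioned above; the member IDs, their collection times, and the helper function are made up for the example.

```python
from datetime import datetime

# Hypothetical collection times for the individual samples in the series.
member_coll_times = {
    "TAB-s1": datetime(2024, 12, 30, 9, 0),
    "TAB-s2": datetime(2025, 1, 2, 9, 0),
}


def series_window(times: list[datetime]) -> dict[str, datetime]:
    """Span of a series: earliest and latest member collection time."""
    if not times:
        raise ValueError("a series needs at least one member sample")
    return {"collDTStart": min(times), "collDTEnd": max(times)}


window = series_window(list(member_coll_times.values()))
print(window["collDTStart"].isoformat(), window["collDTEnd"].isoformat())
```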