A bit of a one-off thought, but I was wondering what people thought about creating a data quality flag for mapped data.
The idea came to me as I’ve been developing the “translation keys” for mapping between the ODM and other models/data structures. As I go through, I sometimes think “this mapping works, and is true, but something is maybe also lost in this ‘translation’”. So I wonder if mapped data might benefit from such a flag, so that if there’s any oddness or missing mandatory fields, there’s an explanation for it. Or does that defeat the point of mapping and interoperability? Curious to hear people’s thoughts!
Information about the mapping process would be helpful to capture. This issue combines data quality and data provenance.
Not to make the issue more complicated, but it is common to go through several data mapping stages before reaching the ODM, and mappings can also occur afterwards.
What about identifying what the data was mapped from? The datasets table may be the most appropriate location, and we would likely need a new field, something like “original data dictionary”. Ideally, we would also support recording a chain of mappings or data manipulations, but knowing how the data was first collected may be the most important step.
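To make the idea concrete, here’s a minimal sketch of what recording a chain of mappings alongside a dataset could look like. All names here (`MappingStep`, `mapping_chain`, the example source value) are hypothetical illustrations, not part of the ODM spec:

```python
from dataclasses import dataclass, field

@dataclass
class MappingStep:
    """One stage in the chain of mappings a dataset went through."""
    source_dictionary: str  # data dictionary/structure the data was mapped from
    target_dictionary: str  # what it was mapped to, e.g. "ODM"
    notes: str = ""         # anything lost or altered in the "translation"

@dataclass
class Dataset:
    dataset_id: str
    # Ordered oldest-first: the first element records how the data
    # was originally collected, which may be the most important step.
    mapping_chain: list = field(default_factory=list)

ds = Dataset("ds-001")
ds.mapping_chain.append(
    MappingStep("lab spreadsheet", "ODM", "free-text site names were collapsed")
)
# The original data dictionary is just the first link in the chain.
original_source = ds.mapping_chain[0].source_dictionary if ds.mapping_chain else None
```

Even if we only keep a single “original data dictionary” field for now, storing it as the first link of a chain like this would leave room to add later mapping stages without a schema change.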
Having a data quality flag could work well. However, knowing how the data was mapped to/from the ODM would itself give a good indication of the quality of the mapping process.
I’m still inclined to agree with your idea here, @dmanuel, to add a field for “original data structure” or something to that effect. I agree that it makes the most sense in the datasets table as well. @jeandavidt - any thoughts?
After team discussion, it was decided to add an “originalFormat” field to the datasets table. I’ll come up with a category set for this field as well, to try to keep it relatively clean. It’ll show up in red in the ERD for the next little while, but this resolves the issue.
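As a rough sketch of how a category set could keep the field clean, validation against a controlled vocabulary might look like the following. The category values below are placeholders I made up; the real set for “originalFormat” is still to be defined:

```python
# Hypothetical controlled vocabulary for the new "originalFormat" field.
# These values are illustrative only, not the agreed category set.
ORIGINAL_FORMAT_CATEGORIES = {
    "odm",                # data was collected directly in the ODM
    "customSpreadsheet",  # ad hoc lab or site spreadsheet
    "limsExport",         # export from a laboratory information system
    "otherModel",         # mapped from another published data model
    "unknown",            # provenance not recorded
}

def validate_original_format(value: str) -> str:
    """Return the value if it is in the category set, else fall back to 'unknown'."""
    return value if value in ORIGINAL_FORMAT_CATEGORIES else "unknown"
```

Funnelling free-text entries through a check like this is one way to keep the field from accumulating dozens of near-duplicate spellings.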