I’d like to submit this proposal for consideration by the committee. This would be included in version 3.2 of HSDS.
For background on the discussions leading to this proposal, see the original forum post.
This looks good to me.
We’ve asked the committee to vote on this using a Google Form. @skyleryoung made an interesting comment which I want to share here for responses.
I’m voting yes, but I do have questions about implementation details. This augments the API, but not the specification, is that correct? In other words, how would this get expressed in the tabular version of the schema definition? I’m not sure maintaining that is the highest priority, but if we’re diverging the two versions of the specification we should acknowledge and discuss it.
Here are my initial reckons, very happy to discuss further.
This augments the API, but not the specification, is that correct?
Yes, that’s correct
In other words, how would this get expressed in the tabular version of the schema definition?
Based on the current proposal, it wouldn’t. If we wanted to, here are some initial thoughts.
We have the metadata object already, but that is currently for record level information. Maybe it could be expanded and adapted for this purpose.
OCDS has a release package schema which is the most direct tabular comparison I can think of to this. Introducing something like this would be a major overhaul.
If we did either of the above we’d probably only want to include publisher information, not developer? Something to think about.
if we’re diverging the two versions of the specification we should acknowledge and discuss it
We already have some data that is returned by GET / that, as far as I know, isn’t included in the tabular version.
I think this is still important to consider though.
I think clarity could be improved by providing some definitions.
First, a definition of terms:
- What do we mean by “publisher”?
- What do we mean by “developer”?
- What do we mean by “provenance”?
Second, where exactly in the JSON schema will these fields be located? Are they sent once at the top-level for each API? Or are they sent with each record?
I acknowledge that I’m more accustomed to thinking from a specification standpoint as opposed to an API standpoint, so that may be where I got lost.
Thanks @skyleryoung, here are the definitions. I can incorporate these into the proposal doc if needed.
term | definition |
---|---|
publisher | the organisation responsible for publishing the entire data set/API. This could be an organisation who are publishing data about their own services or it could be an organisation combining data from various sources |
developer | the organisation developing the API. This could be the same as the publisher if they have in house capacity to do this. Or it could be an organisation that has been contracted by the publisher. |
provenance | in this case refers to the overall responsibility for a whole data set, rather than the original source of each record which could vary across records. We aren’t providing information about how the data has potentially been collected/transformed to get to the current state. |
As @MikeThacker has found an alternative way to feed this info into the UK dashboard, I would be interested to hear thoughts about whether the developer information is still useful to include.
It could be (e.g. if someone noticed common issues across APIs made by a particular developer) but perhaps the publisher info is the most important thing.
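To make the "common issues across APIs made by a particular developer" idea a bit more concrete, here’s a rough sketch of what a dashboard could do if developer info were available. Everything in it is an assumption for illustration: the `feeds` structure, the `developer` and `issues` keys, and the `recurring_issues` helper are hypothetical and not part of the proposal.

```python
from collections import Counter, defaultdict


def recurring_issues(feeds, min_count=2):
    """Group issues by developer and keep those seen on at least min_count APIs.

    `feeds` is a hypothetical list of dicts, one per API, each with a
    "developer" name and a list of "issues" found for that API.
    """
    counts = defaultdict(Counter)
    for feed in feeds:
        # set() so an issue reported twice for the same API only counts once
        counts[feed["developer"]].update(set(feed["issues"]))
    return {
        developer: sorted(issue for issue, n in counter.items() if n >= min_count)
        for developer, counter in counts.items()
    }


# Made-up sample data, purely illustrative
feeds = [
    {"developer": "Dev A", "issues": ["invalid dates", "missing openapi_url"]},
    {"developer": "Dev A", "issues": ["invalid dates"]},
    {"developer": "Dev B", "issues": ["missing openapi_url"]},
]

result = recurring_issues(feeds)
print(result)
```

Without the developer field, the same grouping could only be done per publisher, which is the trade-off being weighed here.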
Where exactly in the JSON schema will these fields be located? Are they sent once at the top-level for each API? Or are they sent with each record?
They would be sent once at the top level, in the `GET /` response, alongside the fields `version`, `profile` and `openapi_url`.
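As a rough illustration of "once at the top level", here’s a minimal sketch of how a `GET /` payload might be assembled. The `version`, `profile` and `openapi_url` fields come from this thread; the `publisher` and `developer` key names, their `name` sub-field, the example URLs and the `build_root_response` helper are assumptions on my part, and the proposal doc is the place to check the real shape.

```python
import json


def build_root_response(version, profile, openapi_url, publisher=None, developer=None):
    """Assemble a GET / style payload.

    Publisher and developer details sit once at the top level,
    next to the existing fields, rather than on each record.
    """
    response = {
        "version": version,
        "profile": profile,
        "openapi_url": openapi_url,
    }
    # Hypothetical key names and shapes, used here for illustration only
    if publisher is not None:
        response["publisher"] = publisher
    if developer is not None:
        response["developer"] = developer
    return response


example = build_root_response(
    version="3.2",
    profile="https://example.org/profile",
    openapi_url="https://example.org/openapi.json",
    publisher={"name": "Example Publisher"},
    developer={"name": "Example Developer"},
)
print(json.dumps(example, indent=2))
```

Leaving both arguments optional in the sketch reflects the open question above about whether developer information is still worth including.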
I don’t love the term `developer` here but I don’t have a suggested alternative yet.
I mostly just want to chime in to suggest `steward` as the designated holder of responsibility for a given record or set of records.
Would this be in regard to Publisher/source metadata at a record level, or as an alternative to `publisher` here?
Yeah, I mean these should be subject to further consideration, but I think there’s the publisher of the aggregated dataset, the steward of individual records, and then there should be a term for a representative of the organization that the record is about (maybe that’s `source`, though in other contexts I use `registrant` or `representative`).
Thanks @bloom, I’m going to copy your comments over to the metadata at a record level thread, which covers future developments, so we can discuss more there.