Extra metadata with API feeds

I’d like to propose extra optional fields to be returned by the root/stub API endpoint.

Specifically, to allow the Dashboard (a new version is under development) to be documented entirely from a feed under the publisher’s control, we could add:

  • Organization name
  • Organization URL
  • Developer name
  • Developer URL
  • Summary text

There may be extra data (like an administrator email address) which should not be in an open feed and should still be communicated manually.
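
As a rough sketch of how the proposed optional fields might sit alongside the existing `version` field in the root endpoint response: the snake_case field names, the `RootResponse` class and the sample values are all assumptions for illustration, not agreed names.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class RootResponse:
    # "version" is the existing field discussed in this thread. Every name
    # below it is a hypothetical label for one of the proposed optional fields.
    version: str
    organization_name: Optional[str] = None
    organization_url: Optional[str] = None
    developer_name: Optional[str] = None
    developer_url: Optional[str] = None
    summary: Optional[str] = None


def to_payload(resp: RootResponse) -> dict:
    """Leave out unset optional fields so the payload only grows when a publisher opts in."""
    return {key: value for key, value in asdict(resp).items() if value is not None}


# Placeholder values only; a publisher would fill in whichever fields it chooses.
example = to_payload(
    RootResponse(
        version="3.0",
        organization_name="Example Council",
        organization_url="https://example.org",
    )
)
```

Because every new field defaults to `None` and is dropped when unset, a feed that supplies none of them would produce the same payload as before.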

Jumping in here for the first time to piggyback on this: I’d also like to propose that the version field returned from the same GET / endpoint should be a constant with a value of HSDS-UK-3.0 or something similar.
To me it becomes a much more useful field if it can be relied upon to describe the version in a constant format, particularly going forward if each new version of the standard used a constant field here as well.

As for Mike’s original proposal, the extra fields would be useful for our current development and could be useful for others trying to consume these APIs as well.

As agreed at the last technical standing committee meeting, I’ve drafted MBT10038 - Use cases for extra metadata with API feeds.

I’ve opened the link to comments by anyone and will accept all sensible suggested changes, or raise them with the committee if controversial.

@klambacher I think you said you’d review and add to this. Feel free to change my text as you think fit.

Thanks @MikeThacker for pointing me to this - just noting that having this field as a constant wouldn’t work for minor or patch releases due to backwards compatibility requirements (e.g. data that conforms to 3.0 also conforms to 3.1)

However we could still have a fixed list of acceptable versions or a pattern
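
To make the "pattern" option concrete, here is a minimal sketch of how a consumer might check the version string. It assumes a format of `HSDS-UK-MAJOR.MINOR` with an optional patch number. That format, the `parse_version` and `is_supported` helpers, and the rule of accepting any release within the same major version are assumptions for illustration only.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "HSDS-UK-" then MAJOR.MINOR and an optional .PATCH
VERSION_PATTERN = re.compile(r"HSDS-UK-(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(value: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch), or None if the string does not match."""
    match = VERSION_PATTERN.fullmatch(value)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_supported(value: str, major: int) -> bool:
    """Accept any minor or patch release within the given major version."""
    parsed = parse_version(value)
    return parsed is not None and parsed[0] == major
```

With this approach a consumer written against 3.x would accept both `HSDS-UK-3.0` and `HSDS-UK-3.1`, while a free-text value or a different major version would be rejected.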

As we discussed, I wanted to chime in with some of the metadata that we have needed in the past and why; some is tied to the file / feed and some is per-record. I’ll break it down by who “needs” the extra information and why.

  1. System / software level:
    We have different protocols for import based on the source system, even with the same incoming format, and our external ID storage is actually system code + external ID. The system code specifically identifies the software the data came from, though theoretically we could accept system codes that cover multiple sources running the same software. This allows distinct handling based on source (e.g. taxonomy or coverage area name transformations). For non-GUID identifier systems (not an issue with HSDS but an issue for some systems we manage) it also allows a unique ID to be created via system code + external ID, even in cases where a record comes in with the same ID but gets forked / duplicated for various reasons. This is also distinct from what we would call the source database name/URL, since we can have the same software or version but multiple sources / feeds.

  2. Funder / attribution:
    In some cases we have data from multiple data managing organizations or multiple public source sites coming out of the same software system in the same file / feed. This means attribution at the system level is not sufficient, and needs to be available per-record, for funder verification of contributions + data quality analysis, and public record attribution in websites etc.

  3. Ability to request changes and report issues:
    We need both an overall data source / provenance AND, where possible, the method to use to request corrections to specific data; this is key for trustworthiness and reliability for users.
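
The system code + external ID idea from point 1 can be sketched as a composite key. This is only an illustration of the general technique: the `SourceKey` type, the `make_key` helper and the sample system codes are hypothetical, not a description of any particular import pipeline.

```python
from typing import Dict, NamedTuple


class SourceKey(NamedTuple):
    system_code: str   # identifies the source software system
    external_id: str   # the ID as supplied by that system


def make_key(system_code: str, external_id: str) -> SourceKey:
    """Combine the two parts so IDs only need to be unique within one system."""
    return SourceKey(system_code, external_id)


# Two systems can reuse the same external ID without colliding on import.
records: Dict[SourceKey, dict] = {}
records[make_key("system-a", "1001")] = {"name": "Food bank"}
records[make_key("system-b", "1001")] = {"name": "Advice line"}
```

Both records are kept because the key differs in its system code, even though the external ID is identical.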

In sum…

At the file/feed level we would have:

  • Source system code (consistent system-internal identifier for the source software system, which is used for unique ID formation + specialized import handling)
  • Source system name (“user-friendly” database source name in all applicable languages for display purposes)
  • System URL (all applicable languages, this is to the website of the data owners / public database source NOT the software vendor)
  • Source system version
  • Schema version

At the record level we would have:

  • Record owner agency information (we have a unique code + name for attribution purposes and funder use)
  • URL for submission of changes to the record (to account for the public and/or other end user wanting to suggest possible corrections to the data when they do not have direct editorial permissions)
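
As a sketch of how the two levels summarised above might be modelled: the class and field names are assumptions, and the per-language values are assumed to be keyed by language code, which is one possible way of handling "all applicable languages".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeedMetadata:
    source_system_code: str              # stable internal identifier
    source_system_name: Dict[str, str]   # language code -> display name
    system_url: Dict[str, str]           # language code -> data owner's site
    source_system_version: Optional[str] = None
    schema_version: Optional[str] = None


@dataclass
class RecordMetadata:
    owner_agency_code: str
    owner_agency_name: str
    change_request_url: Optional[str] = None


def display_name(feed: FeedMetadata, lang: str, fallback: str = "en") -> str:
    """Pick the display name for a language, falling back to the system code."""
    names = feed.source_system_name
    return names.get(lang) or names.get(fallback) or feed.source_system_code
```

The fallback order in `display_name` (requested language, then a default language, then the system code) is a design choice for the sketch rather than anything proposed in the thread.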

Thanks Kate. I’ll leave others to assess how much these requirements apply to data publishers in general. Just a couple of comments:

  • We already have Schema version (as well as profile)
  • A URL (as well as possible email address) for submitting changes would be useful for the feed as a whole, even if some use record level data instead or use the record-level value as an override
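
The override behaviour in the second bullet could look something like the sketch below, where a record-level value wins and the feed-level URL or email is the fallback. The key names `change_request_url` and `change_request_email` and the function name are hypothetical.

```python
from typing import Optional


def change_request_contact(feed: dict, record: dict) -> Optional[str]:
    """Prefer the record-level URL, then the feed-level URL, then the feed email."""
    return (
        record.get("change_request_url")
        or feed.get("change_request_url")
        or feed.get("change_request_email")
    )


feed = {
    "change_request_url": "https://example.org/feedback",
    "change_request_email": "data@example.org",
}
record_with_override = {"change_request_url": "https://example.org/records/42/suggest"}
record_without = {}
```

Here `record_with_override` resolves to its own URL, while `record_without` falls back to the feed-level URL.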

I’ve started writing this up in the proposal template combining Mike and Kate’s suggestions.

Kate, you’ve mentioned having record-level/feed-level metadata.

Mike - with your original proposal, if we had a data republisher making a new feed combining several feeds, the “organization_name” for the mega feed would be the name of the republisher? In which case, does this meet the user need regarding provenance of the data?

If we want to capture original source publishers then it might be we need to look at record level metadata for this aspect.

I would expect the publisher to be the organization combining the feeds, and so taking some responsibility for the resultant content.

For simplicity I’d suggest that the summary text could include details of the included feeds if the publisher wants to put that there.

In our verbal conversation, we also mentioned an optional contact email address for the publisher. This could be used to direct queries on content which the publisher could pass to the original publisher or just put the two parties in touch with one another.

I’d suggest waiting to see if there’s demand for that. Note that a republisher could be republishing another combined feed, so the ultimate solution would be something recursive!
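
Purely to illustrate what "something recursive" could mean, and not as a proposal, here is a sketch in which each feed can list the feeds it republishes. The `FeedInfo` class and its `sources` field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeedInfo:
    organization_name: str
    # Feeds that this publisher combines; empty for an original publisher.
    sources: List["FeedInfo"] = field(default_factory=list)


def original_publishers(feed: FeedInfo) -> List[str]:
    """Walk down through republished feeds to the publishers with no sources."""
    if not feed.sources:
        return [feed.organization_name]
    names: List[str] = []
    for source in feed.sources:
        names.extend(original_publishers(source))
    return names


regional = FeedInfo("Regional hub", [FeedInfo("Council A"), FeedInfo("Council B")])
national = FeedInfo("National aggregator", [regional, FeedInfo("Charity C")])
```

Calling `original_publishers(national)` walks through the regional hub and returns the three organizations that have no sources of their own.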

I have written this up in a proposal, Extra metadata with API feeds, and this will be added to the agenda for the next committee meeting.

This doesn’t include:

  • record level metadata, which is something we may want to consider in the future
  • standardising/restricting the format of the schema field that is already present. This is something we could look into separately as a PATCH level proposal