My colleagues at ODS and I put our heads together to get a good response out to 360Giving’s great feedback. It follows below. Sorry in advance for the mini-essay!
Firstly, thanks to both Marion and 360Giving for taking the time to feed back on HSDS 3.0. We especially appreciate that the feedback was both positive and constructive, and it means a lot coming from 360Giving.
Organization Identifiers
You are correct, and we’re very grateful that you raised this ambiguity as an issue. It’s not currently clear in our documentation what some of the fields within organization_identifier are for, and we’ll be taking action to clarify this. To respond to you directly:
- organization_id is the join for the tabular data to the organization. This is a uuid, and is different to the id of the organization_identifier object (discussed below). We see how this naming convention produces confusion given that it is within the organization_identifier object. We will seek to clarify this through documentation.
- identifier_scheme is the org-id scheme taken from org-id.guide.
- identifier_type is a human-readable version of identifier_scheme and acts as a description. It also covers cases where there’s a need to describe a scheme which is not present in org-id.guide, since publishers often need to publish faster than org-id.guide adds requested schemes.
- identifier is the actual identifier string e.g. a GB-COH number.
The field title “Third Party Identifier” (for the identifier field) is named to show that it is not the publisher’s identifier used for this organization, but one drawn from an external list.
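To make the field roles above concrete, here is a minimal sketch of a single organization_identifier record in Python. Only the field names come from the discussion; the values (including the Companies House number) and the describe helper are made up for illustration.

```python
import uuid

# Hypothetical sample values; only the field names come from the discussion above.
organization_id = str(uuid.uuid4())

organization_identifier = {
    "id": str(uuid.uuid4()),             # uuid for this identifier record (the "row")
    "organization_id": organization_id,  # join back to the organization
    "identifier_scheme": "GB-COH",       # scheme code taken from org-id.guide
    "identifier_type": "UK Companies House number",  # human-readable description
    "identifier": "01234567",            # made-up third party identifier string
}


def describe(record: dict) -> str:
    """Render an identifier record as 'SCHEME-IDENTIFIER (description)'."""
    return (
        f'{record["identifier_scheme"]}-{record["identifier"]} '
        f'({record["identifier_type"]})'
    )


print(describe(organization_identifier))
```

Note that id and organization_id are two different uuids: the first identifies the identifier record itself, the second points at the organization it belongs to.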
As for the “identifier for the identifier”: this is the result of a few decisions which were made consciously to support publishers and data users.
With 3.0, HSDS is transitioning from a data package model (where different entities were represented in tables) towards a JSON schema model more suited to APIs. We are supporting publishers who continue to use the data package model through tooling to convert between the two.
This requires that each entry in the organization_identifier table has a uuid identifier for the “row”. This supports the conversion between formats, allowing publishers to continue publishing via data packages. Alongside this, HSDS does not have a strong concept of a single “top-level object”, unlike other JSON standards which can confidently model based on e.g. “a procurement” (OCDS) or “a grant” (360Giving). id fields therefore become important features of each object definition to allow for multiple representations based on the needs of the data user.
Another benefit of id fields for the organization_identifier object is that publishers can attach multiple organization identifiers to an organization. Different publishers may identify the same organization using different schemes, so having multiple organization identifiers helps with interoperability between HSDS data sets as well as with other standards. Other standards also take this approach e.g. in OCDS, the organization object has an identifier field but also has an array of additionalIdentifiers for the same reason.
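As a rough sketch of the interoperability point: if two publishers each attach several identifiers to an organization, a data user can match records on any (scheme, identifier) pair they have in common. The function name and sample values below are hypothetical; only the field names come from the discussion above.

```python
def shared_identifiers(records_a: list, records_b: list) -> set:
    """Return the (scheme, identifier) pairs present in both lists of records."""
    keys_a = {(r["identifier_scheme"], r["identifier"]) for r in records_a}
    keys_b = {(r["identifier_scheme"], r["identifier"]) for r in records_b}
    return keys_a & keys_b


# Publisher A knows the organization by two schemes (made-up values).
publisher_a = [
    {"identifier_scheme": "GB-COH", "identifier": "01234567"},
    {"identifier_scheme": "GB-CHC", "identifier": "7654321"},
]

# Publisher B only records one scheme for the same organization.
publisher_b = [
    {"identifier_scheme": "GB-CHC", "identifier": "7654321"},
]

matches = shared_identifiers(publisher_a, publisher_b)
print(matches)
```

Had publisher A been limited to a single identifier and chosen GB-COH, the two records would have had nothing in common to match on.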
For data users analyzing via spreadsheets and publishers publishing via data packages, this id field also makes it possible to represent the one-to-many relationship between organizations and identifiers.
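The one-to-many relationship can be sketched as a small conversion from tabular rows to nested objects. This is an illustration only, not the actual HSDS conversion tooling: the function name, the organization_identifiers array name, and the short placeholder ids (real ones would be uuids) are all assumptions.

```python
from collections import defaultdict

# Two "tables", as a data package or spreadsheet might hold them (placeholder ids).
organizations = [
    {"id": "org-1", "name": "Example Org"},
    {"id": "org-2", "name": "Another Org"},
]
organization_identifier_rows = [
    {"id": "row-1", "organization_id": "org-1",
     "identifier_scheme": "GB-COH", "identifier": "01234567"},
    {"id": "row-2", "organization_id": "org-1",
     "identifier_scheme": "GB-CHC", "identifier": "7654321"},
]


def nest_identifiers(orgs: list, identifier_rows: list) -> list:
    """Attach each identifier row to its organization via organization_id."""
    rows_by_org = defaultdict(list)
    for row in identifier_rows:
        rows_by_org[row["organization_id"]].append(row)
    # Organizations without identifiers get an empty list.
    return [
        {**org, "organization_identifiers": rows_by_org.get(org["id"], [])}
        for org in orgs
    ]


nested = nest_identifiers(organizations, organization_identifier_rows)
print(nested)
```

Because every row carries its own id as well as organization_id, the nested form can be flattened back into rows without losing track of which identifier record is which.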
Ultimately, as HSDS continues to focus on JSON publication in future versions of the standard, we expect discussions around top-level objects to play out constructively, which may mean that this becomes less of an issue. However, we will still need to cater for spreadsheet analysis and for applying multiple identifiers to an organization.
Location updates
Thanks for raising the concern about what3words. This echoes other discussions, and we will be removing the mention of what3words from the documentation due to community feedback.
In terms of your suggestion on a closed list of third party schemes for location identifiers, we agree that it is much better to standardize this data by restricting options. This also has the benefit of interoperability between datasets which include location information from the same schemes.
We could investigate including this as an optional feature in a future MINOR update, to support backwards compatibility with 3.0. From there, the next MAJOR version of HSDS could afford to be more decisive in restricting or standardizing these identifier schemes.
It is worth noting at this point that HSDS “profiles” may apply additional restrictions which reflect the needs of particular publishers and users. This may provide a way to address this in the short-term and feed into the discussions on how to approach this in future versions of HSDS.
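As a sketch of how a profile might apply such a restriction, the check below rejects location identifiers whose scheme is not on a closed list. Everything here is an assumption made for illustration: the location field names external_identifier and external_identifier_type, the contents of the allowed list, and the function name. It is not an official HSDS list or validator.

```python
# Hypothetical closed list that a profile might define; not an official HSDS list.
ALLOWED_LOCATION_SCHEMES = {"UPRN", "OSM"}


def check_location_scheme(location: dict, allowed: set = ALLOWED_LOCATION_SCHEMES) -> list:
    """Return a list of error messages for one location (empty if it passes)."""
    identifier = location.get("external_identifier")        # assumed field name
    scheme = location.get("external_identifier_type")       # assumed field name
    if identifier is None:
        return []  # the identifier stays optional, so nothing to check
    if scheme not in allowed:
        return [
            f"location {location.get('id')}: scheme {scheme!r} "
            "is not in the allowed list"
        ]
    return []


print(check_location_scheme(
    {"id": "loc-1", "external_identifier": "abc", "external_identifier_type": "what3words"}
))
```

Keeping the identifier optional and only checking the scheme when one is supplied is the kind of restriction that could sit in a profile without breaking compatibility with 3.0 data.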
In the long term, we must balance the benefits of standardizing the third party identifier schemes against the work required to decide on an initial list of these schemes, and then to manage this list. As noted, what3words can be considered problematic and we’d want to ensure that any scheme we “endorse” by including it is appropriate. Analysis of the HSDS corpus may provide some clues as to which schemes are used the most in practice. This would give us a good starting point for engaging the community around this topic for future upgrades.
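The corpus analysis could start as simply as counting scheme labels across published locations. The sketch below assumes the scheme is held in a location field called external_identifier_type (an assumption, not confirmed above) and uses made-up sample data; free-text labels are normalized only by trimming and lower-casing.

```python
from collections import Counter


def scheme_usage(locations: list) -> Counter:
    """Count how often each scheme label appears, ignoring case and whitespace."""
    return Counter(
        loc["external_identifier_type"].strip().lower()
        for loc in locations
        if loc.get("external_identifier_type")  # skip missing or empty labels
    )


# Made-up sample standing in for locations gathered from several publishers.
sample_locations = [
    {"external_identifier_type": "UPRN"},
    {"external_identifier_type": "uprn "},
    {"external_identifier_type": "what3words"},
    {"external_identifier_type": None},
    {},
]

usage = scheme_usage(sample_locations)
print(usage.most_common())
```

In practice the labels would likely need more careful reconciliation than this, since publishers may describe the same scheme in different words.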
If we did restrict the use of third party identifier schemes, we’d be keen to ensure that this didn’t pose a barrier for some publishers. Publishers in some areas (geographic or professional) may not be able to collect or produce data in certain schemes. We’d also want to avoid accidentally creating a list of schemes that could be framed as being too US/Europe-centric.
In practice, this may not turn out to be a problem but we’d welcome collaboration to learn from your experiences and the experiences of others to avoid this.
Funding
Funding information can be considered for a future version update, and perhaps added as an optional feature in a MINOR update, subject to the priorities of the HSDS community.
One barrier to publishing this data is that it can be difficult for some HSDS publishers to provide this themselves or gain access to it. This is because some may see it as not directly relevant to service discovery. Nevertheless, it would be straightforward to include an optional feature to describe the funding organization as well as the recipient organization in a future update of HSDS.
Longer-term, there are exciting opportunities here for interoperability with various other open data standards including 360Giving. If a service is known to be funded via grants then it makes sense to create links between data sets by referencing e.g. 360Giving grant identifiers. The same can be said for services funded through procurements and linking with OCDS datasets. This supports a fuller picture of service funding and drawing lines to evidence impact of spending on services. This is a larger piece of thinking which will require time to get correct, but we would be excited to approach this for future versions of the standard.
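Purely as a thought experiment, a link between a service and external funding records might look something like the sketch below. None of this exists in HSDS 3.0: the funding_links property, its fields, and the helper are hypothetical, and the identifier strings are placeholders rather than real grant or contracting process identifiers.

```python
# Hypothetical shape only; HSDS 3.0 defines no such property.
service = {
    "id": "svc-1",
    "name": "Example advice service",
    "funding_links": [
        {"standard": "360Giving", "identifier": "EXAMPLE-GRANT-ID"},
        {"standard": "OCDS", "identifier": "EXAMPLE-CONTRACTING-PROCESS-ID"},
    ],
}


def funding_identifiers(service: dict, standard: str) -> list:
    """List the external identifiers a service references for one standard."""
    return [
        link["identifier"]
        for link in service.get("funding_links", [])
        if link["standard"] == standard
    ]


print(funding_identifiers(service, "360Giving"))
```

A data user could then look up each referenced identifier in the corresponding 360Giving or OCDS dataset to build up the fuller picture of service funding described above.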