Should we add `$id` properties to the HSDS Schemas?

mrshll · September 4, 2025, 3:27pm

Having `$id` properties is generally considered good practice for JSON Schemas, and it becomes particularly important when we consider validators, single sources of truth (SSOTs) for each version of the schema files, and people caching them.

I’ve raised a GitHub issue to discuss this:

github.com/openreferral/specification

Consider adding $id elements to the HSDS Schemas

opened 03:13PM - 04 Sep 25 UTC

mrshll1001

We should consider adding `$id` properties to the HSDS schemas, allowing them to be uniquely identified and to follow the conventions of JSON Schema 2020-12, so that tools (such as validators) can resolve references properly and the HSDS Schemas can be re-used. This is a particularly pertinent issue for HSDS, which uses multiple schema files with relationships between them.

Key benefits:

* Schemas can be versioned unambiguously, rather than by the de facto URL resulting from the GitHub branch.
* Validators can therefore use more intelligent caching to improve performance and to disambiguate between Profile schemas that share the same name.
* Schemas are more composable, meaning other Standards or systems could more easily reference HSDS' schemas for their own uses.

Tradeoffs:

* The Profiles mechanism would likely become more involved, requiring us either to adapt the tooling or to force the Profiles themselves to adapt.
* Slight maintenance burden for MINOR version upgrades: updating the `$id` and `$ref` values in the schemas.

**Reasoning**:

Currently, the HSDS Schemas lack `$id` properties, which provide a canonical identifier for the schema file itself. The main issue this creates is that tools which cache schemas incorrectly may then fail to resolve references to other schemas for validation. This is because `$id` is used to establish a base URI for resolving references to other schemas.

`$id` in JSON Schema 2020-12 is a special property which allows a schema to define its own unique identifier in a URI format. Note that this should not be confused with the `id` properties of each *object* defined in HSDS. `$id` represents an identifier *for the schema itself*.

The schemas function without `$id` for our current uses mostly because they are co-located in the same folder in the Git repository.
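As a rough illustration of the base-URI behaviour described above, the sketch below resolves a relative `$ref` against a schema's `$id` using standard URI reference resolution. The `$id` value is an assumption (it follows the raw GitHub URL pattern floated later in this issue; HSDS does not define one today), and `resolve_ref` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumed $id for service.json; not something the HSDS schemas declare today.
SERVICE_ID = (
    "https://raw.githubusercontent.com/openreferral/specification/"
    "refs/heads/3.1/schema/service.json"
)


def resolve_ref(base_id: str, ref: str) -> str:
    """Resolve a $ref against the base URI that $id establishes."""
    return urljoin(base_id, ref)


# A relative reference is resolved next to the schema that contains it.
resolved = resolve_ref(SERVICE_ID, "organization.json")
print(resolved)
```

With an `$id` in place, the reference resolves to a full URI regardless of where a tool happened to load `service.json` from.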
For example, when `service.json` references `organization.json` in `service.organization`, the property is defined by way of a reference to `organization.json`:

```json
{
  "organization": {
    "name": "organization",
    "title": "Organization",
    "description": "The details about each organization delivering services. Each service should be linked to the organization responsible for its delivery. One organization may deliver many services.",
    "$ref": "organization.json"
  }
}
```

([source](https://github.com/openreferral/specification/blob/3.1/schema/service.json#L320))

This works because `organization.json` is located in the same folder on GitHub, so anything resolving the `$ref` looks in the same folder. This is fragile because it could fall down with caching, and we have already seen that [the ORUK API Validator maintains local copies of the ORUK Schemas](https://github.com/tpximpact/OpenReferralApi/tree/main/OpenReferralApi/Schemas/V3.0-UK). Future validation tooling which maintains cached copies of some schemas may end up using a mix of older and newer schemas, as it takes some from its local cache and fetches others fresh from GitHub. This situation is avoided if there are canonical `$id`s for each schema file, which would change with each MINOR and MAJOR version of HSDS.

**Immediate effects**

* We'd need to update all of the HSDS schemas to include an appropriate `$id` property.
* All internal references (`$ref`) to other schemas in the HSDS Schemas would need to be updated to point towards these ids, including in the `openapi.json` file.
* Due to the way Profile tooling works, Profiles would need to explicitly override the `$id` property of each schema to replace it with their own, as well as the relevant internal references; otherwise the Profile schemas will reference the vanilla HSDS schemas instead of the Profile-specific ones. This could be overcome with a re-working of the Profiles tooling.
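The first two effects listed above could be sketched roughly as follows. This is a minimal sketch and not the HSDS tooling: `BASE`, `schema_id` and `add_ids` are hypothetical names, the id pattern is assumed from the raw GitHub URL discussed in this issue, and a `$ref` is only rewritten when its last path segment matches one of the schema file names passed in (fragments such as `#/...` are not handled).

```python
# Assumed base for canonical ids: the raw GitHub URL pattern from the issue.
BASE = "https://raw.githubusercontent.com/openreferral/specification/refs/heads"


def schema_id(version: str, filename: str) -> str:
    """Build the assumed canonical $id for one schema file."""
    return f"{BASE}/{version}/schema/{filename}"


def add_ids(schemas: dict, version: str) -> dict:
    """Return copies of the schemas with $id set and sibling $refs made absolute."""

    def rewrite(node):
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                # A $ref is rewritten only if its last path segment names
                # one of the schema files we were given.
                if key == "$ref" and isinstance(value, str):
                    filename = value.rsplit("/", 1)[-1]
                    out[key] = schema_id(version, filename) if filename in schemas else value
                else:
                    out[key] = rewrite(value)
            return out
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        return node

    stamped = {}
    for filename, schema in schemas.items():
        body = rewrite(schema)
        body.pop("$id", None)  # drop any previous $id before setting the new one
        stamped[filename] = {"$id": schema_id(version, filename), **body}
    return stamped


# Cut-down stand-ins for the real schema files.
schemas = {
    "service.json": {
        "type": "object",
        "properties": {"organization": {"$ref": "organization.json"}},
    },
    "organization.json": {"type": "object"},
}
stamped = add_ids(schemas, "3.1")
print(stamped["service.json"]["properties"]["organization"]["$ref"])
```

Because references are matched on their file name, running the same function again with a later version string would also cover the mechanical find-and-replace needed on version upgrades.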
**Ongoing maintenance**

* For every MAJOR and MINOR upgrade, we'd need to update the `$id` and `$ref` properties for each schema file. This can be automated or done relatively mechanically with find and replace.

**Other thoughts**

The `$id` is a URI, not a URL; but it generally should also be a URL so it can be resolved. The best bet for our current infrastructure is to use the GitHub "raw" URL for each schema file on the version branch. So the `$id` for `service.json` in HSDS 3.1 would be the following:

https://raw.githubusercontent.com/openreferral/specification/refs/heads/3.1/schema/service.json

This is a bit unwieldy, which is more of a human issue than a mechanical one. However, it also ties us to GitHub. I don't foresee us moving away from GitHub any time soon, but if we ever did choose to, for whatever reason, then this would break our canonical ids.

Open Contracting get around this by hosting their canonical schemas outside of GitHub on their `standard.open-contracting.org` domain. However, Open Fibre just use the GitHub URL, similar to the one outlined above:

* [Index of /schema on standard.open-contracting.org](https://standard.open-contracting.org/schema/)
* [network-package-schema on Open Fibre's github](https://github.com/Open-Telecoms-Data/open-fibre-data-standard/blob/0.3-dev/schema/network-package-schema.json#L2)

If we were to do this, I don't see the harm in having the GitHub URL for now. While I like the Open Contracting approach from a perspective of technical purity and flexibility, I'm not sure the benefits weigh up against our priorities, and I think it's more important to have `$id`s than to have *perfect* ones.
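On the URI-versus-URL point in the issue: one way a tool could soften the tie to GitHub is to treat the `$id` purely as an identifier and look up where to fetch the file separately. The sketch below is only an assumption about how that might look; `LOCATIONS` and `retrieval_url` are hypothetical, and `schemas.example.org` is a placeholder host, not a real or planned one.

```python
# Hypothetical mapping from canonical $id prefixes to where the files are
# actually hosted. The target host is a placeholder for illustration only.
LOCATIONS = {
    "https://raw.githubusercontent.com/openreferral/specification/refs/heads/":
        "https://schemas.example.org/hsds/",
}


def retrieval_url(schema_id: str, locations: dict = LOCATIONS) -> str:
    """Return the URL to fetch a schema from, given its canonical $id."""
    for id_prefix, host_prefix in locations.items():
        if schema_id.startswith(id_prefix):
            return host_prefix + schema_id[len(id_prefix):]
    return schema_id  # no override: the $id doubles as the location


service_id = (
    "https://raw.githubusercontent.com/openreferral/specification/"
    "refs/heads/3.1/schema/service.json"
)
print(retrieval_url(service_id))
```

Under this arrangement the ids published in the schemas would not need to change if hosting moved; only the mapping would.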

In general, I think that this is the direction we should be headed in. This has already been raised as a problem to me by Jeff Cumpsty, who is maintaining the ORUK Validator at the moment. Between the canonical HSDS Schemas, the UK Profile Schemas, and the copies of the UK Profile Schemas inside the ORUK Validator tool, there are three copies of schemas with similar names at different locations. Having `$id` properties cuts through this ambiguity: it allows validators to cache schemas properly, and it allows us to track the version of each schema as part of its identifier.
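To make the caching point concrete, here is a minimal sketch of a validator-side cache keyed by `$id` rather than by file name. `SchemaCache` is a hypothetical class, and both `$id` values are assumptions; the UK Profile one in particular uses a placeholder `example.org` address.

```python
class SchemaCache:
    """Keep schemas keyed by their $id so same-named files cannot collide."""

    def __init__(self):
        self._by_id = {}

    def add(self, schema: dict) -> None:
        schema_id = schema.get("$id")
        if schema_id is None:
            raise ValueError("schema has no $id, so it cannot be cached unambiguously")
        self._by_id[schema_id] = schema

    def get(self, schema_id: str):
        """Return the cached schema for this $id, or None if it is not cached."""
        return self._by_id.get(schema_id)


# Two files that are both called service.json but are different schemas.
hsds_service = {
    "$id": "https://raw.githubusercontent.com/openreferral/specification/refs/heads/3.1/schema/service.json",
    "title": "Service",
}
uk_service = {
    "$id": "https://example.org/profiles/uk/3.1/schema/service.json",  # placeholder
    "title": "Service (UK Profile)",
}

cache = SchemaCache()
cache.add(hsds_service)
cache.add(uk_service)
print(cache.get(uk_service["$id"])["title"])
```

A cache keyed on the bare file name would have silently overwritten one `service.json` with the other.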

Theoretically, adding these is as simple as adding the $id properties in with an appropriate URL, and updating the references appropriately. However, this might create a burden on Profile maintainers due to how the Profiles mechanism works in practice. The Profiles tooling could of course be adapted, but this would take a bit of time.

It’d be appreciated if people could weigh in here or on the GitHub thread with their thoughts on this issue.

jeffc · September 9, 2025, 12:39am

Hi everyone,

I’m suggesting we add ID properties to the HSDS schemas. This idea came to me while I was looking through the changelog for the 3.1.1 Profile and noticed that there were several copies of the JSON schema files without a clear way to identify the latest or standard versions.

I believe this is a crucial step for two main reasons.

Managing Schema Versions

Adding a unique ID to each schema would solve the versioning problem by removing ambiguity. Generating and including a new unique ID for each modification would help us track changes.

I also suggest that profile tools could be modified to generate unique IDs when compiling a profile. This could include information about the profile owner, which would make it even easier to track the origin of a schema.

Centralized Schema Repository

The unique IDs would also make it easier to set up a central repository for generated schemas. Instead of sharing entire files, we could share a URI (the `$id` property) that points directly to the individual schema.

From my perspective, this would eliminate the need for us to store schema files locally. It would also reduce duplication and the potential for errors.

I think these two points, especially the logistics of implementation, are worth further discussion.

jeffc · September 30, 2025, 5:15pm

I guess this topic has stagnated for a while.

My specific reason for wanting a unique `$id` value assigned to each generated schema is to simplify the validation process.

An `$id` for a schema is a URI which uniquely identifies the schema. If there is a central repository for the generated schemas, the URI becomes a URL, which lets validation logic access the schema remotely and avoids duplication, without the need to store JSON schemas locally. You also remove the opportunity for variances in copies of copies of copies of the schema files. I appreciate that this could include some very bespoke schemas diverging from a base HSDS profile.

(anyone correct me if I’m spouting rubbish)

The validation engine, which I envision (I could be way off base), would potentially only require 2 URL values…the data feed and a schema version that it must satisfy.
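A rough sketch of the schema side of that idea: given only one schema URL, a tool could follow `$ref`s to collect every schema it needs. This assumes each schema's `$id` matches the URL it is served from; `find_refs` and `collect_schemas` are hypothetical names, the `example.org` URLs are placeholders, and `fetch` is injected so the example runs without network access. Validating the data feed itself would be left to a JSON Schema library.

```python
from urllib.parse import urldefrag, urljoin


def find_refs(node):
    """Yield every $ref string found anywhere inside a schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def collect_schemas(root_url: str, fetch) -> dict:
    """Fetch the root schema and, transitively, every schema it references."""
    loaded = {}
    pending = [root_url]
    while pending:
        url = pending.pop()
        if url in loaded:
            continue
        schema = fetch(url)
        loaded[url] = schema
        base = schema.get("$id", url)  # $id sets the base URI for relative refs
        for ref in find_refs(schema):
            target, _fragment = urldefrag(urljoin(base, ref))
            if target not in loaded:
                pending.append(target)
    return loaded


# In-memory stand-in for a central schema repository (placeholder URLs).
store = {
    "https://example.org/3.1/service.json": {
        "$id": "https://example.org/3.1/service.json",
        "properties": {"organization": {"$ref": "organization.json"}},
    },
    "https://example.org/3.1/organization.json": {
        "$id": "https://example.org/3.1/organization.json",
        "type": "object",
    },
}
loaded = collect_schemas("https://example.org/3.1/service.json", store.__getitem__)
print(sorted(loaded))
```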

We can talk about using LLMs and CLMs later.
