Having $id properties is generally considered good practice for JSON Schemas for a few reasons, and it becomes particularly important when we consider validators, single sources of truth (SSOTs) for versions of the schema files, and people caching them.
I’ve raised a GitHub issue to discuss this:
In general, I think this is the direction we should be heading in. Jeff Cumpsty, who is currently maintaining the ORUK Validator, has already raised this with me as a problem. Between the canonical HSDS Schemas, the UK Profile Schemas, and the copies of the UK Profile Schemas inside the ORUK Validator tool, there are three copies of schemas with similar names at different locations. Having $id properties cuts through this ambiguity: validators can cache schemas properly, and we can track the version of a schema as part of its identifier.
Theoretically, adding these is as simple as adding the $id properties in with an appropriate URL, and updating the references appropriately. However, this might create a burden on Profile maintainers due to how the Profiles mechanism works in practice. The Profiles tooling could of course be adapted, but this would take a bit of time.
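As a rough illustration of what "adding $id and updating the references" could mean for a single schema file, here is a minimal Python sketch. The base URL `https://example.org/hsds/schema/3.1.1/` and the helper name `add_id_and_absolute_refs` are hypothetical placeholders, not an agreed HSDS hosting location or existing Profiles tooling. The sketch gives a schema an $id derived from its filename and rewrites relative `$ref` values (such as `organization.json`) into absolute URLs, leaving in-document fragment references like `#/...` alone.

```python
import copy
from urllib.parse import urljoin

# Hypothetical hosting location; not an agreed HSDS URL.
BASE_URL = "https://example.org/hsds/schema/3.1.1/"


def add_id_and_absolute_refs(schema: dict, filename: str, base_url: str = BASE_URL) -> dict:
    """Return a copy of `schema` with an $id and absolute $ref values."""
    result = copy.deepcopy(schema)
    schema_id = urljoin(base_url, filename)
    result["$id"] = schema_id

    def rewrite(node):
        # Walk nested objects and arrays looking for $ref keys.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "$ref" and isinstance(value, str) and not value.startswith("#"):
                    # Resolve e.g. "organization.json" against this schema's $id.
                    node[key] = urljoin(schema_id, value)
                else:
                    rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    rewrite(result)
    return result


service = {
    "type": "object",
    "properties": {"organization": {"$ref": "organization.json"}},
}
print(add_id_and_absolute_refs(service, "service.json"))
```

In recent JSON Schema drafts a relative `$ref` already resolves against the enclosing `$id`, so the rewrite step is optional; it mainly makes each reference readable and unambiguous on its own, even for tooling that ignores the base URI.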
It’d be appreciated if people could weigh in here or on the GitHub thread with their thoughts on this issue.
I’m suggesting we add $id properties to the HSDS schemas. This idea came to me while I was looking through the changelog for the 3.1.1 Profile and noticed that there were several copies of the JSON schema files without a clear way to identify the latest or standard versions.
I believe this is a crucial step for two main reasons.
Managing Schema Versions
Adding a unique ID to each schema would solve the versioning problem by removing ambiguity. Generating and including a new unique ID for each modification would also make changes easy to track.
I also suggest that profile tools could be modified to generate unique IDs when compiling a profile. This could include information about the profile owner, which would make it even easier to track the origin of a schema.
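One possible shape for generated IDs, sketched in Python under stated assumptions: the profile owner, profile name and version become path segments, and an optional short SHA-256 digest of the schema content is included so that any modification yields a new $id. The URL layout, the base URL and the function names `make_schema_id` and `content_digest` are all hypothetical; nothing here reflects how the Profiles tooling currently works.

```python
import hashlib
import json
from urllib.parse import quote


def content_digest(schema: dict, length: int = 12) -> str:
    """Short SHA-256 digest of the schema, ignoring any existing $id."""
    body = {k: v for k, v in schema.items() if k != "$id"}
    # Canonical serialization so key order does not change the digest.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def make_schema_id(base_url: str, owner: str, profile: str, version: str,
                   schema_name: str, schema: dict = None) -> str:
    """Build a hypothetical $id: <base>/<owner>/<profile>/<version>[/<digest>]/<name>."""
    segments = [owner, profile, version]
    if schema is not None:
        # Any change to the schema body produces a different digest, hence a new $id.
        segments.append(content_digest(schema))
    segments.append(schema_name)
    # Percent-encode each segment so owner names cannot inject extra path parts.
    return base_url.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


print(make_schema_id("https://example.org/profiles", "example-owner", "uk", "3.1.1", "service.json"))
```

Putting the owner in the path is one way to record the origin of a schema; a content digest is an alternative (or complement) to relying on maintainers to bump the version number by hand.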
Centralized Schema Repository
The unique IDs would also make it easier to set up a central repository for generated schemas. Instead of sharing entire files, we could share a URI (the $id property) that points directly to the individual schema.
From my perspective, this would eliminate the need for us to store schema files locally. It would also reduce duplication and the potential for errors.
I think these two points, especially the logistics of implementation, are worth further discussion.
My specific reason for wanting a unique $id value assigned to each generated schema is to simplify the validation process.
An $id for a schema is a URI which uniquely identifies the schema. If there is a central repository for the generated schemas, the URI becomes a URL, which lets validation logic access the schema remotely and avoids duplication without the need to store JSON schemas locally. You also remove the opportunity for variances in copies of copies of copies of the schema files. I appreciate that this could include some very bespoke schemas diverging from a base HSDS profile.
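A minimal Python sketch of how validation logic might rely on $id URLs instead of local copies: schemas are fetched on first use and cached by $id, and a fetched document whose own $id differs from the requested URL is rejected, which guards against the "copies of copies" problem. The class name `SchemaRegistry` is hypothetical; the default fetcher just uses the standard library's `urllib.request.urlopen`, and the fetch function is injectable so it can be swapped out or tested offline.

```python
import json
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch and parse a JSON document over HTTP(S)."""
    with urlopen(url) as response:
        return json.load(response)


class SchemaRegistry:
    """Hypothetical cache of schemas keyed by their $id URL."""

    def __init__(self, fetch: Callable[[str], dict] = fetch_json):
        self._fetch = fetch
        self._cache: Dict[str, dict] = {}

    def get(self, schema_id: str) -> dict:
        # Each $id is fetched at most once; later lookups hit the cache.
        if schema_id not in self._cache:
            schema = self._fetch(schema_id)
            declared = schema.get("$id")
            # Refuse a document that claims to be a different schema.
            if declared is not None and declared != schema_id:
                raise ValueError(f"{schema_id} declares a different $id: {declared}")
            self._cache[schema_id] = schema
        return self._cache[schema_id]
```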
(anyone correct me if I’m spouting rubbish)
The validation engine I envision (I could be way off base) would potentially require only two URL values: the data feed and the schema version that it must satisfy.
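To make the two-URL idea concrete, here is a hedged Python sketch of such an entry point. `validate_feed` takes the feed URL and the schema URL (its $id) and returns a list of error messages. The `check` function is a deliberately tiny stand-in that understands only `type`, `required`, `properties` and `items`; a real engine would delegate to a complete JSON Schema validator (for example the Python `jsonschema` package) and resolve `$ref` through a cache keyed by $id. All function names here are hypothetical.

```python
import json
from typing import Callable, List
from urllib.request import urlopen


def fetch_json(url: str):
    """Fetch and parse a JSON document over HTTP(S)."""
    with urlopen(url) as response:
        return json.load(response)


# Stand-in for a real JSON Schema validator: only a few keywords are handled.
_TYPES = {"object": dict, "array": list, "string": str}


def check(instance, schema: dict, path: str = "$") -> List[str]:
    expected = schema.get("type")
    if expected in _TYPES and not isinstance(instance, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors


def validate_feed(feed_url: str, schema_url: str,
                  fetch: Callable[[str], object] = fetch_json) -> List[str]:
    """The only inputs are two URLs: the data feed and the schema's $id."""
    schema = fetch(schema_url)
    feed = fetch(feed_url)
    return check(feed, schema)
```

Because the schema is identified purely by its URL, the engine itself holds no schema files; pointing it at a different profile version is just a matter of passing a different $id.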