Thanks for this Jeff.
My initial read of this is that we should be clear what we mean by compiling Profiles. When we discuss compilation for HSDS and Profiles I think what we’re referencing is two things:
- Some of the canonical HSDS schemas are the result of “compilation” steps, resulting in slightly different schemas.
- The Profile mechanism is effectively defined as: “you produce a bunch of patches atop the HSDS base schemas to define your Profile”.
I think your critiques are mostly focused on the second of those two — the actual Profile mechanism and its tooling — rather than the first, so I’ll focus on that in this reply.
Do we need a canonical, central repo dedicated to schema storage?
Yes, I think that arguably exists though. It’s the openreferral/specification repository.
Do we need to pre-compile schemas?
I’m going to take this as “do we need to generate Profile schemas via patches?”. As noted, there is a whole can of worms ready to be opened re “compiled schemas” in HSDS!
I think the dichotomy you highlight is the following:
- Profiles currently exist as a series of JSON Merge Patches defined atop a branch of the HSDS Base Schemas. To get a fully working set of Profile schemas for validation, you need to apply the Merge Patches to produce the schemas. This is currently done “offline”, usually on a maintainer’s copy of the repo, and committed back.
- There is an alternative model using “forks” of the canonical HSDS Specification repo, where changes are maintained on the set of schema files in each fork (which are the full schemas, not just patches).
The headline of my take is that, tooling issues aside, I am broadly a fan of having Profiles defined as a series of patches, and don’t think we should move to a model where Profiles are forks of repos.
Generating a profile that differs from the standard HSDS offering involves creating a set of JSON files which are then merged into the base schema; references are resolved and the output is a set of schemas meeting your requirements. Personally, a partial schema file just doesn’t sit well with me; it keeps me up at night.
The JSON files for a Profile are JSON patches as defined by RFC 7396.
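To make the mechanism concrete, here is a minimal sketch of RFC 7396 semantics in Python. The `merge_patch` helper and the example field names (`fax`, `uk_region`) are invented for illustration; they are not the actual HSDS tooling or schema fields.

```python
# Minimal sketch of RFC 7396 JSON Merge Patch semantics.
# Hypothetical helper; the real Profile tooling may differ.
def merge_patch(target, patch):
    """Apply an RFC 7396 merge patch to a target document."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            # A null value removes the member from the target.
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

# Example: a Profile patch that drops one field and adds another.
base = {"properties": {"id": {"type": "string"},
                       "fax": {"type": "string"}}}
patch = {"properties": {"fax": None,
                        "uk_region": {"type": "string"}}}
profile = merge_patch(base, patch)
# profile == {"properties": {"id": {"type": "string"},
#                            "uk_region": {"type": "string"}}}
```

The whole thing fits in a dozen lines, which is part of the appeal: two independent implementations can be checked against each other by hand.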
I think the benefits of these are:
- The process is defined in an RFC, atop a well-defined and well-understood file format (JSON: RFC 7159), meaning anyone can implement it (we have two implementations already) and anyone can check those implementations.
- The JSON Merge Patches can be diffed easily, in case we ever want to generate summaries of differences between profiles. This is also true of comparing branches across repos but…
- It decouples the definition of the Profile from a particular version control ecosystem. Yes, git won’t be going anywhere for a while. But svn wasn’t going anywhere either! jj is on the rise, and technology trends aside, having Profiles exist simply as “patches which can be mechanically used to produce schema files following a process” allows people to maintain their own systems and infrastructure, rather than be tied to Github (and, to a lesser extent, the rest of the git ecosystem). This plays nice with future-proofing, fits in with HSDS’ declarative model, and prevents lock-in. “Forks” as defined by Github, Gitlab, etc. are all vendor-specific constructs; if I wanted to fork the openreferral/specification repo and self-host it on my own instance of Forĝejo or SourceHut, I’d need to do manual wiring to maintain contact with upstream. If everyone was on Github and I needed to run my own infrastructure, I’d be excluded from the network of Github repos unless I did a lot of wiring with a mirror. If everyone was scattered across different forges, we’d be emailing patches to each other using git send-email… which is not a bad thing to do, but is more work than maintaining your own repo which just has patches on it used to generate your schemas.
To build a set of schema using the current tooling the developer still has to go through all of the schema files, decide what to include, what to omit, and where to extend…and produce a set of (almost) json schema files.
I think this is by design. If one is producing a Profile, you have to be clear about what your aims are. Who is your audience, and how do their needs differ from those addressed by HSDS itself? What additional fields do you need to support, and which parts would over-complicate things?
Having to produce a patch for each schema, even if just to null it out, forces that reflection point. Or at least I think the hope is that it does!
My thoughts would benefit from a canonical location for schema storage. For example, if there were a repository on Github dedicated to the set of JSON schemas, without additional source code, it could be forked, branched, and referenced accordingly; all neat, tidy, and succinct!
In theory, this is what the HSDS Specification repo is. The one key deviation here is that the HSDS Repo has canonical documentation as well, which it makes sense to co-locate with the specification because it should be versioned alongside it.
In practice, I think v3.0 of the HSDS repo was very ambitious about also including automations that produced serializations. I think we’re making progress in paring it back to get it closer to how you describe (plus the docs ofc).
If the new feed opted not to extend or modify a particular schema, you could use $ref to directly reference the base file, or one in any other published repo. Branches and forks can be merged, rebased, compared, etc. fairly easily. One repo could even reference a particular file in another repo at a particular point in time by branch, or even down to the individual commit level.
You highlight an important thing here: if a Profile doesn’t touch an HSDS schema, it effectively wants to keep it as-is and “do nothing”; would it not be beneficial to have the Profile’s schemas instead $ref back to the original source schema?
In an ideal world of Linked Data and the Semantic Web, yes. This would be amazing. It would reduce duplication, promote interlinking and sharing of schemas, etc.
There are drawbacks and caveats to this. It exposes $ref resolution to infrastructure outages from other Profiles, and you’d need to make sure that you either tied a specific $ref to a specific commit in a specific repo or risked getting updates that you don’t want. This starts to resemble the dependency hell that arises in software package management ecosystems such as pip and npm, which have well-known and deep-rooted problems.
As a general rule all changes require a new branch, hence a new version, perhaps adding “b” (for beta) until the branch is agreed and published and will not be changed (unless the minor version number is increased).
In theory yes; however, if one is e.g. using a schema from another Profile, you trust that they have practices similar to your own. There might exist a future in which some HSDS Profiles are the work of a single developer pushing to main! You could point the URL at a specific commit, but that gets into the same discussion as above.
Mind you, this is true regardless of how we define a Profile. If we use a $ref to point to any external schema, we run that risk regardless of whether we’re generating Profile schemas via JSON Merge Patch or whether we’re maintaining forks.
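To make the pinning trade-off concrete, here is what the two flavours of external $ref look like side by side. The file path and the `<commit-sha>` placeholder below are illustrative only, not real references into the openreferral/specification repo:

```python
# Illustrative only: the file path and "<commit-sha>" are placeholders,
# not real references into the openreferral/specification repo.

# An unpinned $ref tracks whatever is currently on the branch, so
# upstream edits flow into your Profile whether you want them or not:
unpinned = {"$ref": "https://raw.githubusercontent.com/openreferral/"
                    "specification/3.0/schema/organization.json"}

# Pinning to a commit freezes the referenced schema, at the cost of
# manual version bumps: the dependency-pinning treadmill familiar
# from pip and npm.
pinned = {"$ref": "https://raw.githubusercontent.com/openreferral/"
                  "specification/<commit-sha>/schema/organization.json"}
```

Neither option is free: unpinned refs risk silent upstream changes, pinned refs need ongoing maintenance.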
One caveat that I discovered is that Github will rate-limit you if you attempt to reference its raw files in somewhat rapid succession. I do not know what these limits are, or whether there is a way of increasing them. To be fair, I was hammering the validation logic; I was running a few dozen asynchronous validation requests to see what would happen. Github gave me 404s. One option might be to cache JSON refs.
Aha, yes. Github famously started rate-limiting aggressively in the last few years. If I recall correctly, it was a response to the increased amount of AI/bot traffic. Regardless of one’s personal feelings on LLMs/AI, the irony is delicious.
The practical side is that we either need good caching in our tools or need to do something else.
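As a rough sketch of what “good caching” could mean for a validator: resolve each remote $ref URL at most once per run. The `RefCache` class and the stub fetcher below are hypothetical; a real resolver would also respect HTTP cache headers and handle errors.

```python
import json

# A minimal in-process cache for remote $ref targets, to soften
# rate limits. Sketch only: `fetcher` stands in for an HTTP GET.
class RefCache:
    def __init__(self, fetcher):
        self._fetcher = fetcher
        self._cache = {}
        self.misses = 0

    def resolve(self, url: str) -> dict:
        if url not in self._cache:
            self.misses += 1  # only the first request per URL "hits the network"
            self._cache[url] = json.loads(self._fetcher(url))
        return self._cache[url]

# Usage with a stubbed fetcher standing in for urllib/requests:
cache = RefCache(lambda url: '{"type": "object"}')
a = cache.resolve("https://example.org/schema/service.json")
b = cache.resolve("https://example.org/schema/service.json")
assert a == b and cache.misses == 1  # the second lookup never refetched
```

Even something this simple would have turned my few dozen asynchronous requests into a single fetch per distinct schema.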
Be nice…This is one of my first open contributions to the forum. I very well may have everything completely wrong.
These are valid discussions and the community needs to have them. While I personally may be in favour of the “patches rather than forks” approach, it’s not sacred by any means and warrants careful reflection. Questions and thought processes such as these are the embodiment of that kind of productive critique, and I think it’s safe to say that ORUK is currently the one forging a path for what it means to maintain a Profile of HSDS.
I am thinking this would result in smaller, and fewer, schema files that are interdependent but controlled. The schemas would use $ref to other schemas where appropriate and be more normalised than compiling all dependencies.
There is definitely something in the idea that a Profile might want to base itself off of another Profile in the future.
At the moment, we have a sort of “star” or a “hub and spoke” model, where we have HSDS and its various branches at the centre (the hub), and Profiles exist as separate variants of it (the spokes). It would be interesting to think through the implications of inter-dependency models wherein Profiles can either be based off others, or draw from multiple Profiles.
Therefore, and I know it will end up being Matt answering this (better be using Vim)…
Lol. See attached screenshot for evidence! (also note the sneak preview of my upcoming issue about Compiled Schemas…)