Approaches to extending the HSDS data structure

At the initial working group meeting for the next version of the Open Referral data specification (HSDS) there was some discussion of approaches to extending HSDS for different use cases. We don’t want this to delay getting an upgrade of HSDS out, but ultimately we need to understand and agree on an approach.

Two approaches are:

  • Filter a wider (extended) core data structure in different ways for different applications
  • Extend a core HSDS in different ways for different applications

In the UK we took the second approach when developing Open Referral UK. HSDS 2 adopted some of our extensions but in a slightly different way, so UK and US/international structures have incompatibilities.

Work so far on aligning (see US and UK Alignment and version control from February 2022) takes the first approach of having an “extended” HSDS with filters for different application profiles.
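To make the “extended HSDS with filters” idea concrete, here is a minimal Python sketch. It assumes an application profile can be expressed simply as the list of fields it uses. The record contents and the profile names (`uk_profile`, `us_profile`) are hypothetical and not taken from either specification.

```python
# Hypothetical "extended" service record: the union of the fields
# needed by every application profile.
EXTENDED_SERVICE = {
    "id": "svc-1",
    "name": "Food bank",
    "status": "active",
    "email": "help@example.org",
    "accreditations": "Example accreditation",
}

# Hypothetical application profiles, each a filter over the extended structure.
PROFILES = {
    "uk_profile": ["id", "name", "status", "accreditations"],
    "us_profile": ["id", "name", "status", "email"],
}


def filter_record(record: dict, profile_fields: list[str]) -> dict:
    """Keep only the fields a profile uses; fields the record lacks are skipped."""
    return {field: record[field] for field in profile_fields if field in record}


uk_view = filter_record(EXTENDED_SERVICE, PROFILES["uk_profile"])
us_view = filter_record(EXTENDED_SERVICE, PROFILES["us_profile"])
```

Because every profile draws its field names from the same extended structure, a given property is only ever named and described one way, which is what makes shared tooling possible.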

I’m keen on the “filter an extended HSDS” approach because it allows for greater re-use of tools developed and shared, such as the Open Referral UK validator. It avoids the same properties being described differently by different publishers.

However, it does not have to be an either/or approach. Individual publishers can add their own properties which the validator ignores as long as these properties don’t conflict with the core specification. Open API feeds can be examined to assess whether any added properties would be worth adding to the HSDS standard in future versions.
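A minimal sketch of that “ignore unless it conflicts” behaviour, assuming a much-simplified core schema expressed as a mapping from field name to expected Python type. This is not the Open Referral UK validator’s actual code; `CORE_SERVICE_FIELDS`, `CORE_REQUIRED` and `check_service` are hypothetical names.

```python
# Simplified, hypothetical core schema: field name -> expected type.
CORE_SERVICE_FIELDS = {"id": str, "name": str, "status": str, "email": str}
CORE_REQUIRED = {"id", "name", "status"}


def check_service(record: dict) -> tuple[list[str], list[str]]:
    """Return (errors, ignored_fields) for a service record.

    Publisher-added fields are ignored. A field is only an error if a
    required core field is missing, or if a core field name is reused
    with a value of the wrong type (a conflict with the core).
    """
    errors = []
    ignored = []
    for field in sorted(CORE_REQUIRED - set(record)):
        errors.append(f"missing required field: {field}")
    for field, value in record.items():
        expected = CORE_SERVICE_FIELDS.get(field)
        if expected is None:
            ignored.append(field)  # local extension: not checked
        elif not isinstance(value, expected):
            errors.append(f"{field} conflicts with core: expected {expected.__name__}")
    return errors, ignored
```

For example, a record that adds `local_ref` passes with `local_ref` reported as ignored, while a record that puts a number in `email` is flagged, because it reuses a core name in a conflicting way. The ignored-field list is also a cheap way to collect candidate properties for future HSDS versions.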

The HSDS “service attributes” and “other attributes” also allow for extension by referencing taxonomy terms that describe attributes. For future versions, we’ve suggested that we consider adding an optional “value” field to attributes. Again, attributes added can be reviewed to see if anything is worth adding as a property in its own right in future versions of HSDS.
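A sketch of how an attribute row might look with the suggested optional `value` field, plus a simple tally that could support the review of frequently used attributes. The row shape (an attribute linking a service to a taxonomy term) is a simplification, `value` is only the suggestion above, and the term IDs are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """Simplified attribute row linking a service to a taxonomy term."""
    id: str
    service_id: str
    taxonomy_term_id: str
    value: Optional[str] = None  # the suggested optional field


def term_usage(attributes: list[Attribute]) -> Counter:
    """Count how often each taxonomy term is used as an attribute.

    Frequently used terms are candidates for review as properties in
    their own right in a future HSDS version.
    """
    return Counter(attr.taxonomy_term_id for attr in attributes)


# Hypothetical term IDs for illustration.
attrs = [
    Attribute("a1", "svc-1", "wheelchair-access"),
    Attribute("a2", "svc-1", "parking-spaces", value="12"),
    Attribute("a3", "svc-2", "wheelchair-access"),
]
```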

Thanks for this, Mike.

I think what makes this hard is that HSDS is used for interoperability inside contexts where certain aspects have to work in a particular way, and that particular way might be different between contexts: it would be great to have the UK validator be re-usable by other OR contexts, but we should also recognise that some features (I can imagine UPRN checks or AIRS taxonomy term lookups, for example) will always be context-specific.

I think that we broadly align on the question of how far we expect interoperability: the conversation so far has been much more about tools than data, so I think that we can focus on that.

Unsurprisingly, I guess it then comes down to a question of governance: how will we decide what gets in? How will we decide which of several, potentially incompatible, approaches to modelling certain concepts we use?

I don’t think that either a filter or an extension approach alone solves this; I think we just have to work through the consequences of our choice.

“…the validator ignores as long as these properties don’t conflict with the core specification.”

This is good behaviour to have, IMO, but it is different from the behaviour that someone would experience taking an extended HSDS datapackage (which is how we currently ship HSDS) and running it through the OKFN datapackage validator. In practice, I don’t think anyone does this, or wants this “closed” behaviour - but we should be mindful of making sure that our tools match up with our expectations of the standard.

Thanks Rob. I think our views are broadly aligned.

“…some features (I can imagine UPRN checks or AIRS taxonomy term lookups, for example) will always be context-specific”

Yes. It’s a question of what goes in the standard, what goes in an application profile and what is left to local interpretation.

Regarding your specific examples:

  • Por20349 - HSDS - US and UK Row 88 proposes an “external_identifier” which UK guidance or a UK public sector application profile might require to be a UPRN (as mandated by government)
  • I think we all see Open Referral as taxonomy agnostic but again an application profile might mandate a specific taxonomy like AIRS or the LGA’s service types in specific cases. Certainly, UK users who want to combine feeds from multiple sources want consistent use of taxonomies across those feeds.
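A sketch of the two application-profile checks mentioned above. It assumes a UK profile would accept an `external_identifier` only if it looks like a UPRN (here a format check of 1 to 12 digits, not a lookup against any address register), and would accept taxonomy terms only from one mandated vocabulary. The `taxonomy_terms` structure and the `lga-service-types` identifier are hypothetical.

```python
import re

# Format check only: UPRNs are numeric identifiers of up to 12 digits.
UPRN_PATTERN = re.compile(r"\d{1,12}")

# Hypothetical identifier for the taxonomy a profile might mandate.
MANDATED_TAXONOMY = "lga-service-types"


def profile_errors(record: dict) -> list[str]:
    """Apply hypothetical UK profile rules on top of core HSDS (string IDs assumed)."""
    errors = []
    ext_id = record.get("external_identifier")
    if ext_id is not None and not UPRN_PATTERN.fullmatch(ext_id):
        errors.append(f"external_identifier {ext_id!r} is not a UPRN")
    for term in record.get("taxonomy_terms", []):
        if term.get("taxonomy") != MANDATED_TAXONOMY:
            errors.append(f"term {term.get('code')!r} is not from {MANDATED_TAXONOMY}")
    return errors
```

Neither rule changes what counts as valid HSDS; both only narrow what this particular profile accepts.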

After a recent upgrade working group call, I wanted to write up here something that I mentioned in passing that I think is relevant. @skyleryoung may be particularly interested in this as well.

Fundamentally, all of what we’re talking about here is making sure that a particular bit of data (a file, API response, etc) is structured in a way that means that a particular application is able to use it. And, by “structure”, I mean field names, properties of contents, API methods, and more - anything that can be considered a container for the information that needs to be exchanged between the systems involved.

Standards like HSDS prescribe the ways in which certain concepts have to be modelled: a service has an id and a name and a status, and (even though it’s not in the schema), the standard says that the email field is for an email address. Someone putting a postal address in the email field wouldn’t be following the standard.

So far, so straightforward. But what about times when not everyone has the same needs of the data? That’s what we’re talking about in this thread: making it so that people who do have the same needs of the data are able to work together, without making the standard burdensome (or irrelevant) for everyone.

One approach is to put together a set of additional prescriptions: I might say that I need to know the fees for a service, and so I can only work with data where that field is filled in. If someone else has the same need, then we can agree to always use that field. We might even go further, and say “We offer HSDS-compliant data, always with fees”. Obviously, in practice, it’s likely to be a set of requirements, that we can bundle up and call an “extension” or a “profile”, or something. This approach is great when (and this list isn’t exhaustive):

  • there’s a clearly defined need, and agreement around what’s being described
  • there’s some level of coordination, and the opportunity for the creation of a new artefact
  • a particular approach is required for a particular context - as in the priorities of OR-UK
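The “HSDS-compliant data, always with fees” idea amounts to a named bundle of extra requirements used prescriptively. A minimal sketch, assuming a profile is just a set of fields that must be filled in; the `fees_description` field name and the profile name are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named bundle of additional requirements on top of core HSDS."""
    name: str
    required_fields: frozenset


def is_filled(value) -> bool:
    """Treat missing, None and blank strings as not filled in."""
    return value is not None and str(value).strip() != ""


def accepts(profile: Profile, record: dict) -> bool:
    """Prescriptive use: accept a record only if every required field is filled in."""
    return all(is_filled(record.get(field)) for field in profile.required_fields)


# Hypothetical profile: "HSDS, always with fees".
with_fees = Profile("hsds-with-fees", frozenset({"fees_description"}))
```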

Another approach is to devise a framework by which data can be described. If I need data that has fees included, then I can check a bunch of data sources, and see if they include fees. The same holds if I need data that uses AIRS, or both AIRS and fees. I might even publish the results of my checking, so that anyone else in my situation can understand what data they can use.
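The descriptive alternative checks data sources and reports what each one contains, rather than accepting or rejecting it. A sketch, assuming each source is a list of service records and that AIRS use can be detected from a hypothetical `taxonomy` key on each term; the source names are invented.

```python
def has_fees(records: list[dict]) -> bool:
    """True if every record has a non-empty fees description (assumed field name)."""
    return bool(records) and all(r.get("fees_description") for r in records)


def uses_airs(records: list[dict]) -> bool:
    """True if any record has a term from the AIRS taxonomy (hypothetical keys)."""
    return any(
        term.get("taxonomy") == "airs"
        for r in records
        for term in r.get("taxonomy_terms", [])
    )


FEATURES = {"fees": has_fees, "airs": uses_airs}


def describe_sources(sources: dict[str, list[dict]]) -> dict[str, dict[str, bool]]:
    """Build a publishable table: source name -> feature -> present?"""
    return {
        name: {feature: check(records) for feature, check in FEATURES.items()}
        for name, records in sources.items()
    }


sources = {
    "county-a": [{"id": "1", "fees_description": "Free",
                  "taxonomy_terms": [{"code": "BD-1800", "taxonomy": "airs"}]}],
    "county-b": [{"id": "2"}],
}
report = describe_sources(sources)

# Someone who needs both fees and AIRS can pick usable sources from the report.
usable = [name for name, features in report.items() if all(features.values())]
```

Publishing `report` lets anyone in the same situation reuse the checking, without any source being labelled as failing.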

We recently made a quality dashboard for 360Giving which is based on the idea that we can describe the qualities of data, rather than judge that data is “good quality” or “bad quality”. We’ve picked ~10 features of data that we think are useful to know about (such as grant duration, location codes, recent publication) and run a daily test of all the data in order to provide a report of what data has what qualities. This is intended to set out our idea of what’s useful in data (we chose qualities based on research) and provide a basis for potential data users to discover data that might be useful for them.
This approach is useful when (again, not exhaustive):

  • there’s no one clearly defined need, but several ideas of what’s useful in particular contexts
  • not all qualities are relevant to all data sources (e.g. an historic publication can never get a “recent publication” badge, and that’s ok, but means your data isn’t useful for a “this month in grantmaking” newsletter)
  • there’s little use for a new artefact
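A sketch of the “qualities, not quality” idea: each quality is a named test that a dataset either has or doesn’t, and the report lists the badges earned, with no overall pass or fail. The three qualities, the field names and the 30-day window for “recent publication” are assumptions for illustration, not the 360Giving dashboard’s actual rules.

```python
from datetime import date, timedelta


def recent_publication(dataset: dict, today: date) -> bool:
    """Published within the last 30 days (assumed window)."""
    return today - dataset["published"] <= timedelta(days=30)


def has_location_codes(dataset: dict, today: date) -> bool:
    # `today` is unused; kept so every quality test has the same signature.
    return all(rec.get("location_code") for rec in dataset["records"])


def has_duration(dataset: dict, today: date) -> bool:
    return all(rec.get("duration_months") is not None for rec in dataset["records"])


QUALITIES = {
    "recent publication": recent_publication,
    "location codes": has_location_codes,
    "grant duration": has_duration,
}


def badges(dataset: dict, today: date) -> list[str]:
    """Describe a dataset by the qualities it has."""
    return [name for name, test in QUALITIES.items() if test(dataset, today)]


historic = {
    "published": date(2019, 5, 1),
    "records": [{"location_code": "AREA-001", "duration_months": 12}],
}
```

Here `badges(historic, date(2023, 1, 1))` earns “location codes” and “grant duration” but not “recent publication”, which matches the point above: that is fine, it just means the data isn’t suited to a “this month in grantmaking” use.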

These approaches are, of course, not mutually exclusive - but I think they could help inform our discussion on how we proceed with extending/constraining/profiling HSDS.

In practical terms, the approaches do converge, so we don’t have to bake in a choice, right now and for all time.

Thanks for the thoughtful reply @robredpath. I always learn a lot listening to you (so to speak).

One of the principles I’ve been operating by is that the broadest and most useful interoperability exists in the core specification, which should never be violated by extensions (or filters or what have you).

Doubtless, my opinion is informed largely by my experience.

For example:

My clients, the 211s, are targeting two objectives relevant to this topic:

  1. breaking down data silos to cooperatively maintain data with other organizations (who don’t necessarily use the AIRS Taxonomy), and
  2. adding additional fields to the taxonomy_term table so that they can take advantage of the full feature set inside AIRS Taxonomy.

My observation is that they can accomplish both.

For the first case, taxonomy_term fields of code and term in the core spec are more than adequate for handling the basic function of AIRS Taxonomy (Connect 211’s whole taxonomy search paradigm runs off just those two fields at the moment). But, importantly, cooperating organizations are more concerned about sharing contact information, addresses, services, and other nuts-and-bolts data than they are about using the AIRS Taxonomy.
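To illustrate how `code` and `term` alone can support a basic taxonomy search, here is a sketch. It assumes AIRS-style hierarchical codes where a descendant code starts with its parent code followed by `.` or `-`. The sample codes and terms are placeholders, not quoted from the AIRS Taxonomy, and this is not how Connect 211 implements its search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonomyTerm:
    code: str
    term: str


def is_under(code: str, parent: str) -> bool:
    """True if code equals parent or is a descendant (assumed '.'/'-' separators)."""
    return code == parent or (code.startswith(parent) and code[len(parent)] in ".-")


def search(terms: list[TaxonomyTerm], query: str) -> list[str]:
    """Match a query against term names, then widen to all descendant codes."""
    hits = [t.code for t in terms if query.lower() in t.term.lower()]
    return sorted({t.code for t in terms for hit in hits if is_under(t.code, hit)})


# Placeholder codes in an AIRS-like format.
TERMS = [
    TaxonomyTerm("AA-1000", "Food assistance"),
    TaxonomyTerm("AA-1000.1500", "Food pantries"),
    TaxonomyTerm("AA-1000.1500-200", "Mobile food pantries"),
    TaxonomyTerm("AA-2000", "Housing"),
]
```

A search for “food assistance” returns `AA-1000` and both codes beneath it, using nothing beyond the two core fields.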

For the second case, a 211 Application Profile can add all of the additional AIRS Taxonomy fields that open up a world of possibilities for 211 apps. As long as the extension doesn’t violate core, they can have the best of both worlds.

I recognize that this all gets more complicated if we start down the path of sharing, or enforcing, standards around eligibility criteria, etc. However, at least in the USA, we are far from consensus on what standardized eligibility criteria should be, which is why I’ve been advocating not for standards around eligibility criteria, but rather standards around how to structure eligibility criteria, if possible. I certainly cannot enforce which age ranges should be used universally, but I can enforce how any given set of age ranges is stored in HSDS. Hopefully.
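A sketch of “standardise the structure, not the values”: whatever age ranges a community chooses, they are stored the same way and checked the same way. The `AgeRange` shape (inclusive minimum and maximum age in years, either side optional) is an assumption for illustration, not an HSDS definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgeRange:
    """Inclusive age range in years; None means unbounded on that side."""
    minimum_age: Optional[int] = None
    maximum_age: Optional[int] = None

    def __post_init__(self):
        # Structural rules only: no opinion on which ranges should exist.
        for age in (self.minimum_age, self.maximum_age):
            if age is not None and age < 0:
                raise ValueError("ages must be non-negative")
        if (self.minimum_age is not None and self.maximum_age is not None
                and self.minimum_age > self.maximum_age):
            raise ValueError("minimum_age must not exceed maximum_age")

    def includes(self, age: int) -> bool:
        return ((self.minimum_age is None or age >= self.minimum_age)
                and (self.maximum_age is None or age <= self.maximum_age))


# Different communities choose different ranges, but store them identically.
youth = AgeRange(minimum_age=12, maximum_age=17)
seniors = AgeRange(minimum_age=65)
```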

I’ll stop there lest I start rambling. I do want to note that I’m very intrigued by your model for assessing qualities vs “quality”. I think that’s brilliant. We’re potentially diving into that topic at an upcoming hackathon for Washington State data collaboration.

I’ll close with a question: are we in this group sharing the assumption that extensions should not alter the core spec in any way?

Thanks @robredpath. Interestingly, in the UK meeting to discuss use cases, the desire to see which records met a quality threshold came up.

I think the profiles we have in mind can address both the prescriptions and the descriptions you describe, by applying tooling differently. Assuming a validator checks compliance with the core HSDS (as reflected in API query responses), you can then use the profile either to validate records, accepting or rejecting them, or to assess the quality of the data, perhaps defining a threshold for using records (e.g. in an aggregator).
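A sketch of one profile applied in both ways: prescriptively (accept or reject a record) and descriptively (score it, with an aggregator choosing its own threshold). The checks, field names and the 0.75 default threshold are hypothetical.

```python
# Hypothetical profile: named checks on a service record.
PROFILE_CHECKS = {
    "has_description": lambda r: bool(r.get("description")),
    "has_email": lambda r: bool(r.get("email")),
    "has_fees": lambda r: bool(r.get("fees_description")),
    "has_url": lambda r: bool(r.get("url")),
}


def score(record: dict) -> float:
    """Descriptive use: the fraction of profile checks the record passes."""
    passed = sum(check(record) for check in PROFILE_CHECKS.values())
    return passed / len(PROFILE_CHECKS)


def validate(record: dict) -> bool:
    """Prescriptive use: accept only records that pass every check."""
    return score(record) == 1.0


def select_for_aggregator(records: list[dict], threshold: float = 0.75) -> list[dict]:
    """Keep records whose score meets an aggregator-chosen threshold."""
    return [r for r in records if score(r) >= threshold]
```

The design choice is that the profile itself stays the same; only the tool decides whether a missed check means rejection or just a lower score.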

Yes. I think we’re saying that application profiles constrain the standard and extensions extend it, but neither violates it.

I’ve spent this afternoon down a rabbit-hole, but I think I’ve got my own thinking straight on this after a discussion with @davidraznick and a lot of time to think.

As I see it, HSDS describes how data about services (and related concepts, such as organisations and locations) can be structured. If structured in the way that HSDS describes, organisations can publish information about those services; this might be in bulk, or on a row-by-row basis via an API.

Sometimes, an organisation or community wants to describe something that HSDS doesn’t provide the language for; this could be an entirely new concept, or additional information about an existing concept.

In other cases, an organisation or community finds that, in order to be useful, fields that are optional in HSDS are essential for them, or that fields need to use a specific format in order to be useful. For example, OR-UK makes the optional phones.contact_id field mandatory, and we’ve discussed in workgroup meetings how it would be helpful to add constraints around which taxonomies are used.
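Making an optional field mandatory, as OR-UK does with phones.contact_id, can be expressed as a change to the object’s JSON Schema `required` list. A sketch, assuming a simplified base `phone` schema (the real HSDS schema has more properties); the `constrain` helper is hypothetical.

```python
import copy

# Simplified base schema for the phone object (not the full HSDS definition).
BASE_PHONE = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "number": {"type": "string"},
        "contact_id": {"type": "string"},
    },
    "required": ["id", "number"],
}


def constrain(schema: dict, also_required: list[str]) -> dict:
    """Return a copy of schema with extra required fields; the base is untouched."""
    # Only fields the base already defines can be made mandatory; anything
    # else would be adding a field, not constraining one.
    unknown = set(also_required) - set(schema["properties"])
    if unknown:
        raise ValueError(f"cannot require fields the base does not define: {sorted(unknown)}")
    profiled = copy.deepcopy(schema)
    for field in also_required:
        if field not in profiled["required"]:
            profiled["required"].append(field)
    return profiled


UK_PHONE = constrain(BASE_PHONE, ["contact_id"])
```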

These are the kinds of changes to the standard that are often described as “extensions” and “profiles”. However, these terms are used inconsistently, which doesn’t help our conversations.

Right now, there are a limited number of these sets of changes: there’s OR-UK, and our recent work on FHIR interoperability. We’ve discussed potential for some more - such as a more ambitious baseline quality standard, a profile for 211s, and something for AIRS users.

Each of these has particular characteristics that mean that it differs from the others, but we can look for groupings that help us to reason about them and work out how to support them.

One thing that I think is true of all of these cases is that they all add additional constraints. Some - but not all - also introduce new fields or objects.

In other standards, the term “extension” is usually used to describe the addition of fields: in FHIR for example, “Every element in a resource can have extension child elements to represent additional information”. OCDS extensions are - in the main - similar; they add new functionality to OCDS without taking anything away. Extensions are often modular and composable: someone can take one or more extensions, combine them and build on them to create a new extension, and put that into use.

What constitutes a “profile” is more ambiguous; in FHIR a profile is “A set of constraints on a resource”, whereas OCDS profiles can be far-reaching, including extensions, new constraints, and additional documentation.

Other terms in this space include “application profile” and “Implementation Guides”, which also bring together the concepts described above.

I think that we need a definition for HSDS that helps us make sure that we’re all talking about the same thing.

I think that there are three components to this:

  • New objects, concepts and fields
  • New constraints - such as required fields, data formats or allowable taxonomy values
  • Guidance and documentation: help and advice tailored to specific audiences, which might speak specifically to the new additions or constraints, or might be addressing common issues among the target audience with HSDS implementation in general.

I don’t think that we need to separate these out, in the way that other standards do. The kinds of uses that we’re talking about don’t build on each other in the way that other standards’ extensions do, even if they might be applicable at the same time. And, we’re talking about fewer than 10 such artefacts in the foreseeable future, rather than dozens from the off.

So, here’s my proposal:

First, we define a “profile” as being a set of one or more of:

  • New objects, concepts and fields
  • New constraints
  • New guidance and documentation relating to the additions above
  • New or altered guidance and documentation relating to HSDS implementation

…with five rules:

  • Nothing a profile does can stop data being valid HSDS against the base schema
  • Wherever possible, existing HSDS fields and objects must be used
  • Wherever possible, existing HSDS patterns should be followed - or new patterns should be discussed with the community before adoption
  • Profiles must be clear about how they are governed: either through the Open Referral governance process, or by the community that uses it
  • Profiles must make it clear how they differ from base HSDS
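The first rule (nothing a profile does can stop data being valid HSDS against the base schema) can be partly checked mechanically by comparing a profile’s schema with the base. A sketch, assuming both are simplified JSON Schema objects with `properties` and `required`. It only catches the two cases checked here (a base requirement relaxed, a base property’s type changed) and ignores formats, enums and nested objects, so it is not a complete compatibility proof.

```python
def compatibility_problems(base: dict, profile: dict) -> list[str]:
    """List ways in which data valid under `profile` could fail `base`."""
    problems = []
    # Relaxing a base requirement would let through records the base rejects.
    for field in base.get("required", []):
        if field not in profile.get("required", []):
            problems.append(f"profile no longer requires base field {field!r}")
    # Changing a base property's type would make profile-valid values base-invalid.
    base_props = base.get("properties", {})
    profile_props = profile.get("properties", {})
    for field, spec in base_props.items():
        if field in profile_props and profile_props[field].get("type") != spec.get("type"):
            problems.append(f"profile changes the type of {field!r}")
    return problems


BASE = {
    "properties": {"id": {"type": "string"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}
GOOD = {  # extends (new field) and constrains (new requirement)
    "properties": {**BASE["properties"], "local_ref": {"type": "string"}},
    "required": ["id", "name", "local_ref"],
}
BAD = {
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id"],
}
```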

Second, we define some more terms:

To “extend” the standard is to add new objects, concepts and fields to it.

To “constrain” the standard is to add new rules to it.

To use a profile “descriptively” is to use it to describe the properties of the data.

To use a profile “prescriptively” is to constrain “valid” data in a particular context to only data which uses a particular profile.

Third, we agree to reserve the term “extension” for future use: we may want to build the kind of modular system that we see elsewhere, and it would be helpful not to already have used the word.

Finally, we build some tech to support all of this:

  • Space across the docs site for profiles to draw attention to the relevance of the documentation for the profile
  • Space within the docs site to document the profiles completely
  • A way to compare the schema for a profile with the base schema and to visualise the results
  • A pattern for extensions to follow, so that it’s easy to see examples and understand what each extension is doing.

Technologically speaking, we’re already a lot of the way there. HSDS 3.0 is encoded as a directory of JSON files, one per object. Profiles can be created as a new directory, containing the files of any objects that have been changed. We can add in folders for changes to taxonomies and codelists, and incorporate any reconciliation necessary into the build process.
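A sketch of that directory-overlay build: start from the base schema directory (one JSON file per object), let a profile directory replace or add the files it contains, and report which objects differ from base. The directory names and layout are assumptions; the real HSDS build process may differ.

```python
import json
import tempfile
from pathlib import Path


def load_schemas(directory: Path) -> dict[str, dict]:
    """Map object name (file stem) -> parsed JSON schema."""
    return {p.stem: json.loads(p.read_text()) for p in sorted(directory.glob("*.json"))}


def build_profile(base_dir: Path, profile_dir: Path) -> tuple[dict[str, dict], list[str]]:
    """Overlay profile files onto the base; return merged schemas and new or modified objects."""
    merged = load_schemas(base_dir)
    overrides = load_schemas(profile_dir)
    changed = sorted(
        name for name, schema in overrides.items() if merged.get(name) != schema
    )
    merged.update(overrides)
    return merged, changed


# Demonstration with a throwaway, hypothetical directory layout.
with tempfile.TemporaryDirectory() as tmp:
    base, profile = Path(tmp, "schema"), Path(tmp, "profiles", "uk")
    base.mkdir()
    profile.mkdir(parents=True)
    (base / "service.json").write_text(json.dumps({"required": ["id", "name"]}))
    (base / "phone.json").write_text(json.dumps({"required": ["id", "number"]}))
    (profile / "phone.json").write_text(json.dumps({"required": ["id", "number", "contact_id"]}))
    merged, changed = build_profile(base, profile)
```

The `changed` list is a natural starting point for the “compare a profile with the base schema” tooling: each entry can then be diffed field by field and visualised.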

Documentation changes can either be handled manually, or we can look at building a standardised way for profiles to inject content across the docs site; that’s quite straightforward. We’ll encourage developers to document their work through a comment mechanism, as well.

My apologies for a 1000-word update to a 6-month-old forum post! But, I think this is important for us to work out. I’d appreciate any reactions, responses or reckons - if we need to discuss this on a future call then of course I’m very happy to, but also it might be that I’m just saying what you all already think, in which case, thank you for indulging me.


@robredpath This is an outstanding summary, thanks so much for the hard work of defining these terms and making them comprehensible.

I particularly like and support your recommendation to reserve the keyword “extension” for future use, and take it out of present circulation.

I support this as a draft for profile specifications.
