Our review of the HSDS Docs is available for comment

Hi folks,

As indicated in the Technical Working Group meeting, I’ve recently completed a review of the HSDS documentation, which has resulted in reorganising and expanding the existing Human Services Data Spec and HSDS Implementation Guidance sections. The first draft is currently staged on a branch of the docs available here. There is an accompanying Pull Request here.

We are seeking feedback from the community to ensure that this represents what the community wants from the HSDS Documentation. We won’t merge anything in until we’ve resolved any blocking issues you raise, or have agreed that something is “safe enough” for now and can be revisited in a subsequent update.

The review was originally instigated to respond to several disparate issues raised by the community, such as explaining the 1:1 or 1:many relationships between tables, fixing table formatting in the schedules section, clarifying the foreign key fields, and consolidating the advice on Profiles and “Extending HSDS” which emerged from the recent Profiles Documentation.

Tackling these in isolation revealed underlying issues with the structures of the docs for 3.0. Most notably, there was a lot of guidance which was now invalid because it referenced tables which no longer exist or are handled differently. A lot of the language across the docs also referred to “tables”, whereas JSON is now the canonical format of HSDS, with Tabular Data Packages being a supported serialization.

In addition to this, there was a lot about HSDS which didn’t appear to be written down. This made it difficult to frame certain things: how can we explain what “Extending HSDS” means without defining what it means to be conformant with HSDS in the first place? To address this, I formalised what I believed to be our shared implicit understanding of the way HSDS works. This is obviously open to correction! One of the benefits of writing these things down is that we get to question our assumptions and refine them.

As part of this, I also took the opportunity to refine some of our examples to ensure that they are presented in the canonical JSON form of HSDS. For most of these, a Tabular Data Package example is also provided so that we continue to support people using this serialization.

The result at this stage is the start of what I hope becomes a more comprehensive set of technical documentation for HSDS and its community. As noted, there is plenty of room for further adjustment based on your suggestions, feedback, and concerns if you have them.

To be explicit: very little content was dropped entirely in this review. There were a few repeated instances of “Sharing with the community”, which have been consolidated. Essentially, the only thing which was omitted was the Tables to Fields Transformation section, because it didn’t seem to fit the model of HSDS 3.0 at all and thus shouldn’t be encouraged. If the community feels strongly to the contrary, I hope we can engage productively on understanding the mechanisms behind this feature and refactoring the guidance to fit alongside HSDS 3.0.

A summary of changes encapsulated in the PR:

  • There is now a clearly defined normative reference section which consolidates the reference material for HSDS.
  • Language has been brought in line with the HSDS schemas. Rather than “table”, we say “object” now when discussing the canonical HSDS models. In some cases, we still refer to table when making a comparison with or describing the tabular representations of HSDS.
  • Content such as the “Logical Model” has been formalised as part of the overview and model now contained in the reference section.
  • Assuming that HSDS and the community are still happy with the language of “core tables”, there are now sub-sections in the Schema Reference page for “Core Objects” and “Other Objects” to make this consistent with language used in the ERD and models.
  • Identifiers guidance has been formalised as part of the reference section, and greatly expanded to provide more insight as to how identifiers should be used in HSDS.
  • Expanded documentation regarding the Page schema provided by the API documentation.
  • Where possible, the docs now link out to appropriate external documentation, most notably various IETF RFCs. This is to reduce ambiguity.
  • Where appropriate, introduced some language from RFC 2119 in some normative documentation. Because of community feedback from previous engagement, this has been limited in scope to where I felt it was strictly necessary to provide an unambiguous reference.
  • Defined what it means to be conformant to HSDS by providing a conformance page, outlining the high-level conformance rules for HSDS.
  • Formalised a Profile specification, meaning that it can be de-coupled from the current implementation via HSDS Schema Tools + the example profile repository should the community wish to replace these or provide alternative implementations. This also enables us to define conformance to HSDS in terms of a Profile.
  • Removed the UK Compliance page and placed a link to the OR UK Profile docs in the Known Profiles section of “Using Profiles”. (Note: I am keen to hear where people actually want to have links to Profiles so they’re not hidden away!)
  • Removed the references to UK Compliance in the API Reference page.
  • Removed the admonitions in the API Reference page and replaced them with explicit REQUIRED and OPTIONAL language derived from RFC 2119.
  • Expanded the JSON section of the Publication Formats page to state that JSON should be de-referenced where possible.
  • Refactored the Publication Formats page as a reference material using the language of serialization. This enables us to explicitly state the canonical form of HSDS 3.0 is JSON but that the Tabular Data Package format is a supported serialization.
  • Refactored the Publication Guidance page by splitting it up. There’s now an explicit “Mapping Data to HSDS” page containing updated guidance on mapping data sources to HSDS, and updated guidance on “Extending HSDS” in friendly terms which builds on the rules set out in the Conformance section.
  • Removed the “Profiles, Variations, and Interoperability” page and refactored the content. Removed the Tables to fields guidance, formalised the definition of a Profile in the Profiles Reference, and moved the guidance to a new “Using Profiles” guidance page.
  • Refactored the guidance on “Schedules”, “Classifications and Taxonomies”, and “Names and Descriptions” to update it in line with HSDS 3.0 and merged the guidance into a new top-level “Field and Object Guidance” page.
  • Worked examples on all of these were brought in line with HSDS 3.0, and are now provided as both JSON examples as well as Tabular Data Package examples.
  • As part of the above, fixed the errors with the tables being way too long.
  • Under the hood, made it easier to manage worked examples by storing them in a figures directory inside the docs directory. This means that there are no more nasty inline markdown tables you have to edit in Vim; you can define the examples in individual JSON and CSV files and then import them into the docs.
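
To illustrate the relationship between the canonical JSON form and the Tabular Data Package serialization mentioned above, here is a minimal sketch in Python. It assumes two trimmed-down tables (services and phones) linked by a `service_id` foreign key; the rows, the values, and the `nest_phones` helper are hypothetical and are not taken from the HSDS examples or tooling. The idea is that rows linked by foreign keys in the tabular form become nested objects in the de-referenced JSON form.

```python
import json

# Hypothetical, heavily trimmed rows as they might appear in two CSV files
# of the Tabular Data Package serialization. Names and values are illustrative.
services = [
    {"id": "svc-1", "name": "Food Bank"},
]
phones = [
    {"id": "ph-1", "service_id": "svc-1", "number": "555-0100"},
    {"id": "ph-2", "service_id": "svc-1", "number": "555-0101"},
]


def nest_phones(services, phones):
    """Return service objects with their phones embedded as a nested array."""
    by_service = {}
    for phone in phones:
        # Drop the foreign key: nesting expresses the relationship instead.
        nested = {k: v for k, v in phone.items() if k != "service_id"}
        by_service.setdefault(phone["service_id"], []).append(nested)
    return [
        {**service, "phones": by_service.get(service["id"], [])}
        for service in services
    ]


nested_services = nest_phones(services, phones)
print(json.dumps(nested_services, indent=2))
```

Going the other way (JSON to tabular) would mean flattening each nested array back into its own table and re-adding the foreign key column.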

This looks awesome. Eager to hear what others think.

Also, note that we’ve flagged the Use Cases page in particular as guidance content that could be improved. I’d especially welcome input on how that can be made more useful; right now it’s very bare bones. We’re looking for content to describe use cases for HSDS… perhaps we should start working with content from this doc that @MikeThacker initially drafted at the start of the HSDS upgrade process.


@bloom

We have this in the project documentation section.

We should probably review and reconcile the two sections.

Dan

Hi all,

We’ve not had any comments or feedback on the proposed updates to the docs following this review as of yet.

There are some quite substantial changes here, so I just wanted to give a bit of a nudge and some further time to people who may not have had a chance to look over this yet.

Barring any major objections, we plan to merge the changes Friday afternoon this week.

Here are links to the current and proposed versions for comparison:

Thanks,

Dan

I usually refer to the Entity Relation Diagram which is here in the Current documentation and here in the UK profile, but I can’t see it in the proposed documentation. I’m not sure if this is because the single source of truth is now JSON objects/classes rather than a tabular data structure.

Also, as I noted in this GitHub issue, I’d love to see “crows’ feet” on lines in the ERD denoting the direction of one-to-many relationships.

Thanks for all your hard work on this.


Thanks for your comments Mike, it’s greatly appreciated that you’ve gotten involved in the review process.

The ERDs are still there, but they’re now on the overview and model page in their own section :slight_smile:

You’re correct that they’re slightly de-emphasised due to the change of the SSOT. The fact is, the ERDs are generated from the datapackage.json file, so they represent the tabular serialization rather than the dereferenced JSON model, which is now the canonical SSOT due to the JSON schemas in GitHub. However, they won’t disappear because they are inherently useful for people trying to understand how all the parts fit together.
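
As a rough illustration of why a diagram generated from datapackage.json reflects the tabular serialization, the sketch below reads the `foreignKeys` entries of a Tabular Data Package descriptor and lists the table-to-table relationships they imply. The descriptor here is a hypothetical, heavily trimmed stand-in (not the real HSDS datapackage.json), and `foreign_key_edges` is an illustrative helper rather than the tooling the docs actually use to generate the ERD.

```python
# Hypothetical, heavily trimmed descriptor in the Data Package layout:
# each resource has a Table Schema, which may declare foreignKeys.
descriptor = {
    "resources": [
        {"name": "organization", "schema": {"primaryKey": "id"}},
        {
            "name": "service",
            "schema": {
                "primaryKey": "id",
                "foreignKeys": [
                    {
                        "fields": "organization_id",
                        "reference": {"resource": "organization", "fields": "id"},
                    }
                ],
            },
        },
    ]
}


def as_list(value):
    """Table Schema allows a single field name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def foreign_key_edges(descriptor):
    """Collect one edge per foreign key: referencing table -> referenced table."""
    edges = []
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema", {})
        for fk in schema.get("foreignKeys", []):
            reference = fk["reference"]
            edges.append({
                "from_table": resource["name"],
                "from_fields": as_list(fk["fields"]),
                "to_table": reference["resource"],
                "to_fields": as_list(reference["fields"]),
            })
    return edges


edges = foreign_key_edges(descriptor)
for edge in edges:
    # Without a uniqueness constraint on the referencing fields, the
    # referencing table sits on the "many" side of the relationship.
    print(f"{edge['from_table']}{edge['from_fields']} -> "
          f"{edge['to_table']}{edge['to_fields']}")
```

Each edge runs from the table holding the foreign key to the table it references, which is the direction information that “crow’s feet” notation would need.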

In the short term, I think we can find a new home for the ERDs so that they’re easier to find. I believe the best place for them would be a sub-section of Serialization and Publication Formats underneath “Tabular Data Package”, because that’s ultimately what they relate to. They really were only on the Overview and Model page because of the legacy “Logical Model” section, which I believed people would need a substitute for! Does this work for you? An alternative would be to move them into their own page in the “Guidance” section, with some lightweight copy describing how the tabular representation differs from the JSON Schema.

Greg has raised some other points on the Overview and Model page which pick at some of the underlying contradictions in the docs I’ve hinted at here. These tie in with this issue a bit. I’ll write this up and create a forum post or GitHub issue for community discussion, and make sure you’re tagged in it :slight_smile:

Eventually, I think it’s worth scoping out how to generate (or manually create) a comparable diagram directly from the JSON Schema, but this would obviously take effort and time which could be directed elsewhere in the short term.

As for the crow’s feet, we can investigate whether the library we use supports this. I seem to remember that it didn’t support labelling the edges as “1:n” etc. If the library supports this, we can certainly adjust the generation of the diagram. This should be read with the caveat that the diagram is primarily for the Tabular format at the moment, so we might want to consider directing effort towards getting a good diagram for the official JSON Schemas first.
