Technical Report on the ORUK Validator

Hi all,

As I’ve mentioned, I’ve been investigating the ORUK Validator to see how much work it would take to adapt it to general use with “vanilla” HSDS or other Profiles.

To that end, I’ve produced a short Technical Report outlining my findings and recommendations:

The report covers my understanding of the shape of the validator and its capabilities, as well as my recommendation.

In summary:

  • You can deploy/run the back-end validation service independently of the dashboard and use it to validate an API feed. The API feed must be open; otherwise you need to do some network-fu to host the validator somewhere it can access the URLs it needs to validate without any authentication.
  • Schemas need to be manually loaded in, so you can load in copies of the HSDS Schemas in place of the current ORUK schemas to “trick” it into validating vanilla HSDS feeds, or your own profile.
  • It looks like it covers 11 out of 22 use cases from the use case spreadsheet, with some caveats.
  • Purely out of pragmatism, I recommend that if we want a general-purpose HSDS validator we should reimplement the validation logic as a reusable library, since the ORUK Validator is designed as a monolithic “application” rather than a reusable tool (not a value judgement!). This also side-steps some of the licensing issues with the ORUK Validator, and gives the community a choice to determine the tech stack based on its own needs (not that there’s anything explicitly wrong with the tech stack used in the ORUK Validator).

I aim to present some headlines from this at the next Technical Committee meeting, alongside the list of use cases not explicitly covered by the tool, so that the committee can work to prioritise these remaining ones for any future validation tool development.

Cheers,
Matt

Wow. This is a great comprehensive report. I look forward to discussing at the next Standing Technical Meetup to which the current ORUK developer has been invited as an observer. He is new to ORUK work so he is still learning and will appreciate some of the comments about needing better documentation.

I doubt much thinking went into the choice of licence and expect the ORUK people would be very receptive to a suggestion to switch to a different licence.

My reading of the report is that we should modularise (decouple) aspects of functionality so they can be used independently. This certainly applies to use cases that are deliberately not addressed by the current UK work. We also need to soft-code choice of version and profile.

I personally (and this is not an official view from the UK team working for MHCLG) would like us to migrate to one code set with responsibility for different parts allotted to different people/teams - specifically to: OR International (currently supported by ODSC) and ORUK (currently supported by TPXimpact).

There are things (such as closed feeds) which the UK team is not keen to do, but others (such as logging a history of validation passes and results, or gathering metrics on the last_assessed field) which it might be keen on. It would be great if we could combine experience and expertise for one validator that supports all versions and profiles of HSDA from now.

Thanks Mike! I really hope that it came across as a relatively neutral take, as I know a lot of work went into the validator and it is well implemented and very suited for the UK context for which it was designed :slight_smile:

He is new to ORUK work so he is still learning and will appreciate some of the comments about needing better documentation.

Definitely happy to pass on my reckons about documentation, although I’ll try to make it clear that I think the urgency is lower than maybe came across in the report. Jeff was kind enough to respond to my questions very quickly, so I’m keen to make sure he doesn’t panic and think there’s a desperate need for thorough docs just yet.

I doubt much thinking went into the choice of licence and expect the ORUK people would be very receptive to a suggestion to switch to a different licence.

Good to hear! As part of writing this report, I ended up opening a GitHub issue to this effect (rather than being the guy who moans about things and doesn’t offer solutions…).

My reading of the report is that we should modularise (decouple) aspects of functionality so they can be used independently. This certainly applies to use cases that are deliberately not addressed by the current UK work. We also need to soft-code choice of version and profile.

I agree. This would go a very long way towards making this a core part of the community’s tooling ecosystem.

Soft-coding the choice of version and Profile should be a goal for sure, but there remain open questions around how a validator should find/fetch schemas for versions and profiles. I think this is something for wider community infrastructure, rather than something the ORUK Validator has to tackle independently.

I personally (and this is not an official view from the UK team working for MHCLG) would like us to migrate to one code set with responsibility for different parts allotted to different people/teams - specifically to: OR International (currently supported by ODSC) and ORUK (currently supported by TPXimpact).

I think I understand and agree with this. Just to check, are you saying that e.g. we’d all be working within the same GitHub organisation to create things which are all labelled/“owned” by “Open Referral”, with responsibility for different things then devolved to teams such as OR/ODSC and ORUK/TPXimpact? Or have I misunderstood?

There are things (such as closed feeds) which the UK team is not keen to do, but others (such as logging a history of validation passes and results, or gathering metrics on the last_assessed field) which it might be keen on.

I think the beauty of having reusable and flexible components/modules is that the UK context can build applications which just ignore the notion of closed feeds, whereas other contexts can use the same validation components to validate local data or data behind closed APIs.

It would be great if we could combine experience and expertise for one validator that supports all versions and profiles of HSDA from now.

I’m keen to avoid us thinking of a validator as a single product; rather I want us to drill down into what “validation work” entails and build tools to address those needs. These can be recombined in multiple ways to support different communities. But yes, I envision a future where one of those combinations is a hosted validator website which works across all versions and profiles of HSDS/HSDA.

Well it’s partly about who funds the work and partly about where the code goes. I could see UK work starting in a separate repository and then being pushed to the main OR repository for acceptance there. I’m not the best person to say how the mechanics should work.

Well it’s partly about who funds the work and partly about where the code goes. I could see UK work starting in a separate repository and then being pushed to the main OR repository for acceptance there. I’m not the best person to say how the mechanics should work.

Great, I think I understand this.

Open Contracting have a tools directory which they use to collate and present tools. It could be that as long as everyone does work in a space that they guarantee is publicly accessible, then we could do something similar suited to our needs.

Yes, it would be good to have some kind of “app store” for tools that work with OR API feeds.

Hi Everyone,

Here is an update on the position of the OpenReferral UK Validator for the profiles. While my primary focus has been the UK requirements, I have tried to keep in mind, and satisfy where possible, international requirements too.

:round_pushpin: Consolidated UK Profiles

One step that we have taken is to consolidate our schema files into a single location. Previously there were a “few” copies in various repositories, floating around GitHub. If you have a thumb drive with a UK profile on it, it’s out of date. :slight_smile:


:clipboard: Requirements Status Report

The requirements here were supplied to me in October by Matt Marshall.
Source: Requirement Spreadsheet

| # | Requirement | Status | Implementation Details |
| --- | --- | --- | --- |
| 1 | JSON Schema validation via HSDS | :white_check_mark: | `IJsonValidatorService` uses Newtonsoft.Json.Schema. |
| 2 | Endpoint discovery from root URL | :white_check_mark: | `IOpenApiDiscoveryService` parses the OpenAPI spec to extract paths. |
| 3 | Remote data retrieval via HTTP | :white_check_mark: | `ExecuteHttpRequestAsync()` uses an injected `HttpClient`. |
| 4 | Determine appropriate schema | :white_check_mark: | Extracts version from feed; defaults to HSDS-UK-1 if missing. |
| 5 | Accept optional auth tokens | :white_check_mark: | The validator accepts a `dataSourceAuth` object, with several options, in the body of the request. |
| 6 | Use bearer tokens for requests | :white_check_mark: | As above. |
| 7 | Supply parameters to API (testing) | :warning: | Currently limited to pagination parameters. |
| 8 | Validate results against parameters | :warning: | Currently limited to pagination validation. |
| 9 | Report HTTP status of endpoints | :white_check_mark: | Returns a detailed `HttpTestResult` (StatusCode, errors, etc.). |
| 10 | Manual Profile override | :white_check_mark: | The `OpenApiSchemaUrl` option allows users to skip auto-discovery. Also supports validation if the schema requires it. |
| 11 | JSON report of pass/fail fields | :warning: | Reports failures and full JSON, but does not list “passed” fields. |
| 12 | Detail reason for field failure | :white_check_mark: | `ValidationError` includes Path, Message, Code, and LineNumber. |
| 13 | Highlight undefined fields | :cross_mark: | Not yet flagging fields present in data but missing from the schema. |
| 14 | Local/Community installation | :white_check_mark: | Containerized .NET application with Docker/Docker Compose. |


:magnifying_glass_tilted_left: Open Questions

Requirement 4: Schema Resolution (The “UK Problem”)

There remains an issue (within ORUK at least) that data feeds don’t always provide the correct URL for the deployed version of the schema. Version 1 (UK) feeds don’t necessarily provide any metadata identifying the version. With that in mind, I have (for now) had to assume HSDS-UK-1 if there is no explicit specification. Most V3 feeds list the profile as “HSDS-UK-3.0”, making my job easier…except one! I would like to resolve this before I leave the project.

  • Do we expect the data feeds to identify the schema uri to validate against?
  • Would we trust it?
  • Proposed Resolution: Maintain an Enum or DB of known profiles while keeping the manual override option.
  • Questions: Do international feeds suffer from this?
  • Should we implement a “whitelist” of acceptable metadata URLs?
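
The proposed resolution above could be sketched roughly like this. All names here, including the known-profile entries and the `resolve_profile` helper, are illustrative assumptions, not the validator’s actual code:

```python
# Sketch of the proposed resolution: keep a registry of known profile
# identifiers, trust feed metadata only when it matches a known entry,
# fall back to HSDS-UK-1 otherwise, and let a manual override always win.

KNOWN_PROFILES = {
    "HSDS-UK-1",
    "HSDS-UK-3.0",
    "HSDS-3.0",  # hypothetical identifier for vanilla international HSDS
}

DEFAULT_PROFILE = "HSDS-UK-1"

def resolve_profile(feed_metadata, manual_override=None):
    """Pick the schema profile to validate against."""
    if manual_override:  # explicit choice skips auto-discovery entirely
        return manual_override
    declared = feed_metadata.get("profile")
    if declared in KNOWN_PROFILES:  # only trust metadata we recognise
        return declared
    return DEFAULT_PROFILE  # e.g. version-1 UK feeds with no metadata

print(resolve_profile({}))                          # HSDS-UK-1
print(resolve_profile({"profile": "HSDS-UK-3.0"}))  # HSDS-UK-3.0
print(resolve_profile({"profile": "bogus"}))        # falls back to HSDS-UK-1
```

This is essentially the “whitelist plus manual override” option from the bullets above, with the whitelist doubling as the answer to “would we trust it?”: declared metadata is only trusted when recognised.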

Requirement 5 & 6: Authentication

I have now satisfied this requirement for both requests to the data source and requests for the schema. I added the schema authentication because our staging environment is not publicly accessible.

Requirement 11: Reporting “Passed” Fields

Currently, we only return failures.

  • Question: Would a list of “passed” fields add value, or would it just create unnecessary bulk in the report?
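
For discussion, one low-bulk middle ground might be a summary count by default, with the full list opt-in. This is a hypothetical report shape, not the validator’s current output:

```python
# Hypothetical report shape for requirement 11: listing every passed field
# bloats the report, so return a cheap "passedCount" summary by default and
# the full list only when the caller opts in. Illustrative names only.

def build_report(checked_fields, failures, include_passed=False):
    passed = [f for f in checked_fields if f not in failures]
    report = {
        "failed": [{"path": p, "message": m} for p, m in failures.items()],
        "passedCount": len(passed),  # summary adds almost no bulk
    }
    if include_passed:  # full list only on request
        report["passed"] = passed
    return report

r = build_report(["name", "url", "email"], {"url": "not a valid URI"})
print(r["passedCount"])  # 2
```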

Requirement 13: Undefined Fields

  • Status: This is the only “Not Satisfied” item. I do not presently flag fields that exist in the payload but are absent from the schema definitions.
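
The core of the missing check might look something like this sketch. It is illustrative only: it ignores `$ref` resolution, `additionalProperties`, and arrays, all of which a real implementation would need to handle:

```python
# Sketch of requirement 13: walk a payload and flag any keys that are not
# declared in the schema's "properties". Function and path format are
# illustrative, not the validator's actual code.

def find_undefined_fields(payload, schema, path=""):
    undefined = []
    properties = schema.get("properties", {})
    for key, value in payload.items():
        here = f"{path}/{key}"
        if key not in properties:
            undefined.append(here)  # present in data, absent from schema
        elif isinstance(value, dict):
            # recurse into nested objects using the nested sub-schema
            undefined.extend(find_undefined_fields(value, properties[key], here))
    return undefined

schema = {"properties": {"name": {}, "contact": {"properties": {"email": {}}}}}
payload = {"name": "Foo", "nickname": "F", "contact": {"email": "a@b.c", "fax": "1"}}
print(find_undefined_fields(payload, schema))  # ['/nickname', '/contact/fax']
```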


:rocket: Next Steps

The application ships with a Docker image, with instructions in the wiki, which can be used for easy local deployment.

POST Request Body Structure

Below is the “current” body format of the POST request to the validator.

```json
{
  "openApiSchema": {
    "url": "string",
    "authentication": {
      "apiKey": "",
      "apiKeyHeader": "X-API-Key",
      "bearerToken": "",
      "basicAuth": {
        "username": "",
        "password": ""
      },
      "customHeaders": {
        "additionalProp1": "string",
        "additionalProp2": "string",
        "additionalProp3": "string"
      }
    }
  },
  "baseUrl": "string",
  "dataSourceAuth": {
    "apiKey": "",
    "apiKeyHeader": "X-API-Key",
    "bearerToken": "",
    "basicAuth": {
      "username": "",
      "password": ""
    },
    "customHeaders": {
      "additionalProp1": "string",
      "additionalProp2": "string",
      "additionalProp3": "string"
    }
  },
  "options": {
    "testEndpoints": true,
    "validateSpecification": true,
    "timeoutSeconds": 30,
    "maxConcurrentRequests": 5,
    "skipAuthentication": true,
    "testOptionalEndpoints": true,
    "treatOptionalEndpointsAsWarnings": true,
    "includeResponseBody": false,
    "includeTestResults": true,
    "returnRawResult": true
  }
}
```
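
As a rough illustration of how a client might build and send this body: the helper names and URLs below are placeholders, and only a subset of the fields above is filled in.

```python
import json
import urllib.request

# Minimal client sketch for the request body described above. The validator
# URL is a placeholder; check the wiki for the authoritative body format.

def build_body(openapi_url, base_url, bearer=""):
    return {
        "openApiSchema": {
            "url": openapi_url,
            "authentication": {"bearerToken": bearer},
        },
        "baseUrl": base_url,
        "dataSourceAuth": {"bearerToken": bearer},
        "options": {
            "testEndpoints": True,
            "validateSpecification": True,
            "timeoutSeconds": 30,
            "returnRawResult": True,  # keep the full analytics payload
        },
    }

def post_validation(validator_url, body):
    """POST the body as JSON and return the raw response bytes."""
    req = urllib.request.Request(
        validator_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call; placeholder URL
        return resp.read()

body = build_body("https://example.org/openapi.json", "https://example.org/api")
print(json.dumps(body, indent=2)[:60])
```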

Notes:

  • Leave returnRawResult: true (I might rename it). Setting it to false will discard a lot of the analytics data and return the format expected by the ORUK website.
  • If you pass multiple authentication options in the request, they will all be set on the HTTP request to either the data source or the schema.
  • I am going to remove skipAuthentication. It isn’t needed: if there are no values to set, it doesn’t send them.
  • includeResponseBody: if true, this will return the full response received for every endpoint test request to the data source.

I hope this is heading in the right direction.

Regards

Jeff