How would you use a hosted validator?

With the advent of Jeff’s re-working of the ORUK validator to be more flexible and suitable for multiple audiences, we’ve started discussing what it would look like to have a hosted instance of the validator for the community.

It’d be good to get some ideas on what people’s needs are.

As far as I can see, there are two initial dimensions to understand (though please contribute others!):

  1. How ad hoc vs. how systematically would you want to use a hosted validator (as opposed to hosting your own instance)?
  2. Do you want a web front-end for a hosted validator, or are you happy interacting with its API? (The ORUK validator is currently back-end only, with a “bring your own front-end” approach.)

So here are my questions to the community; but again, if you think there’s something else to say about this topic then say it! These questions are just a starting point.

This obviously has links to some things we’ve covered in the User Stories around validation. This thread will be used to refine or add to the user stories, but since we’re discussing the possibility of a specific instance of a hosted community validator, we’re mostly focusing on this.

Q1: By what interface/mechanism are you likely to use the validator?

e.g.

  • “I’ll send my data directly to the API on an ad hoc basis via something
    like cURL and analyse/display the results myself on my server/computer”
  • “I want a web interface to input my API endpoint details, and to
    display results to support debugging”
  • “I want to use the hosted validator as part of my production pipeline,
    so I will be systematically making calls to its API via a software library”

etc.
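To make the first option concrete, here’s a minimal sketch of an ad hoc call. The endpoint path and payload shape are invented for illustration; the real hosted validator’s API may well differ.

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape -- check the hosted
# validator's actual documentation before relying on either.
VALIDATOR_URL = "https://validator.example.org/api/validate"

def build_request(feed_url: str, profile: str = "HSDS-UK-3.0") -> urllib.request.Request:
    """Build a POST request asking the validator to fetch and check a feed."""
    body = json.dumps({"url": feed_url, "profile": profile}).encode("utf-8")
    return urllib.request.Request(
        VALIDATOR_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending this with urllib.request.urlopen(req) would be the moral
# equivalent of the cURL one-liner described above.
req = build_request("https://data.example.org/services")
```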

Q2: If you want a web front-end, do you have specific needs for the
interface?

  • “There must be a clear way to input security details to access my API
    data”
  • “I just want to put my root URL in a text box and hit a button and
    then get my results”

etc.

Misc

If you’re planning on interacting with the API of the hosted validator programmatically, it’d be good to have some more details about this. What languages or environments are important? Do you prefer to use existing frameworks to do this, or would you like to see something that abstracts that away?

e.g. a Python developer might plan to take data from their Django web application and transform it into HSDS, and therefore might envision checking it with the validator at some point in its lifecycle. They might prefer to use the hosted validator with the popular requests library by just plugging in its details, or they might want an “official” library that handles taking data, passing it to the hosted instance of the validator, and returning the results, so they don’t need to manage access themselves.
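As a sketch of what such an “official” wrapper library might look like: the class below hides the endpoint and response handling behind one method. The endpoint, payload, and response shape are all assumptions for illustration, and the HTTP transport is injected so it could be backed by requests, urllib, or anything else.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical client sketch -- not the real validator's API.

@dataclass
class ValidationResult:
    valid: bool
    errors: list

class HostedValidatorClient:
    def __init__(self, base_url: str, transport: Callable[[str, bytes], bytes]):
        # `transport` posts bytes to a URL and returns the response body.
        # Injecting it keeps the client library-agnostic and testable.
        self.base_url = base_url
        self.transport = transport

    def validate(self, hsds_data: dict, profile: str = "HSDS-3.0") -> ValidationResult:
        payload = json.dumps({"data": hsds_data, "profile": profile}).encode("utf-8")
        parsed = json.loads(self.transport(self.base_url + "/validate", payload))
        return ValidationResult(valid=parsed.get("valid", False),
                                errors=parsed.get("errors", []))

# Usage with a stub transport standing in for the real HTTP call:
def stub_transport(url: str, body: bytes) -> bytes:
    return json.dumps({"valid": True, "errors": []}).encode("utf-8")

client = HostedValidatorClient("https://validator.example.org", stub_transport)
result = client.validate({"services": []})
```

The design point is that the developer never touches URLs, headers, or authentication directly; the library manages access on their behalf.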

Hello Matt. This is just my UK perspective, but of course we already have a hosted validator, so your questions apply to others who may or may not be in the same situation.

Essentially, I want to avoid developers/suppliers who publish compliant APIs “marking their own homework”. Many say they are compliant or “influenced by the standard” but their feeds don’t pass validation and so can’t be processed by others without extra work (if at all).

A hosted API allows anyone to check an API feed without having to develop and host their own software. The simplest example is that of an organisation commissioning a valid feed and passing the feed’s API endpoint through the validator before paying the supplier. An open validator that can be used equally by developers and commissioners removes any ambiguity on what a data feed should look like.

I think you know my opinions on this, and I have shared with you my own deployment of the complete validation process in AWS Lambda.

The validator exposes a Swagger UI, so it does have a documented interface that you can plug a URL into and validate against known HSDS profiles. It won’t accept validation against any arbitrary JSON schema, or against schema versions that are not pre-configured. I think that is the right approach.

I agree with Mike’s comments on “marking their own homework”, which is why you should let them mark their own homework…but then validate against cached, known HSDS profiles.

I have NOT tried building this into a library, but I don’t see how that would work well anyway, as you need to pre-load and cache all compliant profiles (IMHO). It works best as a hosted, managed service tool.

By the way…

I worked through the memory issue I previously mentioned by implementing validation while streaming from the source, instead of loading the entire data response into memory.

The results are promising.

I made some changes to how memory is allocated and ran an automated set of 10 validation requests against the same source, sequentially. The system is no longer “leaking” memory until it crashes (good); instead, it stays within a predictable, healthy range.

The highlights (summarised by AI):

  • No more “runaway” growth: Instead of the memory usage climbing forever, it now levels off between 1.83 GB and 1.87 GB. (This is mostly the cached, fully resolved and hydrated schemas.)

  • Faster performance: Once the system “warms up” after the first task, it’s running much faster—dropping from about 13 seconds down to 5 seconds per run. (because known schemas are cached in memory)

  • Self-cleaning works: The system is now successfully cleaning up after itself. We can see it reclaiming space after big tasks, which it wasn’t doing properly before. (oops….this was the big one that hit 9GB…SORTED)

  • Stable under pressure: After the first few minutes, the memory stays very flat, which means the “stress” on the system has been significantly reduced.
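The two effects described above can be illustrated with a toy sketch: resolved profiles are cached so only the first run pays the “warm-up” cost, and records are validated one at a time from a stream rather than materialising the whole feed. Everything here (the profile name, the required-fields “schema”) is invented for illustration; real profile resolution and validation are far heavier.

```python
import functools
from typing import Iterable, Iterator

@functools.lru_cache(maxsize=None)
def load_profile(name: str) -> frozenset:
    # First call is the expensive warm-up; later calls hit the cache,
    # which is why memory levels off instead of re-allocating per run.
    return frozenset({"id", "name"})  # required fields for this toy profile

def validate_stream(records: Iterable[dict], profile: str) -> Iterator[list]:
    required = load_profile(profile)
    for record in records:  # one record in memory at a time, not the whole feed
        yield [f"missing: {field}" for field in required - record.keys()]

errors = list(validate_stream(
    iter([{"id": "1", "name": "Advice line"}, {"id": "2"}]),
    "HSDS-UK-3.0",
))
```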

Incidentally, the validation source can be deployed in an environment with 512 MB storage and 2 GB RAM. I worked out that a hosted validator on AWS Lambda would typically cost around $15 a month. There are other, more efficient models that might prove more cost-effective; I like Lambda because you have control over the environment. I have not deployed this on other environments.

(this is not the UK validator. It’s a fork. Through configuration it supports the same validation and output as the UK validator)
