The current UK Validator of API feeds checks lists of services, organizations, etc. for compliance, then checks 10 random records (or fewer, if there are not 10), e.g. 10 services, for compliance of their detailed data.
There are cases where some/most records pass validation but some do not. In such cases the feeds are normally added to the Directory of feeds but may show as failing validation in the Dashboard if the last routine check randomly picked failing records.
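The quick-check behaviour described above can be sketched as a simple random sample. This is only an illustration of the approach, not the validator's actual code; `sample_records` and the record shape are hypothetical.

```python
import random

def sample_records(records, max_sample=10):
    """Pick up to max_sample random records for a quick validation check.

    Because the sample is random, a feed with a few failing records may
    pass one run and fail the next, which is the inconsistency described
    above.
    """
    k = min(max_sample, len(records))
    return random.sample(records, k)

# Example: a feed with 25 records; only 10 are quick-checked.
records = [{"id": i} for i in range(25)]
picked = sample_records(records)
```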
I’d appreciate discussion as to how we address this - possibly in future work on the validator performed in the UK or elsewhere.
According to @jeffc, who works on the UK validator, a full test of every record might take 40 minutes or more, depending on the size of the feed. Hence we would probably need an asynchronous way of testing and advising a publisher of the results, rather than them waiting for a response within a web page.
Alternatively, we could ask publishers to do quick checks (as now) but have an administrator do a full check before adding a feed to the Directory and/or Dashboard.
A shorter-term response might be simply to add a note to the Dashboard saying that some feeds occasionally fail because randomly checked records do not fully comply with the standard, although most do.
My measurements are based on my development environment, which is a MacBook Pro 16 (M4 Apple Silicon, 64 GB RAM). My internet connection achieves 150 Mbps download. I understand that these may or may not be representative.
Performance could vary depending on server size and scalability, but I think these results should be somewhat indicative of performance "out in the wild". The issue is the same either way.
If I said 40 minutes, it was a typo. The dataset used for testing returned 1,914 services. Including response time and validation time, it took 0.39 seconds per service. If I validated every service result synchronously, that would take 12.34 minutes.
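For anyone reproducing the estimate, the arithmetic is just services × seconds-per-service. Note that with the rounded 0.39 s figure the total comes out slightly above the 12.34 minutes quoted, which suggests the underlying per-service average is a little under 0.39 s.

```python
services = 1914       # services returned by the test dataset
per_service = 0.39    # seconds per service (response + validation), rounded

# Total synchronous validation time, in minutes.
total_minutes = services * per_service / 60
print(f"{total_minutes:.2f} minutes")  # ~12.44 with the rounded figure
```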
You could take advantage of more asynchronous processing. I do run multiple validation tasks concurrently to increase performance, but I found it gave only marginal improvement when run against the same feed. You become dependent on the load on the provider's server, which, I am still convinced, is in some cases a Raspberry Pi or a Commodore 64.
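Running validation tasks concurrently while capping load on the provider's server might look something like the following sketch. `validate_service` here is a stand-in for the real fetch-and-validate step, and the concurrency limit of 10 is an arbitrary illustration.

```python
import asyncio

async def validate_service(service_id):
    """Stand-in for a real HTTP fetch plus schema validation.

    Simulated with a short sleep; a real implementation would call
    the feed's /services/{id} endpoint and validate the response.
    """
    await asyncio.sleep(0.01)
    return service_id, True

async def validate_all(service_ids, concurrency=10):
    # A semaphore caps in-flight requests so we don't overwhelm a
    # provider server that may itself be the bottleneck.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(sid):
        async with sem:
            return await validate_service(sid)

    return await asyncio.gather(*(bounded(s) for s in service_ids))

results = asyncio.run(validate_all(range(20)))
```

The gain from concurrency is bounded by the provider server's capacity, which matches the observation above that it gave only marginal improvement on a single feed.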
One option I thought of is to test a configurable subset of services, along with a configurable percentage of results that must pass all validation requirements. Within these parameters, any minority failures would result in a warning only: the feed would report as a pass, but a notification including the warnings would be returned.
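The pass-with-warnings idea above could be summarised with a threshold check like this. The function name, the result shape, and the 90% default threshold are all hypothetical, chosen just to show the decision logic.

```python
def summarise(results, pass_threshold=0.9):
    """Decide pass/fail from per-record results.

    results: list of (record_id, passed, message) tuples.
    If the pass ratio meets the threshold, report a pass but surface
    the minority failures as warnings; otherwise report a fail.
    """
    failures = [(rid, msg) for rid, ok, msg in results if not ok]
    pass_ratio = 1 - len(failures) / len(results)
    if pass_ratio >= pass_threshold:
        return "pass", [f"record {rid}: {msg}" for rid, msg in failures]
    return "fail", []

# Example: 9 of 10 records pass, so the feed passes with one warning.
results = [(i, i != 3, "missing field" if i == 3 else "") for i in range(10)]
status, warnings = summarise(results)
```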
Yes. This pertains to rows 3 (Ref ID 673) and 9 (Ref ID 325) of the spreadsheet Open Referral Tools.
I’m not sure how to update it, but the main point is that a publisher or consumer probably wants to check the validity of every record (hence a full validation), whereas a quick validation using a sample of records is probably useful as a first step, and for people wanting to see who has an API that is largely compliant.