Enhancing Open Referral UK API Validation: New Features and Architectural Progress

Hi everyone,

This will probably be my final update on the Open Referral API Validator as the project transitions to iStandUK. Keep an eye on the repository for an official URL.

I wanted to share an update on the latest developments. I have been working on several major architectural improvements and feature additions designed to deliver an API validation engine that is robust, modular, and insightful for developers. Before I hand this project over to its new stewardship, I wanted to ensure it was something to be proud of. I am, and it is.

Because Open Referral UK is transferring stewardship to iStandUK, the deployed endpoint will be changing. I have been asked not to include it in this post, but I will pass it to Greg should anyone want to take a look or test it.

What’s New in the Validator?

I have introduced several powerful metrics and features to help ensure your API feeds are both technically sound and developer-friendly:

Comprehensive Quality Scoring

To provide a clear, high-level overview of an API’s readiness, we’ve introduced a weighted Quality Score (0-100). This score is specifically designed to measure documentation completeness from a developer’s perspective. It aggregates four key pillars: Documentation Coverage (30%), Parameter Documentation (25%), Schema Documentation (25%), and Response Documentation (20%). By focusing on these areas, the score rewards APIs that provide the necessary context for integration, such as clear descriptions for every endpoint and well-defined data models. It serves as a single source of truth for stakeholders to understand how “developer-friendly” their Open Referral feed truly is. You will find further explanation in the wiki document linked in the title.
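For anyone curious how the weighting works in practice, here is a rough Python sketch (the real validator is C#; the weights are from this post, but the pillar values and function name below are purely illustrative):

```python
# Illustrative sketch of the weighted Quality Score: four pillar scores
# (each 0-100) combined with the weights described above.
WEIGHTS = {
    "documentation_coverage": 0.30,
    "parameter_documentation": 0.25,
    "schema_documentation": 0.25,
    "response_documentation": 0.20,
}

def quality_score(pillars: dict) -> float:
    """Combine per-pillar scores (each 0-100) into a single 0-100 score."""
    return round(sum(pillars[name] * weight for name, weight in WEIGHTS.items()), 1)

# Hypothetical pillar scores for a feed with strong coverage but weak examples:
pillars = {
    "documentation_coverage": 90,
    "parameter_documentation": 80,
    "schema_documentation": 70,
    "response_documentation": 60,
}
print(quality_score(pillars))  # 90*0.30 + 80*0.25 + 70*0.25 + 60*0.20 = 76.5
```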

Quality Metrics: Diagnostic Insights into Schema Health

While the Quality Score provides the “grade,” our deeper Quality Metrics provide the “diagnostics.” These metrics are split into two categories: Documentation Completeness and Structural Health. The completeness metrics track granular details like the presence of request/response examples and summaries—features that help human developers get up to speed quickly. Meanwhile, the structural metrics analyze the “cleanliness” of the OpenAPI specification itself, measuring component reuse, reference resolution, and adherence to DRY (Don’t Repeat Yourself) principles. Together, these metrics allow you to distinguish between an API that is simply well-documented and one that is professionally engineered for long-term maintainability. I have included these features with a view to meeting future requirements.

Flexible Validation Options

Users can now toggle specific validation behaviours, such as whether to include response bodies in results, whether to test optional endpoints, and how to treat those optional endpoint results (as warnings vs. errors). See here for more details.
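To give a feel for the shape of these toggles, a request options object might look something like the following. To be clear, the key names here are hypothetical, invented for illustration only — check the linked documentation for the actual option names:

```json
{
  "includeResponseBodies": false,
  "testOptionalEndpoints": true,
  "optionalEndpointFailures": "warning"
}
```

The third option reflects the warnings-vs-errors choice described above for optional endpoint results.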

Report Additional Fields

To ensure complete contract fidelity, the validator includes a robust reportAdditionalFields option, which allows developers to identify “schema drift” within their data feeds. When this feature is enabled, the validator does not just check that the required fields are present; it actively flags any additional fields in the API response that have not been defined in the provided OpenAPI schema. This is a critical tool for maintaining high-quality Open Referral feeds, as it ensures that the API provides exactly what it claims to without exposing undocumented or extraneous data. By surfacing these discrepancies, the validator helps developers keep their documentation and their data sources in perfect sync, giving consumers greater confidence in the integrity of the HSDS implementation.
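At its simplest, the “schema drift” check boils down to set difference between what a response contains and what the schema declares. A minimal Python sketch (the actual implementation is C#, and all names here are illustrative):

```python
# Minimal sketch of reportAdditionalFields-style drift detection: flag any
# response fields that the OpenAPI schema does not declare.
def find_additional_fields(response: dict, schema_properties: set) -> list:
    """Return names of fields present in the response but absent from the schema."""
    return [key for key in response if key not in schema_properties]

# Hypothetical example: the schema declares three fields, the feed returns four.
schema_props = {"id", "name", "description"}
response = {"id": "1", "name": "Food bank", "description": "…", "internal_ref": "X42"}
print(find_additional_fields(response, schema_props))  # ['internal_ref']
```

The real validator walks nested objects and arrays as well, but the principle is the same: the feed should expose exactly what it documents, and nothing more.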

Robust Security & Auth

The validator provides flexible authentication options to ensure that both the OpenAPI schema discovery and live endpoint testing can be conducted securely. Users can specify authentication methods for two distinct layers: the openApiSchema layer, used for finding and reading the OpenAPI document, and the dataSourceAuth layer, used when calling live API endpoints during testing. The system supports a variety of standard authentication strategies, including API Keys (with customizable header names), Bearer Tokens, Basic Authentication (username and password), and Custom Headers. However, to maintain security, only one authentication method can be specified per object, and the server must be configured with AllowUserSuppliedAuth set to true for these user-provided credentials to be utilized. Additionally, for safety, authentication headers are only applied when the target URLs use the HTTPS protocol.
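The acceptance rules above (server opt-in, exactly one method, HTTPS only) can be sketched as a simple gate. This is an illustrative Python condensation of the logic described in this post, not the validator's actual code:

```python
# Sketch of the auth acceptance rules: credentials are only applied when the
# server opts in, exactly one method is supplied, and the target is HTTPS.
def auth_allowed(auth: dict, target_url: str, allow_user_supplied_auth: bool) -> bool:
    methods = [m for m in ("apiKey", "bearerToken", "basic", "customHeaders") if m in auth]
    if not allow_user_supplied_auth:
        return False  # server must set AllowUserSuppliedAuth to true
    if len(methods) != 1:
        return False  # only one authentication method per object
    if not target_url.lower().startswith("https://"):
        return False  # credentials are never sent over plain HTTP
    return True

print(auth_allowed({"bearerToken": "abc"}, "https://api.example.org", True))  # True
print(auth_allowed({"bearerToken": "abc"}, "http://api.example.org", True))   # False
```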

Schema Runtime Resolution

I have never made a secret of not being a fan of pre-compiling/resolving schema files. The validator features a sophisticated schema resolution engine designed to handle the complexities of modern, modular OpenAPI specifications. At its core, the system utilizes a dedicated ReferenceResolver to recursively navigate and resolve $ref pointers, ensuring that even deeply nested or distributed data models are correctly unified for validation. To support this, we implemented a RemoteSchemaLoader that securely fetches external schema files, applying necessary authentication headers and interacting with a multi-layered cache to optimize performance and reduce redundant network traffic. This architecture is built with high resilience in mind, featuring robust protections against circular references and SSRF-style URL attacks, ensuring that whether your schema is a single local file or a web of interconnected remote documents, the validator can resolve it into a consistent and actionable contract.
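In the spirit of the ReferenceResolver described above, here is a toy Python version of recursive $ref resolution with circular-reference protection. It handles local (`#/…`) refs only — the real resolver also fetches and caches remote documents — and is a sketch rather than the actual implementation:

```python
# Toy recursive $ref resolver with circular-reference protection.
def resolve_refs(node, root, seen=None):
    seen = seen or set()
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in seen:
                raise ValueError(f"circular reference: {ref}")
            target = root
            for part in ref[2:].split("/"):  # walk the JSON pointer
                target = target[part]
            return resolve_refs(target, root, seen | {ref})
        return {k: resolve_refs(v, root, seen) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, root, seen) for v in node]
    return node

spec = {
    "components": {"schemas": {"Service": {"type": "object"}}},
    "paths": {"/services": {"$ref": "#/components/schemas/Service"}},
}
print(resolve_refs(spec["paths"], spec))  # {'/services': {'type': 'object'}}
```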

Embedded Security: Testing and CodeQL Analysis

To ensure the validator remains a secure and trusted tool for the Open Referral community, we have integrated a “Security-First” CI/CD pipeline. This goes beyond standard functional testing to include deep semantic analysis of the codebase.

  • Expanded Test Suite: Our recent architectural changes are backed by 243 passing automated tests. This includes new, targeted unit tests for security-critical helper classes. These tests specifically validate our defences against common API vulnerabilities, such as:
    • SSRF Protection: Strict URL scheme restrictions to block file://, ftp://, or data: requests.
    • Circular Reference Handling: Guarding against infinite loops in complex, deeply-nested OpenAPI $ref resolutions.
    • Authentication Hardening: Validating header sanitization and preventing the use of malformed or insecure credentials.
  • Static Analysis with CodeQL: We have implemented GitHub CodeQL as our primary Static Application Security Testing (SAST) engine. CodeQL treats our C# code as a queryable database, allowing us to perform deep semantic analysis that simple text-based scanners miss.
    • Data-Flow Analysis: We use CodeQL to track the flow of untrusted user input (such as provided URLs or authentication headers) to ensure it never reaches a “dangerous sink” without proper sanitization.
    • Vulnerability Detection: Our pipeline automatically scans every Pull Request for common weaknesses like SQL injection, cross-site scripting (XSS), and insecure cryptographic patterns.
    • Regression Prevention: By running these scans on every push, we ensure that new features don’t inadvertently introduce security regressions, maintaining the integrity of the validator as it transitions to its new home.
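The SSRF scheme restriction in the first bullet is conceptually very small — the validator simply refuses to fetch anything that is not plain http(s). A Python sketch of the idea (the real implementation is C#):

```python
# Sketch of the SSRF scheme restriction: only http(s) URLs are ever fetched,
# blocking file://, ftp://, and data: style requests before any I/O happens.
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_fetchable(url: str) -> bool:
    return urlparse(url).scheme.lower() in ALLOWED_SCHEMES

print(is_fetchable("https://example.org/openapi.json"))  # True
print(is_fetchable("file:///etc/passwd"))                # False
print(is_fetchable("data:text/plain;base64,AAAA"))       # False
```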

This rigorous testing and security framework provides a high degree of confidence that the validator is not only accurate in its assessments but is itself a hardened piece of infrastructure.

Roadmap:

From UK-Centric to Profile-Based Validation

The current version of the validator allows international usage by passing a specific schema URI within the options. However, my vision for the future focuses on a more flexible, trust-based model:

  • Schema Whitelisting: Instead of relying on a URI provided in the request, I envision a “whitelist” of known schema profiles. Feeds will identify which version/profile they adhere to, ensuring they meet recognized global standards.
  • Self-Exposed Schema Validation: I hope a model can emerge where the feed defines its own schema. That schema will be validated against our whitelist of authorized schemas, and then the feed itself will be validated against its own exposed OpenAPI spec.
  • Consumer Confidence: This approach will ensure that a feed exposes exactly what it claims to. It allows for local extensions or “additional functionality” to be added to HSDS while maintaining a verifiable core that consumers can trust.

Enhanced Parameter Testing and Validation

A key area for future growth is the implementation of more rigorous parameter testing. While the current validator does a good job of verifying core response structures, there is significant room to expand how it handles input parameters. I recently submitted a Pull Request to the HSDS specification to correct a formatting bug regarding page parameters; fixing these underlying specification issues is the first step toward building more intelligent validation logic.

Moving forward, I hope to see the validator extend its reach beyond basic pagination checks. By leveraging enhanced parameter definitions in the OpenAPI spec, the tool could proactively test:

  • Filter Logic: Ensuring that parameters like category, location, or taxonomy return logically consistent results.
  • Data Type Strictness: Verifying that the API correctly handles (and rejects) invalid data types or out-of-range values for custom query parameters.
  • HSDS-Specific Constraints: Automatically checking that mandatory HSDS search parameters behave according to the standard’s expectations.

Expanding into this “active” parameter testing would move the validator from a contract-checking tool to a full functional testing suite, ensuring that Open Referral feeds aren’t just syntactically correct, but operationally reliable.

I couldn’t write an update without…

The Future: AI-Augmented Semantic Validation

While our current refactors have made the validator technically robust, the next frontier is moving beyond “syntactic” correctness to “semantic” intelligence. By integrating Large Language Models (LLMs) and Machine Learning into the engine, we can solve several high-level challenges that traditional code-based rules struggle to address:

  • Intelligent Test Case Generation: Instead of static checks, an AI agent could analyze the OpenAPI specification to autonomously generate complex, multi-step test scenarios. This includes “agentic exploration”—where the AI attempts to find undocumented edge cases or logical inconsistencies by chaining API calls together in ways a human tester might not anticipate.
  • Semantic “Common Sense” Checks: Traditional validators can tell you if a field is a “string,” but they can’t tell you if the data inside makes sense. An AI layer could flag “impossible” data combinations—such as a service’s opening_time being set later than its closing_time, or a geographic_radius that is statistically improbable—adding a layer of “business logic validation” that ensures the feed reflects the real world.
  • Self-Healing Test Suites: One of the biggest pain points in API maintenance is “schema drift.” An AI-integrated validator could detect when a field name has changed (e.g., from phone_number to contact_telecom) and automatically suggest “healing” the test assertions or the consumer mapping, reducing manual maintenance by up to 70% (it’s a random number; other numbers are available).
  • Automated Quality Insights: Currently, our Quality Score is calculated via weighted formulas. An AI could provide qualitative feedback, acting as a “virtual developer” to explain why a certain documentation section is confusing or suggesting more descriptive summaries and examples based on best practices observed across thousands of other high-quality APIs.
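Worth noting: the opening/closing time example above doesn’t actually need ML. A plain rule-based sketch in Python (field names loosely echo HSDS-style schedules and are illustrative, not the standard’s exact fields):

```python
# Rule-based "common sense" check: a service's opening time should normally
# precede its closing time. Overnight services would need special handling.
from datetime import time

def plausible_hours(opens: time, closes: time) -> bool:
    """Flag same-day schedules whose opening time is not before the closing time."""
    return opens < closes

print(plausible_hours(time(9, 0), time(17, 0)))  # True
print(plausible_hours(time(22, 0), time(6, 0)))  # False
```

The AI angle is less about any single rule like this and more about discovering which rules are worth applying in the first place.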

By incorporating these AI capabilities, the Open Referral Validator wouldn’t just be a gatekeeper for the HSDS standard; it would become a proactive consultant, helping publishers improve their data quality and helping consumers integrate with unprecedented confidence.

(maybe I get over-excited by this stuff)
It’s been a pleasure,

Jeff


Jeff, this is very exciting. I’m looking forward to hearing what folks have to say about this, but in the meantime I’m really grateful for your enthusiasm and dedication to this project! Really appreciate your contribution here.


We’ve entered a code freeze today ahead of our final testing and handover activities.

The last change being deployed contains the following performance and stability improvements.

What was added:

  • Persistent In-Memory Caching:
    Remote schemas are now stored in an application-level IMemoryCache, which (basically) reduces repeat HTTP calls across different validation runs.

  • Startup Schema Warmup:
    Known schemas are cached on start-up, with defaults configured for the core schema files. This prevents delays on the first validation request.

  • Health Visibility:
    The health-check/live endpoint reports on the status of this warmup, making it much easier to monitor cache health in production.

  • Normalised Draft URLs & Upstream Confidence:
    Schema URIs are not always consistent, so each schema is cached under a normalised URL to prevent duplicate entries in the cache.
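To illustrate the normalisation idea in the last bullet (a Python sketch of the concept, not the deployed C# code): trivially different spellings of the same schema URI should resolve to one cache key.

```python
# Sketch of a normalised-URL cache key: case-fold the scheme and host, trim
# trailing slashes, and drop fragments so equivalent URIs share one entry.
from urllib.parse import urlsplit, urlunsplit

def normalise(url: str) -> str:
    parts = urlsplit(url.strip())
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/") or "/",
        parts.query,
        "",  # fragments never change the fetched document
    ))

print(normalise("HTTPS://Example.org/schema.json/") ==
      normalise("https://example.org/schema.json"))  # True
```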

@jeffc , thanks for all the time you have put into this and the enthusiasm you have shown for a more robust validator that can be used in many situations. From our verbal conversation, here is my simplified understanding of your original message.

Functional changes

The ORUK validation and what it returns to the ORUK website remains unchanged.

You have added an additional endpoint to the validation API that will also validate the schema to which an API is said to conform. You have used this new functionality to validate the OpenAPI schema against the ORUK JSON specification and against the full international specification. You have thereby found issues which you have reported back to Matt. Your references to Quality Scoring and Quality Metrics pertain to the quality of the specification that the validator is testing API endpoints against.

The logic behind these changes is that we can ask a publisher to fully document its API, including the parts which go beyond ORUK.

You have also added the option to support authenticated feeds. Authentication is only applied if all of the following are satisfied:

  • AllowUserSuppliedAuth environment variable is true

  • The endpoint uses https

  • Exactly one authentication option is passed in the request (one of: username/password, bearer token, custom headers, or API key)

Performance

You have added schema caching to radically improve the performance of validations of API endpoints.

Robustness

You have added checks which build in security, partly as a result of penetration tests that you have previously run.

Future enhancements

You have suggested further changes to improve testing of the correctness of API feeds and whether the data appears to make sense. You have introduced an “experimental” code branch to test some changes.

As an international group, we need to decide if we manage the validator as one set of code with people such as yourself offering changes for acceptance through appropriate procedures, or if ODSC maintains a branch for non-UK purposes. It would be nice if we could achieve the former.