Hi everyone,
This will probably be my final update on the Open Referral API Validator as the project transitions to iStandUK. Keep an eye on the repository for an official URL.
I wanted to share the latest developments. I have been implementing several major architectural improvements and feature additions designed to deliver an API validation engine that is robust, modular, and insightful for developers. Before I hand this project to its new stewardship, I wanted to ensure it was something to be proud of. I am, and it is.
Because OpenReferralUK is transferring stewardship to iStandUK, the deployed endpoint will be changing. I have been asked not to include it in this post. I will pass it to Greg should anyone want to take a look or test it.
What’s New in the Validator?
I have introduced several powerful metrics and features to help ensure your API feeds are both technically sound and developer-friendly:
Comprehensive Quality Scoring
To provide a clear, high-level overview of an API’s readiness, we’ve introduced a weighted Quality Score (0-100). This score is specifically designed to measure documentation completeness from a developer’s perspective. It aggregates four key pillars: Documentation Coverage (30%), Parameter Documentation (25%), Schema Documentation (25%), and Response Documentation (20%). By focusing on these areas, the score rewards APIs that provide the necessary context for integration, such as clear descriptions for every endpoint and well-defined data models. It serves as a single source of truth for stakeholders to understand how “developer-friendly” their Open Referral feed truly is. You will find further explanation in the wiki document linked in the title.
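As a rough illustration of how the four weights combine, here is a minimal sketch in Python (chosen for brevity; the validator itself is C#). The weights come from the description above. The function name, the pillar keys, and the assumption that each pillar is measured as a ratio between 0.0 and 1.0 are illustrative, not the validator's actual implementation.

```python
# Weights (in percentage points) as stated in the post.
# The key names are hypothetical labels for the four pillars.
WEIGHTS = {
    "documentation_coverage": 30,
    "parameter_documentation": 25,
    "schema_documentation": 25,
    "response_documentation": 20,
}


def quality_score(pillars: dict) -> float:
    """Combine per-pillar completeness ratios (0.0 to 1.0) into a 0-100 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Missing pillars count as 0; out-of-range ratios are clamped.
        ratio = min(max(pillars.get(name, 0.0), 0.0), 1.0)
        total += weight * ratio
    return round(total, 1)
```

For example, a feed with ratios of 0.8, 0.6, 1.0 and 0.5 for the four pillars (in the order listed) would score 24 + 15 + 25 + 10 = 74.0 under these assumptions.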
Quality Metrics: Diagnostic Insights into Schema Health
While the Quality Score provides the “grade,” our deeper Quality Metrics provide the “diagnostics.” These metrics are split into two categories: Documentation Completeness and Structural Health. The completeness metrics track granular details like the presence of request/response examples and summaries, features that help human developers get up to speed quickly. Meanwhile, the structural metrics analyze the “cleanliness” of the OpenAPI specification itself, measuring component reuse, reference resolution, and adherence to DRY (Don’t Repeat Yourself) principles. Together, these metrics allow you to distinguish between an API that is simply well-documented and one that is professionally engineered for long-term maintainability. I have included these features with a view to meeting future requirements.
Flexible Validation Options
Users can now toggle specific validation behaviours, such as whether to include response bodies in results, whether to test optional endpoints, and how to treat those optional endpoint results (as warnings vs. errors). See here for more details.
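To make the interaction between these toggles concrete, here is a small Python sketch. The option names below are hypothetical stand-ins (the validator's real option keys may differ); only the three behaviours themselves are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class ValidationOptions:
    # Hypothetical names for the three toggles described in the post.
    include_response_body: bool = False
    test_optional_endpoints: bool = True
    optional_failures_as_warnings: bool = True


def classify_failure(endpoint_is_optional: bool, options: ValidationOptions) -> str:
    """Return 'skipped', 'warning', or 'error' for a failed endpoint check."""
    if endpoint_is_optional:
        if not options.test_optional_endpoints:
            return "skipped"  # optional endpoints are not exercised at all
        if options.optional_failures_as_warnings:
            return "warning"  # reported, but does not fail the feed
    return "error"
```

The point of the sketch is that a failure on a required endpoint is always an error, while the outcome for an optional endpoint depends on two independent switches.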
Report Additional Fields
To ensure complete contract fidelity, the validator includes a robust reportAdditionalFields option, which allows developers to identify “schema drift” within their data feeds. When this feature is enabled, the validator does not just check that the required fields are present; it actively flags any additional fields in the API response that have not been defined in the provided OpenAPI schema. This is a critical tool for maintaining high-quality Open Referral feeds, as it ensures that the API provides exactly what it claims to without exposing undocumented or extraneous data. By surfacing these discrepancies, the validator helps developers keep their documentation and their data sources in perfect sync, giving consumers greater confidence in the integrity of the HSDS implementation.
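The idea behind this kind of check can be sketched in a few lines of Python. This is an illustration only, not the validator's code: it assumes the schema has already had its $ref pointers resolved, and it only understands the properties and items keywords. The function name and the path format are made up for the example.

```python
def find_additional_fields(data, schema, path="$"):
    """Return paths of fields present in `data` but not declared in `schema`."""
    extras = []
    if isinstance(data, dict) and "properties" in schema:
        declared = schema["properties"]
        for key, value in data.items():
            child_path = f"{path}.{key}"
            if key not in declared:
                extras.append(child_path)  # undocumented field: schema drift
            else:
                extras.extend(find_additional_fields(value, declared[key], child_path))
    elif isinstance(data, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            extras.extend(find_additional_fields(item, item_schema, f"{path}[{index}]"))
    return extras
```

Objects whose schema declares no properties are treated as free-form here and are not flagged; a stricter tool might choose differently.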
Robust Security & Auth
The validator provides flexible authentication options to ensure that both the OpenAPI schema discovery and live endpoint testing can be conducted securely. Users can specify authentication methods for two distinct layers: the openApiSchema layer, used for finding and reading the OpenAPI document, and the dataSourceAuth layer, used when calling live API endpoints during testing. The system supports a variety of standard authentication strategies, including API Keys (with customizable header names), Bearer Tokens, Basic Authentication (username and password), and Custom Headers. However, to maintain security, only one authentication method can be specified per object, and the server must be configured with AllowUserSuppliedAuth set to true for these user-provided credentials to be utilized. Additionally, for safety, authentication headers are only applied when the target URLs use the HTTPS protocol.
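The three rules in that paragraph (one method per object, server opt-in, HTTPS only) can be sketched as follows. This is a hedged Python illustration, not the validator's implementation: the key names apiKey, bearerToken, basic and customHeaders, the inner field names, and the default X-API-Key header are all assumptions, and the module-level flag merely stands in for the server's AllowUserSuppliedAuth setting.

```python
import base64
from urllib.parse import urlparse

# Stand-in for the server-side AllowUserSuppliedAuth setting named in the post.
ALLOW_USER_SUPPLIED_AUTH = True

# Hypothetical key names for the four strategies; the real request shape may differ.
METHODS = ("apiKey", "bearerToken", "basic", "customHeaders")


def build_auth_headers(url: str, auth: dict) -> dict:
    """Turn one auth object into HTTP headers, or {} when nothing may be sent."""
    if not auth or not ALLOW_USER_SUPPLIED_AUTH:
        return {}
    supplied = [name for name in METHODS if name in auth]
    if len(supplied) != 1:
        raise ValueError("specify exactly one authentication method per object")
    if urlparse(url).scheme.lower() != "https":
        return {}  # credentials are only attached to HTTPS targets
    method, value = supplied[0], auth[supplied[0]]
    if method == "apiKey":
        return {value.get("header", "X-API-Key"): value["key"]}
    if method == "bearerToken":
        return {"Authorization": f"Bearer {value}"}
    if method == "basic":
        raw = f"{value['username']}:{value['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    return dict(value)  # customHeaders: a plain name-to-value mapping
```

In this sketch the same function would be called once for the openApiSchema layer and once for the dataSourceAuth layer, each with its own auth object.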
Schema Runtime Resolution
I have never made a secret of not being a fan of pre-compiling/resolving schema files. The validator features a sophisticated schema resolution engine designed to handle the complexities of modern, modular OpenAPI specifications. At its core, the system utilizes a dedicated ReferenceResolver to recursively navigate and resolve $ref pointers, ensuring that even deeply nested or distributed data models are correctly unified for validation. To support this, we implemented a RemoteSchemaLoader that securely fetches external schema files, applying necessary authentication headers and interacting with a multi-layered cache to optimize performance and reduce redundant network traffic. This architecture is built with high resilience in mind, featuring robust protections against circular references and SSRF-style URL attacks, ensuring that whether your schema is a single local file or a web of interconnected remote documents, the validator can resolve it into a consistent and actionable contract.
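For readers unfamiliar with runtime $ref resolution, the core loop looks roughly like the Python sketch below. It is a simplification under stated assumptions: it handles only local pointers (those starting with #/), has no remote loading, caching or authentication, and the x-circular marker is invented for this example. It is not the ReferenceResolver described above.

```python
def resolve_pointer(document: dict, ref: str):
    """Follow a local JSON Pointer such as '#/components/schemas/Service'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {ref}")
    node = document
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): ~1 is '/', ~0 is '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


def resolve_refs(node, document, active=()):
    """Return a copy of `node` with every $ref replaced by its target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            if ref in active:
                # Circular reference: keep the pointer rather than recursing forever.
                return {"$ref": ref, "x-circular": True}
            target = resolve_pointer(document, ref)
            return resolve_refs(target, document, active + (ref,))
        return {key: resolve_refs(value, document, active) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, document, active) for item in node]
    return node
```

The `active` tuple holds the references currently being expanded on the way down, so a schema that refers back to one of its own ancestors is detected at the point where the loop would start.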
Embedded Security: Testing and CodeQL Analysis
To ensure the validator remains a secure and trusted tool for the Open Referral community, we have integrated a “Security-First” CI/CD pipeline. This goes beyond standard functional testing to include deep semantic analysis of the codebase.
- Expanded Test Suite: Our recent architectural changes are backed by 243 passing automated tests. This includes new, targeted unit tests for security-critical helper classes. These tests specifically validate our defences against common API vulnerabilities, such as:
  - SSRF Protection: Strict URL scheme restrictions to block file://, ftp://, or data: requests.
  - Circular Reference Handling: Guarding against infinite loops in complex, deeply-nested OpenAPI $ref resolutions.
  - Authentication Hardening: Validating header sanitization and preventing the use of malformed or insecure credentials.
- Static Analysis with CodeQL: We have implemented GitHub CodeQL as our primary Static Application Security Testing (SAST) engine. CodeQL treats our C# code as a queryable database, allowing us to perform deep semantic analysis that simple text-based scanners miss.
  - Data-Flow Analysis: We use CodeQL to track the flow of untrusted user input (such as provided URLs or authentication headers) to ensure it never reaches a “dangerous sink” without proper sanitization.
  - Vulnerability Detection: Our pipeline automatically scans every Pull Request for common weaknesses like SQL injection, cross-site scripting (XSS), and insecure cryptographic patterns.
  - Regression Prevention: By running these scans on every push, we ensure that new features don’t inadvertently introduce security regressions, maintaining the integrity of the validator as it transitions to its new home.
This rigorous testing and security framework provides a high degree of confidence that the validator is not only accurate in its assessments but is itself a hardened piece of infrastructure.
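The URL scheme restriction mentioned in the test list above is the simplest of these defences to show. The Python sketch below is illustrative only (the function name and the allow-list are assumptions, not the validator's code), and scheme filtering on its own is not a complete SSRF defence; a production check would typically also consider where the host resolves to.

```python
from urllib.parse import urlparse

# Only web schemes are accepted; file://, ftp:// and data: are rejected
# simply because they are not on this list.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_schema_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    return parsed.scheme.lower() in ALLOWED_SCHEMES and bool(parsed.hostname)
```

Using an allow-list rather than a block-list means any scheme nobody thought of is rejected by default.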
Roadmap:
From UK-Centric to Profile-Based Validation
The current version of the validator allows international usage by passing a specific schema URI within the options. However, my vision for the future focuses on a more flexible, trust-based model:
- Schema Whitelisting: Instead of relying on a URI provided in the request, I envision a “whitelist” of known schema profiles. Feeds will identify which version/profile they adhere to, ensuring they meet recognized global standards.
- Self-Exposed Schema Validation: I hope a model can emerge where the feed defines its own schema. That schema will be validated against our whitelist of authorized schemas, and then the feed itself will be validated against its own exposed OpenAPI spec.
- Consumer Confidence: This approach will ensure that a feed exposes exactly what it claims to. It allows for local extensions or “additional functionality” to be added to HSDS while maintaining a verifiable core that consumers can trust.
Enhanced Parameter Testing and Validation
A key area for future growth is the implementation of more rigorous parameter testing. While the current validator does a good job of verifying core response structures, there is significant room to expand how it handles input parameters. I recently submitted a Pull Request to the HSDS specification to correct a formatting bug regarding page parameters; fixing these underlying specification issues is the first step toward building more intelligent validation logic.
Moving forward, I hope to see the validator extend its reach beyond basic pagination checks. By leveraging enhanced parameter definitions in the OpenAPI spec, the tool could proactively test:
- Filter Logic: Ensuring that parameters like category, location, or taxonomy return logically consistent results.
- Data Type Strictness: Verifying that the API correctly handles (and rejects) invalid data types or out-of-range values for custom query parameters.
- HSDS-Specific Constraints: Automatically checking that mandatory HSDS search parameters behave according to the standard’s expectations.
Expanding into this “active” parameter testing would move the validator from a contract-checking tool to a full functional testing suite, ensuring that Open Referral feeds aren’t just syntactically correct, but operationally reliable.
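One way such “active” testing could start is by deriving deliberately bad inputs from each parameter definition in the OpenAPI spec. The Python sketch below illustrates the idea for the Data Type Strictness case; it is a hypothetical starting point, not something the validator does today, and the function name and sentinel values are invented for the example.

```python
def invalid_values_for(parameter: dict) -> list:
    """Derive values that a conforming API should reject for this parameter."""
    schema = parameter.get("schema", {})
    candidates = []
    if schema.get("type") in ("integer", "number"):
        candidates.append("not-a-number")  # wrong data type
        if "minimum" in schema:
            candidates.append(schema["minimum"] - 1)  # just below the range
        if "maximum" in schema:
            candidates.append(schema["maximum"] + 1)  # just above the range
    if "enum" in schema:
        candidates.append("__not_in_enum__")  # value outside the allowed set
    return candidates
```

A test runner would then call the endpoint once per candidate and check that the API responds with a client error instead of silently ignoring the bad value.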
I couldn’t write an update without…
The Future: AI-Augmented Semantic Validation
While our current refactors have made the validator technically robust, the next frontier is moving beyond “syntactic” correctness to “semantic” intelligence. By integrating Large Language Models (LLMs) and Machine Learning into the engine, we can solve several high-level challenges that traditional code-based rules struggle to address:
- Intelligent Test Case Generation: Instead of static checks, an AI agent could analyze the OpenAPI specification to autonomously generate complex, multi-step test scenarios. This includes “agentic exploration”—where the AI attempts to find undocumented edge cases or logical inconsistencies by chaining API calls together in ways a human tester might not anticipate.
- Semantic “Common Sense” Checks: Traditional validators can tell you if a field is a “string,” but they can’t tell you if the data inside makes sense. An AI layer could flag “impossible” data combinations, such as a service’s opening_time being set later than its closing_time, or a geographic_radius that is statistically improbable, adding a layer of “business logic validation” that ensures the feed reflects the real world.
- Self-Healing Test Suites: One of the biggest pain points in API maintenance is “schema drift.” An AI-integrated validator could detect when a field name has changed (e.g., from phone_number to contact_telecom) and automatically suggest “healing” the test assertions or the consumer mapping, reducing manual maintenance by up to 70% (it’s a random number; other numbers are available).
- Automated Quality Insights: Currently, our Quality Score is calculated via weighted formulas. An AI could provide qualitative feedback, acting as a “virtual developer” to explain why a certain documentation section is confusing or suggesting more descriptive summaries and examples based on best practices observed across thousands of other high-quality APIs.
By incorporating these AI capabilities, the Open Referral Validator wouldn’t just be a gatekeeper for the HSDS standard; it would become a proactive consultant, helping publishers improve their data quality and helping consumers integrate with unprecedented confidence.
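To ground what a “semantic” check means, the opening-hours example from the list above can be written as a plain, hand-coded rule. The Python sketch below is deliberately not AI: it shows the kind of check an AI layer might propose or generalise. It uses the field names from the example and assumes times arrive as HH:MM strings, which is an assumption about the data, not a statement about HSDS.

```python
from datetime import time


def opening_hours_issue(schedule: dict):
    """Return a message if opening_time is not before closing_time, else None."""
    opens = time.fromisoformat(schedule["opening_time"])
    closes = time.fromisoformat(schedule["closing_time"])
    if opens >= closes:
        return (
            f"opening_time {schedule['opening_time']} is not earlier than "
            f"closing_time {schedule['closing_time']}"
        )
    return None
```

A fixed rule like this would wrongly flag a service that legitimately runs overnight (say 22:00 to 06:00), which is exactly the kind of context-dependent judgement where a smarter layer could help.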
(maybe I get over-excited by this stuff)
It’s been a pleasure,
Jeff
