What is good vocabulary for describing Data Quality?

In our Validator and HSDS validation conversations, we’ve been dancing around different words and concepts relating to the quality of data coming through APIs. Reasons for wanting a shared vocabulary include:

  • As @oughnic pointed out most recently, we need to know who to contact to “fix” problems. Is it a technology problem? Contact the technology developers. Is it a source data problem? Contact the source data stewards.
  • So that we can intelligently talk about assessing and improving data processes and tooling.

Based on my recent experiences working in various data orchestration and collaboration projects, I’ll take the bold move of proposing some standardized vocabulary and definitions for data quality. I’ll be speaking from the perspective of Open Referral, assuming that someone is using HSDA/HSDS, using the validator, and seeing some problems. How should they describe the types of problems they are encountering?

  • Validation
    • Example of Good: “I can plug into this data source and all of my HSDA tooling just works automatically.”
    • Example of Bad: “This data I’m getting isn’t even in the correct schema; I’ll have to retool for it.”
    • Description: Whether or not the shape of data is valid HSDA/HSDS.
    • Blame: Technology Vendors
  • Field Level Interoperability
    • Example of Good: “As much of this data as possible is machine readable, making it easier to parse, present, and share with users.”
    • Example of Bad: “Argh! All hours of operation are plain text… I can’t tell users basic facts like ‘what’s open now’.”
    • Description: The degree to which field-level data like schedules, service areas, and language codes are structured, standardized, and generally machine readable.
    • Blame: Technology Vendors or Data Stewards - Some technology doesn’t support creating structured data, some Stewards don’t use the features they already have
  • Completeness
    • Example of Good: “All critical objects and fields seem to be here, including contact info, Last Assured dates, addresses, basic descriptions, etc.”
    • Example of Bad: “Many of these records are missing phone numbers and Application Process notes; users won’t even know how to contact services.”
    • Description: Whether all key data elements are present; missing elements degrade usability.
    • Blame: Technology Vendors or Data Stewards - data may have been accidentally omitted by Vendors, or not provided to them by Stewards in the first place
  • Freshness
    • Example of Good: “Most of these records have been Assured within the past year; I feel like I can trust the accuracy of this data.”
    • Example of Bad: “These service records haven’t been updated in over five years, I’m not sure I can trust this is still accurate.”
    • Description: How recently individual records have been Assured (AKA verified) for accuracy.
    • Blame: Data Stewards
  • Accuracy
    • Example of Good: “Users report that all the phone numbers they called are working and connect to the expected service.”
    • Example of Bad: “My users are showing up to services with the wrong required documents; our descriptions are listing the wrong documents.”
    • Description: Whether assertions made in data and descriptions are true in reality.
    • Blame: Data Stewards
  • Richness
    • Example of Good: “It feels like everything I need is here: descriptions, how to apply, application requirements, hours and contact info; as a user I know exactly what to do next.”
    • Example of Bad: “These descriptions are too terse; I feel like important information is missing.”
    • Description: Rich data improves the user experience by going beyond merely required information to include helpful information, like “Tips for Applying”, schedule notes on the best time of day to call, extra eligibility criteria or target groups, which required documents are preferred, supported languages and interpretation services, payments accepted, accessibility notes, etc.
    • Blame: Data Stewards
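
To make the “who to contact” routing idea concrete, here’s a minimal sketch (Python) of how an assessment against this vocabulary might be recorded per data source. The dimension names mirror the list above; the scoring idea and the Accountability enum are purely illustrative and not part of HSDS or the validator.

```python
# Illustrative only: a hypothetical per-source quality assessment using the
# dimensions proposed above. Nothing here is part of HSDS or the validator.
from dataclasses import dataclass, field
from enum import Enum


class Accountability(Enum):
    TECHNOLOGY_VENDOR = "technology vendor"
    DATA_STEWARD = "data steward"
    EITHER = "technology vendor or data steward"


# Who to contact first when a given dimension scores poorly.
DIMENSIONS = {
    "validation": Accountability.TECHNOLOGY_VENDOR,
    "field_level_interoperability": Accountability.EITHER,
    "completeness": Accountability.EITHER,
    "freshness": Accountability.DATA_STEWARD,
    "accuracy": Accountability.DATA_STEWARD,
    "richness": Accountability.DATA_STEWARD,
}


@dataclass
class QualityAssessment:
    source: str
    scores: dict = field(default_factory=dict)  # dimension name -> 0.0..1.0

    def who_to_contact(self, dimension: str) -> Accountability:
        """Route a problem report to the party accountable for that dimension."""
        return DIMENSIONS[dimension]
```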

I’ll bet there are more dimensions to consider. Forgive me if I’m retreading ground that’s already documented somewhere.

The most subjective item in that list is “Richness”, but as a user experience designer I certainly feel the difference between “rich” vs “anemic” data. It’s easy to tell when the end-user experience was front-of-mind for Data Stewards.

Freshness and Accuracy are closely related, but distinct. Inaccuracies can occur even in the midst of fresh data: user error or misunderstanding, faulty sources, etc. Perhaps this is an academic and not very useful distinction, but there it is. I’m just brainstorming at this point.

Inform USA has some useful elements to consider in Section 2 of their Standards: Standards - Inform USA (formerly AIRS, the Alliance of Information and Referral Systems)

@mrshll @bloom @oughnic @HannahN what would you add, remove, or change about this list? Do you already operate with a rubric for overall data quality? If so, what is it?

I was wondering where Taxonomy/keywords would fit, but they could almost be their own category, especially when you look at rules around correctly applying the hierarchy, not coding secondary services, and coding programs consistently across agencies. However, taxonomy granularity is similar to richness, i.e., do you use “Food”, or “Food Pantries”, “Formula/Baby Food”, “Soup Kitchens”, and “WIC”? So maybe it all fits there?

Thanks @skyleryoung - good topic.
My definition of “good data quality” is about whether a datum or collection of data is good enough to be used. Consequently, one person’s good data may be useless for someone else’s use case. The other natural consequence of this is that data are often recorded for one purpose and then “reused” for another purpose. Good data may not be fit for other purposes.

Specificity is another challenge. For example, a service description could be very precise or quite vague. Welfare support, food bank, vegetarian food bank, vegan food bank, kosher food bank, and halal food bank could all be used as labels for a service. The correct level of granularity depends on the needs of the user. Someone who is hungry and has no dietary preference may have very different needs to a halal eater in New York, but the kosher food bank may well match their needs for a service.

A framework used in England is to mark each datum on a five-classification system:

  • Valid - the datum is fit for the new purpose
  • Other - the datum is technically correct but not specific enough to meet the new purpose
  • Default - the datum is using a default value (e.g. 1 January for a date of birth) so there is no confidence in the value
  • Invalid - the datum cannot be understood because it doesn’t meet the rules for the field
  • Missing - no datum!

… and I’ll leave it to the reader to decide whether 13/03/2026 is a valid date. It probably depends on your context.
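
Here’s a minimal sketch (Python) of that five-classification framework, using the date-of-birth default as the worked example. The enum values mirror the list above; everything else is illustrative.

```python
# Illustrative sketch of the Valid/Other/Default/Invalid/Missing classification.
from datetime import date
from enum import Enum


class DatumClass(Enum):
    VALID = "valid"      # fit for the new purpose
    OTHER = "other"      # technically correct but not specific enough
    DEFAULT = "default"  # a default value, so no confidence in it
    INVALID = "invalid"  # breaks the rules for the field
    MISSING = "missing"  # no datum at all


def classify_date_of_birth(raw):
    """Classify a date-of-birth string. Deciding OTHER requires knowledge of
    the new purpose, so it isn't handled by this purely mechanical check."""
    if not raw:
        return DatumClass.MISSING
    try:
        dob = date.fromisoformat(raw)
    except ValueError:
        return DatumClass.INVALID
    if dob.month == 1 and dob.day == 1:
        return DatumClass.DEFAULT  # the classic "1 January" placeholder
    return DatumClass.VALID
```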

Skyler, can you say more about your perceived distinction between the following pairs:

Validation and field-level interoperability
Completeness and richness

These strike me from a distance as roughly synonymous. Are there differences between them?

I consider freshness to be a signifier for accuracy. It is true that they are different criteria, but the former is perhaps only relevant to inform assumptions about the latter?

Taxonomic correctness and richness are such a morass that I think I subconsciously blocked it from my list of possibilities lol. It definitely belongs in the list, though. Thanks for adding this, Hannah.

This is an awesome list for field level classification @oughnic. @bloom @CheetoBandito I think we should incorporate this into our framework of profiles between collaborative parties in ServiceNet.

Nick, the fact that one shape of data may not be what’s needed for all use cases is certainly front of mind in our data collaboration work. Our aspiration, which may never be 100% achievable but is nevertheless a necessary aspiration, is to define the ways and means by which a data supply chain produces all of the granular elements of service descriptions, such that they can be assembled to fit multiple purposes.

One example that came up recently is the concept of bundled versus unbundled services. We’re aware of a local record kept by a large central authority that describes a library as one service, but the local township describes it as ten to twelve services: all the little things that are offered at the library. The large aggregator doesn’t have a use case, given their audience, for showing that level of granularity for the library, but the local municipality certainly does. We would prefer to see it managed by the locals at the most granular level possible, because whereas we can create pretty reliable automated pipelines for aggregating ten services into one record, it’s much harder to reliably split one vague record into ten useful services going the other direction.

Does this example fit within your definition of providing different information for different purposes, or what other specific examples of different purposes did you have in mind based on your work in the UK?

@bloom The difference between validation and field-level interoperability is pretty easy to define. HSDS allows valid schedule records with only plain-text descriptions, but that’s not machine-readable. It’s much preferable to have data where the schedule information is, in fact, machine-readable.

I decided to call this “field-level interoperability” rather than “machine-readable” because it goes to the heart of transmitting that schedule information between systems. We’ve seen this in Whatcom and Washington, for example. If they weren’t both already defining their schedules in fully structured, machine-readable formats, we would never be able to swap that information between systems. Put another way, plain text descriptions for things like schedules, service areas, etc., mean that information cannot be reliably moved between systems.
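
To make the distinction concrete, here’s a minimal sketch (Python). The structured field names (opens_at, closes_at, byday) are modeled on HSDS schedule conventions, but treat the exact names and the helper itself as an assumption on my part, not validator behavior.

```python
# Two representations of the same schedule. Only the structured one can
# answer "what's open now?" by machine; the plain-text one cannot.
from datetime import datetime

plain_text_schedule = {"description": "Open weekday mornings, call to confirm."}

structured_schedule = {"byday": "MO,TU,WE,TH,FR", "opens_at": "09:00", "closes_at": "12:00"}


def is_open_now(schedule, now=None):
    """Return True/False for a structured schedule, or None when the schedule
    is only human-readable text."""
    if "opens_at" not in schedule or "closes_at" not in schedule:
        return None  # a human can read the description; software cannot
    now = now or datetime.now()
    weekday = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"][now.weekday()]
    if weekday not in schedule.get("byday", "").split(","):
        return False
    # HH:MM strings compare correctly as text.
    return schedule["opens_at"] <= now.strftime("%H:%M") <= schedule["closes_at"]
```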


Regarding richness, as I noted above, that is the most subjective category, but as a user experience designer, I definitely notice when data is merely technically complete versus when it’s been engineered to be extra consumable, understandable, or useful to users.

I think the definition of optimal richness varies by audience or use case and is closely related to Nick’s comments above about “purpose.”

I think of it in terms of localization: how do I make this information highly understandable and intuitively useful for specific audiences? That would be one way to define it, although now that I think of it, localization may belong in its own category in the list.

I think my original impulse with “Richness” was just to recognize that “the data managers went the extra mile on this to make it really helpful.” Richness is vague because I’m still sorting it out, but I think there’s something(s) here.

@sasha would love your input on this as well. It was a gross oversight to not tag you at the beginning. How do you assess data quality overall? What would you add or remove from this list?

There is a good analogy from the music industry and the rise of the MP3 file. The master recording from the studio will have the best detail, laid out with each instrument on a separate track. This may get remixed for different platforms and use cases.

  • A radio station may “brighten” the sound to fit in with their image and branding.
  • A vinyl master may be mixed and cut (half-speed or full speed) from the original. It may be pressed onto a lightweight or heavy product with multiple colours of vinyl for marketing benefits. In extreme cases, different track lists for each colour. Taylor Swift - I’m looking at your marketing department here!
  • A CD master data file may be mixed and down-sampled to 1,411 kbps at 16 bits
  • A cassette tape master may be created
  • An end user may create a lossless FLAC file or a lossy compressed MP3 file from the CD, downsampling to 128 kbps or even 32 kbps, but also merging in additional information from an online database as metadata - album cover, track list, artist, etc.

All of these transformations throw away data, repackage the original, and add metadata in the packaging (album sleeve, notes, or embedded). You can’t go back to the original from any of these.

For a service catalogue, the same can happen - the data gets remixed and downsampled to fit a use case. The original record of the service will be the best fidelity of that service but we may merge information from multiple sources, find customer/user feedback, add our own contracting arrangements etc. Aggregators may throw away detail or simply list the service as “pharmacy” or “school” instead of “pharmacy providing UTI advice” or “primary school”.

In summary - the closer you get to the original source of each datum, the more confidence you have in the data.

I appreciate the MP3 metaphor @oughnic, except this statement, “The original record of the service will be the best fidelity of that service,” may describe something that one thinks ought to be the case, but it doesn’t describe the typical origin of service records in our domain – production typically doesn’t involve a single high-fidelity capture, but rather an iterative process. And sometimes directories that are “lower-fi” (in Word docs, for instance) will actually have more information (though less structured).

Thanks @bloom - you make a fair comment. It’s an analogy so I’ll not flog it much more, but it’s worth thinking this through a little bit more.

The Word document is probably the highest fidelity (richest semantic content) and the most difficult for an end user to use (as is the master of an audio track). Some tracks are created by bringing multiple performers’ tracks to a mixing studio over the internet. This is similar to how a service catalogue is curated, with a single curator (person or system) taking multiple feeds.

20 years ago, the content of the English NHS Directory of Services was managed by each of the hospital providers and brought together and published by the responsible government department.

Describing the qualities of a successful iterative process of curating a standard could be useful.

I agree that there’s some important framing that we can hash out here, and I especially like where @skyleryoung is going w/r/t hashing out who is responsible for what (and I think that responsibility is probably a better linguistic frame than “blame”, although I understand these are two sides of the same coin, and in a QI operations context blame probably makes sense).

I’m just trying to see what the smallest useful number of categories might be, because it seems like some of the values in Skyler’s initial post are more like lower-level attributes of higher-level values.

To step back – Open Referral’s core values are interoperability, reliability, accessibility, and sustainability. In this case, interoperability and reliability seem to be the primary ones at play here – though I could see a case to be made that “Accessibility” as a value doesn’t just pertain to data license or interface design, but also to “richness”, in that non-rich data could be technically interoperable and reliably verified, yet not effective at ensuring “accessibility” because it lacks pertinent information toward that end. So let’s say that interoperability and reliability and accessibility are functional outcomes of the criteria for data quality that we’re trying to define.

I wonder if maybe there’s three tiers of quality, and some of Skyler’s proposed categories are attributes of those tiers:

Validity: is the data compliant with the standard, and therefore technically interoperable? Generally, this is what the validator determines as-is. Skyler, to your point, this seems to me like a context-specific consideration – which could presumably be addressed by an “implementation profile” that specifies what needs to be true beyond the standard to achieve ‘field-level interoperability.’

Richness: like with the above, this may involve generic criteria (to what extent does the data include fields that are recommended but not technically required?), and also context-specific criteria (does this include the data we agreed is important to our users, above and beyond what is specified by the standard?).

Trustworthiness or reliability: for this, freshness seems like an objective signifier (when was it last updated?) and accuracy seems like a subjective determination (in that we can’t tell just by looking at the data, someone needs to indicate this).

Just riffing here! But I think to be useful we want to get to the smallest number of types of quality, and then delineate their facets.

(@oughnic whatever happened with that English NHS provider directory? Is it still aggregated in consistent and federated ways? Any lessons learned from it?)

The NHS Directory is alive and used for referrals from family doctors to hospitals (we have a very different business model to the US).

Validity - I think this comes in two types:

  1. is the system providing a valid schema (vendor accountability) and
  2. are the data populated in the system valid (catalogue / directory custodian accountability).

Richness - which use cases do the data support? A set of sub-profiles could work this out using the validator if we switch it to check a set of (sub)profiles for an endpoint.

Trustworthiness or reliability - Yup - this is the hard one.

The HSDS Data Quality Framework: A Forensic Perspective

I have recently been looking into something that makes me take a slightly different view. Quality of data is multi-faceted. To evaluate it effectively, we must distinguish between Structural Integrity, Semantic Validity, and the broader concepts of Reliability and Provenance. We need to consider all three, which map roughly onto Validity, Richness, and Trustworthiness.

1. Structural Integrity (Validity)

The validation flow ensures a data source matches its own definition while simultaneously validating that definition against the OpenAPI specification and the version of the HSDS schema the source claims to support. This prevents a source from providing its own version of HSDS while forcing it to document all that it delivers. It also ensures Syntactic Integrity, allowing consumers to be confident in the technical interoperability of the fields.

  • Syntactic Integrity: Conformance to the technical contract defined by the HSDS/OpenAPI specification. This ensures that data types, required fields, and object structures adhere strictly to the schema.

  • Referential Integrity: Current validation testing involves selecting a random set of IDs and querying the source to ensure they exist. While effective for small datasets, testing 100% of references in a source containing 10,000,000+ rows is functionally impossible with traditional sequential querying. We must acknowledge that, at scale, we cannot confirm total referential validity using current standard technology.

  • Cyclic Dependency: A structural defect involving bi-directional references (e.g., an Organization referencing a Service that references that same Organization). By removing “walking” boundaries in the resolver, I exposed an infinite loop that caused a memory overrun. This is a failure of the “Tree” assumption in serialization. The OpenAPI specification doesn’t provide a way to guard against this; a visited-set guard, sketched after this list, is one mitigation.

  • Orphaned Records: Because we cannot feasibly test every record in massive datasets, orphaned records remain a persistent risk. Validating at this scale (e.g., CERN-level 9PB annually) requires a shift to statistical sampling and Stream Validation, which I have recently implemented.
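
Here is a minimal sketch (Python) of the two mitigations mentioned above: sampled referential-integrity checks, and a visited-set guard so bi-directional references can’t loop forever. fetch_record and get_children are hypothetical stand-ins, not the validator’s actual API.

```python
# Illustrative only: sampled reference checks and cycle-safe walking.
import random


def sample_reference_check(foreign_keys, fetch_record, sample_size=500):
    """Spot-check a random sample of (table, id) references instead of all of
    them, and report an estimated failure rate rather than total validity."""
    sample = random.sample(foreign_keys, min(sample_size, len(foreign_keys)))
    if not sample:
        return 0.0, []
    missing = [(table, ref_id) for table, ref_id in sample
               if fetch_record(table, ref_id) is None]
    return len(missing) / len(sample), missing


def walk(record, get_children, visited=None):
    """Depth-first walk over linked records; the visited set breaks cycles
    such as Organization -> Service -> Organization."""
    visited = set() if visited is None else visited
    if record["id"] in visited:
        return visited
    visited.add(record["id"])
    for child in get_children(record):
        walk(child, get_children, visited)
    return visited
```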

2. Semantic Validity (Richness & Accuracy)

This concerns the actual meaning and logic of the data. Even if the JSON structure is “perfect,” the content may be invalid or “poor.”

  • Automated vs. Manual Processing: The strategy involves checking “primitive” logic directly (e.g., is a date actually a date? general regular expressions, etc.), while offloading complex semantic checks to LLMs or micro-models.

  • The Micro-Model Agent: To process 1TB+ of data, even an eventing architecture that identifies suspicious data via regex and offloads it to small, cost-effective models in batches would likely take an inordinate amount of time. This is one way to determine if data is functionally Deficient or merely Low-Density.

  • Forensic Impossibility: A critical category for legal evidence. This occurs when data contradicts itself logically (e.g., a last_updated timestamp that predates a created_at timestamp).

  • Temporal Consistency: If a service is marked as “Active” but lacks opening_hours or a location, its Operational Validity is zero.

  • Granularity Mismatch (Classification Precision): Common in taxonomies. If a service is tagged with a term so broad it becomes useless (e.g., a Food Bank tagged only as “General Health”), it represents a failure of Richness.
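
As a concrete illustration of the “Forensic Impossibility” and “Temporal Consistency” items above, here is a minimal sketch (Python). The field names (created_at, last_updated, status, schedules, locations) are illustrative, not a claim about exact HSDS column names.

```python
# Illustrative semantic checks: flag a record that contradicts itself logically.
from datetime import datetime


def semantic_issues(service):
    issues = []
    created = service.get("created_at")
    updated = service.get("last_updated")
    # Forensic impossibility: a record cannot be updated before it was created.
    if created and updated and datetime.fromisoformat(updated) < datetime.fromisoformat(created):
        issues.append("last_updated predates created_at")
    # Operational validity: an "active" service with neither hours nor a
    # location cannot actually be reached at a known time or place.
    if service.get("status") == "active" and not service.get("schedules") and not service.get("locations"):
        issues.append("active service has neither schedules nor locations")
    return issues
```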

3. Reliability & Provenance (Trustworthiness)

Because HSDS data affects the lives of vulnerable people, its management carries a “Duty of Care.”

  • Data Provenance (Lineage): The ability to trace a record from its source to the API. If the chain is broken, the data lacks Evidentiary Weight.

  • Spoliation of Metadata: The removal of essential timestamps or versioning info by an API endpoint. This is Digital Spoliation—it destroys the context required to trust a record.

  • Completeness vs. Density: “Completeness” is merely the presence of fields. “Density” is the presence of useful information. An API populated with “N/A” or placeholder strings is technically complete but functionally Unreliable.
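
And a minimal sketch (Python) of the Completeness-versus-Density distinction; the placeholder list is purely illustrative.

```python
# A record can be 100% complete (every field present) and 0% dense
# (every field a placeholder).
PLACEHOLDERS = {"", "n/a", "na", "none", "unknown", "tbd", "-"}


def density(record, fields):
    """Fraction of the given fields that carry real content, not placeholders."""
    if not fields:
        return 0.0
    useful = sum(
        1 for f in fields
        if str(record.get(f, "")).strip().lower() not in PLACEHOLDERS
    )
    return useful / len(fields)


# density({"phone": "N/A", "description": "TBD"}, ["phone", "description"]) == 0.0
```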