When a data standard/specification is used for a given field, provide a way to define _which_ standard was used

This discussion is migrated from a GitHub ticket: “When a data standard/specification is used for a given field, provide a way to define _which_ standard was used” (Issue #286, openreferral/specification).

I have two related thoughts/questions for the community that stem from that original ticket:

  1. Should we require using existing standards for formatting data within HSDS? For example, we could specify that data in email and url fields conform to the ITU-T E.123 standard. Using standards like this has the potential to make data more consistent and machine-readable.
  2. Regardless of whether they are required, when there are standards in use it would be helpful to specify which standards they are. I struggle to find an example of multiple standards in use for most common data fields in the USA, but a good reason to specify which standards are in use might be for the sake of international adoption where national standards from other countries may be employed.

The first point, whether we should require additional standards to be employed, overlaps with another forum post I’m drafting about how best to make data less ambiguous for end users.

My questions in this thread might be these:

  1. If standards are being used in data, what’s the best way to specify which standards are being used? We could propose new fields. We could use attributes, which is my personal favorite.
  2. If we used attributes, what should the label for this attribute be?
  3. Can/should we create a list of additional sub-standards that can be used for specific data fields in HSDS? Which standards are you aware of or do you employ in your data?
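
To make the attributes option from question 1 a bit more concrete, here is a minimal Python sketch of how a standard might be declared via an attribute and then looked up. Everything in it is an assumption for illustration: the field names (`link_entity`, `link_id`, `label`, `value`) are only loosely modelled on HSDS attributes and should be checked against the schema, the `data_standard` label is a placeholder (the label is exactly what question 2 asks about), and `standards_for` is a hypothetical helper.

```python
# Hypothetical attribute records. Field names, the "data_standard" label,
# and the identifiers are illustrative assumptions, not part of HSDS.
attributes = [
    {
        "link_entity": "phone",      # the kind of record being described
        "link_id": "phone-001",      # made-up identifier of that record
        "label": "data_standard",    # placeholder label (open question 2)
        "value": "ITU-T E.123",      # the standard the data follows
    },
]


def standards_for(entity, link_id, records):
    """Return the standards declared for one record via attributes."""
    return [
        record["value"]
        for record in records
        if record["link_entity"] == entity
        and record["link_id"] == link_id
        and record["label"] == "data_standard"
    ]


declared = standards_for("phone", "phone-001", attributes)
```

A data consumer could call something like `standards_for` before parsing a value, and fall back to a default when nothing is declared.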

It’s worth noting that because we use JSON Schema, a lot of this is already decided. For example, if we tell JSON Schema that a string field must be an email, then this means:

A string instance is valid against these attributes if it is a valid Internet email address as follows:
email: As defined by the “Mailbox” ABNF rule in RFC 5321, section 4.1.2 [RFC5321].

idn-email: As defined by the extended “Mailbox” ABNF rule in RFC 6531, section 3.3 [RFC6531].

Note that all strings valid against the “email” attribute are also valid against the “idn-email” attribute.

(source – JSON Schema 2020-12 validation schema section 7.3.2)

So to me, this means that it is implicit that the formats for emails are defined in RFC 5321 and RFC 6531. There is a similar set of standards specified for Resource Identifiers, i.e. for HSDS fields where the format is uri.

To support implementers and data users, we could make this more explicit in the docs. For example, we could state that we use the JSON Schema format keywords to specify formats for specific fields, and that people should therefore reference the JSON Schema 2020-12 validation docs to determine which standards are in use for these fields.
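
As a rough illustration of how the implicit standards could be surfaced, the Python sketch below walks a schema and reports which standard each `format` keyword points to. The mapping follows the JSON Schema 2020-12 validation spec (`email` and `idn-email` as quoted above, `uri` as RFC 3986). The `example_schema` fragment and the `formats_in_use` helper are hypothetical, not taken from the HSDS schema files.

```python
# Standards referenced by a few JSON Schema 2020-12 "format" values.
FORMAT_STANDARDS = {
    "email": "RFC 5321, section 4.1.2",
    "idn-email": "RFC 6531, section 3.3",
    "uri": "RFC 3986",
}

# Hypothetical schema fragment, for illustration only.
example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "website": {"type": "string", "format": "uri"},
    },
}


def formats_in_use(schema, path=""):
    """List (field, format, standard) for every property that sets a format."""
    found = []
    for name, prop in schema.get("properties", {}).items():
        field = f"{path}.{name}" if path else name
        fmt = prop.get("format")
        if fmt is not None:
            standard = FORMAT_STANDARDS.get(fmt, "see JSON Schema 2020-12 validation spec")
            found.append((field, fmt, standard))
        if "properties" in prop:
            # Recurse into nested objects so their fields are reported too.
            found.extend(formats_in_use(prop, field))
    return found


report = formats_in_use(example_schema)
```

A table generated this way could sit in the docs next to each field, so readers would not need to trace the format keywords back to the JSON Schema spec themselves.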