After a recent upgrade working group call, I wanted to write up here something that I mentioned in passing that I think is relevant. @skyleryoung may be particularly interested in this as well.
Fundamentally, all of what we’re talking about here is making sure that a particular bit of data (a file, API response, etc) is structured in a way that means that a particular application is able to use it. And, by “structure”, I mean field names, properties of contents, API methods, and more - anything that can be considered a container for the information that needs to be exchanged between the systems involved.
Standards like HSDS prescribe the ways in which certain concepts have to be modelled: a `service` has an `id` and a `name` and a `status`, and (even though it's not in the schema), the standard says that the `email` field is for an email address. Someone putting a postal address in the `email` field wouldn't be following the standard.
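To make that concrete, here's a minimal sketch of an HSDS-style service record. The field names are the ones mentioned above; the values (and the loose email check) are illustrative assumptions, not anything from the actual HSDS schema:

```python
# A hypothetical, minimal HSDS-style service record. Field names come from
# the examples above; the values are made up for illustration.
service = {
    "id": "ac148810-d857-441c-9679-408f346de14b",
    "name": "Community Counselling",
    "status": "active",
    "email": "info@example.org",  # the standard says: an email address
}

# A record that puts a postal address here wouldn't follow the standard:
bad_service = {**service, "email": "1 High Street, Anytown"}

def looks_like_email(value: str) -> bool:
    """A deliberately loose check: one '@' with text on either side."""
    parts = value.split("@")
    return len(parts) == 2 and all(parts)

print(looks_like_email(service["email"]))      # True
print(looks_like_email(bad_service["email"]))  # False
```

The point isn't the validation logic - it's that "structure" covers both the container (the `email` field exists) and the properties of its contents (what goes in it).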
So far, so straightforward. But what about times when not everyone has the same needs of the data? That’s what we’re talking about in this thread: making it so that people who do have the same needs of the data are able to work together, without making the standard burdensome (or irrelevant) for everyone.
One approach is to put together a set of additional prescriptions: I might say that I need to know the fees for a service, and so I can only work with data where that field is filled in. If someone else has the same need, then we can agree to always use that field. We might even go further, and say "We offer HSDS-compliant data, always with fees". In practice, it's likely to be a set of requirements that we can bundle up and call an "extension" or a "profile", or something like that. This approach is great when (and this list isn't exhaustive):
- there’s a clearly defined need, and agreement around what’s being described
- there’s some level of coordination, and the opportunity for the creation of a new artefact
- a particular approach is required for a particular context - as in the priorities of OR-UK
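A profile in this sense is just extra requirements layered on top of the base standard. Here's a sketch of that idea - the profile itself and the `fees` field are assumptions for illustration, not a real HSDS extension:

```python
# A hypothetical "always with fees" profile: the base standard's required
# fields plus one extra requirement agreed between the parties who need it.
BASE_REQUIRED = {"id", "name", "status"}
FEES_PROFILE_REQUIRED = BASE_REQUIRED | {"fees"}

def conforms(record: dict, required: set) -> bool:
    """True if every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required)

service = {"id": "123", "name": "Advice Line", "status": "active"}

print(conforms(service, BASE_REQUIRED))          # True
print(conforms(service, FEES_PROFILE_REQUIRED))  # False: no fees info

service["fees"] = "Free for local residents"
print(conforms(service, FEES_PROFILE_REQUIRED))  # True
```

Anyone publishing to the profile meets the stricter check; everyone else still meets the base standard, which stays unburdened.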
Another approach is to devise a framework by which data can be described. If I need data that has fees included, then I can check a bunch of data sources, and see if they include fees. The same holds if I need data that uses AIRS, or both AIRS and fees. I might even publish the results of my checking, so that anyone else in my situation can understand what data they can use.
We recently made a quality dashboard for 360Giving which is based on the idea that we can describe the qualities of data, rather than judge that data is “good quality” or “bad quality”. We’ve picked ~10 features of data that we think are useful to know about (such as grant duration, location codes, recent publication) and run a daily test of all the data in order to provide a report of what data has what qualities. This is intended to set out our idea of what’s useful in data (we chose qualities based on research) and provide a basis for potential data users to discover data that might be useful for them.
This approach is useful when (again, not exhaustive):
- there’s no one clearly defined need, but several ideas of what’s useful in particular contexts
- not all qualities are relevant to all data sources (e.g. an historic publication can never get a "recent publication" badge, and that's ok - but it means your data isn't useful for a "this month in grantmaking" newsletter)
- there’s little use for a new artefact
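The "describe, don't judge" idea can be sketched like this: each quality check awards a badge independently, and a dataset's report is simply the set of badges it earns. The specific checks and field names here are illustrative assumptions, not the actual 360Giving dashboard tests:

```python
# Each check describes one quality of a dataset; none of them passes
# judgement on the dataset as a whole. Checks and fields are hypothetical.
from datetime import date, timedelta

def has_fees(dataset: dict) -> bool:
    return all("fees" in record for record in dataset["records"])

def recently_published(dataset: dict) -> bool:
    # An historic dataset can never earn this badge - and that's fine.
    return date.today() - dataset["published"] < timedelta(days=90)

CHECKS = {"fees included": has_fees, "recent publication": recently_published}

def quality_report(dataset: dict) -> set:
    """The set of quality badges this dataset earns."""
    return {name for name, check in CHECKS.items() if check(dataset)}

historic = {
    "published": date(2015, 1, 1),
    "records": [{"id": "1", "fees": "Free"}],
}
print(quality_report(historic))  # {'fees included'}
```

A potential data user then matches their own needs against the published reports, rather than everyone having to agree on one definition of "good".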
These approaches are, of course, not mutually exclusive - but I think they could help inform our discussion on how we proceed with extending/constraining/profiling HSDS.
In practical terms, the approaches do converge, so we don’t have to bake in a choice, right now and for all time.