We need a normalized `url` table in the pattern of the `phone` table

I’ve imagined this scenario would arise for a while now, and it just did. One of our clients has multiple URLs per service record: a website, a Facebook page, and more. They are considering packing formatted text into the single URL field, such as:

Facebook: https://facebook.com/abc, Website: https://abc.org

I, of course, am trying to avoid that at all cost.

I’m also avoiding using attribute as it doesn’t provide enough context. In the above, I need to know at least two things about each URL value:

  1. That it is a URL
  2. What it’s for (a label like Facebook).

At present, all I can add in attribute is the value, i.e. https://facebook.com/abc, with no context on what it is or what to do with it.

We considered adding multiple virtual locations, but that feels like a cumbersome use of the location table, and will obfuscate locations that are truly virtual. I don’t want to generate a service_at_location record for Facebook, unless Facebook happens to be the primary point of service for something. In most cases, it is not.

The most obvious solution is to normalize URLs into their own table. I would almost exactly follow the pattern of the phone table, allowing URLs to be linked to organization, service, location, service_at_location, perhaps even contact.
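To make the idea concrete, here is a rough sketch of what such a table might look like, using SQLite from Python. The table name `url`, the `label` and `url` columns, and the helper `urls_for_service` are all assumptions for illustration; none of this is part of HSDS, and the foreign-key columns only imitate the linking style described above.

```python
import sqlite3

# Hypothetical schema: one nullable foreign key per entity a URL could be
# linked to, plus the two things we need to know about each URL.
SCHEMA = """
CREATE TABLE url (
    id TEXT PRIMARY KEY,
    organization_id TEXT,
    service_id TEXT,
    location_id TEXT,
    service_at_location_id TEXT,
    contact_id TEXT,
    label TEXT NOT NULL,   -- what the URL is for, e.g. 'Facebook'
    url TEXT NOT NULL      -- the URL value itself
);
"""

def urls_for_service(conn, service_id):
    """Return (label, url) pairs linked to one service, ordered by label."""
    rows = conn.execute(
        "SELECT label, url FROM url WHERE service_id = ? ORDER BY label",
        (service_id,),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO url (id, service_id, label, url) VALUES (?, ?, ?, ?)",
    [
        ("u1", "svc-1", "Facebook", "https://facebook.com/abc"),
        ("u2", "svc-1", "Website", "https://abc.org"),
    ],
)
print(urls_for_service(conn, "svc-1"))
```

Each URL keeps its own label, so nothing has to be parsed back out of a formatted string.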

Is this a necessary step? Have you found other workarounds for handling multiple URLs on an entity?

@bloom @MikeThacker @PollyM @devin

I think attributes can work for this where either link_entity or taxonomy_term is used as the label.

I’ve been a long time proponent of adding an additional “description:” field to the attributes table to make it even more useful for use cases like the one you describe here and many others.

This GitHub issue may be a relevant thread for reference?

@devin I agree wholeheartedly that attribute would be much more useful with a description field included.

@bloom thanks for the reference. I forgot that I was involved in that GitHub issue thread as well. I don’t see any recent activity there though. Based on that thread, it does seem like a normalized table would be generally well received. What would it take to promote that recommendation?

Go forth and shape it up here, then maybe bring it back there. Of course you can make a pull request any time, but it will be easier (especially when we are not in standard development mode) to get consensus in discussion around it.

I can’t name implementations that have come across this although @Ian-DigitalGaps may be able to.

My first reaction was to add multiple virtual locations, but attributes may well be better, even if you have to add a description field, which profiling would allow.

In general, I think we should encourage attributes for extensions but periodically review how attributes are used to see if there’s a case for amending HSDS.

Cheers for raising this Skyler. It definitely sounds like something that should be designed for and addressed as soon as possible. Having a separate container for URLs to allow for one-to-many relationships is something that’s desirable and achievable, and could be added without breaking backwards compatibility, so we should scope this out for the next update.

One thing, however, is that I’d caution us against using language such as “table” since the canonical format of HSDS 3.0 is JSON. We should be thinking and designing for JSON first, since this is what the API Specification is telling people to expect. We officially support a Tabular serialization so we are right to consider this as it will ultimately result in a new urls table; but really the core issue is that our models need a way to attach multiple URLs to certain objects, with an appropriate label. Systems can store the underlying data however they want and will likely do so using a relational database, but should dereference it to embed the objects in JSON to comply with HSDS 3.0.
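As a rough illustration of that dereferencing step, the sketch below takes flat rows (as they might come out of a relational store) and embeds the matching ones in a service object before serializing to JSON. The `urls` property name, the row shape, and the `embed_urls` helper are assumptions made up for this example, not anything defined in HSDS 3.0.

```python
import json

def embed_urls(service, url_rows):
    """Return a copy of `service` with its linked URL rows embedded as objects."""
    embedded = dict(service)
    # Hypothetical property name; the real name would need to be agreed.
    embedded["urls"] = [
        {"id": row["id"], "label": row["label"], "url": row["url"]}
        for row in url_rows
        if row["service_id"] == service["id"]
    ]
    return embedded

service = {"id": "svc-1", "name": "Example service"}
url_rows = [
    {"id": "u1", "service_id": "svc-1", "label": "Facebook",
     "url": "https://facebook.com/abc"},
    {"id": "u2", "service_id": "svc-1", "label": "Website",
     "url": "https://abc.org"},
    {"id": "u3", "service_id": "svc-2", "label": "Website",
     "url": "https://example.org"},
]

print(json.dumps(embed_urls(service, url_rows), indent=2))
```

The foreign key disappears from the embedded objects because the nesting itself carries the relationship.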

With that pedantry out of the way: HSDS hasn’t formally adopted semantic versioning as far as I know, but I feel it’s stuck pretty close to it thus far. Therefore we should design and implement this to avoid triggering a MAJOR version change to 4.0, and instead aim for backwards-compatible new features, taking us to 3.1. That means every change we make must be optional from the publisher/data user perspective and can’t remove any of the existing fields and rules in HSDS 3.0. This sadly means we can’t replace the offending fields such as service.website.

However, we can definitely implement this in a way that doesn’t break backwards compatibility. Once the community and yourself have agreed a design for the url object (properties, formats, rules etc.), I see two ways of integrating this into the existing objects (using service.website as an example).

  1. We can declare older fields as “deprecated”. This means any validator encountering service.website in a dataset can issue a warning but the data cannot fail validation because of this field (unless it’s not a uri, obviously). We can then add new optional fields in appropriate places to add in arrays of url objects. This could be as simple as declaring service.websites[], but we should consider naming carefully to avoid confusion.
  2. We can adjust the schema to allow service.website to contain EITHER a URI-formatted string (current rules) OR an array of our new url objects. The string form would be declared as deprecated since version 3.1, and while validators would be expected to accept data where service.website is a valid URI string, they could show deprecation warnings if the data was declared to be 3.1.
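To get a feel for the second option, here is a minimal sketch of the check a validator might perform, written in plain Python and not as JSON Schema. The function names, the `label` and `url` property names, the warning text, and the `declared_version` parameter are all assumptions for illustration; the URI check is deliberately loose and is not a full URI validator.

```python
from urllib.parse import urlparse

def is_uri(value):
    """Loose check: a string with a scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)

def check_website(value, declared_version="3.1"):
    """Check service.website under option 2.

    Returns (is_valid, warnings). A bare URI string stays valid, but it
    produces a deprecation warning when the data declares version 3.1.
    A list of url objects (hypothetical shape: label + url) is the new form.
    """
    if isinstance(value, str):
        if not is_uri(value):
            return False, []
        if declared_version == "3.1":
            return True, ["service.website as a string is deprecated since 3.1"]
        return True, []
    if isinstance(value, list):
        ok = all(
            isinstance(item, dict)
            and isinstance(item.get("label"), str)
            and is_uri(item.get("url"))
            for item in value
        )
        return ok, []
    return False, []

print(check_website("https://abc.org"))
print(check_website([{"label": "Facebook", "url": "https://facebook.com/abc"}]))
```

Existing 3.0 data with a plain string never fails because of the new form; it only starts to attract warnings once a dataset declares itself as 3.1.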

I have no immediate preference to which approach is adopted, although I have a slight leaning towards the second if it’s feasible and the JSON Schema isn’t overly esoteric. In either case, it should be seen as a temporary solution until HSDS 4.0 which would have scope to make backwards incompatible changes with different field names and underlying structures.

@mrshll Thanks for the clarification of terminology 🙂 It is important. I work with databases quite a bit, so my thoughts tend to run down that path the most.

Do you think that potentially website[] can provide links for both Services and Organizations?

Also, if we allow multiple URLs for a given entity, then I’m sure we’ll need to include some heuristics that allow ranking or prioritization, for the same reasons outlined in this related post for phones: “How do you indicate the main or primary phone number for a resource?”
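One possible heuristic, purely as a sketch: give each url object an optional integer `priority` (lower means more important) and treat the lowest as primary. Both the `priority` field and the `primary_url` helper are hypothetical; nothing like this has been agreed for URLs or phones.

```python
def primary_url(urls):
    """Pick the URL with the lowest `priority`; entries without one sort last.

    Ties keep their original order because sorted() is stable.
    """
    if not urls:
        return None
    ranked = sorted(urls, key=lambda u: u.get("priority", float("inf")))
    return ranked[0]

urls = [
    {"label": "Facebook", "url": "https://facebook.com/abc"},
    {"label": "Website", "url": "https://abc.org", "priority": 1},
]
print(primary_url(urls)["label"])
```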