Hey @iansingo,
Sorry for the late reply to this @mikethacker; I was typing my reply yesterday when I realised I had to run to the Technical Meeting!
Practical stuff first:
Are we ok to set the phone UUID to be the service UUID?
This would pass validation according to the rules of the Standard, because all a validator would see is that phone.id is a UUID. Schema-based validators don't do any cross-checking of identifiers, as that's a Data Quality issue.
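To make that concrete, here's a rough Python sketch. It isn't the HSDS validator: `is_uuid` and `find_reused_ids` are hypothetical helpers I've made up for illustration, and the records are heavily simplified. The first function is roughly the kind of format check a schema validator can do; the second is the sort of Data Quality check that would have to live somewhere else.

```python
import uuid
from collections import Counter


def is_uuid(value):
    """Format-only check: can the value be parsed as a UUID at all?"""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


def find_reused_ids(records):
    """Data Quality check: return ids that appear on more than one record."""
    counts = Counter(record["id"] for record in records)
    return sorted(id_ for id_, n in counts.items() if n > 1)


# Hypothetical records: the phone simply reuses the service's UUID.
service = {"type": "service", "id": "ac148810-d857-441c-9679-408f346de14b"}
phone = {"type": "phone", "id": service["id"], "number": "+1 555 0100"}

# Every id is a well-formed UUID, so a format check has nothing to complain about.
format_ok = all(is_uuid(r["id"]) for r in (service, phone))

# Only a cross-record check notices that two different entities share an id.
reused = find_reused_ids([service, phone])
```

Here `format_ok` comes out true even though `reused` contains the shared id, which is why the reuse passes validation while still being a problem.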
However, it's not good practice for UUIDs. @mikethacker sums up the main reason for using UUIDs quite well; it's so that records from multiple sources can be ingested and uniquely identified safely without collisions, and you can then use the UUID in other systems which want to do stuff with the data.
Ideally, you should be creating UUIDs for every unique entity you have, as this is how UUIDs are supposed to work. How you do this will depend on your systems. Generating them initially should be straightforward, but then you'd need to ensure that each UUID is actually stored with its record. Depending on how your systems work, this might be tough, e.g. if your database doesn't have a concept of a "Phone" and instead groups that information in another record type.
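As a very rough idea of the "generate once, then store" part, here's a minimal Python sketch. The data shape (phones nested as a list under each service) and the `assign_phone_ids` helper are assumptions on my part, not anything from HSDS or your system.

```python
import uuid


def assign_phone_ids(services):
    """Give every phone entry its own UUID, unless it already has one."""
    for service in services:
        for phone in service.get("phones", []):
            # Only mint an id for phones without one; a stored id must be kept
            # so the same phone has the same UUID on every export.
            if "id" not in phone:
                phone["id"] = str(uuid.uuid4())
    return services


# Hypothetical data: phone details grouped inside the service record.
services = [
    {
        "id": str(uuid.uuid4()),
        "name": "Service A",
        "phones": [{"number": "+1 555 0100"}, {"number": "+1 555 0101"}],
    }
]

assign_phone_ids(services)
ids_first_run = [p["id"] for p in services[0]["phones"]]

# Running it again must not change ids that have already been assigned.
assign_phone_ids(services)
ids_second_run = [p["id"] for p in services[0]["phones"]]
```

The important bit is that the ids get written back to wherever the phone details live, so they survive between exports rather than being regenerated each time.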
If you share a little bit about your systems, I could suggest a few things.
The background/motivation of UUIDs in HSDS:
The long and short of it is that UUIDs are the most straightforward and easiest-to-implement solution for uniquely identifying different types of record in a global dataset, given the design philosophy and history of HSDS as a Standard.
I wasn't around in the early days of HSDS so I am keen to be corrected, but I understand that HSDS comes from a place where people have been storing and exchanging sets of normalized data, where everything has an identifier in the system. Previously, it was published as a "Tabular Data Package", with CSV files containing identifiers pointing to tables of data in other CSV files.
A lot of that history is reflected in the current design of HSDS. The different objects in HSDS preserve the ability to reference other "tables" of objects by identifier, whereas a more contemporary approach is to totally abstract implementation details (i.e. normalized databases) away from the data model used for exchange. In other Standards used for Open Data, globally unique identifiers are usually only necessary for the top-level object e.g. a Grant or a Contracting Process. Everything under that just needs a local identifier to support parsing and querying arrays. It's very rare that someone analysing e.g. some OCDS data will need to uniquely identify a particular document in a merged dataset of documents. Instead, they're looking for particular contracting processes and can then drill down to find the document they need.
With HSDS, I think it's a bit different. Since phone numbers can be attached to services, organizations, locations etc., HSDS has modelled that. Although I imagine there will always be overlap in global datasets, having a UUID for the concept of a "phone" record means that systems have the ability to ingest data about various different entities and then update records, or check that they match, as appropriate. For example, they might have consumed information about Service A from somewhere, and created the entry for that service's phone number in their systems. If they later ingest information about Organization A from somewhere else, they might spot that this is supposed to be the same phone number if it has the same UUID. This allows the system to then: update the record if appropriate (last updated date maybe), throw a warning out, or check that Organization A is somehow associated with Service A and therefore maybe this is appropriate as well.
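Here's a small Python sketch of what that could look like on the consuming side. Everything in it is hypothetical: the `ingest_phone` function, the in-memory `store`, and the `related` argument (a set of entities the consumer already knows are associated with the incoming one) are just one way of modelling the three outcomes.

```python
def ingest_phone(store, phone, owner, related=frozenset()):
    """Ingest one phone record and report what happened.

    store   -- dict mapping phone UUID to the stored record
    owner   -- identifier of the entity the phone arrived attached to
    related -- entities already known to be associated with `owner`
    """
    existing = store.get(phone["id"])
    if existing is None:
        # First time this UUID has been seen: create the record.
        store[phone["id"]] = {"number": phone["number"], "owners": {owner}}
        return "created"
    if owner in existing["owners"] or existing["owners"] & related:
        # Same phone seen again via the same or an associated entity.
        existing["number"] = phone["number"]
        existing["owners"].add(owner)
        return "updated"
    # Same UUID attached to an apparently unrelated entity: flag it.
    return "warning"


store = {}
phone = {"id": "8f1b2f0e-5f0a-4c0e-9a63-2d1d7c1f3a11", "number": "+1 555 0100"}

first = ingest_phone(store, phone, "service-a")
second = ingest_phone(store, phone, "organization-a")
third = ingest_phone(store, phone, "organization-a", related={"service-a"})
```

In this run the phone is created when it arrives with Service A, raises a warning when it turns up on Organization A with no known link, and is updated once the consumer knows Organization A is associated with Service A. None of that is possible if the phone's UUID is really just the service's UUID.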
Hopefully that has clarified a few things!