Connect 211 is working on translating resource data in the database. We decided to do that as it’s more cost effective in the long run, and also improves the value of the raw data for our clients.
I think internationalization of resource data is important: it already has use cases where HSDS is in use, and it’s generally a direction we should all go in.
It raises the question, though, of whether translation at the data level is a good idea, and whether it should be accommodated in the API specification.
For example, should /services be /en/services or /es/services?
@bloom @MikeThacker
I see this has gone unanswered so far. I think it’s important but I don’t have much expertise here. Perhaps @robredpath might comment.
Just two observations:
- a very small proportion of fields hold text that might vary between languages (mainly names and descriptions)
- some people want alternative text in the same language depending on the audience (e.g. a service description for a professional/commissioner vs a service description for a user)
Thanks for the ping, @MikeThacker - I missed this!
HSDS is an interchange format: it’s about the interface between systems. So, it doesn’t really matter how a particular system maintains its data, as long as it’s properly structured on the way in or out.
The structure of HSDS right now is that language is assigned on a per-table level by entries in the meta_table_description table. I understand that table isn’t really in practical use, and it’s being considered for removal in 3.0. I can’t imagine a situation where you’d want the services table in English but the organization table in Spanish, so this makes sense. We should be aware that we’re losing the ability to describe the language of the data within the datapackage in a standardised way, although it’s not really a loss if no-one used it! Anyone exchanging information is probably describing it out-of-band; that’s fine for bulk data interchange, but not really for APIs.
In the case of APIs, we should define two things:
- how someone can request data in a particular language (and how systems should respond to that request)
- how systems should respond if a language isn’t specified
Whether we use a language prefix (such as /en/services) or HTTP headers (Accept-Language, anyone?) is a conversation for another time, I think. Regardless of how the request is composed, we should allow someone to specify a language that they want to get data back in, and there should be a way for the client to know what they got (either within the data, or a Content-Language header).
If a language isn’t specified then the client should still know what they got; the system can choose its own default, though, of course.
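To make the two behaviours above concrete, here is a rough Python sketch of how a server might pick a response language. It is only an illustration, not anything the HSDS specification defines: the supported languages, the default, and the function names (parse_accept_language, choose_language, respond) are all assumptions of mine, and it deliberately accepts either a path prefix or an Accept-Language header so as not to prejudge that choice.

```python
from typing import Dict, List, Optional

# Assumptions for illustration only: which languages this system holds data in,
# and which one it falls back to when the client does not ask for one.
SUPPORTED_LANGUAGES = ("en", "es")
DEFAULT_LANGUAGE = "en"


def parse_accept_language(header: str) -> List[str]:
    """Return primary language subtags from an Accept-Language value,
    ordered by descending q-value (ties keep their original order)."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        primary = tag.strip().lower().split("-")[0]  # "es-MX" -> "es"
        ranked.append((-quality, position, primary))
    ranked.sort()
    return [primary for _, _, primary in ranked]


def choose_language(path_prefix: Optional[str], accept_language: Optional[str]) -> str:
    """Prefer an explicit path prefix, then the header, then the system default."""
    if path_prefix and path_prefix.lower() in SUPPORTED_LANGUAGES:
        return path_prefix.lower()
    for language in parse_accept_language(accept_language or ""):
        if language in SUPPORTED_LANGUAGES:
            return language
    return DEFAULT_LANGUAGE


def respond(path_prefix: Optional[str] = None,
            accept_language: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Always tell the client what it got, whether or not it asked."""
    language = choose_language(path_prefix, accept_language)
    return {"headers": {"Content-Language": language}}
```

With this sketch, a request carrying "es-MX,es;q=0.9,en;q=0.8" would be answered in Spanish, while a request with no language at all, or one asking only for an unsupported language, would get the default and still see a Content-Language header saying so.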
We should be careful not to confuse the language in which the information is described with the language in which the service is delivered: an English-speaking healthcare professional might look for information in English about a Spanish-speaking service for a Spanish-speaking client. HSDS provides a fairly rich structure for this already, which I think is fine.
I think this could be handled through an extension, if we permit it: I can imagine a couple of quite neat ways of modelling this. This mechanism might also apply if a directory wanted to offer (for example) web-ready and plain-text forms of their descriptions.
Thanks for the feedback all.
@MikeThacker Yes, we had identified only a handful of fields (mostly descriptions) that should be translated.
@robredpath I’d be keen to discuss your ideas on how to model multilingual data. We opted for something similar to the “Tables” approach described here: The Best Database Structure to Keep Multilingual Data | Phrase
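For anyone unfamiliar with the separate-translation-table idea, a minimal sketch using Python’s built-in sqlite3 module follows. This is not our actual schema: the service_translation table, its columns, and the get_service helper are hypothetical names chosen for illustration. The point is just that only the handful of translatable fields are duplicated per language, with a fallback to the base record when a translation is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base record, holding the text in the system's default language.
CREATE TABLE service (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT
);
-- Hypothetical side table: one row per service per language,
-- containing only the translatable fields.
CREATE TABLE service_translation (
    service_id TEXT NOT NULL REFERENCES service(id),
    language TEXT NOT NULL,
    name TEXT,
    description TEXT,
    PRIMARY KEY (service_id, language)
);
""")
conn.execute(
    "INSERT INTO service VALUES (?, ?, ?)",
    ("svc-1", "Food pantry", "Free groceries every Friday."),
)
# Only the name has been translated so far; the description is still NULL.
conn.execute(
    "INSERT INTO service_translation VALUES (?, ?, ?, ?)",
    ("svc-1", "es", "Despensa de alimentos", None),
)


def get_service(connection, service_id, language):
    """Fetch a service in the requested language, falling back field by
    field to the base record where no translation exists."""
    return connection.execute(
        """
        SELECT s.id,
               COALESCE(t.name, s.name),
               COALESCE(t.description, s.description)
        FROM service AS s
        LEFT JOIN service_translation AS t
               ON t.service_id = s.id AND t.language = ?
        WHERE s.id = ?
        """,
        (language, service_id),
    ).fetchone()
```

Asking for Spanish here returns the translated name alongside the untranslated English description, which is one reason the client needs some way of knowing what language it actually got.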