Thanks @bloom. Six out of six of our statewide 211s use a boolean “is_hidden” field in relation to their address data in order to solve this problem. They are all primarily hiding public addresses for certain types of shelters. That is the only use case I am currently aware of personally. I can reach out to my data contacts to see if they have additional use cases.
They are also hiding certain other types of fields, but those mostly fall into the administrative category that wouldn’t be considered public information, and therefore fall outside the scope of HSDS. Per @MikeThacker’s earlier points, HSDS isn’t officially for exchange of private data.
I think this can be specified. I have been looking into this whilst investigating if a “profile” could enforce the use of certain taxonomies.
The “slightly more” accurate JSON Schema pseudo-code actually looks like this:
IF address has a property "attributes" which is an array and CONTAINS an object which has a property "taxonomy_term" which is an object that has (a property "value" whose value is "redacted" AND a property taxonomy_id with value "some_id") THEN "address_1" should be empty
It is a handful to write and I have no experience of using “contains”, but I think it’s possible.
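For anyone wanting to try this, here is a rough sketch of the rule above, not a tested profile. The dictionary expresses it with the JSON Schema keywords `if`/`then`, `contains` and `const` (available from draft-07) and is not run through a validator here. The function beneath it performs roughly the same check in plain Python. The property names come from the pseudo-code, "some_id" is still a placeholder, and reading "empty" as "missing or an empty string" is my assumption.

```python
# Placeholder identifiers from the pseudo-code above; "some_id" is hypothetical.
REDACTED_VALUE = "redacted"
REDACTION_TAXONOMY_ID = "some_id"

# The rule written with JSON Schema keywords (draft-07 or later).
ADDRESS_REDACTION_RULE = {
    "if": {
        "required": ["attributes"],
        "properties": {
            "attributes": {
                "type": "array",
                "contains": {
                    "type": "object",
                    "required": ["taxonomy_term"],
                    "properties": {
                        "taxonomy_term": {
                            "type": "object",
                            "required": ["value", "taxonomy_id"],
                            "properties": {
                                "value": {"const": REDACTED_VALUE},
                                "taxonomy_id": {"const": REDACTION_TAXONOMY_ID},
                            },
                        }
                    },
                },
            }
        },
    },
    # Assumption: "empty" means absent or a zero-length string.
    "then": {"properties": {"address_1": {"maxLength": 0}}},
}


def is_marked_redacted(address: dict) -> bool:
    """Return True if any attribute carries the 'redacted' taxonomy term."""
    attributes = address.get("attributes")
    if not isinstance(attributes, list):
        return False
    for attribute in attributes:
        if not isinstance(attribute, dict):
            continue
        term = attribute.get("taxonomy_term")
        if (
            isinstance(term, dict)
            and term.get("value") == REDACTED_VALUE
            and term.get("taxonomy_id") == REDACTION_TAXONOMY_ID
        ):
            return True
    return False


def satisfies_redaction_rule(address: dict) -> bool:
    """IF the address is marked redacted THEN address_1 must be empty."""
    if not is_marked_redacted(address):
        return True  # the rule only constrains redacted addresses
    return address.get("address_1") in (None, "")
```

An address with the redacted term and a populated "address_1" fails the check; any address without the term passes regardless of its content.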
@bloom I asked my clients about use cases where they might need redacted information. I got responses from three of them, and permission to share them. I apologize in advance for the lengthy post here.
Hannah Newton, Washington 211
Our I&R’s stance is to not use redacted fields because hackers could potentially find the on/off toggle and turn it on thus exposing the sensitive data. We have a separate “confidential address” field where this information is stored.
I can’t think of another example where we hide sensitive information by field, but we do hide entire records from the public if they are for internal use only.
Lindsay Paulsen, United Way of the Midlands 211
Addresses:
We have the option to check the box as ‘private’ for both physical and mailing address. We don’t use it a lot, but it is available to agencies when they review their information and mark any needed changes.
Sometimes an agency will mark the physical address as private if they don’t take walk-ins. One example is a crisis hotline – they only take phone calls and do not want people showing up at their location, so they marked it as private.
Sometimes an agency/entity will mark the mailing address as private. I can think of a support group run by volunteers, and the mailing address is a person’s home. No need for that to get out to the public.
Phones:
Our I&R has the option to select each phone as ‘private’. Same story as addresses – the private option is available to agencies when they review their information and mark any needed changes.
Examples might be an emergency cell phone number for a rural pantry.
We do not mark the following fields as private under any circumstances (no option for it anyway): Eligibility, hours, description, languages, application process, geo area served, fees, documents required.
Jane Cramb, Wyoming 211
Currently within our database we have domestic violence shelters with redacted locations but not all of them. Some of them have included their address as the shelter itself is in a different location.
As for other information that might be withheld: we used to withhold the director’s contact information, but that information is no longer kept in this system.
In cases where there is a website only and not a phone number to call, we then don’t add a phone number at all.
Summary
Based on this limited feedback, I draw a few conclusions relating to my earlier questions:
Domestic violence shelters are the primary use case for redacted public data.
If any other piece of data might also warrant being redacted, it might be phone numbers.
Most other hidden data falls into the “private/administrative” category, and can simply be omitted.
My summary might be this:
Contact information, by virtue of being expected to exist on public records, may warrant some explanation to end users when it exists but should not be displayed.
I think this is worth highlighting: we’ve focussed entirely on field-level redaction here; record-level redaction would be an entirely different conversation. Unless there’s a compelling reason to think about it, I think it’s fine to leave that as just “don’t publish what you don’t want to publish”.
I absolutely agree. All of the feedback mentioned from clients included the idea of “private records” in their thinking, but they are also intuitively omitting that from public access, so I’m comfortable saying “there’s nothing to see here”.
I spoke with our own data engineer at Connect 211 (newly hired) and he indicated he would prefer using attribute to indicate redaction. The longer we think on that solution the more we like it. Much thanks @robredpath for offering that solution.
What would it look like to finish gathering consensus and resolve this topic?
The good news is that all the solutions we’ve seriously discussed either don’t require an upgrade at all, or require only a MINOR one; we don’t need to tie this into the 3.0 discussion. Phew.
The bad news is that I think we’ve got a few of these kinds of things around and I think the HSDS Workgroup should really be focussing on the API spec, tooling questions and getting the upgrade out, so I don’t really want to add this - and any other such questions - into the mix.
I think you can largely just go ahead and declare that this is how you are choosing to implement redaction; in time this might warrant getting written up as non-normative guidance or a profile or something like that. Or, it might be adopted as normative in 3.1.
@bloom can I just check that’s your understanding as well?
Agreed, I think we can reference this thread for now (actually, can you reference it in this old open issue here?) and Skyler, you can just go ahead and document how you handle it, and plan to report back. How does that sound?
I think it would be helpful for everyone to hear how it’s going in a few weeks / months, @skyleryoung - it might be that you want to warn everyone off ever considering this, or it might be that this is a great solution. I think what’s important is that, however it goes, it sets us on a path towards greater alignment. Your experience, after appropriate discussion and refinement, should form the basis of future guidance, or even normative content.
Just providing an update on this. We are actually liking one of @robredpath’s earlier suggestions the most, and I wish I had been astute enough to forecast that at the time.
Adding a location.location_type of “redacted” is the neatest option, and it communicates the most useful information:
We can interpret how to handle addresses, but I still know what type the addresses themselves are.
It’s important to know whether a location is redacted at the location level, because if the address should be redacted, then the lat/long usually should be too.
The heuristics for our most common types of locations are already located here at the location level: A) Call Centers (et al) are “virtual” indicating a non-physical intake point, B) Domestic Violence Shelters are “redacted” indicating confidential information, and everything else is “physical”. We have few or no cases so far for labeling a location as “mailing”, but I’m sure those edge cases exist.
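To make the points above concrete, here is a minimal sketch of how a publisher might build the public view of a location, assuming a "redacted" value is allowed in location_type. The function name `public_view` and the choice of which fields to blank are my own assumptions for illustration, not anything HSDS defines.

```python
from copy import deepcopy

# Assumption: these are the fields a publisher would blank for a redacted location.
ADDRESS_FIELDS = ("address_1", "address_2")
COORDINATE_FIELDS = ("latitude", "longitude")


def public_view(location: dict) -> dict:
    """Return a copy of the location that is safe to publish.

    Locations whose location_type is "redacted" lose their street address
    and coordinates; address_type is kept, so consumers still know what
    kind of address was withheld. Other locations are returned unchanged.
    """
    public = deepcopy(location)  # never mutate the internal record
    if public.get("location_type") != "redacted":
        return public

    # If the address is redacted, the lat/long usually should be too.
    for field in COORDINATE_FIELDS:
        if field in public:
            public[field] = None

    for address in public.get("addresses", []):
        for field in ADDRESS_FIELDS:
            if field in address:
                address[field] = ""
    return public
```

Because the decision is made once at the location level, the address and the coordinates cannot drift out of sync, which is the main appeal of this option for us.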
I would recommend that we add a new location.location_type of “redacted” in a future, minor update. Thoughts @devin @robredpath @bloom?
This was not approved by the committee. @klambacher commented: ‘this doesn’t cover the privacy use cases adequately IMO, where we may need to share the data but also indicate that it is private vs just completely redacting the information.’
In the technical committee meeting we discussed the possibility of having a separate field, such as location.privacy, with the options private, redacted and public.
I can look at writing this up for 3.2. One consideration is that “private” could potentially mean different things in different contexts e.g.
you can access this address but cannot share it
you can share this address but only with specific agencies
Is this the case? And if so, is that something that also needs to be represented? How do we define “private” in this hypothetical codelist?
If we include private data, we need some kind of value that denotes the data is “not open”, meaning it should not be possible to access anonymously in public (e.g. via an open web page). Whether that means it’s just internal, limited to some agencies, subject to the terms of a separate sharing agreement or anything else is probably too much detail. So guidance should just say that associated sharing terms should be described elsewhere.
I don’t want the inclusion of any private data to be used as an excuse for not making a whole feed public, as is arguably the case for Ofsted services data in the UK, where a very small portion of the data is confidential, so organisations have to go through a long approvals process to access the data at all. Hence we may want a validator to highlight any private data or to exclude it from the main HSDA endpoints.
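As a rough illustration of that last idea, assuming the hypothetical location.privacy field with the values discussed in the committee meeting: one helper lists the records a validator would highlight, and another builds the open feed without them. The function names, and treating a missing value as “public”, are assumptions for the sketch only.

```python
from enum import Enum


class Privacy(str, Enum):
    """Hypothetical codelist for a location.privacy field."""

    PUBLIC = "public"
    REDACTED = "redacted"
    PRIVATE = "private"


def privacy_of(location: dict) -> Privacy:
    # Assumption: a location with no privacy value is treated as public.
    # An unrecognised value raises ValueError, which a validator could report.
    return Privacy(location.get("privacy", Privacy.PUBLIC.value))


def find_private_locations(locations: list) -> list:
    """Return the ids of locations a validator should highlight as not open."""
    return [loc.get("id") for loc in locations if privacy_of(loc) is Privacy.PRIVATE]


def open_feed(locations: list) -> list:
    """Return only the locations that may appear on the anonymous public feed."""
    return [loc for loc in locations if privacy_of(loc) is not Privacy.PRIVATE]
```

With this split, the private records never block publication of the rest of the feed; the sharing terms for them would still need to be described elsewhere.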
Thanks Mike - so in that case we’d explicitly leave “private” as somewhat open to interpretation with the idea that agencies could apply their own definitions as needed?
Hopefully we wouldn’t end up with an Ofsted-like situation here, as this is specifically focused on addresses only, but it is important to think about.