Handling confidential addresses in HSDS 3.0

Thanks @bloom. All six of our statewide 211s use a boolean “is_hidden” field on their address data to solve this problem. They are all primarily hiding public addresses for certain types of shelters. That is the only use case I am personally aware of at the moment. I can reach out to my data contacts to see if they have additional use cases.

They are also hiding certain other types of fields, but those mostly fall into the administrative category that wouldn’t be considered public information, and therefore fall outside the scope of HSDS. Per @MikeThacker’s earlier points, HSDS isn’t officially for exchange of private data.

I think this can be specified. I have been looking into this whilst investigating if a “profile” could enforce the use of certain taxonomies.

The slightly more accurate JSON Schema pseudo-code actually looks like this:

IF address has a property "attributes" which is an array and CONTAINS an object which has a property "taxonomy_term" which is an object that has (a property "value" whose value is "redacted" AND a property "taxonomy_id" whose value is "some_id") THEN "address_1" should be empty

It is a handful to write, and I haven’t got experience of using “contains”, but I think it’s possible.
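
For anyone who wants to try the rule before wrestling with JSON Schema, here is a minimal sketch of the same check in plain Python. It follows the pseudo-code above literally: the property names ("attributes", "taxonomy_term", "value", "taxonomy_id", "address_1") are taken from that pseudo-code, and "some_id" is the same placeholder, not a real taxonomy identifier. The function names are hypothetical.

```python
# Placeholder values copied from the pseudo-code; "some_id" is not a real ID.
REDACTED_TERM = "redacted"
REDACTION_TAXONOMY_ID = "some_id"


def is_marked_redacted(address: dict) -> bool:
    """Return True if any attribute on the address carries the redaction term."""
    attributes = address.get("attributes")
    if not isinstance(attributes, list):
        return False
    for attribute in attributes:
        if not isinstance(attribute, dict):
            continue
        term = attribute.get("taxonomy_term")
        if not isinstance(term, dict):
            continue
        if (term.get("value") == REDACTED_TERM
                and term.get("taxonomy_id") == REDACTION_TAXONOMY_ID):
            return True
    return False


def violates_redaction_rule(address: dict) -> bool:
    """IF the address is marked redacted THEN "address_1" should be empty."""
    return is_marked_redacted(address) and bool(address.get("address_1"))
```

An address that is marked redacted but still carries a street line would be flagged by `violates_redaction_rule`; an address without the marker is never flagged, whatever it contains.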

@bloom I asked my clients about use cases where they might need redacted information. I got responses from three of them, and permission to share them. I apologize in advance for the lengthy post here.

Hannah Newton, Washington 211

Our I&R’s stance is to not use redacted fields because hackers could potentially find the on/off toggle and turn it on thus exposing the sensitive data. We have a separate “confidential address” field where this information is stored.

I can’t think of another example where we hide sensitive information by field, but we do hide entire records from the public if they are for internal use only.

Lindsay Paulsen, United Way of the Midlands 211

Addresses:

  • We have the option to check the box as ‘private’ for both physical and mailing address. We don’t use it a lot, but it is available to agencies when they review their information and mark any needed changes.
  • Sometimes an agency will mark the physical address as private if they don’t take walk-ins. One example is a crisis hotline – they only take phone calls and do not want people showing up at their location, so they marked it as private.
  • Sometimes an agency/entity will mark the mailing address as private. I can think of a support group run by volunteers, and the mailing address is a person’s home. No need for that to get out to the public.

Phones:

  • Our I&R has the option to select each phone as ‘private’. Same story as addresses – the private option is available to agencies when they review their information and mark any needed changes.
  • An example might be an emergency cell phone number for a rural pantry.

We do not mark the following fields as private under any circumstances (no option for it anyway): Eligibility, hours, description, languages, application process, geo area served, fees, documents required.

Jane Cramb, Wyoming 211

Currently, some but not all of the domestic violence shelters in our database have redacted locations. Some of them have included their address because the shelter itself is in a different location.

As for other information that might be withheld, we used to withhold the director’s contact information, but that information is no longer kept in this system.

In cases where there is a website only and not a phone number to call, we then don’t add a phone number at all.

Summary

Based on this limited feedback, I draw a few conclusions relating to my earlier questions:

  1. Domestic violence shelters are the primary use case for redacted public data.
  2. If any other piece of data might also warrant being redacted, it might be phone numbers.
  3. Most other hidden data falls into the “private/administrative” category, and can simply be omitted.

My summary might be this:
Contact information, by virtue of being expected to exist on public records, may warrant some explanation to end users when it exists but should not be displayed.

I think this is worth highlighting: we’ve focussed entirely on field-level redaction here; record-level redaction would be an entirely different conversation. Unless there’s a compelling reason to think about it, I think it’s fine to leave that as just “don’t publish what you don’t want to publish”.

I absolutely agree. All of the feedback mentioned from clients included the idea of “private records” in their thinking, but they are also intuitively omitting that from public access, so I’m comfortable saying “there’s nothing to see here”.

I spoke with our own (newly hired) data engineer at Connect 211, and he indicated he would prefer using attribute to indicate redaction. The longer we think on that solution, the more we like it. Many thanks @robredpath for offering that solution.

What would it look like to finish gathering consensus and resolve this topic?

Heh, well.

The good news is that all the solutions we’ve seriously discussed either don’t require an upgrade at all, or require only a MINOR one; we don’t need to tie this into the 3.0 discussion. Phew.

The bad news is that I think we’ve got a few of these kinds of things around and I think the HSDS Workgroup should really be focussing on the API spec, tooling questions and getting the upgrade out, so I don’t really want to add this - and any other such questions - into the mix.

I think you can largely just go ahead and declare that this is how you are choosing to implement redaction; in time this might warrant getting written up as non-normative guidance or a profile or something like that. Or, it might be adopted as normative in 3.1.

@bloom can I just check that’s your understanding as well?

Agreed, I think we can reference this thread for now (actually, can you reference it in this old open issue here?) and Skyler, you can just go ahead and document how you handle it, and plan to report back. How does that sound?

Good enough for me 🙂 We are planning to use attribute, and will report back here if that changes for some unforeseen reason.

I think it would be helpful for everyone to hear how it’s going in a few weeks / months, @skyleryoung - it might be that you want to warn everyone off ever considering this, or it might be that this is a great solution. I think what’s important is that, however it goes, it sets us on a path towards greater alignment. Your experience, after appropriate discussion and refinement, should form the basis of future guidance, or even normative content.

Just providing an update on this. We are actually liking one of @robredpath’s earlier suggestions the most, and I wish I had been astute enough to forecast that at the time.

Adding a location.location_type of “redacted” most neatly communicates the most, and the most useful, information:

  1. We can interpret how to handle addresses, but we still know what type the addresses themselves are.
  2. It’s important to know whether a location is redacted at the location level, because if the address should be redacted, then the lat/long usually should be too.
  3. The heuristics for our most common types of locations are already located here at the location level: A) Call Centers (et al) are “virtual” indicating a non-physical intake point, B) Domestic Violence Shelters are “redacted” indicating confidential information, and everything else is “physical”. We have few or no cases so far for labeling a location as “mailing”, but I’m sure those edge cases exist.
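
To make the three points above concrete, here is a rough sketch of how a publisher might act on a “redacted” location type. Everything about it is an assumption for illustration: the list of types is just the set of values named in this thread (with “redacted” being the proposed addition), the field names ("location_type", "latitude", "longitude", "addresses", "address_1", "address_2", "address_type") are assumed to match the records being published, and `public_view` is a hypothetical helper.

```python
import copy

# Values named in this thread; "redacted" is the proposed addition.
LOCATION_TYPES = {"physical", "virtual", "mailing", "redacted"}


def public_view(location: dict) -> dict:
    """Return a copy of the location that is safe to publish."""
    location_type = location.get("location_type")
    if location_type not in LOCATION_TYPES:
        raise ValueError(f"unknown location_type: {location_type!r}")

    public = copy.deepcopy(location)  # never mutate the internal record
    if location_type != "redacted":
        return public

    # Point 2: if the address is redacted, the lat/long usually should be too.
    public["latitude"] = None
    public["longitude"] = None

    for address in public.get("addresses", []):
        address["address_1"] = ""
        if "address_2" in address:
            address["address_2"] = ""
        # Point 1: address_type is left alone, so consumers still know
        # what kind of address was withheld.
    return public
```

Because the decision hangs on one field at the location level, a consumer only has to look in one place to know whether the missing address is deliberate.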

I would recommend that we add a new location.location_type of “redacted” in a future minor update. Thoughts, @devin @robredpath @bloom?

I like this.

Seems like another opportunity for us to make a codelist for the standard and put “redacted” in there for location.location_type.

I have linked this conversation to #469 & #166.

Happy to discuss on Wed.

@mrshall - FYI