Resource Data Federation: let's discuss

bloom · January 17, 2025, 8:42pm

We should make some time to talk about the challenges of resource data federation – by which I mean, enabling multiple directory maintainers to collaborate on data management across different information systems.

I think there are technical questions in here (like how to enable cooperation among people with different answers to questions like “what is a service?”) and also governance questions (like how to divide up responsibilities, incentives, conflict resolution etc) although the former might blur into the latter.

We certainly will need more than one conversation to really make progress, but maybe we could just start with a fishbowl discussion among people who have some experience. I’m happy to facilitate.

@klambacher has agreed to participate. I’m sure @skyleryoung will be interested. I also know of at least a couple of groups of people interested in pursuing this path who might not be here, but who I’d invite to listen in and bring questions.

NB: I want to wait for @MikeThacker to get back before picking a date because he’s asked for this for a while.

Who else is interested? What should be on our agenda? What questions would you want such a conversation to consider?

skyleryoung · February 6, 2025, 1:38pm

I’m definitely in. When is Mike back?

MikeThacker · February 7, 2025, 2:25pm

When is Mike back?

I’m back now.

bloom · February 12, 2025, 11:50pm

OK great. I assume that since we’re spanning Pacific coast to UK time, we’re looking at the 11a ET hour – sound right?

@klambacher @mikethacker @skyleryoung and anyone else who might be interested, let me know if you have any days in which 11a ET normally doesn’t work.

Next week I’ll schedule something after reaching out to the folks in the field who asked me for this convo.

bloom · March 11, 2025, 6:08pm

OK I have folks from at least one state eager to join this conversation, so I’m ready to start scheduling.

I’m looking at 11a Eastern during any Monday, Tuesday, Thursday, or Friday during the last week of March or first week of April. I’ve created a poll here: Doodle

Please fill it out! And let me know if there’s anyone you think I should specifically invite. Thank you.

bloom · March 13, 2025, 1:21pm

OK we have picked March 25th at 11am Eastern for a discussion about resource directory data federation.

We’ll invite some opening remarks from @klambacher and also @skyleryoung. Then we’ll have discussion.

Please help set the agenda in advance: what questions come to mind when you consider the challenges and opportunities for different organizations using different information systems to collaborate in management of shared resource directory data? Please send in the questions you would most like to discuss, and I will try to shape an agenda that is worthwhile to everyone.

bloom · March 26, 2025, 1:56pm

Hi folks – thanks so much to @klambacher for sharing with us the story of her epic journey through resource data federation. I’ve got notes here (feel free to add or clarify).

I know that several attendees had questions first and foremost about politics and trust and organizational strategy issues, so it was great to hear Kate’s lessons learned about those things. I don’t think we got very far into the technical challenges as pertains to some of our questions about evolving the specification – so I assume we may want at least one more round of discussion on this.

I will first check to see whether the same day/time will work for most folks next week or the week after – so, April 1st or 8th at 11a Eastern. Please let me know if that won’t work for you.

And please also share your observations and questions for the next agenda!

DARS_NWD · March 26, 2025, 4:22pm

Good Afternoon, I am interested in participating in the next conversation(s) if possible please.
To your question about dates, I am available on the dates you mentioned as options (April 1st or 8th at 11a Eastern).
Thanks
Erika (No Wrong Door Virginia)

bloom · April 2, 2025, 2:26pm

Hi folks – we’re scheduled to resume this conversation at 11a Eastern on Tuesday next week.

Now that we’ve heard a real-world story of a federation experience from Kate, I’d like this next conversation to open up a bit to explore the technical and tactical challenges.

I know we have technical questions about structuring identifiers among distributed sources, and provenance metadata.

Also there’s the question of how to establish “official” or “core” or “golden” records that are reliably verified by a designated steward in a way that still enables others to add their own custom data.
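To make the identifier, provenance, and "golden record" questions a bit more concrete, here is a rough sketch of one possible shape. It is not something proposed in the discussion or defined by HSDS: the class names, the `namespace:local-id` identifier convention, and the field names are all assumptions for illustration. The idea is that only the designated steward can change the verified core, while anyone else can attach their own data in a separate, contributor-keyed area.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Provenance:
    source: str       # who supplied or verified the value (hypothetical source ID)
    verified_on: str  # ISO date of the last verification


@dataclass
class FederatedRecord:
    record_id: str                # assumed convention: "<steward namespace>:<local id>"
    steward: str                  # the one party allowed to edit the core
    core: Dict[str, str]          # the steward-verified "golden" fields
    core_provenance: Provenance
    # Custom data from other participants, keyed by contributor so it never
    # overwrites the core or another contributor's additions.
    extensions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def update_core(self, editor: str, changes: Dict[str, str], verified_on: str) -> None:
        """Apply changes to the core, but only if the editor is the steward."""
        if editor != self.steward:
            raise PermissionError(f"{editor} is not the steward of {self.record_id}")
        self.core.update(changes)
        self.core_provenance = Provenance(source=editor, verified_on=verified_on)

    def set_extension(self, contributor: str, data: Dict[str, str]) -> None:
        """Attach or update a contributor's own custom data."""
        self.extensions.setdefault(contributor, {}).update(data)


record = FederatedRecord(
    record_id="utility-a:svc-001",
    steward="utility-a",
    core={"name": "Example Food Pantry", "phone": "555-0100"},
    core_provenance=Provenance(source="utility-a", verified_on="2025-04-01"),
)
record.set_extension("partner-b", {"internal_tag": "walk-in friendly"})
```

Whether the core/extension split should live in one record like this, or in separate linked records, is exactly the kind of design question the conversation would need to settle.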

We also have questions about structuring governance, and incentives – i.e. how organizations can be equitably compensated for distributed contributions.

What are your questions? Please let me know so we can do a bit of work in advance to shape the agenda

MikeThacker · April 3, 2025, 12:11pm

@bloom it would be good to hear:

what motivates publishers to keep up-to-date with the latest version of HSDS/HSDA?
what additional properties do people add to help their federation? (e.g. Kate mentioned tags to decide what appeared in what outputs)
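As a guess at what the tags example could look like in practice (this is only an illustrative sketch, not a description of Kate's actual system; the tag names and record fields are made up):

```python
from typing import Dict, List

# Hypothetical records, each carrying tags that name the outputs it belongs in.
records: List[Dict] = [
    {"id": "svc-001", "name": "Example Food Pantry", "tags": ["public-site", "partner-export"]},
    {"id": "svc-002", "name": "Example Shelter Intake", "tags": ["partner-export"]},
    {"id": "svc-003", "name": "Example Draft Listing", "tags": []},
]


def records_for_output(all_records: List[Dict], output_tag: str) -> List[Dict]:
    """Return only the records tagged for the given output."""
    return [r for r in all_records if output_tag in r.get("tags", [])]


public_site = records_for_output(records, "public-site")
partner_export = records_for_output(records, "partner-export")
```

Here one shared record set feeds two different outputs, and an untagged record appears in neither.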

bloom · April 9, 2025, 7:30pm

Hi folks – Thanks to those who joined for the time yesterday.

Notes are here: Standing Technical Committee Meeting Notes – Open Referral - Google Docs

I went through to bold the parts that I thought were most significant. I would welcome more comments and key takeaways from you all.

I know that this question pops out at me, and seems relevant to the feedback loop design question, and in turn the metadata specifications:
can/should there be nested (vertical) levels of responsibility? So vertical rather than horizontal redundancy – i.e. local/topical responsibility for record stewardship, overseen/bottom-lined by higher-level umbrella stewardship. (Is vertical vs horizontal the right frame?)

Should we pick up there? Are there other questions you want to discuss next?

I’m going to send out a calendar item to reconvene and continue the discussion for two weeks from now. Let me know if you’d like to keep this going and if that would work for you.

bloom · April 23, 2025, 5:11pm

OK notes from this conversation below and here, please feel free to help clarify / add / organize. This is a bit circular but a good discussion. We have a couple of specific questions we could pick up next. What do you want to discuss next?

Incentives –

“Local stewards are more likely to have relationships and are able to get people on the phone at the organization. More experienced data managers, however, understand the needs for the funders and how to craft a highly specialized and high quality record.”

Kate: “reconciling these two strengths is both necessary and possible.” experienced data managers can pass down their knowledge / build up data quality, while locals can build and maintain relationships.

“Duplicate record management is not always bad, in fact some overlap in management of a singular resource drives more rich information and more frequent updates. Reconciling those records to match (suggest things are the same) and then map (human editing and confirmation of the matching)…” Creating a golden record set that distills this (?)
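A minimal sketch of the match-then-map idea, under stated assumptions: the matching rule (same normalized name or same normalized phone, across different sources) and all field names are hypothetical, and real matching would likely need fuzzier comparison. The point is only to show the two steps as separate: software suggests, a human confirms.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def normalize(text: str) -> str:
    """Lowercase and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def suggest_matches(records: List[Dict[str, str]]) -> List[Pair]:
    """Match step: propose pairs from different sources that look like the same resource."""
    pairs: List[Pair] = []
    for a, b in combinations(records, 2):
        if a["source"] == b["source"]:
            continue  # only reconcile across maintainers, not within one
        same_name = normalize(a["name"]) == normalize(b["name"])
        phone_a, phone_b = normalize(a["phone"]), normalize(b["phone"])
        same_phone = phone_a != "" and phone_a == phone_b
        if same_name or same_phone:
            pairs.append((a["id"], b["id"]))
    return pairs


def confirm_mapping(suggestions: List[Pair], approved: Set[Pair]) -> List[Pair]:
    """Map step: keep only the suggestions a human reviewer has approved."""
    return [pair for pair in suggestions if pair in approved]


records = [
    {"id": "a:1", "source": "a", "name": "Example Food Pantry", "phone": "555-0100"},
    {"id": "b:7", "source": "b", "name": "example food-pantry", "phone": "(555) 0100"},
    {"id": "b:8", "source": "b", "name": "Example Legal Aid", "phone": "555-0111"},
]
suggestions = suggest_matches(records)
confirmed = confirm_mapping(suggestions, approved={("a:1", "b:7")})
```

A golden record set could then be distilled from the confirmed pairs, but how to merge the fields of two matched records is left open here, as it was in the notes.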

Who should get paid for what?

Locals / domain experts/stewards for managing records on a per-record basis.
Regional/state-level for bottom-lining comprehensive data, quality assurance, etc.
Producing “opinionated” data sets for specific consumers
Participation in cooperative processes i.e. governance activities
CBOs should get paid for providing services! This is important; it may or may not be relevant.

First two bullets can be gamed / played unequally. Organizations (local OR high-level/utility) might be incentivized to maximize their benefits in ways that aren’t in the best interests of users or the network as a whole. Example of incentivizing the wrong things: paying more for complex records leads to needless complexity.

^ but these challenges can be addressed through governance. Contract-based payments with incredibly specific restrictions around which records are being compensated have proven effective in disincentivizing cheating in the system. Agreements on fair terms, and monitoring and conflict resolution processes to manage and evolve the agreements over time. These can also yield technical safeguards, though that has to follow agreements. “Tight standards and inclusion policies, especially coupled with auditing, make it so that each steward can keep their own records, that’s their business, and whether they get paid or not is on the basis of standards.”

Get clear on what we are trying to incentivize?

Reliability.
Responsiveness
Relationships.

Are there different thresholds – some kinds of data are objective, can be collected with relatively minor levels of expertise; some kinds are specialized and need expertise.

Skyler: “Paying a per record fee when we are at a very early stage can be tricky. Knowing the actual cost of production of a record and having that be itemized is critical.”

Kate: “We need to find a way to recognize the value of checking a record even when significant information does not churn. We need to investigate measuring how an entire record is checked.”

Sasha: Time is tricky. a lot of the work will not be reflected in the metadata captured in digital tools. “Sometimes you need to chase an organization for 2 months, occasionally an organization will be really on top of it and hand the details to you proactively” “Phone conversations are currently offline and aren’t always able to be traditionally programmatically tracked. Beyond that, if you get sucked into a 50 minute conversation about a random side tangent with one organization” (David side thought: it helps build rapport and may help them be more helpful in the future so this is not a bad thing)

Kate: re expecting people to meet standards to participate, there is a level of expertise that involves personnel investment and specialization that can be difficult where the role is part-time or volunteer or not a primary responsibility.

Chris: “Being able to find the money in the system to support even just one FTE at a local level can really be differential for the quality of the data in that locality. Being able to earmark that funding and then allowing the local organization to get creative with how they spend that to drive results, can be a useful blend”

Can there be thresholds for compensation?

Thresholds for simple vs complex
Thresholds for minimal compliance with standards vs full compliance

We do want to track costs. But is it worth tracking all the metadata in order to compensate on a granular level?

Skyler:

1. Incentives are connected to Responsibilities, and

2. We don’t assume that data-driven metrics are the exclusive method for establishing compensation

Also, I’m personally thinking in terms of ratios of payment, not necessarily cost of record maintenance. I’m inclined to decouple those concepts for at least.

Kate –

Assuming we get funding for data maintenance:

Funders, especially government, don’t want to pay a bunch of people. They want to fund one. So it naturally falls to a model where they fund a top-level organization, and the top-level picks the next level, and those pick their local partners. And funding flows differently across levels.

What are non-monetary incentives? Often at the lowest levels, the exchange might need to be other than money: training, software, access to data + analytics, support resources… what else?

Need roles for Auditing / review / standards-setting. This makes it hard to pass on top-level funding to lowest level, cuz it eats a lot up.

Who should pay for what?

There might be different kinds of payers in the market –

Funders that just want to pay for production of data (government?)

Funders that want to pay for specialized products – curation

We want proper staff… and we also want those local relationships that may be more informal.

So – Utility concept of bottom-line / overarching responsibility to aggregate and publish

Steward concept for designated responsibility that adheres to standards.

Community partner concept for leveraging local relationships.

A utility plays stewardship roles as well as administration, auditing, and curation.

A community partner may play a stewardship role, if they have the capacity / interest in assuming a higher level of responsibilities.

Should the concept of a ‘curator’ be distinct from (even if co-assigned with) ‘steward’? What about ‘auditor’?

Can / should we try to unbundle stewardship responsibilities for complex organizations, i.e. to distribute responsibilities across the service / location level?

David: “One thing that I think will be significant to investigate for capturing the value of the administration, education, and licensing is the travel agent and travel agency business model, and how those organizations interact with the major airlines, hotels, cruise lines, etc. It’s a random spot of knowledge I have a decent amount of detail on, and the structure is really relevant to what I have learned”

bloom · May 14, 2025, 7:32pm

Thanks again to those who joined. I went through the notes and tried to pull out a set of takeaways, itemized at the top of the notes and pasted below.

Next week many of us will be at the Inform USA conference, in which these conversations can continue informally.

Is there interest in re-starting the conversation in June? Say June 3rd or 10th at 11a Eastern? If so, what should the agenda be?

see below and chime in – thanks!

Takeaways:

Roles:

Utility: responsible for aggregation, publishing, and quality control. May have specific front-line stewardship responsibilities in addition to bottom-line / umbrella responsibilities; may also play role of curator and auditor for other stewards.
Steward: responsible for a specified level of data management of a given set of resources.
Auditor: responsible for assessing the quality of data managed by a given set of stewards, to ensure compliance with standards. (performed by Utility)
Curator: responsible for ensuring consistency in subjective elements of resource records, especially category/taxonomy

There should be designated stewardship responsibilities for the ‘core’ part of the record. That said, taxonomies might benefit from being managed (curated) centrally.

For aggregated vs unbundled service information – i.e. Programs that might involve multiple subsidiary services – Perhaps bundled program record is shared broadly, and discrete service records are kept locally and/or provided for a fee.

^ Question: Can modern transformer tools help accommodate a both/and balance between loose and strict? So that systems can have it both ways with tooling to automate bundling/unbundling.

Style guide : data standard :: style template : exchange profile

In order to get alignment on complex orgs, develop a ‘template’ for certain kinds of organizations (like gov agencies) to specify how information about them should be structured.

In order to get alignment on specific data points to share for specific purposes/users, develop an exchange profile – “Export profile” or “import profile” – that specifies particular fields for particular purposes, and sets up the crosswalking to be automatable.
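One way to picture an exchange profile that makes crosswalking automatable is as a plain mapping from the fields a consumer wants to the fields a source system holds. This is a hedged sketch only: the profile name, both sets of field names, and the `apply_profile` helper are hypothetical and not drawn from HSDS or from the notes.

```python
from typing import Dict

# Hypothetical export profile: target field name -> source field name.
REFERRAL_EXPORT_PROFILE: Dict[str, str] = {
    "service_name": "name",
    "contact_phone": "phone",
    "eligibility": "eligibility_notes",
}


def apply_profile(record: Dict[str, str], profile: Dict[str, str]) -> Dict[str, str]:
    """Crosswalk a source record into the shape the profile asks for.

    Fields the profile does not name are dropped; fields the source
    record lacks are simply omitted from the result.
    """
    return {
        target: record[source]
        for target, source in profile.items()
        if source in record
    }


source_record = {
    "name": "Example Food Pantry",
    "phone": "555-0100",
    "internal_notes": "call before noon",  # not part of the profile, so not shared
}
exported = apply_profile(source_record, REFERRAL_EXPORT_PROFILE)
```

An import profile would be the same idea in the other direction, and different purposes or users would each get their own mapping.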

Who should get paid for what?

Locals / domain experts / stewards should get paid for managing records –
- on a per-record basis?
Regional/state-level utility should get paid for aggregating comprehensive data, quality assurance, etc.
Producing “opinionated” data sets for specific consumers
Participation in cooperative processes i.e. governance activities
Some local work (non-standard contributions, unstructured input) can be contributed by community partners who might not get paid but can benefit in nonmonetary ways (training, support)
CBOs should get paid for providing services! This is important overall; but it may or may not be relevant to data management strategy.
