Resource Data Federation: let's discuss

We should make some time to talk about the challenges of resource data federation – by which I mean, enabling multiple directory maintainers to collaborate on data management across different information systems.

I think there are technical questions in here (such as how to enable cooperation among people with different answers to questions like “what is a service?”) and also governance questions (like how to divide up responsibilities, incentives, conflict resolution, etc.), although the former might blur into the latter.

We certainly will need more than one conversation to really make progress, but maybe we could just start with a fishbowl discussion among people who have some experience. I’m happy to facilitate.

@klambacher has agreed to participate. I’m sure @skyleryoung will be interested. I also know of at least a couple of groups of people interested in pursuing this path who might not be here, but who I’d invite to listen in and bring questions.

NB: I want to wait for @MikeThacker to get back before picking a date because he’s asked for this for a while.

Who else is interested? What should be on our agenda? What questions would you want such a conversation to consider?

I’m definitely in. When is Mike back?

I’m back now.

OK great. I assume that since we’re spanning Pacific coast to UK time, we’re looking at the 11a ET hour – sound right?

@klambacher @mikethacker @skyleryoung and anyone else who might be interested, let me know if you have any days in which 11a ET normally doesn’t work.

Next week I’ll schedule something after reaching out to the folks in the field who asked me for this convo.

OK I have folks from at least one state eager to join this conversation, so I’m ready to start scheduling.

I’m looking at 11a Eastern during any Monday, Tuesday, Thursday, or Friday during the last week of March or first week of April. I’ve created a poll here: Doodle

Please fill it out! And let me know if there’s anyone you think I should specifically invite. Thank you :slight_smile:

OK we have picked March 25th at 11am Eastern for a discussion about resource directory data federation.

We’ll invite some opening remarks from @klambacher and also @skyleryoung. Then we’ll have discussion.

Please help set the agenda in advance: what questions come to mind when you consider the challenges and opportunities for different organizations using different information systems to collaborate in management of shared resource directory data? Please send in the questions you would most like to discuss, and I will try to shape an agenda that is worthwhile to everyone :slight_smile:

Hi folks – thanks so much to @klambacher for sharing with us the story of her epic journey through resource data federation. I’ve got notes here (feel free to add or clarify).

I know that several attendees had questions first and foremost about politics and trust and organizational strategy issues, so it was great to hear Kate’s lessons learned about those things. I don’t think we got very far into the technical challenges as pertains to some of our questions about evolving the specification – so I assume we may want at least one more round of discussion on this.

I will first check to see whether the same day/time will work for most folks next week or the week after – so, April 1st or 8th at 11a Eastern. Please let me know if that won’t work for you.

And please also share your observations and questions for the next agenda!

Good Afternoon, I am interested in participating in the next conversation(s) if possible please.
To your question about dates, I am available on the dates you mentioned as options (April 1st or 8th at 11a Eastern).
Thanks
Erika (No Wrong Door Virginia)

Hi folks – we’re scheduled to resume this conversation at 11a Eastern on Tuesday next week.

Now that we’ve heard a real-world story of a federation experience from Kate, I’d like this next conversation to open up a bit to explore the technical and tactical challenges.

I know we have technical questions about structuring identifiers among distributed sources, and provenance metadata.
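To make that question a bit more concrete, here is a rough Python sketch of one possible shape: each record keeps the id from its own system, prefixed with a namespace for the directory that minted it, and each field carries a note about who asserted it and when. Everything here (the class names, the `source:local_id` convention, the `county-211` source, the date) is a hypothetical illustration, not something defined by HSDS or agreed in this thread.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Provenance:
    source: str        # who asserted this value (hypothetical source name)
    verified_on: date  # when that source last checked it


@dataclass
class FederatedRecord:
    source: str    # namespace: the directory that minted the local id
    local_id: str  # id exactly as it appears in that directory's own system
    fields: dict = field(default_factory=dict)      # field name -> value
    provenance: dict = field(default_factory=dict)  # field name -> Provenance

    @property
    def global_id(self) -> str:
        # Prefixing the local id with its source keeps ids unique
        # across the federation without renumbering anyone's records.
        return f"{self.source}:{self.local_id}"

    def set_field(self, name, value, source, verified_on):
        # Store the value and remember who asserted it and when.
        self.fields[name] = value
        self.provenance[name] = Provenance(source, verified_on)


# Hypothetical example data
rec = FederatedRecord(source="county-211", local_id="4821")
rec.set_field("phone", "555-0100", source="county-211", verified_on=date(2025, 3, 1))
print(rec.global_id, rec.provenance["phone"])
```

Tracking provenance per field rather than per record is only one option; it costs more metadata but lets different sources vouch for different parts of the same record.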

Also there’s the question of how to establish “official” or “core” or “golden” records that are reliably verified by a designated steward in a way that still enables others to add their own custom data.
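One way to picture this, as a sketch only: the steward's verified record holds a set of "core" fields, and each partner keeps an overlay of extra fields that is merged in at read time without being able to overwrite the core. The function name, the choice of core fields, and the sample data below are all assumptions for illustration.

```python
def merged_view(golden: dict, overlay: dict, core_fields: set) -> dict:
    """Return a partner's view: steward-verified core fields plus the partner's own extras."""
    view = dict(golden)  # start from the steward's verified record
    for name, value in overlay.items():
        if name in core_fields:
            continue  # core fields are only changed by the designated steward
        view[name] = value
    return view


# Hypothetical example data
CORE_FIELDS = {"name", "phone"}
golden = {"name": "Eastside Food Pantry", "phone": "555-0100"}
partner_overlay = {"phone": "555-9999", "intake_notes": "Walk-ins welcome"}

view = merged_view(golden, partner_overlay, CORE_FIELDS)
print(view)
```

In this sketch the partner's conflicting phone number is silently ignored. A real federation would more likely route that conflict back to the steward as a suggested correction, which is a governance decision as much as a technical one.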

We also have questions about structuring governance, and incentives – i.e. how organizations can be equitably compensated for distributed contributions.

What are your questions? Please let me know so we can do a bit of work in advance to shape the agenda :slight_smile:

@bloom it would be good to hear:

  • what motivates publishers to keep up-to-date with the latest version of HSDS/HSDA?
  • what additional properties do people add to help their federation? (e.g. Kate mentioned tags to decide what appeared in what outputs)

Hi folks – Thanks to those who joined for the time yesterday.

Notes are here: Standing Technical Committee Meeting Notes – Open Referral - Google Docs

I went through to bold the parts that I thought were most significant. I would welcome more comments and key takeaways from you all.

I know that this question pops out at me, and seems relevant to the feedback loop design question, and in turn the metadata specifications:
can/should there be nested (vertical) levels of responsibility? So vertical rather than horizontal redundancy – i.e. local/topical responsibility for record stewardship, overseen/bottom-lined by higher-level umbrella stewardship. (Is vertical vs horizontal the right frame?)

Should we pick up there? Are there other questions you want to discuss next?

I’m going to send out a calendar item to reconvene and continue the discussion for two weeks from now. Let me know if you’d like to keep this going and if that would work for you.

OK notes from this conversation below and here, please feel free to help clarify / add / organize. This is a bit circular but a good discussion. We have a couple of specific questions we could pick up next. What do you want to discuss next?

Incentives –

“Local stewards are more likely to have relationships and are able to get people on the phone at the organization. More experienced data managers, however, understand the needs for the funders and how to craft a highly specialized and high quality record.”

Kate: “reconciling these two strengths is both necessary and possible.” experienced data managers can pass down their knowledge / build up data quality, while locals can build and maintain relationships.

“Duplicate record management is not always bad, in fact some overlap in management of a singular resource drives more rich information and more frequent updates. Reconciling those records to match (suggest things are the same) and then map (human editing and confirmation of the matching)…” Creating a golden record set that distills this (?)
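The match-then-map idea in that quote could be sketched roughly as below. This is an assumption-laden illustration, not anyone's actual process: it matches on name similarity only, using Python's standard `difflib`, and the function names, the 0.85 threshold, and the sample records are all made up.

```python
from difflib import SequenceMatcher


def suggest_matches(records_a, records_b, threshold=0.85):
    """Match step: suggest pairs whose names look alike. Nothing is merged here."""
    suggestions = []
    for a in records_a:
        for b in records_b:
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                suggestions.append((a["id"], b["id"], round(score, 2)))
    return suggestions


def confirm_mapping(suggestions, reviewer_decisions):
    """Map step: keep only the pairs a human reviewer has explicitly confirmed."""
    return [(a, b) for a, b, _ in suggestions if reviewer_decisions.get((a, b)) is True]


# Hypothetical example data from two overlapping directories
local = [{"id": "local:1", "name": "Eastside Food Pantry"}]
regional = [
    {"id": "state:9", "name": "East Side Food Pantry"},
    {"id": "state:10", "name": "Westview Legal Aid"},
]

suggestions = suggest_matches(local, regional)
confirmed = confirm_mapping(suggestions, {("local:1", "state:9"): True})
print(suggestions, confirmed)
```

Keeping the two steps separate means the automated part only ever proposes, and a person with local knowledge decides, which fits the point about duplicate management being a source of richer information rather than something to eliminate automatically.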

Who should get paid for what?

  • Locals / domain experts/stewards for managing records on a per-record basis.
  • Regional/state-level for bottom-lining comprehensive data, quality assurance, etc.
  • Producing “opinionated” data sets for specific
  • Participation in cooperative processes i.e. governance activities
  • CBOs should get paid for providing services! This is important; it may or may not be relevant.

First two bullets can be gamed / played unequally. Organizations (local OR high-level/utility) might be incentivized to maximize their benefits in ways that aren’t in the best interests of users or the network as a whole. Example of incentivizing the wrong things: paying more for complex records leads to needless complexity.

^ But these challenges can be addressed through governance. Contract-based payments with incredibly specific restrictions around which records are being compensated have proven effective in disincentivizing cheating in the system. Agreements on fair terms, and monitoring and conflict resolution processes to manage and evolve the agreements over time. These can also yield technical safeguards, though that has to follow agreements. “Tight standards and inclusion policies, especially coupled with auditing, make it so that each steward can keep their own records, that’s their business, and whether they get paid or not is on the basis of standards.”

Get clear on what we are trying to incentivize?

  • Reliability.
  • Responsiveness.
  • Relationships.

Are there different thresholds? Some kinds of data are objective and can be collected with relatively little expertise; other kinds are specialized and need expertise.

Skyler: “Paying a per record fee when we are at a very early stage can be tricky. Knowing the actual cost of production of a record and having that be itemized is critical.”

Kate: “We need to find a way to recognize the value of checking a record even when significant information does not churn. We need to investigate measuring how an entire record is checked.”

Sasha: Time is tricky. A lot of the work will not be reflected in the metadata captured in digital tools. “Sometimes you need to chase an organization for 2 months, occasionally an organization will be really on top of it and hand the details to you proactively.” “Phone conversations are currently offline and aren’t always able to be traditionally programmatically tracked. Beyond that, if you get sucked into a 50 minute conversation about a random side tangent with one organization” (David side thought: it helps build rapport and may help them be more helpful in the future so this is not a bad thing)

Kate: re expecting people to meet standards to participate, there is a level of expertise that involves personnel investment and specialization that can be difficult where the role is part-time or volunteer or not a primary responsibility.

Chris: “Being able to find the money in the system to support even just one FTE at a local level can really be differential for the quality of the data in that locality. Being able to earmark that funding and then allowing the local organization to get creative with how they spend that to drive results, can be a useful blend”

Can there be thresholds for compensation?

  • Thresholds for simple vs complex
  • Thresholds for minimal compliance with standards vs full compliance

We do want to track costs. But is it worth tracking all the metadata in order to compensate on a granular level?

Skyler:

  1. Incentives are connected to Responsibilities, and
  2. We don’t assume that data driven metrics are the exclusive method for establishing compensation

Also, I’m personally thinking in terms of ratios of payment, not necessarily cost of record maintenance. I’m inclined to decouple those concepts, at least for now.

Kate –

Assuming we get funding for data maintenance:

Funders, especially government, don’t want to pay a bunch of people. They want to fund one. So it naturally falls to a model where they fund a top-level organization, and the top-level picks the next level, and those pick their local partners. And funding flows differently across levels.

What are non-monetary incentives? Often at the lowest levels, the exchange might need to be other than money: training, software, access to data + analytics, support resources… what else?

Need roles for auditing / review / standards-setting. This makes it hard to pass top-level funding on to the lowest level, because those roles eat up a lot of it.

Who should pay for what?

There might be different kinds of payers in the market –

Funders that just want to pay for production of data (government?)

Funders that want to pay for specialized products – curation

We want proper staff… and we also want those local relationships that may be more informal.

So – Utility concept of bottom-line / overarching responsibility to aggregate and publish

Steward concept for designated responsibility that adheres to standards.

Community partner concept for leveraging local relationships.

A utility plays stewardship roles as well as administration, auditing, and curation.

A community partner may play a stewardship role, if they have the capacity / interest in assuming a higher level of responsibilities.

Should the concept of a ‘curator’ be distinct from (even if co-assigned with) ‘steward’? What about ‘auditor’?

Can / should we try to unbundle stewardship responsibilities for complex organizations, i.e. to distribute responsibilities across the service / location level?

David: “One thing that I think will be significant to investigate for capturing the value of the administration, education, and licensing is the travel agent and travel agency business model, and how those organizations interact with the major airlines, hotels, cruise lines, etc. It’s a random spot of knowledge I have a decent amount of detail on, and the structure is really relevant to what I have learned”