I can't say for sure whether CSV or JSON would perform better or worse than markdown when communicating knowledge to an LLM. In most cases before this I would just pass in plain text, or whatever I copied and pasted from a webpage. Markdown felt like a nice balance: it stays human readable while still mimicking, in a machine-readable way, the tabular format you find in schema documentation.
I do know that when embedding documents for vector inference, you have to chunk them into logical pieces. Markdown makes that easy because it has a defined way of marking headers with the # symbol. When I embed documents for inference, I found it easy to chunk them on those headers, and I figure engineers training models have similar pipelines using tools like LlamaParse.
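Just to show what I mean by header-based chunking, here's a rough sketch in Python. It's my own toy example, not any particular library's API, and the function name chunk_by_headers is made up for illustration:

```python
import re

def chunk_by_headers(markdown_text: str) -> list[str]:
    """Split a markdown document into chunks, starting a new chunk at each header."""
    chunks = []
    current: list[str] = []
    for line in markdown_text.splitlines():
        # A markdown header line starts with 1-6 '#' characters followed by a space
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = """# Users table
Stores one row per account.

## Columns
| name | type |
|------|------|
| id   | int  |

# Orders table
Links back to users via user_id.
"""

for chunk in chunk_by_headers(doc):
    print("---\n" + chunk)
```

Each chunk then gets embedded on its own, so a header and the text under it stay together as one logical piece.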
As with a lot of AI/ML stuff, it helps when you can pass in your question in a format similar to the one the model saw a lot of during training... I bet YAML does a good job because of the thousands of Stack Overflow posts, lol. But markdown may be more broadly usable and easier to keep organized on my computer.