DKAN + AI: From Catalog to Conversation

Session Category: Emerging Technology
Audience: All Attendees

The promise of open data is straightforward: governments and institutions publish what they know and let anyone use it. The infrastructure behind that promise has gotten remarkably good: fast APIs, reliable catalogs, carefully maintained metadata. What hasn't kept pace is the path from those capabilities to a person actually getting an answer. Somewhere in a twelve-million-row CSV is the information a reporter, a researcher, or a neighbor was looking for. Between them and that information is real work: knowing which question to ask, recognizing what the data actually represents, and interpreting the results in context. That gap is where a lot of people stop.

I've built a set of Drupal modules that help close that gap. They sit on top of DKAN, the Drupal distribution for open data catalogs, and use MCP (Model Context Protocol) to connect DKAN to AI agents like Claude. The result feels less like "searching a catalog" and more like talking to someone who already knows the data.

This talk is a practical tour designed for all levels; you don't need to know DKAN or MCP walking in. We'll open with a short introduction to open data: who publishes it, why it matters, and the gap between "a dataset is published" and "a person got their answer." Next comes a quick tour of DKAN's metastore and datastore, followed by a look at why AI and open data turn out to be a natural pairing. Schema discovery, query generation, and chart selection are things LLMs do well, and they map directly onto the work end users are already trying to do.

Then we'll get into the demos and what's behind them:

  • A live demo of dkan_nl_query, a chat-style interface where users ask questions in plain English and get streaming answers, data tables, and inline Vega-Lite charts. I'll run live queries against a real dataset.
  • A live demo of dkan_mcp, a Drupal module that exposes more than fifty tools over DKAN's catalog, datastore, and internals. We'll connect Claude Code to the site and watch an agent explore it like a new team member.
  • A remote demo that points the same tools at a different DKAN instance over HTTP, showing that any DKAN site can become an AI-accessible data source with no extra work from its host.
  • Architecture notes for the developers in the room: how the agentic loop is structured, how tool calls route directly to Drupal services, and how we keep token usage reasonable.
  • A frank section on the hard parts: cost, latency, hallucinations on numeric answers, and the safety considerations of a write-enabled MCP server.

Much as developers are learning to work alongside AI agents, open data is in the middle of its own transition. The question isn't whether these tools replace the platforms we've built. It's whether we can use them to help the data we've already published reach the people it was published for.

If you build Drupal sites, work with structured data, or are curious about what AI on Drupal looks like in practice, this talk is for you.