How Change Data Capture Powers GenAI Chatbots: A Real-World Case Study — Part 1

Introduction: Why Data Drives GenAI Chatbots

Picture this scene: we’re in a high-value, deeply technical meeting with Microsoft’s elite “black-belt” engineers, hashing out the design of the next multi-agent AI solution. The digital room is buzzing with ideas and questions about agent workflows, system architecture, and the promise of what’s to come. Then, one of the black belts drops a statement that stops me in my tracks. “The agents and their workflows? That’s the easy part — maybe 10% of the effort,” he says (I’m paraphrasing here). “The real challenge — the other 90% — is all about the data.”

It was one of those moments that hit like a lightning bolt of clarity. In the whirlwind of Generative AI’s rapid evolution — where every day feels like a race to master the latest models and tools — here was a reminder of a timeless truth. It’s the same principle drilled into us by every AI course, webinar, and whitepaper: data is the foundation, the make-or-break factor, the unsung hero (or villain) of any AI project. And in that moment, amid the complexity of a futuristic solution, I found myself nodding in agreement, grounded by a challenge I’ve faced time and again.

In a previous article (https://orfin.ca/llm-routing-new-tech-same-problems/), I argued that even the shiniest new AI innovations often boil down to solving familiar problems with a modern twist. 

This post dives into a real-world example from a recent client project, where I tackled a data hurdle head-on by building a Change Data Capture (CDC) mechanism to sync ServiceNow articles with a GenAI chatbot’s search system.

While the industry buzzes about the next big model or tool, practitioners like me — and the companies we serve — are wrestling with the bedrock of AI success: getting the data right.

Let’s explore how we turned that 90% challenge into a win.

Context: A Global Airline’s IT Data Challenge

Imagine you’re running IT for a global airline giant where hundreds of employees face daily tech hiccups. Some can’t access their email. Others get locked out of critical services. It’s the kind of chaos that’s all too familiar in large organizations.

But here’s where it gets interesting: one question alone — “How do I change my password?” — accounts for nearly 50% of all IT service desk calls. That’s over 4,200 calls a month for this airline. Now, factor in the cost: each call requires a human agent, averaging $32.39 per incident. Do the math, and that single question balloons into a staggering $138,000 monthly hit — or about $1.6 million annually.
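As a quick sanity check, the arithmetic can be reproduced in a few lines of Python. The variable names are mine and the only inputs are the figures quoted above; the implied call volume is derived from them, not a number reported by the client.

```python
# Figures quoted in the article.
COST_PER_CALL = 32.39       # average cost of one service desk incident, USD
MONTHLY_COST = 138_000      # approximate monthly cost of password calls, USD
MIN_MONTHLY_CALLS = 4_200   # "over 4,200 calls a month"

# Lower bound on the monthly cost if exactly 4,200 calls were handled.
lower_bound_cost = round(MIN_MONTHLY_CALLS * COST_PER_CALL, 2)

# Call volume implied by the $138,000 figure (derived, not reported).
implied_calls = round(MONTHLY_COST / COST_PER_CALL)

# Annualized cost of that single question.
annual_cost = MONTHLY_COST * 12

print(f"Lower bound: ${lower_bound_cost:,.2f} per month")
print(f"Implied volume: about {implied_calls:,} calls per month")
print(f"Annual cost: ${annual_cost:,}")
```

In other words, "over 4,200" works out to roughly 4,260 calls, and $138,000 a month annualizes to a little over $1.6 million.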

That’s right: one routine query is quietly draining millions that could be fueling innovation elsewhere.

This was precisely the business challenge — a classic case of an IT operation stretched thin by preventable volume. The opportunity? Flip the script with an AI-powered fix.

Enter the IT Chatbot: an intuitive solution designed to let every employee ask any IT-related question and get instant answers. The goal was simple but ambitious — resolve issues like password resets upfront, slashing the need for human intervention. Only if the chatbot couldn’t crack the problem would a call escalate to the service desk.

What we’re talking about here is a game-changing solution, one that tackles a multimillion-dollar pain point with data and AI. Now, let’s dive into how we made it happen.

ServiceNow Data Sync: The Core Challenge

Building the IT Chatbot’s agent was, as expected, straightforward: capture the user’s query, embed it, summarize chat history if needed, retrieve context through semantic search, and generate an LLM response. Textbook stuff. But true to our Microsoft black belt’s insight, the real effort — 90% of the project’s complexity — lay in managing the data. And that’s where the journey got interesting.
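To make that flow concrete, here is a minimal sketch of the steps just listed. Everything in it is an assumption for illustration: the article IDs and text are made up, the "embedding" is a bag-of-words counter standing in for a real embedding model, the summarizer simply keeps recent turns, and the final LLM call is left out. It is not the project's actual agent code.

```python
import math
from collections import Counter

# Toy knowledge base standing in for indexed ServiceNow articles (hypothetical).
ARTICLES = {
    "KB001": "How to change or reset your password from the self-service portal",
    "KB002": "Steps to request access to a shared email mailbox",
}

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector, not a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def summarize_history(history: list[str], max_turns: int = 3) -> str:
    """Stand-in summarizer: keeps only the most recent turns."""
    return " | ".join(history[-max_turns:])

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank articles by similarity to the query and return the top article IDs."""
    q = embed(query)
    ranked = sorted(ARTICLES, key=lambda k: cosine(q, embed(ARTICLES[k])), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, history: list[str]) -> str:
    """Assemble what an LLM would receive; the LLM call itself is omitted."""
    context = "\n".join(ARTICLES[k] for k in retrieve(query))
    return f"History: {summarize_history(history)}\nContext: {context}\nQuestion: {query}"

print(build_prompt("how do I change my password", ["hi", "I am locked out"]))
```

Notice that every step depends on `ARTICLES` being correct and current, which is exactly where the real effort went.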

The airline’s IT knowledge resided in ServiceNow, a robust platform requiring seamless data sync for the chatbot. However, the way that knowledge was organized varied across teams. Some groups maintained well-structured HTML articles, ready for immediate use. Others relied on attachments as their primary knowledge source — spanning formats like PDFs, GIFs, and even email screenshots.

This diversity presented a unique challenge, and our solution hinged on three critical principles: accuracy, traceability, and consistency.

Accuracy

ServiceNow is a dynamic environment. Articles are updated daily — perhaps a process evolved, a new guide was added, or an outdated piece was retired. Our chatbot had to reflect these changes; employees needed the most current answers, no exceptions. This wasn’t the basic “chat with your documents” scenario from tutorials, where static files suffice. We needed a system to continuously sync and refresh the knowledge base, ensuring every query tapped into the latest data.

Traceability

Data ingestion isn’t foolproof — errors happen. When they did, we couldn’t afford blind spots. We required a clear method to trace issues back to their source: which article failed, and why? This traceability was essential for diagnosing problems — whether a formatting glitch or an unreadable attachment — and implementing lasting fixes to strengthen the system moving forward.
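One way to picture this is a per-article ingestion log that records which article failed, at which stage, and why. The sketch below is illustrative only: the class names, the `parse_attachment` helper, and the rule that only PDFs are readable are all hypothetical, not the project's real pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class IngestionResult:
    article_id: str
    status: str        # "ok" or "failed"
    stage: str = ""    # pipeline stage where the failure occurred
    error: str = ""    # human-readable reason

@dataclass
class IngestionLog:
    results: list[IngestionResult] = field(default_factory=list)

    def failures(self) -> list[IngestionResult]:
        """Return only the failed ingestions, in the order they were recorded."""
        return [r for r in self.results if r.status == "failed"]

def parse_attachment(name: str) -> str:
    """Hypothetical parser: in this sketch only PDFs are readable."""
    if not name.lower().endswith(".pdf"):
        raise ValueError(f"unsupported attachment format: {name}")
    return f"text extracted from {name}"

def ingest(article_id: str, attachment: str, log: IngestionLog) -> None:
    """Try to ingest one attachment and record the outcome against its article."""
    try:
        parse_attachment(attachment)
        log.results.append(IngestionResult(article_id, "ok"))
    except ValueError as exc:
        log.results.append(
            IngestionResult(article_id, "failed", "parse_attachment", str(exc))
        )

log = IngestionLog()
ingest("KB001", "reset-guide.pdf", log)
ingest("KB002", "screenshot.gif", log)
for failure in log.failures():
    print(failure.article_id, failure.stage, failure.error)
```

The point of the design is that a failure never disappears into a generic stack trace: it stays attached to the article ID that caused it.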

Consistency

Lastly, consistency was non-negotiable. We couldn’t settle for a pipeline that excelled in development but faltered in production. The airline expected a reproducible process, with performance in the live environment matching what we’d achieved in testing. Our goal was a seamless, dependable workflow that delivered the same high standard every time.

With the context and the quirks of the ServiceNow data source in view, we have the foundation of the challenge we tackled.

Change Data Capture (CDC): Solving AI Data Sync

For this article, I’ll zoom in on the data-syncing challenge — specifically, how we kept ServiceNow articles flowing into the IT Chatbot’s knowledge base while upholding accuracy, traceability, and consistency. The agentic core of the solution (query handling, embeddings, and responses) will take a backseat here as we tackle the heart of the data problem.

What Is CDC, and Why Does It Matter?

Let’s address the elephant in the room: what exactly is Change Data Capture (CDC)? At its core, CDC is a technique for identifying and tracking changes in a database (insertions, updates, and deletions) without burdening the source system’s performance. It captures these changes, logs them, and makes them available for downstream processes. For the IT Chatbot, CDC meant keeping tabs on every tweak to ServiceNow articles and attachments.
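To ground the definition, here is a minimal sketch of one simple way to detect inserts, updates, and deletions: compare content fingerprints stored at the last sync with the articles as they exist now. This snapshot-comparison approach is my assumption for illustration (log-based CDC, which reads a database's change log, is another common variant), and the function names and sample articles are hypothetical.

```python
import hashlib

def fingerprint(body: str) -> str:
    """Content hash used to tell whether an article body changed."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def detect_changes(
    previous: dict[str, str], current: dict[str, str]
) -> list[tuple[str, str]]:
    """Compare fingerprints from the last sync with the current articles.

    previous maps article ID -> fingerprint recorded at the last sync.
    current maps article ID -> article body as it exists now.
    Returns (operation, article_id) pairs sorted by article ID.
    """
    changes = []
    for article_id, body in current.items():
        if article_id not in previous:
            changes.append(("insert", article_id))
        elif previous[article_id] != fingerprint(body):
            changes.append(("update", article_id))
    for article_id in previous:
        if article_id not in current:
            changes.append(("delete", article_id))
    return sorted(changes, key=lambda change: change[1])

previous = {
    "KB001": fingerprint("Reset via portal"),
    "KB002": fingerprint("Old VPN guide"),
}
current = {
    "KB001": "Reset via portal v2",   # body edited since the last sync
    "KB003": "New mailbox guide",     # new article; KB002 was retired
}
print(detect_changes(previous, current))
```

An unchanged article produces no event at all, which is what keeps the downstream work proportional to what actually changed.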

Why Build a CDC for This Project?

So, why did we lean on CDC to bridge ServiceNow and Azure AI Search? The answer ties directly to the project’s demands:

  1. ServiceNow’s Capabilities: ServiceNow provides robust data storage but lacks a native, automated way to reflect article and attachment changes elsewhere. CDC steps in as a tailored solution to capture and relay those updates seamlessly.
  2. Real-Time Sync: With CDC, changes in ServiceNow — say, a revised process or a new article — reflect in Azure AI Search almost instantly. This keeps the chatbot’s responses fresh and authoritative.
  3. Efficiency: Full data reloads are resource-heavy and slow. CDC focuses on just the changes (inserts, updates, deletions), lightening the load on both ServiceNow and Azure AI Search. It also cuts cost, because the embedding model is called only for content that actually changed.
  4. Consistency: By applying updates as they happen, CDC ensures the chatbot’s knowledge base aligns with ServiceNow, delivering reliable answers every time.
  5. Scalability: As the airline’s data grows, CDC scales effortlessly, handling high volumes of changes without compromising performance — a must for an enterprise-grade solution.
  6. Low Latency: CDC minimizes the gap between a change in ServiceNow and its availability in AI Search, critical for a chatbot where timing matters.
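The efficiency point in particular is easy to show in code. The sketch below applies a list of change events to a plain in-memory dictionary standing in for the search index and counts embedding calls along the way. The event format, `apply_changes`, and `fake_embed` are all hypothetical stand-ins; a real pipeline would call an embedding model and the Azure AI Search API instead.

```python
from typing import Callable

def apply_changes(
    index: dict[str, list[float]],
    changes: list[tuple[str, str]],
    articles: dict[str, str],
    embed: Callable[[str], list[float]],
) -> int:
    """Apply (operation, article_id) events to an in-memory index.

    Returns the number of embedding calls made, so the cost is visible.
    """
    embed_calls = 0
    for operation, article_id in changes:
        if operation == "delete":
            index.pop(article_id, None)
        elif operation in ("insert", "update"):
            index[article_id] = embed(articles[article_id])
            embed_calls += 1
        else:
            raise ValueError(f"unknown operation: {operation}")
    return embed_calls

def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model call."""
    return [float(len(text))]

articles = {
    "KB001": "Reset via portal v2",
    "KB003": "New mailbox guide",
    "KB004": "Unchanged article",
}
index = {"KB001": [0.0], "KB002": [1.0], "KB004": [9.0]}
changes = [("update", "KB001"), ("delete", "KB002"), ("insert", "KB003")]

calls = apply_changes(index, changes, articles, fake_embed)
print(calls, sorted(index))
```

KB004 never appears in the change list, so no embedding call is spent on it and its existing vector stays in place. With thousands of mostly unchanged articles, that difference is where the savings come from.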

By weaving CDC into the architecture, we created a pipeline that’s robust, efficient, and responsive, ensuring the IT Chatbot could tap into a knowledge base that’s always current and trustworthy.

So, CDC emerged as our key solution — a precise method to manage the data challenge and ensure the IT Chatbot’s reliability. But how did we bring it to life? 

In Part 2, I’ll reveal the implementation details: how we constructed the pipeline, its impact for the airline, and the critical lessons that could guide your next AI project through complex data demands. Stay tuned — you’ll want to see how we transformed this 90% hurdle into a complete success.


