ChatGPT, Claude, and their many cousins

09/07/2025

ChatGPT, Claude, and their many cousins are being adopted far more widely than you would guess even from the hype. Enthusiasts will argue over whether this or that variant is an appropriate or inappropriate use, but that does not change the reality of what is happening in businesses today.

I have been stunned by the wildly creative ways these tools are being applied, and equally amazed that oversight is being relegated to a to-do list. I believe this is because, when humans are replaced with or assisted by AI, the humans were not being monitored either. Investigating complaints is not oversight. It’s the difference between locking your doors and simply waiting to be robbed, then complaining to the police. To avoid regulatory, legal, and reputational risks, we should have been proactively monitoring these activities already, so why haven’t we?

The problem lies with humans. Human-in-the-loop oversight of any low-event-rate process is doomed to failure. Psychologists have known this for years. Humans get bored. Humans don’t find what they don’t expect to see. Humans get so accustomed to pushing the “Approve” button that they stop thinking about it. Humans are slow.

The irony of overseeing AI or humans is that humans cannot do it without help. Enter … of course … AI. Academic studies have shown that you cannot ask an AI to self-assess, that guardrails are necessary but insufficient, and that a second LLM can have the same biases as the frontline AI.

DFA has been studying this problem, literally, for a decade. Long before LLMs were available, I was simply scientifically curious about how one would do oversight of an LLM. I even wrote a book that never quite got published, but I chopped it into papers that you’ll find in journals like AI and Ethics. The chapter I never got around to publishing offered a solution that became DFA’s AI Monitor™, www.DeepFutureAnalytics.AI.

The solution is that you need a human to interpret the regulations, business rules, and ethical standards and translate them into a clear set of assertions. Our clients usually have 20 to 30 such assertions with which frontline communications (AI or human) must comply. The question to our second-line LLM is simply, “Does this communication comply with this assertion?”, and we ask for a green, yellow (uncertain), or red (doubtful) answer. The yellows and reds go to a dashboard for human review. The greens get archived for computing performance metrics used in agent comparison and audit review.
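To make the mechanics concrete, here is a minimal sketch in Python of the loop just described. The assertion texts, prompt wording, and the `ask_llm` client are placeholders of my own for illustration, not DFA’s actual prompts or implementation; any chat-completion API that returns plain text could stand in for the second-line LLM.

```python
from dataclasses import dataclass

# Illustrative assertions only; real deployments derive 20-30 of these
# from regulations, business rules, and ethical standards.
ASSERTIONS = [
    "The message does not promise or guarantee a specific financial outcome.",
    "The message does not request personal information through insecure channels.",
]

PROMPT = (
    "Does this communication comply with this assertion?\n"
    "Assertion: {assertion}\n"
    "Communication: {message}\n"
    "Answer with exactly one word: GREEN (compliant), YELLOW (uncertain), or RED (doubtful)."
)

@dataclass
class Finding:
    message_id: str
    assertion: str
    verdict: str  # "GREEN", "YELLOW", or "RED"

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to the second-line LLM."""
    raise NotImplementedError("Wire up your own LLM client here.")

def review(message_id: str, message: str) -> list[Finding]:
    """Check one frontline communication against every assertion."""
    findings = []
    for assertion in ASSERTIONS:
        verdict = ask_llm(PROMPT.format(assertion=assertion, message=message)).strip().upper()
        findings.append(Finding(message_id, assertion, verdict))
    return findings

def route(findings: list[Finding], dashboard: list, archive: list) -> None:
    """Yellows and reds go to the human-review dashboard; greens are archived
    for performance metrics and audit review."""
    for f in findings:
        (dashboard if f.verdict in ("YELLOW", "RED") else archive).append(f)
```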

Now, the human is not being asked to scan hundreds of messages for dozens of possible failures – like finding an apple tree in a forest. Instead, we tune the system so that 1 in 5 to 1 in 10 of the flagged messages is expected to be bad. A human can stay alert when they know they’re just looking for the grocery store aisle that sells fruit.
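For intuition, here is a back-of-the-envelope calculation of what that flag precision implies for reviewer workload. The base rate and detection rates below are made-up numbers for illustration, not measured figures from any deployment.

```python
# Assumed rates for the example only.
base_rate = 0.005           # fraction of messages that actually violate some assertion
true_positive_rate = 0.95   # chance a bad message is flagged yellow or red
false_positive_rate = 0.02  # chance a good message is flagged anyway

flagged = base_rate * true_positive_rate + (1 - base_rate) * false_positive_rate
precision = (base_rate * true_positive_rate) / flagged

print(f"Fraction of traffic flagged for review: {flagged:.2%}")          # ~2.5%
print(f"Share of flags that are real problems:  {precision:.1%}")        # ~19%, i.e. about 1 in 5
```

Under these assumptions, a human reviews only a few percent of traffic, and roughly one in five of those flags is a genuine problem – enough signal to stay alert.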

So, the punchline is that if you’d like an AI monitoring solution that is working and ready for deployment today, give us a call. Honestly, it’s pretty cool.

Joseph Breeden
Posted on LinkedIn