The Gap Between AI Ethics Frameworks and What's Actually Measurable
- Samson Lingampalli
- Feb 10
- 5 min read
- By Rajeev Chakraborty

Your organisation probably has an AI ethics framework by now. Maybe you adopted one from government guidance. Maybe a working group wrote one. Maybe you signed up for industry principles. They all say roughly the same thing: be fair, be transparent, be accountable.
The problem isn't the principles. The principles are universal, well-intentioned, and almost entirely useless when you're trying to govern an AI system in production.
You deploy an AI system. Something that screens applications, prioritises cases, makes recommendations, or automates decisions. And immediately you discover that 'be fair' doesn't tell you what to actually do on Tuesday morning, when the system is live and real people are affected by its decisions.
Everyone Agrees on What, Nobody Agrees on How
Pick any three of the major AI ethics frameworks and compare them. The EU AI Act. The OECD AI Principles. The UK AI Playbook. The NIST AI Risk Management Framework. They all say 'fairness matters'. Not one of them tells you which fairness metric to use.
That's not an oversight. It's because fairness means different things in different contexts. Demographic parity? Equalised odds? Predictive parity? Each optimises for something different, and you can't optimise for all of them simultaneously. Mathematics won't let you.
What we have collectively created are three major illusions.
Illusion 1: The Framework Says 'Be Fair', So It Is Fair?
Let's say your AI system screens job applications. Your framework says you must ensure fairness. Good. But what does that mean in practice?
Does it mean the system should approve the same percentage of applications from different groups? Or does it mean that among people it approves, they should be equally likely to succeed in the role? Or does it mean something else entirely?
These lead to different outcomes. You often can't achieve all of them at once. Mathematics doesn't allow it. So which one matters for your context? Your framework doesn't say.
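To make that concrete, here's a rough sketch in Python. The group names, counts and outcomes are invented purely to illustrate how two reasonable definitions of fairness can disagree about the same system; they aren't drawn from any real data.

```python
# Hypothetical screening data: for each group, total applications,
# approvals, and approved candidates who later succeeded in the role.
# All numbers are invented for illustration.
groups = {
    "group_a": {"applied": 1000, "approved": 300, "succeeded": 240},
    "group_b": {"applied": 1000, "approved": 200, "succeeded": 160},
}

for name, g in groups.items():
    approval_rate = g["approved"] / g["applied"]          # 'same percentage approved' view
    success_if_approved = g["succeeded"] / g["approved"]  # 'equally likely to succeed' view
    print(f"{name}: approved {approval_rate:.0%}, "
          f"succeed once approved {success_if_approved:.0%}")

# Prints:
#   group_a: approved 30%, succeed once approved 80%
#   group_b: approved 20%, succeed once approved 80%
#
# By the second measure the system looks fair (80% vs 80%).
# By the first it doesn't (30% vs 20%). Both are legitimate
# definitions of 'fair'; they simply disagree here.
```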
Now imagine explaining this to someone whose application was rejected. They ask: 'Was this fair?' You check your framework. It says 'we ensure fairness'. But you don't actually know if this specific decision was fair, because you haven't defined what fair means for this system, and you're not measuring it.
That's the gap. Your AI framework gives you a principle. It doesn't tell you what to measure or how to know if you're meeting it.
Illusion 2: The Framework Says 'Be Transparent', So It Is Transparent?
Transparency is even harder. Your framework says decisions must be transparent. So you tell people, 'an AI system was involved in this decision'. Job done?
Not really. Because transparent to whom? For what purpose?
The person affected by the decision needs to understand why it went the way it did. Your auditors need to verify that the system is operating correctly. Your risk team needs to spot when behaviour is changing. Parliament might want to understand how these systems work at scale.
Each of those groups needs different information. 'We have logs' doesn't help any of them. Logs tell you what happened. They don't tell you why, or whether it was appropriate, or whether this is part of a pattern you should be concerned about.
Real transparency means you can trace a decision backwards and explain it to whoever needs to understand it.
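One way to picture what that requires: a record kept for every decision, rich enough that each of those audiences can get an answer. Here's a rough sketch; the field names and values are my own assumptions for illustration, not a standard schema.

```python
# A rough sketch of a per-decision record that supports tracing a
# decision backwards. Field names are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class DecisionRecord:
    decision_id: str               # stable reference for audits and appeals
    timestamp: datetime
    model_version: str             # which model/config produced the decision
    inputs: dict                   # what the system actually saw
    output: str                    # what it decided or recommended
    reasons: List[str]             # top factors, in plain language
    human_reviewer: Optional[str]  # who, if anyone, signed off

# Hypothetical record for one rejected application:
record = DecisionRecord(
    decision_id="app-2026-00173",
    timestamp=datetime.now(timezone.utc),
    model_version="screening-model-v3.2",
    inputs={"years_experience": 2, "minimum_required": 3},
    output="rejected",
    reasons=["experience below the stated minimum"],
    human_reviewer=None,
)
```

With something like this in place, the person affected, the auditor and the risk team are all reading from the same evidence, just through different questions.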
Illusion 3: The Framework Says 'Be Accountable', So It Is Accountable?
Then there's accountability. Your framework says someone is accountable for AI decisions. Let's say it's the Chief Risk Officer. Good. But what does that actually mean when the system makes 10,000 decisions a day?
Can the CRO explain how any specific decision was made? Do they know if the system is treating different groups differently? Do they know when the system's behaviour has changed from what was intended?
More importantly, can they actually do something about it if needed? Is there a clear process for 'we need to pause this system'? Who has that authority? How quickly can it happen?
Being accountable on paper is easy. Being accountable in practice means you can see what's happening, you know when it's wrong, and you can act to fix it.
Why This Matters More for AI
You might be thinking: we run lots of systems, we can't monitor everything in real time, why is AI different?
AI is different because the impact is less visible and harder to catch.
If your payment system breaks, you know immediately. Transactions fail. People complain. It's obvious.
If your AI system is quietly screening out qualified candidates from certain backgrounds, you might not know for months. The system appears to be working. It's processing applications. It's making decisions. Nothing looks broken from a technical perspective.
But you're creating harm. At scale. In ways that compound over time. And because it's not obviously broken, nobody escalates it.
That's why frameworks alone aren't enough. You need the capability to see impacts that aren't obviously technical failures. Impacts on people. Impacts that show up as patterns across thousands of decisions.
What This Means in Practice
Closing this gap means building three capabilities:
First, translate principles into specific things you can actually measure. Not 'be fair' but 'we will measure whether this system approves applications at similar rates across different demographic groups, and we'll review this weekly'.
Second, measure it continuously. Not just once at launch. Because AI systems change. Data changes. Usage patterns change. What was fair last month might not be fair this month.
Third, give someone authority to act on what you're measuring. Clear thresholds that trigger escalation. Clear authority to intervene. Clear processes that work at the speed the system operates.
If your measurements show a problem, but nobody can do anything about it until the next governance committee meeting, you don't really have accountability.
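As a rough illustration of what that measure-and-act loop might look like, here's a minimal Python sketch. It assumes a weekly job that receives the week's decisions, each with a group label and an approved/rejected outcome; the function names, the 0.8 ratio threshold and the escalate() hook are my own assumptions for the example, not a prescribed standard.

```python
# A minimal sketch of the measure-and-act loop. The threshold and
# escalate() hook are illustrative assumptions, not a standard.

def weekly_fairness_check(decisions, threshold=0.8):
    """Compare approval rates across groups and escalate if the lowest
    rate falls below `threshold` times the highest (a 'four-fifths'
    style rule, used here only as an example trigger)."""
    rates = {}
    for group in {d["group"] for d in decisions}:
        group_decisions = [d for d in decisions if d["group"] == group]
        approvals = sum(1 for d in group_decisions if d["approved"])
        rates[group] = approvals / len(group_decisions)

    if rates and min(rates.values()) < threshold * max(rates.values()):
        escalate(rates)  # a named owner, with real authority to act
    return rates

def escalate(rates):
    # Placeholder: in practice this notifies the accountable owner and
    # can trigger a documented 'pause the system' process.
    print(f"Fairness threshold breached: {rates}")

# Example usage with made-up decisions:
decisions = [
    {"group": "group_a", "approved": True},
    {"group": "group_a", "approved": True},
    {"group": "group_b", "approved": True},
    {"group": "group_b", "approved": False},
]
weekly_fairness_check(decisions)  # 50% vs 100% -> escalates
```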
Where to Start
If you're running AI systems, or thinking about deploying them, here's what to ask:
Can we explain who this system affects and how? Not just what it does, but who experiences the outcomes and what those outcomes mean for them.
Do we know if it's treating different groups fairly? And I mean actually know - measured, monitored, verified - not assumed based on vendor promises or pre-deployment testing.
If something goes wrong, can we explain what happened? To the person affected. To regulators. To the press. With evidence, not just reassurances.
Who has authority to intervene if needed? Not theoretical authority in a document. Actual authority, with clear triggers for when they should act.
If you can't answer those questions confidently, your framework isn't working. Not because the principles are wrong, but because principles alone don't create capability.
The Bottom Line
The gap between frameworks and measurability isn't a criticism of frameworks. They serve a purpose. They create shared language. They signal organisational commitment.
But they're not sufficient. And treating them as sufficient is dangerous, because it creates the illusion of governance whilst the actual capability to govern is missing.
The organisations that understand this aren't asking 'do we have a framework?'
They're asking: 'Can we see the impact our AI systems are having? Can we prove we're being fair? Can we explain our decisions? Can we act if we need to?'
And when something goes wrong, saying 'we had a framework' won't be enough. Because the question won't be 'did you have principles'. The question will be 'could you see what was happening, and could you do something about it'.
The answer to that question depends on measurement. Not frameworks. Measurement.
Over the next 12 weeks, I'm explaining Responsible AI (RAI) monitoring in plain English - what it is, why it matters, and how it works in practice. All free. Follow our page.


