November 19, 2024

Why bias-free structured data must be the foundation on which to build AI systems.

At the recently held DataEngBytes conference, Joe Reis presented his Mixed Modal Arts concept; it looks something like this:

- Unstructured data (files, documents, audio, video)

- Semi-structured data (Excel, CSV)

- Structured data (databases)

What follows is our take on this concept, and how it relates to the Occam Data Framework.

The underlying control system, regardless of the output (dashboard or AI), has to be structured data.

‘Structured data provides the irrefutable record’.
David Ding (futurist, philosopher, and former Callaghan Innovator)

Bias-free data:

David Ding also stresses the importance of ‘bias-free’ data: data that is free from prejudice, stereotypes or systemic discrimination. In the business world where we operate, we interpret this as data that doesn’t lean towards a specific purpose, such as accounting or CRM. A business bias makes data incapable of reuse for other purposes. The bias-free irrefutable record is the foundation on which to build your AI, because the record changes slowly while your AI requirements will change constantly.

The goal of structured data is to create a bias-free irrefutable record of the world.

Machine Learning should convert semi-structured and unstructured data into structured data:

During this conversion, machine learning should apply the constraints that semi-structured and unstructured data lack, so that the result can be stored as structured data.

Constraints are the magical gatekeepers of data quality. They are why databases exist, and why we don’t use Excel for all things data. When a database accepts a record, its constraints are telling the user that the data is compliant.
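As a minimal sketch of a constraint acting as gatekeeper, the example below uses Python's built-in sqlite3 module. The table and column names (customer, email, credit_limit) are hypothetical and chosen only for illustration. The point is that the database itself refuses a non-compliant row, which a spreadsheet would accept without complaint.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical, constrained table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customer (
            id           INTEGER PRIMARY KEY,
            email        TEXT NOT NULL UNIQUE,
            credit_limit REAL NOT NULL CHECK (credit_limit >= 0)
        )
        """
    )
    return conn


def try_insert(conn, email, credit_limit):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (email, credit_limit) VALUES (?, ?)",
                (email, credit_limit),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for NOT NULL, UNIQUE and CHECK violations alike.
        return False
```

With this table, a row with a negative credit limit, a missing email or a duplicate email never reaches storage, so every row that is stored is known to be compliant.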

Semi-structured data isn’t constrained like structured data:

Machine learning can therefore be used to enforce the constraints that lead to the bias-free irrefutable record. A constraint on data is a rule that ensures the data accurately records reality. A simple and obvious example is that a date of birth cannot be in the future. Unstructured data will never carry this constraint on its own.
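The date-of-birth rule can be sketched in a few lines of Python. This is an illustration of the rule only, not of the machine learning step that would extract the value from a document. The function name parse_date_of_birth and the assumption that the extracted value is an ISO-formatted string (YYYY-MM-DD) are ours.

```python
from datetime import date


def parse_date_of_birth(raw, today=None):
    """Return a date if `raw` is a valid ISO date that is not in the future.

    Returns None when the value cannot be parsed or breaks the constraint,
    so the caller can route the record for review instead of storing it.
    """
    today = today or date.today()
    try:
        dob = date.fromisoformat(raw.strip())
    except (ValueError, AttributeError):
        # Not a string, or not a valid YYYY-MM-DD date.
        return None
    if dob > today:
        # The constraint: a date of birth cannot be in the future.
        return None
    return dob
```

Passing today explicitly keeps the check repeatable, which matters when the same source documents are reprocessed later.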

Sensor data is another example. It arrives at high speed into a fast-write database such as a NoSQL store, where there are no constraints. Machine learning can be used to test patterns for data accuracy while converting the semi-structured sensor data into an irrefutable structured record.
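A rough sketch of that conversion follows. It assumes the sensor feed is a series of JSON lines with hypothetical fields sensor_id, ts and value, and it uses a simple median-absolute-deviation rule as a stand-in for a trained model. A real pipeline would likely swap in a proper anomaly detector; the shape of the process is what matters here: parse, type, test the pattern, and only then accept the record.

```python
import json
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Reading:
    """A typed, structured record (field names are hypothetical)."""
    sensor_id: str
    timestamp: str
    value: float


def to_readings(lines):
    """Parse JSON lines into typed readings, skipping malformed ones."""
    readings = []
    for line in lines:
        try:
            doc = json.loads(line)
            readings.append(
                Reading(str(doc["sensor_id"]), str(doc["ts"]), float(doc["value"]))
            )
        except (ValueError, KeyError, TypeError):
            continue  # invalid JSON, missing field, or non-numeric value
    return readings


def split_outliers(readings, threshold=5.0):
    """Split readings into (accepted, rejected) using a median/MAD rule.

    This statistical check is only a placeholder for a learned model.
    """
    values = [r.value for r in readings]
    if not values:
        return [], []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    accepted, rejected = [], []
    for r in readings:
        if mad > 0 and abs(r.value - med) / mad > threshold:
            rejected.append(r)
        else:
            accepted.append(r)
    return accepted, rejected
```

Rejected readings are kept rather than discarded, so they can be reviewed before anything is written to the structured record.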

That top layer of unstructured data:

Sending a foundation of bias-free irrefutable data to your AI along with your Excel files, PDFs and Word documents will produce far higher quality results than sending a bunch of unverified data.

The bias-free Occam Data Framework:

Structured data and the bias-free irrefutable record must remain the target baseline of any analytical or AI-based plan the enterprise may have.

The Occam Data Framework provides the bias-free irrefutable record. This record forms the foundation of any successful data requirement, be it for operational, analytical, or AI needs.

Details on the DataEngBytes community
https://dataengconf.com.au/
https://www.meetup.com/auckland-data-engineering-meetup

Note from the author:
This document has been 100% written by me, with no help from AI. Not that I’m against AI; it’s just that I like to do my own thinking. FYI, I do love AI-generated poems.
