At the recent DataEngBytes conference, Joe Reis presented his Mixed Model Arts concept. It looks something like this:
- Unstructured data (files, documents, audio, video)
- Semi-structured data (Excel, CSV)
- Structured data (databases)
What follows is our take on this concept, and how it relates to the Occam Data Framework.

The underlying control system, regardless of the output (dashboard or AI), has to be structured data.
‘Structured data provides the irrefutable record’.
David Ding (futurist, philosopher, and former Callaghan Innovator)
Bias-free data:
David Ding also stresses the importance of ‘bias-free’ data. That is, data that is free from prejudice, stereotypes or systemic discrimination. In the world of business where we operate, we interpret this as data that doesn’t lean towards a specific purpose, such as accounting or CRM. A business bias renders the data incapable of reuse for other purposes. The bias-free irrefutable record is the foundation on which to build your AI. This is because the record changes slowly, while your AI requirements will change constantly.
The goal of structured data is to create a bias-free irrefutable record of the world.
Machine Learning should convert semi-structured and unstructured data into structured data:
During this process, Machine Learning should be used to apply the constraints that are missing from semi-structured and unstructured data, so that this data can be converted to structured data.
Constraints are the magical gatekeepers of data quality. They are why databases exist, and why we don’t use Excel for all things data. Constraints are the database telling the user that the data is compliant.
Semi-structured data isn’t constrained like structured data:
Therefore machine learning can be used to enforce the constraints that lead to the bias-free irrefutable record. A constraint on data is a rule that ensures the data accurately records reality. A simple and obvious constraint is that a date of birth cannot be in the future. This is a constraint that unstructured data will never enforce on its own.
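To make the date-of-birth rule concrete, here is a minimal Python sketch of such a constraint. The function name `validate_date_of_birth` and the optional `today` parameter are our own illustrative assumptions, not part of the Occam Data Framework or any particular database.

```python
from datetime import date
from typing import Optional


def validate_date_of_birth(dob: date, today: Optional[date] = None) -> date:
    """Return dob unchanged if it satisfies the constraint, else raise ValueError.

    Hypothetical helper for illustration: `today` can be passed in so the
    rule is easy to test; it defaults to the current date.
    """
    reference = today if today is not None else date.today()
    if dob > reference:
        # The constraint: a date of birth cannot be in the future.
        raise ValueError(f"date of birth {dob.isoformat()} is in the future")
    return dob
```

A record that passes the check is returned as-is, while a future date raises an error instead of silently entering the structured record. In a real database the same rule would normally live in the schema rather than in application code.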
Sensor data is another example. It arrives at high speed into a fast-write database, such as a NoSQL DB, where there are no constraints. Machine learning can be used to test patterns for data accuracy, while converting the semi-structured sensor data into an irrefutable structured record.
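As a rough sketch of the sensor example, the Python below turns loosely typed readings into typed records and flags values that break the pattern of the batch. Note the substitution: instead of a trained machine learning model it uses a simple robust statistic (the median absolute deviation), purely as a stand-in. The field names `sensor_id`, `ts` and `value`, the `SensorRecord` type and the threshold of 3.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class SensorRecord:
    """Structured, typed form of one reading (hypothetical schema)."""
    sensor_id: str
    timestamp: int       # assumed to be seconds since the epoch
    value: float
    is_plausible: bool   # False when the value breaks the batch pattern


def structure_readings(raw: Iterable[Mapping[str, object]],
                       threshold: float = 3.5) -> List[SensorRecord]:
    """Coerce raw readings to typed records and flag outliers.

    Stand-in for an ML pattern test: a modified z-score based on the
    median absolute deviation (MAD) of the batch.
    """
    # Type coercion is itself a constraint: bad values raise here.
    parsed = [(str(r["sensor_id"]), int(r["ts"]), float(r["value"])) for r in raw]
    if not parsed:
        return []

    values = [v for _, _, v in parsed]
    med = median(values)
    mad = median([abs(v - med) for v in values])

    records = []
    for sensor_id, ts, value in parsed:
        if mad == 0:
            # All typical values are identical; only exact matches are plausible.
            plausible = value == med
        else:
            plausible = 0.6745 * abs(value - med) / mad <= threshold
        records.append(SensorRecord(sensor_id, ts, value, plausible))
    return records
```

Feeding this a batch of temperature readings clustered around 20 plus a single reading of 95 would mark the 95 as implausible while keeping every reading in the structured output, so nothing is silently discarded. A production system would replace the MAD test with whatever model suits the sensor.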
That top layer of unstructured data:
Sending a foundation of bias-free irrefutable data to your AI, along with your unstructured and semi-structured files (Excel, PDFs and Word documents), will produce far higher quality results than sending a bunch of unverified data.
The bias-free Occam Data Framework:
Structured data and the bias-free irrefutable record must remain the target baseline of any analytical or AI-based plan the enterprise may have.
The Occam Data Framework provides the bias-free irrefutable record. This record forms the foundation of any successful data requirement, be it for operational, analytical, or AI needs.
Details on the DataEngBytes community
https://dataengconf.com.au/
https://www.meetup.com/auckland-data-engineering-meetup
Note from the author:
This document has been 100% written by me, with no help from AI. Not that I’m against AI, it’s just that I like to do my own thinking. FYI, I do love AI-generated poems.


