The future of data is not only the unification of siloed and isolated data, but also the convergence of the flow of data

DataEngBytes is an independent data engineering conference featuring speakers unaffiliated with any particular platform or vendor. Its 2024 conference was recently held in Auckland, New Zealand.

The subjects covered, combined with expert speakers from around the world, ensure the event is an up-to-date cross-section of thought leadership in the data world.

This is a series of short posts relating the Occam Data Framework to what was discussed at the conference.

 

Data Eng Conference — The Good Bits, Part 1

 

The highlight of the conference was Joe Reis’ talk on the Mixed Model Arts of data. I was a guest on the Joe Reis podcast a few months ago, and we agree on many things, especially the importance of data modelling.

What stood out, on the future of data, was his assessment that apps (operational systems), analytical systems, and ML/AI need to converge. This post sets out my opinion on how that convergence needs to happen.

 

 

 

Before convergence, an explanation

 

 

Ordinary architecting

 

Currently, in the enterprise there are usually many operational systems that describe a business process.

 

    • The business process is described using a data model, code, and UI.

    • Then the data engineer re-describes the same business process using a completely fresh set of data models and code, and most likely documentation (that no one reads and that is immediately out of date).

    • Then the data scientist comes along and once again creates another new set of rules around the very same business process.

    • Ditto for the API integrator.

Looking at the diagram below, you should already feel uncomfortable: it just doesn’t feel right. Yet in many instances, the reaction is “look at my clever architecture”.

 

 

 

Solving for one isn’t solving for the other

 

In fact, it’s usually making things worse.

The problem is that software developers build operational systems to satisfy a requirements specification, without regard for how the data may be used in the analytical and ML/AI systems. Once the operational system has met that specification, the data becomes someone else’s problem.

It’s not just the overused moniker of siloed apps and data

 
It is well known that data in the enterprise is siloed and isolated.

But the very same data, and the programming around it, are also siloed by code and toolsets along their own information life cycle: from operational data, to analytical data, to dashboards, AI, and the API.

 

 

It’s really the flow of data

 

At Occam, we see the future of data as the convergence not just of siloed and isolated data, but also of the flow of data.

It involves writing one set of rules that describes the operations of the business process, and using those rules to create multiple outputs, all kept in sync because they come from the same rules. The obvious starting point for making this happen is metadata or a semantic layer. This is exactly what the Occam Data Framework does.
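
To make the "one set of rules, many outputs" idea concrete, here is a minimal Python sketch. It is an illustration only, not the Occam Data Framework's implementation: the Entity and Field classes, the type mappings, the dw_ table prefix, the loaded_at column, and the invoice example are all assumptions made up for this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str            # logical type: "int", "text" or "decimal"
    required: bool = True


@dataclass(frozen=True)
class Entity:
    name: str
    fields: tuple[Field, ...]


# Hypothetical mappings from logical types to each target's type names.
SQL_TYPES = {"int": "INTEGER", "text": "TEXT", "decimal": "NUMERIC"}
JSON_TYPES = {"int": "integer", "text": "string", "decimal": "number"}


def operational_ddl(entity: Entity) -> str:
    """Render a CREATE TABLE statement for the operational database."""
    columns = [
        f"{f.name} {SQL_TYPES[f.type]}{' NOT NULL' if f.required else ''}"
        for f in entity.fields
    ]
    return f"CREATE TABLE {entity.name} ({', '.join(columns)});"


def analytical_ddl(entity: Entity) -> str:
    """Render a warehouse table: same columns plus a load timestamp."""
    columns = [f"{f.name} {SQL_TYPES[f.type]}" for f in entity.fields]
    columns.append("loaded_at TIMESTAMP")
    return f"CREATE TABLE dw_{entity.name} ({', '.join(columns)});"


def api_schema(entity: Entity) -> dict:
    """Render a JSON-Schema-style description of the API payload."""
    return {
        "title": entity.name,
        "type": "object",
        "properties": {f.name: {"type": JSON_TYPES[f.type]} for f in entity.fields},
        "required": [f.name for f in entity.fields if f.required],
    }


# The single description of the business object (a made-up example).
invoice = Entity(
    name="invoice",
    fields=(
        Field("invoice_id", "int"),
        Field("customer_name", "text"),
        Field("amount", "decimal"),
        Field("notes", "text", required=False),
    ),
)

print(operational_ddl(invoice))
print(analytical_ddl(invoice))
print(api_schema(invoice))
```

The point of the sketch is that adding or renaming a field in the one invoice definition changes the operational table, the analytical table, and the API schema together, so they cannot drift apart.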

Looking at the diagram below, it now just feels right, doesn’t it?

 

 

 

The Occam Data Framework

 

The Occam Data Framework uses one set of instructions to describe the business requirements, and then uses that same description to create an entirely unified flow of data from the operational system, through ETL, API, and the analytical database.

Joe Reis calls for the convergence between the multiple flows of data and business processes. At Occam, this is exactly how it’s done.

 

Details on the DataEngBytes community:
https://dataengconf.com.au/
https://www.meetup.com/auckland-data-engineering-meetup
