The Boring Part of Building AI Systems: Data Governance and Security
- Stefan Vodilovski
- Mar 11
AI can't live without data.
If you want to leverage AI in your business, you also need to make a serious effort to protect your data.
The reality is that many organizations rushing into AI adoption overlook governance and security.
This can lead to data breaches, exposure of sensitive information, data poisoning attacks, and more.
You don't want that!
So here is an overview of what you need to think about when giving AI access to data!

Data Classification
Forget AI for a second, and think about this question:
"Do I understand what data I have in my system?"
If the answer is no, then there is work to do before you can responsibly use AI.
The first step is data classification.
Look at your datasets and determine what kind of information they contain.
For example:
Does it contain Personally Identifiable Information (PII)?
Does it include confidential business data?
Is it public or low-risk information?
If your data contains PII, you cannot simply feed it into AI systems. You must properly handle it first through techniques such as:
Masking
Anonymization
Access restrictions
Otherwise, you risk violating regulations like GDPR, which can lead to serious fines.
Even if the data isn’t personal, it may still be confidential. Leaking proprietary data can be extremely costly for a business.
Before anything else, identify what data you have. Only then can you decide how it should be handled.
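To make masking a bit more concrete, here is a minimal Python sketch. The field names (customer_id, note), the email regex, and the SECRET_KEY value are all assumptions made up for this example. Keep in mind that replacing an ID with a keyed hash is pseudonymization rather than full anonymization, so under GDPR the result still counts as personal data.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration; in practice, load it from a secrets manager.
SECRET_KEY = b"replace-me"

# Deliberately simple pattern; real PII detection needs far more than one regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def pseudonymize(value: str) -> str:
    """Map a value to a stable keyed hash so records can still be joined."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sanitize_record(record: dict) -> dict:
    """Return a copy of the record that is safer to pass to an AI system."""
    return {
        "customer_id": pseudonymize(record["customer_id"]),
        "note": mask_emails(record["note"]),
    }


record = {
    "customer_id": "C-1042",
    "note": "Contact jane.doe@example.com about the invoice.",
}
print(sanitize_record(record))
```

The same customer always maps to the same hash, so you can still group records per customer without exposing the real identifier.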
Manage access
If everyone in your organization can access your data, problems are inevitable.
Even if the data isn’t sensitive, unrestricted access creates risk. People could accidentally or intentionally modify it, and suddenly your source of truth is gone.
You should clearly define:
Which person can access which data
Which application can access which data
Which AI model can access which data
And you should follow the principle of least privilege.
That means giving users and systems only the permissions they absolutely need.
Most of the time:
Read-only access is enough.
Anything more should only be granted when absolutely necessary.
Also, avoid granting permissions directly to individual users whenever possible. Instead, assign users to roles and grant permissions to those roles.
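Here is one way role-based access with least privilege could look in Python. This is only a sketch: the role names, dataset names, and principals are hypothetical, and a real system would enforce this in the database or an identity provider rather than in application dictionaries.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical roles: each one lists the datasets it may touch and how.
ROLE_PERMISSIONS = {
    "analyst": {"sales": {Permission.READ}},
    "support_chatbot": {"faq": {Permission.READ}},
    "data_engineer": {"sales": {Permission.READ, Permission.WRITE}},
}

# People, applications, and AI models get roles, never direct grants.
PRINCIPAL_ROLES = {
    "alice": {"analyst"},
    "faq-model-v1": {"support_chatbot"},
    "etl-service": {"data_engineer"},
}


def is_allowed(principal: str, dataset: str, permission: Permission) -> bool:
    """Deny by default; allow only if one of the principal's roles grants it."""
    for role in PRINCIPAL_ROLES.get(principal, set()):
        granted = ROLE_PERMISSIONS.get(role, {}).get(dataset, set())
        if permission in granted:
            return True
    return False


print(is_allowed("alice", "sales", Permission.READ))
print(is_allowed("alice", "sales", Permission.WRITE))
print(is_allowed("faq-model-v1", "sales", Permission.READ))
```

Notice that the AI model is just another principal with a narrow, read-only role, and that anything not explicitly granted is denied.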
Handling Privileged Users
At some point, certain people will need more than read-only access.
But privileged access must be handled carefully.
Every privileged user should have:
A unique identity
A clearly defined role
Minimal elevated privileges
Avoid creating large groups of privileged users.
The same principle applies to applications and services. If an application needs database access, give it its own identity and only the permissions required to perform its task.
Monitoring is also critical.
You should track what privileged users do, so if something unusual happens you can answer:
Who did it
When it happened
What exactly was changed
Behavior monitoring also helps detect anomalies. For example, if a user suddenly logs in at unusual times or accesses data they normally never touch, that could indicate a compromised account.
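A very simple version of this kind of monitoring can be sketched in Python. The AuditEvent fields, the 07:00 to 20:00 working hours, and the two rules below are assumptions chosen for illustration; real anomaly detection is usually statistical and handled by dedicated tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    user: str            # who did it
    timestamp: datetime  # when it happened
    dataset: str         # what was touched
    action: str          # what exactly was done


def unusual_events(history, new_events, work_hours=range(7, 20)):
    """Flag events outside work hours or on datasets the user never touched."""
    seen = {}
    for event in history:
        seen.setdefault(event.user, set()).add(event.dataset)

    flagged = []
    for event in new_events:
        reasons = []
        if event.timestamp.hour not in work_hours:
            reasons.append("unusual time")
        if event.dataset not in seen.get(event.user, set()):
            reasons.append("new dataset")
        if reasons:
            flagged.append((event, reasons))
    return flagged


history = [AuditEvent("bob", datetime(2024, 1, 10, 10, 0), "billing", "update")]
new_events = [
    AuditEvent("bob", datetime(2024, 1, 11, 11, 0), "billing", "update"),
    AuditEvent("bob", datetime(2024, 1, 11, 3, 0), "payroll", "update"),
]
for event, reasons in unusual_events(history, new_events):
    print(event.user, event.dataset, reasons)
```

A flagged event is not proof of a compromised account. It is a signal that someone should take a look.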
Encrypt everything
When thinking about data security, it’s useful to assume the worst case:
What if someone steals your data?
Even then, they should not be able to read it.
That’s why encryption is essential.
Data should be encrypted:
At rest - when stored in databases or storage systems
In transit - when moving between systems
Decryption should only be possible with secure keys, which must be stored and managed carefully with limited access.
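For the "in transit" part, here is a small sketch using Python's standard ssl module. The helper name strict_tls_context and the choice of TLS 1.2 as the minimum version are assumptions for this example. Encryption at rest is not shown because it is usually handled by the database or storage layer, or by a vetted cryptography library.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client TLS context that verifies certificates and hostnames."""
    # create_default_context() already enables certificate and hostname checks.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (assumed policy for this sketch).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A context like this can be passed to clients such as http.client.HTTPSConnection through its context parameter, so every connection a service opens follows the same rules.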
Data is moving fast
One of the most important aspects of data governance and security is this:
"It has to be repeatable!"
Security cannot be a one-time task. You can't do it once and forget about it.
Data systems constantly evolve:
New pipelines appear
New datasets are created
New AI models require access
You need processes and systems that continuously enforce governance and security.
Different tools can help with this, but the core idea remains the same:
Data governance must be built into the system - not added later.
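One way to make governance repeatable is to write the rules as code and run them automatically, for example on every pipeline change. The sketch below assumes a hypothetical dataset catalog stored as a dictionary; the field names (classification, owner, encrypted_at_rest) and the three rules are made up for illustration.

```python
# Classification labels follow the categories discussed earlier in this post.
ALLOWED_CLASSIFICATIONS = {"public", "confidential", "pii"}


def find_violations(catalog: dict) -> list:
    """Return a human-readable list of governance rule violations."""
    violations = []
    for name, meta in catalog.items():
        classification = meta.get("classification")
        if classification not in ALLOWED_CLASSIFICATIONS:
            violations.append(f"{name}: missing or unknown classification")
        if not meta.get("owner"):
            violations.append(f"{name}: no owner")
        if classification == "pii" and not meta.get("encrypted_at_rest", False):
            violations.append(f"{name}: PII dataset is not encrypted at rest")
    return violations


# Hypothetical catalog entries.
catalog = {
    "faq_articles": {"classification": "public", "owner": "support"},
    "customers": {"classification": "pii", "owner": "crm", "encrypted_at_rest": False},
    "new_clickstream": {"owner": "analytics"},
}
for violation in find_violations(catalog):
    print(violation)
```

If a check like this fails the build whenever a new dataset shows up without a classification, nobody has to remember to do the review by hand.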


