The boring part of Building AI Systems: Data governance and Security

  • Writer: Stefan Vodilovski
  • Mar 11
  • 3 min read

AI doesn't live without data.


If you want to leverage AI in your business, you also need to make a serious effort to protect your data.

The reality is that many organizations rushing into AI adoption overlook governance and security.

This can lead to data breaches that expose sensitive information, data poisoning attacks, and more.


You don't want that!


So here is an overview of what you need to think about when giving AI access to data!






Data Classification


Forget AI for a second, and think of this question:


"Do I understand what data I have in my system?"


If the answer is no, then there is work to do before you can responsibly use AI.


The first step is data classification.


Look at your datasets and determine what kind of data they contain.

For example:

  • Does it contain Personally Identifiable Information (PII)?

  • Does it include confidential business data?

  • Is it public or low-risk information?


If your data contains PII, you cannot simply feed it into AI systems. You must properly handle it first through techniques such as:

  • Masking

  • Anonymization

  • Access restrictions


Otherwise, you risk violating regulations like GDPR, which can lead to serious fines.
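To make this concrete, here is a minimal Python sketch of classifying text and masking PII before it reaches an AI system. The regex patterns and the function names (classify, mask, pseudonymize) are assumptions made for illustration. Real PII detection needs far broader coverage than two patterns. Also note that a salted hash is pseudonymization rather than full anonymization, so the result may still count as personal data.

```python
import hashlib
import re

# Hypothetical, deliberately narrow patterns used only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def classify(text: str) -> str:
    """Return a coarse label for a piece of text."""
    if EMAIL_RE.search(text) or PHONE_RE.search(text):
        return "pii"
    return "unclassified"


def mask(text: str) -> str:
    """Replace detected emails and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def pseudonymize(value: str, salt: str) -> str:
    """Swap an identifier for a salted hash so records can still be joined."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


if __name__ == "__main__":
    record = "Mail jane@example.com or call +1 555 123 4567"
    print(classify(record))
    print(mask(record))
```

The idea is to run a step like this before any record reaches a prompt, an embedding job, or a training set, and to treat anything labeled "unclassified" as needing review rather than as safe.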


Even if the data isn’t personal, it may still be confidential. Leaking proprietary data can be extremely costly for a business.


Before anything else, identify what data you have. Only then can you decide how it should be handled.


Manage access


If everyone in your organization can access your data, problems are inevitable.


Even if the data isn’t sensitive, unrestricted access creates risk. People could accidentally or intentionally modify it, and suddenly your source of truth is gone.


You should clearly define:

  • Which person can access which data

  • Which application can access which data

  • Which AI model can access which data


And you should follow the principle of least privilege.


That means giving users and systems only the permissions they absolutely need.


Most of the time:

Read-only access is enough.


Anything more should only be granted when absolutely necessary.


Also avoid direct user-based access whenever possible. Instead, assign roles and grant permissions to those roles.
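Here is a rough Python sketch of role-based access with least privilege. Every role, principal, and dataset name below is made up for illustration. In a real system these rules would normally live in your database or cloud IAM rather than in application code.

```python
# Permissions are granted to roles, never directly to a person or service.
# All names here are hypothetical examples.
ROLE_GRANTS = {
    "analyst": {"sales_orders": {"read"}},
    "support_chatbot": {"product_faq": {"read"}},
    "data_engineer": {"sales_orders": {"read", "write"}},
}

# People, applications, and AI models are all principals that hold roles.
PRINCIPAL_ROLES = {
    "alice": {"analyst"},
    "faq-bot-model": {"support_chatbot"},
    "etl-service": {"data_engineer"},
}


def is_allowed(principal: str, action: str, dataset: str) -> bool:
    """Deny by default; allow only if one of the principal's roles grants it."""
    for role in PRINCIPAL_ROLES.get(principal, set()):
        if action in ROLE_GRANTS.get(role, {}).get(dataset, set()):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("alice", "read", "sales_orders"))
    print(is_allowed("faq-bot-model", "read", "sales_orders"))
```

Because the check denies by default, an unknown user or a new AI model gets no access until someone deliberately assigns it a role.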


Handling Privileged Users


At some point, certain people will need more than read-only access.


But privileged access must be handled carefully.


Every privileged user should have:

  • A unique identity

  • A clearly defined role

  • Minimal elevated privileges


Avoid creating large groups of privileged users.


The same principle applies to applications and services. If an application needs database access, give it its own identity and only the permissions required to perform its task.


Monitoring is also critical.

You should track what privileged users do, so if something unusual happens you can answer:

  • Who did it

  • When it happened

  • What exactly was changed


Behavior monitoring also helps detect anomalies. For example, if a user suddenly logs in at unusual times or accesses data they normally never touch, that could indicate a compromised account.
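A simplified Python sketch of that idea is below. It assumes an audit record that stores who, when, and what, plus two naive anomaly rules. The AuditEvent fields, the 07:00 to 20:00 working window, and the rule wording are all assumptions. Production monitoring would usually rely on a dedicated logging or SIEM tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    user: str        # who did it
    when: datetime   # when it happened
    dataset: str     # which data was touched
    change: str      # what exactly was changed


def find_anomalies(history, event, work_hours=range(7, 20)):
    """Return the reasons an event looks unusual for this user, if any."""
    reasons = []
    if event.when.hour not in work_hours:
        reasons.append("unusual hour")
    seen = {e.dataset for e in history if e.user == event.user}
    if event.dataset not in seen:
        reasons.append("dataset not previously accessed")
    return reasons


if __name__ == "__main__":
    history = [
        AuditEvent("dana", datetime(2024, 3, 4, 10, 0, tzinfo=timezone.utc),
                   "billing", "updated invoice 17"),
    ]
    odd = AuditEvent("dana", datetime(2024, 3, 5, 3, 0, tzinfo=timezone.utc),
                     "payroll", "exported table")
    print(find_anomalies(history, odd))
```

A flagged event is not proof of a compromised account. It is only a signal that someone should take a look.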


Encrypt everything


When thinking about data security, it’s useful to assume the worst case:


What if someone steals your data?


Even then, they should not be able to read it.


That’s why encryption is essential.


Data should be encrypted:

  • At rest - when stored in databases or storage systems

  • In transit - when moving between systems


Decryption should only be possible with secure keys, which must be stored and managed carefully with limited access.
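For the in-transit half, here is a small Python sketch using only the standard library's ssl module. The helper name strict_tls_context and the choice of TLS 1.2 as the minimum version are assumptions for illustration. Encryption at rest is not shown, since it is usually handled by the database, the storage service, or a vetted cryptography library rather than hand-written code.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses weak connections."""
    # The default context already verifies certificates and hostnames.
    ctx = ssl.create_default_context()
    # Assumption: TLS 1.2 is the oldest protocol version we accept.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = strict_tls_context()
    print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

A context like this can be passed to standard library clients that accept one, for example http.client.HTTPSConnection(host, context=ctx).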


Data is moving fast


One of the most important aspects of data governance and security is this:


"It has to be repeatable!"


Security cannot be a one-time task. You can't do it once and forget about it.


Data systems constantly evolve:

  • New pipelines appear

  • New datasets are created

  • New AI models require access

You need processes and systems that continuously enforce governance and security.
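One way to make this repeatable is to express the rules as a check that runs automatically, for example in CI, whenever a dataset or pipeline is added. The Python sketch below assumes a hypothetical catalog format (a dict of dataset metadata) along with made-up field names and classification labels. It is an illustration of the idea, not a specific tool's API.

```python
# Hypothetical rules: every dataset must declare these fields.
REQUIRED_FIELDS = ("owner", "classification", "allowed_roles")
ALLOWED_CLASSIFICATIONS = {"public", "confidential", "pii"}


def check_catalog(catalog: dict) -> list:
    """Return a list of governance problems; an empty list means compliant."""
    problems = []
    for name, meta in sorted(catalog.items()):
        for field in REQUIRED_FIELDS:
            if not meta.get(field):
                problems.append(f"{name}: missing {field}")
        label = meta.get("classification")
        if label and label not in ALLOWED_CLASSIFICATIONS:
            problems.append(f"{name}: unknown classification {label!r}")
    return problems


if __name__ == "__main__":
    catalog = {
        "orders": {"owner": "sales-team", "classification": "confidential",
                   "allowed_roles": ["analyst"]},
        "signups": {"owner": "", "classification": "secret"},
    }
    for problem in check_catalog(catalog):
        print(problem)
```

If the check fails the build whenever the list is non-empty, a new dataset cannot go live without an owner, a classification, and an explicit list of roles.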


Different tools can help with this, but the core idea remains the same:


Data governance must be built into the system - not added later.








 
 
 
