Governing AI Agents
Let’s start with the why
While interviewing for an AI engineer position, I was surprised that many questions were about data governance and how to make sure agents don't cause damage.
Until then, my only concern had been building a performant agent that could understand its task. It had never crossed my mind that an agent might cause damage when it doesn't work as expected (which is to be expected for a non-deterministic system).
The reality is that if you want your agent to access any database containing PII, you need to address the question of governance.
The solution
TL;DR: This whole topic is addressed in this free course on Andrew Ng’s deep learning platform, which explains and solves exactly this problem using the Databricks infrastructure.
1. Create a technical user (service principal) for your agent
The course uses Databricks as its platform, but creating a technical user is a best practice whenever you delegate work to an automated system. From now on, I'll call the technical user a service principal.
Whichever platform you choose and wherever you create the service principal, make sure you grant it only the permissions it needs, following the principle of least privilege.
2. Create tags for sensitive data
Tagging the data is again a best practice.
A very naive approach would be to add an extra column, such as confidential, to the existing table and alter the original table structure. Tags instead attach this classification as metadata, without touching the data itself.
Let’s use a simple employees table as an example:
| employee_id | name | department | ssn | salary |
|---|---|---|---|---|
| 101 | Jane Doe | Engineering | 123-45-678 | 120000 |
| 102 | John Smith | Sales | 987-65-432 | 95000 |
| 103 | Alice Brown | HR | 456-78-901 | 80000 |
In this case, they used the following tags:
- Public: Anyone can access (company policies)
- Internal: Employees only (e.g., employee_id, name, department)
- Confidential: Limited access (employee records, reviews)
- Restricted: Highly sensitive (e.g., ssn, salary)
Based on this, we would tag the ssn and salary columns as ‘Restricted’.
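On Databricks, column tags can be applied with ALTER TABLE ... SET TAGS. A sketch of what this could look like for our table (the tag key sensitivity is my own naming, not necessarily what the course uses):

```sql
-- Tag the two 'Restricted' columns on the employees table.
-- Assumes Unity Catalog; the tag key 'sensitivity' is a hypothetical choice.
ALTER TABLE employees ALTER COLUMN ssn SET TAGS ('sensitivity' = 'Restricted');
ALTER TABLE employees ALTER COLUMN salary SET TAGS ('sensitivity' = 'Restricted');
```

Tags like these can later be searched and used to drive access policies, which is exactly why they beat an ad-hoc extra column.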
3. Create an SQL View to filter out data
A SQL view exposes only pre-selected columns of the underlying table. Usually, neither a data analyst nor an agent should see all the data.
For our table, we could create a v_employee_directory view:
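The course's exact definition isn't reproduced here, but a minimal version that exposes only the 'Internal' columns could look like:

```sql
-- A filtered view over employees: the 'Restricted' columns
-- (ssn, salary) are simply not selected, so they are invisible
-- to anyone querying the view.
CREATE OR REPLACE VIEW v_employee_directory AS
SELECT employee_id, name, department
FROM employees;
```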
Now, the agent can query this view without ever seeing the ‘Restricted’ data.
4. Configure SQL Group permissions
Use granular permissions for the database.
First, the agent needs permissions on the Catalog (a Databricks concept) and SQL Schema containers before it can work with the underlying data, such as a SQL table.
Object permissions are inheritable, and they cover MODEL (a Databricks MLflow model), SELECT (reading data from tables), EXECUTE (running SQL functions) and CREATE TABLE (creating new tables).
For our agent, we would grant SELECT only on the v_employee_directory view, not the base employees table.
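In Databricks SQL, such grants could be sketched as follows (the catalog, schema, and service principal names here are hypothetical placeholders, not taken from the course):

```sql
-- `agent_sp` stands in for the agent's service principal;
-- `main.hr` is a hypothetical catalog.schema pair.
GRANT USE CATALOG ON CATALOG main TO `agent_sp`;
GRANT USE SCHEMA  ON SCHEMA  main.hr TO `agent_sp`;

-- SELECT only on the filtered view, never on the base table.
GRANT SELECT ON VIEW main.hr.v_employee_directory TO `agent_sp`;
```

Note what is absent: no SELECT on the employees table itself, so the agent's reach is bounded by the view definition.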
5. Column masking
Masking is needed because the view can be bypassed (e.g., if an agent has broader permissions or finds another way to query the base table).
With masking, even if the agent could query the employees table, a rule would show a masked value for the ssn column, like XXX-XX-678.
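Databricks supports this via a masking function attached to the column. A sketch under the assumption of an hr_admins group (a hypothetical name) that is allowed to see the real value:

```sql
-- Masking function: privileged group members see the real SSN,
-- everyone else (including the agent) sees 'XXX-XX-' plus the
-- last three digits.
CREATE OR REPLACE FUNCTION ssn_mask(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('hr_admins') THEN ssn
  ELSE CONCAT('XXX-XX-', substr(ssn, -3))
END;

-- Attach the mask to the column; it now applies to every query
-- against employees, regardless of how the table is reached.
ALTER TABLE employees ALTER COLUMN ssn SET MASK ssn_mask;
```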
6. Build tools for the agents
Instead of building Python functions to query the data, we define the operations we need as SQL functions.
They inherit the caller’s permissions, can validate input, and every function call is logged.
For example, we could build a safe function get_department(employee_name) that only returns the department, which is much safer than letting the agent run SELECT * ....
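A sketch of such a function (the exact definition in the course may differ; note it reads from the filtered view, so even the function body never touches restricted columns):

```sql
-- A narrow, single-purpose tool the agent can call:
-- given a name, return only that employee's department.
CREATE OR REPLACE FUNCTION get_department(employee_name STRING)
RETURNS STRING
RETURN SELECT department
       FROM v_employee_directory
       WHERE name = employee_name
       LIMIT 1;
```

Because the agent can only invoke get_department, its queries are constrained by design rather than by hoping it writes safe SQL.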
Why Databricks?
I have to be honest and say that this was my first time using Databricks, and I just love it.
The second part of the course is where I see the benefits of using an integrated platform like it, and by that I mean:
- Functions: we can directly reference the SQL functions created previously.
- Governance: direct connection to our service principal.
- Deployment: the agent is deployed automatically after we test it.
For all the implementation details, make sure to check out the repository and finish the course.