Obfuscating data in SQL Server

profile
Tim Deschryver
timdeschryver.dev

Obfuscating data is a common practice to protect sensitive information, such as Personally identifiable information (PII). In this post, I'm going to show how you can obfuscate data in SQL Server.

You have different options to obfuscate data. Probably the easiest way developers do this, is by having a obfuscation layer right after the data is retrieved from the database, or when the data is returned to/from the UI layer.

However, this approach is not very secure, as developers can still access the original data by inspecting the data in the database directly. This is not what you want, especially when you are working with sensitive data.

A better solution is to obfuscate the data in the root, in the database itself.

Masking Update Script link

This approach makes use of a user-defined function to transform a value into a masked value, which replaces some (or all) characters with asterisks. You can use this function in an UPDATE statement to obfuscate selected columns containing sensitive data.

Only use this in development (and maybe testing) environments to prevent developers from accessing sensitive data. Do not use this in production, as it will permanently change the data.

For example, many systems I've worked on use, or have been using, datasets copied over from a production environment. While this makes your application easier to test because you can use real data with a representable load, it's also a security risk as it exposes sensitive data to development teams.

The following script, which is generated by GitHub Copilot, creates the mask function. For more variations and other techniques, you can also take a look at the Red Gate blog.

Note

You can also mask the data client- or serverside with your application code (e.g. an ORM), but users with database access can still see the data.

The function retrieves the value, the length of the unmasked portion, and the position of the unmasked portion. It then calculates the mask and the unmasked portion based on the input parameters and returns the masked value. Some examples of how to use the function are shown below.

Input Value Unmasked Length Masked Position Masked Value
1234567890 3 start 123*******
1234567890 4 end *****7890
1234567890 2 middle 12******90
1234567890 2 full ******

If you know your database schema in-depth, you can use the dbo.MaskData function within UPDATE statements to obfuscate sensitive data. Otherwise, you will need to inspect the schema for sensitive columns. Luckily, you can use the following query to find columns containing a specific string, for example, 'SocialSecurityNumber'.

Based on the results, you can write UPDATE statements to mask the data.

For specialized data structures, you may need to use a different masking function to handle the data correctly. For example, for email addresses, you may want to mask the username and keep the domain name. This results in a valid email address.

Dynamic Data Masking link

Of course, you cannot use the above approach in production environments as it will erase the original data. Instead, you can use SQL Server's built-in feature called Dynamic Data Masking to prevent users from seeing PII data.

Note

The Dynamic Data Masking feature is available in SQL Server 2016 and later versions.

With Dynamic Data Masking, you can give access to sensitive data to users who need it, while masking the data for users who don't. This is useful to not only prevent developers from accessing sensitive data but also to prevent different user groups from seeing data they shouldn't. For example, the customer service should not see the financial data of a customer.

To add a mask to a column, you can define the mask while creating the column or alter the column to add a mask.

Now, when you query the table as an unauthorized user, you will see the masked data. The Name column is partially masked, showing only the first two characters, the rest is replaced with 'X'. The Email column is masked with the email function, which shows only the first letter and also replaces the domain with XXXX. The Age column is masked with the random function, which replaces the original value with a random number between 1 and 255.

By default, only the admin can query the original data. To give a user access to the masked data, you need to grant the UNMASK permission to the user. The permission can be finely specified from the column level to the entire database, you can give a user access to only the masked data of a specific column, a table, a schema, or the entire database.

Take a look at the official documentation for more information on how to use Dynamic Data Masking, such as more masking options or how to revoke access from a user.

Conclusion link

You have different options to obfuscate data.

While the most comfortable way, for us developers, is to have an obfuscation layer when the data is retrieved from the database, this doesn't fix the root. Doing this still keeps the original data in the database, which can be accessed by developers (or others) with database access.

As I've shown in this post, a better solution is to obfuscate the data in the database itself.

For development environments, you can update the original data with a masked value using the user-defined function MaskData, which I've shown in Masking Update Script. This is handy when the development database is a full or partial copy of the production database.

In production environments, I don't recommend updating the original data, as this is a permanent change that results in data loss (and unhappy users). Instead, you can use SQL Server's built-in feature called Dynamic Data Masking to prevent users from seeing sensitive data, as shown in this post. To give users access to the original data, you can grant the UNMASK permission to the user.

Feel free to update this blog post on GitHub, thanks in advance!

Join My Newsletter (WIP)

Join my weekly newsletter to receive my latest blog posts and bits, directly in your inbox.

Support me

I appreciate it if you would support me if have you enjoyed this post and found it useful, thank you in advance.

Buy Me a Coffee at ko-fi.com PayPal logo

Share this post