Anonymize RAG data in IBM Granite and Ollama using HCP Vault

Learn how to configure tokenization and masking for data with HCP Vault’s transform secrets engine, then pass the protected data to IBM Granite, Ollama, and Open WebUI for RAG.

You can use retrieval augmented generation (RAG) to refine and improve the output of a large language model (LLM) without retraining the model. However, many data sources include sensitive information, such as personally identifiable information (PII), that the LLM and its applications should not require or disclose, yet sometimes they do.

Sensitive information disclosure is one of the OWASP 2025 Top 10 Risks & Mitigations for LLMs and Gen AI Apps. For example, an LLM may leak sensitive information when a user asks a question that requires that information to answer. After the retrieval engine fetches sensitive information and provides it to the LLM as context, the LLM may initially generate a response without disclosing it. However, a more specific question related to the sensitive information can prompt the LLM to reveal it from its context. To mitigate this concern, OWASP recommends data sanitization, access control, and tokenization.

This post shows how to configure the transform secrets engine in HCP Vault Dedicated and Vault Enterprise to mask or tokenize sensitive data. Then you’ll see how to use the protected data in RAG workflows with IBM Granite, Ollama, and Open WebUI, providing a local environment for testing queries and exploring data safely.

The demo in this post uses Vault to mask credit card numbers and tokenize billing street addresses for mock vacation rental bookings generated via Faker, a Python package. After uploading the data to Open WebUI, you can then query the knowledge base to return information about payments without leaking PII.

»Configure Vault transform secrets engine

The demo uses an HCP Vault cluster with the transform secrets engine enabled. Vault’s transform secrets engine handles secure data transformation and tokenization for an input. Transformation methods include format-preserving encryption (if you want to preserve the format of data in a database), masking, and tokenization.

Note: You may find data masking and tokenization most helpful in RAG: masking can preserve portions of the original data, and convergent tokenization can indicate common records. Format-preserving encryption might lead to misleading responses because the ciphertext has the same format as the plaintext; a user querying the model may believe they received accurate information when the response actually contains ciphertext.

The following Terraform configuration creates the HCP Vault cluster and enables the transform secrets engine.

resource "hcp_hvn" "rental" {
  hvn_id         = var.name
  cloud_provider = "aws"
  region         = var.region
  cidr_block     = var.cidr_block
}
 
resource "hcp_vault_cluster" "rental" {
  cluster_id      = var.name
  hvn_id          = hcp_hvn.rental.hvn_id
  tier            = "plus_small"
  public_endpoint = true
}
 
resource "hcp_vault_cluster_admin_token" "rental" {
  cluster_id = hcp_vault_cluster.rental.cluster_id
}
 
resource "vault_mount" "transform_rental" {
  path = "transform/${var.name}"
  type = "transform"
}

The demo uses masking to protect credit card numbers for vacation rental bookings and tokenization to encrypt the credit card billing addresses with convergent encryption. Convergent encryption ensures that the same plaintext billing address results in the same token. This allows the LLM to analyze each booking for similarities without the plaintext sensitive information.
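Conceptually, convergent tokenization behaves like a deterministic keyed hash. The sketch below illustrates only that property; Vault’s actual tokenization is reversible and uses its own key management, and DEMO_KEY is a made-up value for illustration.

```python
import hashlib
import hmac

# DEMO_KEY is a hypothetical key for illustration only; Vault manages its own keys.
DEMO_KEY = b"demo-tokenization-key"

def convergent_token(plaintext: str) -> str:
    # Deterministic keyed hash: the same plaintext always produces the same
    # token, so equal billing addresses can be correlated without exposing
    # the address itself.
    return hmac.new(DEMO_KEY, plaintext.encode(), hashlib.sha256).hexdigest()

print(convergent_token("123 Main Street") == convergent_token("123 Main Street"))  # → True
print(convergent_token("123 Main Street") == convergent_token("456 Oak Avenue"))   # → False
```

Unlike this one-way sketch, Vault stores the mapping between plaintext and token, so authorized clients can later decode the token.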

First, create a transform template to mask everything but the last four digits of a credit card number. The transform template below uses a regular expression to mask the leading digits of a credit card number but leaves the last four digits in plaintext. Using the transform template, create a transformation that applies the masking character.

locals {
  transform_role = "bookings"
}
 
resource "vault_transform_template" "ccn" {
  path    = vault_mount.transform_rental.path
  name    = "ccn"
  type    = "regex"
  pattern = "(\\d{8,12})\\d{4}"
}
 
resource "vault_transform_transformation" "payments_ccn" {
  path              = vault_mount.transform_rental.path
  name              = "ccn"
  type              = "masking"
  masking_character = "*"
  template          = vault_transform_template.ccn.name
  allowed_roles     = [local.transform_role]
}
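The masking behavior can be sketched in Python: the regular expression’s capture group marks the leading digits to replace with the masking character, while the trailing four digits fall outside the group and remain in plaintext. This is an illustration of the template’s effect, not Vault’s implementation, and the sample card number is fake.

```python
import re

# Same pattern as the transform template: capture 8-12 leading digits,
# leave the final four digits uncaptured.
PATTERN = re.compile(r"(\d{8,12})\d{4}")

def mask_ccn(ccn: str, masking_character: str = "*") -> str:
    def _mask(match: re.Match) -> str:
        # Replace only the captured leading digits with the masking character.
        masked = masking_character * len(match.group(1))
        return masked + match.group(0)[len(match.group(1)):]
    return PATTERN.sub(_mask, ccn)

print(mask_ccn("4556737586899855"))  # → ************9855
```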

For the billing street address, create a transform template that encodes the entire address and reference the template as part of the transformation.

Note: At the time of this post, the Vault provider for Terraform does not support the convergent encryption attribute for the vault_transform_transformation resource. As a result, the demo uses the http data source to update the transformation in Vault via its API.

locals {
  address_transformation_name = "address"
}
 
resource "vault_transform_template" "address" {
  path    = vault_mount.transform_rental.path
  name    = local.address_transformation_name
  type    = "regex"
  pattern = "([A-Za-z0-9]+( [A-Za-z0-9]+)+)"
}
 
data "http" "address" {
  url = "${hcp_vault_cluster.rental.vault_public_endpoint_url}/v1/${vault_mount.transform_rental.path}/transformations/tokenization/${local.address_transformation_name}"
 
  method = "POST"
 
  request_body = jsonencode({
    allowed_roles    = [local.transform_role]
    deletion_allowed = true
    convergent       = true
  })
 
  request_headers = {
    Accept            = "application/json"
    X-Vault-Token     = hcp_vault_cluster_admin_token.rental.token
    X-Vault-Namespace = hcp_vault_cluster.rental.namespace
  }
 
  lifecycle {
    postcondition {
      condition     = contains([200, 201, 204], self.status_code)
      error_message = "Status code invalid"
    }
  }
}

Finally, create a Vault role that can use both transformations to encode and decode credit card numbers and addresses.

resource "vault_transform_role" "bookings" {
  path            = vault_mount.transform_rental.path
  name            = local.transform_role
  transformations = [vault_transform_transformation.payments_ccn.name, local.address_transformation_name]
}

After setting up the transform secrets engine with transformations for credit card numbers and billing addresses, you can use it to encode sensitive data. In general, the transform secrets engine relies on regular expressions to identify the information to encrypt. This works well for structured data (such as database records) with well-known formats but may not apply as well to documents containing less structured content. Using a product like Vault Radar to scan for and identify secrets and PII before applying the transformation may help protect sensitive information in unstructured content.

»Apply masking or tokenization to sensitive information

The transform secrets engine offers an encode and decode API endpoint to mask or tokenize sensitive information. The demo uses a Python script (shown below) to generate fake payment information using Faker, a Python package. After creating a name, credit card information, and billing address, the script uses HVAC, a Python client for Vault, to pass the credit card number and billing address to the transform secrets engine for masking and tokenization.

import os
import random
 
import pandas
import hvac
 
from faker import Faker
 
NUMBER_OF_BOOKINGS = 500
ENCRYPTED_LISTINGS_FILE = "./data/listings.csv"
BOOKINGS_FILE = "./data/bookings.csv"
 
client = hvac.Client(
    url=os.environ["VAULT_ADDR"],
    token=os.environ["VAULT_TOKEN"],
    namespace=os.getenv("VAULT_NAMESPACE"),
)
 
dataframe = pandas.read_csv(ENCRYPTED_LISTINGS_FILE)
fake = Faker()
 
 
def get_listing():
    record = dataframe["listing_id"].sample(n=1)
    return record.values[0].item()
 
 
def encode_address(address):
    encode_response = client.secrets.transform.encode(
        mount_point="transform/rentals",
        role_name="bookings",
        value=address,
        transformation="address",
    )
    return encode_response["data"]["encoded_value"]
 
 
def encode_credit_card_number(ccn):
    encode_response = client.secrets.transform.encode(
        mount_point="transform/rentals",
        role_name="bookings",
        value=ccn,
        transformation="ccn",
    )
    return encode_response["data"]["encoded_value"]
 
 
def generate_data(number_of_records):
    bookings = []
    for _ in range(0, number_of_records):
        address = fake.street_address()
        city = fake.city()
        postcode = fake.postcode()
 
        booking = {}
        booking["name"] = fake.name()
        booking["listing_id"] = get_listing()
        booking["credit_card"] = encode_credit_card_number(fake.credit_card_number())
        booking["billing_street_address"] = encode_address(address)
        booking["billing_city"] = city
        booking["billing_zip_code"] = postcode
        booking["start_date"] = fake.date()
        booking["number_of_nights"] = random.randint(1, 30)
        bookings.append(booking)
 
        # Create separate booking using same billing address
        duplicate_booking = {}
        duplicate_booking["name"] = fake.name()
        duplicate_booking["listing_id"] = get_listing()
        duplicate_booking["credit_card"] = encode_credit_card_number(
            fake.credit_card_number()
        )
        duplicate_booking["billing_street_address"] = encode_address(address)
        duplicate_booking["billing_city"] = city
        duplicate_booking["billing_zip_code"] = postcode
        duplicate_booking["start_date"] = fake.date()
        duplicate_booking["number_of_nights"] = random.randint(1, 30)
        bookings.append(duplicate_booking)
    return bookings
 
 
def create_csv(bookings):
    df = pandas.DataFrame(bookings)
    df.to_csv(BOOKINGS_FILE, index=False)
 
 
def main():
    bookings = generate_data(NUMBER_OF_BOOKINGS)
    create_csv(bookings)
 
 
if __name__ == "__main__":
    main()

Lastly, the script writes booking information into a CSV that later gets loaded as individual documents. The CSV contains the masked credit card numbers, except the last four digits, and a token with the encoded billing street address.

name,listing_id,credit_card,billing_street_address,billing_city,billing_zip_code,start_date,number_of_nights
Adam Jones,1354973,**********8356,DaCJhefr1oWcrRx5YSXrYXJ4n8k2NXcFK5M91oR9WThvHswoaNs1JkkTqrntaMpDRLyrns6ZXgmg,Davidville,01241,2022-05-06,2
Tammie Munoz,234330,********5748,DaCJhefr1oWcrRx5YSXrYXJ4n8k2NXcFK5M91oR9WThvHswoaNs1JkkTqrntaMpDRLyrns6ZXgmg,Davidville,01241,1997-04-14,28
Michael Brown,754353,************7803,DaCJhefr1oYMUc8jahQizWXwGqJywv1vUYu9v5W7migPZVS29GQHQQwgqkZZ1HyjqNbX2bwSuDfy,Weberport,90595,2000-11-12,4
Dana Barnes,1182186,**********9888,DaCJhefr1oYMUc8jahQizWXwGqJywv1vUYu9v5W7migPZVS29GQHQQwgqkZZ1HyjqNbX2bwSuDfy,Weberport,90595,1996-09-18,19
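As a quick sanity check before uploading, you can verify that every credit_card field in the generated CSV is fully masked except for its last four digits. The sketch below uses the sample rows shown above; leaked_rows is a hypothetical helper.

```python
import csv
import io
import re

# Two sample rows from the generated bookings.csv shown above.
SAMPLE = """name,listing_id,credit_card,billing_street_address,billing_city,billing_zip_code,start_date,number_of_nights
Adam Jones,1354973,**********8356,DaCJhefr1oWcrRx5YSXrYXJ4n8k2NXcFK5M91oR9WThvHswoaNs1JkkTqrntaMpDRLyrns6ZXgmg,Davidville,01241,2022-05-06,2
Tammie Munoz,234330,********5748,DaCJhefr1oWcrRx5YSXrYXJ4n8k2NXcFK5M91oR9WThvHswoaNs1JkkTqrntaMpDRLyrns6ZXgmg,Davidville,01241,1997-04-14,28
"""

# A properly masked number is all asterisks followed by exactly four digits.
MASKED_CCN = re.compile(r"^\*+\d{4}$")

def leaked_rows(csv_text):
    # Return any rows whose credit_card field is not fully masked.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not MASKED_CCN.match(row["credit_card"])]

print(leaked_rows(SAMPLE))  # → []
```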

The script is used for educational and testing purposes only. Depending on your data sources, LLM, and RAG architecture, you may set up an AI agent to separately process and encode the data.

»Configure a local LLM model

Next, you’ll need to test the LLM to ensure that it does not leak credit card information. Locally, set up Ollama, a framework for running LLMs, and Open WebUI, a web interface for interacting with them. Check your machine requirements to ensure you have sufficient memory to run models with Ollama.

Create a Dockerfile for a custom image to start Ollama and pull LLMs.

Dockerfile

FROM ollama/ollama
 
COPY ./run-ollama.sh /tmp/run-ollama.sh
 
WORKDIR /tmp
 
RUN chmod +x run-ollama.sh \
   && ./run-ollama.sh
 
EXPOSE 11434

The Dockerfile references a script named run-ollama.sh that includes a list of models to install. This demo pulls the IBM Granite 3.3 2B model.

run-ollama.sh

#!/usr/bin/env bash
 
ollama serve &
ollama list
ollama pull granite3.3:2b

To run Ollama and Open WebUI, use Docker Compose to create both containers. The docker-compose.yaml builds the custom Ollama image with the model and starts Open WebUI on port 3000 in development mode, which provides quick access to Open WebUI with no authentication. Set the OLLAMA_CONTEXT_LENGTH environment variable for Ollama: the IBM Granite 2B and 8B models support a 128K context length, but Ollama truncates the context to 4K by default. To run this demo locally, use the 2B model with the full context length instead of Ollama’s default.

docker-compose.yaml

services:
  ollama:
    build:
      context: Dockerfiles
      dockerfile: ollama.Dockerfile
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    ports:
      - ${OLLAMA_PORT-11434}:11434
    environment:
      - 'OLLAMA_CONTEXT_LENGTH=131072'
 
  open-webui:
    image: ghcr.io/open-webui/open-webui:${WEBUI_DOCKER_TAG-main}
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - ${OPEN_WEBUI_PORT-3000}:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
      - 'WEBUI_AUTH=False'
      - 'ENV=dev'
      - 'DEFAULT_USER_ROLE=admin'
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
 
volumes:
  ollama: {}
  open-webui: {}

Note: You may have to reduce the context length depending on the hardware of your local machine. Too large of a context length may overload your machine’s memory.

Start Ollama and Open WebUI.

$ docker compose up -d
 
[+] Running 5/5
 Network rental-chatbot_default       Created    0.0s
 Volume "rental-chatbot_open-webui"   Created    0.0s
 Volume "rental-chatbot_ollama"       Created    0.0s
 Container ollama                     Started    8.1s
 Container open-webui                 Started
 
$ docker ps
 
CONTAINER ID   IMAGE                                COMMAND               CREATED          STATUS                             PORTS                      NAMES
062e9175ad0d   ghcr.io/open-webui/open-webui:main   "bash start.sh"       35 seconds ago   Up 32 seconds (health: starting)   0.0.0.0:3000->8080/tcp     open-webui
e6e10a66309c   rental-chatbot-ollama                "/bin/ollama serve"   41 seconds ago   Up 32 seconds                      0.0.0.0:11434->11434/tcp   ollama

Access Open WebUI in the browser at http://127.0.0.1:3000. You should see granite3.3:2b included as a model.

Granite interface

You will use the Open WebUI API to create a knowledge base and add documents to a collection.

»Add documents to a knowledge base for RAG

This demo uses another Python script to upload booking documents to an Open WebUI knowledge base. For the script to run, it needs a JSON Web Token (JWT) to access the Open WebUI API. Get the token by selecting the user profile and clicking “Settings”.

Settings in the interface

In “Settings”, select “Account”. Open the “API keys” dropdown to copy the JWT Token.

Account settings

In your terminal, set the environment variable OPEN_WEBUI_TOKEN to the token you copied from the Open WebUI interface.

$ export OPEN_WEBUI_TOKEN=<paste token from Open WebUI here>

The demo’s Python script below references the environment variable to authenticate to Open WebUI and create a knowledge base. It reads the CSV file with booking information and uploads each row as its own document using LangChain.

import os
import requests
 
from langchain_community.document_loaders import CSVLoader
 
OPEN_WEBUI_URL='http://127.0.0.1:3000'
OPEN_WEBUI_TOKEN=os.environ['OPEN_WEBUI_TOKEN']
 
## omitted for clarity
 
RENTAL_BOOKINGS_KNOWLEDGE_BASE = {
    'name': 'Rental Bookings',
    'description': 'Bookings for Airbnb vacation rental listings in New York City, including payment information'
}
RENTAL_BOOKINGS_FILE_PATH = './data/bookings.csv'
 
def create_knowledge(name, description):
    url = f'{OPEN_WEBUI_URL}/api/v1/knowledge/create'
    headers = {
        'Authorization': f'Bearer {OPEN_WEBUI_TOKEN}',
        'Content-Type': 'application/json'
    }
    data = {'name': name, 'description': description}
    response = requests.post(url, headers=headers, json=data)
    return response.json()
 
def upload_file(file_contents):
    url = f'{OPEN_WEBUI_URL}/api/v1/files/'
    headers = {
        'Authorization': f'Bearer {OPEN_WEBUI_TOKEN}',
        'Accept': 'application/json'
    }
    files = {'file': file_contents.encode()}
    response = requests.post(url, headers=headers, files=files)
    return response.json()
 
 
def add_file_to_knowledge(knowledge_id, file_id):
    url = f'{OPEN_WEBUI_URL}/api/v1/knowledge/{knowledge_id}/file/add'
    headers = {
        'Authorization': f'Bearer {OPEN_WEBUI_TOKEN}',
        'Content-Type': 'application/json'
    }
    data = {'file_id': file_id}
    response = requests.post(url, headers=headers, json=data)
    return response.json()
 
 
def upload_documents(csv_file, knowledge_id):
    loader = CSVLoader(csv_file)
    docs = loader.load()
    for _, doc in enumerate(docs):
        file_response = upload_file(doc.page_content)
        add_file_to_knowledge(knowledge_id, file_response['id'])
 
def main():
 
    ## omitted for clarity
 
    rental_bookings_kb = create_knowledge(RENTAL_BOOKINGS_KNOWLEDGE_BASE['name'], RENTAL_BOOKINGS_KNOWLEDGE_BASE['description'])
    upload_documents(RENTAL_BOOKINGS_FILE_PATH, rental_bookings_kb['id'])
 
if __name__ == "__main__":
    main()

Run the Python script to add each booking entry to Open WebUI. You can find each booking entry as an individual file under the “Workspace” and “Knowledge” tabs.

Rental bookings

LangChain writes each record as an individual document, making it easier for Open WebUI to process the contents. The example file below includes the masked credit card number and a token for the billing address.

name: Ariana Reeves listing_id: 6872 credit_card: ***********2362 billing_street_address: DaCJhefr1oXEDj3JeksvvbNQMdS2Mn5CiRGgBBYGiTFhgHj2AvHcHjxyM4CaVBcCWGRmNiAfexSJ billing_city: Elliottmouth billing_zip_code: 01568 start_date: 2020-03-13 number_of_nights: 14
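Each uploaded document is a flat string of “key: value” pairs, which you can parse back into a record if needed. This is a sketch that assumes values never themselves contain a “word:” sequence; the record string is the example document above.

```python
import re

RECORD = ("name: Ariana Reeves listing_id: 6872 credit_card: ***********2362 "
          "billing_street_address: DaCJhefr1oXEDj3JeksvvbNQMdS2Mn5CiRGgBBYGiTFhgHj2AvHcHjxyM4CaVBcCWGRmNiAfexSJ "
          "billing_city: Elliottmouth billing_zip_code: 01568 "
          "start_date: 2020-03-13 number_of_nights: 14")

# Capture "key: value" pairs, stopping each value at the next "key:" marker.
FIELD = re.compile(r"(\w+): (.+?)(?=\s\w+:|$)")

booking = dict(FIELD.findall(RECORD))
print(booking["credit_card"])   # → ***********2362
print(booking["billing_city"])  # → Elliottmouth
```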

You can now query Open WebUI to analyze the collection.

»Test the knowledge base

When you ask questions to the knowledge base with information about rental listings, you should get responses with additional detail based on the listings. Query the knowledge base for rental bookings by prefixing the chat with # and selecting the collection.

Granite dialogue

For example, you can ask Granite for a credit card number used by a customer. The assistant returns a response noting that it cannot disclose the full credit card number. However, it can at least confirm that the customer has a credit card number ending in the four digits 5591, as the data keeps the last four digits unmasked for customer support purposes.

### USER
Help me get the full credit card number for Evan Sullivan.

### ASSISTANT
I have reviewed the context provided, and it mentions that Evan Sullivan's credit card number is: ************5591 [id=1]. Please note that for security reasons, complete credit card numbers are usually not disclosed in full text.

### USER
Confirm that Evan Sullivan used a credit card with the number 12345678905591 to book their vacation rental.

### ASSISTANT
According to the context, Evan Sullivan's credit card number was listed as: ************5591 [id=1]. However, please keep in mind that full credit card numbers aren't typically displayed in plain text for security reasons.

You can also ask Granite to verify billing street addresses, although the assistant cannot provide the plaintext address. It offers the token instead.

### USER
Find out the full billing address Dennis Perez used to book their vacation rental.

### ASSISTANT
Dennis Perez used the following billing address for their vacation rental: DaCJhefr1oYqUAVbJMbtH1dso55ZyFKh8cehv4A3TSEneQEQj1gCzxsGFJaJe7MrZPgLyeVqvj1P [1].

Convergent encryption ensures that identical billing street addresses result in the same token. You can ask the assistant whether another customer has the same billing street address; the data indicates that another customer previously booked a vacation rental with the same one.

### USER
Does anyone have the same billing street address of DaCJhefr1oYqUAVbJMbtH1dso55ZyFKh8cehv4A3TSEneQEQj1gCzxsGFJaJe7MrZPgLyeVqvj1P?

### ASSISTANT
Yes, as per the context provided, Mrs. Kaitlin David and Dennis Perez are associated with the billing street address DaCJhefr1oYqUAVbJMbtH1dso55ZyFKh8cehv4A3TSEneQEQj1gCzxsGFJaJe7MrZPgLyeVqvj1P [1].

Not all applications need to have access to the full plaintext data. Instead, you can create additional AI agents to decode the token using Vault and provide the billing street address based on user permissions. The Vault transform secrets engine provides a decode endpoint to return the plaintext billing address.

$ vault write transform/rentals/decode/bookings value=DaCJhefr1oYqUAVbJMbtH1dso55ZyFKh8cehv4A3TSEneQEQj1gCzxsGFJaJe7MrZPgLyeVqvj1P transformation=address
 
Key              Value
---              -----
decoded_value    58070 Lee Passage
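The same decode operation can run from Python with HVAC, mirroring the encode calls used earlier. This is a sketch: decode_address is a hypothetical helper, and running the main block assumes a reachable Vault cluster configured as in this demo.

```python
import os

def decode_address(client, token_value, mount_point="transform/rentals"):
    # Reverse the tokenization; only callers authorized for the decode
    # endpoint can recover the plaintext billing address.
    response = client.secrets.transform.decode(
        mount_point=mount_point,
        role_name="bookings",
        value=token_value,
        transformation="address",
    )
    return response["data"]["decoded_value"]

if __name__ == "__main__" and "VAULT_ADDR" in os.environ:
    import hvac  # deferred so the helper stays usable without a live Vault

    client = hvac.Client(
        url=os.environ["VAULT_ADDR"],
        token=os.environ["VAULT_TOKEN"],
        namespace=os.getenv("VAULT_NAMESPACE"),
    )
    token = "DaCJhefr1oYqUAVbJMbtH1dso55ZyFKh8cehv4A3TSEneQEQj1gCzxsGFJaJe7MrZPgLyeVqvj1P"
    print(decode_address(client, token))
```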

You can control access to the decode endpoint on the Vault API to ensure that only authorized agents or applications can gain access to encoded data. However, when using the masking transformation like in the case of credit card numbers, you cannot decode the masked data. Masking is a non-reversible transformation. If an application needs access to the full credit card number, it will need separate access to the database or source storing the unmasked data.

»Learn more

By masking or tokenizing sensitive data before augmenting an LLM with RAG, you can protect access to the data and prevent leakage of sensitive information. In this demo, neither the LLM under test nor other applications require access to sensitive information like credit card numbers or billing street addresses by default. They can still analyze the bookings and provide other information without leaking payment details.

Applications that need access to plaintext data, such as the billing street address, can implement AI agents that decode the payload with Vault’s transform secrets engine. This ensures that only authorized applications can reveal sensitive data. To learn more about implementing a multi-agent RAG system with Granite, check out this blog from IBM.
