Self-hosted PostgreSQL data masking

Give every engineer production-realistic data. Leak nothing.

PrivaCI is an open-source engine for PostgreSQL data masking. It streams your PostgreSQL database into a safe, referentially intact copy for staging, CI, and demos, entirely inside your own VPC. No PII ever leaves your network.

Licensed under the Elastic License 2.0 (ELv2). Read the docs or browse the source.

0 bytes of PII leave your VPC: no phone-home, runs offline
100 GB+ databases stream in bounded memory (one batch in RAM)
100% referential integrity: deterministic masking keeps FKs intact
Every column recorded in a tamper-aware audit log on the target

The problem

Real customer data does not belong in staging

Every team needs realistic data to test against — and most shortcuts to get it create risk you cannot take back.

Hand-rolled scripts rot

pg_dump plus sed/UPDATE scripts break the moment the schema changes, silently leak new PII columns, and nobody owns them.

Broken foreign keys break tests

Naive masking randomizes each table independently, so joins dangle and staging stops mirroring production behavior.

Copying prod to staging is a breach waiting to happen

Real customer data in lower environments is the most common avoidable compliance finding — and the hardest to undo.

SaaS maskers want your data

Most managed tools exfiltrate rows to their cloud to mask them. That is the exact thing you are trying to prevent.

How it works

Source database in, safe database out — in five passes

01

Introspect

Read the source catalog — tables, columns, foreign keys, partitions, and implied keys — straight from pg_catalog.

02

Replicate

Recreate the DDL on an empty target in dependency order, deferring constraints to break FK cycles.

03

Stream & mask

Stream rows out with binary COPY, pass them through an in-memory two-tier masking pipeline, and load them in foreign-key order.

04

Checkpoint

Commit a per-table checkpoint every batch so an interrupted run resumes exactly where it stopped.

05

Audit

Write every masking decision to the _privaci schema on the target for compliance review.

See it run

One command in your pipeline

Declare PII columns in YAML, dry-run against your schema, preview masked samples, then stream a referentially intact copy, followed by integrity verification and (on Commercial) a signed compliance report.

[Terminal recording: dry-run, preview masked samples, run the mask job, verify integrity, and export a signed compliance report]
docker run --rm ghcr.io/boundarylogic/privaci:latest --help
privaci generate-ci

See the CI ephemeral guide for a full GitHub Actions workflow.

Open source, auditable, yours to run

The whole engine is free. Forever.

Streaming, foreign-key integrity, PII auto-detection, deterministic masking, crash-safe resume, and the audit log are all open source under ELv2. You can read every line.

Deterministic, FK-safe faking

Masking is a pure function of (config, salt, row). The same input always maps to the same output, so related rows stay consistent across every table.
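One way to make masking a pure function of salt and value is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library; `pseudonymize`, `mask_email`, and the `domain` argument are hypothetical names for illustration, not PrivaCI's API or its actual algorithm.

```python
import hashlib
import hmac


def pseudonymize(salt: bytes, domain: str, value: str, hex_chars: int = 16) -> str:
    """Return a stable pseudonym for `value`, keyed by `salt`.

    `domain` names a family of columns that must agree, e.g. customers.email
    and orders.customer_email. Keying on the domain rather than the table
    makes both columns mask to the same token.
    """
    message = f"{domain}\x00{value}".encode("utf-8")
    digest = hmac.new(salt, message, hashlib.sha256).hexdigest()
    # Truncation keeps tokens short but raises collision odds (16 hex chars = 64 bits).
    return digest[:hex_chars]


def mask_email(salt: bytes, email: str) -> str:
    """Replace an email with a deterministic, obviously fake address."""
    return f"user_{pseudonymize(salt, 'email', email)}@example.com"


if __name__ == "__main__":
    salt = b"load-this-from-a-secret-store"
    parent = mask_email(salt, "ada@example.org")  # customers.email
    child = mask_email(salt, "ada@example.org")   # orders.customer_email (FK)
    print(parent == child)  # True: the join still matches after masking
```

Because the output depends only on the salt, the domain, and the input, a parent value and every foreign key pointing at it mask identically. A production engine also has to handle truncation collisions and preserve column types, which this sketch ignores.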

Constant-memory streaming

COPY TO STDOUT (binary) → mask → COPY FROM STDIN, both legs concurrent. At most one batch (default 10k rows) is ever in RAM.
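A simplified, sequential sketch of batch-at-a-time streaming, assuming rows arrive from any Python iterable rather than a real COPY stream. `batched` and `stream_masked` are hypothetical helpers; PrivaCI runs the read and write legs concurrently, which this sketch does not attempt.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def stream_masked(
    source: Iterable[Row],
    mask_row: Callable[[Row], Row],
    write_batch: Callable[[List[Row]], None],
    batch_size: int = 10_000,
) -> int:
    """Read, mask, and write one batch at a time; return the number of rows written.

    Peak memory is proportional to batch_size (one input batch plus its masked
    copy), independent of how many rows the source holds.
    """
    total = 0
    for batch in batched(source, batch_size):
        write_batch([mask_row(row) for row in batch])
        total += len(batch)
    return total
```

With a generator as the source, only the current batch is ever materialized, so a 25-row input with `batch_size=10` is written as batches of 10, 10, and 5.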

PII auto-detection

Scans column names, types, and pg_stats to classify PII with high/medium/low confidence. Ships patterns for email, SSN, phone, names, cards, secrets, and freeform text.
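A toy version of signal-based scoring, assuming three signals: a column-name pattern (required), a textual data type, and a flag saying sampled values from `pg_stats` matched the expected format. The patterns, the scoring rule, and the `classify` helper are hypothetical; PrivaCI's own library and thresholds may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical column-name patterns, far smaller than a real library.
NAME_PATTERNS = {
    "email": re.compile(r"e[-_]?mail"),
    "ssn": re.compile(r"(^|_)ssn($|_)|social_security"),
    "phone": re.compile(r"phone|mobile|(^|_)tel($|_)"),
    "name": re.compile(r"(first|last|full|given|family)_?name"),
    "card": re.compile(r"card_?(number|num|no)|cc_?num"),
    "secret": re.compile(r"password|passwd|secret|api_?key|token"),
}
TEXT_TYPES = {"text", "character varying", "character", "citext"}
LEVELS = {3: "high", 2: "medium", 1: "low"}


@dataclass
class Finding:
    column: str
    kind: str
    confidence: str  # "high", "medium", or "low"


def classify(column: str, data_type: str, stats_match: bool = False) -> Optional[Finding]:
    """Score a column by counting independent signals.

    Signals: the name matches a pattern (required), the type is textual, and
    `stats_match` (values sampled from pg_stats matched the kind's format).
    """
    name = column.lower()
    for kind, pattern in NAME_PATTERNS.items():
        if pattern.search(name):
            signals = 1 + (data_type in TEXT_TYPES) + bool(stats_match)
            return Finding(column, kind, LEVELS[signals])
    return None
```

A name-only hit such as an integer `phone_id` column scores low, which is the kind of candidate a human should confirm in the dry-run report.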

Two-tier masking pipeline

L1 deterministic rules plus L2 local spaCy NER for freeform text, all driven by one YAML file. Pattern libraries cover email, SSN, phone, cards, secrets, and more.
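The tiering can be sketched as a per-cell dispatch: a configured L1 rule wins, and only unconfigured freeform columns fall through to L2. In this hypothetical sketch the rule names are invented and the L2 step is a plain email regex standing in for spaCy NER, which it does not replicate; in PrivaCI the per-column choices come from the YAML file.

```python
import re
from typing import Callable, Dict, Optional

# Tier 1 (L1): deterministic per-column rules. Rule names here are hypothetical.
L1_RULES: Dict[str, Callable[[str], str]] = {
    "redact": lambda v: "REDACTED",
    # Short values are fully starred so nothing leaks when len(v) <= 4.
    "keep_last4": lambda v: "*" * len(v) if len(v) <= 4 else "*" * (len(v) - 4) + v[-4:],
}

# Tier 2 (L2) stand-in: PrivaCI uses local spaCy NER for freeform text.
# This regex only finds email addresses and is a placeholder, not NER.
EMAIL_IN_TEXT = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_cell(value: Optional[str], rule: Optional[str], freeform: bool) -> Optional[str]:
    """Apply the configured L1 rule if there is one, else L2 for freeform text."""
    if value is None:
        return None  # NULLs pass through unchanged
    if rule is not None:
        return L1_RULES[rule](value)
    if freeform:
        return EMAIL_IN_TEXT.sub("[EMAIL]", value)
    return value
```

Keeping L1 first means cheap, predictable rules handle structured columns, and the costlier NER pass runs only on text that no rule covers.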

Dry-run, verify, and drift gates

dry-run --report produces a reviewable plan before any write. verify audits a completed run without re-reading PII. detect-drift (Commercial) blocks CI when the schema changes.

Referential integrity by design

Topological table ordering loads parents before children; cycles are broken with deferred constraints. Sequences and identity columns are re-synced.
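Parents-first ordering is a topological sort over the FK graph. The sketch below uses Kahn's algorithm and, when it stalls on a cycle, defers the incoming FK edges of one table; `load_order` and its naive victim choice are assumptions for illustration, not PrivaCI's implementation. In PostgreSQL, deferred edges would typically be declared `DEFERRABLE INITIALLY DEFERRED` or added after the data is loaded.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent, child): child holds an FK to parent


def load_order(tables: List[str], fks: List[Edge]) -> Tuple[List[str], List[Edge]]:
    """Return a parents-first load order and the FK edges that must be deferred.

    Kahn's algorithm. When no table is ready, every remaining table sits on or
    behind a cycle, so we defer the incoming edges of one table and carry on.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    indegree = {t: 0 for t in tables}
    deferred: List[Edge] = []
    for parent, child in fks:
        if parent == child:
            deferred.append((parent, child))  # self-reference: defer, never blocks
            continue
        children[parent].append(child)
        indegree[child] += 1

    ready = deque(sorted(t for t in tables if indegree[t] == 0))
    remaining = set(tables)
    order: List[str] = []
    while remaining:
        if not ready:
            victim = min(remaining)  # deterministic, deliberately naive choice
            for parent, child in fks:
                if child == victim and parent != child and parent in remaining:
                    deferred.append((parent, child))
                    indegree[victim] -= 1
            ready.append(victim)
        table = ready.popleft()
        order.append(table)
        remaining.discard(table)
        for child in children[table]:
            if child in remaining:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return order, deferred
```

For example, `load_order(["a", "b"], [("a", "b"), ("b", "a")])` returns `(["a", "b"], [("b", "a")])`: `a` loads first and its FK to `b` is deferred. A real engine would pick the edge to defer more carefully, for instance preferring nullable FKs.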

Crash-safe resume

Per-batch checkpoints mean privaci resume continues from the last committed row — and refuses to resume if the source schema, config, or salt drifted.
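A resume guard can compare a fingerprint of the run inputs stored with the checkpoint against one computed at startup. This is a sketch under the assumption that schema DDL, config, and salt are available as plain values; `run_fingerprint`, `check_resumable`, and `DriftError` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict


class DriftError(RuntimeError):
    """Raised when a checkpoint no longer matches the current run inputs."""


def run_fingerprint(schema_ddl: str, config: Dict[str, Any], salt: bytes) -> str:
    """Hash every input that changes masked output, in a canonical form."""
    h = hashlib.sha256()
    h.update(schema_ddl.encode("utf-8"))
    h.update(b"\x00")  # separator so adjacent fields cannot run together
    # sort_keys makes the JSON (including nested dicts) independent of key order.
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    h.update(b"\x00")
    h.update(hashlib.sha256(salt).digest())  # only a digest of the salt goes in
    return h.hexdigest()


def check_resumable(saved: str, current: str) -> None:
    """Refuse to resume when schema, config, or salt differ from the checkpoint."""
    if saved != current:
        raise DriftError("schema, config, or salt changed since the checkpoint; start a fresh run")
```

Resuming with a different salt would mix two pseudonym mappings in one target and silently break joins, which is why refusing is the safer default.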

Audit trail for compliance

_privaci.runs and _privaci.audit_log record every run and every column decision, with a stable run identifier you can cite as compliance evidence.

Secure by default

Salt is required at startup (no silent default), PII never appears in logs, nothing is written to disk mid-run, and all SQL is parameterized.
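Fail-fast salt loading can be as simple as the sketch below. `PRIVACI_SALT` is a hypothetical variable name, not necessarily the one PrivaCI reads; the point is that a missing salt stops the process instead of falling back to a default.

```python
import os
from typing import Mapping


class ConfigError(RuntimeError):
    """Raised when required startup configuration is missing."""


def require_salt(env: Mapping[str, str] = os.environ, var: str = "PRIVACI_SALT") -> bytes:
    """Return the masking salt or fail fast; there is no fallback value."""
    value = env.get(var, "")
    if not value.strip():
        # The message names the variable only, never a value.
        raise ConfigError(f"{var} is not set; refusing to start without a salt")
    return value.encode("utf-8")
```

A silent default salt would make every deployment produce the same, guessable pseudonyms, so an empty or missing value is treated as an error.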

Drop into any CI

generate-ci emits ready-to-commit GitHub Actions, GitLab CI, or Kubernetes CronJob workflows with least-privilege secret handling built in.

Compare approaches

Enterprise platforms vs. what mid-market teams buy

Evaluating Delphix, Tonic.ai, Greenmask, PG Anonymizer, or hand-rolled scripts? The matrix compares deployment models and capabilities across hosted SaaS, enterprise DDM, OSS tools, and PrivaCI. Large orgs often standardize on enterprise data platforms; PrivaCI is built for SMB and mid-market teams who need referentially intact staging data, audit evidence, and CI gates without six-figure contracts.

Capability | pg_dump + scripts (hand-rolled) | PG Anonymizer (Postgres extension) | Greenmask (OSS CLI) | Hosted SaaS | Enterprise DDM | PrivaCI OSS | PrivaCI Commercial
Published pricing | Yes | Yes | Partial | Partial | No | Yes | Yes
No platform team required | Yes | Partial | Yes | Yes | No | Yes | Yes
Runs entirely in your VPC | Partial | Yes | Yes | Partial | Yes | Yes | Yes
Preserves foreign keys | No | Partial | Yes | Partial | Yes | Yes | Yes
Streams 100 GB+ in bounded memory | No | No | Partial | Partial | Partial | Yes | Yes
Auto-detects PII columns | No | Yes | Partial | Yes | Yes | Yes | Yes
Audit log of every change | No | Partial | No | Partial | Yes | Yes | Yes
Crash-safe resume | No | No | Partial | Partial | Yes | Yes | Yes
Schema-drift detection | No | Partial | Partial | Partial | Partial | No | Yes
FK-aware data subsetting | No | No | Yes | Partial | Yes | No | Yes
JSONB path masking | No | Partial | Partial | Partial | Partial | No | Yes
Signed compliance reports | No | No | No | Partial | Partial | No | Yes
CI preview & policy diff | No | No | No | Partial | Partial | No | Yes

Key: Yes = supported · Partial = partial support or extra setup required · No = not supported

Commercial

Enterprise-grade masking without enterprise procurement

Commercial v1 plugs into the same engine and adds tamper-evident compliance reports, FK-aware subsetting, JSONB path masking, schema-drift detection, and CI preview — from $99/mo on AWS Marketplace, not a six-figure DDM quote.

Answers

Frequently asked questions

Is it safe to run in our own VPC?

Yes. The engine runs entirely inside your network and never sends source data or PII to any external service. It does not phone home and works fully offline. The source is auditable under the Elastic License 2.0.

Does it preserve foreign keys and referential integrity?

Yes. Masking is deterministic from a salt, so the same input always maps to the same output. Foreign keys, composite keys, and cross-table relationships stay consistent, and sequences are re-synced.

How is this different from pg_dump plus some scripts?

PrivaCI streams large databases with bounded memory, auto-detects PII, preserves referential integrity, resumes after a crash, and writes an audit log of exactly what was masked — with no bespoke scripts to maintain as your schema evolves. Postgres extensions embed rules in the database; enterprise data platforms target larger org budgets and operating models.

Who is PrivaCI for?

SMB and mid-market engineering teams who need production-realistic Postgres staging data inside their VPC — with deterministic masking, FK integrity, and (on Commercial) signed reports and drift gates — at published Marketplace pricing, not a six-figure data-platform contract.

Is PrivaCI a Delphix alternative?

For Postgres-focused SMB and mid-market teams, often yes. Enterprise DDM platforms such as Delphix, Informatica Test Data Management, IBM InfoSphere Optim, Oracle Data Masking, and K2view serve large organizations with broad data estates and dedicated platform teams. PrivaCI targets the same masking outcomes — referential integrity, auditability, in-VPC execution — with a Postgres-native engine and published pricing for teams that do not need a full enterprise data platform.

How does PrivaCI compare to Tonic.ai?

Hosted synthetic-data platforms such as Tonic.ai, Gretel, Mostly AI, and Synthesized offer quick time-to-value for many teams. PrivaCI runs entirely inside your VPC: source data never leaves your network, masking is deterministic from your salt, and foreign keys stay intact across tables. Choose PrivaCI when in-VPC execution and Postgres referential integrity are requirements; evaluate hosted SaaS when a managed cloud workflow fits your security model.

How does PrivaCI compare to Greenmask or PG Anonymizer?

Greenmask is an open-source CLI for PostgreSQL masking; PG Anonymizer is a Postgres extension that applies masking rules inside the database. All three keep data in your environment. PrivaCI builds in PII auto-detection, crash-safe resume, and bounded-memory streaming, which the matrix above rates as partial or missing in at least one of the other two, and (on Commercial) adds signed compliance reports and CI drift gates. Use the matrix to compare the features your team needs beyond basic column transforms.

How big a database can it handle?

It is built for 100 GB+ sources. Binary COPY streaming keeps at most one batch (default 10,000 rows) in memory, so memory use depends on batch size and row width rather than on table size.

What happens if a run is interrupted?

Every batch commits a checkpoint inside the same transaction as the data write. privaci resume continues from the last committed row and refuses to resume if the source schema, config, or salt changed.

How does it find PII we didn't configure?

Auto-detect inspects column names, types, and pg_stats against a built-in pattern library, scoring each column high/medium/low. Run dry-run --report for a reviewable plan, or --strict-autodetect to fail CI on any uncovered PII column.
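The strict gate amounts to a set difference between what auto-detect found and what the config covers. This is a hypothetical sketch; the findings format (column name mapped to confidence) and the function names are assumptions, not PrivaCI's output format.

```python
import sys
from typing import Dict, List, Set

RANK = {"low": 0, "medium": 1, "high": 2}


def uncovered_columns(
    findings: Dict[str, str], configured: Set[str], min_confidence: str = "low"
) -> List[str]:
    """Detected PII columns at or above `min_confidence` that the config does not cover."""
    floor = RANK[min_confidence]
    return sorted(col for col, conf in findings.items() if RANK[conf] >= floor and col not in configured)


def strict_gate(findings: Dict[str, str], configured: Set[str]) -> int:
    """Return a CI exit code: 0 if every detected column is covered, else 1."""
    missing = uncovered_columns(findings, configured)
    for col in missing:
        # Column names only; values are never printed.
        print(f"uncovered PII column: {col}", file=sys.stderr)
    return 1 if missing else 0
```

Using the lowest threshold by default matches the "any uncovered PII column" behavior; raising `min_confidence` trades safety for fewer false alarms.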

What is open source versus commercial?

The full masking engine — streaming, FK integrity, auto-detect, L1/L2 masking, dry-run, verify, resume, and the audit log — is open source under ELv2. Commercial v1 adds signed compliance reports, drift detection, FK-aware subsetting, JSONB path masking, CI preview, notifiers, and Marketplace entitlement.

What database engines are supported?

PostgreSQL today, including partitioned tables, identity/serial columns, and deferred constraints. The architecture is Postgres-native by design.