Reproduce a customer bug without production access
When a bug only reproduces against one customer's data, the shortest path to a fix is access to that data. Access to production isn't on the table. Shared staging is stale and probably doesn't contain that tenant anyway. The DBA export will arrive next week. Meanwhile the ticket sits.
This guide covers a specific alternative: a one-button CI job that reads from your DR warm standby, subsets to one tenant, masks PII, and hands back a connection string to a clean repro database inside the VPC. No production access. No manual export. No six-week-old data.
The schema you're actually debugging
A typical B2B billing repro touches organisations, users, subscriptions,
invoices, line items, payment methods, and an audit_events table
with JSONB payloads. Miss one hop in the FK graph and your repro lies to you.
CREATE TABLE organizations (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_slug text NOT NULL UNIQUE,
legal_name text NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE users (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
organization_id bigint NOT NULL REFERENCES organizations(id),
email text NOT NULL,
full_name text NOT NULL,
role text NOT NULL DEFAULT 'member'
);
CREATE TABLE subscriptions (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
organization_id bigint NOT NULL REFERENCES organizations(id),
plan_code text NOT NULL,
status text NOT NULL
);
CREATE TABLE invoices (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
subscription_id bigint NOT NULL REFERENCES subscriptions(id),
issued_at timestamptz NOT NULL,
total_cents bigint NOT NULL,
currency char(3) NOT NULL DEFAULT 'USD'
);
CREATE TABLE invoice_line_items (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
invoice_id bigint NOT NULL REFERENCES invoices(id),
sku text NOT NULL,
quantity int NOT NULL,
unit_cents bigint NOT NULL
);
CREATE TABLE payment_methods (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id),
last_four char(4) NOT NULL,
provider_token text NOT NULL
);
CREATE TABLE audit_events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
organization_id bigint NOT NULL REFERENCES organizations(id),
occurred_at timestamptz NOT NULL,
payload jsonb NOT NULL
);
The bug lives in the join between invoices and invoice_line_items, but the
smoking gun is often in audit_events.payload — nested JSON with
actor emails and support notes. Regex over payload::text corrupts
the JSON and burns an afternoon. A proper JSONB masker handles this at the path
level.
Why a manual pg_dump script breaks
The usual workaround: dump organizations WHERE tenant_slug = 'acme-corp', manually add users, forget invoice_line_items, run sed
over emails, restore to staging. Monday morning the engineer reproduces
something, but totals still don't match because half the child rows never
made the trip — and the payment_methods for those users weren't masked
consistently with their user records.
PrivaCI Commercial subsetting starts from a root predicate and walks foreign
keys transitively: subscriptions because they reference the org, invoices because
they reference subscriptions, line items because they reference invoices, and
so on. The masking is deterministic from your salt — the same real value always
produces the same fake within a given field, and across fields when you configure
a shared seed_alias. Reruns with the same salt stay stable.
Read from the DR warm standby, not primary
SOURCE_DB_URL points at the warm standby in another AZ — the same
instance you'd fail over to during an outage, streaming replication caught up
to within seconds. You're not adding read load to primary, and you're not asking
a DBA for a one-off export. The CI job runs inside the VPC with credentials
that already exist for DR drills; no rows leave your network.
One practical note: if replication lag spikes during a bad deploy, treat it like any other DR concern and pause repro jobs until the standby catches up. You're debugging billing, not chasing ghosts from stale WAL.
Subsetting to organisation 451 (Acme)
Predicates are operator-authored SQL, not free-form input. Validate the org ID is numeric before it touches any config.
# commercial-extensions.yaml
version: "1.0"
subset:
- table: public.organizations
predicate: "id = 451"
- table: public.audit_events
predicate: "organization_id = 451 AND occurred_at >= now() - interval '90 days'"
The organizations root pulls subscriptions, invoices, line items,
users, and payment methods via FK expansion automatically.
audit_events has a FK to organizations too, so it
would also be included — but without a root predicate of its own, the closure
would pull Acme's entire event history. Declaring it as a second root lets you
replace that with a bounded predicate. The organization_id = 451
condition isn't optional: a predicate with only the time bound would scope the
closure to all tenants' events in that window, which is the opposite of what
you want.
Masking scalars and JSONB paths
Column rules live in mask-rules.yaml; JSONB path rules live
in
commercial-extensions.yaml alongside subsetting.
# mask-rules.yaml (excerpt)
version: "1.0"
global_salt: env://ANONYMIZATION_SALT
tables:
public.users:
strategy: transform
columns:
email: { action: fake, provider: email, seed_alias: user_email }
full_name: { action: fake, provider: full_name }
public.payment_methods:
strategy: transform
columns:
provider_token: { action: hash }
last_four: { action: static, value: "9999" } provider_token is hashed deterministically — the masked value is
stable across reruns with the same salt, which matters if your repro spans multiple
tables referencing the same token. last_four is a display-only
char(4) field; there's no business logic that depends on it being a realistic
digit string, so static is the right call over inventing a fake
that might suggest false precision.
# commercial-extensions.yaml (json_mask section)
json_mask:
- column: public.audit_events.payload
paths:
- path: $.actor.email
action: fake
provider: email
seed_alias: user_email
- path: $.actor.ip
action: hash
- path: $.internal.support_notes
action: remove
Shared seed_alias: user_email on the scalar column and the JSON
path keeps users.email and payload.actor.email in
sync when the audit log copies the same normalized string. Alignment assumes the scalar rule is not applying uniqueness suffixes (
is_unique) — JSON paths have no equivalent — and that casing
matches between column and payload (
Jane@Acme.com vs jane@acme.com diverge).
You only need that alignment when the debugging workflow compares email
strings across tables. Most billing repros trace by numeric ID — invoice
totals against line items via invoice_id, payment state via user_id, org scope via organization_id. If your
payload stores actor.user_id, join audit events to users on that key and skip email correlation entirely. Use seed_alias when the payload carries a copied email (or username,
token, etc.) with no stable FK key in the JSON.
support_notes is removed entirely: free-text fields in audit logs
frequently contain customer-identifying context that a fake provider can't reliably
neutralise.
Wiring it as a self-service job
Scaffold a starting workflow with privaci generate-ci, then
add the dispatch input and input validation. The engineer on call fills in
one field; the rest runs unattended.
# .github/workflows/support-repro-db.yml
name: support-repro-db
on:
workflow_dispatch:
inputs:
organization_id:
description: "Numeric organization id (no slug, no quotes)"
required: true
type: string
jobs:
build-repro:
runs-on: ubuntu-latest
env:
SOURCE_DB_URL: ${{ secrets.DR_WARM_STANDBY_URL }}
TARGET_DB_URL: ${{ secrets.SUPPORT_REPRO_DB_URL }}
ANONYMIZATION_SALT: ${{ secrets.ANONYMIZATION_SALT }}
PRIVACI_MARKETPLACE_PRODUCT_CODE: ${{ secrets.PRIVACI_MARKETPLACE_PRODUCT_CODE }}
steps:
- uses: actions/checkout@v4
- name: Validate org id
run: |
[[ "${{ inputs.organization_id }}" =~ ^[0-9]+$ ]] || \
{ echo "organization_id must be numeric"; exit 1; }
- name: Render subset predicate
run: |
sed -i "s/id = 451/id = ${{ inputs.organization_id }}/g" \
commercial-extensions.yaml
sed -i "s/organization_id = 451/organization_id = ${{ inputs.organization_id }}/g" \
commercial-extensions.yaml
- name: Mask tenant slice
run: |
docker run --rm \
-e SOURCE_DB_URL -e TARGET_DB_URL -e ANONYMIZATION_SALT \
-e PRIVACI_MARKETPLACE_PRODUCT_CODE \
-v "$PWD/commercial-extensions.yaml:/config/commercial-extensions.yaml:ro" \
-v "$PWD/mask-rules.yaml:/config/mask-rules.yaml:ro" \
ghcr.io/boundarylogic/privaci-commercial:1.0.1 \
privaci run \
--config /config/mask-rules.yaml \
--commercial-extensions /config/commercial-extensions.yaml
- name: Post connection string to Slack
if: success()
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
run: |
curl -s -X POST https://slack.com/api/chat.postMessage \
-H "Authorization: Bearer $SLACK_BOT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"channel": "#support-repro",
"text": "Repro DB ready for org ${{ inputs.organization_id }}. DSN in 1Password → support-repro. TTL: 48h."
}'
A few things to note in this config: the image is the Commercial build (ghcr.io/boundarylogic/privaci-commercial:1.0.1
— pin the version tag from your Marketplace subscription in production). The sed substitution is blunt but effective for a trusted internal
tool — the numeric validation above it is what makes it safe. The connection
string itself goes to a private Slack channel with a 48-hour TTL reminder; the
DSN lives in 1Password, not in the message. Before merging any change to mask-rules.yaml, run a policy-diff gate in PR CI with strict_autodetect: true so new columns need an explicit rule. Commercial
v1.0.1+ writes the diff JSON before exiting non-zero — see the CI ephemeral guide
for a full check-masking-policy job.
privaci preview --config mask-rules.yaml --policy-diff /tmp/policy-diff.json What the engineer gets
The target database starts empty. After the job completes: something like 40,000 rows instead of 400 million. The engineer connects with whatever SQL client they use day to day. The invoice totals still disagree with the line items — good, that's the bug.
-- FK integrity on the target before handoff
SELECT COUNT(*) AS orphan_line_items
FROM invoice_line_items li
LEFT JOIN invoices i ON i.id = li.invoice_id
WHERE i.id IS NULL;
-- The bug: invoice totals vs line items (trace by ID — no email needed)
SELECT i.id,
i.total_cents,
SUM(li.quantity * li.unit_cents) AS computed_cents
FROM invoices i
JOIN subscriptions s ON s.id = i.subscription_id
JOIN invoice_line_items li ON li.invoice_id = i.id
WHERE s.organization_id = $1
GROUP BY i.id, i.total_cents
HAVING i.total_cents <> SUM(li.quantity * li.unit_cents);
-- Optional: email alignment when payload copies actor.email (needs seed_alias)
SELECT u.email,
ae.payload #>> '{actor,email}' AS payload_actor_email
FROM audit_events ae
JOIN users u ON u.email = ae.payload #>> '{actor,email}'
WHERE ae.organization_id = $1
LIMIT 10;
The first query should return zero — a quick FK orphan check on the target
before you post "repro ready". It catches incomplete closure (a root
predicate that didn't match, for example). privaci verify is built for full-database runs where source and
target row counts should match; on a single-tenant slice from a multi-tenant
primary, use this orphan query instead of verify as your pass/fail gate.
The second query is usually where the ticket lives: mismatched totals
traced entirely through foreign keys. The third is optional — run it only
when your workflow joins audit payloads to user rows by email string and
you've configured matching seed_alias on both rules (see JSONB masking — seed_alias).
How long does it actually take?
For a mid-size tenant — a few thousand invoices, a year of audit events — expect 10–20 minutes end to end, most of which is the closure query walking the FK graph and the masked write. A whale tenant with dense JSONB audit history can run longer; the engine streams rather than buffers, so memory stays flat regardless. The job is slower than "query prod directly" and faster than waiting until Tuesday for a DBA export. That is the correct comparison.
What PrivaCI doesn't handle here
Provisioning the empty target. The engine expects an empty writable Postgres database and writes the masked graph into it. Spinning that up — an RDS snapshot restore, an ephemeral Cloud SQL instance, a Postgres pod in a preview namespace — is Terraform or your platform team's domain.
Teardown after TTL. Masked data accumulates if you don't enforce expiry. Auto-drop the repro database after 48 hours. This is policy, not product.
Composite FK edge cases. Multi-column foreign keys traverse normally in both directions. The one exception is pulling root-table rows referenced by a composite FK on a non-root table — that path is skipped and logged as a warning, so check job logs if a root table referenced through a composite key looks underpopulated. See the Commercial subsetting docs.
What you need
- PrivaCI OSS engine — masking, streaming, verify, deterministic salt, audit log on the target database.
- PrivaCI Commercial — FK-aware subsetting and JSONB path masking. Required for this workflow.
- Network path — CI runner to DR standby (read) and to the empty repro target (write), both inside the VPC.
- Secrets —
ANONYMIZATION_SALT(32+ random bytes, treated as a production credential), DR read DSN, repro target DSN, Marketplace product code.
Commercial plans · Engine quickstart · Full-database staging mask · Ephemeral CI test database