Anonymize your PostgreSQL production databases in minutes
Anonyx connects to your production PostgreSQL instances and produces anonymized copies ready for development, testing and analytics. Referential constraints, indexes, custom types and statistical consistency are preserved. No application code changes required.
- PostgreSQL 12 to 16 supported, self-hosted and cloud (RDS, Aurora, Cloud SQL, Azure)
- Foreign key and unique constraint integrity preserved automatically
- 80+ ready-to-use transformations (emails, SSN, IBAN, phone, geolocation)
- Deterministic anonymization by default to preserve inter-table correlation
- Native PostgreSQL types preserved: JSONB, ARRAY, HSTORE, ENUM, RANGE
- EU sovereign hosting, GDPR compliance built-in
How Anonyx anonymizes a PostgreSQL database
Anonyx establishes two standard PostgreSQL connections: a read connection to the source database (production or snapshot) and a write connection to a target database (typically empty, created by your team). You define anonymization rules at the project level - by column, by table, or by naming pattern - and Anonyx orchestrates the full pipeline.
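To make the three rule scopes concrete, here is a minimal sketch of how a rule could be resolved for a given column. The rule list, the transformation names (fake_email, redact, fake_phone) and the resolve_rule helper are hypothetical illustrations, not the actual Anonyx configuration format.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rule list: the first matching pattern wins, so entries are
# ordered from most specific to least specific.
RULES = [
    ("users.email", "fake_email"),   # a single column
    ("audit_log.*", "redact"),       # every column of one table
    ("*.*_phone", "fake_phone"),     # a naming pattern across all tables
]

def resolve_rule(table: str, column: str, rules=RULES) -> Optional[str]:
    """Return the transformation name for table.column, or None if no rule applies."""
    qualified = f"{table}.{column}"
    for pattern, transformation in rules:
        if fnmatchcase(qualified, pattern):
            return transformation
    return None
```

In this sketch, a column with no matching rule returns None, which corresponds to the "no explicit rule" case.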
The engine introspects the PostgreSQL schema through information_schema and pg_catalog, identifies constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK), triggers and materialized views, then computes the optimal processing order to avoid integrity violations when writing to the target. Columns not covered by an explicit rule are copied verbatim or flagged as likely PII via automatic detection.
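One common way to derive a safe write order from foreign keys is a topological sort of the table dependency graph. The sketch below uses Python's standard graphlib module on a hypothetical four-table schema; it illustrates the general technique and is not a description of Anonyx internals.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FK_DEPENDENCIES = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

def load_order(dependencies):
    """Return a write order in which every referenced table precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())

order = load_order(FK_DEPENDENCIES)
```

graphlib raises CycleError when tables reference each other in a loop, so circular foreign keys would need separate handling, for example deferred constraints.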
Anonymization is deterministic by default: for a given input value and a given project salt, the anonymized output is always the same. This property is what keeps joins across tables intact: the value user_id = 4218 in orders is transformed into the same target value as id = 4218 in users, so the anonymized database stays relationally consistent.
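A keyed hash is one standard way to obtain this kind of stable mapping. The sketch below uses HMAC-SHA256 from the Python standard library; the pseudonymize_id function, the salt value and the reduction to a 32-bit range are assumptions made for illustration, and the actual Anonyx transformation may work differently.

```python
import hashlib
import hmac

def pseudonymize_id(value: int, salt: bytes, modulus: int = 2**31 - 1) -> int:
    """Map an integer key to a stable pseudonym using HMAC-SHA256 keyed by the salt."""
    digest = hmac.new(salt, str(value).encode("utf-8"), hashlib.sha256).digest()
    # Reduce to a positive integer that fits a PostgreSQL INTEGER column.
    # Collisions are possible in this toy version; key columns would need
    # a collision-free mapping in practice.
    return int.from_bytes(digest[:8], "big") % modulus + 1

salt = b"example-project-salt"  # hypothetical project salt
users_id = pseudonymize_id(4218, salt)        # users.id
orders_user_id = pseudonymize_id(4218, salt)  # orders.user_id
```

Both calls return the same value because the output depends only on the input and the salt, which is why the join between orders and users survives.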
Supported PostgreSQL types
Anonyx natively understands the standard PostgreSQL types: integers, floats, arbitrary-precision numerics, text, dates and timestamps with or without time zone, UUID, MACADDR, INET and CIDR. Semi-structured JSONB and JSON values are parsed and can be anonymized in depth via JSON Path. PostgreSQL arrays (text[], integer[]) are processed element by element. Enumerated types (ENUM) are preserved or remapped based on your policy.
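The idea of in-depth anonymization can be sketched with a small recursive walker. For brevity, this example uses a simplified list of keys rather than real JSON Path syntax, and the transform_at helper is hypothetical; it only shows how nested objects and arrays can be traversed element by element.

```python
from typing import Any, Callable

def transform_at(doc: Any, path: list[str], fn: Callable[[Any], Any]) -> Any:
    """Return a copy of doc with fn applied at path; lists are processed element by element."""
    if isinstance(doc, list):
        return [transform_at(item, path, fn) for item in doc]
    if not path:
        return fn(doc)
    if isinstance(doc, dict):
        key, rest = path[0], path[1:]
        return {k: transform_at(v, rest, fn) if k == key else v for k, v in doc.items()}
    return doc  # the path does not exist in this branch; leave the value untouched

profile = {"name": "Ada", "contacts": [{"email": "ada@example.com"}, {"email": "a@example.org"}]}
masked = transform_at(profile, ["contacts", "email"], lambda _: "redacted@example.invalid")
```

Only the email values under contacts change; every other key is carried over unchanged and the input document is not mutated.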
For geospatial types (PostGIS), Anonyx provides dedicated transformations: coordinate fuzzing within a configurable radius, grouping into administrative cells, or replacement with the administrative center. HSTORE values are treated as key-value pairs. Range types (INT4RANGE, DATERANGE, TSTZRANGE) are anonymized while preserving interval width when needed.
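As an illustration of coordinate fuzzing, the sketch below moves a point to a random location inside a disc of a given radius. The fuzz_coordinate function and the flat-earth approximation it relies on are assumptions chosen for readability; this is not the PostGIS-based implementation.

```python
import math
import random

METERS_PER_DEGREE_LAT = 111_320.0  # approximation, adequate for small radii

def fuzz_coordinate(lat, lon, radius_m, rng):
    """Move a point to a uniformly random location within radius_m meters of the original."""
    # Taking the square root spreads points evenly over the disc instead of
    # clustering them near the center.
    distance = radius_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

rng = random.Random(42)  # seeded so that the example is reproducible
new_lat, new_lon = fuzz_coordinate(48.8566, 2.3522, 500.0, rng)
```

The longitude correction divides by the cosine of the latitude, so this simplified version is not suitable for points close to the poles.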
Performance and scaling
Anonyx leverages PostgreSQL native parallelism: source reads are partitioned by primary key range (or by time slice for time-series tables), and multiple workers process distinct partitions simultaneously. On databases up to 100 GB, full anonymization typically completes in 8 to 40 minutes depending on rule complexity and worker count.
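Partitioning by primary key range can be sketched as follows. The key_ranges helper is a hypothetical illustration that splits an integer key space into contiguous slices, one per worker; each slice would then translate into a query such as WHERE id BETWEEN low AND high.

```python
def key_ranges(min_id: int, max_id: int, workers: int) -> list[tuple[int, int]]:
    """Split [min_id, max_id] into at most `workers` contiguous, non-overlapping inclusive ranges."""
    total = max_id - min_id + 1
    size, extra = divmod(total, workers)
    ranges, start = [], min_id
    for i in range(workers):
        # The first `extra` ranges take one more key so the remainder is spread out.
        length = size + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((start, start + length - 1))
        start += length
    return ranges

ranges = key_ranges(1, 1_000_000, 4)
```

Equal key ranges do not guarantee equal row counts when identifiers are sparse, so a production scheduler would likely balance on row estimates instead.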
For larger databases (up to 5 TB on the Business plan), Anonyx supports incremental mode: only rows modified since the last run are re-processed, using change tracking based on an updated_at column, the xmin system column, or custom triggers. CDC (Change Data Capture) mode via logical replication is in beta for real-time streams.
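The updated_at variant of incremental mode boils down to a watermark comparison. The sketch below works on in-memory rows to stay self-contained; the rows_to_reprocess helper and the row layout are hypothetical, and a real run would push the same filter into the source query.

```python
from datetime import datetime, timezone

def rows_to_reprocess(rows, last_run_at):
    """Keep rows changed after the previous run and return the next watermark."""
    changed = [row for row in rows if row["updated_at"] > last_run_at]
    # If nothing changed, the watermark stays where it was.
    next_watermark = max((row["updated_at"] for row in changed), default=last_run_at)
    return changed, next_watermark

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 3, 5, tzinfo=timezone.utc)},
]
changed, watermark = rows_to_reprocess(rows, datetime(2024, 2, 1, tzinfo=timezone.utc))
```

An updated_at column cannot reveal hard deletes, which is one reason trigger-based tracking or logical replication may be preferable for some tables.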
Integration into your workflow
Anonymization runs as a job that you can trigger manually from the Anonyx UI, schedule with a cron expression (Quartz syntax), or drive through the API. Native CI/CD integrations cover GitHub Actions, GitLab CI, Jenkins and Bitbucket Pipelines, with templates provided to refresh your test database before every integration suite run.
On the observability side, each run emits webhook events (success, failure, duration, processed volume) and standard Prometheus metrics. Structured JSON logs can be ingested into your ELK, Datadog or Grafana Loki stack without additional transformation.