Question 1

What is the difference between anonymization and pseudonymization?

Accepted Answer

Pseudonymization (GDPR art. 4(5)) replaces identifiers with pseudonyms, but the data can still be attributed to a specific person using additional information kept separately (a key, a mapping table): it remains personal data and stays within the scope of the GDPR. Anonymization, as described in recital 26, means the person is no longer identifiable by any means reasonably likely to be used: the data then falls outside the scope of the GDPR. To assess whether that bar is met, the Article 29 Working Party (the EDPB's predecessor) set out three criteria in Opinion 05/2014: singling out, linkability and inference.
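To make the art. 4(5) distinction concrete, here is a minimal Python sketch of keyed pseudonymization. It assumes HMAC-SHA256 as the pseudonymization function and a hypothetical `SECRET_KEY` standing in for the separately kept "additional information"; the records and helper names are made up for illustration, and this is not a description of how Anonyx pseudonymizes data. It shows the two properties that keep such data personal: the mapping is deterministic, and whoever holds the key can re-identify a record.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical secret key: the "additional information" of art. 4(5),
# which must be stored separately from the pseudonymized dataset.
SECRET_KEY = b"store-me-separately"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed pseudonym (HMAC-SHA256, hex digest)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(pseudonym: str, candidates: List[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """With the key, re-identification is just recomputing pseudonyms for known identifiers."""
    for candidate in candidates:
        if hmac.compare_digest(pseudonymize(candidate, key), pseudonym):
            return candidate
    return None


records = [
    {"email": "alice@example.com", "zip": "75011", "purchase": 42.0},
    {"email": "bob@example.com", "zip": "69003", "purchase": 17.5},
]
# Only the direct identifier is replaced; the other columns are untouched.
pseudonymized = [{**r, "email": pseudonymize(r["email"])} for r in records]

# The key holder can reverse the pseudonym, so this is still personal data.
print(reidentify(pseudonymized[0]["email"], ["bob@example.com", "alice@example.com"]))
# alice@example.com
```

A keyed HMAC is used rather than a plain hash because an unkeyed hash of an email address can be reversed by anyone who hashes a list of candidate addresses; with a key, only the key holder can do that.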

Question 2

Is pseudonymization enough for my test environments?

Accepted Answer

It is often a good starting point: pseudonymization is a data minimization and security measure named in GDPR articles 25 and 32, and it limits the impact of a leak. But the data remains personal data under the GDPR: your obligations (legal basis, security, retention) still apply in test environments. To go further, you need to reduce the re-identification risk itself, which is what the k-anonymity controls in Anonyx measure and enforce.
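Why pseudonymization alone does not take test data out of scope can be sketched as a linkage attack: the direct identifier is replaced, but the quasi-identifiers left in clear still match an external source. The rows, the external list and the `link` helper below are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical pseudonymized test extract: names are gone, but zip code,
# birth date and gender (quasi-identifiers) are still in clear.
test_rows = [
    {"pseudonym": "p_001", "zip": "75011", "birth": "1984-03-02", "gender": "F", "diagnosis": "asthma"},
    {"pseudonym": "p_002", "zip": "75011", "birth": "1990-07-15", "gender": "M", "diagnosis": "diabetes"},
    {"pseudonym": "p_003", "zip": "69003", "birth": "1984-03-02", "gender": "F", "diagnosis": "none"},
]

# Hypothetical external source an attacker could obtain (e.g. a public directory).
external = [
    {"name": "Alice Martin", "zip": "75011", "birth": "1984-03-02", "gender": "F"},
    {"name": "Claire Petit", "zip": "69003", "birth": "1984-03-02", "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth", "gender")


def link(rows, directory, qi=QUASI_IDENTIFIERS):
    """Join two datasets on quasi-identifiers; a unique match re-identifies a row."""
    index = defaultdict(list)
    for person in directory:
        index[tuple(person[c] for c in qi)].append(person["name"])
    matches = {}
    for row in rows:
        names = index.get(tuple(row[c] for c in qi), [])
        if len(names) == 1:  # exactly one candidate: the row is singled out
            matches[row["pseudonym"]] = names[0]
    return matches


print(link(test_rows, external))
# {'p_001': 'Alice Martin', 'p_003': 'Claire Petit'}
```

`p_002` stays unmatched only because this external list happens not to contain that person: the risk depends on what data exists outside the dataset, not just on what the dataset contains.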

Question 3

What is k-anonymity?

Accepted Answer

A dataset is k-anonymous if every combination of quasi-identifier values (zip code, birth date, gender…) that appears in it is shared by at least k records, so that each record is indistinguishable from at least k-1 others. The higher k is, the harder it is to single out an individual. Anonyx detects quasi-identifiers, computes the equivalence classes (groups of rows sharing the same quasi-identifier values) on every run and applies the policy you choose: report, generalize, suppress or fail the run.
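The k computation itself is simple enough to sketch. The Python below assumes rows are plain dicts and uses hypothetical helper names and a toy dataset: it groups rows into equivalence classes, takes k as the size of the smallest class, and mirrors the four policies named above. It is not Anonyx's implementation; in particular, the one-step zip truncation stands in for whatever generalization hierarchy a real tool would apply.

```python
from collections import Counter

# Hypothetical quasi-identifier columns for this toy example.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def class_key(row, qi=QUASI_IDENTIFIERS):
    """The combination of quasi-identifier values that defines a row's class."""
    return tuple(row[c] for c in qi)


def equivalence_classes(rows, qi=QUASI_IDENTIFIERS):
    """Map each quasi-identifier combination to the number of rows sharing it."""
    return Counter(class_key(r, qi) for r in rows)


def k_of(rows, qi=QUASI_IDENTIFIERS):
    """k is the size of the smallest equivalence class (0 for an empty dataset)."""
    classes = equivalence_classes(rows, qi)
    return min(classes.values()) if classes else 0


def generalize_zip(rows, keep=2):
    """One generalization step: keep the first `keep` digits of the zip code."""
    return [{**r, "zip": r["zip"][:keep] + "*" * (len(r["zip"]) - keep)} for r in rows]


def suppress(rows, k, qi=QUASI_IDENTIFIERS):
    """Drop every row whose equivalence class has fewer than k members."""
    classes = equivalence_classes(rows, qi)
    return [r for r in rows if classes[class_key(r, qi)] >= k]


def apply_policy(rows, k, policy, qi=QUASI_IDENTIFIERS):
    """Hypothetical switch over the report / generalize / suppress / fail policies."""
    current = k_of(rows, qi)
    if current >= k:
        return rows
    if policy == "report":
        print(f"dataset is {current}-anonymous, target is {k}")
        return rows
    if policy == "generalize":
        # A single step; a real tool would keep generalizing until k is met.
        return generalize_zip(rows)
    if policy == "suppress":
        return suppress(rows, k, qi)
    if policy == "fail":
        raise ValueError(f"dataset is {current}-anonymous, target is {k}")
    raise ValueError(f"unknown policy: {policy}")


rows = [
    {"zip": "75011", "birth_year": 1984, "gender": "F"},
    {"zip": "75012", "birth_year": 1984, "gender": "F"},
    {"zip": "75011", "birth_year": 1984, "gender": "F"},
    {"zip": "69003", "birth_year": 1990, "gender": "M"},
]

print(k_of(rows))                                 # 1
print(k_of(apply_policy(rows, 2, "generalize")))  # 1: "69***" is still unique
print(k_of(apply_policy(rows, 2, "suppress")))    # 2, with two rows left
```

On this toy data, truncating zip codes alone leaves k at 1 because the `69***` row is still unique, while suppressing the two singleton rows reaches k = 2 at the cost of half the rows. That trade-off between losing precision and losing rows is why the policy is a choice rather than a fixed rule.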

Question 4

Who decides whether my data is anonymous under the GDPR?

Accepted Answer

The data controller, ideally together with their DPO. The qualification depends on the dataset, the rules applied and the context (what external data could be cross-referenced). Anonyx provides the technical evidence: the re-identification risk report of each run documents the k threshold reached, the quasi-identifiers handled and the rows affected.
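As an illustration of the kind of evidence such a report can carry, the sketch below assembles the three items named above (k reached, quasi-identifiers handled, rows affected) into a JSON document. The `risk_report` function, its field names and the data are hypothetical; this is not Anonyx's actual report format.

```python
import json
from collections import Counter


def risk_report(rows, qi, k_target):
    """Summarize re-identification risk: k reached, quasi-identifiers, rows below target."""
    keys = [tuple(row[c] for c in qi) for row in rows]
    sizes = Counter(keys)
    return {
        "quasi_identifiers": list(qi),
        "k_target": k_target,
        "k_reached": min(sizes.values()) if sizes else 0,
        "equivalence_classes": len(sizes),
        "rows_total": len(rows),
        # Indices of rows whose equivalence class is smaller than the target k.
        "rows_affected": [i for i, key in enumerate(keys) if sizes[key] < k_target],
    }


rows = [
    {"zip": "75011", "birth_year": 1984, "gender": "F"},
    {"zip": "75012", "birth_year": 1984, "gender": "F"},
    {"zip": "75011", "birth_year": 1984, "gender": "F"},
    {"zip": "69003", "birth_year": 1990, "gender": "M"},
]

report = risk_report(rows, ("zip", "birth_year", "gender"), k_target=2)
print(json.dumps(report, indent=2))
```

A controller or DPO reading such a report still has to judge the context, for instance which external sources could be cross-referenced: the numbers support the qualification, they do not make it.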

Anonymization vs pseudonymization: what actually matters for the GDPR

What the GDPR says

The three Article 29 Working Party criteria (Opinion 05/2014)

What Anonyx covers, concretely

How to qualify your dataset

Measure the re-identification risk of your test data