We are delighted to inform you that an automatic text anonymisation tool has just been added to the free set of Natural Language Processing (NLP) tools offered through the EC language tools portal, which is available to all public administrations, small and medium-sized enterprises, non-governmental organisations and academic bodies in the EU.

Automatic text anonymisation is a valuable addition to the existing NLP toolkit (automated translation, speech transcription, named-entity recognition, and classification). Whilst not yet 100% accurate, the tool provides useful support to users who need to anonymise data so that it no longer falls within the scope of the GDPR, thereby protecting data subjects and easing the administrative burden on entities that would otherwise be considered data controllers.

Indeed, pursuant to the GDPR, the processing of personal data is subject to many rules and requirements that the data controller must comply with. Among other things, personal data may be processed only if a legal basis has been identified, must be deleted at the end of the retention period, and may be accessed only on a restricted basis.

If you want to keep documents and data that contain personal data, this text anonymisation tool could be what you need. It automatically helps you identify proper nouns, dates, places and numbers, and either replaces them with invented details (replacement) or blacks them out (obfuscation), so that the anonymised document can be stored or shared without GDPR concerns.
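
For readers who would like to see the difference between the two modes in concrete terms, here is a minimal Python sketch. It is an illustration only and not the tool's actual implementation, which is not described here: the function name `anonymise`, the patterns and the invented values are all assumptions, and the sketch handles only simple dates and numbers. Proper nouns and places would need named-entity recognition, which is out of scope for this example.

```python
import re

# Hypothetical pattern for illustration: dates written as DD/MM/YYYY, or plain numbers.
# The date alternative comes first so that a date is not split into separate numbers.
PATTERN = re.compile(r"(?P<date>\b\d{1,2}/\d{1,2}/\d{4}\b)|(?P<number>\b\d+\b)")

# Invented stand-in values used in "replacement" mode (assumed, not the tool's own).
INVENTED = {"date": "01/01/2000", "number": "12345"}


def anonymise(text: str, mode: str = "obfuscation") -> str:
    """Black out or replace dates and numbers found in the text."""
    if mode not in ("obfuscation", "replacement"):
        raise ValueError(f"unknown mode: {mode}")

    def handle(match: re.Match) -> str:
        if mode == "obfuscation":
            # Black out the match with block characters of the same length.
            return "█" * len(match.group())
        # Replace the match with an invented value of the same kind.
        return INVENTED[match.lastgroup]

    return PATTERN.sub(handle, text)


sample = "Invoice 4521 was issued on 12/03/2021."
print(anonymise(sample, "obfuscation"))
print(anonymise(sample, "replacement"))
```

In obfuscation mode the sample keeps its layout but the invoice number and date are blacked out. In replacement mode the text remains readable, with invented values in place of the original ones.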

The anonymisation tool allows more data to be made available for data sharing and trading, which is the aim of the upcoming European Language Data Space. However, as with automated translation, please be aware that where complete accuracy is needed, your automatically processed text should still be subject to human review.

The main advantages of the anonymisation tool:

  • helps you share confidential documents,

  • helps you remove personal data, so that your documents could fall outside the scope of GDPR requirements,

  • is available in all 24 EU official languages,

  • has an easy-to-use interface through the EC's language tools website,

  • offers two options: blacking out or replacing the anonymised data,

  • supports most common file formats (Word, Excel, PowerPoint, PDF …),

  • grants fully secured access through HTTPS direct download,

  • is available through an API for specific users (e.g., public administration).

Try it yourself: https://language-tools.ec.europa.eu/