Many information and communication needs of our society can benefit from advanced language technologies, notably those based on deep learning. Documents and other textual data need to be maximally accessible to citizens across the EU, which requires them to be available in a multitude of languages. One challenge in this respect is that, to store and share their documents in compliance with the EU’s General Data Protection Regulation (GDPR), organisations have to process them so that the individuals described in them can no longer be identified, i.e. they have to remove personally identifiable information. Given the scale at which information is produced in various domains, de-identification efforts should be supported by tools that automate part of the process and present their results to domain experts for review.