ELRC Experience Café at COTSOES New Technologies Working Group


On 23 May 2018, ELRC was invited to speak at and work with the COTSOES New Technologies Working Group. The lively discussions focused on (i) the sharing of language resources for machine translation (MT), not only with ELRC but also among public bodies in general, and (ii) the training of MT systems, in particular neural machine translation (NMT).

Following the opening presentation on ELRC and eTranslation by Andrea Lösch (DFKI), the group discussed several issues that prevent public services from sharing their language resources (LRs). Where the original translations contain personal data, anonymisation proved to be a good way to make them shareable. However, it also became evident that several language resources still cannot be shared because they contain confidential information. To make use of such resources for MT training anyway, one option might be to share only the corresponding language models, which could then be used to adapt the MT system.

On the training of NMT, Marcis Pinnis, an MT expert from Tilde, presented different training scenarios for MT systems. Ideally, the customer has sufficient in-domain data. But even when this is not the case, Pinnis illustrated how domain adaptation can still be achieved with a smaller in-domain corpus or, if no translations are available at all, how the system can be trained using back-translation, i.e. by translating monolingual in-domain texts to create synthetic parallel data.

