In collaboration with LT-Bridge, ELRC is organising a WMT21 shared task on European low-resource multilingual translation, which will focus on multilinguality in the cultural heritage domain for two Indo-European language families, i.e. North-Germanic and Romance.
Massively multilingual machine translation has shown impressive capabilities, including zero-shot and few-shot translation of low-resource languages. However, these models are often trained and evaluated on translation from or into English, where the most data is available, on the assumption that they generalise to other pairs and to low-resource languages.
With our shared task on multilingual low-resource translation, we want to explore how information in one language can be transferred to related languages. To do so, we evaluate translation quality on low-resourced language pairs while explicitly encouraging the use of data from high-resourced language pairs in the same family. We want to find out to what extent English and/or Spanish are required to obtain high-quality machine translation output for related languages. If it turns out that translation of low-resourced languages can actually be improved by transfer learning, we also want to jointly identify the best ways of combining the available data.
The shared task will be divided into two subtasks: Europeana thesis abstracts translation (North-Germanic languages from/to Icelandic, Norwegian Bokmål and Swedish) and Wikipedia cultural heritage articles translation (Romance languages from Catalan to Occitan, Romanian and Italian). The evaluation period will run from 29 June to 6 July 2021.
Further information and all important deadlines are provided here. This joint effort is only possible thanks to the support of the Government of Catalonia and the EC-funded actions ELRC and LT-Bridge, and we are very much looking forward to putting a spotlight on these special language combinations with you!