ELRC as Best Practice Example for Enabling On-demand Machine Translation

Saarbrücken, Germany


In his recent post, Common Sense Advisory (CSA Research) analyst Arle Lommel stresses that “the path to statistical MT success is paved with big data”. Organisations that want to develop good machine translation (MT) engines therefore need to collect enough language resources to train their systems. According to CSA Research, two general approaches can be used to collect such data: harvesting open data sources and mining one’s own translations.

With ELRC, the European Commission has undertaken an unprecedented language data collection effort. It is a first step in supporting the adaptation of CEF AT, the Automated Translation platform of the Connecting Europe Facility, to better respond to the needs of public services across all EU Member States, Iceland and Norway. To kick off data collection, ELRC is organising local workshops in each of the 30 countries. The workshops aim to understand the needs of national public sector administrations with regard to automated translation, to jointly identify relevant sources of multilingual language resources and, where relevant, to discuss potential technical and/or legal issues involved in the use of such data for automated translation.

The fact that public sector information is governed by the PSI Directive (Directive 2003/98/EC on the re-use of public sector information) benefits ELRC’s mission. Although personal information is in any case protected from public disclosure, the PSI Directive paves the way for sharing re-usable public sector data between national public administrations and the EC in a way that private sector organisations cannot match because of data access and privacy laws.

ELRC thus presents a unique opportunity for participating countries to shape the automated translation platform CEF AT according to their translation needs. Lommel concludes: “Because the EC can centralize these datasets and build on its expertise in MT, it will provide services of a quality and breadth that no member state could develop on its own. The results help meet the EC’s internal needs, but will also support the member states’ requirements for translation, thus giving them an incentive to contribute.” For further information and to read Arle Lommel’s complete post, please visit the CSA Research blog.