Newsletter Issue #1
Discover the European Language Resource Coordination (ELRC)
The main goal of the ELRC initiative is to provide relevant language resources (LRs) to help improve the quality, coverage and performance of the European Commission’s CEF (Connecting Europe Facility) eTranslation solutions in the context of current and future CEF digital services. The purpose is to overcome existing language barriers within Europe and, in particular, within the different European digital service infrastructures (DSIs). The aim is a fluid multilingual exchange of information for EU citizens.
Bearing this in mind, ELRC manages, maintains and coordinates the provision of LRs in all official languages of the EU and CEF associated countries. The EC uses these resources to improve the eTranslation platform and to put it at the service of the DSIs and their users.
For further details on ELRC’s work please visit http://lr-coordination.eu/
A general definition for “language resources” can be found here: http://lr-coordination.eu/resources
The reasons for gathering language resources for the purpose of Machine Translation in the public sector (why language resources are needed and how they can be used in the EU MT platform), as well as details on the types of data useful for MT training and where to look for them are further detailed here.
The ELRC-SHARE Repository
A major outcome of the initial 2-year ELRC initiative (SMART 2014/1074) is the ELRC-SHARE repository.
The ELRC-SHARE repository application currently offers all functionalities related to data sharing. Such functionalities include:
notification by a data provider about the existence of a language resource and provision of basic metadata (resource name, resource type, language),
uploading and storage of the actual resource,
documentation with extended metadata elements following the META-SHARE metadata schema,
search and browse of the resource inventory,
download of the resource, depending on the rights of use assigned to it.
What do ELRC Network and ELRC Data do?
As a follow-up to the initial ELRC project, two new initiatives, ELRC Network and ELRC Data, started in December 2016 for a 3-year duration under the SMART 2015/1091 programme. Both follow up on ELRC’s objectives and apply different means to provide the necessary language resources to the EC.
ELRC Network focuses on the collection of LRs, raising awareness through the organisation of a new series of workshops and conferences. It also provides the technical infrastructure (ELRC-SHARE repository) to host collected LRs as well as legal and technical support through an online Helpdesk and dedicated services for IPR clearance.
ELRC Data aims to implement the acquisition and production of additional LRs as well as their related processing services (e.g. data conversion, anonymization, etc.). Data are stored and documented in the ELRC-SHARE repository and IPR clearance is assured by the legal help from the ELRC Network.
In parallel to the identification, collection, production and processing tasks, ELRC Data also offers an on-site assistance service (see below) for public administrations and other data providers needing some help to process their data.
On-site Assistance Service
Public administrations and other data owners that possess language resources, but need assistance in processing these resources, can request on-site assistance from the ELRC consortium. A member of the ELRC consortium with specialist knowledge of the specific language processing issue will travel to meet representatives of your organisation on-site and provide the required assistance.
The ELRC consortium has extensive experience in providing assistance with language resource processing, particularly in areas such as pre-processing requirements, data formats and data management workflows. Much of it comes from the consortium’s earlier work on LR collection and processing.
Data processing activities and practical help in applying better data management practices are the main focus of the on-site consultancy requests.
The consortium will perform the task of on-site assistance in close collaboration with the network of National Anchor Points (NAPs) in CEF countries. The NAP members will be engaged to support consortium partners in providing assistance to data owners in the relevant countries of each NAP, as well as to assist in language-specific support issues.
In the Services section of the ELRC website, stakeholders are given multiple options for services provided by the ELRC consortium, including:
However, for any other query which is not clearly covered by these items, stakeholders are invited to either send a request for on-site assistance or address themselves to the ELRC-Data experts by sending their question to email@example.com. The experts will get in touch with the stakeholder(s) to discuss their questions or needs.
First Year Results for ELRC Network and ELRC Data
The European Commission’s online machine translation service is called MT@EC. The service produces translations into and from any official EU language. MT@EC is currently available to public administrations in any EU country, Iceland, or Norway, as well as to all EU institutions and agencies.
The major achievements for both actions financed under the SMART 2015/1091 programme in their first year of work can be summarised as follows:
354 data sets were collected into the ELRC-SHARE repository. 225 data sets were provided to the EU in the previous SMART 2014/1074 project, 91 of them being made available as open data sets, and 129 other data sets are being analysed from a legal and technical point of view. Out of these, 100 sets were processed by ELRC partners (within ELRC Data) to comply with the technical requirements of the project and supplied to the EU to be used for the enhancement of the eTranslation platform.
81 bilingual language resources have been built out of crawled data within ELRC Data.
Over 150 language data repositories have been identified, which will be used as additional sources for language resource collection (monolingual and parallel data, as well as terminologies) in the coming months.
The distribution of LRs collected by the ELRC can be seen here:
The on-site assistance service has been put into place to provide language resource providers/owners or language resource users with assistance and expertise at their site. For further details on this service, you can go here.
The ELRC infrastructure (website, legal and technical helpdesk, ELRC-SHARE repository) was enhanced and is being run so as to better meet current project requirements.
The following events were organised to raise awareness about the importance of data sets and encourage participants to contribute to the collection of the LRs needed: three workshops (in Dublin, Ireland, 13 October 2017, in Athens, Greece, 18 October 2017, and in Warsaw, Poland, 18 December 2017), the 3rd ELRC conference (7-8 November 2017, Brussels, Belgium), and three ELRC Experience Cafés (at the eSENS Final Conference, Brussels, Belgium, 2-3 March 2017, at the EULITA Conference, Vienna, Austria, 30-31 March 2017, and at the Future Congress for State and Administration, Berlin, Germany, 20-21 June 2017).
The Language Resource Board (LRB), the governance body of ELRC, is being run with a total of 60 members (Technology and Public Service National Anchor Points in each of the 30 participating countries). Following the series of meetings initiated by the LRB within the former ELRC action, two further LRB meetings were organised in 2017: the 4th took place in Berlin on 28 March, and the 5th was held in Brussels on 7 November, in conjunction with the 3rd ELRC conference.