Frequently Asked Questions – and their Answers

What is going to happen to the data we provide?

Provision of data: How? Why? What are our data used for?

The data will go to the European Commission (DG Translation) to help improve the machine translation system eTranslation.

Why should we (public institutions) actually provide data?

Provision of data: How? Why? What are our data used for?

Supporting your own language is supporting Europe and vice versa. Only with your help and the provision of your language resources can CEF eTranslation be tailored to your needs. Within the CEF programme, CEF eTranslation is available for free to public administrations in all EU member states and CEF affiliated countries (Iceland and Norway). In return for your data, you receive a better service.

We (public institutions) don’t have any data for you! We work only on paper. We outsource our translations.

Provision of data: How? Why? What are our data used for?

If translations are outsourced, you should ask for the translated data to be delivered together with the translation memories. Make sure to negotiate the delivery of translation memories with the language service provider in advance.

We cannot just share our data with you – they are confidential!

Provision of data: How? Why? What are our data used for?

Most data held by the public sector is public data. Administrations provide various types of information online to the citizens (e.g. news, legal texts, official communications, interviews, brochures, background information, etc.). This information can also be available in a foreign language. In Germany, for instance, on the website of the national government, all information is provided in German, English, and French.

How can I upload my data to the repository?

Provision of data: How? Why? What are our data used for?

You can upload data to the ELRC repository in three simple steps:

1. Register (new user) or log in (returning user)

2. Provide a basic description of the language resource (title, short description, language(s))

3. Upload the .zip file

For further instructions, please read the Walkthrough for Contributors and/or contact the helpdesk.
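
Step 3 expects a single .zip file. The following is a minimal sketch, in Python, of how a folder of language files could be bundled into one archive before upload. The function name `package_resource` and the list of file extensions are assumptions made for illustration only; they are not requirements of the ELRC repository, so please check the Walkthrough for Contributors for the formats actually accepted.

```python
import zipfile
from pathlib import Path


def package_resource(source_dir, archive_path, extensions=(".tmx", ".xliff", ".txt")):
    """Bundle matching files from source_dir into one .zip archive.

    The extensions are illustrative assumptions, not an official list.
    Returns the archive-internal names of the files that were added.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and files with other extensions.
            if path.is_file() and path.suffix.lower() in extensions:
                name = path.relative_to(source).as_posix()
                archive.write(path, arcname=name)
                added.append(name)
    return added
```

Calling, for example, `package_resource("my_translations", "resource.zip")` on a hypothetical folder `my_translations` would create `resource.zip` and return the list of files it contains, which can be checked before uploading.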

Why should I care about translations and get hold of/keep corresponding language data?

Managing and harvesting language data - why and how?

Whether you translate your material internally or outsource it, re-using language data from previous translations can make your process more cost-effective while improving the quality of the output.
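
To illustrate what re-use of previous translations means in practice, here is a small Python sketch of a translation memory lookup: a new sentence is compared with previously translated sentences, and the stored translation of the closest match is proposed. This is only an assumed, simplified model; the function name `best_match`, the similarity threshold of 0.75 and the sample sentence are hypothetical, and real translation memory tools use more refined matching.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Return (source, translation, score) for the closest stored segment.

    memory maps previously translated source sentences to their translations.
    Returns None if no stored sentence reaches the similarity threshold.
    """
    best = None
    for source, translation in memory.items():
        # Similarity between 0.0 (nothing in common) and 1.0 (identical).
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, translation, score)
    return best


# Hypothetical memory with a single previously translated sentence.
memory = {"The office is closed on Monday.": "Das Büro ist am Montag geschlossen."}
suggestion = best_match("The office is closed on Tuesday.", memory)
```

Here `suggestion` holds the stored Monday sentence and its German translation, which a translator would only need to adapt rather than translate from scratch.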

How should I manage my data and why? We don’t have any infrastructure or resources (especially small translation services)!

Managing and harvesting language data - why and how?

In the public sector, translation management is very diverse: it ranges from paper-based workflows to digitized workflows with term lists and translation memory storage.
From an organizational point of view, even small changes in dealing with language data can bring considerable benefits. The following actions can be taken without major effort:

  • Analysis of all phases of data development
  • Based on this, creation of a “data management plan” (DMP), even a very basic one:

    • Which data is important?
    • Where is it stored?
    • Can it be further processed?
  • Document all relevant data
  • If possible, use the web as an additional publication channel and reap the benefits of linked data (see http://www.w3.org/DesignIssues/LinkedData.html)
  • See the presentation “Best practice for the future: Capitalize on your valuable data”

What is Open Data?

Managing and harvesting language data - why and how?

Open data is data that can be freely accessed, used, re-used, modified and disseminated by anyone for any purpose, subject at most to requirements that preserve provenance and openness. The most important characteristics of open data are:

  • Availability and access: The work shall be available as a whole, at a cost no higher than the cost of reproduction, preferably for free download on the Internet. The work should also be available in an appropriate and modifiable form.

  • Reuse and subsequent use: The data must be made available under conditions that permit reuse, subsequent use and linking with other data sets. The data shall be machine-readable.

  • Universal participation: Everyone must be able to use, reuse and subsequently use the data. There must be no discrimination against specific fields of action, persons or groups. The subsequent use may not be limited to individual areas (e.g. only in education), nor may certain types of use (e.g. for commercial purposes) be excluded.

What are Open Licenses?

Managing and harvesting language data - why and how?

In general, an Open License is a license that grants permission to access, reuse, and redistribute a work with few or no restrictions. The exact permissions granted depend on the full text of the open license used. Different projects can require different permissions or restrictions, and there are a number of different licenses to accommodate these different uses. A list of the most common open licenses can be found on the Open Knowledge Licenses page. Creative Commons licenses have evolved into an international standard for open licensing.

Does the share-alike requirement in CC licenses apply to translations?

Managing and harvesting language data - why and how?

Copyleft is a clause in copyright licenses that obliges the licensee to license any modification of the work under the license of the original work. The copyleft clause is intended to prevent modified versions of the work from being distributed with restrictions on use that the original does not have. Since a translation is an adaptation, a translation of a work licensed under such a share-alike clause must also be licensed under the license of the original work.

If I have access to a multilingual website, can I lawfully create a multilingual language resource out of it? Can I then share it under an Open License?

Managing and harvesting language data - why and how?

The mere fact that a website is online does not provide any information about the copyright status of its content. For the content of a multilingual website, it is therefore also necessary to check how the content is licensed.

The legal boundaries of text and data mining (TDM) in the EU must, however, be considered separately. In some countries, including the US, Israel, Singapore, Taiwan and the Republic of Korea, “fair use” can be invoked against claims of copyright infringement arising from the use of TDM techniques; in others, such as Japan, TDM activities are generally permitted and prohibited only in exceptional cases. In the EU, by contrast, the use of such techniques currently generally requires the consent of the right holder where it involves long-term intermediate storage. The UK introduced early on a special copyright exception that allows those with lawful access to perform text and data analysis for non-commercial research, but the legal handling of TDM is still unclear in the rest of the EU. The forthcoming copyright reform is expected to make the legal framework more concrete and to classify TDM as a separate form of use.

In some countries (e.g. in Germany), official documents are expressly excluded from copyright. Does this mean that they can be considered public domain also in countries that do not have such a limitation (e.g. in the UK)? And vice versa, in countries

Managing and harvesting language data - why and how?

Within the EU, foreign and supranational official works are treated according to the respective national law. This follows from the internationally applicable principle that the law of the country for which protection is sought applies, and from the territoriality principle. Similarly, in the U.S., according to Section 105 of the Copyright Act, works of the U.S. Government are not entitled to domestic copyright protection and are therefore considered public domain in the U.S. These official documents are freely used in international practice.

What is CEF eTranslation?

CEF AT, eTranslation and translation needs in the public administration

eTranslation is the European Commission’s machine translation system. It is an online service with a web user interface in 24 languages for human use. It can also be used as a web service in a machine-to-machine scenario. It guarantees confidentiality of data. Any Member State administration, small and medium-sized enterprise (SME) and university language faculty in an EU country, Iceland or Norway can use it free of charge at least until 2020. As part of CEF Digital, eTranslation provides automatic translation services with the goal of making any digital service accessible to any EU citizen in his/her own language. European public online services such as Europeana, the Open Data Portal, the Online Dispute Resolution Platform, etc. should benefit from CEF eTranslation. More on eTranslation and on CEF.

 

Further information available here.

How can we access eTranslation?

CEF AT, eTranslation and translation needs in the public administration

eTranslation can be used by any Member State administration.

It can be accessed as follows:

  • Staff working for EU institutions or agencies can directly access eTranslation with their EU Login (formerly ECAS) credentials and therefore do not need to register.
  • Staff working for a public administration, a small or medium-sized enterprise or a university language faculty in an EU country, Iceland or Norway can self-register here.
  • Individual access rights are automatically deactivated after 12 months if not used.

Further information available here.

Why would we need eTranslation? We have human translators!

CEF AT, eTranslation and translation needs in the public administration

eTranslation can substantially help make the translation process more productive and more efficient. EC translators are responsible for translating content into all official EU languages. In total, more than 7,000 translators working for DG Translation and EU institutions translated more than 2.3 million pages in 2014.

eTranslation is used daily for French, Spanish, Portuguese and Italian to produce initial translations that are then post-edited very efficiently. For other languages (e.g. German), the quality level of the output is still too low.

In the last year, however, significant progress has been achieved through domain-specific engines. For domain-specific reports and texts, the quality of eTranslation’s output is acceptable. In other cases, the tool can rapidly scan long texts in a foreign language and point out passages to be translated by humans.

Overall, the translation quality is directly related to the availability of good quality data in the language: if the data for MT is good, then the MT system will be good.

eTranslation can provide SMEs with a cost-effective solution for translating, for instance, daily conversations with foreign clients, while they continue to use human translators for complicated texts that need perfect interpretation and understanding.

 

Why should we support eTranslation – we can have our own national solution?

CEF AT, eTranslation and translation needs in the public administration

Typically, national or proprietary solutions target a particular range of topics, whereas the scope of eTranslation is broader and more comprehensive. By supporting eTranslation, participants can expect to have access to a broader service.

For SMEs that do not possess proprietary translation solutions, eTranslation can be a cost-effective translation tool able to provide first impressions of a wide variety of texts.

Machine translation is directly opposed to our national policy that young people should learn foreign languages.

CEF AT, eTranslation and translation needs in the public administration

Not necessarily. Machine translation can actually provide a good basis for learning languages.

Initially, it can be used to bridge the gap for people who cannot speak a particular language until they acquire initial language skills.

For instance, at university level, machine translation can be used to provide automatic and simultaneous translations of lectures for foreign students who have not yet mastered the language.

Machine translation will never work for our languages (e.g. Estonian, Finnish, Hungarian and other morphologically rich languages).

CEF AT, eTranslation and translation needs in the public administration

Processing certain languages with the current MT technologies is more difficult because of, for example, their rich morphology or their free constituent order. MT experts are working on new MT solutions based on neural networks that are better suited to these languages. Moreover, the European Commission funds several actions (see e.g.) to investigate MT solutions for languages which currently receive only sub-optimal MT support.

However, regardless of the methodology, huge amounts of parallel resources are needed for the implementation of the systems, since these systems rely on machine learning.

By opening up access to SMEs, we will be able to collect data for “under-resourced” or morphologically rich European languages, improve the quality of translations and extend the domains that eTranslation covers.

The texts that we need to translate are confidential. What happens to the texts that were submitted for translation? Do you keep them or are they deleted afterwards?

CEF AT, eTranslation and translation needs in the public administration

All texts are deleted within 24 hours, unless the user requests that they be deleted immediately after the request has been processed. Users can choose to have texts deleted immediately after download (“delete after download”), and can retrieve the output directly from the interface (tab “my translation requests”) instead of receiving it via email (“e-mail me my translation”).

Privacy Statement:

By registering to use this application, you are consenting to eTranslation’s use of personal data as described below. eTranslation records the login, time of access, languages requested, size of document submitted for translation and the domain of your email address (…@ec.europa.eu) to enable access to the service and processing of requests, as well as for statistical purposes. These are kept for 18 months and then archived. If you choose to have your document returned by e-mail, your e-mail address will be kept until the document is sent. It will subsequently be reduced to its domain only. Users should exercise their judgement when submitting potentially sensitive documents to any online service, including eTranslation. Documents submitted remain available for 24 hours, after which they are deleted. There is a “delete after delivery” option which, if ticked, results in the text being deleted immediately after it is delivered. Data will not be shared with third parties. Should you wish to raise any concerns about eTranslation’s use of personal data, please write to DGT-MT@ec.europa.eu.

Are translations protected by copyright? If so, who holds the copyright? How about machine translation?

CEF AT, eTranslation and translation needs in the public administration

a) Are translations protected by copyright?

According to internationally binding regulations (e.g. the Berne Convention for the Protection of Literary and Artistic Works), translations, among other alterations, are protected as original works without prejudice to the copyright in the original work.

b) If so, who holds the copyright?

Since the copyright in an adaptation is a separate right equivalent to that in the original work, the translator is subject to the same legal regime as the author of the original work. Although it is a separate right, the copyright in the translation depends on the copyright in the original work, because the translator can only use the translation if the author of the original work has given consent.

For employees who translate or post-edit machine-translated documents, the legislation can vary from country to country. Some countries provide for a direct transfer of copyright from the employee to the employer; others do not. Employment contracts should therefore contain a clause providing for the transfer to the employer of copyright in works created during working hours. Consult a lawyer for help in drafting and applying such a clause.

c) How about machine translation?

In general, a translation produced by a machine is not a work capable of copyright protection. Only the code of the translation program is protectable. Nonetheless, if an author uses machine translation as a supporting tool that provides suggestions, but the translation is still the result of the author’s own intellectual act of creation, copyright will still apply.