COVID-19 Initiative: First Results and Upcoming Actions
To support fast information exchange and accurate communication in all official EU languages as well as Icelandic and Norwegian, the European Language Resource Coordination (ELRC) is contributing to a collaborative COVID-19 initiative. The initiative is supported by leading European MT companies, universities, research centres and networks, including CLARIN ERIC, ELRA, the Universities of Padua, Utrecht and Lisbon, LIMSI and Pangeanic S.L.
In recent weeks, the initiative has made substantial progress in collecting language resources and tools to support the development of applications and services related to the COVID-19 pandemic. The ELRC COVID-19 repository currently holds 97 items in TMX format: 93 bilingual (EN to X, X to EN) and 4 multilingual parallel text corpora. Further collections are in progress; they will include consolidated multilingual corpora (e.g. the Global Voices Corpus) as well as material from news and media providers. Once validated, all language resources will be available in the ELRC-SHARE repository under the Creative Commons Attribution-ShareAlike 4.0 International License.
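For readers who want to work with such corpora, the following minimal Python sketch shows how aligned sentence pairs could be read from a TMX file using only the standard library. The function name `read_tmx_pairs`, the sample content and the plain language codes `en` and `de` are illustrative assumptions, not taken from the ELRC repository; actual files may use region-qualified codes such as `en-GB` and would then need matching adapted accordingly.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from TMX content.

    Hypothetical helper for illustration; language codes are compared
    in lower case and must match exactly.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup elements.
                segments[lang] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


# Made-up sample content, used only to demonstrate the structure.
SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands regularly.</seg></tuv>
      <tuv xml:lang="de"><seg>Waschen Sie sich regelmäßig die Hände.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Stay at home.</seg></tuv>
      <tuv xml:lang="fr"><seg>Restez chez vous.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = read_tmx_pairs(SAMPLE, "en", "de")
```

In this sketch only the first translation unit contains both requested languages, so `pairs` holds a single English-German pair; the second unit is skipped because it has no German variant.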
Beyond this, ELRC is contributing to the upcoming COVID-19 MLIA Eval effort, which starts in June 2020 and will, among other things, provide the training and test corpora for the three tasks defined in the initiative (Information Extraction, Multilingual Semantic Search, and Machine Translation). This community evaluation effort was initiated as part of the CLEF Initiative. It aims to accelerate the creation of resources and tools for improved multilingual information access (MLIA), with particular reference to a general-public use case that includes information on social, economic or political aspects of the pandemic, such as self-isolation, social distancing, and school closures and reopenings.
The initiative evaluates tools incrementally over three rounds on the three main tasks mentioned above, so that progressively consolidated tools and resources can be released.
- Information Extraction: DFKI and LIMSI
- Multilingual Semantic Search: University of Padua and CLARIN ERIC
- Machine Translation: Pangeanic S.L.
Given the scale of the COVID-19 crisis, the need for language resources and tools is enormous. Everyone is invited to support the initiative by sharing relevant resources. More information on how to contribute can be found here.
Details on the COVID-19 MLIA Eval effort are available on the initiative’s homepage.