They have shared a corpus of over half a million phrases

On the back of this development, the three organisations have generated the first bilingual news corpus in Basque and Spanish. This will be a key resource for the development of automatic translation systems between the two languages.

The diversity of topics covered by the news corpus, together with its size, will lead to a significant improvement in the quality of the automatic translation of Basque. The corpus contains over two and a half million sentence pairs in the two languages, covering subjects such as national and international politics, culture, and sport.

The resource was created using ground-breaking methods that automatically search for similar phrases in news items in both languages, and was developed within the framework of R&D projects financed by the Basque government’s Department of Competitiveness and Development (GAITEK and HAZITEK programmes). The corpus has also been shared on the META-SHARE European network for language resources.

MondragonLingua, Basque Radio & TV (EiTB), and Vicomtech-IK4 are particularly interested in sharing this result with the community in order to drive research and development in the automatic translation of Basque.