title: Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak

publish date:

2024-09-06

authors:

Mukhammadsaid Mamasaidov et al.

paper id:

2409.04269v1

abstract:

This study presents several contributions for the Karakalpak language: a FLORES+ devtest dataset translated to Karakalpak; parallel corpora for Uzbek-Karakalpak, Russian-Karakalpak, and English-Karakalpak of 100,000 pairs each; and open-sourced fine-tuned neural models for translation across these languages. Our experiments compare different model variants and training approaches, demonstrating improvements over existing baselines. This work, conducted as part of the Open Language Data Initiative (OLDI) shared task, aims to advance machine translation capabilities for Karakalpak and contribute to expanding linguistic diversity in NLP technologies.

compiled and edited by: wanghaisheng; last updated: 2024-09-09