Abstract:
Machine Translation (MT) is the process of automatically converting text from one natural
language into another, and it is a subfield of computational linguistics. The two
state-of-the-art machine translation techniques are Neural Machine Translation (NMT) and
Statistical Machine Translation (SMT), and both require a large parallel corpus to train the
translation model. Urdu is considered a low-resource language because few resources are
available for computational work, and the existing resources are not enough to build a good
translation system. Languages also differ in structure: Urdu follows Subject-Object-Verb (SOV)
word order, whereas English follows Subject-Verb-Object (SVO). In this
study, we present an unsupervised Urdu-to-English translation model and the practical
challenges faced during the work. We aim to partially remove the need for parallel corpora and
propose a method to train a machine translation system in an unsupervised manner to provide
Urdu-to-English translation. For this purpose, we use the toolkit developed by Artetxe et al.,
which is based on Unsupervised Neural Machine Translation (UNMT). We tested two UNMT training
approaches: denoising and on-the-fly back-translation. The denoising model obtains BLEU scores
of 4.14 and 5.11 for the UR-EN and EN-UR language pairs, respectively. Back-translation obtains
BLEU scores of 5.21 and 6.28, gains of +1.07 and +1.17 over denoising for Urdu to English and
English to Urdu, respectively. We also discuss the challenges faced during the work and the
effects of pre-processing techniques. Our approach shows promising results in translating Urdu,
a language that is mostly neglected due to its complexities, into English.
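The denoising objective trains the model to reconstruct a clean sentence from a corrupted copy of itself, so it needs only monolingual text. The Python sketch below shows one way to build such training pairs. It assumes the corruption consists of random swaps of adjacent words, in the style described by Artetxe et al.; the noise settings actually used in this study may differ. The names `add_swap_noise` and `denoising_pair` are hypothetical and are not part of the toolkit's API.

```python
import random
from typing import List, Optional, Tuple


def add_swap_noise(tokens: List[str], rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a sentence by making len(tokens) // 2 random swaps of adjacent words.

    Assumption: swap-based noise is used for illustration; it is not
    necessarily the exact noise model of the toolkit used in this work.
    """
    rng = rng or random.Random()
    noisy = list(tokens)  # copy so the clean sentence is left untouched
    if len(noisy) < 2:
        return noisy  # nothing to swap in an empty or one-word sentence
    for _ in range(len(noisy) // 2):
        i = rng.randrange(len(noisy) - 1)  # position that has a right neighbour
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def denoising_pair(sentence: str, rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Build one (noisy input, clean target) pair from a monolingual sentence.

    Hypothetical helper: whitespace tokenization is assumed for simplicity.
    """
    tokens = sentence.split()
    return " ".join(add_swap_noise(tokens, rng)), sentence


if __name__ == "__main__":
    noisy, clean = denoising_pair("the model learns to restore word order", random.Random(0))
    print("input: ", noisy)
    print("target:", clean)
```

Because the swaps only permute tokens, the noisy input always contains the same words as the target, so the model has to learn to restore the original word order rather than recover missing content.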