UTAR Institutional Repository

Sentence-based alignment for parallel text corpora preparation for machine translation.

Lee, Yong Wei (2021) Sentence-based alignment for parallel text corpora preparation for machine translation. Final Year Project, UTAR.

    Abstract

    In the age of technology, Natural Language Processing (NLP) underpins many everyday applications, such as speech recognition and machine translation. Machine translation matters in daily life because it can translate large volumes of text faster, and at lower cost, than human translators. In machine translation, a parallel corpus is a key resource for translation training and language teaching, and a high-quality parallel corpus greatly improves translation accuracy. Sentence-based alignment of parallel text corpora therefore plays an important role in NLP, especially in machine translation. However, parallel corpus resources are limited for some source and target language pairs, and machine translation accuracy for some target languages remains low. This study therefore proposes an approach for generating a parallel corpus for a source language and a target language. A parallel corpus of English (source language) and Malay (target language) is collected, and a machine translation system is developed using a recurrent neural network (RNN) model for neural machine translation. The model obtains a training accuracy of 0.9, and the translated Malay text achieves a BLEU score of 0.65, which is considered a good score.
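
The abstract reports a BLEU score but does not say how it was computed. As a rough illustration only, the sketch below implements a basic sentence-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) against a single reference. The function names and the Malay example sentences are hypothetical and are not taken from the thesis; the study may have used a different tool, smoothing, or corpus-level scoring.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with a single reference."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, any zero precision makes the score zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalise candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences (not from the thesis corpus).
reference = "saya suka makan nasi goreng".split()
candidate = "saya suka makan nasi".split()
score = bleu(candidate, reference)
print(round(score, 4))
```

Here every n-gram of the candidate appears in the reference, so only the brevity penalty lowers the score. Scores on short sentences are sensitive to the lack of smoothing, which is one reason BLEU is usually reported at corpus level.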

    Item Type: Final Year Project / Dissertation / Thesis (Final Year Project)
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    T Technology > T Technology (General)
    Divisions: Faculty of Information and Communication Technology > Bachelor of Computer Science (Hons)
    Depositing User: ML Main Library
    Date Deposited: 09 Mar 2022 21:04
    Last Modified: 09 Mar 2022 21:04
    URI: http://eprints.utar.edu.my/id/eprint/4261