|
Publication
|
Research Title |
Semi-Automatic Word-Aligned Tool for Thai-Vietnamese Parallel Corpus Construction |
Date of Distribution |
14 October 2019 |
Conference |
Title of the Conference |
16th International Joint Conference on Computer Science and Software Engineering (JCSSE 2019) |
Organiser |
Faculty of Informatics, Burapha University, Chonburi, Thailand |
Conference Place |
Amari Pattaya |
Province/State |
Chonburi |
Conference Date |
10 July 2019 |
To |
12 July 2019 |
Proceeding Paper |
Volume |
2019 |
Issue |
1 |
Page |
121-125 |
Editors/edition/publisher |
|
Abstract |
A corpus, especially a parallel corpus containing both source- and target-language text, is an important resource in Natural Language Processing (NLP) research, particularly in machine translation. A high-quality corpus can significantly improve translation accuracy; however, corpus construction is very time-consuming and requires linguistic expertise. In this paper, we present the process of building a Thai-Vietnamese parallel corpus. This work focuses on the construction of a semi-automatic word-alignment tool that assists researchers in building a parallel corpus. Collection and validation in this study were carried out with the tool we developed. In the first stage, a Vietnamese-Thai parallel corpus containing 14,771 sentence pairs was collected, aligned at the word level, and validated by linguistic experts. This parallel corpus can be used as a reliable resource for statistical machine translation and other applications. |
Author |
|
Peer Review Status |
Independently peer-reviewed |
Level of Conference |
International |
Type of Proceeding |
Full paper |
Type of Presentation |
Oral |
Part of thesis |
true |
Presentation awarding |
false |
Attach file |
|
Citation |
0
|