GSMIS

Publication Information

Publication as an academic journal article

Article title

Tonal Contour Generation for Isarn Speech Synthesis Using Deep Learning and Sampling-Based F0 Representation

Date of acceptance (day/month/year)

21 September 2020 (B.E. 2563)

Journal

Journal name

Applied Sciences

Journal indexing standard

ISI

Organization that owns the journal

MDPI (Basel, Switzerland)

ISBN/ISSN

Volume

Issue

Month

September

Year of publication (B.E.)

2563

Pages

1-18

Abstract

The modeling of fundamental frequency (F0) in speech synthesis is a critical factor affecting the intelligibility and naturalness of synthesized speech. In this paper, we focus on improving the modeling of F0 for Isarn speech synthesis. We propose the F0 model for this based on a recurrent neural network (RNN). Sampled values of F0 are used at the syllable level of continuous Isarn speech combined with their dynamic features to represent supra-segmental properties of the F0 contour. Different architectures of the deep RNNs and different combinations of linguistic features are analyzed to obtain conditions for the best performance. To assess the proposed method, we compared it with several RNN-based baselines. The results of objective and subjective tests indicate that the proposed model significantly outperformed the baseline RNN model that predicts values of F0 at the frame level, and the baseline RNN model that represents the F0 contours of syllables by using discrete cosine transform.
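As a rough illustration of the representation the abstract describes (sampled F0 values per syllable combined with dynamic features), the following is a minimal sketch only, not the authors' implementation. The function names, the number of sample points, the use of linear interpolation, and the choice of first-order deltas across consecutive syllables are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def sample_syllable_f0(f0_frames, num_points=5):
    """Resample one syllable's frame-level F0 track to a fixed number of
    equally spaced points (assumption: linear interpolation over time)."""
    f0 = np.asarray(f0_frames, dtype=float)
    if f0.size == 0:
        raise ValueError("syllable has no F0 frames")
    src = np.linspace(0.0, 1.0, f0.size)    # normalized frame positions
    dst = np.linspace(0.0, 1.0, num_points)  # normalized sample positions
    return np.interp(dst, src, f0)

def add_dynamic_features(static):
    """Append first-order deltas computed across consecutive syllables
    (assumption: a simple numerical gradient stands in for the paper's
    dynamic features)."""
    static = np.asarray(static, dtype=float)
    if static.shape[0] > 1:
        delta = np.gradient(static, axis=0)
    else:
        delta = np.zeros_like(static)
    return np.hstack([static, delta])

# Hypothetical frame-level F0 values (Hz) for three syllables.
syllables = [
    [120.0, 125.0, 130.0, 135.0, 140.0, 145.0, 150.0],
    [150.0, 140.0, 130.0],
    [110.0, 110.0, 110.0, 110.0],
]
static = np.vstack([sample_syllable_f0(s, num_points=3) for s in syllables])
features = add_dynamic_features(static)  # one row per syllable: samples + deltas
```

In this sketch every syllable yields a vector of the same length regardless of its duration, which is what makes a syllable-level sequence model straightforward to train on it.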

Keywords

tone, fundamental frequency, recurrent neural networks, Isarn dialect, speech synthesis

Authors

577020065-1	Mr. พงษ์ศธร จันทร์ยอย [primary author]
	Faculty of Science, doctoral degree, English

Article review

Reviewed by independent reviewers

Publication status

Published

Journal distribution level

International

citation

None

Part of a thesis

Yes

Used for graduation

No

Attached file

Citation