GSMIS

Publication

Journal Publication

Title of Article

F0 Modeling for Isarn Speech Synthesis using Deep Neural Networks and Syllable-level Feature Representation

Date of Acceptance

28 May 2020

Journal

Title of Journal

International Arab Journal of Information Technology (IAJIT)

Standard

ISI

Institute of Journal

Zarqa University

ISBN/ISSN

Volume

Issue

Month

Year of Publication

2020

Page

Abstract

The generation of the fundamental frequency (F0) plays an important role in speech synthesis because it directly influences the naturalness of synthetic speech. In conventional parametric speech synthesis, F0 is predicted frame by frame. This method is insufficient to represent F0 contours in larger units, especially the tone contours of syllables in tonal languages, which deviate as a result of long-term context dependency. This work proposes a syllable-level F0 model that represents F0 contours within syllables, using syllable-level F0 parameters that comprise sampled F0 points and dynamic features. A Deep Neural Network (DNN) was used to represent the relationships between syllable-level contextual features and syllable-level F0 parameters. The proposed model was examined using an Isarn speech synthesis system with both large and small training sets. For all training sets, the results of objective and subjective tests indicate that the proposed approach outperforms the baseline systems based on hidden Markov models and DNNs that predict F0 values at the frame level.

Keyword

fundamental frequency, speech synthesis, deep neural networks

Author

577020065-1	Mr. PONGSATHON JANYOI [Main Author]
	Science Doctoral Degree

Reviewing Status

Reviewed by independent reviewers

Status

Accepted for publication

Level of Publication

International

Citation

false

Part of thesis

true

Used for graduation

No
