VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu

This page presents audio samples for our paper. Note that we downsample LJSpeech to 16 kHz in this work for simplicity.
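As an illustration of the downsampling note above, the following is a minimal sketch of converting audio from LJSpeech's native 22.05 kHz to 16 kHz. It is not the pipeline used in the paper: the helper name `resample_linear` is hypothetical, and plain linear interpolation is used only to keep the example dependency-free. A real pipeline would more likely use a band-limited resampler with an anti-aliasing filter.

```python
from fractions import Fraction

SRC_RATE = 22050  # LJSpeech's native sampling rate in Hz
DST_RATE = 16000  # target sampling rate mentioned on this page


def resample_linear(samples, src_rate=SRC_RATE, dst_rate=DST_RATE):
    """Naive linear-interpolation resampler (hypothetical helper).

    Assumption: no anti-aliasing low-pass filter is applied, so this is
    only a sketch of the rate conversion, not a high-quality resampler.
    """
    if not samples:
        return []
    # Number of source samples advanced per output sample (exact rational).
    step = Fraction(src_rate, dst_rate)
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * step                      # exact position in the source signal
        lo = int(pos)                       # left neighbour index
        hi = min(lo + 1, len(samples) - 1)  # right neighbour, clamped at the end
        frac = float(pos - lo)              # fractional offset between neighbours
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# The conversion ratio 16000/22050 reduces to 320/441.
ratio = Fraction(DST_RATE, SRC_RATE)

# One second of silence at 22.05 kHz becomes 16000 samples at 16 kHz.
one_second = [0.0] * SRC_RATE
resampled = resample_linear(one_second)
```

The exact rational step avoids accumulating floating-point drift in the sample positions over long utterances.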

Part I: Speech Reconstruction

[Audio samples: five utterances, each presented as Recording, GT Mel + HifiGAN, GT VQ&pros + HifiGAN, and GT VQ&pros + vec2wav.]

Part II: Text-to-Speech Synthesis

[Audio samples: each sentence below is presented as Recording, Tacotron 2, FastSpeech 2, GlowTTS, VITS, and VQTTS.]

with the particular purposes of the agency involved. the commission recognizes that this is a controversial area
this matter is obviously beyond the jurisdiction of the commission,
the commission regards this as a most useful innovation and urges that the practice be continued.
this no longer appears to be the case.
the secret service has been receiving full cooperation in scientific research and technological development
even if the manpower and technological resources of the secret service are adequately augmented,
the commission recommends that the present arrangements

Part III: Prosody Diversity in PL Prosody Hypotheses

[Audio samples: each sentence below is presented as Prosody 1, Prosody 2, and Prosody 3.]

county, and state law enforcement agencies in their districts.
secret service personnel and facilities
the assistance of trained federal law enforcement officers.