Fork of https://github.com/alokprasad/fastspeech_squeezewave that also fixes denoising in SqueezeWave.
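The denoising fix follows the same pattern NVIDIA's WaveGlow repo uses: a Denoiser module runs the vocoder on an all-zero mel to estimate its bias audio, then subtracts that bias's magnitude spectrogram from generated audio. Below is a minimal usage sketch assuming a WaveGlow-style denoiser module and a standard NVIDIA checkpoint layout; the import path, checkpoint name, and key are assumptions, not a guaranteed API of this fork:

```python
import torch
from denoiser import Denoiser  # WaveGlow-style denoiser module (assumed import path)

# Checkpoint name and layout are assumptions; NVIDIA checkpoints store the
# network under the 'model' key.
squeezewave = torch.load('squeezewave.pt', map_location='cpu')['model'].eval()
denoiser = Denoiser(squeezewave)  # runs the vocoder on a zero mel once to get the bias spectrum

mel = torch.randn(1, 80, 100)  # dummy 80-band mel-spectrogram for illustration

with torch.no_grad():
    audio = squeezewave.infer(mel)          # raw vocoder output, shape (1, samples)
    audio = denoiser(audio, strength=0.01)  # subtract the bias spectrum; tune strength to taste
```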

FastSpeech-Pytorch

An implementation of FastSpeech based on PyTorch.

Update

2019/10/23

  1. Fix bugs in alignment;
  2. Fix bugs in the transformer;
  3. Fix bugs in the LengthRegulator;
  4. Change the way audio is processed;
  5. Use WaveGlow for waveform synthesis.

Model

My Blog
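Beyond the blog link above, the one component worth understanding before reading the code is the length regulator: it expands each phoneme-level encoder state by its duration (predicted at inference time, extracted from Tacotron2 during training) so the decoder runs at mel-frame resolution. A minimal sketch of the idea, not the exact modules.py implementation:

```python
import torch

def length_regulate(encoder_out, durations):
    """Expand phoneme-level features to frame level.

    encoder_out: (num_phonemes, hidden) tensor of encoder states
    durations:   (num_phonemes,) integer tensor of mel frames per phoneme
    Returns a (sum(durations), hidden) tensor.
    """
    # repeat_interleave copies row i of encoder_out durations[i] times
    return torch.repeat_interleave(encoder_out, durations, dim=0)

# Example: 3 phonemes with durations 2, 1, 3 -> 6 mel frames
enc = torch.randn(3, 256)
dur = torch.tensor([2, 1, 3])
frames = length_regulate(enc, dur)
assert frames.shape == (6, 256)
```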

Start

Dependencies

  • python 3.6
  • CUDA 10.0
  • pytorch==1.1.0
  • numpy==1.16.2
  • scipy==1.2.1
  • librosa==0.6.3
  • inflect==2.1.0
  • matplotlib==2.2.2

Prepare Dataset

  1. Download and extract the LJSpeech dataset.
  2. Put the LJSpeech dataset in data.
  3. Unzip alignments.zip.*
  4. Put the NVIDIA pretrained WaveGlow model in waveglow/pretrained_model.
  5. Run python preprocess.py (a quick layout check is sketched below).

* If you want to calculate the alignments yourself, don't unzip alignments.zip; instead, put the NVIDIA pretrained Tacotron2 model in tacotron2/pretrained_model.
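The layout check referenced in step 5 could look like the following; the folder names are assumptions based on the steps above and on the name the official LJSpeech archive extracts to:

```python
import os

# Expected layout after steps 1-4 (names are assumptions, see above):
expected = [
    'data/LJSpeech-1.1/metadata.csv',  # step 2: LJSpeech under data
    'data/LJSpeech-1.1/wavs',
    'alignments',                      # step 3: unzipped alignments.zip
    'waveglow/pretrained_model',       # step 4: pretrained vocoder model
]

for path in expected:
    print(('ok      ' if os.path.exists(path) else 'MISSING ') + path)
```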

Training

Run python train.py.

Test

Run python synthesis.py.
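(This fork also adds a run_inference.sh helper that scripts the inference step.) Conceptually, synthesis.py runs text → FastSpeech → mel-spectrogram → SqueezeWave/WaveGlow → waveform. The sketch below shows that flow with placeholder checkpoint paths, keys, and shapes; none of the names are this repo's exact API:

```python
import torch

# Placeholder pipeline sketch; paths, keys, and shapes are illustrative.
fastspeech = torch.load('model_new/checkpoint.pth.tar', map_location='cpu')['model'].eval()
vocoder = torch.load('waveglow/pretrained_model/squeezewave.pt', map_location='cpu')['model'].eval()

phonemes = torch.tensor([[12, 43, 7, 19]])  # ids produced by the text frontend

with torch.no_grad():
    mel = fastspeech(phonemes)   # (1, n_mel_channels, frames) mel-spectrogram
    audio = vocoder.infer(mel)   # (1, samples) waveform, ready to write as WAV
```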

Pretrained Model

Notes

  • In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I used Tacotron2 instead (see the duration-extraction sketch after this list).
  • Audio examples are in results.
  • The outputs and alignment of Tacotron2 are shown below (the synthesized sentence is "I want to go to CMU to do research on deep learning."):
  • The outputs of FastSpeech and Tacotron2 (Tacotron2 on the right) are shown below (the synthesized sentence is "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition."):
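The duration extraction mentioned in the first note is the trick from the FastSpeech paper: for every mel frame, take the encoder position with the highest attention weight in the teacher's alignment, then count how many frames land on each phoneme. A minimal sketch under assumed shape conventions:

```python
import torch

def durations_from_attention(attn):
    """Turn a Tacotron2 alignment into per-phoneme durations.

    attn: (mel_frames, phonemes) attention weights for one utterance.
    Returns a (phonemes,) tensor counting frames assigned to each phoneme.
    """
    best = attn.argmax(dim=1)                           # phoneme index per frame
    return torch.bincount(best, minlength=attn.size(1))

# Example: 5 mel frames attended over 3 phonemes
attn = torch.softmax(torch.randn(5, 3), dim=1)
print(durations_from_attention(attn))  # e.g. tensor([2, 1, 2]); sums to 5
```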

Reference