
adding readme and license

master
Rafael Valle 6 years ago
commit 1874b9a08f
2 changed files with 78 additions and 0 deletions
  1. LICENSE (+25, -0)
  2. README.md (+53, -0)

LICENSE (+25, -0)

@ -0,0 +1,25 @@
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md (+53, -0)

@ -0,0 +1,53 @@
# Tacotron 2 (without wavenet)
PyTorch implementation of [Natural TTS Synthesis By Conditioning
Wavenet On Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf).
This implementation includes **distributed** and **fp16** support
and uses the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/).
Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's
frameworks team.
![Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram](tensorboard.png)
## Pre-requisites
1. NVIDIA GPU + CUDA + cuDNN
## Setup
1. Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
2. Clone this repo: `git clone https://github.com/NVIDIA/tacotron2.git`
3. CD into this repo: `cd tacotron2`
4. Update .wav paths: `sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' *.txt`
5. Install [PyTorch 0.4](https://github.com/pytorch/pytorch)
6. Install python requirements or use docker container (tbd)
    - Install python requirements: `pip install -r requirements.txt`
    - **OR**
    - Docker container `(tbd)`
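
Step 4 replaces the `DUMMY` placeholder in the `.txt` files with the path to the extracted wavs folder. Where `sed` is unavailable, a rough Python equivalent might look like the sketch below. The helper names `replace_placeholder` and `update_filelists` are hypothetical and not part of this repo.

```python
from pathlib import Path


def replace_placeholder(text, wav_dir, placeholder="DUMMY"):
    """Return text with every occurrence of the placeholder replaced by wav_dir."""
    return text.replace(placeholder, wav_dir)


def update_filelists(folder, wav_dir):
    """Rewrite each .txt file in folder in place, similar to `sed -i` on *.txt.

    Returns the names of the files that were actually modified.
    """
    changed = []
    for path in sorted(Path(folder).glob("*.txt")):
        original = path.read_text(encoding="utf-8")
        updated = replace_placeholder(original, wav_dir)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling `update_filelists(".", "ljs_dataset_folder/wavs")` from the repo root would mirror the `sed` command in step 4.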
## Training
1. `python train.py --output_directory=outdir --log_directory=logdir`
2. (OPTIONAL) `tensorboard --logdir=outdir/logdir`
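
The TensorBoard path in step 2 suggests that the log directory is created inside the output directory. A minimal sketch of that path arithmetic follows, assuming (this README does not confirm it) that `train.py` joins the two arguments. The function name `tensorboard_logdir` is hypothetical.

```python
import os


def tensorboard_logdir(output_directory, log_directory):
    # Assumption: logs are written to <output_directory>/<log_directory>.
    return os.path.join(output_directory, log_directory)


print(tensorboard_logdir("outdir", "logdir"))
```

Note that `os.path.join` discards earlier components when a later one is absolute, so under this assumption passing `/outdir` and `/logdir` would resolve to `/logdir`.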
## Multi-GPU (distributed) and FP16 Training
1. `python -m multiproc train.py --output_directory=/outdir --log_directory=/logdir --hparams=distributed_run=True`
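
The `--hparams` flag passes hyperparameter overrides as `name=value` text. The sketch below shows one way such a string could be parsed and merged over defaults. The comma separator, the `parse_overrides` helper, and the type-coercion rules are assumptions for illustration; the repo's own parser may behave differently.

```python
import ast


def parse_overrides(spec):
    """Parse 'a=1,b=True' into {'a': 1, 'b': True} (hypothetical helper)."""
    overrides = {}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, raw = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {item!r}")
        try:
            # Interpret Python literals such as True, 16, or 1e-3.
            value = ast.literal_eval(raw.strip())
        except (ValueError, SyntaxError):
            value = raw.strip()  # keep unquoted strings as plain text
        overrides[name.strip()] = value
    return overrides


# Illustrative default only; the real defaults live in the repo's hparams.
defaults = {"distributed_run": False}
merged = {**defaults, **parse_overrides("distributed_run=True")}
print(merged)  # {'distributed_run': True}
```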
## Inference
1. `jupyter notebook --ip=127.0.0.1 --port=31337`
2. Load `inference.ipynb`
## Related repos
[nv-wavenet](https://github.com/NVIDIA/nv-wavenet/): Faster than real-time
wavenet inference
## Acknowledgements
This implementation is inspired by, or uses code from, the following repos:
[Ryuichi Yamamoto](https://github.com/r9y9/tacotron_pytorch), [Keith
Ito](https://github.com/keithito/tacotron/), [Prem
Seetharaman](https://github.com/pseeth/pytorch-stft).
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen,
Yuxuan Wang and Zongheng Yang.
