adding readme and license

6 years ago · 1874b9a08f
--- a/+ 25
+++ b/+ 25
@@ -0,0 +1,25 @@
 # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
 # are met:
 #  * Redistributions of source code must retain the above copyright
 #    notice, this list of conditions and the following disclaimer.
 #  * Redistributions in binary form must reproduce the above copyright
 #    notice, this list of conditions and the following disclaimer in the
 #    documentation and/or other materials provided with the distribution.
 #  * Neither the name of NVIDIA CORPORATION nor the names of its
 #    contributors may be used to endorse or promote products derived
 #    from this software without specific prior written permission.
 #
 # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
 # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 # PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 # PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 # OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,53 @@
 # Tacotron 2 (without WaveNet)

 A PyTorch implementation of Tacotron 2, as described in [Natural TTS Synthesis By Conditioning
 Wavenet On Mel Spectrogram Predictions](https://arxiv.org/pdf/1712.05884.pdf).

 This implementation includes **distributed** and **FP16** support
 and uses the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/).

 Distributed and FP16 support relies on work by Christian Sarofeen and NVIDIA's
 frameworks team.

 ![Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram](tensorboard.png)


 ## Prerequisites
 1. NVIDIA GPU with CUDA and cuDNN

 ## Setup
 1. Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/)
 2. Clone this repo: `git clone https://github.com/NVIDIA/tacotron2.git`
 3. Change into the repo directory: `cd tacotron2`
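 4. Update the .wav paths in the filelists, replacing `ljs_dataset_folder/wavs` with the path to your extracted `wavs` folder: `sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' *.txt`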
 5. Install [PyTorch 0.4](https://github.com/pytorch/pytorch)
 6. Install the Python requirements or use the Docker container (tbd)
    - Install Python requirements: `pip install -r requirements.txt`
    - **OR**
    - Docker container `(tbd)`
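 Step 4 can also be done without `sed`. The sketch below is a hypothetical Python helper (`update_wav_paths` is not part of this repo), assuming the filelists are the `*.txt` files in the given directory and use the literal placeholder `DUMMY`, as the `sed` command implies. Like the `g` flag in `s,...,...,g`, it replaces every occurrence on every line.

```python
from pathlib import Path


def update_wav_paths(filelist_dir, dataset_wavs_dir, placeholder="DUMMY"):
    """Replace `placeholder` with `dataset_wavs_dir` in every *.txt file.

    Hypothetical equivalent of:
        sed -i -- 's,DUMMY,<dataset_wavs_dir>,g' *.txt
    Files are rewritten in place; only files that actually change are
    written. Returns the number of files that were updated.
    """
    updated = 0
    for txt in sorted(Path(filelist_dir).glob("*.txt")):
        text = txt.read_text(encoding="utf-8")
        # str.replace substitutes all occurrences, matching sed's `g` flag
        new_text = text.replace(placeholder, str(dataset_wavs_dir))
        if new_text != text:
            txt.write_text(new_text, encoding="utf-8")
            updated += 1
    return updated
```

 For example, calling `update_wav_paths(".", "/data/LJSpeech-1.1/wavs")` from the repo root would rewrite the filelists; the dataset path there is only an illustration. Running it a second time is harmless, since no `DUMMY` placeholders remain.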

 ## Training
 1. `python train.py --output_directory=outdir --log_directory=logdir`
 2. (OPTIONAL) `tensorboard --logdir=outdir/logdir`

 ## Multi-GPU (distributed) and FP16 Training
 1. `python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True`
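 The `--hparams` flag takes `name=value` overrides; with `--hparams=distributed_run=True`, the flag's value is the string `distributed_run=True`. The sketch below illustrates one way such an override string could be applied to a dict of defaults. It is a hypothetical stand-in (`parse_hparams_overrides` and the example defaults are made up for illustration), not the parser `train.py` actually uses, and it assumes multiple overrides are separated by commas.

```python
def parse_hparams_overrides(overrides, defaults):
    """Apply a comma-separated 'name=value' string to a copy of `defaults`.

    Hypothetical helper illustrating the --hparams syntax. Each value is
    cast to the type of its existing default; unknown names raise KeyError.
    """
    params = dict(defaults)  # never mutate the caller's defaults
    if not overrides:
        return params
    for item in overrides.split(","):
        name, sep, raw = item.partition("=")
        name, raw = name.strip(), raw.strip()
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        if name not in params:
            raise KeyError(f"unknown hparam {name!r}")
        default = params[name]
        if isinstance(default, bool):
            # bool("False") is True, so booleans need explicit parsing
            if raw.lower() not in ("true", "false"):
                raise ValueError(f"{name} expects True/False, got {raw!r}")
            params[name] = raw.lower() == "true"
        else:
            # e.g. int("32") or float("5e-4"), based on the default's type
            params[name] = type(default)(raw)
    return params
```

 Casting each value to the type of its default keeps, for instance, a batch size an integer; booleans are special-cased because `bool("False")` evaluates to `True` in Python.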

 ## Inference
 1. `jupyter notebook --ip=127.0.0.1 --port=31337`
 2. Load `inference.ipynb`

 ## Related repos
 [nv-wavenet](https://github.com/NVIDIA/nv-wavenet/): Faster-than-real-time
 WaveNet inference

 ## Acknowledgements
 This implementation is inspired by or uses code from the following repos:
 [Ryuichi Yamamoto](https://github.com/r9y9/tacotron_pytorch), [Keith
 Ito](https://github.com/keithito/tacotron/), [Prem
 Seetharaman](https://github.com/pseeth/pytorch-stft).

 We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen,
 Yuxuan Wang and Zongheng Yang.