""" from https://github.com/keithito/tacotron """

'''
Defines the set of symbols used in text input to the model.

The default is a set of ASCII characters that works well for English or text that has been run through Unidecode. For other data, you can modify _letters and _punctuation. See TRAINING_DATA.md for details. '''
from text import cmudict

_pad = '_'
_punctuation = '!\'(),.:;? '
_special = '-'
_letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

# Prepend "@" to ARPAbet symbols to ensure uniqueness (some are the same as uppercase letters):
_arpabet = ['@' + s for s in cmudict.valid_symbols]

# Export all symbols:
symbols = [_pad] + list(_special) + list(_punctuation) + list(_letters) + _arpabet
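
The exported list is ordered, so a symbol's position can serve as its integer ID. The following is a minimal sketch of how such a list could be turned into lookup tables. It is not part of this file: `symbol_to_id`, `id_to_symbol`, `encode`, and `decode` are hypothetical names chosen for illustration, and `valid_symbols` is a small made-up stand-in for `cmudict.valid_symbols` so that the snippet runs without the `text` package.

```python
# Hypothetical stand-in for cmudict.valid_symbols (assumption: the real list is longer).
valid_symbols = ['AA', 'AE', 'AH0', 'K', 'T']

_pad = '_'
_punctuation = '!\'(),.:;? '
_special = '-'
_letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
_arpabet = ['@' + s for s in valid_symbols]

symbols = [_pad] + list(_special) + list(_punctuation) + list(_letters) + _arpabet

# Use each symbol's position in the list as its integer ID; the pad symbol gets ID 0.
symbol_to_id = {s: i for i, s in enumerate(symbols)}
id_to_symbol = {i: s for i, s in enumerate(symbols)}


def encode(text):
    """Map each character to its ID, skipping characters outside the symbol set."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


def decode(ids):
    """Map a sequence of IDs back to the concatenated symbols."""
    return ''.join(id_to_symbol[i] for i in ids)
```

Because of the "@" prefix, the ARPAbet phoneme `@K` and the letter `K` receive different IDs, which is the uniqueness the comment in the file refers to. Silently skipping unknown characters in `encode` is a design choice of this sketch; raising an error instead may be preferable when unnormalized input should be caught early.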