Pronunciation dictionaries

Pronunciations are built from phonemes

A pronunciation is a string of phonemes. A phoneme is the smallest unit of sound that can distinguish one word from another. There is one phoneme for each distinct sound in a language. The phonemes are documented in the Language Supplement for the language, which appears in: %SWISRSDK%\documentation\languages\.

By default, the American English supplement is included in the basic installation. Additional language packs install additional Language Supplement documents.

Every vocabulary item has a pronunciation consisting of a sequence of phonemes, and each phoneme is represented as a one- or two-character string. Thus, when you write a word phonemically, you write it the way it sounds using the language’s phonemic alphabet (defined in the Language Supplement). For example, the pronunciation for the word “dog” in en-US is dQg.

Pronunciations are not case-sensitive

Your original vocabulary items have the same pronunciations regardless of their casing. The text items “Bill”, “BILL”, “bill”, and “bIll” are all pronounced the same way. However, casing does matter in the phonemic alphabet itself: when you write a pronunciation, the case of each phoneme character is significant (note the uppercase Q in dQg).
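The distinction can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary (the name PRONUNCIATIONS and the function lookup are illustrative only, not part of Recognizer or its file formats): vocabulary items are lowercased before lookup, while phoneme strings are compared exactly as written.

```python
# Hypothetical in-memory dictionary; this is NOT the Recognizer file format.
# Keys are vocabulary items (casing ignored); values are phoneme strings
# (casing significant). "dQg" is the en-US pronunciation of "dog" given in
# the text.
PRONUNCIATIONS = {"dog": ["dQg"]}


def lookup(item):
    """Return the pronunciations for a vocabulary item, ignoring its casing."""
    return PRONUNCIATIONS.get(item.lower(), [])


# All casings of the vocabulary item resolve to the same entry.
print(lookup("Dog") == lookup("DOG") == lookup("dog"))  # True

# Phoneme strings that differ only in case are different strings.
print("dQg" == "dqg")  # False
```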

Each word in a phrase has its own pronunciation

Each word in your vocabulary has its own pronunciation. Even if you define a multi-word phrase as a vocabulary item, each word in the phrase will be pronounced separately. For example, the item “William Shakespeare” is treated as two separate words, each with its own pronunciation.

Sometimes, a spoken phrase has a different pronunciation than each of its constituent words. For example, “..want to..” might be spoken as “..wanna..”.

A word can have more than one pronunciation

A word can have more than one pronunciation in the dictionary. This is useful for words like “either”, which has two very different pronunciations, and words like “route”, which have regional variations.

A phrase can have a pronunciation

Pronunciations are not limited to single words. Sometimes it is useful to write pronunciations for a specific phrase when that phrase has a pronunciation that is distinct from the pronunciations of its component words. For example, “want to” may be pronounced as “wanna”, as mentioned above.

In such cases, you can use underscores to combine the words as a phrase in the dictionary. Recognizer will search for the full phrase (including underscores) in the dictionaries. If the underscored phrase is not found, Recognizer then searches for individual-word pronunciations. For an example, see Example: pronunciations for a phrase.
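The search order described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary is a plain Python mapping, the function name resolve is hypothetical, and the values are placeholders rather than real phoneme strings. It is not Recognizer's implementation.

```python
# Hypothetical dictionary contents. The values are placeholders, not real
# phoneme strings; see the Language Supplement for the actual alphabet.
DICTIONARY = {
    "want_to": "<phonemes for wanna>",
    "want": "<phonemes for want>",
    "to": "<phonemes for to>",
    "go": "<phonemes for go>",
}


def resolve(item, dictionary):
    """Return (entry, pronunciation) pairs for one vocabulary item.

    The full underscored phrase is tried first. Only if it is absent is the
    item split into words that are looked up individually. A value of None
    marks a word that has no dictionary entry.
    """
    key = item.lower()
    if key in dictionary:
        return [(key, dictionary[key])]
    return [(word, dictionary.get(word)) for word in key.split("_")]


print(resolve("want_to", DICTIONARY))  # one entry for the whole phrase
print(resolve("to_go", DICTIONARY))    # falls back to "to" and "go"
```

In this sketch a word with no entry is reported as None; in Recognizer, such a word would receive an automatically generated pronunciation.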

Pronunciations for numbers

For the most control over pronunciation accuracy, it is best to spell out numeric quantities such as cardinals, ordinals, percentages, and dollar amounts. However, this is not always required, because for most languages Recognizer handles most numeric items that use digits.

The examples below use en-US (US English) as the language.

Leading zeroes are ignored for all numeric quantities.
Recognizer first looks for numbers (from zero to one billion) in the user dictionary. If a number (expressed in digits) is not found there, Recognizer looks for the digits in the system dictionary. If necessary, Recognizer will automatically generate a pronunciation.
Numbers from 1100 to 1999 are treated as “hundreds” rather than “thousands”. For example, 1100 is eleven_hundred, 1200 is twelve_hundred, and so on. However, 1000 is one_thousand, 2000 is two_thousand, and so on. For more control over the expansion, you can spell the complete number.
Decimal points and percentages: “3.22” and “5.5%” become three_point_two_two and five_point_five_percent.
Ordinal numbers: “56th” is fifty_sixth.
Dollar amounts: Vocabulary items that begin with the dollar sign ($) are treated as dollar amounts. For example, $5.56 becomes five_dollars_and_fifty_six_cents. The dollars-and-cents expansion applies only to amounts written in the conventional way, with two digits after the decimal point; an amount such as $5.1 expands to five_point_one_dollars. Two cases receive special processing: $0.00 expands to zero_dollars_and_zero_cents, and amounts less than a dollar omit the dollars part (for example, $0.34 expands to thirty_four_cents).
Recognizer does not automatically handle negative numbers. For the number “–200” you must enter “minus_200” and/or “negative_200”.
Any combination of numbers and letters that does not represent an ordinal leads to undefined results.
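The en-US rules listed above can be approximated with a short sketch. It is not Recognizer's implementation: the function names (cardinal, decimal, expand) are hypothetical, only the cases documented above are covered, and anything the text does not specify (ordinals, negative numbers, values such as 150 or 1234, singular forms such as one dollar) is either rejected or left unhandled. The handling of items like “$5” and “5%” (no decimal point) is an assumption.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()


def cardinal(n):
    """Spell 0-99, plus the round values the text documents explicitly."""
    if n < 0:
        raise ValueError("negative numbers are not handled automatically")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens - 2] + ("_" + ONES[ones] if ones else "")
    if 1100 <= n <= 1999 and n % 100 == 0:
        return ONES[n // 100] + "_hundred"    # 1100 -> eleven_hundred
    if n % 1000 == 0 and n < 10000:
        return ONES[n // 1000] + "_thousand"  # 1000 -> one_thousand
    raise ValueError(f"expansion of {n} is not specified in the text")


def decimal(text):
    """Spell 'W.F' as the whole part, 'point', then each fractional digit."""
    whole, frac = text.split(".")
    return "_".join([cardinal(int(whole)), "point"] + [ONES[int(d)] for d in frac])


def expand(item):
    """Expand a digit-based vocabulary item into underscore-joined words."""
    if item.startswith("$"):
        match = re.fullmatch(r"\$(\d+)\.(\d\d)", item)
        if match:  # conventional amount: exactly two digits of cents
            dollars, cents = int(match.group(1)), int(match.group(2))
            cents_part = cardinal(cents) + "_cents"
            if dollars == 0 and cents > 0:
                return cents_part             # $0.34 -> thirty_four_cents
            return cardinal(dollars) + "_dollars_and_" + cents_part
        return expand(item[1:]) + "_dollars"  # $5.1 -> five_point_one_dollars
    if item.endswith("%"):
        return expand(item[:-1]) + "_percent"
    if "." in item:
        return decimal(item)
    return cardinal(int(item))                # int() drops leading zeroes


for item in ["0042", "1100", "2000", "3.22", "5.5%", "$5.56", "$5.1", "$0.00", "$0.34"]:
    print(item, "->", expand(item))
```

Because int() discards leading zeroes, “0042” and “42” expand identically, matching the first rule in the list. An ordinal such as “56th” raises ValueError here, since ordinals are outside the scope of the sketch.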

Note that these examples apply to en-US and may not hold true in other languages. For example, the dollar sign may not be recognized in other languages. Details on number and character pronunciations are provided in the Language Supplement documents for each installed language.

Testing pronunciations

The installation package provides the dicttest utility, which lets you test pronunciations to make sure they are correct. For a full discussion of this utility and how to use it, see Checking pronunciations with dicttest.

Where are pronunciations stored? When are they loaded?

Most words in your grammars are predefined in the system dictionaries, and have pronunciations that are tuned for accuracy. If you tune the pronunciation of a word, the tuned pronunciation is stored in your user dictionary. If a pronunciation for a particular word is missing, Recognizer automatically generates one.
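A minimal sketch of how the three sources relate, assuming the user-then-system search order that this topic gives for numbers also applies to ordinary words. The function names and the placeholder generator are hypothetical, and the sketch ignores the difference between adding an alternative pronunciation and overriding an existing one.

```python
def generate_placeholder(word):
    """Stand-in for automatic generation; returns a placeholder, not phonemes."""
    return f"<generated:{word}>"


def find_pronunciations(word, user_dict, system_dict, generate=generate_placeholder):
    """Return (source, pronunciations) from the first source that has the word.

    Assumed order: user dictionary, then system dictionary, then automatic
    generation when neither dictionary has an entry.
    """
    key = word.lower()
    if key in user_dict:
        return "user", user_dict[key]
    if key in system_dict:
        return "system", system_dict[key]
    return "automatic", [generate(key)]


# "dQg" is the en-US pronunciation of "dog" given in this topic; the user
# dictionary entry is a placeholder.
system_dict = {"dog": ["dQg"]}
user_dict = {"acme": ["<tuned pronunciation>"]}

for word in ["dog", "Acme", "zyzzyva"]:
    print(word, "->", find_pronunciations(word, user_dict, system_dict))
```

Reporting the source alongside the pronunciation makes it easy to list the words that rely on automatic generation, which are the ones most worth checking.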

Pronunciations are loaded into grammars during compilation; any change to the dictionaries requires a recompilation before new pronunciations are used.

Where do I find the phoneme definitions?

Each language pack download includes an HTML file describing the phonemes for that language. See User dictionaries.

When should you change pronunciations?

It is better to add a pronunciation than to change the default. This ensures that the original system defaults remain available if your user-defined counterpart fails to match when spoken.

The only case in which we recommend replacing a system default pronunciation is if you are certain the existing one is wrong and that your replacement is better. Please inform Nuance technical support about these situations.

What if pronunciations are wrong?

If a pronunciation is not correct, your application will recognize the corresponding word poorly. A poor dictionary pronunciation causes low scores or recognition failures when the word is spoken. You can test your pronunciations using the dicttest utility (see Checking pronunciations with dicttest).

If you identify an incorrect pronunciation, you can correct it as follows:

If the source of the incorrect pronunciation is the user dictionary, then delete the pronunciation.
If the source is the system dictionary, then add an alternative pronunciation in a user dictionary.
If the source is “Automatic” (see Automatically generated pronunciations), then create a new pronunciation that is more accurate.

What if pronunciations are missing?

If a vocabulary contains a word that is not in the user or system dictionaries, a pronunciation is automatically generated (see Automatically generated pronunciations). These generated pronunciations are usually very accurate. However, you should know which of your words rely on generated pronunciations, and verify that those pronunciations are accurate.

Be careful when working with pronunciations. There is a difference between adding an alternative pronunciation, and overriding an existing pronunciation.
