Pronunciation dictionaries
Recognizer uses dictionaries for the pronunciations of the words spoken by callers. A dictionary encodes the pronunciations for each word in a phonetic alphabet, and maps the pronunciations to text.
There are two types of dictionaries:
- The system dictionary: Default pronunciations provided by Recognizer.
- User dictionaries: Application-specific pronunciations.
Note: In all languages, Recognizer uses a rule-based system to generate pronunciations automatically for words not found in the dictionaries. It also reports each instance of automatic generation in the diagnostic log file. Generated pronunciations are usually accurate, but are worth checking because incorrect pronunciations can lower recognition accuracy significantly.
Working with pronunciations
Recognizer works phonetically. In simple terms, this means that Recognizer recognizes speech by matching the sounds it hears against written pronunciations. During a telephone call, Recognizer does the following:
- Hears an utterance spoken by the caller.
- Builds a large matrix of the possible phoneme combinations in the utterance.
- Matches the possibilities with the pronunciations for the current vocabulary.
- Delivers the n-best list (the most likely “answers”) to the application.
In most situations, Recognizer finds pronunciations in the system dictionary, with no need for application intervention. Sometimes, pronunciations are not found or are not exact enough for the application:
- When a pronunciation is not found, the diagnostic log contains an error such as: SWI_ERROR_GENERIC| error| lookupIndividualWords | Could not generate pronunciation for phrase. The error shows the missing phrase, and you can add it to a user dictionary.
- When a pronunciation is not exact enough, you detect the problem when deploying and tuning the application. Incorrect pronunciations result in nomatch recognitions (because of lower confidence scores), and cause the application to re-collect the information. In these situations, you can tune the pronunciations of specific words and phrases in a user dictionary. For example, you can provide an alternative pronunciation to accommodate the regional dialect of the application users.
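As a minimal sketch of the first case, the following Python snippet filters diagnostic log lines for the error text quoted above. Only the message text comes from this document; the helper name, the sample lines, and the assumption that each error occupies one line are hypothetical.

```python
# Message text quoted from the error above; the rest of the log layout is assumed.
MARKER = "Could not generate pronunciation for phrase"


def find_missing_pronunciations(log_lines):
    """Return the log lines that report a failed pronunciation lookup."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]


# Hypothetical sample input: one matching line and one unrelated line.
sample = [
    "SWI_ERROR_GENERIC| error| lookupIndividualWords | "
    "Could not generate pronunciation for phrase\n",
    "an unrelated log line\n",
]

for hit in find_missing_pronunciations(sample):
    print(hit)
```

Each reported line is a candidate for a new entry in a user dictionary.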
Before you create a user dictionary or otherwise fine-tune pronunciations, it is important to understand how pronunciations work in Recognizer:

A pronunciation is a string of phonemes. A phoneme is the smallest unit of sound that distinguishes one word from another. There is one phoneme for each sound in a language. The phonemes are documented in the Language Supplement for the language, which appears in: %SWISRSDK%\documentation\languages\.
By default, the American English supplement is included in the basic installation. Additional language packs install additional Language Supplement documents.
Every vocabulary item has a pronunciation consisting of a sequence of phonemes, and each phoneme is represented as a one- or two-character string. Thus, when you write a word phonemically, you write it the way it sounds using the language’s phonemic alphabet (defined in the Language Supplement). For example, the pronunciation for the word “dog” in en-US is dQg.

Your original vocabulary items have the same pronunciations regardless of their casing. The text items “Bill”, “BILL”, “bill”, and “bIll” are all pronounced the same way. However, casing is significant in the phonemic alphabet itself: when you write a pronunciation, the phoneme symbols are case-sensitive.

Each word in your vocabulary has its own pronunciation. Even if you define a multi-word phrase as a vocabulary item, each word in the phrase will be pronounced separately. For example, the item “William Shakespeare” is treated as two separate words, each with its own pronunciation.
Sometimes, a spoken phrase has a different pronunciation than each of its constituent words. For example, “want to” might be spoken as “wanna”.

A word often has more than one pronunciation. This is useful for words like “either”, which may have two very different pronunciations, and words like “route”, which have regional variations.

Pronunciations are not limited to single words. Sometimes it is useful to write pronunciations for a specific phrase when that phrase has a pronunciation that is distinct from the pronunciations of its component words. For example, “want to” may be pronounced as “wanna”, as mentioned above.
In such cases, you can use underscores to combine the words as a phrase in the dictionary. Recognizer will search for the full phrase (including underscores) in the dictionaries. If the underscored phrase is not found, Recognizer then searches for individual-word pronunciations. For an example, see Example: pronunciations for a phrase.
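The search order described above can be sketched in Python as follows. This is an illustration only, not Recognizer's implementation: the function names, the use of plain dicts, and the bracketed placeholder pronunciations are all assumptions (only dQg is taken from this document), and the sketch does not model how alternative pronunciations from several dictionaries are combined.

```python
def find_entry(key, user_dict, system_dict):
    """Look in the user dictionary first, then the system dictionary."""
    key = key.lower()  # vocabulary text is case-insensitive
    if key in user_dict:
        return user_dict[key], "user"
    if key in system_dict:
        return system_dict[key], "system"
    # Not found: Recognizer would generate a pronunciation automatically.
    return None, "automatic"


def lookup_item(item, user_dict, system_dict):
    """Try the underscored phrase first, then fall back to individual words."""
    words = item.split()
    if len(words) > 1:
        phrase = "_".join(words)
        prons, source = find_entry(phrase, user_dict, system_dict)
        if prons is not None:
            return [(phrase.lower(), prons, source)]
    return [(w.lower(),) + find_entry(w, user_dict, system_dict) for w in words]


# Placeholder pronunciations in angle brackets are NOT real phoneme strings.
user_dict = {"want_to": ["<wanna>"]}
system_dict = {"want": ["<want>"], "to": ["<to>"], "dog": ["dQg"]}

print(lookup_item("Want To", user_dict, system_dict))
print(lookup_item("want dog", user_dict, system_dict))
print(lookup_item("Shakespeare", user_dict, system_dict))
```

The first call finds the underscored phrase in the user dictionary, the second falls back to per-word entries, and the third reports a word that would need an automatically generated pronunciation.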

For the most control over pronunciation accuracy, it is best to spell out numeric quantities such as cardinals, ordinals, percentages, and dollar amounts. However, this is not always required, because for most languages Recognizer handles most numeric items that use digits.
The examples below use en-US (US English) as the language.
- Leading zeroes are ignored for all numeric quantities.
- Recognizer first looks for numbers (from zero to one billion) in the user dictionary. If a number (expressed in digits) is not found there, Recognizer looks for the digits in the system dictionary. If necessary, Recognizer will automatically generate a pronunciation.
- Numbers from 1100 to 1999 are treated as “hundreds” rather than “thousands”. For example, 1100 is eleven_hundred, 1200 is twelve_hundred, and so on. However, 1000 is one_thousand, 2000 is two_thousand, and so on. For more control over the expansion, you can spell the complete number.
- Decimal points and percentages: “3.22” and “5.5%” become three_point_two_two and five_point_five_percent.
- Ordinal numbers: “56th” is fifty_sixth.
- Dollar amounts: Vocabulary items that begin with the dollar sign ($) are treated as dollar amounts. For example, $5.56 becomes five_dollars_and_fifty_six_cents. This expansion applies only to amounts written in the conventional way, with two digits after the decimal point. Otherwise the amount is read as a decimal number, so $5.1 expands to five_point_one_dollars. Special processing occurs for $0.00 (zero_dollars_and_zero_cents) and amounts less than a dollar (for example, $0.34 expands to thirty_four_cents).
- Recognizer does not automatically handle negative numbers. For the number “–200” you must enter “minus_200” and/or “negative_200”.
- Any combination of numbers and letters that does not represent an ordinal leads to undefined results.
Note that these examples do not necessarily hold in languages other than en-US. For example, the dollar sign may not be recognized in other languages. Details on number and character pronunciations are provided in the Language Supplement documents for each installed language.
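To make the en-US rules above concrete, here is a rough Python sketch that reproduces the documented examples. It is not Recognizer's algorithm. All function names are hypothetical, and several behaviors are assumptions that the text does not specify: how numbers such as 1234 continue after the "hundred", singular forms such as one dollar (the sketch always uses plurals), and raising an error for unsupported input.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = ((1_000_000_000, "billion"), (1_000_000, "million"),
          (1000, "thousand"), (100, "hundred"))
IRREGULAR = {"one": "first", "two": "second", "three": "third", "five": "fifth",
             "eight": "eighth", "nine": "ninth", "twelve": "twelfth"}


def cardinal(n):
    """Return the words for a non-negative integer as a list."""
    if n < 20:
        return [ONES[n]]
    if n < 100:
        return [TENS[n // 10]] + ([ONES[n % 10]] if n % 10 else [])
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            return cardinal(head) + [name] + (cardinal(rest) if rest else [])


def whole(digits):
    """Expand a digit string, applying the 1100-1999 'hundreds' rule."""
    n = int(digits)  # int() ignores leading zeroes
    if 1100 <= n <= 1999:
        head, rest = divmod(n, 100)
        return cardinal(head) + ["hundred"] + (cardinal(rest) if rest else [])
    return cardinal(n)


def ordinal(words):
    """Turn the last cardinal word into its ordinal form."""
    last = words[-1]
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return words[:-1] + [last]


def expand(item):
    """Expand a numeric vocabulary item into an underscore-joined phrase."""
    m = re.fullmatch(r"\$(\d+)\.(\d\d)", item)
    if m:  # conventional dollar amount: two digits of cents
        dollars, cents = whole(m.group(1)), whole(m.group(2))
        if int(m.group(1)) == 0 and int(m.group(2)) > 0:
            return "_".join(cents + ["cents"])
        return "_".join(dollars + ["dollars", "and"] + cents + ["cents"])
    if item.startswith("$"):  # unconventional amount, read as a plain number
        return expand(item[1:]) + "_dollars"
    if item.endswith("%"):
        return expand(item[:-1]) + "_percent"
    m = re.fullmatch(r"(\d+)(st|nd|rd|th)", item)
    if m:
        return "_".join(ordinal(whole(m.group(1))))
    m = re.fullmatch(r"(\d+)(?:\.(\d+))?", item)
    if not m:  # negatives and letter/digit mixes are not handled
        raise ValueError(f"unsupported item: {item!r}")
    words = whole(m.group(1))
    if m.group(2):  # digits after the decimal point are read one by one
        words += ["point"] + [ONES[int(d)] for d in m.group(2)]
    return "_".join(words)


for item in ["1100", "2000", "3.22", "5.5%", "56th", "$5.56", "$5.1", "$0.34"]:
    print(item, "->", expand(item))
```

Negative numbers such as -200 raise ValueError in this sketch, mirroring the rule that they must be spelled out as minus_200 or negative_200.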

The installation package provides a dicttest utility to allow you to test pronunciations, to make sure they’re correct. For a full discussion of this utility and how to use it, see Checking pronunciations with dicttest.

The pronunciations in system dictionaries for supported languages have been tuned extensively. However, you may still find it necessary to modify them. See Tuning vocabulary pronunciations.
Frequently asked questions
Each grammar that you write defines vocabulary words (items) spoken by callers and decoded by Recognizer. Dictionaries supply pronunciations for each word and phrase. If a word is not found in the dictionaries, Recognizer generates a pronunciation based on a set of language-specific rules.
Note: Listen to automatically generated pronunciations to test them, because they are sometimes inaccurate. This can be particularly true for proper names.

Most words in your grammars are predefined in the system dictionaries, and have pronunciations that are tuned for accuracy. If you tune the pronunciation of a word, the tuned pronunciation is stored in your user dictionary. If a pronunciation for a particular word is missing, Recognizer automatically generates one.
Pronunciations are loaded into grammars during compilation; any change to the dictionaries requires a recompilation before new pronunciations are used.

Each language pack download includes an HTML file describing the phonemes for that language. See User dictionaries.

It is better to add a pronunciation than to change the default. This ensures that the original system defaults remain available if your user-defined counterpart fails to match when spoken.
The only case in which we recommend replacing a system default pronunciation is if you are certain the existing one is wrong and that your replacement is better. Please inform Nuance technical support about these situations.

If a pronunciation is not correct, your application will recognize the corresponding word poorly. A poor dictionary pronunciation causes low confidence scores or recognition failures when the word is spoken. You can test your pronunciations using the dicttest utility (see Checking pronunciations with dicttest).
If you identify an incorrect pronunciation, you can correct it as follows:
- If the source of the incorrect pronunciation is the user dictionary, then delete the pronunciation.
- If the source is the system dictionary, then add an alternative pronunciation in a user dictionary.
- If the source is “Automatic” (see Automatically generated pronunciations), then create a new pronunciation that is more accurate.

If a vocabulary contains a word that is not in the user or system dictionaries, a pronunciation is automatically generated (see Automatically generated pronunciations). These generated pronunciations are usually very accurate. However, check the diagnostic log for them and verify that each one is accurate.
Be careful when working with pronunciations. There is a difference between adding an alternative pronunciation, and overriding an existing pronunciation.