Wordlist (directory-assistance) grammars
Wordlist grammars are speech grammars optimized for fast compilation. They are often called directory-assistance (DA) grammars because their primary use is for very large lists of items, such as the personal names that telephone directory-assistance applications must recognize. Applications can use wordlist grammars as if they were XML grammars.
Some key features of wordlist grammars are:
- Wordlist grammar files are UTF-8 format text files.
- The file begins with a header that declares the wordlist grammar (required) and specifies any applicable configuration parameters (optional).
- The remainder of the file lists the recognized vocabulary, one item per line.
- You cannot include ECMAScript expressions in wordlist grammars, although a wordlist grammar can be imported into a grammar that uses ECMAScript.
- For best performance, precompile any pronunciation dictionaries used by your wordlist grammars. See Performance considerations.
- By default, Recognizer aborts compilation if it cannot create a pronunciation for a word in the grammar. Because wordlist grammars sometimes contain nonsense words (generated from a database, for example), compilation can fail. To avoid such failures, set the swirec_enable_robust_compile parameter in the grammar (in a wordlist grammar, with a ::META line in the header).
Sample wordlist grammar
Below is a sample wordlist grammar:
::DAWORDLIST en-us
::TAG_FORMAT swi-semantics/1.0
::META swirec_lexicon wordlist.xml
::META swirec_lexicon wordlist2.xml
Mark_Fanta
Caroline_Diego
Caroline Diego!fr-ca -0.693
Francois Rousseau!fr-ca
::PREFIX
I would like to speak to
::END
::SUFFIX
please
::END

The wordlist grammar header consists of commands. Each command begins with a double colon (::) immediately followed by the command name (with no intervening space), and then the command arguments.
- ::DAWORDLIST—The first line of the grammar must declare that the file is a wordlist grammar. The format is "::DAWORDLIST LangCode", where the language code defines a default language.
- ::TAG_FORMAT—Optional. ::TAG_FORMAT specifies the grammar’s tag syntax. The default is "swi-semantics/1.0". The available tag formats are swi-semantics/1.0, semantics/1.0, or semantics/1.0-literals, as discussed in Language, namespace, and semantic tag format.
- ::META—Optional. Subsequent lines with the "::META parameter" format define the grammar’s configuration. The parameter is any Recognizer parameter that can be specified as a meta element. In particular, you can specify a pronunciation dictionary by using the swirec_lexicon parameter. To specify more than one dictionary, use additional ::META lines as shown. For a discussion of dictionaries, see Pronunciation dictionaries.
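Because wordlist grammars are often generated from a database, it can help to build the header programmatically. The following Python sketch assembles the header lines described above. The function name render_header and its parameters are illustrative assumptions, not part of any Recognizer API.

```python
def render_header(lang_code, tag_format=None, meta=()):
    """Return the header lines of a wordlist grammar as one string.

    render_header is a hypothetical helper, not a Recognizer function.
    """
    # The ::DAWORDLIST declaration must be the first line of the file.
    lines = [f"::DAWORDLIST {lang_code}"]
    # ::TAG_FORMAT is optional, so emit it only when requested.
    if tag_format is not None:
        lines.append(f"::TAG_FORMAT {tag_format}")
    # One ::META line per parameter; repeat swirec_lexicon to list
    # more than one dictionary.
    for name, value in meta:
        lines.append(f"::META {name} {value}")
    return "\n".join(lines) + "\n"


header = render_header(
    "en-us",
    tag_format="swi-semantics/1.0",
    meta=[("swirec_lexicon", "wordlist.xml"),
          ("swirec_lexicon", "wordlist2.xml")],
)
print(header, end="")
```

When you write the result to disk, open the file with UTF-8 encoding, because wordlist grammar files are UTF-8 text files.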

The main body of a wordlist grammar consists of the entries, each appearing on a separate line. In the sample wordlist, this consists of a list of names:
Mark_Fanta
Caroline_Diego
Caroline Diego!fr-ca -0.693
Francois Rousseau!fr-ca

In the main body of the grammar, you can use the ::PREFIX and ::SUFFIX commands to specify prefix and suffix strings to be added to all words in the main list. Using this shortcut lets you avoid having to list each permutation on a separate line of the file. The prefix or suffix string appears between the opening ::PREFIX or ::SUFFIX command, and the terminating ::END command.
For example, here is a prefix definition:
::PREFIX
I would like to speak to
::END
Similarly, here is a suffix definition:
::SUFFIX
please
::END
With the prefix and suffix, each entry is covered with or without either string. For example, for the Caroline_Diego entry, this wordlist covers the following:
Caroline_Diego
Caroline_Diego please
I would like to speak to Caroline_Diego
I would like to speak to Caroline_Diego please
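The coverage listed above can be reproduced with a short Python sketch. It only illustrates which phrases one entry covers, and it assumes (based on the example) that the prefix and suffix are each optional. The function name covered_phrases is hypothetical and is not part of Recognizer.

```python
def covered_phrases(entry, prefix=None, suffix=None):
    """List the phrases covered for one entry (illustrative helper only)."""
    # An empty string stands for "prefix (or suffix) not spoken".
    prefixes = [""] if prefix is None else ["", prefix]
    suffixes = [""] if suffix is None else ["", suffix]
    phrases = []
    for p in prefixes:
        for s in suffixes:
            # Join only the non-empty parts with single spaces.
            phrases.append(" ".join(part for part in (p, entry, s) if part))
    return phrases


phrases = covered_phrases("Caroline_Diego",
                          prefix="I would like to speak to",
                          suffix="please")
for phrase in phrases:
    print(phrase)
```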

Although you must declare a default language for the entire wordlist grammar in the header, on each line you can use an exclamation mark (!) and language code to define a local language to be used for that line. For example:
::DAWORDLIST en-us
Caroline_Diego
Caroline Diego!fr-ca -0.693
Above, the name will be recognized if either the US English or the Canadian French pronunciation is used (assuming the fr-CA language pack is installed).
Penalties
You can assign penalties to specific words in the list by adding a floating point number to the end of the line, after any comments. For example:
::DAWORDLIST en-us
Caroline_Diego
Caroline Diego!fr-ca -0.693
The assigned number is a natural log probability that sets the weight of the item. Here, the weight for the "Caroline Diego!fr-ca" item is e to the power of -0.693, or approximately one half (0.5).
Typical penalties range from 0 to -5, but the magnitude can be larger. Values of small magnitude reduce the chance that the word is recognized only slightly, while values of large magnitude reduce it severely.
On rare occasions, you can use small positive values (which evaluate to a probability greater than 1) to increase the chance of a word being recognized. Caution: if the positive value is too large, this may overpower the acoustic model and skew recognition in favor of the item beyond the intended weight.
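To check entries before compiling, you might split each line into its text, language, and penalty, and convert the penalty to a weight. The Python sketch below is a minimal illustration, not Recognizer's parser. It assumes that a trailing token that parses as a number is the penalty and that the last exclamation mark introduces the language code. It does not handle comments, and an entry whose final word is itself numeric would be misread.

```python
import math


def parse_entry(line, default_lang):
    """Split one wordlist entry into (text, language, penalty).

    Hypothetical helper. penalty is None when the line has no penalty.
    """
    text = line.strip()
    penalty = None
    # Assumption: a trailing token that parses as a float is the penalty.
    head, _, tail = text.rpartition(" ")
    if head:
        try:
            penalty = float(tail)
            text = head.rstrip()
        except ValueError:
            pass  # The last token is an ordinary word.
    # Assumption: the last "!" introduces the local language code.
    lang = default_lang
    if "!" in text:
        text, _, lang = text.rpartition("!")
        text = text.rstrip()
    return text, lang, penalty


def penalty_to_weight(penalty):
    """Convert a natural log probability to a linear weight."""
    return math.exp(penalty)


text, lang, penalty = parse_entry("Caroline Diego!fr-ca -0.693", "en-us")
print(text, lang, penalty)
print(penalty_to_weight(penalty))  # approximately 0.5
```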
Recognition results
When you use a wordlist grammar, Recognizer reports the recognized item in the SWI_literal and SWI_meaning keys. The following example shows recognition of the word "Mark_Fanta" (the tag format is swi-semantics):
<?xml version='1.0'?>
<result>
<interpretation grammar="MyWordlistGrammar" confidence="98">
<input mode="speech">
mark_fanta
</input>
<instance>
<SWI_meaning>
mark_fanta
</SWI_meaning>
<SWI_literal>
Mark_fanta
</SWI_literal>
<SWI_grammarName>
MyWordlistGrammar
</SWI_grammarName>
</instance>
</interpretation>
</result>
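An application can read these keys with any XML parser. The following Python sketch uses the standard xml.etree.ElementTree module to extract the grammar name, confidence, SWI_meaning, and SWI_literal from the result shown above. The function name summarize is hypothetical, and the sketch assumes that the result has the same element layout as this example.

```python
import xml.etree.ElementTree as ET

# The recognition result from the example above.
RESULT_XML = """<?xml version='1.0'?>
<result>
<interpretation grammar="MyWordlistGrammar" confidence="98">
<input mode="speech">
mark_fanta
</input>
<instance>
<SWI_meaning>
mark_fanta
</SWI_meaning>
<SWI_literal>
Mark_fanta
</SWI_literal>
<SWI_grammarName>
MyWordlistGrammar
</SWI_grammarName>
</instance>
</interpretation>
</result>"""


def summarize(result_xml):
    """Return one dict per <interpretation> element (illustrative helper)."""
    root = ET.fromstring(result_xml)
    summaries = []
    for interp in root.findall("interpretation"):
        instance = interp.find("instance")
        summaries.append({
            "grammar": interp.get("grammar"),
            "confidence": int(interp.get("confidence")),
            # Element text includes surrounding newlines, so strip it.
            "meaning": instance.findtext("SWI_meaning", default="").strip(),
            "literal": instance.findtext("SWI_literal", default="").strip(),
        })
    return summaries


summaries = summarize(RESULT_XML)
print(summaries[0])
```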
Performance considerations
Because wordlist grammars do not require XML parsing, they compile more quickly than XML grammars of similar size. However, recognition against a wordlist grammar is slower than recognition against an XML grammar.
- The best use for wordlist grammars is for large lists of words that are only occasionally needed, because the savings in CPU compilation cycles are significant in relation to the added cost during recognition.
- A large list that is used frequently would not be a good candidate for a wordlist grammar, because the recognition costs would add up over time.
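The tradeoff above can be expressed as a simple cost model: a one-time compilation cost plus a per-recognition cost. The Python sketch below computes the number of recognitions at which the two grammar types cost the same. All numbers are hypothetical placeholders chosen for illustration, not measurements of Recognizer, so substitute values measured on your own system.

```python
def total_cost(compile_cost, recognition_cost, recognitions):
    """Total CPU cost: one compilation plus the given number of recognitions."""
    return compile_cost + recognition_cost * recognitions


def break_even(wl_compile, wl_recog, xml_compile, xml_recog):
    """Number of recognitions at which both grammar types cost the same."""
    return (xml_compile - wl_compile) / (wl_recog - xml_recog)


# Hypothetical costs in milliseconds (placeholders, not measurements).
WL_COMPILE, WL_RECOG = 2000, 300      # wordlist: fast compile, slower recognition
XML_COMPILE, XML_RECOG = 10000, 250   # XML: slow compile, faster recognition

n = break_even(WL_COMPILE, WL_RECOG, XML_COMPILE, XML_RECOG)
print(n)
```

With these placeholder values, the wordlist grammar is cheaper for a list used fewer times than the break-even count, and the XML grammar is cheaper beyond it.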
Parameters that set grammar optimization levels have no effect on wordlist grammars. For example, swirec_optimization is ignored.
Although a wordlist grammar does not require XML parsing, any user dictionaries it specifies are written in XML and can slow the grammar's compilation. To avoid this problem, precompiling pronunciation dictionaries with the make_dict utility is strongly recommended (see Compiling a user dictionary).