Testing grammars
The parseTool and test_parser tools, described in detail below, are designed specifically for testing grammars. Both are shipped with Recognizer and are stored in the %SWISRSDK%\bin directory.
Also useful are the following utilities:
- acc_test for recognition accuracy testing, which takes one or more prepared scripts as input.
- dicttest for checking dictionary pronunciations.
Using parseTool
Use the parseTool program to test a single grammar interactively. It lets you type sentences into the grammar to see how the grammar handles them. You can test grammar coverage, interpretation, ambiguity, and overgeneration.
To use parseTool, navigate to the %SWISRSDK%\bin directory, and enter a command with the following format at the prompt:
parseTool grammarfile.grxml [option1 arg1] [option2 arg2] [...]
Where grammarfile.grxml is the path and name of the grammar file to be tested. The options most often used for regular GrXML grammars are described in the following table:
Note: Some parseTool options apply only to natural language grammars.
Option | Description |
---|---|
-debug_output | Prints information about ECMAScript operations. Used with -test_sentences and -test_file. Can be abbreviated to -d_o. |
-dump_parser filename | Prints parser information to the specified file. |
-gen_file filename | Generates random output sentences from the grammar and writes them to the specified file. Use with -max_gen to specify the number of sentences to generate. Can be abbreviated to -g_f. |
-gen_sentences | Generates random output sentences from the grammar. Used to detect overgeneration. Use with -max_gen to specify the number of sentences to generate. Can be abbreviated to -g_s. |
-iso8859 | Specifies ISO-8859 as the encoding of the input and output files. Use it to override UTF-8 when UTF-8 is the default. |
-max_gen number | Specifies how many sentences to generate. Used with -gen_sentences and -gen_file. |
-media_type | Specifies the media type of the grammar. The value is either "application/x-vnd.speechworks.emma+xml" or "application/x-vnd.speechworks.recresult+xml". |
-no_pretty | Prints the parse result without formatting (that is, as one continuous line of text). |
-no_script_check | Disables validity checking of the grammar. |
-s | Enables silent mode, which suppresses the argument information normally printed at the beginning of the output. |
-test_file filename | Specifies an input file of test sentences (one sentence per line). |
-test_sentences | Enables input of sentences to test the grammar. You can type sentences at the keyboard (the default) or specify an input file (using the -test_file option). The tool evaluates each sentence and shows whether the grammar covers it. Can be abbreviated to -t_s. |
-utf8 | Specifies UTF-8 as the encoding of the input and output files. Use it to override ISO-8859 when ISO-8859 is the default. |
-utt | Enables input of audio files to be parsed. Only audio/basic files are accepted. Use this option with -test_sentences; it cannot be combined with -test_file. With this option, you can specify audio files in addition to typing sentences (see the example below). The syntax is <filename (the angle bracket is required, and no whitespace is allowed between the bracket and the filename). |
-verbose | Prints additional parse details. |
Note: You can put the parseTool options in any order on the command line.

The following command parses myGrammar.grxml and tests it in test sentence mode (-test_sentences), which lets you type test phrases directly at the keyboard. Because this example also includes the -utt option, you can also supply an audio file (here, roma.ulaw) as test input:
NR9>parseTool myGrammar.grxml -test_sentences -utt
This command produces the following output. In this transcript, the user enters <roma.ulaw at the first prompt and roma at the second:
PROG parsetool:
arg <spec-filename> == myGrammar.grxml
arg <-test_sentences> == -t_s
arg <-utt> == -utt
next sentence: <roma.ulaw
Parsing '<roma.ulaw' with uri 'myGrammar.grxml'...
<?xml version='1.0'?>
<result>
<interpretation grammar="ParseToolGrammar" confidence="86">
<input mode="speech">
roma
</input>
<instance>
<SWI_literal>
Roma
</SWI_literal>
<SWI_grammarName>
ParseToolGrammar
</SWI_grammarName>
<SWI_meaning>
{SWI_literal:Roma}
</SWI_meaning>
</instance>
</interpretation>
</result>
Parse successful, line 1
next sentence: roma
Parsing 'roma' with uri 'myGrammar.grxml'...
<?xml version='1.0'?>
<result>
<interpretation grammar="ParseToolGrammar" confidence="100">
<input mode="speech">
roma
</input>
<instance>
<SWI_literal>
Roma
</SWI_literal>
<SWI_grammarName>
ParseToolGrammar
</SWI_grammarName>
<SWI_meaning>
{SWI_literal:Roma}
</SWI_meaning>
</instance>
</interpretation>
</result>
Parse successful, line 2

The -test_sentences option allows you to enter test sentences from the keyboard to find out how your grammar handles them. If your test set is large, you can instead use the -test_file option to specify a text file containing the sentences you want to test (each sentence must appear on a separate line of the test file).
When one of these options is coupled with -debug_output (-d_o), you will also see a trace of the ECMAScript operations. See Using parseTool to verify ECMAScript for an example of this use of the -debug_output option.
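For scripted regression runs, the -test_file workflow can be driven from a small Python wrapper. This is a minimal sketch, not part of the SDK: it assumes parseTool is on the PATH (or that you run it from %SWISRSDK%\bin), that -test_file is combined with -test_sentences as the option table describes, and that each successful parse prints a line beginning with "Parse successful", as in the transcript above. The failure message format is not shown in this section, so the sketch only counts successes. The helper names and the sentences.txt file name are hypothetical.

```python
import subprocess
from pathlib import Path


def parsetool_batch_args(grammar: str, test_file: str, debug: bool = False) -> list[str]:
    """Build a parseTool command line that parses every sentence in test_file."""
    args = ["parseTool", grammar, "-test_sentences", "-test_file", test_file]
    if debug:
        # Adds the ECMAScript trace described under -debug_output.
        args.append("-debug_output")
    return args


def count_successes(output: str) -> int:
    """Count 'Parse successful, line N' messages in parseTool output."""
    return sum(1 for line in output.splitlines()
               if line.strip().startswith("Parse successful"))


def run_batch(grammar: str, sentences: list[str], test_file: str = "sentences.txt") -> int:
    """Write the sentences one per line, run parseTool, and return the success count."""
    Path(test_file).write_text("\n".join(sentences) + "\n", encoding="utf-8")
    result = subprocess.run(parsetool_batch_args(grammar, test_file),
                            capture_output=True, encoding="utf-8")
    return count_successes(result.stdout)
```

Comparing the return value of run_batch("myGrammar.grxml", ["roma"]) with the number of sentences you passed in gives a quick coverage check; any shortfall means some sentences did not parse and are worth inspecting in the full output.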

The parseTool utility offers three options that can be used to generate random sentences based on the input grammar:
- The -gen_sentences (-g_s) option generates a set of random sentences that are valid according to the grammar rules.
- The -gen_file (-g_f) option generates a set of random sentences, and then writes them to the specified file.
- The -max_gen option specifies the number of sentences to be generated.
You can use these options to test for sentence overgeneration by generating a large number of sentences, and checking them for nonsense combinations of words and phrases. Nonsense sentences waste computing resources if they’re recognized at runtime, so it’s important to rewrite your grammar to reject them.
Here is an example of how to use the -gen_sentences and -max_gen options:
> parseTool directcalls.grxml -gen_sentences -max_gen 10
please send calls to the office
send calls home
please direct calls home
please send calls home
send my calls home
please direct my calls home
please direct calls home
please send calls home
please send my calls home
please send calls to the office
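Because -gen_sentences samples at random, the same sentence can appear several times, as in the output above. When reviewing a large generated set for overgeneration, it can help to collapse duplicates and flag anything you have not already approved. The following Python sketch does this; it assumes the generated sentences were saved one per line (for example, with -gen_file), and the approved set is a list you maintain yourself. The function name is hypothetical.

```python
from collections import Counter


def review_generated(lines, approved):
    """Return (counts, unapproved) for a list of generated sentences.

    counts maps each distinct sentence to how often it was generated;
    unapproved lists the distinct sentences not in the approved set,
    in the order they were first generated.
    """
    sentences = [line.strip() for line in lines if line.strip()]
    counts = Counter(sentences)  # Counter keeps first-seen insertion order
    unapproved = [s for s in counts if s not in approved]
    return counts, unapproved
```

Only the sentences in unapproved need a human review; any nonsense among them points to a rule that should be tightened. You can pass an open file directly, for example review_generated(open("generated.txt", encoding="utf-8"), approved), where generated.txt is a placeholder name.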

To perform ambiguity tests, use parseTool to generate a file of test sentences, and then parse those sentences using the -test_file option.

Your language may restrict how certain words are entered; the Language Supplement for each language describes any such restrictions.
To test sentences containing alphabetic characters beyond a–z and A–Z (such as accented characters, vowels with umlauts like ü, the letter ß, and so on), you cannot type the characters directly at the keyboard into parseTool.
Instead, use a text editor to create a file, put one test sentence per line, and then pipe the file to parseTool as follows:
cat filename | parseTool grammarfile -test_sentences
Where filename is the file of test sentences, and grammarfile is the grammar file you want to test.
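When the test sentences are produced by a script, you can write them in the required encoding and feed them to parseTool's standard input from Python, which is roughly equivalent to the cat pipe above. This sketch assumes ISO-8859-1 (Latin-1) when you choose ISO-8859; other parts of ISO-8859 need a different codec. It passes -utf8 or -iso8859 explicitly so the encoding does not depend on the installation default. The function names are hypothetical.

```python
import subprocess


def encode_sentences(sentences, iso8859=False):
    """Encode one sentence per line as ISO-8859-1 or UTF-8 bytes."""
    encoding = "iso-8859-1" if iso8859 else "utf-8"
    return ("\n".join(sentences) + "\n").encode(encoding)


def pipe_to_parsetool(grammar, sentences, iso8859=False):
    """Roughly: cat filename | parseTool grammarfile -test_sentences"""
    args = ["parseTool", grammar, "-test_sentences", "-iso8859" if iso8859 else "-utf8"]
    result = subprocess.run(args, input=encode_sentences(sentences, iso8859),
                            capture_output=True)
    # Decode with the same encoding used for the input.
    return result.stdout.decode("iso-8859-1" if iso8859 else "utf-8", errors="replace")
```

Encoding the bytes yourself avoids depending on the console code page, which is usually what prevents characters such as ü and ß from reaching parseTool intact.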

If your grammar involves several languages, then when you generate sentences, parseTool adds a language code to each string that is set to a language other than the grammar's default. No code is displayed for strings in the default language.
For example, if you have a bilingual grammar that allows the word "Peter" or "Pierre," the generated sentences might appear as follows:
peter
pierre!fr-FR
Here is a more detailed example. Consider the following grammar, which sets the default language to Japanese (ja-JP):
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="ja-JP" version="1.0" root="boolean"
xmlns="http://www.w3.org/2001/06/grammar">
<rule id="boolean" scope="public">
aaa
<one-of>
<item xml:lang="fr-CA" >
<tag>SWI_meaning=true; MEANING=SWI_meaning;
</tag>
frca
</item>
<item xml:lang="en-US" >
<tag>SWI_meaning=true; MEANING=SWI_meaning;
</tag>
enus
</item>
<item xml:lang="ja-JP" >
<tag>SWI_meaning=true;MEANING=SWI_meaning;
</tag>
jajp
</item>
</one-of>
<item xml:lang="ja-JP">bbb</item>
</rule>
</grammar>
Running this grammar through parseTool with the -gen_sentences option gives a result similar to the following:
PROG installation_path/baseline/bin/parsetool:
arg <spec-filename> == frca.xml
arg <-gen_sentences> == -g_s
aaa jajp bbb
aaa frca!fr-CA bbb
aaa enus!en-US bbb
aaa frca!fr-CA bbb
aaa enus!en-US bbb
aaa jajp bbb
aaa frca!fr-CA bbb
aaa enus!en-US bbb
aaa frca!fr-CA bbb
aaa frca!fr-CA bbb
The strings "aaa", "bbb", and "jajp" are in the default language (ja-JP), so no language code is displayed for them. The strings "frca" and "enus", however, are set to Canadian French (fr-CA) and US English (en-US) respectively, so whenever they appear in a generated sentence, they are followed by their language codes.
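When post-processing generated sentences (for example, to strip the codes before reusing the sentences as test input), the word!code convention can be split apart mechanically. This Python sketch assumes, based on the output above, that a code is attached to a word with a single "!" and that unmarked words belong to the grammar's default language, which you pass in. If a multi-word phrase carries only one code, this sketch tags only the word that carries it. The function name is hypothetical.

```python
def split_language_codes(sentence: str, default_lang: str):
    """Split a generated sentence into (word, language) pairs."""
    pairs = []
    for token in sentence.split():
        word, sep, lang = token.partition("!")
        # Tokens without a '!' suffix are in the grammar's default language.
        pairs.append((word, lang if sep else default_lang))
    return pairs
```

For the Japanese grammar above, split_language_codes("aaa frca!fr-CA bbb", "ja-JP") yields one pair per word, and " ".join(word for word, _ in pairs) recovers the plain sentence "aaa frca bbb".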
Using test_parser
The test_parser tool performs interpretation tests on grammars: it compares the key/value pairs that a grammar actually returns to Recognizer with the correct values that you specify.
Note that test_parser tests only keys and values that are set at the root level and are therefore passed back to Recognizer; it cannot test attribute settings made in subroot rules.
The tool reads a test file in which each line defines either a test or a directive. Lines beginning with # are treated as comments. Like parseTool, test_parser accepts the -iso8859 option when the input file is not UTF-8.
Each test line in the test file must be of the form:
xml_grammar_file sentence_text key_name correct_value_for_key
Item | Description |
---|---|
xml_grammar_file | Name of the grammar file. A hyphen (-) reuses the grammar from the previous test line. |
sentence_text | Text of the sentence to be recognized, in double quotes. Precede the text with a tilde (~) to mark a sentence the grammar should not allow (one that should not parse, or that parses and causes SWI_disallow to be set to 1). |
key_name | Name of the key to test. Precede the key name with a tilde (~) to indicate a key to ignore. |
correct_value_for_key | The expected value for key_name. The test_parser program compares the value actually returned with this value. Place the value in double quotes if it contains spaces. |
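If you generate test files from a list of expected results, a small formatter keeps these quoting rules consistent. The following Python sketch is hypothetical (test_parser does not provide it). It follows the rules in the table: the sentence is always quoted, a ~ marks a negative sentence, the value is quoted only when it contains spaces, and - repeats the previous grammar. Negative sentences are written without a key or value, as in the example that follows. To ignore a key, pass the key name with its ~ prefix.

```python
def format_test_line(grammar, sentence, key=None, value=None,
                     negative=False, previous_grammar=None):
    """Build one test_parser test line."""
    # '-' reuses the grammar named on the previous test line.
    grammar_field = "-" if grammar == previous_grammar else grammar
    sentence_field = ("~" if negative else "") + f'"{sentence}"'
    if negative:
        return f"{grammar_field} {sentence_field}"
    if key is None or value is None:
        raise ValueError("positive tests need a key and an expected value")
    value = str(value)
    value_field = f'"{value}"' if " " in value else value
    return f"{grammar_field} {sentence_field} {key} {value_field}"


def format_test_file(tests):
    """tests: iterable of dicts holding format_test_line keyword arguments."""
    lines, previous = [], None
    for test in tests:
        lines.append(format_test_line(previous_grammar=previous, **test))
        previous = test["grammar"]
    return "\n".join(lines) + "\n"
```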
You can run test_parser with the following command-line options:
- -verbose (-v) prints more details of program operation.
- -debug_output (-d_o) prints details of script execution (see Using parseTool to verify ECMAScript for an example).

For example, say the file direct.test has the following lines:
direct.grxml "send my calls home" SWI_meaning "direct calls home"
- "direct my calls home" ACTION phone
- "direct my calls home" LOCATION home
- "direct my calls home" LOCATION office
## A "negative example"
- ~"hello world"
When this is sent through test_parser, it generates the following results:
> test_parser direct.test
2 errors!!!
0 1
Key values don't match (direct calls,phone).
--Sentence "- "direct my calls home" ACTION phone " returned error 1
Key values don't match (home,office).
--Sentence "- "direct my calls home" LOCATION office" returned error 1
2 errors!!!
6 3.338133
done
The meaning of this output is as follows:
- 2 errors!!! indicates the number of errors found.
- 0 1 indicates that this was the first (1) instance of the test run on thread 0.
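- The Key values don't match lines identify the two errors. Each is followed by the test line that failed; in this example, the first value in parentheses is the value returned and the second is the expected value.
- 6 3.338133 indicates that there were six test sentences in the file, and that the average CPU time per sentence was 3.338133 ms (about 3.34 ms).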
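To use test_parser in an automated regression run, you may want to extract these figures from its output. The Python sketch below is illustrative only: the line formats are inferred from the sample output above, other versions or error types may print differently, and it assumes the average CPU figure always contains a decimal point (which also distinguishes it from the "0 1" thread line). The function name is hypothetical.

```python
import re


def summarize_test_parser(output: str) -> dict:
    """Extract the error count, failed test lines, sentence count, and average CPU."""
    summary = {"errors": 0, "failed": [], "sentences": None, "avg_cpu_ms": None}
    for raw in output.splitlines():
        line = raw.strip()
        if m := re.fullmatch(r"(\d+) errors!!!", line):
            summary["errors"] = int(m.group(1))
        elif m := re.fullmatch(r'--Sentence "(.*)" returned error (\d+)', line):
            # The greedy group keeps the inner quotes of the original test line.
            summary["failed"].append(m.group(1).strip())
        elif m := re.fullmatch(r"(\d+) (\d+\.\d+)", line):
            # Sentence count followed by average CPU time in milliseconds.
            summary["sentences"] = int(m.group(1))
            summary["avg_cpu_ms"] = float(m.group(2))
    return summary
```

A build script can then exit with a nonzero status whenever summary["errors"] is greater than zero, and print summary["failed"] to show which test lines need attention.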
The number of errors found and their descriptions are generally the most significant for testing purposes.

You can begin the input test file with an EcmaVars directive line for testing purposes (for example, for writing a regression test suite). With this directive, you can set SWI_vars variables (see SWI_vars) and test their effect on parsing.
For example, suppose that the grammar birthdate.grxml described in the SWI_vars topic uses the variable SWI_vars.today. Below is a sample script that tests this birthdate grammar; its EcmaVars directive passes in a current date and constrains the earliest date using the SWI_vars mechanism:
EcmaVars "SWI_vars.today=102303;SWI_vars.earliestdate=111598"
birthdate.grxml "my birthday is in january" SWI_meaning "01??04"
- "tomorrow" SWI_meaning "102403"
## A "negative example"
- ~"hello world"
Note that you can vary the filename in the first column to test several grammars in the same test file.
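When a script generates the test file, you can build the EcmaVars line from a fixed reference date so that relative phrases such as "tomorrow" have predictable expected values. The following Python sketch assumes the MMDDYY date format used in the example above (102303 for October 23, 2003, with "tomorrow" expected as 102403); confirm the format your grammar actually expects. Values are written unquoted, exactly as in the example; if your grammar treats them as strings, you may need to quote them. The function names are hypothetical.

```python
from datetime import date, timedelta


def mmddyy(d: date) -> str:
    """Format a date as MMDDYY, for example 2003-10-23 -> '102303'."""
    return d.strftime("%m%d%y")


def ecmavars_line(variables: dict) -> str:
    """Build an EcmaVars directive from SWI_vars name/value pairs."""
    assignments = ";".join(f"SWI_vars.{name}={value}" for name, value in variables.items())
    return f'EcmaVars "{assignments}"'


# Reproduce the directive and the "tomorrow" test from the example above.
today = date(2003, 10, 23)
header = ecmavars_line({"today": mmddyy(today), "earliestdate": "111598"})
tomorrow_test = f'- "tomorrow" SWI_meaning "{mmddyy(today + timedelta(days=1))}"'
```

Pinning today to a fixed date, rather than using date.today(), keeps the expected values stable from one regression run to the next.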
Related topics
Reference