Testing grammars

parseTool and test_parser, described in detail below, are tools designed specifically for testing grammars. Both ship with Recognizer and are stored in the %SWISRSDK%\bin directory.

Also useful are the following utilities:

acc_test for recognition accuracy testing, which takes one or more prepared scripts as input.
dicttest for checking dictionary pronunciations.

Using parseTool

Use the parseTool program to test a single grammar interactively. It lets you type sentences into the grammar to see how the grammar handles them. You can test grammar coverage, interpretation, ambiguity, and overgeneration.

To use parseTool, navigate to the %SWISRSDK%\bin directory, and enter a command with the following format at the prompt:

parseTool grammarfile.grxml [option1 arg1] [option2 arg2] [...]

Where grammarfile is the path and name of the grammar file to be tested. The options most often used for regular GrXML grammars are described in the table that follows:

Note: Some parseTool options only apply for natural language grammars.

Option	Description
-debug_output	Prints information about ECMAScript operations. Used with -test_sentences and -test_file. Can be abbreviated to -d_o.
-dump_parser filename	Prints parser information to the specified file.
-gen_file filename	Generates random output sentences from the grammar to the specified file. Use with -max_gen to specify a number of sentences to be generated.
-gen_sentences	Generates random output sentences from the grammar. Used to detect overgeneration. Use with -max_gen to specify a number of sentences to be generated. Can be abbreviated to -g_s.
-iso8859	Specifies the encoding format of the input and output files as ISO-8859. Used to override UTF-8 format when UTF-8 is the default.
-max_gen	Specifies how many sentences to generate. Used with -gen_sentences and -gen_file.
-media_type	Specifies the media type of the grammar. The value is either "application/x-vnd.speechworks.emma+xml" or "application/x-vnd.speechworks.recresult+xml".
-no_pretty	Prints the parse result with no formatting (that is, as a continuous line of text).
-no_script_check	Disables validity checking of the grammar.
-s	Enables silence mode, which stops the printing of argument information at the beginning of output.
-test_file filename	Specifies an input file with test sentences (one sentence per line).
-test_sentences	Enables input of sentences to test the grammar. You can type sentences from the keyboard (the default) or specify an input file (using the -test_file option). The tool evaluates each sentence and shows whether it is covered by the grammar. Can be abbreviated to -t_s.
-utf8	Specifies the encoding format of the input and output files as UTF-8. Used to override ISO-8859 format when ISO-8859 is the default.
-utt	Enables input of audio files to be parsed. Only audio/basic files may be input. Use this option with -test_sentences; you cannot use it with -test_file. With this option, you can specify audio files in addition to typing sentences. To specify an audio file, type <filename (the angle bracket is required, and no whitespace is allowed between the bracket and the filename).
-verbose	Prints additional parse details.

Note: You can put the parseTool options in any order on the command line.
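Because the options can appear in any order, a command line can be assembled programmatically from the table above. The following Python sketch shows one way to do that. The function name build_parsetool_args, its parameters, and the file name sentences.txt are hypothetical. The sketch only builds the argument list and does not run the tool, and it assumes that -max_gen takes its count as the next argument, which the table does not state explicitly.

```python
from typing import List, Optional


def build_parsetool_args(
    grammar_file: str,
    test_sentences: bool = False,
    test_file: Optional[str] = None,
    gen_sentences: bool = False,
    gen_file: Optional[str] = None,
    max_gen: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list for parseTool using options from the table.

    Hypothetical helper: it only assembles arguments and never runs the tool.
    """
    # The table says -max_gen is used with -gen_sentences and -gen_file.
    if max_gen is not None and not (gen_sentences or gen_file):
        raise ValueError("-max_gen is only meaningful with -gen_sentences or -gen_file")

    args = ["parseTool", grammar_file]
    if test_sentences:
        args.append("-test_sentences")
    if test_file is not None:
        args += ["-test_file", test_file]
    if gen_sentences:
        args.append("-gen_sentences")
    if gen_file is not None:
        args += ["-gen_file", gen_file]
    if max_gen is not None:
        # Assumption: the count follows the flag as a separate argument.
        args += ["-max_gen", str(max_gen)]
    if verbose:
        args.append("-verbose")
    return args


# Example: test a grammar against a file of sentences, with extra parse details.
example = build_parsetool_args(
    "direct.grxml", test_sentences=True, test_file="sentences.txt", verbose=True
)
print(" ".join(example))
```

The resulting list can be passed to a process launcher such as subprocess.run when the tool is available on the path.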

Using test_parser

The test_parser tool allows you to perform interpretation tests on grammars by comparing the correct key/value pairs that get passed to Recognizer with those actually generated.

Note that test_parser only tests keys/values that are set at the root and thus passed back to Recognizer; it cannot test attribute settings from subroot rules.

The tool operates on a test file, each of whose lines defines a test, or directive. Additionally, lines beginning with # are treated as comments. Like parseTool, test_parser accepts the -iso8859 argument when the input file is not UTF-8.

Each test line in the test file must be of the form:

xml_grammar_file sentence_text key_name correct_value_for_key

Item	Description
xml_grammar_file	Name of the grammar file. A hyphen (-) uses the previous grammar.
sentence_text	Text of the sentence to be recognized (in quotes). Precede the text with a tilde (~) to mark a sentence that the grammar should not allow (a sentence that should not parse, or that parses and causes SWI_disallow to be set to 1).
key_name	Name of key to test. Precede the text with a tilde (~) to indicate keys to ignore.
correct_value_for_key	The expected value for key_name. The test_parser program compares the value actually returned with this value. Place the value in double quotes if it has spaces.
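When a test file covers many sentences, it can be convenient to generate the test lines rather than type them. The following Python sketch formats one line according to the table above. The function name format_test_line and its parameters are hypothetical, and the sample values are taken from the example later in this section.

```python
from typing import Optional


def format_test_line(
    grammar: Optional[str],
    sentence: str,
    key: Optional[str] = None,
    expected: Optional[str] = None,
    should_parse: bool = True,
) -> str:
    """Format one test_parser test line (hypothetical helper)."""
    # A hyphen reuses the grammar from the previous test line.
    grammar_field = grammar if grammar is not None else "-"

    # The sentence is always quoted; a leading tilde marks a negative example.
    sentence_field = '"' + sentence + '"'
    if not should_parse:
        sentence_field = "~" + sentence_field

    fields = [grammar_field, sentence_field]
    if key is not None:
        fields.append(key)
        if expected is not None:
            # Values containing spaces must be placed in double quotes.
            fields.append('"' + expected + '"' if " " in expected else expected)
    return " ".join(fields)


print(format_test_line("direct.grxml", "send my calls home", "SWI_meaning", "direct calls home"))
print(format_test_line(None, "direct my calls home", "ACTION", "phone"))
print(format_test_line(None, "hello world", should_parse=False))
```

Passing None for the grammar emits the hyphen form, so a run of tests against the same grammar only names the file once.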

You can run test_parser with the following command-line options:

-verbose (-v) prints more details of program operation.
-debug_output (-d_o) prints details of script execution (see Using parseTool to verify ECMAScript for an example).

Example of test_parser use

Suppose the file direct.test contains the following lines:

direct.grxml "send my calls home" SWI_meaning "direct calls home"

- "direct my calls home" ACTION phone

- "direct my calls home" LOCATION home

- "direct my calls home" LOCATION office

## A "negative example"

- ~"hello world"

When this is sent through test_parser, it generates the following results:

> test_parser direct.test

2 errors!!!

0 1

Key values don't match (direct calls,phone).

--Sentence "- "direct my calls home" ACTION phone " returned error 1

Key values don't match (home,office).

--Sentence "- "direct my calls home" LOCATION office" returned error 1

2 errors!!!

6 3.338133

done

The meaning of this output is as follows:

2 errors!!! indicates the number of errors found.
0 1 indicates that this was the first (1) instance of the test run on thread 0.
The Key values don't match sections identify the two errors.
6 3.338133 indicates that there were six test sentences in the file, and that the average CPU time per sentence was about 3.34 ms.

The number of errors found and their descriptions are generally the most significant for testing purposes.
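For automated runs, the error count and the mismatched values can be pulled out of the output with a small script. The following Python sketch does this for output shaped like the example above. The function name summarize and the patterns are assumptions based solely on that one example, so other versions of test_parser may print lines this sketch does not recognize. The sketch does not label which value in each pair is the expected one, because the output does not say.

```python
import re
from typing import Any, Dict

# Patterns inferred from the single example output in this section.
ERROR_COUNT = re.compile(r"^(\d+) errors?!!!$")
MISMATCH = re.compile(r"^Key values don't match \((.*),(.*)\)\.$")
SUMMARY = re.compile(r"^(\d+) (\d+\.\d+)$")  # sentence count, average CPU ms


def summarize(output: str) -> Dict[str, Any]:
    """Extract the error count, mismatched pairs, and totals (hypothetical helper)."""
    summary: Dict[str, Any] = {
        "errors": None,
        "mismatches": [],
        "sentences": None,
        "avg_cpu_ms": None,
    }
    for raw_line in output.splitlines():
        line = raw_line.strip()
        count = ERROR_COUNT.match(line)
        mismatch = MISMATCH.match(line)
        totals = SUMMARY.match(line)
        if count:
            summary["errors"] = int(count.group(1))
        elif mismatch:
            # Keep the two values in the order they were printed.
            summary["mismatches"].append((mismatch.group(1), mismatch.group(2)))
        elif totals:
            summary["sentences"] = int(totals.group(1))
            summary["avg_cpu_ms"] = float(totals.group(2))
    return summary


SAMPLE_OUTPUT = """2 errors!!!
0 1
Key values don't match (direct calls,phone).
--Sentence "- "direct my calls home" ACTION phone " returned error 1
Key values don't match (home,office).
--Sentence "- "direct my calls home" LOCATION office" returned error 1
2 errors!!!
6 3.338133
done
"""

print(summarize(SAMPLE_OUTPUT))
```

A build script could fail the run whenever the reported error count is not zero.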
