acc_test
You can create acc_test scripts to test grammars for basic accuracy and generate reports.
The acc_test utility tests grammar accuracy by comparing audio files that have known meanings with their recognition results, and analyzing the results in one or more statistical reports. The utility is located in: %SWISRSDK%\amd64\bin
This utility takes one or more prepared scripts as input. Each script contains instructions for recognition testing, encoded in a proprietary format. The script lists the grammars to load for testing, the input audio files and the correct meaning of each, the reports to be generated, and so on. As output, the utility generates the specified reports based on the recognition results.
Note: The acc_test utility consumes three licenses in order to run.
Supported audio
Best practice: use end-pointed wave files generated by Recognizer, as these have correct begin and end silence times. Wave files generated or processed by other methods are not recommended.
Format | Description
---|---
audio/basic | WAV or ulaw
audio/x-alaw-basic | alaw
audio/L16 | 16-bit linear data
application/x-aurora | Aurora data (original bitstream)
application/x-feature | Aurora data (advanced bitstream for encoding=ES_202_050)
Usage
acc_test script1 [script2 ...] -local_log filename
[-keep_cache]
[-no_rectest_log]
[-real_time]
[-report_dir directory]
Options
script1 [script2 ...]
One or more scripts written in the proprietary format. If more than one script is specified, each script runs on a separate channel.
-local_log filename
Outputs event logging to the named file.
-keep_cache
Prevents deletion of existing grammar and inet cache directories.
-no_rectest_log
Disables the logging of script commands to the event log. This means that certain events (STARTSCRIPT, ALTdtst and ALTdtnd, SWIdtst, ENDSCRIPT, and others) will not appear in the log.
-real_time
Writes audio in real time.
-report_dir directory
Specifies a directory to which the reports will be written. This directory must already exist: the command will not create it.
Example
> acc_test test.script -no_rectest_log -local_log mylog.log
This command runs the testing script test.script and logs the results in the mylog.log file. Since -no_rectest_log was used, the log does not include any events that describe the script commands.
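A run that exercises more of the options might look like the following sketch (the script names, log file, and report directory are hypothetical, and the reports directory must already exist):
> acc_test test1.script test2.script -local_log mylog.log -keep_cache -report_dir reports
Here, test1.script and test2.script each run on a separate channel, the existing grammar and inet cache directories are preserved, and all generated reports are written to the reports directory.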

A simple sample script appears below:
# Example script. Use the pound sign (#) for comments
# Header (ACC:)
:ACC
# Load grammars
SWIrecGrammarLoad G0 g0.grxml
SWIrecGrammarLoad G1 g1.grxml
SWIrecGrammarLoad G2 g2.grxml
# Define the contexts
context_define context1 500 800
context_add G1 1
context_add G2 1
context_end
context_define context2 200 900
context_add G0 1
context_end
# Open the cumulative files
open utd test_grammar.utd
open errors test_grammar.err
open xmlresult test_grammar_nlsml.xml
xmlresult_media_type application/x-vnd.speechworks.emma+xml
# Test the contexts
context_use context1
transcription blue elephant
meaning {toto}
recognize blue_elephant.ulaw
transcription blue
recognize blue.ulaw
# Reset channel normalization
SWIrecAcousticStateReset
context_use context2
transcription i want to fly from denver to boston at five o'clock
recognize boston_denver_at_5.ulaw
# Generate reports
report summary test_grammar.summary
report confidence test_grammar.confidence
report nbest test_grammar.nbest
report oov test_grammar.oov
report words test_grammar.words
# Close the cumulative files
close utd
close errors
close xmlresult
Header (ACC:)
Each script begins with the following header:
:ACC
This header tells acc_test to call Recognizer, which uses the speech detector to detect end of speech. This means that the utility may declare the end of speech before the end of the file, based on the "incompletetimeout" parameter. Typical input waveform files will have already been endpointed and padded with approximately 200 ms of silence before and after the speech.
Comments
To enter comments anywhere in the script, use a hashmark (#) at the beginning of the line. This character tells acc_test to ignore the rest of the line.
Load grammars
The next section of the script tells acc_test which grammars to load for testing:
SWIrecGrammarLoad G0 g0.grxml
SWIrecGrammarLoad G1 g1.grxml
SWIrecGrammarLoad G2 g2.grxml
Here, each SWIrecGrammarLoad command defines an internal name (for example, G0) and matches it with a grammar to be loaded (g0.grxml):
SWIrecGrammarLoad gname gpath
Here, gname is the name to be assigned to the grammar in the rest of the script, and gpath is the URI to the grammar. This URI must include the full pathname for the grammar relative to the script. In the examples above, all the grammars are assumed to be in the same directory as the script itself. If there is a problem loading the grammar, the script will exit with an error message.
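For illustration, if the grammars were kept in a subdirectory next to the script (a hypothetical layout, not used by the sample above), the load commands might look like this:
# Hypothetical: grammars stored in a "grammars" subdirectory relative to the script
SWIrecGrammarLoad MENU grammars/main_menu.grxml
SWIrecGrammarLoad YESNO grammars/yes_no.grxml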
Define the contexts
Once the grammars are loaded, they are used to define grammar contexts:
context_define context1 500 800
context_add G1 1
context_add G2 1
context_end
In this excerpt, the script defines "context1" as a combination of grammars G1 and G2, weights these grammars equally, and sets the confidence thresholds that will be considered low (500) and high (800) on a scale of 1 to 1000.
The commands used to define contexts are:
context_define cname low_thresh high_thresh
This command begins each definition, specifying the name to be used for the context (cname) and setting the low (low_thresh) and high (high_thresh) confidence thresholds for the context. These thresholds range from 1 to 1000. Both are required, but you can use the same number for both if desired.
context_add gname weight
This command adds the grammar gname to the current context, assigning it the specified weight within the context. To weight all grammars equally, assign the same weight to each.
context_end
This command marks the end of the current context definition.
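As a further illustration, a context that favors one grammar over another might be defined as follows (the context name, thresholds, and weights here are hypothetical):
# Hypothetical context: G1 is weighted twice as heavily as G2
context_define weighted_context 400 750
context_add G1 2
context_add G2 1
context_end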
Open the cumulative files
The acc_test reports represent data which has been accumulated internally. The data is written when the report command is processed. However, some kinds of data are written as the recognitions happen:
- utd: Up-to-date files recording every transaction.
- errors: Error files recording occurrences of errors.
- xmlresult: Output of the recognition results in XML format.
The xmlresult format can be further specified using the xmlresult_media_type command, as shown in the example.
- fr: Tracks all false rejections, where Recognizer incorrectly assigns a confidence score below or equal to the low confidence threshold, and thus rejects valid input.
- fa: Tracks all false acceptances, where Recognizer incorrectly assigns a confidence score above the high confidence threshold to an incorrect recognition (the utterance is out-of-grammar, or is an in-grammar utterance that has been incorrectly interpreted as a different in-grammar utterance).
- cpu: Tracks the CPU used for each test recognition.
To activate these files so new results will be written during the current session, the script uses the open command:
open filetype fname
Where filetype is one of the file types listed above and fname is the name and location for the file. If the named file already exists, it will be overwritten with the new results.
These cumulative files can be closed later—normally at the end of the script—by using the close command. See Close the cumulative files for details.
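For example, a script that also tracks false rejections, false acceptances, and CPU usage might open those additional files as sketched below (the file names are hypothetical, and this assumes the fr, fa, and cpu types are opened in the same way as the other types listed above):
# Hypothetical: open the optional tracking files alongside the up-to-date file
open utd test_grammar.utd
open fr test_grammar.fr
open fa test_grammar.fa
open cpu test_grammar.cpu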
Test the contexts
The testing section specifies the tests themselves. Each subsection uses a context_use command to invoke a context with which to test recognitions, and specifies the tests to be conducted. For example, the sample file above tests the context1 context with two items:
context_use context1
transcription blue elephant
meaning {toto}
recognize blue_elephant.ulaw
transcription blue
recognize blue.ulaw
The context_use command takes one argument: the name of the context to be tested (the cname specified in the context_define command). Only one context can be active at a time; each context_use command implicitly deactivates whatever context was active up to that point.
Each test can include the following commands:
- transcription: The transcription of the audio file being used for recognition.
- meaning: The meaning to be assigned to the item when recognized, if this is different from the meaning that will be returned by Recognizer (optional). Enclose the meaning in braces {like this}.
- recognize: The name and location of the audio file to be recognized. In the example, both audio files are in the same directory as the script.
Use the -format option to specify an audio type (the default is 8-bit, 8 kHz ulaw audio). For example:
recognize blue.alaw -format audio/x-alaw-basic
You can use a recognize command to specify text as well:
recognize -format text/plain "blue elephant"
However, acc_test is intended for audio tests, so this is not recommended.
It is strongly recommended that you use end-pointed wave files generated by Recognizer, as these have correct begin and end silence times that the acc_test utility requires. Wave files generated or processed by other methods are not recommended.
The transcription and meaning values are used in reports (see below).
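Putting these commands together, a single test item for an alaw recording with an explicit meaning might look like the following sketch (the transcription, meaning, and audio file name are hypothetical):
# Hypothetical test item using alaw audio and an explicit meaning
transcription purple snake
meaning {animal_snake}
recognize purple_snake.alaw -format audio/x-alaw-basic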
Reset channel normalization
To reset the speaker/channel normalization between tests (in order to simulate the start of a new call, for example), use a SWIrecAcousticStateReset command:
SWIrecAcousticStateReset
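For example, to treat every utterance as the start of a separate call, a reset could be placed before each test, as in this hypothetical fragment:
# Hypothetical: reset normalization before each utterance to simulate separate calls
SWIrecAcousticStateReset
transcription blue
recognize blue.ulaw
SWIrecAcousticStateReset
transcription blue elephant
meaning {toto}
recognize blue_elephant.ulaw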
Reset recognition count within a script
You can reset the result count at any point with a context_reset command:
context_reset
This command erases the results from all recognitions performed up to this point, so they are not counted for subsequent reports. You will probably only want to do this if you have already generated one set of reports (see below), and want to reinitialize for the next set of reports.
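A typical pattern, sketched below with hypothetical report names, is to generate one set of reports, reset, and then continue testing:
# Hypothetical: report on the tests so far, then start a fresh count
report summary round1.summary
report confidence round1.confidence
context_reset
context_use context2
transcription i want to fly from denver to boston at five o'clock
recognize boston_denver_at_5.ulaw
report summary round2.summary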
Generate reports
Once the tests are complete, you can use the results to write several different kinds of reports, using a separate command for each desired report:
report rtype reportfile
Here, the rtype is the type of report to be generated, while the reportfile specifies the location and name for the report. The name can include an environment variable (for example, %fname.err%). You can use a report command anywhere in the script. Usually, reports are generated before a context_reset or just before the script’s end. See Reset recognition count within a script.
The types of reports recommended for your testing include:
- summary: Provides an overall summary of the results for each context.
- nbest: Shows where on the nbest list the correct answer occurred.
- oov: Lists the words found to be out-of-vocabulary, by count.
- words: Evaluates the overall accuracy of recognition for each word.
- confidence: Lists the confidences of recognitions for each context.
Close the cumulative files
You stop writing to the cumulative files using the close command:
close filetype
Normally these files will be closed at the end of the script, as in the example. Only one file of each type can be open at a time, so there is no need to specify the file name when you close. You can make any number of cumulative files, but when you open a second file of a type, the first closes automatically.
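For instance, the following sketch (with hypothetical file names) writes a separate utd file for each context; opening the second utd file automatically closes the first:
# Hypothetical: one utd file per context; the second open closes the first
open utd context1_results.utd
context_use context1
transcription blue
recognize blue.ulaw
open utd context2_results.utd
context_use context2
transcription i want to fly from denver to boston at five o'clock
recognize boston_denver_at_5.ulaw
close utd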

The acc_test utility can generate many different types of reports. This section describes the five recommended report types.
Summary report
A summary report summarizes the results of a test according to context. Information for each context appears on a separate line. The fields are:
- Name: The name of the context.
- #utts: The total number of utterances that were used to test the context (including all in and out of vocabulary utterances).
- %err(iv): The in-vocabulary error rate: the number of misrecognitions (including failures) divided by the total number of in-vocabulary utterances, expressed as a percentage.
- %cr(iv): The correct reject rate. This is the percentage of correct recognitions which fell below the low confidence threshold.
- %fa_in(iv): The false acceptance rate: the percentage of mis-recognized in-vocabulary utterances whose confidence score was above the high threshold.
- %oov(tot): The percentage of out-of-vocabulary utterances.
- %fa_out(oov): The percentage of out-of-vocabulary utterances which had a confidence score above the high threshold.
N-best report
An n-best report counts the ranking of correct recognitions on the nbest list. It includes a separate section for each tested context. The fields are:
- Context: The name of the context.
- iv utts: The total number of in-vocabulary utterances for the context.
- nbest n: The total number of in-vocabulary utterances that were ranked nth on the n-best list (there can be several lines in each section, one for each n).
- total nbest inclusion: The ratio of total nbest results (sum of all nbest n lines) to the total in-vocabulary utterances (iv utts).
Out-of-vocabulary report
For each tested context, an out-of-vocabulary (oov) report lists all utterance transcriptions that were found to be out-of-vocabulary, sorted in descending order by count so the most common oov utterance transcriptions appear at the top. The format is:
context: cname
number transcription
...
Here, cname is the name of the context. Each line under this heading lists the number of times (number) each oov transcription (transcription) appeared:
context: context1
12 red squirrel
9 purple snake
7 chartreuse pig
Words report
The words report has two parts for each context. First, for every word that appeared in the transcriptions or in the recognized text, it reports the recognition accuracy of that word (based on dynamic string alignment). For example:
elephant
total: 6
good: 4 (0.67)
sub away: 2 (0.33)
del: 0 (0.00)
---
sub to: 3
ins: 0
Here, “elephant” is the word. The fields in this first section are:
- total: The number of times this word appeared in the transcriptions.
- good: How many of these occurrences aligned correctly (the fraction in parentheses is this number divided by the total).
- sub away: How many occurrences aligned incorrectly to another word. Again, the fraction in parentheses is this number divided by the total.
- del: How many occurrences of the word were deleted.
- sub to: How many times some other word aligned with this word.
- ins: How many times the word was inserted.
The second part of the report shows confusions (word pairings that the alignment algorithm identified as substitutions), sorted by count. For example:
CONFUSIONS
6 Boston -> Austin
3 Boston -> Houston
Confidence report
This report provides an overview of how different confidence thresholds affect the recognition results for a given context, so you can determine an acceptable tradeoff between recognition accuracy and utterance rejection rates.
The first part of the report specifies the confidence thresholds to set in order to generate correct acceptance rates of 10% through to 90%, and lists the rates of false acceptances at each level. The second part specifies the confidence thresholds to set in order to generate false rejection rates of 5%, 2%, 1%, 0.5%, and 0.1%, and lists the correct rejection rates experienced at each level.
mycontext: 38 inv correct, 1 inv incorrect, 20 oov utts
10% CA at 990 FA(inv) = 0.0% FA(oov) = 0.0%
20% CA at 986 FA(inv) = 0.0% FA(oov) = 0.0%
30% CA at 984 FA(inv) = 0.0% FA(oov) = 0.0%
40% CA at 979 FA(inv) = 0.0% FA(oov) = 0.0%
50% CA at 976 FA(inv) = 0.0% FA(oov) = 0.0%
60% CA at 971 FA(inv) = 0.0% FA(oov) = 0.0%
70% CA at 964 FA(inv) = 0.0% FA(oov) = 5.0%
80% CA at 953 FA(inv) = 0.0% FA(oov) = 5.0%
90% CA at 819 FA(inv) = 100.0% FA(oov) = 10.0%
5% FR at 693 CR(inv) = 0.0% CR(oov) = 75.0%
2% FR at 370 CR(inv) = 0.0% CR(oov) = 65.0%
1% FR at 370 CR(inv) = 0.0% CR(oov) = 65.0%
.5% FR at 370 CR(inv) = 0.0% CR(oov) = 65.0%
.1% FR at 370 CR(inv) = 0.0% CR(oov) = 65.0%
The first line of each context's section has the format:
context-name total-invocab-correct total-invocab-incorrect total-oovocab
The percentage columns are computed as follows:
FA(inv) = 100.0 * tot_fa_iv_incorrect / tot_iv_incorrect
FA(oov) = 100.0 * tot_fa_oov / tot_oov
CR(inv) = 100.0 * tot_cr_iv_incorrect / tot_iv_incorrect
CR(oov) = 100.0 * tot_cr_oov / tot_oov
The report begins with the totals for the context (correct in-vocabulary utterances, incorrect in-vocabulary utterances, and out-of-vocabulary utterances).
It then lists the confidence scores required to achieve the specified percent of correct acceptances. For example, the report above shows that to obtain a correct acceptance rate of only 10%, the required confidence level is 990 (out of 1000). To obtain a correct acceptance rate of 20%, the required confidence level is 986. Both confidence levels result in a false acceptance rate of 0%.
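As a worked reading of the sample report against these formulas: tot_iv_incorrect is 1 (the single misrecognized in-vocabulary utterance), so FA(inv) can only ever be 0.0% or 100.0%. Because FA(inv) is 0.0% at the 80% CA threshold of 953 but 100.0% at the 90% CA threshold of 819, that utterance's confidence score appears to fall somewhere between 819 and 953.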