Grammar performance
These guidelines describe best practices for detecting and avoiding resource usage problems in your grammars, promoting efficiency for voice platforms and applications.
The foremost objective for grammar development is to design for optimal recognition accuracy. The next goal is to write for clarity, maintainability, and extensibility. The third goal is to create efficient recognition contexts.
You can evaluate the first two goals to some extent by using the testing tools described in Testing grammars. The following topics address grammar performance as it applies to the CPU and storage space resources that your grammar uses: the factors that contribute to resource use, and strategies you can use to make your grammars as resource-efficient as possible.
How grammars affect resource usage
Below are grammar characteristics that affect resource usage:
- Coverage: The grammar covers (includes) the phrases you expect the caller to use. Under-coverage leads to an increase in out-of-vocabulary utterances, confirmations, and retries, which all increase CPU usage and call duration.
- Over-generation: It is important that the grammar not over-generate by allowing nonsensical phrases, as this reduces accuracy. For example, a grammar that recognizes a city and state needs to constrain utterances to valid combinations of cities and states.
- Multiple parses: See Multiple parses.
- Keys passed to the application: You must ensure that key/value pairs are set correctly.
- SWI_meaning key: This key can improve efficiency by compiling redundant answers into a single entry on the n-best list. See SWI_meaning.
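As a sketch of the SWI_meaning point above (a hypothetical fragment; confirm the tag-format value and tag syntax against your Recognizer documentation), two phrasings of the same answer can share one meaning so that redundant answers collapse into a single n-best entry:

```xml
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         root="yes_no" tag-format="swi-semantics/1.0">
  <rule id="yes_no" scope="public">
    <one-of>
      <!-- Both affirmative phrasings produce the same SWI_meaning,
           so they compile into one entry on the n-best list -->
      <item>yes <tag>SWI_meaning='yes'</tag></item>
      <item>yeah <tag>SWI_meaning='yes'</tag></item>
      <item>no <tag>SWI_meaning='no'</tag></item>
      <item>nope <tag>SWI_meaning='no'</tag></item>
    </one-of>
  </rule>
</grammar>
```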
Performance considerations
The following list summarizes strategies for conserving grammar resources:
- Fetching grammars from web servers: When you load grammars, Recognizer fetches them from a web server and caches them on the local machine. It’s important to configure Cache-Control headers on the web server to inform Recognizer about timers such as expiration and maximum age. Otherwise, performance degrades if Recognizer re-fetches files repeatedly and unnecessarily.
If a web server does not provide expiration information, Recognizer calculates a default behavior (using the Last-Modified stamp) that might not be optimal for your system: the system might re-fetch data more often than needed.
To avoid performance problems, configure all web servers to specify the cache policy for your grammar file types:
- Ensure the web server specifies the proper HTTP/1.1 Cache-Control headers. For example, "Cache-Control: max-age=86400" allows the system to cache fetched data for 24 hours (86,400 seconds).
- Choose the cache duration carefully based on each application’s requirements. Be especially careful with dynamically generated grammars: make the duration long enough so that fetches are infrequent, but short enough so that the system acquires updated grammars within a reasonable time frame.
- Caching grammars: After fetching grammars to the local machine, Recognizer compiles them, writes them to the local disk cache, and holds them in memory for a period of time before replacing them. All of these activities are influenced by the caching configuration. See Understanding grammar caching.
Your strategy for managing grammars must include decisions about the costs and benefits of re-fetching, storing in the disk cache, and storing in memory. Keeping grammars in memory is a good strategy when the cost of loading is high (such as when a grammar is large, must compile at runtime, or must be reloaded frequently).
To troubleshoot a caching problem, examine the HTTP/1.1 responses to the grammar fetch requests and examine the expiration settings. By tuning the web server’s Cache-Control headers, you can solve most performance issues.
- Load source grammars: You can load grammars in their SRGS source form and allow Recognizer to compile them at runtime. This strategy is useful for small grammars and grammars generated at runtime (such as grammars that must be customized for each caller). The drawback is the cost of CPU cycles for compiling each time the grammar is loaded.
- Load binary grammars: You can precompile grammars (see Compiling grammars). This strategy is good for large grammars that cause latency problems if compiled at runtime. The drawback is that these grammars are static and cannot change their coverage at runtime. You can also precompile user dictionaries to improve performance; see Compiling a user dictionary.
- Combine source and binary grammars: You can combine precompiled and source grammars using dynamic linked grammars (Dynamic-link grammars). This strategy combines the previous techniques to mitigate their drawbacks, but it requires more planning and maintenance activities due to the modularized design of your grammar libraries.
- Preload grammars: You can load grammars when your application starts. This strategy incurs the costs of fetching, compiling and loading before the application accepts telephone calls (otherwise, the first callers to the application would experience any delays associated with those costs). This technique is only useful for static grammars, since grammars that change at runtime must be recompiled each time they are used. See swirec_preload_file.
- Trade recognition time for faster compilation: When your application requires large lists of items, you can create a wordlist grammar that reduces compilation time at the cost of increased CPU usage during runtime recognition. This strategy is useful when the list changes frequently, or when it is not used frequently enough to keep in memory. For details, see Wordlist (directory-assistance) grammars.
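As a concrete sketch of the Cache-Control guidance above, a web server can set the header per file type. This Apache fragment (assuming the mod_headers module is enabled; the .grxml extension and one-hour lifetime are illustrative values to adapt to your deployment) caches fetched grammars for one hour:

```apache
# Cache fetched grammar files for one hour (3600 seconds)
<FilesMatch "\.grxml$">
    Header set Cache-Control "max-age=3600"
</FilesMatch>
```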
Latency issues
Latency is the elapsed time from when the caller stops speaking (including the configured end-of-speech timeout) until a recognition result is returned to the application. When latency is too high, the caller's experience degrades: the system appears sluggish, which frustrates the caller and leads to further user interface complications.
In extreme circumstances, excess latency causes unsuccessful application transactions if callers hang up without accomplishing the goal of their calls. Poor recognition response times can have many contributing factors:
- Use of very large grammars containing hundreds of thousands of items.
- Extremely long average utterance lengths.
- High amounts of ECMAScript processing within the grammar.
- Insufficient system memory, resulting in excessive paging and swapping.
- Extra time to compile grammars that are not precompiled.
- Extra time when application servers dynamically generate grammars.
- Network delays when fetching grammars.
- Processes on the host machine that are not part of the recognition service.
The first step to finding the source of latency is to measure the response time of your recognition contexts, as discussed below.
Managing performance
The following subsections describe factors in managing performance.
Evaluating compilation costs
You can use the sgc.exe tool to precompile grammars and evaluate the CPU costs. This evaluation is valuable when testing a grammar to find the most efficient ways of covering the same sentences.
The required compilation time is a factor in determining whether to precompile the grammar, or to load it as a source grammar at runtime. Grammars that require a long compilation time are best precompiled or preloaded, to avoid the delays during telephone calls that result from compiling at runtime.
Measuring response time
The simplest way to measure response time is to use the Nuance application reporting tool or the call log. In the log file, the SWIrcnd (SWI recognition end) event appears for every recognition.
Consult the following tokens in the SWIrcnd event:
- GRMR: Identifies the grammar. Use this token to isolate latency problems to a particular recognition context.
- EOST: Provides the end-of-speech time in milliseconds relative to the time of the first audio packet. The end-of-speech time is the time at which the endpointer declares end-of-speech. This determination includes the value of the incompletetimeout parameter.
- EORT: Provides the end-of-recognition time in milliseconds. Again, this is relative to the time of the first audio packet.
You can measure latency (recognition response time) by subtracting EOST from EORT. However, this does not include the end-of-speech timeout or application processing, which must be added to determine the total user-perceived delays in the system.
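The arithmetic above can be sketched as follows (a minimal sketch; the token dictionary shown is an illustrative assumption, but the EORT minus EOST subtraction follows the token definitions above):

```python
def recognition_latency_ms(swircnd_tokens):
    """Latency = end-of-recognition time minus end-of-speech time.

    Both tokens are in milliseconds relative to the first audio packet,
    so their difference is the recognition response time. Note that this
    excludes the end-of-speech timeout and application processing, which
    must be added to determine total user-perceived delay.
    """
    return int(swircnd_tokens["EORT"]) - int(swircnd_tokens["EOST"])

# Example SWIrcnd event tokens (values are illustrative only)
event = {"GRMR": "main_menu.grxml", "EOST": "2340", "EORT": "2590"}
print(recognition_latency_ms(event))  # 250 ms of recognition latency
```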
Note: Running non-essential software on a Recognizer host can detract from system performance. Avoid using software that is not provided or certified by Nuance.
Reducing CPU usage
If the total CPU usage on the system exceeds 80% for a long time (more than a couple of seconds), callers will notice increased recognition delays. Sustained high CPU usage levels can also cause stability problems for your system: for example, buffer overflows might occur if there are not enough CPU cycles available to empty the buffers.
If your system experiences excessive CPU usage, try to narrow down the cause to a particular process. Each operating system has its own tools for measuring CPU usage and identifying processes that use the most cycles. Use these tools to identify the expensive processes. Your investigation might reveal that CPU problems are not related to Recognizer.
A very large grammar increases the search space during recognition, which increases CPU usage on the system. In addition, recognition contexts that allow very long utterances, such as 20-digit strings or long natural-language sentences, will require more CPU for recognition. If the contexts cannot be changed, then it may be necessary to switch to a faster processor.
Application servers can influence CPU usage on a recognition host. For example, if a server sets an immediate expiration time on a fetched grammar, Recognizer is forced to re-fetch that grammar whenever it is activated.
One way to reduce CPU resources at runtime is to precompile grammars. For large grammars, you can precompile the main body of the grammar and bind smaller, more dynamic portions at runtime (see Dynamic-link grammars). Another method is to maximize grammar caches to retain grammars, instead of swapping them between disk and memory or re-fetching them. For an explanation and details, see Understanding grammar caching.
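The 80% threshold described above can be checked programmatically when analyzing monitoring data. Here is a minimal sketch (the one-second sampling interval and three-sample window are assumptions) that flags sustained high CPU from a series of usage samples:

```python
def sustained_high_cpu(samples, threshold=80.0, min_run=3):
    """Return True if `samples` (percent CPU, taken at a fixed interval)
    stay above `threshold` for at least `min_run` consecutive samples."""
    run = 0
    for s in samples:
        run = run + 1 if s > threshold else 0
        if run >= min_run:
            return True
    return False

# With one sample per second, three consecutive samples above 80%
# approximate "more than a couple of seconds" of overload.
print(sustained_high_cpu([50, 85, 90, 88, 60]))  # True
print(sustained_high_cpu([50, 85, 60, 90, 70]))  # False
```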
Reducing memory usage
Very large grammars require more system memory. If total virtual memory usage exceeds physical RAM, the operating system begins to page and swap; if paging and swapping become excessive, response times increase. Similarly, a grammar that uses two or more languages can require more memory and may cause delays. It is therefore strongly recommended that you run your system entirely within physical RAM, so that no paging or swapping occurs.
If you still have problems after optimizing the memory use of all other processes on the system, and if increasing RAM on the system is not possible, it may be possible to reduce the memory footprint of your recognition contexts by re-writing portions of the grammars. Nuance offers solutions services to assist and instruct grammar developers in this re-writing process.
Balancing resource use and accuracy
Reducing the system's resource use can sometimes decrease accuracy or maintainability. It is best to prioritize accuracy over resource usage.
If resource usage on the system is under control and recognition response times are acceptable, there is no urgent need to improve grammar efficiency.
Limiting the n-best list
The n-best list holds the different interpretations that Recognizer can return for an utterance. When it interprets an utterance, Recognizer does not consider just one interpretation: it finds several possible interpretations and assigns a confidence rating to each. The interpretation with the highest confidence is used first, but finding the others still consumes CPU resources.
You can reduce CPU and memory usage by limiting the number of sentences Recognizer considers when constructing the n-best list, by setting <meta> parameters in the grammar header to reduce the pool of sentences.
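For example, the grammar header might set limits such as the following (the parameter names and values here are assumptions following the swirec_* naming convention used elsewhere in this guide; confirm the exact names and defaults in your Recognizer reference):

```xml
<!-- Hypothetical values; confirm parameter names in your Recognizer reference -->
<meta name="swirec_nbest_list_length" content="2"/>
<meta name="swirec_max_sentences_tried" content="500"/>
```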
Lowering the incomplete timeout
You can reduce user-perceived latency by lowering the value of incompletetimeout, which controls how long Recognizer waits through silence after the caller speaks before concluding that the utterance is complete.
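For example, in a VoiceXML application the timeout can be lowered with a property element (the 0.5-second value is an illustrative assumption; setting it too low risks cutting off callers mid-utterance):

```xml
<!-- VoiceXML: shorten the silence period that ends an utterance -->
<property name="incompletetimeout" value="0.5s"/>
```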
Considering ECMAScript processing
ECMAScript processing can sometimes contribute to latency. ECMAScript is particularly important to consider because it is processed after the end of an utterance, and thus it has a direct impact on caller-perceived latency.
The CPU usage of the ECMAScript interpreter depends on many variables. For example, the built-in “currency” grammar makes heavy use of ECMAScript. As a consequence, the grammar’s parsing and ECMAScript execution account for a substantial portion of the total CPU usage.
Avoiding complex ECMAScript logic
Since ECMAScript is a complete language, it may be tempting to write extensive logic within grammars using ECMAScript. Resist this temptation.
Complex logic is more efficiently executed outside of the grammar, using your higher-level programming environment (C, C++, JSP, VoiceXML, and so on).
Consolidating ECMAScript blocks
There is a CPU overhead for loading and interpreting each ECMAScript block. It is thus more efficient to have one large ECMAScript block than several small ones.
The exception to this general rule occurs when you have large, rarely-used blocks of script. In this case, distribute the logic so that these blocks are only compiled and executed when necessary to understand the parse. For example, if the script contains a large, compound IF statement where some of the conditions are rarely executed, try to distribute this logic in the grammar.
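As a sketch of this distribution (a hypothetical fragment; the rule names and the convertCurrency function are invented for illustration), instead of one compound conditional in a top-level <tag>, attach each rarely-used block to the alternative that needs it, so the expensive logic runs only when that alternative appears in the parse:

```xml
<one-of>
  <item>transfer <ruleref uri="#amount"/>
    <!-- Common path: only a simple assignment runs -->
    <tag>SWI_meaning = 'transfer:' + amount.SWI_meaning</tag>
  </item>
  <item>international transfer <ruleref uri="#amount"/>
    <!-- Rare path: the expensive currency logic executes only
         when this alternative is part of the parse -->
    <tag>SWI_meaning = 'intl:' + convertCurrency(amount.SWI_meaning)</tag>
  </item>
</one-of>
```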
Adjusting CPU allocation under load
During recognition, Recognizer considers the current level of CPU activity before determining the level of CPU to allocate to the current recognition request. If the system is being underused, it may allocate more CPU; if the system is already overloaded, it may allocate less than usual.
See swirec_load_adjusted_speedvsaccuracy for a full discussion of how to set the different levels governing CPU load definitions.
Self-learning feature (acoustic adaptation)
As an additional performance-enhancing feature, Recognizer automatically improves recognition accuracy over time by using high-confidence results to tune the underlying recognition models. This feature uses negligible CPU and memory resources except for a daily update, which typically occurs during low-usage times.
The benefits of self-learning depend on the language being recognized. For languages where little or no benefit is expected, Nuance suppresses the feature by default. Furthermore, adaptation is intended for deployed systems and not recommended when developing and testing grammars.
Do not use the self-learning feature for voice enrollment grammars or any highly-unconstrained grammars (such as a “phoneme loop” grammar). To suppress the feature, use a <meta> element in the grammar header to set the swirec_acoustic_adapt_suppress_adaptation parameter to "1", as follows:
<meta name="swirec_acoustic_adapt_suppress_adaptation" content="1"/>
To suppress acoustic adaptation only when the CPU load is high, use the swirec_load_adjusted_cpu_ranges parameter to define the idle, normal, high, and pegged levels of CPU activity. Then use the swirec_acoustic_adapt_suppress_adaptation parameter to specify the levels at which acoustic adaptation is to be suppressed.
See swirec_load_adjusted_cpu_ranges and swirec_acoustic_adapt_suppress_adaptation.