Writing a grammar file header

As mentioned in Grammar file structure, the grammar file header specifies crucial information about the grammar document, including:

The XML declaration and encoding type
The language
The namespace and root rule
The semantic tag format
The mode

For full details on the GrXML elements in the header and their attributes, refer to the SRGS specification.
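As a rough illustration of how the items listed above map onto a header, here is a minimal sketch of a voice grammar. The attribute values are taken from the examples elsewhere in this topic; the rule name ROOT and the word "hello" are placeholders chosen for this sketch, not values that Recognizer requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML declaration above sets the encoding type.
     The attributes of the grammar element below set the rest:
       xml:lang   : the language
       xmlns      : the namespace
       root       : the root rule
       tag-format : the semantic tag format
       mode       : voice or dtmf
     The rule name and vocabulary are placeholders for illustration. -->
<grammar xml:lang="en-US" mode="voice" version="1.0"
  tag-format="swi-semantics/1.0"
  root="ROOT"
  xmlns="http://www.w3.org/2001/06/grammar">
 <rule id="ROOT" scope="public">
  <item>hello</item>
 </rule>
</grammar>
```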

Controlling languages in a grammar

Recognizer uses the language setting to convert the text of each word in the grammar into a recognizable pronunciation. Because words that are spelled the same way can be pronounced differently in different languages, it’s important to specify which language Recognizer should use when it interprets items.

You can specify the language in the following ways:

Method	Description
In the grammar document header	In the grammar header, you must specify a default language for the grammar and its imported grammars. The language can still be overridden locally by using the xml:lang attribute or by importing a subgrammar in another language.
When using a built-in grammar or a user dictionary	When you invoke a built-in grammar, you can specify a language other than the current default.
Using an xml:lang attribute in a tag within the grammar main body	You can override the default language for individual vocabulary items within a grammar.

Recognizer allows you to activate grammars of different languages at the same time. This simultaneous activation invokes parallel grammars.

An application that recognizes more than one language at a time is known as a multi-language application. You can build a multi-language application with different grammars for each language, or you can embed different languages in a single grammar. Recognizer uses more memory for each added language.
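The sketch below shows one way to embed several languages in a single grammar: the header sets the default language, and individual items override it with xml:lang. It assumes that fr-CA and es-US are installed on Recognizer alongside en-US; the rule name CITY and the city names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="en-US" version="1.0" root="CITY"
  xmlns="http://www.w3.org/2001/06/grammar">
 <rule id="CITY" scope="public">
  <one-of>
   <!-- No xml:lang here, so the header default (en-US) applies -->
   <item>Boston</item>
   <!-- Local overrides: these items use other languages
        (assumed to be installed) -->
   <item xml:lang="fr-CA">Trois-Rivières</item>
   <item xml:lang="es-US">San José</item>
  </one-of>
 </rule>
</grammar>
```

Keeping the overrides on individual items leaves the rest of the grammar on the header default, so only the exceptions need to be marked.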

Setting the language in the grammar header

Every voice grammar must specify a language in the xml:lang attribute of the <grammar> element, as shown in the example below:

<grammar xml:lang="fr-FR" version="1.0" root="ROOT"
  xmlns="http://www.w3.org/2001/06/grammar">

The language must be compatible with the encoding type specified in the XML declaration. By default, Recognizer supports en-US; other languages must be installed separately.

Applications can recognize speech in any language that has been installed on Recognizer. The following table lists the language codes as used by Recognizer. The list of supported languages is continuously growing; for the most current information, refer to Nuance Network.

LangCode	Meaning
ca-ES	Catalan, Spain
cn-HK	Cantonese, Hong Kong
cs-CZ	Czech, Czech Republic
da-DK	Danish, Denmark
de-AT	German, Austria
de-CH	German, Switzerland
de-DE	German, Germany
el-GR	Greek, Greece
en-AU	English, Australia
en-GB	English, Great Britain
en-IN	English, India
en-SG	English, Singapore
en-US	English, United States
en-ZA	English, South Africa
es-AR	Spanish, Argentina
es-CO	Spanish, Colombia
es-ES	Spanish, Spain
es-US	Spanish, United States (includes Mexico)
eu-ES	Basque, Spain
fi-FI	Finnish, Finland
fr-BE	French, Belgium
fr-CA	French, Canada
fr-FR	French, France
he-IL	Hebrew, Israel
hi-IN	Hindi, India
hu-HU	Hungarian, Hungary
it-IT	Italian, Italy
ja-JP	Japanese, Japan
ko-KR	Korean, Korea
nl-BE	Flemish, Belgium
nl-NL	Dutch, Netherlands
no-NO	Norwegian, Norway
pl-PL	Polish, Poland
pt-BR	Portuguese, Brazil
pt-PT	Portuguese, Portugal
ru-RU	Russian, Russia
sk-SK	Slovak, Slovakia
sl-SI	Slovenian, Slovenia
sv-SE	Swedish, Sweden
tr-TR	Turkish, Turkey
zh-CN	Mandarin, Mainland China
zh-TW	Mandarin, Taiwan

If the grammar specifies strings other than these codes, Recognizer attempts to map the strings to an installed language. This attempted mapping is controlled by swirec_language_translation_table.

DTMF grammars

The mode attribute of the <grammar> element lets you specify whether the grammar is a voice grammar that supports speech input or a DTMF (Dual-Tone Multi-Frequency) grammar that supports input from the number pad. All scripting that applies to speech grammars applies to DTMF grammars as well.

The terminals of DTMF grammars must be drawn from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, *, #, A, B, C, D}. Any DTMF grammar that tries to use other terminals will cause an error when it is loaded.

Here is a sample DTMF grammar that accepts an unlimited number of digits:

<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="en-US" mode="dtmf" version="1.0"
  tag-format="swi-semantics/1.0"
  root="_digits"
  xmlns="http://www.w3.org/2001/06/grammar">
 <rule id="_digits" scope="public">
  <ruleref uri="#DIGIT_STRING" />
  <tag>
    MEANING=DIGIT_STRING.SWI_literal.replace(/ /g, '')
  </tag>
 </rule>
 <rule id="DIGIT_STRING">
  <item repeat="1-">
   <ruleref uri="#DIGIT"/>
  </item>
 </rule>
 <rule id="DIGIT">
  <one-of>
   <item>0</item>
   <item>1</item>
   <item>2</item>
   <item>3</item>
   <item>4</item>
   <item>5</item>
   <item>6</item>
   <item>7</item>
   <item>8</item>
   <item>9</item>
  </one-of>
 </rule>
</grammar>
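
The sample above accepts any number of digits. As a variation, the following sketch accepts exactly four digits followed by the # key, which shows a fixed repeat count and a non-digit terminal from the allowed set. The rule name _pin is hypothetical, and the grammar omits semantic tags to keep the example short.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<grammar xml:lang="en-US" mode="dtmf" version="1.0"
  root="_pin"
  xmlns="http://www.w3.org/2001/06/grammar">
 <!-- Hypothetical rule: four digits, then the # key -->
 <rule id="_pin" scope="public">
  <item repeat="4">
   <ruleref uri="#DIGIT"/>
  </item>
  <item>#</item>
 </rule>
 <!-- Same DIGIT rule as in the sample above -->
 <rule id="DIGIT">
  <one-of>
   <item>0</item>
   <item>1</item>
   <item>2</item>
   <item>3</item>
   <item>4</item>
   <item>5</item>
   <item>6</item>
   <item>7</item>
   <item>8</item>
   <item>9</item>
  </one-of>
 </rule>
</grammar>
```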

Other header elements (<lexicon>, <meta>, and <metadata>)

Grammar file structure describes the main XML declaration and the <grammar> element, both of which are required in any grammar document.

There are three other GrXML elements that can be used in the header. All three elements must be children of the <grammar> element if you use them.
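
The following sketch shows where these elements sit relative to the <grammar> element and its rules. The element syntax follows the SRGS specification; the lexicon URI, the meta name and content, the RDF description, and the rule are all hypothetical values invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="en-US" version="1.0" root="ROOT"
  xmlns="http://www.w3.org/2001/06/grammar">
 <!-- Hypothetical pronunciation lexicon -->
 <lexicon uri="http://www.example.com/lexicons/cities.xml"/>
 <!-- Hypothetical name/content property pair -->
 <meta name="description" content="City name grammar"/>
 <!-- Hypothetical RDF description of the grammar document -->
 <metadata>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
   <rdf:Description rdf:about="http://www.example.com/grammars/cities.grxml"
                    dc:title="City names"/>
  </rdf:RDF>
 </metadata>
 <rule id="ROOT" scope="public">
  <item>Boston</item>
 </rule>
</grammar>
```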
