Getting recognition results
Recognition results are returned in different XML formats, depending on the media type requested by the application. By default, Speech Server requests NLSML (Natural Language Semantic Markup Language).
The voice browser is responsible for mapping the results to the VoiceXML variable application.lastresult$. See the VoiceXML and MRCP standards documentation.
Confidence scores
Dragon Voice and Nuance Recognizer use different techniques to return confidence scores to your application:
- For Nuance Recognizer recognition events: Use the VoiceXML “confidence” value returned to your application. This value corresponds to the CONF token, which is also returned.
- For Dragon Voice recognition events: Ignore “confidence” in the VoiceXML recognition results and use the NLCONF token in the log events instead. See the NLEinnd (QuickNLP interpretation end) and NLEplnd (Pipeline end) log events.
Nuance Recognizer results
The following example illustrates a typical case of a user utterance (“I want to go to Pittsburgh”) and the corresponding recognition result, where the system presents more than one possible interpretation.
<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
xmlns:ex="http://www.example.com/example"
grammar="http://www.example.com/flight">
<interpretation confidence="0.6">
<instance>
<ex:airline>
<ex:to_city>Pittsburgh</ex:to_city>
</ex:airline>
</instance>
<input mode="speech">
I want to go to Pittsburgh
</input>
</interpretation>
<interpretation confidence="0.4">
<instance>
<ex:airline>
<ex:to_city>Stockholm</ex:to_city>
</ex:airline>
</instance>
<input>I want to go to Stockholm</input>
</interpretation>
</result>
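As an illustration of consuming a result like the one above, the following minimal sketch parses the sample with Python's standard xml.etree.ElementTree module and picks the interpretation with the highest confidence. The function name best_interpretation and the returned dictionary layout are assumptions made for this example, not part of Speech Server. Because the sample declares a default namespace, every element lookup has to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample result; the prefixes are local aliases.
NS = {
    "mrcp": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

SAMPLE = """<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/flight">
  <interpretation confidence="0.6">
    <instance>
      <ex:airline>
        <ex:to_city>Pittsburgh</ex:to_city>
      </ex:airline>
    </instance>
    <input mode="speech">
      I want to go to Pittsburgh
    </input>
  </interpretation>
  <interpretation confidence="0.4">
    <instance>
      <ex:airline>
        <ex:to_city>Stockholm</ex:to_city>
      </ex:airline>
    </instance>
    <input>I want to go to Stockholm</input>
  </interpretation>
</result>"""


def best_interpretation(nlsml):
    """Return the highest-confidence interpretation as a plain dict (hypothetical helper)."""
    root = ET.fromstring(nlsml)
    interpretations = root.findall("mrcp:interpretation", NS)
    best = max(interpretations, key=lambda i: float(i.get("confidence", "0")))
    return {
        "confidence": float(best.get("confidence", "0")),
        "to_city": best.findtext("mrcp:instance/ex:airline/ex:to_city", namespaces=NS),
        # The sample wraps the utterance in newlines, so trim the whitespace.
        "input": best.findtext("mrcp:input", default="", namespaces=NS).strip(),
    }


if __name__ == "__main__":
    print(best_interpretation(SAMPLE))
```

Ranking by the confidence attribute is appropriate here only because this is a Nuance Recognizer result. For Dragon Voice results, the guidance under Confidence scores applies instead.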
To specify the MIME media type of the recognition result, set the server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server.osrspeechrecog.result.mediatype parameter.
The optional swirec_result_enable_speech_mode parameter ensures that recognition results conform to the VoiceXML 2.0 specification.

The "My_Cities" grammar is used as the basis for the examples in this section:
<?xml version='1.0' encoding='ISO-8859-1'?>
<grammar xml:lang="en-US" version="1.0" root="_root"
tag-format='swi-semantics/1.0'
xmlns="http://www.w3.org/2001/06/grammar">
<rule id="_root" scope="public">
<one-of>
<item> austin
<tag>city='Austin';state='TX'</tag>
</item>
<item> boston
<tag>city='Boston';state='MA'</tag>
</item>
</one-of>
</rule>
</grammar>
Note: The samples are formatted for readability. Newlines and indentation are added. The actual return string is delimited by spaces.
For an example of a word lattice, see Getting raw recognition results.

<?xml version='1.0'?>
<result>
<interpretation grammar="My_Cities" confidence="99">
<input mode="speech">austin</input>
<instance>
<city confidence="99">Austin</city>
<state confidence="99">TX</state>
</instance>
</interpretation>
<interpretation grammar="My_Cities" confidence="95">
<input mode="speech">boston</input>
<instance>
<city confidence="95">Boston</city>
<state confidence="95">MA</state>
</instance>
</interpretation>
</result>
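The n-best list in a result of this shape can be flattened into ordinary data structures before the application applies its own acceptance logic. The sketch below is one possible approach using Python's standard library. The helper names parse_nbest and accepted, and the threshold of 96, are hypothetical and chosen only to show the filtering step on the 0 to 100 confidence scale used in this sample.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version='1.0'?>
<result>
  <interpretation grammar="My_Cities" confidence="99">
    <input mode="speech">austin</input>
    <instance>
      <city confidence="99">Austin</city>
      <state confidence="99">TX</state>
    </instance>
  </interpretation>
  <interpretation grammar="My_Cities" confidence="95">
    <input mode="speech">boston</input>
    <instance>
      <city confidence="95">Boston</city>
      <state confidence="95">MA</state>
    </instance>
  </interpretation>
</result>"""


def parse_nbest(nlsml):
    """Convert each <interpretation> into a dict, preserving document order."""
    hypotheses = []
    for interp in ET.fromstring(nlsml).findall("interpretation"):
        instance = interp.find("instance")
        slots = {}
        if instance is not None:
            # Each child of <instance> is one slot with its own confidence.
            for slot in instance:
                slots[slot.tag] = {
                    "value": slot.text,
                    "confidence": int(slot.get("confidence", "0")),
                }
        hypotheses.append({
            "grammar": interp.get("grammar"),
            "utterance": interp.findtext("input", default="").strip(),
            "confidence": int(interp.get("confidence", "0")),
            "slots": slots,
        })
    return hypotheses


def accepted(hypotheses, threshold=96):
    """Keep hypotheses at or above a hypothetical 0-100 confidence threshold."""
    return [h for h in hypotheses if h["confidence"] >= threshold]


if __name__ == "__main__":
    for hypothesis in accepted(parse_nbest(SAMPLE)):
        print(hypothesis["utterance"], hypothesis["slots"]["city"]["value"])
```

With the sample above and the assumed threshold, only the "austin" hypothesis passes the filter.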

<?xml version='1.0'?>
<emma:emma version="1.0"
xmlns:emma="http://www.w3.org/2005/09/emma">
<emma:one-of id="nbest" emma:disjunction-type="recognition" emma:duration="924">
<emma:interpretation id="interp_1" emma:confidence="0.52"
emma:grammar-ref="grammar_1" emma:tokens="Austin">
<city conf="0.52">Austin</city>
<state conf="0.52">TX</state>
<SWI_literal>Austin</SWI_literal>
<SWI_spoken>Austin</SWI_spoken>
<SWI_meaning>{city:Austin state:TX}</SWI_meaning>
</emma:interpretation>
<emma:interpretation id="interp_2" emma:confidence="0.45"
emma:grammar-ref="grammar_1" emma:tokens="Boston">
<city conf="0.45">Boston</city>
<state conf="0.45">MA</state>
<SWI_literal>Boston</SWI_literal>
<SWI_spoken>Boston</SWI_spoken>
<SWI_meaning>{city:Boston state:MA}</SWI_meaning>
</emma:interpretation>
</emma:one-of>
<emma:grammar id="grammar_1" href="My_Cities"/>
</emma:emma>
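EMMA results differ from the previous format in that most of the recognition metadata sits in namespace-qualified attributes, and the grammar is referenced indirectly through emma:grammar-ref. The following sketch, which assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named parse_emma, shows one way to read those attributes and resolve the grammar reference.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version='1.0'?>
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2005/09/emma">
  <emma:one-of id="nbest" emma:disjunction-type="recognition" emma:duration="924">
    <emma:interpretation id="interp_1" emma:confidence="0.52"
        emma:grammar-ref="grammar_1" emma:tokens="Austin">
      <city conf="0.52">Austin</city>
      <state conf="0.52">TX</state>
      <SWI_literal>Austin</SWI_literal>
      <SWI_spoken>Austin</SWI_spoken>
      <SWI_meaning>{city:Austin state:TX}</SWI_meaning>
    </emma:interpretation>
    <emma:interpretation id="interp_2" emma:confidence="0.45"
        emma:grammar-ref="grammar_1" emma:tokens="Boston">
      <city conf="0.45">Boston</city>
      <state conf="0.45">MA</state>
      <SWI_literal>Boston</SWI_literal>
      <SWI_spoken>Boston</SWI_spoken>
      <SWI_meaning>{city:Boston state:MA}</SWI_meaning>
    </emma:interpretation>
  </emma:one-of>
  <emma:grammar id="grammar_1" href="My_Cities"/>
</emma:emma>"""


def parse_emma(emma_xml):
    """Return one dict per emma:interpretation (hypothetical helper)."""
    root = ET.fromstring(emma_xml)
    # ElementTree exposes the root tag as "{namespace-uri}emma", so the
    # EMMA namespace can be read from it instead of being hard-coded.
    ns = root.tag[1:root.tag.index("}")]

    def q(name):
        return "{%s}%s" % (ns, name)

    # Map grammar ids to their href so emma:grammar-ref can be resolved.
    grammars = {g.get("id"): g.get("href") for g in root.iter(q("grammar"))}
    rows = []
    for interp in root.iter(q("interpretation")):
        rows.append({
            "id": interp.get("id"),
            "confidence": float(interp.get(q("confidence"), "0")),
            "tokens": interp.get(q("tokens")),
            "grammar": grammars.get(interp.get(q("grammar-ref"))),
            # Application slots only; the SWI_* keys are skipped here.
            "slots": {c.tag: c.text for c in interp if not c.tag.startswith("SWI_")},
        })
    return rows


if __name__ == "__main__":
    for row in parse_emma(SAMPLE):
        print(row["id"], row["confidence"], row["grammar"], row["slots"])
```

Reading the namespace from the root element is a design choice: the EMMA samples in this section do not all declare the same namespace URI, so a hard-coded URI would match only some of them.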

<result>
<interpretation conf="0.99">
<text mode="voice">austin</text>
<instance grammar="My_Cities">
<city conf="0.99">Austin</city>
<state conf="0.99">TX</state>
</instance>
</interpretation>
<interpretation conf="0.95">
<text mode="voice">boston</text>
<instance grammar="My_Cities">
<city conf="0.95">Boston</city>
<state conf="0.95">MA</state>
</instance>
</interpretation>
</result>
Dragon Voice results: Krypton-only
Krypton-only results have a literal meaning. They never have a semantic intent.

For Krypton-only results, use SWI_literal and SWI_spoken to retrieve the transcription. (Ignore the semantic keys <INTENT> and <SWI_meaning>. For these results they only ever carry the value RECOGNITION_ONLY.)
<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:qnlp-usaa">
<interpretation grammar="session:qnlp-usaa" confidence="72">
<input mode="voice">How much did I spend on March thirty first</input>
<instance>
<INTENT confidence="72">RECOGNITION_ONLY</INTENT>
<SWI_literal>How much did I spend on March thirty first</SWI_literal>
<SWI_spoken>How much did I spend on March 31</SWI_spoken>
<SWI_meaning>{"INTENT":"RECOGNITION_ONLY"}</SWI_meaning>
</instance>
</interpretation>
</result>
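The guidance above can be sketched in a few lines of Python. The helper name transcription is hypothetical, and the sketch assumes the first listed interpretation is the one of interest. It deliberately reads only SWI_literal and SWI_spoken and never touches INTENT, SWI_meaning, or the confidence attribute.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:qnlp-usaa">
  <interpretation grammar="session:qnlp-usaa" confidence="72">
    <input mode="voice">How much did I spend on March thirty first</input>
    <instance>
      <INTENT confidence="72">RECOGNITION_ONLY</INTENT>
      <SWI_literal>How much did I spend on March thirty first</SWI_literal>
      <SWI_spoken>How much did I spend on March 31</SWI_spoken>
      <SWI_meaning>{"INTENT":"RECOGNITION_ONLY"}</SWI_meaning>
    </instance>
  </interpretation>
</result>"""


def transcription(nlsml):
    """Return the literal and spoken transcriptions of the first interpretation."""
    # Parse bytes so the encoding declaration in the XML prolog is honored.
    root = ET.fromstring(nlsml.encode("utf-8"))
    instance = root.find("interpretation/instance")
    if instance is None:
        return None
    return {
        "literal": instance.findtext("SWI_literal", default="").strip(),
        "spoken": instance.findtext("SWI_spoken", default="").strip(),
    }


if __name__ == "__main__":
    print(transcription(SAMPLE))
```

In this sample the two values differ: SWI_literal spells out "thirty first" while SWI_spoken contains "31", so an application would pick whichever form suits its downstream processing.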

To generate this example output, the VoiceXML application loaded two builtins, assigned their weights, and performed a recognition.
<emma:emma xmlns:emma="http://www.w3.org/TR/2007/CR-emma-20071211" xmlns:nuance="http://nr11.nuance.com/emma" version="1.0">
<emma:grammar id="grammar_1" ref="session:http://myPath/vxml_dp_models/?nlptype=config&builtin_VERT_FINANCIAL_SERVICES_weight=0.1&builtin_DIGITS_weight=medium&sp4_004_DLM_weight=0.1 -1 -1 10000" />
<emma:one-of id="nbest" emma:disjunction-type="recognition">
<emma:interpretation id="interp_1" emma:confidence="0.87" emma:grammar-ref="grammar_1" emma:tokens="I want to pay a bill" emma:mode="voice">
<INTENT>RECOGNITION_ONLY</INTENT>
<SWI_literal>I want to pay a bill</SWI_literal>
<SWI_spoken>I want to pay a bill</SWI_spoken>
<SWI_meaning>{"INTENT":"RECOGNITION_ONLY"}</SWI_meaning>
</emma:interpretation>
</emma:one-of>
</emma:emma>
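The weights assigned by the application are visible in the query string of the grammar reference in this sample. As a rough sketch, the following Python code extracts them from that string with the standard urllib.parse module. The helper names are hypothetical, and the meaning of the trailing space-separated fields in the reference is not described here, so the sketch simply discards them.

```python
from urllib.parse import parse_qsl, urlsplit

# Value of the ref attribute on emma:grammar, copied from the sample.
GRAMMAR_REF = (
    "session:http://myPath/vxml_dp_models/?nlptype=config"
    "&builtin_VERT_FINANCIAL_SERVICES_weight=0.1"
    "&builtin_DIGITS_weight=medium"
    "&sp4_004_DLM_weight=0.1 -1 -1 10000"
)


def grammar_weights(ref):
    """Return every '<name>_weight' query parameter as {name: value}."""
    if ref.startswith("session:"):
        ref = ref[len("session:"):]
    # Assumption: anything after the first space is not part of the URI.
    uri = ref.split(" ", 1)[0]
    pairs = parse_qsl(urlsplit(uri).query)
    suffix = "_weight"
    return {k[:-len(suffix)]: v for k, v in pairs if k.endswith(suffix)}


def builtin_weights(ref):
    """Return only the weights whose parameter name starts with 'builtin_'."""
    prefix = "builtin_"
    return {
        name[len(prefix):]: value
        for name, value in grammar_weights(ref).items()
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    print(builtin_weights(GRAMMAR_REF))
```

The values are kept as strings because the sample mixes numeric weights (0.1) with a symbolic one (medium).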
Dragon Voice results: open-dialog
Open-dialog results have a meaning representation that is a combination of standard slots. Any slot can be captured at any turn. It is up to the application to extract (or ignore) the slots returned.
To assess confidence scores, ignore “confidence” in the VoiceXML recognition results, and use the NLCONF token in the log events instead (see NLEinnd—QuickNLP interpretation end and NLEplnd—Pipeline end).
The following examples are taken from the pay-bill VoiceXML document shown in Example: open-dialog using Dragon Voice.

In this example, the caller has requested to pay $50 to Visa from their checking account.
<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436">
<interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436"
confidence="99">
<input mode="voice">pay fifty dollars to visa from my checking account</input>
<instance>
<AMOUNT>50.00</AMOUNT>
<PAYEE>VISA</PAYEE>
<FROM_ACCOUNT>123412341234</FROM_ACCOUNT>
<SWI_literal>Pay fifty dollars to visa from my checking account</SWI_literal>
<SWI_spoken>Pay fifty dollars to visa from my checking account</SWI_spoken>
<SWI_meaning>
{"AMOUNT":"50.00","INTENT":"BILL_PAY","FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}
</SWI_meaning>
</instance>
</interpretation>
<interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" confidence="0">
<input mode="voice">pay fifty dollars to visa from my checking account</input>
<instance>
<INTENT confidence="0">CHANGE_FROM_ACCOUNT</INTENT>
<FROM_ACCOUNT>123412341234</FROM_ACCOUNT>
<SWI_literal>Pay fifty dollars to visa from my checking account</SWI_literal>
<SWI_spoken>Pay fifty dollars to visa from my checking account</SWI_spoken>
<SWI_meaning>
{"INTENT":"CHANGE_FROM_ACCOUNT","FROM_ACCOUNT":"123412341234"}
</SWI_meaning>
</instance>
</interpretation>
<interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" confidence="0">
<input mode="voice">pay fifty dollars to visa from my checking account</input>
<instance>
<AMOUNT>50.00</AMOUNT>
<PAYEE>VISA</PAYEE>
<FROM_ACCOUNT>123412341234</FROM_ACCOUNT>
<SWI_literal>Pay fifty dollars to visa from my checking account</SWI_literal>
<SWI_spoken>Pay fifty dollars to visa from my checking account</SWI_spoken>
<SWI_meaning>
{"AMOUNT":"50.00","INTENT":"NO_INTENT","FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}
</SWI_meaning>
</instance>
</interpretation>
</result>
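Because SWI_meaning carries the intent and slots as a JSON object in these results, an application can decode it directly rather than walking the individual slot elements. The sketch below assumes Python's standard json and xml.etree.ElementTree modules. The helper names meanings and slots_for_intent are hypothetical, and the embedded sample is abridged to the first two interpretations. In line with the guidance for Dragon Voice, it selects by intent and does not rank on the confidence attribute.

```python
import json
import xml.etree.ElementTree as ET

GRAMMAR = "session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436"

# Abridged copy of the sample: first two interpretations only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<result grammar="%(g)s">
  <interpretation grammar="%(g)s" confidence="99">
    <input mode="voice">pay fifty dollars to visa from my checking account</input>
    <instance>
      <AMOUNT>50.00</AMOUNT>
      <PAYEE>VISA</PAYEE>
      <FROM_ACCOUNT>123412341234</FROM_ACCOUNT>
      <SWI_meaning>
        {"AMOUNT":"50.00","INTENT":"BILL_PAY","FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}
      </SWI_meaning>
    </instance>
  </interpretation>
  <interpretation grammar="%(g)s" confidence="0">
    <input mode="voice">pay fifty dollars to visa from my checking account</input>
    <instance>
      <INTENT confidence="0">CHANGE_FROM_ACCOUNT</INTENT>
      <FROM_ACCOUNT>123412341234</FROM_ACCOUNT>
      <SWI_meaning>
        {"INTENT":"CHANGE_FROM_ACCOUNT","FROM_ACCOUNT":"123412341234"}
      </SWI_meaning>
    </instance>
  </interpretation>
</result>""" % {"g": GRAMMAR}


def meanings(nlsml):
    """Decode the SWI_meaning JSON of every interpretation, in document order."""
    root = ET.fromstring(nlsml.encode("utf-8"))
    decoded = []
    for interp in root.findall("interpretation"):
        raw = interp.findtext("instance/SWI_meaning")
        if raw and raw.strip():
            # json.loads tolerates the surrounding whitespace in the element.
            decoded.append(json.loads(raw))
    return decoded


def slots_for_intent(nlsml, intent):
    """Return the slots of the first interpretation with the given INTENT, or None."""
    for meaning in meanings(nlsml):
        if meaning.get("INTENT") == intent:
            return {k: v for k, v in meaning.items() if k != "INTENT"}
    return None


if __name__ == "__main__":
    print(slots_for_intent(SAMPLE, "BILL_PAY"))
```

Decoding SWI_meaning has one practical advantage in this sample: the first interpretation has no separate INTENT element, but its intent is still present in the JSON.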

In this example, the caller provides, in a single utterance, all of the information needed to process a bill payment: amount, date, payee, and account.
System: Thank you for calling, how can I help you?
Caller: Pay thirty dollars to my Visa from checking on February 1, 2018.
System: Thanks, we'll pay ... ... <VISA> ... ... $<30.00> ... ... from account ... ... <123412341234> ... ... on ... ... <20180201>.
<?xml version="1.0" encoding="UTF-8"?>
<emma:emma xmlns:emma="http://www.w3.org/TR/2007/CR-emma-20071211" xmlns:nuance="http://nr11.nuance.com/emma" version="1.0">
<emma:grammar id="grammar_1" ref="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" />
<emma:one-of id="nbest" emma:disjunction-type="recognition">
<emma:interpretation id="interp_1" emma:confidence="0.99" emma:grammar-ref="grammar_1" emma:tokens="Pay thirty dollars to my visa from checking on February first two thousand eighteen" emma:mode="voice">
<INTENT>BILL_PAY</INTENT>
<AMOUNT conf="1">30.00</AMOUNT>
<PAYEE conf="1">VISA</PAYEE>
<FROM_ACCOUNT conf="1">123412341234</FROM_ACCOUNT>
<DATE conf="1">20180201</DATE>
<SWI_literal>
Pay thirty dollars to my visa from checking on February first two thousand eighteen
</SWI_literal>
<SWI_spoken>Pay thirty dollars to my visa from checking on February first two thousand eighteen</SWI_spoken>
<SWI_meaning>{"DATE":"20180201","AMOUNT":"30.00","INTENT":"BILL_PAY", "FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}
</SWI_meaning>
</emma:interpretation>
<emma:interpretation id="interp_2" emma:confidence="0.0" emma:grammar-ref="grammar_1" emma:tokens="Pay thirty dollars to my visa from checking on February first two thousand eighteen" emma:mode="voice">
<INTENT>NO_INTENT</INTENT>
<AMOUNT conf="1">30.00</AMOUNT>
<PAYEE conf="1">VISA</PAYEE>
<FROM_ACCOUNT conf="1">123412341234</FROM_ACCOUNT>
<DATE conf="1">20180201</DATE>
<SWI_literal>
Pay thirty dollars to my visa from checking on February first two thousand eighteen
</SWI_literal>
<SWI_spoken>
Pay thirty dollars to my visa from checking on February first two thousand eighteen
</SWI_spoken>
<SWI_meaning>{"DATE":"20180201","AMOUNT":"30.00","INTENT":"NO_INTENT", "FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}
</SWI_meaning>
</emma:interpretation>
</emma:one-of>
</emma:emma>
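Since any slot can arrive at any turn, an application typically checks which of the slots it needs are already filled before deciding whether to confirm or to prompt again. The following sketch shows that check against an abridged copy of the first interpretation in this sample. The helper names and the REQUIRED_SLOTS tuple are assumptions for illustration, based on the four items named in the text (amount, date, payee, and account).

```python
import xml.etree.ElementTree as ET

# Hypothetical list of slots the application needs for a bill payment.
REQUIRED_SLOTS = ("AMOUNT", "DATE", "PAYEE", "FROM_ACCOUNT")

TOKENS = ("Pay thirty dollars to my visa from checking "
          "on February first two thousand eighteen")

# Abridged copy of the sample: first interpretation, SWI_* keys omitted.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<emma:emma xmlns:emma="http://www.w3.org/TR/2007/CR-emma-20071211"
    xmlns:nuance="http://nr11.nuance.com/emma" version="1.0">
  <emma:grammar id="grammar_1"
      ref="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" />
  <emma:one-of id="nbest" emma:disjunction-type="recognition">
    <emma:interpretation id="interp_1" emma:confidence="0.99"
        emma:grammar-ref="grammar_1" emma:tokens="%s" emma:mode="voice">
      <INTENT>BILL_PAY</INTENT>
      <AMOUNT conf="1">30.00</AMOUNT>
      <PAYEE conf="1">VISA</PAYEE>
      <FROM_ACCOUNT conf="1">123412341234</FROM_ACCOUNT>
      <DATE conf="1">20180201</DATE>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>""" % TOKENS


def first_interpretation_slots(emma_xml):
    """Return {tag: text} for the first emma:interpretation, skipping SWI_* keys."""
    root = ET.fromstring(emma_xml.encode("utf-8"))
    # Read the EMMA namespace from the root tag "{namespace-uri}emma".
    ns = root.tag[1:root.tag.index("}")]
    interp = root.find(".//{%s}interpretation" % ns)
    if interp is None:
        return {}
    return {
        child.tag: (child.text or "").strip()
        for child in interp
        if not child.tag.startswith("SWI_")
    }


def missing_slots(slots, required=REQUIRED_SLOTS):
    """List the required slots that are absent or empty."""
    return [name for name in required if not slots.get(name)]


if __name__ == "__main__":
    slots = first_interpretation_slots(SAMPLE)
    gaps = missing_slots(slots)
    print("ready to confirm" if not gaps else "still need: " + ", ".join(gaps))
```

For the one-shot utterance in this sample nothing is missing, so the application can go straight to confirmation. A shorter utterance would leave entries in the returned list, and the application could prompt for those in a later turn.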