Getting recognition results

Recognition results are returned in different XML formats, depending on the media type requested by the application. By default, Speech Server requests NLSML (Natural Language Semantic Markup Language).

The voice browser is responsible for mapping the results to the VoiceXML variable application.lastresult$. See the VoiceXML and MRCP standards documentation.

Confidence scores

Dragon Voice and Nuance Recognizer use different techniques to return confidence scores to your application:

For Nuance Recognizer recognition events: Use the VoiceXML “confidence” value returned to your application. This value corresponds to the CONF token, which is also returned.
For Dragon Voice recognition events: To assess confidence scores, ignore “confidence” in the VoiceXML recognition results and use the NLCONF token in the log events instead (see NLEinnd—QuickNLP interpretation end and NLEplnd—Pipeline end).

Nuance Recognizer results

The following example illustrates a typical case of a user utterance (“I want to go to Pittsburgh”) and the corresponding recognition result, where the system presents more than one possible interpretation.

<?xml version="1.0"?>

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"

      xmlns:ex="http://www.example.com/example"

      grammar="http://www.example.com/flight">

  <interpretation confidence="0.6">

    <instance>

      <ex:airline>

        <ex:to_city>Pittsburgh</ex:to_city>

      </ex:airline>

    </instance>

    <input mode="speech">

      I want to go to Pittsburgh

    </input>

  </interpretation>

  <interpretation confidence="0.4">

    <instance>

      <ex:airline>

        <ex:to_city>Stockholm</ex:to_city>

      </ex:airline>

    </instance>

    <input>I want to go to Stockholm</input>

  </interpretation>

</result>
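
As a rough illustration of how an application might read a result like the one above, the following Python sketch parses the NLSML with the standard library and lists the interpretations with the highest confidence first. The function name read_interpretations is hypothetical, and the sample string is a condensed copy of the example; this is a sketch under those assumptions, not how Speech Server or a voice browser processes results.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example result; the prefixes are local choices.
NS = {
    "mrcp": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

# Condensed copy of the example result (illustration only).
NLSML = """<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/flight">
  <interpretation confidence="0.6">
    <instance><ex:airline><ex:to_city>Pittsburgh</ex:to_city></ex:airline></instance>
    <input mode="speech">I want to go to Pittsburgh</input>
  </interpretation>
  <interpretation confidence="0.4">
    <instance><ex:airline><ex:to_city>Stockholm</ex:to_city></ex:airline></instance>
    <input>I want to go to Stockholm</input>
  </interpretation>
</result>"""


def read_interpretations(nlsml_text):
    """Return (confidence, to_city, utterance) tuples, highest confidence first."""
    root = ET.fromstring(nlsml_text)
    results = []
    for interp in root.findall("mrcp:interpretation", NS):
        confidence = float(interp.get("confidence", "0"))
        city = interp.findtext("mrcp:instance/ex:airline/ex:to_city", namespaces=NS)
        utterance = interp.findtext("mrcp:input", default="", namespaces=NS).strip()
        results.append((confidence, city, utterance))
    # Sort explicitly rather than relying on document order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


for confidence, city, utterance in read_interpretations(NLSML):
    print(confidence, city, utterance)
```

The element names under instance depend on the grammar, so the ex:airline and ex:to_city path applies only to this example.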

To specify the MIME media type of the recognition result, set the server.mrcp2.osrspeechrecog.mrcpdefaults.VSP.server.osrspeechrecog.result.mediatype parameter.

The optional swirec_result_enable_speech_mode parameter ensures that recognition results conform to the VoiceXML 2.0 specification.

Dragon Voice results: Krypton-only

Krypton-only results have a literal meaning. They never have a semantic intent.

Dragon Voice results: open-dialog

Open-dialog results have a meaning representation that is a combination of standard slots. Any slot can be captured at any turn—it is up to the application to extract (or ignore) the meaning (slots) returned.
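
Because any slot can arrive at any turn, an application typically accumulates slots across turns and ignores the ones it does not need. The Python sketch below shows one possible way to do that. The slot names come from the pay-bill example; the set of required slots and the helper names merge_turn and missing_slots are assumptions made for illustration.

```python
# Slot names come from the pay-bill example; treating these three as
# "required" is an assumption for illustration.
REQUIRED_SLOTS = ("AMOUNT", "PAYEE", "FROM_ACCOUNT")


def merge_turn(collected, turn_slots, wanted=REQUIRED_SLOTS):
    """Return a new dict with the wanted slots from this turn added.

    Slots the application does not care about are ignored. A slot that
    is repeated in a later turn overwrites the earlier value.
    """
    merged = dict(collected)
    for name, value in turn_slots.items():
        if name in wanted:
            merged[name] = value
    return merged


def missing_slots(collected, wanted=REQUIRED_SLOTS):
    """List the wanted slots that have not been captured yet."""
    return [name for name in wanted if name not in collected]


# Two hypothetical turns: the caller names the payee first, then the rest.
state = merge_turn({}, {"INTENT": "BILL_PAY", "PAYEE": "VISA"})
print(missing_slots(state))
state = merge_turn(state, {"AMOUNT": "50.00", "FROM_ACCOUNT": "123412341234"})
print(missing_slots(state))
```

Returning a new dictionary on each turn keeps earlier dialog state unchanged, which can make it easier to log or roll back a turn.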

To assess confidence scores, ignore “confidence” in the VoiceXML recognition results, and use the NLCONF token in the log events instead (see NLEinnd—QuickNLP interpretation end and NLEplnd—Pipeline end).

The following examples are taken from the pay-bill VoiceXML document shown in Example: open-dialog using Dragon Voice.

NLSML format result

In this example, the caller has asked to pay $50 to Visa from their checking account.

<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436">
    <interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436"

        confidence="99">

        <input mode="voice">pay fifty dollars to visa from my checking account</input>

        <instance>

             <AMOUNT>50.00</AMOUNT>

             <PAYEE>VISA</PAYEE>

             <FROM_ACCOUNT>123412341234</FROM_ACCOUNT>

             <SWI_literal>Pay fifty dollars to visa from my checking account</SWI_literal>

             <SWI_spoken>Pay fifty dollars to visa from my checking account</SWI_spoken>

             <SWI_meaning>

               {"AMOUNT":"50.00","INTENT":"BILL_PAY","FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}

             </SWI_meaning>

        </instance>

    </interpretation>

    <interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" 
      confidence="0">

        <input mode="voice">pay fifty dollars to visa from my checking account</input>

        <instance>

            <INTENT confidence="0">CHANGE_FROM_ACCOUNT</INTENT>

            <FROM_ACCOUNT>123412341234</FROM_ACCOUNT>

            <SWI_literal>Pay fifty dollars to visa from my checking account</SWI_literal>

            <SWI_spoken>Pay fifty dollars to visa from my checking account</SWI_spoken>

            <SWI_meaning>

                {"INTENT":"CHANGE_FROM_ACCOUNT","FROM_ACCOUNT":"123412341234"}

            </SWI_meaning>

        </instance>

    </interpretation>

    <interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" 
      confidence="0">

        <input mode="voice">pay fifty dollars to visa from my checking account</input>

        <instance>

            <AMOUNT>50.00</AMOUNT>

            <PAYEE>VISA</PAYEE>

            <FROM_ACCOUNT>123412341234</FROM_ACCOUNT>

            <SWI_literal>Pay fifty dollars to visa from my checking account</SWI_literal>

            <SWI_spoken>Pay fifty dollars to visa from my checking account</SWI_spoken>

            <SWI_meaning>

                {"AMOUNT":"50.00","INTENT":"NO_INTENT","FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}

            </SWI_meaning>

        </instance>

    </interpretation>

</result>

Note: The samples are formatted for readability. Newlines and indentation are added. The actual return string is delimited by spaces.
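
The SWI_meaning element in these results carries the slots as a JSON object, so one way to read them is to parse the XML and then decode that text. The Python sketch below assumes the application wants the first interpretation and deliberately does not use the confidence attribute, in line with the guidance above for Dragon Voice. The function name first_meaning is hypothetical, and the sample string is an abridged, single-line copy of the first interpretation.

```python
import json
import xml.etree.ElementTree as ET

# Abridged copy of the first interpretation from the sample, collapsed to a
# single space-delimited string as described in the note (illustration only).
NLSML = (
    '<?xml version="1.0" encoding="UTF-8"?> '
    '<result grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436"> '
    '<interpretation grammar="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" '
    'confidence="99"> '
    '<input mode="voice">pay fifty dollars to visa from my checking account</input> '
    '<instance> '
    '<AMOUNT>50.00</AMOUNT> <PAYEE>VISA</PAYEE> '
    '<FROM_ACCOUNT>123412341234</FROM_ACCOUNT> '
    '<SWI_meaning> {"AMOUNT":"50.00","INTENT":"BILL_PAY",'
    '"FROM_ACCOUNT":"123412341234","PAYEE":"VISA"} </SWI_meaning> '
    '</instance> </interpretation> </result>'
)


def first_meaning(nlsml_text):
    """Return the SWI_meaning of the first interpretation as a dict, or None."""
    root = ET.fromstring(nlsml_text)
    # findtext returns the text of the first matching element in document order.
    raw = root.findtext("interpretation/instance/SWI_meaning")
    if raw is None or not raw.strip():
        return None
    return json.loads(raw)


meaning = first_meaning(NLSML)
print(meaning["INTENT"], meaning["AMOUNT"], meaning["PAYEE"])
```

Decoding SWI_meaning gives the intent and all slots in one structure, which avoids walking the individual slot elements; an application could equally read the AMOUNT, PAYEE, and FROM_ACCOUNT elements directly.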

EMMA format result

In this example, the caller provides in a single utterance all of the information needed to process a bill payment: amount, date, payee, and account.

System	Thank you for calling, how can I help you?
Caller	Pay thirty dollars to my Visa from checking on February 1, 2018.
System	Thanks, we'll pay ... ... <VISA> ... ... $<30.00> ... ... from account ... ... <123412341234> ... ... on ... ... <20180201>.

<?xml version="1.0" encoding="UTF-8"?>
<emma:emma xmlns:emma="http://www.w3.org/TR/2007/CR-emma-20071211" xmlns:nuance="http://nr11.nuance.com/emma" version="1.0">

   <emma:grammar id="grammar_1" ref="session:33411-PayBill_NR11-2018-02-07-eng-USA-20180207-160436" />

   <emma:one-of id="nbest" emma:disjunction-type="recognition">

      <emma:interpretation id="interp_1" emma:confidence="0.99" emma:grammar-ref="grammar_1"
       emma:tokens="Pay thirty dollars to my visa from checking on February first two thousand
            eighteen"
       emma:mode="voice">

         <INTENT>BILL_PAY</INTENT>

         <AMOUNT conf="1">30.00</AMOUNT>

         <PAYEE conf="1">VISA</PAYEE>

         <FROM_ACCOUNT conf="1">123412341234</FROM_ACCOUNT>

         <DATE conf="1">20180201</DATE>

         <SWI_literal>

           Pay thirty dollars to my visa from checking on February first two thousand eighteen

         </SWI_literal>

         <SWI_spoken>Pay thirty dollars to my visa from checking on February first two thousand eighteen</SWI_spoken>

         <SWI_meaning>{"DATE":"20180201","AMOUNT":"30.00","INTENT":"BILL_PAY",
           "FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}

         </SWI_meaning>

      </emma:interpretation>

      <emma:interpretation id="interp_2" emma:confidence="0.0" emma:grammar-ref="grammar_1"
       emma:tokens="Pay thirty dollars to my visa from checking on February first two thousand
            eighteen"
       emma:mode="voice">

         <INTENT>NO_INTENT</INTENT>

         <AMOUNT conf="1">30.00</AMOUNT>

         <PAYEE conf="1">VISA</PAYEE>

         <FROM_ACCOUNT conf="1">123412341234</FROM_ACCOUNT>

         <DATE conf="1">20180201</DATE>

         <SWI_literal>

           Pay thirty dollars to my visa from checking on February first two thousand eighteen

         </SWI_literal>

         <SWI_spoken>

          Pay thirty dollars to my visa from checking on February first two thousand eighteen

         </SWI_spoken>

         <SWI_meaning>{"DATE":"20180201","AMOUNT":"30.00","INTENT":"NO_INTENT",
           "FROM_ACCOUNT":"123412341234","PAYEE":"VISA"}

         </SWI_meaning>

      </emma:interpretation>

   </emma:one-of>