What is the Multimodal Content Editor?

Modality refers to the mode through which a bot delivers content to the user.

For example, Amazon Alexa is available in different modes:

  • Amazon Echo and Echo Dot are voice-only devices

  • Amazon Echo Show and Echo Spot are smart display devices

Multimodal interaction makes use of several input and/or output modalities.

Multimodal Content Editor

Administrators can use the Multimodal Content Editor (MMCE) to customize content that is voiced and displayed on various smart devices. For example, instead of the bot reading the given text in a monotone, you can add effects or modulation to the speech, or show an image or a video.

For example, a user might ask the bot, “What are the symptoms of chickenpox?” By specifying relevant information in all the tabs of the MMCE, the user receives voice-only information on a smart speaker, and videos or images with voice on a smart display.

The following screens use MMCE:

  1. Prototype

  2. Content

  3. Calendars

  4. Dialogs

  5. Taxonomy

  6. Data

  7. Survey

  8. Say Node in Experience Designer

The MMCE provides the following tabs:

  • Voice Tab

  • Text Tab

  • Screen Tab

  • Button Tab

Voice Tab

Content in the Voice tab is converted to speech and voiced to the user. It has two parts: Say Text and Reprompt. After the Say Text is voiced, if the bot does not receive a response from the user within a set time, the bot voices the Reprompt text.

You have the following options for Say Text and Reprompt content:

  • HTML Source View

  • Insert Audio

  • Break

  • SayAs

  • W tag

  • Phoneme

  • Emphasize

  • Prosody

  • Sub

  • Amazon effect

  • Spellchecker

  • Voice simulator

  • Token

HTML Source View

View the HTML code of your content. The HTML tag structure is shown at the bottom of the editor window.

You can view and edit the underlying HTML code directly by clicking Source. For example:

<p>What is your favorite season?</p>

You can add, edit, or paste the valid HTML code into the Source view.

Audio

Insert an audio file into the content editor.

  1. Click Audio. The Audio Properties dialog box appears. To use the Audio Url field, see How do I link to an external audio file?

  2. Click Choose File.

  3. Locate and select your audio clip. (MP3, WAV, or OGG are supported)

  4. Double click the file (or click Open). The file appears next to Choose File.

  5. Click Send it to the Server. The URL is automatically filled in for you.

  6. Optionally click Preview to listen to the audio clip.

  7. Click OK to add the audio clip to your content. Click Cancel to exit without adding the audio clip.
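Under the hood, an inserted clip is typically represented with the SSML audio tag. A minimal sketch, assuming standard Alexa-style SSML output and a hypothetical clip URL:

```xml
<speak>
    Welcome back.
    <audio src="https://example.com/sounds/chime.mp3"/>
    How can I help you today?
</speak>
```

Note that Alexa requires audio files to be hosted on an HTTPS endpoint.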

Break

You may want to add pauses in speech to make the synthesized voice sound more natural, or to provide particular emphasis. The Orbita synthesized voice adds pauses where punctuation exists, but you can modify this further.

  1. Click Break. The Break Tag Properties dialog box appears.

  2. Select the strength from the drop-down.

  3. Select the unit (seconds or milliseconds) from the drop-down.

  4. Enter the time, in the unit selected earlier, in the Time field.

  5. Click OK.
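If the editor emits standard SSML, the Break settings above map to the break tag, which takes either a strength value (such as medium or x-strong) or an explicit time:

```xml
<speak>
    Step one is complete. <break time="500ms"/>
    Now for step two. <break strength="x-strong"/>
    And finally, step three.
</speak>
```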

SayAs

Sometimes text can be read and said differently. For example:

The telephone number is 555-1212.

  • If you select Interpret As: > telephone, the audio says, “…five five five, one two one two.”

  • If you select Interpret As: > cardinal, the audio says, “…five hundred fifty-five, one thousand two hundred twelve.”

The SayAs feature lets you customize natural language speech with the following properties: characters, cardinal, ordinal, digits, fraction, date, time, telephone, address, and interjection.

  1. Select the text in the content editor (such as 555-1212).

  2. Click SayAs. The SayAs Tag Properties dialog box appears.

  3. Select an option from Interpret As.

  4. Click OK.
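In standard SSML, this wraps the selected text in a say-as tag. A sketch for the telephone example:

```xml
<speak>
    The telephone number is <say-as interpret-as="telephone">555-1212</say-as>.
</speak>
```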

W Tag

Sometimes a word has more than one pronunciation and more than one meaning. For example:

The leader led the lead balloon parade.

The audio voice defaults to saying “leed” balloon, but in this case, lead should sound like “led.” To correct the pronunciation, select the word and click W Tag.

In the W Tag dialog box, select the Role that stipulates the correct pronunciation of the word. Role options include Word as a verb, Word as a past tense, Word as a noun, and Non-default sense of the word. In this case, Non-default sense of the word changes the pronunciation.

  1. Select the word in the content editor.

  2. Click W Tag. The W Tag dialog box appears.

  3. Select a Role and click OK.
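In Alexa SSML, the Role options correspond to w tag roles such as amazon:VB (verb), amazon:VBD (past tense), amazon:NN (noun), and amazon:SENSE_1 (non-default sense). A sketch for the example above:

```xml
<speak>
    The leader led the <w role="amazon:SENSE_1">lead</w> balloon parade.
</speak>
```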

Phoneme

Some words can be pronounced differently. For example, You say tomato (toe-may-toe), I say tomato (tah-mah-toe). You can modify the pronunciation by selecting a consonant or vowel in a word and clicking Phoneme.

See Speech Synthesis Markup Language (SSML) Reference > Phoneme > Supported Symbols (click supported symbols in the dialog box) for examples of consonant and vowel speech pronunciation.
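In standard SSML, this produces a phoneme tag with a phonetic alphabet and a pronunciation string. A sketch of the tomato example using IPA:

```xml
<speak>
    You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
    I say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
</speak>
```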

Emphasize

Emphasize changes the rate and volume of the speech. More emphasis is spoken louder and slower; less emphasis is spoken quieter and faster.
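In standard SSML, this corresponds to the emphasis tag, with a level of strong, moderate, or reduced:

```xml
<speak>
    I already told you I <emphasis level="strong">really like</emphasis> that person.
</speak>
```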

Prosody

Prosody controls the volume, pitch, and rate of the tagged speech so that you can achieve the intonation or effects that you want. For example, enter a sentence and play the audio by clicking Voice Simulator. Then highlight a word in the sentence, and click Prosody Tag. Change the values for Rate, Volume, and Pitch, click OK, and click Voice Simulator again to hear the difference.
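In standard SSML, the result is a prosody tag; rate, volume, and pitch can each take keywords or relative values:

```xml
<speak>
    This is my normal voice.
    <prosody rate="slow" pitch="-10%" volume="loud">This part is slower, lower, and louder.</prosody>
</speak>
```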

Sub

Sub lets you substitute a pronunciation for text that might otherwise be read in another way. For example, mg should be pronounced milligrams. Other examples: lbs. (pounds), Mb (megabytes), and so on.
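In standard SSML, the substitution is expressed with the sub tag's alias attribute:

```xml
<speak>
    Take 50 <sub alias="milligrams">mg</sub> twice a day.
</speak>
```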

Amazon effect

Amazon effect affects only Amazon devices and has no effect on non-Amazon devices. See https://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html for information about the types of enhancement effects that you can implement with Amazon devices.
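For example, Alexa supports a whispered effect via the amazon:effect tag:

```xml
<speak>
    <amazon:effect name="whispered">I can whisper this part of the sentence.</amazon:effect>
</speak>
```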

Spellchecker

Checks the spelling of the content written in the content editor.

The Spell check menu shows options for improving your content. SCAYT stands for Spell Check As You Type. If you enable this feature, the following options become available:

  • Options. Enable or disable Ignore All-Caps Words, Ignore Domain Names, Ignore Word with Mixed Case, and Ignore Words with Numbers.

  • Languages. Select the default language for the spell checker.

  • Dictionaries. Create a custom dictionary in which you can include your organizational terms.

  • About SCAYT. Displays the version number of the spell checker.

  • Check Spelling. Invokes a separate utility for checking spelling and grammar, and for looking up thesaurus terms in your content.

Voice Simulator

Click Voice Simulator to hear the content read with any modifications you have applied.

Token

Insert dynamic text into the content editor by using Token. For example, if you select Username in the Token drop-down, a mustache tag is inserted into the content editor, which dynamically resolves to the logged-in user's name.
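For example, after selecting the username token, the content might look like the following in Source view (the exact tag name depends on the token you select; {{Username}} here is illustrative):

```html
<p>Welcome back, {{Username}}!</p>
```

At runtime, the mustache tag is replaced with the logged-in user's name.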

For information about the options in the Voice tab, see https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html#amazon-effect

Text Tab

This text is what is displayed to the user as a chat bubble. The text entered in Chat Text in this tab can be different from the Say Text in the Voice tab.

Hover over an icon to display the tooltip.

Using the link icon, you can insert an image or a URL to display in the chatbot.

Click the link icon to open a pop-up window.

In the Link Info tab, fill in the details such as display text, Link type, URL, etc.

The Link Type has three options to choose from: URL, Link to anchor in the text, and E-mail.

The Protocol drop-down includes an utterance option (not a web protocol) alongside the standard web protocols.

Using the Utterance protocol, you can define a hyperlink to trigger an utterance.

The following example uses a link to trigger the “Hello world” utterance.

  1. Double click on the say node to edit it. Select the Text tab.

  2. Select a word in the Chat text field of the Text tab.

  3. Click on the link icon.

  4. Select utterance:// from the protocol dropdown.

  5. Enter the utterance in the URL field.

In the chatbot, clicking the link returns the utterance “Hello world” and triggers the corresponding intent.
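In Source view, the resulting link looks roughly like the following (the exact markup the editor generates may differ):

```html
<a href="utterance://Hello world">Say hello</a>
```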

 

In the Target tab, you can choose how the link should be opened.

In the Upload tab, you can upload your image or video to your chat text.

Choose a file and click the Send it to the Server button. This sends the file to the server, and the file's URL is filled in on the Link Info tab.

Insert Horizontal Line (Line Break)

You can split the text of a chat bubble into separate bubbles by inserting a horizontal line. Text before and after the horizontal line is rendered as two separate bubbles.
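In Source view, this corresponds to an hr tag between the two pieces of text (the exact rendering depends on your bot view template):

```html
<p>This text appears in the first bubble.</p>
<hr />
<p>This text appears in the second bubble.</p>
```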

Image

You can display an image in the chatbot. To do so, click the Image button in the Text tab.


If you already have an image uploaded in the database, you can browse to the image or paste the corresponding URL in the URL field on the Image Info tab of the Image Properties pop-up box.

Adjust image resolution to fit the chat bubble width

If you want to adjust the resolution of the image to fit within the chat bubble in the chatbot, you should place the below CSS code in the Bot View Template > Custom CSS.

Say node in Experience Designer
.endbubble.msg>p>img{ width: 100% !important; height: 100% !important; }
Flow studio
.msg>p>img { width: 100% !important; }

Screen Tab

You can add an image to show within the bot. The contents of this tab are rendered on smart displays.

The Screen tab is not available in the Prototype feature.

Buttons

You can add buttons after the chat bubble for the user to click in response.

  • Value. The value of the button; when the button is clicked, this value is sent to the server.

  • Text. The display name of the button.
