How can I control the way text is spoken?

This document describes the functions of the Orbita Speech Synthesis Markup Language (SSML) Editor and explains how you can use it to control the way text is spoken by voice agents like Amazon Alexa.

Source

In Source view, the HTML tag structure is shown at the bottom of the editor window.

You can view and edit the underlying HTML code directly by clicking Source. For example:

<p>What is your favorite season?</p>

You can add, edit, or paste valid HTML code into the Source view.

Insert Audio

Insert Audio displays the Audio Properties dialog box from which you can add or specify an audio clip to use in the survey content. To upload an audio file:

  1. Click Choose File.

  2. Locate and select your audio clip. MP3, WAV, and OGG files are supported.

  3. Double-click the file (or click Open). The file appears next to the Choose File button.

  4. Click Send it to the Server. The URL is automatically filled in for you.

  5. Optionally click Preview to listen to the audio clip.

  6. Click OK to add the audio clip to your content. Click Cancel to exit without adding the audio clip.

To use the Audio Url field, see How do I link to an external audio file?
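As a rough sketch, an added audio clip might look like the following in Source view. The audio element is standard SSML, but the exact markup the editor generates is an assumption, and the URL is a hypothetical placeholder for the one that is filled in after upload.

```xml
<p>Here is a short message. <audio src="https://example.com/audio/welcome.mp3"/></p>
```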

Break tag

As an audio voice reads the text of your control type, you may want to add pauses in speech to make the audio voice more natural, or to provide particular emphasis. The Orbita synthesized voice adds pauses where punctuation exists, but you can modify this further. To do this, click Break Tag to display its dialog box.
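For illustration, a pause could appear in Source view roughly as shown below. The break element and its time and strength attributes come from the SSML standard; the sentences are made up, and the exact markup the dialog box produces is an assumption.

```xml
<p>Take a deep breath. <break time="2s"/> Now slowly exhale.</p>
<p>That is all for today. <break strength="strong"/> Goodbye.</p>
```

The time attribute sets an explicit duration, while strength picks a relative pause length.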

SayAs tag

Some text can be read aloud in more than one way. For example:

The telephone number is 555-1212.

  • If you select Interpret As: > telephone, the audio says, “…five five five, one two one two.”

  • If you select Interpret As: > cardinal, the audio says, “…five hundred fifty-five, one thousand two hundred twelve.”

The SayAs feature lets you customize natural language speech with the following properties: characters, cardinal, ordinal, digits, fraction, date, time, telephone, address, and interjection.
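A minimal sketch of how a SayAs selection might look in Source view, assuming the editor writes the standard SSML say-as element. The second sentence and its confirmation code are made-up examples. Any property from the list above would go in the interpret-as attribute in the same way.

```xml
<p>The telephone number is <say-as interpret-as="telephone">555-1212</say-as>.</p>
<p>Your confirmation code is <say-as interpret-as="characters">AB12</say-as>.</p>
```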

W tag

Sometimes a word has more than one pronunciation and more than one meaning. For example:

The leader led the lead balloon parade.

The audio voice defaults to saying “leed” balloon, but in this case, lead should sound like “led.” To get the speech correct, select the word and click W Tag.

In the W Tag dialog box, select the Role that stipulates the correct pronunciation of the word. Role options include Word as a verb, Word as a past tense, Word as a noun, and Non-default sense of the word. In this case, Non-default sense of the word changes the pronunciation.
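As a sketch, the tagged sentence might look like this in Source view. The role values listed in the Alexa SSML reference are amazon:VB (verb), amazon:VBD (past tense), amazon:NN (noun), and amazon:SENSE_1 (non-default sense); that the dialog box's Role options map to these values in this order is an assumption.

```xml
<p>The leader led the <w role="amazon:SENSE_1">lead</w> balloon parade.</p>
```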

Phonemic/Phonetic pronunciation

Some words can be pronounced differently. For example, you say tomato (toe-may-toe), I say tomato (tah-mah-toe). You can modify the pronunciation by selecting a consonant or vowel in a word and clicking the Phonemic/Phonetic pronunciation button.

See Speech Synthesis Markup Language (SSML) Reference > Phoneme > Supported Symbols (click supported symbols in the dialog box) for examples of consonant and vowel speech pronunciation.

If you want to change the pronunciation of a whole word, edit the source code on the Say node's Voice tab to use the following format:
<p><phoneme alphabet="ipa" ph="khoʊlmən">Coleman</phoneme></p>

The chatbot uses the pronunciation provided in the "ph" attribute rather than the text contained within the tag. However, you should still provide human-readable text within the tags.

In the above example, the chatbot displays the text within the tags (Coleman) and speaks the phonetic text provided in the ph attribute (khoʊlmən).

The list of supported symbols for the ph attribute can be found here: https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html#supported-symbols

Emphasize Tag

Highlight the text that you want to emphasize. You can select strong, moderate, or reduced emphasis.
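For example, emphasized text might look like the following in Source view. This is a sketch that assumes the editor writes the standard SSML emphasis element; the sentence is made up. The level attribute takes strong, moderate, or reduced.

```xml
<p>I <emphasis level="strong">really</emphasis> like that idea.</p>
```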

Prosody Tag

Prosody Tag modifies the volume, pitch, and rate of the tagged speech so that you can achieve the speech intonation or effects that you want. For example, enter a sentence and play the audio by clicking Voice Simulator. Then highlight a word in the sentence, and click Prosody Tag. Change the values for Rate, Volume, and Pitch, click OK, and click Voice Simulator again to hear the difference.
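A minimal sketch of the resulting markup, assuming the editor writes the standard SSML prosody element. The sentence and the chosen values are illustrative only.

```xml
<p>Please listen <prosody rate="slow" volume="loud" pitch="low">very carefully</prosody> to the next question.</p>
```

Each attribute is optional, so you can change only the rate, for instance, and leave volume and pitch at their defaults.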

Sub Tag

Sub Tag lets you substitute a pronunciation for text that might otherwise be read in another way. For example, mg should be pronounced milligrams. Other examples: lbs. (pounds), Mb (megabytes), and so on.
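As an illustration, the milligrams example might appear in Source view as shown below, assuming the editor writes the standard SSML sub element. The surrounding sentence is made up. The alias attribute holds the words to speak, and the text inside the tags is what is displayed.

```xml
<p>Take 200 <sub alias="milligrams">mg</sub> twice a day.</p>
```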

Amazon effect Tag

Amazon effect applies only to Amazon devices; it has no effect on other devices. See https://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html for information about the types of enhancement effects that you can implement with Amazon devices.
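For instance, the whispered effect could be written as follows in Source view. This is a sketch based on the amazon:effect element in Amazon's SSML documentation; the sentence is made up, and which effects the editor offers is not confirmed here.

```xml
<p><amazon:effect name="whispered">This part is a secret.</amazon:effect></p>
```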

Editor options (SCAYT)

The Spell check menu shows options for improving your content. SCAYT stands for Spell check as you type. When this feature is enabled, the following options are available:

  • Options. Enable or disable Ignore All-Caps Words, Ignore Domain Names, Ignore Word with Mixed Case, and Ignore Words with Numbers.

  • Languages. Select the default language for the spell checker.

  • Dictionaries. Create a custom dictionary in which you can include your organizational terms.

  • About SCAYT. Displays the version number of the spell checker.

  • Check Spelling. Invokes a separate utility for checking the spelling, grammar, and thesaurus terms in your content.

Voice Simulator

Click Voice Simulator to hear the content read with any modifications you have applied.
