

A

Acoustic Model: a representation that maps “the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The model is learned from a set of audio recordings and their corresponding transcripts. It is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word.”

...

AWS Lambda function: The code uploaded to AWS Lambda. Lambda supports coding in Node.js, Java, Python, or C#. A smart home skill must be implemented as a Lambda function. You can also choose to use a Lambda function for the service for a custom skill. See AWS Lambda.
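As a minimal sketch of what such a function looks like, the handler below follows the Alexa Skills Kit request/response JSON format; the prompt wording and intent handling are illustrative, not a complete skill.

```python
# Minimal sketch of a custom-skill Lambda handler. The event structure
# follows the Alexa Skills Kit request/response format; the speech text
# and intent handling are illustrative only.
def lambda_handler(event, context):
    request_type = event["request"]["type"]
    if request_type == "LaunchRequest":
        speech = "Welcome. What would you like to do?"
    elif request_type == "IntentRequest":
        intent = event["request"]["intent"]["name"]
        speech = f"You invoked the {intent} intent."
    else:  # e.g., SessionEndedRequest
        speech = "Goodbye."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": request_type != "LaunchRequest",
        },
    }
```

When uploaded to Lambda, the Alexa service invokes this handler with each request and speaks the returned `outputSpeech` text.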

B

Barge-in: The ability of the user to interrupt system prompts while those prompts are being played. If barge-in is enabled in an application, then as soon as the user begins to speak, the system stops playing its prompt and begins processing the user’s input.

...

Bot Logic: The conversations and database that we build for the bot’s persona. This can be built using a platform or a framework.

C

Card: Displays information related to the user’s request, such as what the user asked and Alexa’s response, a picture, or long numbers and lists, which can be difficult to process and remember when delivered through voice alone. Also known as a Home card or Detail card.

...

Customer Relationship Management (CRM): Using the customer’s data to improve customer relations.

D

Detail card: A card displayed in the Alexa app with information about the skill and how to use it. A user can review detail cards and enable the skills they want.

...

Display template: Used for Echo Show to display a combination of text and images in a specified format (as cards in the Alexa app if the user does not have the Echo Show).

E

Earcon: The audio equivalent of an “icon” in graphical user interfaces. Earcons are used to signal conversation marks (e.g., when the system starts or stops listening) and to communicate the brand, mood, and emotion during a voice-first interaction.

...

Explicit confirmation: See Confirmation.

F

False Accept: An instance where the ASR mistakenly accepted an utterance as a valid response. See also Dialog Errors.

...

Framework: A backend that connects the platform to the channel through an API.

G

Google Assistant: The cloud service provided by Google that powers Google’s Far-Field device (Google Home) as well as other Android-based devices (such as smartphones and tablets).

...

Gutenberg Parenthesis: The proposition that the last 500 years or so — the time between the invention of typeset printing, which ushered in the era of the written word as the main mode of communicating knowledge, and the recent arrival of distributed social media — is a short parenthesis in a history of human communication that has relied on informal and decentralized communication, in oral form prior to Gutenberg, and currently via social media and orally.

H

Home card: An element displayed in the Alexa app to describe or enhance a voice interaction with a custom skill. Cards are useful when testing and debugging the Lambda function or web service for a skill.

Houndify: A platform by music identifier service SoundHound that lets developers integrate speech recognition and Natural Language Processing systems into hardware and other software systems.

I

Implicit confirmation: See Confirmation.

...

Invoke: A hardware Echo-like device manufactured by Harman Kardon that enables users to engage Cortana in Far-Field conversations.

K

Keyword: A simple marker used to trigger a conversation.

L

Lambda blueprint: An option in the AWS Lambda console that provides sample code and a sample configuration for a new Lambda function. Use this to create Lambda functions with just a few clicks.

...

Low confidence errors: See Dialog Errors.

M

Machine Learning (ML): A broad set of AI techniques in which computer algorithms are developed and trained to learn inherent patterns in datasets. The resulting algorithms are often used predictively (e.g., given some input, predict what the output should be) to automate human behavior and decision-making.

...

Mixed-initiative Dialog: Interactions where the user may unilaterally issue a request rather than simply provide exactly the information asked for by system prompts. For instance, while making a flight reservation, the system may ask the user, “What day are you planning to fly out?” Instead of answering that question, the user may say, “I’m flying to Denver, Colorado.” A mixed-initiative system would recognize that the user has not given the exact answer to the question asked, but has additionally (additive) or instead (substitutive) volunteered information that the system was going to request later on. Such a system would accept this information, remember it, and continue the conversation. In contrast, a Directed Dialog system would rigidly insist on the departure date and would not proceed successfully until it received that piece of information.
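The contrast can be sketched as a toy slot-filling loop; the slot names and update logic here are illustrative, not a real dialog manager.

```python
# Toy slot-filling sketch. A mixed-initiative manager keeps any slot
# value the user volunteers, even when it answers a different question
# than the one just asked; a directed-dialog manager would instead
# re-ask until the requested slot is filled. Slot names are illustrative.
REQUIRED_SLOTS = ["departure_date", "destination"]

def update_slots(filled, user_slots):
    """Accept every slot the user provides (mixed-initiative behavior)."""
    filled.update(user_slots)
    # Return whichever required slots are still missing.
    return [s for s in REQUIRED_SLOTS if s not in filled]

filled = {}
# System asks: "What day are you planning to fly out?"
# User replies: "I'm flying to Denver, Colorado."
missing = update_slots(filled, {"destination": "Denver, Colorado"})
# The volunteered destination is kept; only the departure date remains
# to be asked about on the next turn.
```

A directed-dialog version of `update_slots` would discard the volunteered destination and repeat the departure-date question.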

N

Natural Language Processing (NLP): Technology that extracts the “meaning” of a user’s utterance or typed text. A meaning usually consists of an Intent and Name-Value pairs. The utterance, “I want to book a flight from Washington, DC to Boston,” has the Intent “Book-a-Flight” with the Name-Value pairs being, “Departure City”=”Washington, DC” and “Arrival City”=”Boston, MA”.
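The example above can be made concrete with a toy extractor; the regex pattern stands in for a real NLP engine and is purely illustrative.

```python
# Toy illustration of NLP output as an intent plus name-value pairs.
# The regex stands in for a real NLP engine and is illustrative only.
import re

def extract_meaning(utterance):
    m = re.search(r"book a flight from (.+) to (.+)", utterance, re.IGNORECASE)
    if m:
        return {
            "intent": "Book-a-Flight",
            "slots": {"Departure City": m.group(1), "Arrival City": m.group(2)},
        }
    return {"intent": "Unknown", "slots": {}}

meaning = extract_meaning("I want to book a flight from Washington, DC to Boston")
# meaning holds the intent "Book-a-Flight" and the two name-value pairs.
```

A production NLP system would use a trained statistical model rather than patterns, but the output shape (intent plus name-value pairs) is the same.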

...

Notification: An alert delivered when the user has taken an action that requires Alexa to inform them at a later time that an event is occurring or about to occur, such as alarms and timers.

O

Out of Scope (OOS) Error: See No-match Error.

P

Persona: The personality of the system (formal, playful, chatty, aggressive, friendly, etc.) that comes across in the way the system engages with the user. The persona is influenced by factors such as the perceived gender of the system, the type of language the system uses, and how the system handles errors.

...

Push: A conversation started by the bot; it can be structured as either a broadcast or a sequence.

R

Recognition Tuning: The activity of configuring the ASR’s settings to optimize recognition accuracy and processing speed.

...

Re-prompt: A question prompt that occurs after a virtual assistant has answered a user question. For example, “Do you have any more questions?” See also Prompt.

S

Sample utterance: A structured string of words that connects a specific intent to a likely utterance. You provide a set of sample utterances as part of your interaction model for a custom skill. When users say one of these utterances, the Alexa service sends a request to your service that includes the corresponding intent. Note: You only provide sample utterances for custom skills. Utterances for smart home skills are defined by the Smart Home Skill API.
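A rough sketch of how sample utterances connect phrasings to an intent is shown below. The `{sign}` brace syntax mirrors Alexa’s sample-utterance slot format, but the matcher itself is illustrative — the real Alexa service uses trained models, not pattern matching, and the intent and slot names here are invented.

```python
# Sketch of sample utterances connecting likely phrasings to an intent,
# with {sign} as a slot placeholder. The intent name and the regex-based
# matching are illustrative; the Alexa service resolves utterances with
# trained models, not patterns.
import re

SAMPLE_UTTERANCES = {
    "HoroscopeIntent": [
        "give me the horoscope for {sign}",
        "what is the horoscope for {sign}",
    ],
}

def match_intent(utterance):
    for intent, samples in SAMPLE_UTTERANCES.items():
        for sample in samples:
            # Turn each {slot} placeholder into a named capture group.
            pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", sample)
            m = re.fullmatch(pattern, utterance, re.IGNORECASE)
            if m:
                return intent, m.groupdict()
    return None, {}

intent, slots = match_intent("what is the horoscope for pisces")
```

Each matched utterance yields the corresponding intent plus the captured slot values, which is the shape of the request your service receives.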

...

System errors: Errors that occur when something unexpected happens that is unrelated to the dialog between the user and Alexa, such as when a call to the data service that provides the information the user requested fails to return that information.

T

Tapered Prompting: The technique of eliding a prompt or a piece of a prompt in the context of a multi-step interaction or a multi-part system response. Instead of the system asking repetitively, “What is your level of satisfaction with our service?” “What is your level of satisfaction with our pricing?” “What is your level of satisfaction with our cleanliness?” the system would ask: “What is your level of satisfaction with our service?” “How about our pricing?” “And cleanliness?” The technique is used to provide a more natural and less robotic-sounding user experience.
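The tapering pattern described above can be sketched as a small prompt generator; the topic list and wording are illustrative.

```python
# Sketch of tapered prompting: the full prompt frame is spoken once,
# and later questions in the sequence elide the repeated part.
# Topics and wording are illustrative.
def tapered_prompts(topics):
    prompts = []
    for i, topic in enumerate(topics):
        if i == 0:
            prompts.append(f"What is your level of satisfaction with our {topic}?")
        elif i < len(topics) - 1:
            prompts.append(f"How about our {topic}?")
        else:
            prompts.append(f"And {topic}?")
    return prompts

prompts = tapered_prompts(["service", "pricing", "cleanliness"])
```

Only the first prompt carries the full frame; each later prompt relies on the context the first one established.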

...

Turn: A conversational turn is a single request to or response from Alexa. Sometimes shorthand for only the request side of a conversation, so “Alexa, Open Horoscope”, “What horoscope sign would you like?”, “Pisces”, “Today’s horoscope for Pisces is …” might be referred to as a two-turn interaction, rather than the 4 turns that it technically contained.

U

Utterance: The words the user says to Alexa to convey what they want to do or to provide a response to a question Alexa asks. For custom skills, you provide a set of sample utterances (see slots) mapped to intents as part of your custom interaction model. For smart home skills, the Smart Home Skill API provides a predefined set of utterances.

V

Voice Biometrics: Technology that identifies specific markers within a given piece of audio that was spoken by a human being and uses those markers to uniquely model the speaker’s voice. The technology is the voice equivalent of technology that takes a visual fingerprint of a person and associates that unique fingerprint with the person’s identity. Voice Biometrics technology is used for both Voice Identification and Voice Verification.

...

Voice Verification: The capability of confirming an identity claim based on a speaker’s voice input. Unlike Voice Identification, which attempts to match a given speaker’s voice input against a universe of speaker voices, Voice Verification compares a voice input against a given speaker’s voice and provides a likelihood match score. Voice Verifications are usually done in an “Identity Claim” setting: the user claims to be someone and then is “challenged” to verify their identity by speaking.

W

Wake Word: A spoken keyword that activates an always-listening device. Amazon offers a choice of “Alexa,” “Amazon,” “Echo,” or “Computer” as the word to activate a device. A device is always listening for its wake word.

...