Orbita Glossary of Terms

Last updated: Mar 1, 2022

Shortcuts:

[ A ] [ B ] [ C ] [ D ] [ E ] [ F ] [ G ] [ H ] [ I ] [ K ] [ L ] [ M ] [ N ] [ O ] [ P ] [ R ] [ S ] [ T ] [ U ] [ V ] [ W ]

A

Acoustic Model: a representation that maps “the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The model is learned from a set of audio recordings and their corresponding transcripts. It is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word.”

Adaptive System: a system that adapts its behavior to changing parameters, such as the user’s identity, the time of day, day of week or month, the context of the interaction, etc.

Alexa / Alexa Service: A Cloud-based ‘intelligent personal assistant’ that processes your requests and supplies answers back to you on devices such as the Amazon Echo, Amazon Echo Dot, Echo Show, and Amazon Tap. You can give Alexa new abilities by creating your own cloud-based service that accepts requests from Alexa and returns responses. The service is also available to license by third-party hardware and software providers using the Alexa Voice Service (AVS).

Alexa App: The app used to configure Alexa and customize the user experience: enable and disable skills, display output from interactions with Alexa, and so on.

Alexa-enabled Device: A device that provides access to the Alexa service. Examples include Amazon Echo, Amazon Echo Dot, Amazon Tap, Echo Show, and devices that use the Alexa Voice Service.

Alexa Skill: See Skill.

Alexa Skills Kit (ASK): a Software Development Kit (SDK) that lets developers build and launch an Alexa skill. See also Skill.

Alexa Voice Service (AVS): a Software Development Kit (SDK) that lets hardware manufacturers and software developers integrate Alexa into their hardware or software. For example, a manufacturer of a Bluetooth speaker may add microphones to their speaker, use AVS, and turn their once-simple Bluetooth speaker into an Echo-like device.

Alexa Skills Store: The marketplace where users of Alexa-enabled products search for skills and enable/disable skills.

Always Listening Device: A device that is always listening in order to detect a “wake word.” When the wake word is detected, the audio captured after it is sent for additional processing. See Wake word.

Application Programming Interface (API): A set of definitions and protocols through which software components communicate; it connects the “bot logic” to additional capabilities such as channels and content.

Artificial Intelligence: The study and design of a system that perceives its environment and is able to perform tasks that normally require human intelligence. These tasks include visual perception, speech recognition, decision-making, and translations between languages.

Ask: A keyword to ask Alexa to invoke a particular custom skill. This is used in combination with the invocation name for the skill. See also Tell.

ASR: Automatic Speech Recognition, or Automatic Speech Recognizer; software that maps audio input to a word or a language utterance.

ASR Tuning: The activity of iteratively configuring the ASR software to better map, both in accuracy and in speed, the audio input to a word or an utterance.

AWS: Amazon Web Services.

AWS Lambda: An AWS compute service that runs your code (see Lambda function) in response to events and automatically manages the compute resources for you, so you do not have to manage servers. Lambda is required for smart home skills. You can also choose to use a Lambda function for the service for a custom skill.

AWS Lambda function: The code uploaded to AWS Lambda. Lambda supports coding in Node.js, Java, Python, or C#. A smart home skill must be implemented as a Lambda function. You can also choose to use a Lambda function for the service for a custom skill. See AWS Lambda.
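
For illustration, here is a minimal sketch of what such a Lambda function can look like in Python. The intent name “HelloIntent” and the reply text are hypothetical; the response envelope follows the Alexa Skills Kit request/response JSON format.

    # Minimal sketch of an AWS Lambda function for a custom Alexa skill.
    # "HelloIntent" is a hypothetical intent name.
    def build_response(text, end_session=True):
        """Wrap plain text in the Alexa Skills Kit response envelope."""
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": end_session,
            },
        }

    def lambda_handler(event, context):
        """Entry point that AWS Lambda invokes for each Alexa request."""
        request = event["request"]
        if request["type"] == "LaunchRequest":
            return build_response("Welcome. What would you like to do?", end_session=False)
        if request["type"] == "IntentRequest" and request["intent"]["name"] == "HelloIntent":
            return build_response("Hello from your custom skill.")
        return build_response("Sorry, I did not understand that.")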

B

Barge-in: The ability of the user to interrupt system prompts while those prompts are being played. If barge-in is enabled in an application, then as soon as the user begins to speak, the system stops playing its prompt and begins processing the user’s input.

Bixby: Samsung’s voice assistant.

Bot Logic: The conversations and database that we build for the bot’s persona. This can be built using a platform or a framework.

C

Card: Displays information relating to the user’s request, such as what the user asked and Alexa’s response, a picture, or long numbers and lists, which can be difficult to process and remember when delivered through voice alone. Also known as a Home card or Detail card.

Channel: The medium through which users interact with the bot, e.g., Facebook Messenger.

Chatbot: A computer program designed to simulate conversation with human users, especially over the internet.

Cloud-based service: See Service.

Cloud-enabled device: In the context of smart home devices, a customer device such as a light bulb, switch, thermostat, or another smart home device with the ability to connect to the Internet. The device is normally controlled by the device cloud.

Companion app: See the Alexa app.

Compatibility Testing: Testing the skill across different devices and browsers.

Confidence Score: A number (usually a fraction between 0.00 and 1.00 – e.g., 0.87) that is returned by the ASR and that reflects the confidence that the ASR has in the result provided. A 1.00 confidence means that the ASR is as certain as it can be that it has returned the correct result. A result with a confidence score of 0.91 is deemed more likely to be correct by the ASR than one with a score of 0.78.

Confidence Threshold: a number (usually a fraction between 0.00 and 1.00 – e.g., 0.87) that sets the mark below which ASR results are ignored. For instance: if the user were to say, “Austin,” and the recognizer were to return, “Austin” with a score of 0.92, “Boston” with 0.87, “Houston” with 0.65, “Aspen” with 0.52, and “Oslo” with 0.43, and if the threshold were set at 0.70, only the first two, “Austin” and “Boston” would be returned.
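
A minimal sketch of this filtering in Python, using the example hypotheses above (names and values are illustrative):

    # Keep only the ASR hypotheses at or above the confidence threshold.
    n_best = [("Austin", 0.92), ("Boston", 0.87), ("Houston", 0.65),
              ("Aspen", 0.52), ("Oslo", 0.43)]
    CONFIDENCE_THRESHOLD = 0.70

    accepted = [(word, score) for word, score in n_best
                if score >= CONFIDENCE_THRESHOLD]
    print(accepted)  # [('Austin', 0.92), ('Boston', 0.87)]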

Confirmation: A response from Alexa to assure the user that she understood correctly. Types of confirmation:

  • Implicit confirmation (also known as land-marking): a prompt that subtly repeats back what Alexa heard to give the user assurance that they were correctly understood. Example: User: Alexa, ask Astrology Daily for my horoscope. Astrology Daily: Horoscope for what sign?

  • Explicit confirmation: A prompt that repeats back what Alexa heard and explicitly asks the user to confirm whether she was correct. Example: User: Alexa, ask Astrology Daily for my horoscope. Astrology Daily: You wanted a horoscope from Astrology Daily, right?

Content Management System (CMS): Software that allows users to create and manage web content without much technical knowledge.

Conversational AI: The use of messaging apps, speech-based assistants, and chatbots to automate communication and create personalized customer experiences at scale.

Conversation: See Interaction.

Cooperative Principle: The proposition that listeners and speakers must act cooperatively and mutually accept one another to be understood in a particular way to carry out an effective verbal conversation.

Cortana: Microsoft’s voice assistant.

Custom interaction model: An interaction model that you define for a custom skill that consists of an intent schema that defines the requests the skill can handle and a set of sample utterances that users can say to invoke those requests.

Custom skill: A skill that uses a custom interaction model. You, as the developer, can define the requests your skill can handle (intents) and the words users say to make (or invoke) those requests (sample utterances). The mapping between the intents and sample utterances creates the interaction model or voice user interface for the skill. A complete custom skill includes the code hosted as a cloud-based service and a configuration that provides the information the Alexa service needs to route requests to the service. This is the most flexible kind of skill you can build, but also the most complex, as you must provide the voice interface yourself.

Custom Skills: A custom skill is a flexible yet complex skill with a custom interaction model provided by the developer. For custom skills, the developer must define three things:

  • Intents: requests the skill can handle, such as ordering food delivery.

  • Interaction Model: the words users may say to invoke those intents, such as “Order spicy tuna roll.”

  • Invocation Name: the name Alexa uses to identify the skill, such as Food Delivery.

Customer Relationship Management (CRM): Using customer data to improve customer relations.

D

Detail card: A card displayed in the Alexa app with information about the skill and how to use it. A user can review detail cards and enable the skills she or he wants.

Device cloud: Back-end cloud service that can control a cloud-enabled device. For a smart home skill, you write code hosted as a Lambda function that translates commands from the Alexa Smart Home Skill API to the device cloud.

Device cloud account: Unique customer account used to access the device cloud. The customer links the device cloud account with the Alexa service using the Alexa app. OAuth 2.0 is the preferred mechanism for linking.

Device directive: A set of data and instructions, expressed in JSON, sent from Alexa to a smart home or video skill.

Device discovery: Process by which the Alexa Smart Home Skill API or Video Skill API discovers the devices that can be controlled with a skill.

Device event: A response to a device directive, expressed in JSON, sent from a smart home or video skill to Alexa.
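
As an abridged illustration of the two message shapes (shown here as Python dicts mirroring the JSON; the endpoint identifier is hypothetical, and required header fields such as messageId are omitted for brevity):

    # Abridged shapes of a smart home directive and its event response.
    # "bedroom-lamp-001" is a hypothetical endpoint; fields such as
    # messageId and correlationToken are omitted for brevity.
    directive = {
        "directive": {
            "header": {"namespace": "Alexa.PowerController",
                       "name": "TurnOn", "payloadVersion": "3"},
            "endpoint": {"endpointId": "bedroom-lamp-001"},
            "payload": {},
        }
    }

    event = {
        "event": {
            "header": {"namespace": "Alexa", "name": "Response",
                       "payloadVersion": "3"},
            "endpoint": {"endpointId": "bedroom-lamp-001"},
            "payload": {},
        }
    }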

Device with Alexa: A device that provides access to the Alexa service. Examples include Amazon Echo, Amazon Echo Dot, Amazon Tap, and devices that use the Alexa Voice Service.

Dialog errors: When something unexpected happens in the conversation between Alexa and the user. Types of dialog errors:

  • Low confidence errors: When Alexa has low confidence that she correctly understood what the user said. When this occurs, Alexa cannot proceed in the interaction without asking the question again or ending the interaction.

  • Timeouts/Silence/No input: When the user does not respond to a question Alexa asked. A re-prompt is usually played to encourage the user to respond.

  • False accept: When Alexa has mid-to-high confidence that she correctly understood what the user said, but she actually misunderstood.

Digital Asset Management (DAM): Software that allows users to organize media assets such as images, videos, and presentations.

Directed dialog: An interaction between the user and the system that is guided by the application: the system asks questions or offers options and the user responds to them.

Directive language: JSON protocol that enables the communication between the Alexa Smart Home Skill API and a smart home skill.

Directive: See Device directive.

Discovery: The process of learning what a system can do; inferring from a user’s request what capability to surface.

Disfluency: Verbal utterances such as “ah,” “hum,” and so on, exhibited by speakers when hesitating or when holding on to a speaking turn in a dialog.

Display template: Used on the Echo Show to display a combination of text and images in a specified format (the content appears as cards in the Alexa app if the user does not have an Echo Show).

E

Earcon: The audio equivalent of an “icon” in graphical user interfaces. Earcons are used to signal conversation marks (e.g., when the system starts or stops listening) and to communicate the brand, mood, and emotion during a voice-first based interaction.

Echo (The Amazon Echo): A Far Field device released by Amazon. “Echo” has also come to represent the Amazon-branded category of devices (Echo Dot, Amazon Tap, Echo Look, Echo Show) that interact with the Amazon Alexa cloud service.

Echo app: See the Alexa app.

Echo cancellation: A technique that filters out the audio a device is playing from the audio its microphones are capturing, so that incoming speech can be processed for recognition.

Endpoint: Represents a physical device, virtual device, group or cluster of devices or a software component. Used in smart home and video skills.

End-pointing: The marking of the start and the end of a speaker’s utterance for the purposes of ASR processing.

Example phrase: A phrase showing users what they need to say to begin using your custom skill. You enter these phrases in the Publishing Information section of the Amazon developer portal. The phrases must also be included in your list of sample utterances.

Exit command: When the user says a command like “exit” or “stop” to end the interaction.

Explicit confirmation: See Confirmation.

F

False Accept: An instance where the ASR mistakenly accepted an utterance as a valid response. See also Dialog Errors.

False Reject: An instance where the ASR mistakenly rejected an utterance as an invalid response.

Far-Field Speech Recognition: Speech recognition technology that can process speech spoken by a user from a distance (usually 10 feet or more) from the receiving device, usually in a context with ambient noise.

Flash Briefing: Provides a quick overview of news and other content such as comedy, interviews, and lists. For a flash briefing skill, the developer must define the name, description, and images for the skill, plus one or more content feeds. The API defines the words users may use to make those requests, such as “Tell me the weather.”

Flash Briefing Skills: Skills that plug into Alexa’s native Flash Briefing ability. See Flash briefing.

Form Flow: Pre-structured conversations, like a “choose your own adventure.”

Framework: A backend that connects the platform to the channel through an API.

G

Google Assistant: The cloud service provided by Google that powers Google’s Far-Field device (Google Home) as well as other Android-based devices (such as smartphones and tablets).

Google Home: Google’s Far-Field device (similar to the Amazon Echo).

Google Action: The equivalent of an Alexa Skill. Google also variably refers to Actions as “Agents,” “Assistants,” and “Apps.”

Grammar: A shorthand, encoded description of the set of utterances that the ASR can accept.

Gricean Maxims: A set of specific rational principles observed by people who obey the Cooperative Principle that enable effective verbal conversational communication between humans. British philosopher of language, Paul Grice, proposed four conversational maxims: Quality, Quantity, Relevance, and Manner.

Gutenberg Parenthesis: The proposition that the last 500 years or so (the time between the invention of typeset printing, which ushered in the era of the written word as the main mode of communicating knowledge, and the recent arrival of distributed social media) is a short parenthesis in a history of human communication that has relied on informal and decentralized communication: in oral form prior to Gutenberg, and currently via social media and orally.

H

Home card: An element displayed in the Alexa app to describe or enhance a voice interaction with a custom skill. Cards are useful when testing and debugging the Lambda function or web service for a skill.

Houndify: A platform by music identifier service SoundHound that lets developers integrate speech recognition and Natural Language Processing systems into hardware and other software systems.

I

Implicit confirmation: See Confirmation.

Intent schema: A JSON structure that declares the intents that can be handled by the service for a custom skill.
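
For example, an abridged schema for a hypothetical horoscope skill might declare one custom intent with a slot, plus a built-in intent (shown as a Python dict mirroring the JSON):

    # Illustrative intent schema for a hypothetical horoscope skill.
    # GetHoroscopeIntent and LIST_OF_SIGNS are made-up names;
    # AMAZON.HelpIntent is a built-in intent.
    intent_schema = {
        "intents": [
            {"intent": "GetHoroscopeIntent",
             "slots": [{"name": "Sign", "type": "LIST_OF_SIGNS"}]},
            {"intent": "AMAZON.HelpIntent"},
        ]
    }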

Intent: Determines what a user is trying to accomplish. Within the code, this is how you define your function. There are three types:

  • Full Intent: A spoken request in which the user expresses everything that is required to complete the request, all at once, such as “Alexa, ask Elle.com for today’s horoscope for Virgo.”

  • Partial Intent: A spoken request in which the user expresses only part of the information required to complete the request, such as “Alexa, ask Elle.com for the horoscope.”

  • No Intent: A spoken request with minimal information, such as “Alexa, talk to Elle Magazine.”

Interaction: An exchange of dialog between the user and Alexa. This may be a single request-response or a more extended set of turns.

Interaction Flow: A flowchart used to visualize the steps taken by a user as they move through a conversational experience.

Interaction Model: An abstract definition of the interaction that will take place between your API and the applications that use it (similar to a graphical user interface in a traditional app). Instead of pressing buttons, users make requests by voice. For a custom skill, you define the interaction model by creating an intent schema and set of sample utterances. For a smart home skill, this is defined by the Smart Home Skill API.

Interruptions: When the interaction between Alexa and the user is interrupted by another event such as alarms and timers going off while the user is talking to Alexa.

Invocation: The act of beginning an interaction with a particular Alexa ability.

Invocation Name: The word or phrase used to trigger your skill. For example: “Alexa, ask History Buff what happened on June third” starts the History Buff skill. Note: invocation names are only needed for Custom Skills.

Invoke: A hardware Echo-like device manufactured by Harman Kardon that enables users to engage Cortana in Far-Field conversations.

K

Keyword: A simple marker used to trigger a conversation.

L

Lambda blueprint: An option in the AWS Lambda console that provides sample code and a sample configuration for a new Lambda function. Use this to create Lambda functions with just a few clicks.

Lambda function: See AWS Lambda function.

Landmark (also known as implicit confirmation): A prompt that subtly repeats back what Alexa heard to give the user assurance that they were correctly understood.

Low confidence errors: See Dialog Errors.

M

Machine Learning (ML): A broad subfield of AI in which computer algorithms are developed and trained to learn inherent patterns in datasets. The resulting algorithms are often used in a predictive fashion (e.g., given some input, predict what the output should be) to automate human behavior and decision-making.

Max error condition: When consecutive dialog errors occur. This terminates the interaction and is designed to keep Alexa from making the same mistake repeatedly.

Mixed-initiative Dialog: Interactions where the user may unilaterally issue a request rather than simply provide exactly the information asked for by system prompts. For instance, while making a flight reservation, the system may ask the user, “What day are you planning to fly out?” Instead of answering that question, the user may say, “I’m flying to Denver, Colorado.” A mixed-initiative system would recognize that the user has not provided the exact answer to the question asked but has, in addition (additive) or instead (substitutive), volunteered information that the system was going to request later on. Such a system would accept this information, remember it, and continue the conversation. In contrast, a Directed Dialog system would rigidly insist on the departure date and would not proceed until it received that piece of information.
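
A toy sketch of the difference in Python (slot names and prompts are illustrative): a mixed-initiative system merges whatever slots the user volunteers and asks only for what is still missing.

    # Illustrative mixed-initiative slot filling: accept any slot the
    # user volunteers, not just the one that was asked about.
    def update_slots(collected, understood):
        """Merge every slot the NLU extracted from the latest utterance."""
        collected.update(understood)
        return collected

    def next_question(collected, required=("departure_date", "destination")):
        """Ask only for required slots that are still missing."""
        for slot in required:
            if slot not in collected:
                return f"What is your {slot.replace('_', ' ')}?"
        return None  # all required information gathered

    state = update_slots({}, {"destination": "Denver"})  # volunteered answer
    print(next_question(state))  # "What is your departure date?"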

N

Natural Language Processing (NLP): Technology that extracts the “meaning” of a user’s utterance or typed text. A meaning usually consists of an Intent and Name-Value pairs. The utterance, “I want to book a flight from Washington, DC to Boston,” has the Intent “Book-a-Flight” with the Name-Value pairs being, “Departure City”=”Washington, DC” and “Arrival City”=”Boston, MA”.

Natural Language Understanding (NLU): A subset of NLP that uses syntactic and semantic analysis of text and speech to determine the meaning of a sentence.

N-Best: In speech recognition, given an audio input, an ASR returns a list of results, with each result ascribed a confidence score (usually a fraction between 0 and 1, e.g., 0.87, or a percentage). N-Best refers to the “N” results returned by the ASR that were above the confidence threshold. For instance, if the user were to say, “Austin,” and the recognizer were to return “Austin” with a score of 0.92, “Boston” with 0.87, “Houston” with 0.65, “Aspen” with 0.52, and “Oslo” with 0.43, and the threshold were set at 0.70, only the first two, “Austin” and “Boston,” would be returned.

Near Field Speech Recognition: In contrast to Far Field speech recognition, which processes speech spoken to a device from a distance (usually 10 feet or more), Near Field speech recognition handles spoken input from hand-held mobile devices (such as Siri on the iPhone) that are held within inches, or at most a couple of feet, of the speaker.

Network Service (SaaS): A cloud-based service that delivers skills: it takes requests containing an intent from Alexa and returns responses with text to speak back to the user.

No-input Error: A situation where the system did not detect any speech input from the user.

No-match Error: A situation where the system was not able to match the user’s response to the responses that it expected the user to provide.

Notification: When the user takes an action that requires Alexa to inform them at a later time that an event is occurring or about to occur, such as alarms and timers.

O

Out of Scope (OOS) Error: See No-match Error.

P

Persona: The personality of the system (formal, playful, chatty, aggressive, friendly, etc.) that comes across in the way the system engages with the user. The persona is influenced by factors such as the perceived gender of the system, the type of language the system uses, and how the system handles errors.

Phrases: A list of randomly selected responses that are spoken by the device. For example, “OK. I’ll be glad to help.” “Sure thing! I’ll get right on it.” Compare slots, which are spoken by the user.

Platform: A graphical interface to construct and manage conversations.

Progressive Prompting: The technique of beginning an exchange by providing the user with minimal instructions and elaborating on those instructions only if encountering response errors (e.g., no-input, no-match, and so on).
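
A minimal sketch of the technique in Python (the prompt texts are illustrative, in the spirit of the Score Keeper example below):

    # Illustrative progressive prompting: start terse, elaborate only
    # as response errors (no-input, no-match) accumulate.
    PROMPTS = [
        "What's your update?",
        "You can add points for a player or ask for the current score. What's your update?",
        "For example, say 'add three points for Kate'. What's your update?",
    ]

    def next_prompt(error_count):
        """Later prompts carry progressively more instruction."""
        return PROMPTS[min(error_count, len(PROMPTS) - 1)]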

Prompt: Words that should be spoken to the user to ask for more information. You include the prompt text in your response to a user’s request. Types of prompts:

  • Open-ended: A prompt that asks the user a question intended to elicit a wide range of responses such as “What would you like to do?”

  • Menu-style: A prompt that asks the user a question intended to elicit a response from a small set of possible options (recommended 5 or fewer) such as “Minecraft Helper. You can ask for a recipe, the ingredients of a potion, or game instructions. Now, which would you like?”

  • Re-prompt: A prompt that asks the user a question after a dialog error has occurred. The general purpose of a re-prompt is to help the user recover from errors. For example:

    User: Alexa, open Score Keeper.
    Score Keeper: Score Keeper. What’s your update?
    User:(no response)
    Score Keeper: You can add points for a player, ask for the current score, or start a new game. To hear a list of everything you can do, say Help. Now, what would you like to do?

  • Landmark: (Also known as implicit confirmation.) A prompt that subtly repeats back what Alexa heard to give the user assurance that they were correctly understood.

Pull: A response to a user’s request.

Push: A conversation started by the bot; it can be structured as either a broadcast or a sequence.

R

Recognition Tuning: The activity of configuring the ASR’s settings to optimize recognition accuracy and processing speed.

Regression Testing: Testing to ensure that existing code still works when new code is added.

Re-prompt: A question prompt that occurs after a virtual assistant has answered a user question. For example, “Do you have any more questions?” See also Prompt.

S

Sample utterance: A structured string of words that connects a specific intent to a likely utterance. You provide a set of sample utterances as part of your interaction model for a custom skill. When users say one of these utterances, the Alexa service sends a request to your service that includes the corresponding intent. Note: You only provide sample utterances for custom skills. Utterances for smart home skills are defined by the Smart Home Skill API.
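
In the classic format, each sample utterance is a line that pairs an intent name with one phrasing; slot names appear in braces. A hypothetical set for the horoscope intent sketched earlier:

    # Hypothetical sample utterances for GetHoroscopeIntent;
    # {Sign} marks a slot filled from the user's speech.
    sample_utterances = [
        "GetHoroscopeIntent what is the horoscope for {Sign}",
        "GetHoroscopeIntent give me the {Sign} horoscope",
        "GetHoroscopeIntent what do the stars say about {Sign} today",
    ]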

Second Orality: Orality that is dependent on literate culture and the existence of writing, such as a television anchor reading the news or a radio broadcast. While it exists in sound, it does not have the features of primary orality because it presumes and rests upon literate thought and expression, and may even be people reading written material.

Secure File Transfer Protocol (SFTP): A network protocol for transferring files securely between parties; in practice, a secure location on a server where parties with access can safely pass files back and forth. Used for files containing PHI.

Self-Service API (Application Program Interface): A software platform through which developers organize access to data and business processes and enable web applications to interact with other applications. Self-service APIs let developers access a wide range of features so they can evolve and customize their projects.

Service: A cloud-based service you create to support a skill. This service takes requests from Alexa and returns responses. For a custom skill, the service accepts requests with intents and returns responses with the text to speak back to the user. For a smart home skill, the service takes device directives, communicates with the device cloud to control devices such as lights and thermostats, and sends device events back to Alexa. You can deploy the service for a custom skill either as an AWS Lambda function or a web service. Smart home skills can only be hosted using Lambda.

Siri: A voice-based assistant launched by Apple.

Skill: Code for Alexa (in the form of a cloud-based service) and the configuration provided on the Amazon developer portal to accomplish a task. (On Google Home, a skill is called an action.) See also Custom Skill, Flash-briefing skills, and Smart Home Skill.

Skill Adapter: The code that makes the skill respond to a particular directive such as decreasing the temperature when the user requests “decrease the temperature by 3 degrees.”

Slot: A slot represents a value within an utterance spoken by the user, which the intent needs in order to understand the request. For example, given “Alexa, what is the weather in Boston?”, the skill might respond with the template “The weather in {CityName} is {getCityWeather},” where the {CityName} slot is filled with “Boston.” Amazon has built-in slot types, such as dates, numbers, durations, times, and so on. You can create a custom slot for a list of values that are specific to your skill.
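
When an utterance fills a slot, the skill’s service receives the value inside the intent request. A brief sketch of reading it in Python (the slot name comes from the hypothetical example above; the event shape follows the Alexa Skills Kit request format):

    # Read a slot value from an Alexa IntentRequest; "CityName" is
    # the hypothetical slot from the example above.
    def get_city(event):
        slots = event["request"]["intent"]["slots"]
        return slots.get("CityName", {}).get("value")  # e.g., "Boston"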

Slot type: Determines how the user input is handled and passed on to your skill. You can assign slot types from the detail page for the intent or from the detail page for the slot.

Small Talk: A base set of questions that someone may ask the virtual assistant about itself, such as “What is your name?”, “Who created you?”, and so on.

Smart Home skill: A skill intended to control smart home devices such as lights and thermostats. When using the Smart Home Skill API, the API defines the requests the skill can handle (device directives) and the words the users say to make those requests. A complete smart home skill includes the code hosted as an AWS Lambda function and a configuration that provides the information the Alexa service needs to route requests to the Lambda function. The code for the skill must be able to control the device (such as a light) using the cloud.

Smart Home Skill API: An API to create skills that give Alexa the ability to control smart home devices such as lights and switches. The Smart Home Skill API translates utterances such as “turn on the lights” into device directives that it routes to a Lambda function that can control a cloud-enabled device.

Speech Input Data: The set of mapped values added for any custom slots supported by the developer’s skill and a list of sample utterances or common statements that invoke the intents.

Speech Recognition Application: An application that enables a device or computer to convert spoken words into written text by finding the best-matching word sequence. A Speech-To-Text (STT) engine lets you dictate a message that your device will send as text. A Text-To-Speech (TTS) engine reproduces the sound of the written words.

Speech Recognizer: See ASR.

Speech To Text (STT): Software that converts an audio signal to words (text). “Speech to Text” is a term that is less frequently used in the industry than “Speech Recognition,” “Speech Reco,” or “ASR.”

System errors: When something unexpected happens that is unrelated to the dialog between the user and Alexa, such as when the call to a data service used to get the information the user requested fails to return that information.

T

Tapered Prompting: the technique of eliding a prompt or a piece of a prompt in the context of a multi-step interaction or a multi-part system response. Instead of the system asking repetitively, “What is your level of satisfaction with our service?” “What is your level of satisfaction with our pricing?” “What is your level of satisfaction with our cleanliness?” the system would ask: “What is your level of satisfaction with our service?” “How about our pricing?” “And cleanliness?” The technique is used to provide a more natural and less robotic-sounding user experience.
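
A toy sketch of tapering in Python, using the satisfaction-survey example above (the wording is illustrative):

    # Illustrative tapered prompting: only the first question carries
    # the full frame; follow-ups are elided.
    topics = ["service", "pricing", "cleanliness"]
    prompts = [f"What is your level of satisfaction with our {topics[0]}?"]
    prompts += [f"How about our {t}?" for t in topics[1:]]
    print(prompts)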

Taxonomy: A library of all the intents and entities for a particular application of conversational AI, structured for a medical context.

Tell: A keyword a user can say to tell Alexa to invoke a particular custom skill. See also Ask.

Template: See the Display template.

Text to Speech (TTS): Technology that converts text to audio that is spoken by the system. TTS is usually used in the context of dynamically retrieved information (a product ID), or when the list of possible items to be spoken by the system (such as full addresses) is very large, and therefore, recording all of the options is not practical.

Touch interaction: A touch on Echo Show that produces a specified response, such as touching an item in a list on the screen to see more information about the item.

Turn: A conversational turn is a single request to or response from Alexa. Sometimes shorthand for only the request side of a conversation, so “Alexa, open Horoscope,” “What horoscope sign would you like?”, “Pisces,” “Today’s horoscope for Pisces is …” might be referred to as a two-turn interaction, rather than the four turns that it technically contained.

U

Utterance: The words the user says to Alexa to convey what they want to do or to provide a response to a question Alexa asks. For custom skills, you provide a set of sample utterances (see slots) mapped to intents as part of your custom interaction model. For smart home skills, the Smart Home Skill API provides a predefined set of utterances.

V

Voice Biometrics: Technology that identifies specific markers within a given piece of audio that was spoken by a human being and uses those markers to uniquely model the speaker’s voice. The technology is the voice equivalent of technology that takes a visual fingerprint of a person and associates that unique fingerprint with the person’s identity. Voice Biometrics technology is used for both Voice Identification and Voice Verification.

Voice First: A primary interface between the user and an automated system that is voice-based. “Voice First” does not necessarily mean “Voice Only”. A Voice-First interface can have an additional, adjunct interface (usually a visual one) that can supplement the experience.

Voice Identification (Voice ID): The capability of discriminating a speaker’s identity among a list of possible speaker identities based on the characteristics of the speaker’s voice input. Voice ID systems are usually trained by being provided with samples of speaker voices.

Voice Interface or Voice User Interface (VUI): A way for humans to interact with computers using primarily voice communication. For a custom skill, the voice interface consists of a mapping between users’ spoken utterances and the intents your cloud-based service can handle.

Voice Verification: The capability of confirming an identity claim based on a speaker’s voice input. Unlike Voice Identification, which attempts to match a given speaker’s voice input against a universe of speaker voices, Voice Verification compares a voice input against a given speaker’s voice and provides a likelihood match score. Voice Verifications are usually done in an “Identity Claim” setting: the user claims to be someone and then is “challenged” to verify their identity by speaking.

W

Wake Word: A spoken keyword that activates an always-listening device. Amazon offers a choice of “Alexa,” “Amazon,” “Echo,” or “Computer” as the word that activates a device. A device is always listening for its wake word.

Webchat: An unbranded channel that we can customize for websites, portals, etc.

Web service: In the context of the Alexa Skills Kit, an Internet-accessible service that can accept requests from the Alexa service and return responses. You can use a web service as a cloud-based service for a custom skill.

Widget: The UI component that “holds” the webchat on a website.