CSV Ingestion Lambda

Overview

csv-ingestion is a generalized Lambda for ingesting content (typically user records) from a CSV file located in an S3 bucket on the Orbita account into an Orbita project using a particular content schema.

Lambda Event Properties

description (string, optional)

A description of the particular ingestion process (ex. “CVS Pharmacy Virtual Assistant - Lisinopril - Production”) which is used in notifications if specified. If omitted but notifications are enabled, the environment name will be used.

notify (array of emails, optional)

Email addresses to send a notification to when the process runs.

environment (string, required)

The name of the environment as specified in environments.json. New environments will need their connection information added to environments.json.

projectID (string, required)

The ID of the project in Orbita on the target environment that the content will be ingested into.

s3Bucket (string, required)

The name of the S3 bucket that the file will be read from.

filename (string|object, required)

There are two potential formats for the filename parameter:

  • A string containing the exact filename or file path to look for in the specified S3 bucket, which must have an extension of .csv (ex. “test.csv” or “inbound/test.csv”)

  • An object with the following two properties, which will result in a formatted CSV filename or path with today’s date (based on Eastern Time):

    • prefix (string, required) - The portion before the date stamp (ex. “patients-” or “inbound/patients-”)

    • dateFormat (string, required) - The moment.js date format to use to generate the date stamp (ex. “YYYYMMDD”)

As an example of the latter format, the following filename parameter, if run on October 27th, 2021 (in Eastern Time), would resolve to a target file path of “inbound/patients-20211027.csv”:

"filename": { "prefix": "inbound/patients-", "dateFormat": "YYYYMMDD" }

encoding (string, optional)

The encoding of the file to be ingested. Acceptable values are:

  • “utf8” (Default)

  • “utf16le”

schema (object, required)

An object indicating the target Orbita content schema, with the following properties:

  • key (string, required) - The key of the schema, which can be found by looking at the schema details in Orbita.

  • type (string, required) - The type of the schema, either “content” or “dynamic”
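
For example (the schema key here is illustrative; use the key shown in your schema’s details in Orbita):

"schema": { "key": "patient", "type": "content" }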

triggerIndex (boolean, optional)

Indicates whether or not a request should be made after ingestion to trigger an Elasticsearch index of the primary schema.

nullValues (array of strings, optional)

An array of string values that should be considered a null value when evaluating required fields.
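
For example, the following (illustrative) setting would treat empty strings, “NULL”, and “N/A” as missing values when evaluating required fields:

"nullValues": [ "", "NULL", "N/A" ]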

mapping (array of field mappings, required)

An array of mappings of fields in the ingested CSV file to fields in the Orbita schema, using the following properties:

  • from (string, required) - The column header in the CSV file.

  • to (string|object, required) - The field to map to in the Orbita schema, with two potential formats:

    • A string containing the field name (ex. “firstName”), in which case the value will be treated as a string.

    • An object with the following properties:

      • field - The schema field name (ex. “phoneNumber”).

      • type - The type to convert the value to (ex. “phone”). Valid options at the moment are string, boolean, date, Date, time, number, and phone. Specifying an unrecognized type will result in a default treatment of the value as a string.

The field to map to in the Orbita schema can be a property path using dot syntax. The from value can also be a property path, though this is only of use when mapping secondary records.

  • required (boolean, optional) - Whether or not a value is required for this field for the CSV record to be considered valid. Invalid entries will be ignored and not ingested.

  • acceptableValues (array of strings, optional) - An array of acceptable values for this field in the CSV input. If the value for a record does not match one of the acceptable values, the record will be considered invalid and ignored.

Below are some examples of the different mapping formats described above:

{ "from": "PTNT_LAST_NM", "to": "lastName", "required": true }, { "from": "PTNT_DT_OF_BRTH", "to": { "field": "dateOfBirth", "type": "date" }, "required": true }, { "from": "Opportunity Type", "to": "drug", "required": true, "acceptableValues": [ "Lisinopril" ] }

staticFields (array of static field initializations, optional)

An array of objects indicating that certain fields in the Orbita schema should always be initialized to a particular value, using the following properties:

  • field (string, required) - The schema field name (ex. “campaignStatus”). This can be a property path using dot syntax.

  • value (any, required) - The value to initialize the field to (ex. “new”). Can be any type.

Below are some examples:

"mapping": [{ "field": "campaignStatus", "value": "new" }, { "field": "hasEngaged", "value": false }, { "field": "channel", "value": null }]

secondaryRecords (array of secondary record definitions, optional)

An array of definitions for secondary records to be created, containing the schema, mappings from the primary record, and any static field initializations:

  • schema (schema, required) - The schema for the secondary record.

  • mapping (mapping, required) - The field mappings for the secondary record.

This mapping is applied to the primary record (i.e. the record created by the ingestion process). To map a field from the original ingested data, prepend the from value with “$row.” and use the column name from the ingested sheet.

  • staticFields (staticFields, optional) - Any static field initializations for the secondary record.

Below is an illustrative sketch (the schema key and field names are invented for the example), showing one field mapped from the primary record and one pulled from the original CSV via the “$row.” prefix:
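
"secondaryRecords": [
  {
    "schema": { "key": "outreachTask", "type": "dynamic" },
    "mapping": [
      { "from": "firstName", "to": "patientFirstName" },
      { "from": "$row.PTNT_PHN_NBR", "to": { "field": "phoneNumber", "type": "phone" } }
    ],
    "staticFields": [ { "field": "status", "value": "pending" } ]
  }
]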

Example Event

Here is an illustrative example of the JSON for an event that could be used to run the csv-ingestion Lambda (the project ID, bucket name, email address, and field values are placeholders):
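
{
  "description": "Example Project - Patients - Production",
  "notify": [ "alerts@example.com" ],
  "environment": "production",
  "projectID": "1234567890abcdef12345678",
  "s3Bucket": "example-ingestion-bucket",
  "filename": { "prefix": "inbound/patients-", "dateFormat": "YYYYMMDD" },
  "encoding": "utf8",
  "schema": { "key": "patient", "type": "content" },
  "triggerIndex": true,
  "nullValues": [ "", "NULL" ],
  "mapping": [
    { "from": "PTNT_LAST_NM", "to": "lastName", "required": true },
    { "from": "PTNT_DT_OF_BRTH", "to": { "field": "dateOfBirth", "type": "date" }, "required": true }
  ],
  "staticFields": [ { "field": "campaignStatus", "value": "new" } ]
}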

environments.json

environments.json is a configuration file in the Lambda source that contains the connection information for the various environments.

Note that host and serviceAccount are only needed if triggerIndex is set to true.

An example of what an entry might look like (field names other than host and serviceAccount are assumptions; consult the Lambda source for the exact shape):
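
"production": {
  "connectionString": "mongodb://...",
  "host": "https://example.orbita.cloud",
  "serviceAccount": {
    "username": "svc-csv-ingestion",
    "password": "..."
  }
}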

Scheduling Ingestions

To have an ingestion run on a scheduled basis, put in a request to DevOps to create a new scheduled CloudWatch event (on whatever schedule is necessary) that triggers the csv-ingestion Lambda, and provide them with an event object, as outlined in Lambda Event Properties and Example Event, for the CloudWatch event to pass when invoking the Lambda.

 
