Automating Salesforce Data Imports with Powershell and the Salesforce CLI

Published on 05 June 2023

Some years ago I wrote about using Powershell to automate the Salesforce Data Loader, the idea being that sometimes you don't want to use an ETL tool (think graphical interface with boxes and arrows and properties dialogs - like Talend or Azure Data Factory) to load data into Salesforce. Sometimes you want to do it with code or scripts, where you have low-level control of all the details and can track the changes in git.

This post is an update on that idea. I wouldn't bother with automating the Java-based Data Loader any more (it always was a bit fiddly). These days you're better off using the Salesforce CLI, which has easy to use commands that hook directly into Salesforce's Bulk API.

And it works pretty nicely with Powershell. This post sets up an example Powershell script which pulls some data out of Salesforce with an SOQL query, modifies it, and then upserts it back into Salesforce with the Bulk API.

Obviously for this you will need the Salesforce CLI installed.

Querying data

In Powershell you can use the here-string syntax @" "@ to define a multi-line SOQL statement:

$soql = @"
    select id, FirstName, LastName
    from Contact
    where FirstName = 'Johnny'
"@

This means you can keep it tidy and not have to squash it all onto one line.

The command to run an SOQL query is "sf data query". It expects the SOQL to be all on one line, so you can do a quick replace to get rid of the line breaks:

$flatsoql = $soql.Replace("`n"," ")

What's interesting with sf data query is that it's very easy to get the results of the query as a stream of objects that Powershell can then work on directly:

$contacts = sf data query --query $flatsoql --target-org $targetUser --result-format csv | ConvertFrom-CSV

The trick is to call sf data query with --result-format csv, which gives you a line of comma-separated values for each row of data. Pipe that straight to the ConvertFrom-CSV cmdlet and you have a nice tidy array of objects that you can use with Powershell cmdlets like Where-Object, Sort-Object, Select-Object and so on.
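For example - purely as an illustration, using the $contacts array from the query above - you can filter and sort the results like any other Powershell objects:

# Keep only contacts that have a LastName, sorted by LastName
$withSurname = $contacts | Where-Object { $_.LastName } | Sort-Object LastName
$withSurname | Select-Object FirstName, LastName | Format-Table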

Uploading data

Once you've modified your data you need to get it into a CSV file, which is the Bulk API's preferred format. That's easy:

$csvfile = "c:\Temp\test.csv"
$contacts | Export-CSV $csvfile -NoTypeInformation

(The -NoTypeInformation switch stops Windows Powershell 5.1 from writing a #TYPE header line at the top of the file, which the Bulk API won't accept; in Powershell 7 and later that's already the default behaviour.)

Note that for the Bulk API, the column headers in the CSV should match the API names of the fields - in our example that just happens anyway, because the data came out of Salesforce with the right headers. If you are referencing an External ID on a related object, the column should be called "RelationshipName.ExternalAPIName", e.g. "Relationship__r.Field_Name__c". Dates should be formatted as yyyy-MM-dd (see the Bulk API documentation). If you need to rename columns, you can do it in Powershell before exporting to CSV by using Select-Object with calculated properties.
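As a sketch of that renaming trick - the AccountNumber property and the Account__r.Account_Number__c column name here are made up for illustration - a calculated property maps an existing property to a new column header on the way out:

# Rename a column to the API name the Bulk API expects (names are hypothetical)
$contacts |
    Select-Object Id, FirstName, LastName,
        @{ Name = "Account__r.Account_Number__c"; Expression = { $_.AccountNumber } } |
    Export-CSV $csvfile -NoTypeInformation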

Then the CSV can be uploaded via the Bulk API like this:

sf force data bulk upsert --sobject Contact --file $csvfile --external-id Id --target-org $targetUser

Note that 'upsert' is kind of the only option, but you can make it into an update or an insert by specifying --external-id Id and then controlling what's in the Id column.
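For instance, to force everything in as brand-new records you could blank out the Id column before exporting (a sketch, reusing the $contacts and $csvfile variables from above):

foreach ($c in $contacts) { $c.Id = "" }   # blank Id = insert, populated Id = update
$contacts | Export-CSV $csvfile -NoTypeInformation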

Note also that sf force data bulk upsert uses Bulk API 1. If you want to use Bulk API 2, there's the similar-sounding but slightly different sf data upsert bulk.
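I believe the flags line up roughly like this - worth double-checking against sf data upsert bulk --help, as I'm going from memory:

sf data upsert bulk --sobject Contact --file $csvfile --external-id Id --target-org $targetUser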

Salesforce CLI has commands that will show you the current status of your Bulk API job and batches, but it doesn't currently have a way of telling you which rows succeeded and which rows failed (and why they failed). This is a strange omission, because finding out what worked and what didn't is usually pretty vital in real-world usage of the Bulk API. So you have to get that info by logging into Salesforce and looking at the Bulk Data Load Jobs page, or by poking around with the endpoints of the Bulk API directly.

Logging on

There are lots of ways to authenticate with the Salesforce CLI, but I do it using OAuth like this:

$targetUser = "yourusername@whatever.sandbox"
Write-Output "Please log in as $targetUser :"
sf org login web --instance-url https://test.salesforce.com

The Write-Output line is there to remind me which account I'm supposed to log in as, because sf org login web just takes you to a blank login page. The $targetUser variable is then used throughout the script with the other Salesforce CLI commands. If you are logging into live instead of a sandbox, you can get rid of the --instance-url parameter.

Note that once you've logged in like this, Salesforce CLI gets an access token that lasts a long time, typically several weeks. You probably won't need to log in again for ages, so you can comment out the sf org login web command. To see which instances you've currently got access tokens for, use sf org list. To log out of them (aka clear the access token), use sf org logout --target-org (username).
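For reference, those two commands look like this (using the example username from above):

sf org list
sf org logout --target-org yourusername@whatever.sandbox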

Putting it all together

Here's the whole script - it queries the contacts with first name 'John', appends 'Test' to FirstName and LastName, then pushes the data back in:

# Powershell and Salesforce CLI example
# NB: Install Salesforce CLI first 

$targetUser = "yourusername@whatever.sandbox"

Write-Output "Please log in as $targetUser :"
sf org login web --instance-url https://test.salesforce.com

$soql = @"
    select id, FirstName, LastName
    from Contact
    where FirstName = 'John'
"@
$flatsoql = $soql.Replace("`n"," ")
$contacts = sf data query --query $flatsoql --target-org $targetUser --result-format csv | ConvertFrom-CSV

$contacts | Format-Table

# Append 'Test' to each contact's first and last names
foreach($c in $contacts)
{
    $c.FirstName = "$($c.FirstName)Test"
    $c.LastName = "$($c.LastName)Test"
}

# Export to CSV and push the changes back in via the Bulk API
$csvfile = "c:\Temp\test.csv"
$contacts | Export-CSV $csvfile -NoTypeInformation
sf force data bulk upsert --sobject Contact --file $csvfile --external-id Id --target-org $targetUser