.ultrageek. Computers, Music, and Thoughts

12Nov/19

How to Run Data Loader from the Command Line

There are several guides from Salesforce and other blogs on how to run Salesforce Data Loader from the command line. You can also script sfdx to do data loads, but it seems that only supports the Bulk API, which can be buggy for certain objects, so we might explore that later. For this entry we're going to stick with the standard Data Loader.

Note that this guide is specific to the Microsoft Windows OS.

This guide will break up the implementation into five parts:

  1. Setting up the Environment
  2. Creating the Password
  3. Configuration Files
  4. Batch File
  5. Schedule Job

Virtual Machines

For reference, Microsoft has several places where you can download a limited-use Microsoft Windows VM. This is the site I used: https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/ (Password: Passw0rd! )

On the Mac I use VirtualBox: https://www.virtualbox.org/

Environment

Your environment will need the following tools installed:

Note: These instructions assume that you know how to drop to a command prompt and that you can navigate and change directories.

  • Java JDK
    • Salesforce is recommending the ‘Zulu Java JDK’ over the standard Oracle Java JDK, most likely because of licensing changes Oracle made to Java in 2019 that require basically everything to be licensed, including ‘free’ downloads.
    • Download: https://www.azul.com/downloads/zulu-community/?&version=java-11-lts&os=&os=windows&architecture=x86-64-bit&package=jdk
      • Download Filters:
        • Java 11 (LTS) - LTS means Long Term Support
        • Windows Operating System
        • 64-bit…because seriously, no one runs 32-bit anymore
        • Package: JDK
      • Gotchas
        • The installation seems to add the java bin directory to the ‘path’ but misses the ‘JAVA_HOME’ setting. Use this page to set it:
          • https://docs.azul.com/zulu/zuludocs/ZuluUserGuide/PostInstallationTasks/AddPathsToZulu_Windows.htm
          • Or basically right click “This PC” and Properties
          • Advanced System Settings
          • Advanced Tab → click “Environment Variables…” button
          • Under System Variables add the following:
            • Variable Name: JAVA_HOME
            • Variable Value: C:\Program Files\Zulu\zulu-11 - or wherever you installed Zulu. Under zulu-11 you should find bin, etc.
          • Basically you should be able to run ‘java -version’ at the command line (cmd) and have it return a version
  • Salesforce Data Loader
    • Log in to your instance, either production or sandbox and click setup
    • Search for ‘Data Loader’
    • Click ‘Download Data Loader for Windows’
    • The URL will basically be your instance plus /dwnld/DataLoader/dataloader_win.zip
    • Unzip
    • Launch a Command Console (Run → CMD)
    • Navigate to your download directory.
      • Example: C:\Users\<YOUR_USER>\Downloads> where <YOUR_USER> is your user directory.
      • CD changes directories so
        • CD \ - changes to the root directory
        • CD Users - changes to the C:\Users directory
        • Use Tab to complete directory names: if you type ‘cd us’ and press Tab, it will complete the command to ‘cd Users’, and ‘cd da’ will fill in ‘cd dataloader_win’ from the Downloads folder once your Data Loader zip has been unzipped
    • Navigate to C:\Users\<YOUR_USER>\Downloads\dataloader_win\dataloader_win and run ‘install.bat’. This will walk you through the installation; take all the defaults.
  • CSVFIX
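Once the tools are installed, a quick sanity check from cmd confirms the environment is wired up. This is only a sketch; the output will vary with your install:

```batch
REM Environment sanity check (run from cmd; values will vary by machine)
echo %JAVA_HOME%
java -version
where java
```

If `echo %JAVA_HOME%` just prints the literal text back, the variable has not been set yet.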

Bonus Items:

Passwords

Storing passwords in clear text is bad practice, so the Data Loader installation comes with a utility that lets you encrypt your password so it can be safely stored in a text file. Here are the steps:

  1. Create Key - You need to create a key file by doing the following:
    1. Drop to command line (run → cmd)
    2. Change to the following directory C:\Users\<YOUR_USER>\dataloader\v47.0.0\bin
      1. CD “C:\Users\IEUser\dataloader\v47.0.0\bin”
    3. run “encrypt.bat -k”
    4. This creates a file in the following directory:
      1. C:\Users\IEUser\.dataloader\dataLoader.key
      2. Save this path in notepad as you will need it again
  2. Encrypt Password - use the following command to create a password token
    1. From the same command line and directory run the following command
    2. encrypt.bat -e password+token "C:\Users\IEUser\.dataloader\dataLoader.key"
      1. Use whatever you type into the Salesforce Data Loader GUI, including the security token in the password if your instance requires it; otherwise your password will not work.
    3. This will generate something that looks like this (sample output only):
      1. The output string of encryption is:2ba4b2db5ec9fb557c7bbd062bd60d4a87a3db45eb7a9236ab06bfcf9fb26455
    4. Copy that second long line of text to Notepad, as we will use it in the configuration file
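Put together, the key-and-password steps look like this from cmd. This is a hedged example: the IEUser path, v47.0.0 folder, and the password+token value are placeholders for your own values:

```batch
REM Example sequence only; substitute your own user, version, and password+token
cd "C:\Users\IEUser\dataloader\v47.0.0\bin"
REM 1. Create the key file (written to C:\Users\IEUser\.dataloader\dataLoader.key)
encrypt.bat -k
REM 2. Encrypt the password with the security token appended, using that key
encrypt.bat -e MyPasswordMySecurityToken "C:\Users\IEUser\.dataloader\dataLoader.key"
```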

Configuration Files

  1. Create the mapping file using data loader. When you are in the field picker, set everything and then save it off in your data directory (C:\Project for this example).
  2. Copy Sample Configuration
    1. In the folder where Data Loader is installed there is a samples/conf directory. Within that directory is a file called ‘process-conf.xml’. Copy that file over to the ‘configs’ folder. Example path:
      1. C:\Users\IEUser\dataloader\v47.0.0\configs
  3. Open the process-conf.xml file from the configs folder (the copy you just made) using Notepad, leaving the original sample file intact in case you need to reference it.
    1. Right Click → Open With → Notepad
    2. Update the following
      1. (line 1) Update the id - id="somename" - I picked ‘project’ based on the project, but the sample uses ‘accountMasterProcess’
      2. (line 4) Update the description, or leave it; updating it is good documentation practice
      3. (line 5) Update the property name to match the id up top
      4. (line 9) Update to a valid path
      5. (line 10) Update the endpoint. Do not worry if you are using a custom domain, you just need to switch the host to either test or login depending on your environment
      6. (line 11) Update the sfdc.username to match the account username
      7. (line 14) Update the sfdc.password to the encrypted value you saved up top
      8. (line 15) Update the process.encryptionKeyFile to the key file location you saved earlier
      9. (line 18) Update the key if you are doing an upsert
      10. (line 19) Update the object name
      11. (line 20) Update the operation type (insert/update/upsert)
      12. (line 21) Update the mapping file location
      13. (line 22) Update the csv file location
      14. (line 23) Update this to something else if not csv…
      15. You can delete line 24
<bean id="project"
          class="com.salesforce.dataloader.process.ProcessRunner"
          singleton="false">
        <description>Update this</description>
        <property name="name" value="project"/>
        <property name="configOverrideMap">
            <map>
                <entry key="sfdc.debugMessages" value="true"/>
                <entry key="sfdc.debugMessagesFile" value="c:\project\project_Trace.log"/>
                <entry key="sfdc.endpoint" value="https://test.salesforce.com"/>
                <entry key="sfdc.username" value="eric.clay+clientname@ultrageek.com.project"/>
                <!-- password below has been encrypted using key file, therefore it will not work without the key setting: process.encryptionKeyFile
                the password is not a valid encrypted value, please generate the real value using encrypt.bat utility -->
                <entry key="sfdc.password" value="ce5f00297a4069da07e519ac039d3ed65e0a61567a463560d21b4ddcb6f14623e1cb475eff119697b56197c945c4e214820c1f0ab5f09fea2b8eb1fb7b984761"/>
                <entry key="process.encryptionKeyFile" value="c:\Users\IEUser\.dataloader\dataLoader.key"/>
                <entry key="sfdc.timeoutSecs" value="600"/>
                <entry key="sfdc.loadBatchSize" value="200"/>
                <entry key="sfdc.externalIdField" value="Key__c"/>
                <entry key="sfdc.entity" value="Project_Policy__c"/>
                <entry key="process.operation" value="upsert"/>
                <entry key="process.mappingFile" value="c:\Project\mapping.sdl"/>
                <entry key="dataAccess.name" value="c:\Project\project.csv"/>
                <entry key="dataAccess.type" value="csvRead"/>
                <entry key="process.initialLastRunDate" value="2005-12-01T00:00:00.000-0800"/>
            </map>
        </property>
    </bean>
  1. You can keep the other values/definitions in the same file, as we will be calling this by the id/name in the batch file
  2. Make sure to save your changes

Batch File

The batch file is an ETL script; ETL stands for extract, transform, and load. As such, this batch file will be broken up into three parts.

Data Management Notes

You will want to create a folder structure for this job off the root of C:. For this example, since this is just one job, you can create a single folder like C:\Project to hold the csv, batch file, and logs. If there are going to be multiple jobs, the recommendation would be to create C:\Data and then C:\Data\Project, so that the next project can also go under C:\Data for easy backup and navigation.
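From cmd, the two layouts sketch out like this (the folder names are just the examples from above):

```batch
REM Single-job layout: everything for this job in one folder
mkdir C:\Project

REM Multi-job layout: one parent folder, one subfolder per job
mkdir C:\Data
mkdir C:\Data\Project
```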

Creating the Batch File

In the folder that will be housing your data files, C:\Project for this example, create a text document and name it process.bat or start_project_etl.bat or whatever name helps you remember what the purpose of the batch file is.

Constructing the Batch File

  1. Extract - Get the Files
    1. This will be either from FTP or a UNC path
    2. REM This file is used to refresh the project data
      CD\
      C:
      cd "C:\Project"
      REM ** Clean up files! **
      del SFDC*
      winscp /script=FTP.script
  2. Transform - Fix the Files
    1. I use CSVFIX for command-line updates to the CSV files. I’m going to write a separate entry on how that works, but here is an example for this project:
      1. The project data does not have a database record id or row id, so we have to make one. The command below takes three fields and creates a ‘key’ field to ensure that the lines are unique for upsert.
      2. csvfix merge is the command
        1. -ifn strips the headers/first line of the csv
        2. -f 1,2,3 is specifying that the 1st, 2nd, and 3rd fields be used
        3. -s - is specifying that we use a dash to split those values up
        4. -k tells csvfix to keep the existing columns; otherwise it removes them. Example: removing them is good for when you are merging firstname and lastname into one name column
        5. -p 1 tells csvfix to put the data in the first column otherwise it puts it as the last column
        6. -hdr tells csvfix to add the following header names to the first row of the CSV
        7. test.csv is the source and SFDC_Upsert.csv is the output
      3. Code - I removed some of the fields to make this easier to read:
csvfix merge -ifn -f 1,2,3 -s - -k -p 1 -hdr "Key","ID #","Agency ID","Policy Number" test.csv > SFDC_Upsert.csv
  3. Load - Upload the Files
    1. Use the following command line
C:\Users\IEUser\dataloader\v47.0.0\bin>process.bat ../configs project

Create a batch file using notepad with those commands and save it. The entire file should look something like this:

REM This file is used to refresh the project data
CD\
C:
cd "C:\Project"
REM ** Clean up files! **
del SFDC*
REM FTP.script has the username/password and all the commands to pull the files
winscp /script=FTP.script
csvfix merge -ifn -f 1,2,3 -s - -k -p 1 -hdr "Key","ID #","Agency ID","Policy Number" test.csv > SFDC_Upsert.csv
REM Upsert into Salesforce
cd "C:\Users\IEUser\dataloader\v47.0.0\bin"
call process.bat ../configs project

Run this from the command line to test it before scheduling.
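If you want to see what the csvfix merge step produces without csvfix installed, the transform can be sketched with awk as a stand-in. This is illustrative only: test.csv below is a made-up three-column file, not the real project data, and the awk flags mimic -ifn, -f 1,2,3, -s -, -k, -p 1, and -hdr from the command above:

```shell
# Make a small fake input file with a header row (stand-in for the real test.csv)
printf '%s\n' 'ID #,Agency ID,Policy Number' '001,01,100123' '002,02,100456' > test.csv

# Drop the original header (like -ifn), build a Key from fields 1-3 joined
# with dashes (like -f 1,2,3 -s -), keep the original columns (like -k),
# put Key first (like -p 1), and emit the new header row (like -hdr).
awk -F',' 'NR==1 { print "Key,ID #,Agency ID,Policy Number"; next }
           { print $1 "-" $2 "-" $3 "," $0 }' test.csv > SFDC_Upsert.csv

cat SFDC_Upsert.csv
```

Each output row now leads with a unique Key like 001-01-100123, which is what makes the upsert safe.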

Schedule Job

  1. Create Task (not basic task)
    1. General Tab
      1. Set Name - Make it count, you cannot rename tasks
      2. Set the running user if different than logged in user (service user)
      3. Run whether user is logged on or not
      4. Check: Run with highest privileges
    2. Triggers
      1. New
      2. Daily
      3. Pick Start Date/Time
      4. Leave the rest
    3. Actions
      1. New
      2. Start a program
      3. browse to the script
    4. Leave the rest
    5. Enter in Password to Save
    6. Right Click and Run to test
    7. Check Logs to see if it ran successfully
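If you prefer the command line over the Task Scheduler UI, the same task can be sketched with schtasks. The task name, start time, and script path below are examples, not values from this setup:

```batch
REM Sketch: daily 2 AM task, highest privileges; /RP with no value prompts for the password
schtasks /Create /TN "Project ETL" /TR "C:\Project\process.bat" /SC DAILY /ST 02:00 /RL HIGHEST /RU %USERNAME% /RP
```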
24Oct/19

Running CSVFIX on Mac

CSVFIX has been my go-to for command-line ETL, but I have always run it on Windows. After running into some issues with a CSV file loading into Salesforce, I needed something on the Mac to 'peek' into the file and do the usual amount of stuff that I'm used to doing with CSVFIX. A quick Google search turned up some handy instructions, but nothing is ever easy, and because the internet sometimes comes and goes, I'm putting them here with the changes I made to work on Mac (vs. Linux) and a couple of the apps I had to update or install.

Here's the list of commands:

  1. wget https://bitbucket.org/neilb/csvfix/get/version-1.6.zip
  2. unzip version-1.6.zip
  3. cd neilb-csvfix-e804a794d175
  4. make lin
  5. cd csvfix/bin
  6. sudo cp csvfix /usr/local/bin

wget is not on the Mac by default; I used "brew install wget" to install it. Homebrew is a handy package manager specifically for the Mac: https://brew.sh/

On step #4, when you try to 'make' (compile), you might get this error:

  • xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

To fix this, run "xcode-select --install"; that installation will fix the error and allow you to compile/make csvfix.

The last thing I changed from the original instructions is step #6: /usr/bin becomes /usr/local/bin, which is specific to the Mac (vs. Linux).

Here is the link to the blog post that started this journey: Install csvfix on linux

19Jul/16

Quick Tip: Double Quoted CSV Data

Sometimes I get SAP data that is quoted with both single quotes and double quotes. The data looks something like this:

  • "'001'","'01'","'100123'","'#'","'#'","'#'","'100123'"

CSVFIX will strip out the extra single quotes with the following command:

  • csvfix read_dsv -s "," -csv Input_File1.csv > Output_File1.csv

Seems unneeded but otherwise your output is double quoted. I blame SAP.

The magic is the '-csv' switch. Make this your first command and you're right as rain. I made it the second step in a script that first stripped the header row (see today's previous posting), and the results were not as expected. Could be just a fluke, as it should not have mattered.
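If you just want to see the effect without csvfix on hand, a sed one-liner approximates it. This is an illustrative stand-in, not the csvfix implementation, and the file names are only examples: it strips the stray single quotes while leaving the double-quoted CSV fields intact.

```shell
# Fake one line of the SAP-style input (stand-in for the real Input_File1.csv)
printf '%s\n' "\"'001'\",\"'01'\",\"'100123'\",\"'#'\"" > Input_File1.csv

# Strip every single quote; the double quotes around each field survive
sed "s/'//g" Input_File1.csv > Output_File1.csv

cat Output_File1.csv
```

Note this blunt approach also removes legitimate apostrophes inside field values, which csvfix's CSV-aware parsing would not; it is fine for data quoted like the example above.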

BTW, Excel calls this 'Text Qualifier' on Step 2 of the data import wizard, which is another way to fix or strip these out, but that eliminates automation, which is the whole point of CSVFIX. Excel can handle both single and double quotes and will read the above example correctly for ad-hoc analysis and testing the ETL.