Manipulating Dataload Files with Powershell

If you have a fancy Mac…then you have all kinds of linux/unix tools and utilities built right in to the OS or just a click a way that you can use. Must be nice to have some standard stuff that has been used for decades with lots of documentation. For the rest of us slumming on Microsoft Windows our choices are plentiful but if manipulating data needs to be outside of Excel and automated and it isn’t something you do a lot then finding the right set of ‘free’, reliable, or inexpensive utilities can be a challenge. Believe it or not Bat files and powershell are well tested and are up to the task.

So say you want to automate a nightly or weekly dataload. Unless you are lucky enough to get the file in exactly the format you need for importing in to Salesforce.com you will need to do some data manipulation. I am going to share with you some of the tips and tricks I learned this week. All of these examples assume you are working with text files with tab, comma, or other delimited text and have been tested with some rather large (700k+ rows) data sets.

Missing Headers?

This is an old DOS trick. My current challenge is I am working with a set of data files from SAP ERP that lack any headers. What is this craziness? Thankfully I have the headers defined but do I really have to have headers? Well no but to ease the efforts of data mapping it does make it a bit easier to visual plus I know Apex Dataloader can start at a specific point but I’m not sure the Apex dataload can load without headers.

So what I do is export out the headers to a file with the same delimiter as the data file and then do the following command: headers_file.txt + data_file.csv combined_destination.csv

Basically you can chain file plus file plus file plus file, etc. and then what ever the last filename is will be the destination. Unfortunately you cannot copy files on to themselves.

Converting Tab Delimited to Comma Delimited CSV

In addition to no headers, SAP gives me a file that is tab delimited. You will read that Salesforce.com data loader command line and other tools can read and use tab delimited but let me help you cut the corner on the frustration making this a reality. Don’t bother, convert it and move on.

Here is some Powershell love that converts tab delimited to CSV: import-source_file.csv -delimiter “`t” | export-csv destination_file.csv -NoTypeInformation

You will need to create a text file with an extension of ‘PS1’ and call it from a bat (or batch if you’re old school) like so: PowerShell -NoProfile -ExecutionPolicy Bypass -Command “& ‘C:\path_to_file\tabconversion.ps1′”

This assumes that you are calling it from your profile, if you need something that is more universal, google it!

More Powershell Love in the next post…got to keep this short and sweet otherwise I’ll never get it posted.

Leave a Reply

Your email address will not be published. Required fields are marked *