.ultrageek. Computers, Music, and Thoughts


Quick Tip: CSVFIX Duplicate Column

One tool I use to do bulk data loads in to Salesforce.com does not allow mapping one source to multiple destination fields. You can of course use workflows or code to backfill the data once it gets in but a cheaper/easier alternative if it is just one field is the following:

  • csvfix eval -e "if($1!='',$1,$1)" etl_temp.csv > etl_temp2.csv

It is a bit of a hack but basically as long as there is a value in the column it copies it to the last column. $1 is the field position of that row, != is the not equals, and two single quotes for 'Empty String'. The if statement uses double quotes for windows, use single quotes for non-windows.


ERROR: std::bad_alloc

If you are using CSVFIX and running UNIQUE against a large data set you have to remember that CSVFIX loads the entire file in to memory to do the processing. So if you get this...check and watch your environments memory. Mine only had 3gig...it could use a little more for parsing 3+ million rows.

ERROR: std::bad_alloc


Quick Tip: Double Quoted CSV Data

Sometimes I get SAP data that is quoted with both single quotes and double quotes. The data looks something like this:

  • "'001'","'01'","'100123'","'#'","'#'","'#'","'100123'"

CSVFIX will strip out the extra single quotes with the following command:

  • csvfix read_dsv -s "," -csv Input_File1.csv > Output_File1.csv

Seems unneeded but otherwise your output is double quoted. I blame SAP.

The magic is the '-CSV' switch. Make this your first command and you're right as rain. I made it the second in a script I had that first stripped the first header row, see today's previous posting, and results were not as expected. Could be just a fluke as it should not have mattered.

BTW Excel calls this 'Text Qualifier" on Step 2 of the data import wizard, another way to fix or strip these out but then that eliminates automation which is the whole point of CSVFIX. Excel can do single and double but in the above example will read it correctly ad-hoc analysis and test ETL.


Quick Tip: CSVFIX Skip First Line

Update Oct 2017 - For most csvfix commands the '-ifn' command will also skip the first line. I say most as I know that it does not work for "read_dsv" which is ok as long as you are not combining files in addition of converting to CSV. The below solution is also good if you're getting output that has an extended header. I have one source where the first 6 lines lists out how the report is created out of SAP.

Got headers that you don't want? Want to skip that first line and the 1.6 manual does not elude to how? Let me cut to the chase:

  • csvfix remove -if "$line == 1" input_file.csv > output_file.csv

This will skip the first line so that you can add your own headers.

Tagged as: No Comments