Converting a CSV to SQL using 1 line in bash
The command line is very powerful and
can do amazing stuff in one single line by pipelining a series of
commands. This post is inspired after creating a line that mixed
sed and awk, but with just only awk I'll show you an example on
how to convert a CSV file to an SQL insert
Let's take an input CSV named
events-2013-06-06.csv
with 16 columns per line. It
looks like this:
k51b04876036e2,192.168.54.67,3a8d6196,2013-06-06,03:29:42,started,,,,no active campaign,Spain,ca-es,,v1.0,1370482182 k51b04876036e2,192.168.54.67,3a8d6196,2013-06-06,03:29:43,first-run,,,,no active campaign,Spain,ca-es,,v1.0,1370482183 k51b04876036e2,192.168.54.67,3a8d6196,2013-06-06,03:30:17,close,34,,,no active campaign,Spain,ca-es,34,v1.0,1370482217 k51b0494a76071,192.168.54.67,febd870c,2013-06-06,03:33:14,started,,,,no active campaign,Spain,ca-es,,v1.0,1370482394 ...
I took 16 because is more or less a real-life example. I added some empty columns too to make it more real.
And this is the SQL output we want:
INSERT INTO tracking VALUES ('k51b04876036e2',INET_ATON('192.168.54.67'),'3a8d6196','2013-06-06','03:29:42','started','','','','no active campaign','Spain','ca-es','','v1.0','1370482182',''); INSERT INTO tracking VALUES ('k51b04876036e2',INET_ATON('192.168.54.67'),'3a8d6196','2013-06-06','03:29:43','first-run','','','','no active campaign','Spain','ca-es','','v1.0','1370482183',''); INSERT INTO tracking VALUES ('k51b04876036e2',INET_ATON('192.168.54.67'),'3a8d6196','2013-06-06','03:30:17','close','34','','','no active campaign','Spain','ca-es','34','v1.0','1370482217',''); INSERT INTO tracking VALUES ('k51b0494a76071',INET_ATON('192.168.54.67'),'febd870c','2013-06-06','03:33:14','started','','','','no active campaign','Spain','ca-es','','v1.0','1370482394',''); ...
The one-liner that produced the output above was:
cat events-2013-06-06.csv | awk -F',' '{ printf "INSERT INTO tracking VALUES (\x27%s\x27,INET_ATON(\x27%s\x27),\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27,\x27%s\x27);",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16;print ""}'
Don't get scared yet... more generically:
cat your_file.csv | awk -F',' '{ printf "INSERT INTO table VALUES (\x27%s\x27,\x27%s\x27);",$1,$2;print ""}'
The big line decomposed:
- The awk command will scan your buffer/file
looking for commas, then splits every line into several fields.
The comma was passed as parameter to awk as column separator:
This was:
awk -F','
- Then awk will assign a numeric variable for
every column found, allowing you to write any text using as
placeholders these variables. If you want to ignore any of the
columns, just don't use them when printing.
'{ printf "...YOUR TEXT AND PLACEHOLDERS %s HERE...",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16;print ""}'
- Then there is plenty of this strange
\x27
. Well, it is pretty messy in the reading but this is just an encoded way of writing single quotes so they are not interpreted as the command itself.
As a final note, do notice that I used printf
in the
command instead of print
. The only difference is
that using printf you put the %s placeholder in the text, while
if you use print you write the number right away. So, if your CSV
has a few fields maybe is more visual for you the print command,
like this:
cat yourfile | awk -F',' '{ print "INSERT INTO tracking values(" $1 ")" }'
The only precaution here is that the $1 must be out of the quotes.
Let me know if any of these worked for you! Works like a charm!
And if you want to know more, there is a very cheap reference book for awk and sed: