In the past 20 years or so I’ve used and played with different computers and operating systems, but for the last 10 years I’ve almost exclusively used a Mac, both at home and at work when I can get away with it (and I have).1

The Mac is a great productivity tool for my work-related tasks, which include software development, application deployment, database management, and the like. One big reason the Mac is so productive for me is that the core of the Mac operating system, OS X, is POSIX-compliant and provides a command-line interface, or CLI for short.

The CLI is powered by a Unix shell that interprets the commands you enter and executes them. There are many different shell interpreters, but the default one on OS X is Bash.

Bash provides a lot of powerful features, but I’ve found that learning just a few simple commands and tricks can go a long way to becoming more productive, especially in automating repetitive tasks.

The Basics

First of all, there are a lot of basic shell commands that are needed to do almost anything useful. Examples include: ls, cp, mv, more, grep, tail, cat, echo, wc, sleep.2 I will not describe these commands here, but it’s easy to find examples of how they work on the web, or you can read the documentation using the man command, e.g.:

> man mv

Another useful feature provided by the shell is “pipes”, a mechanism for directing the output of one command into the input of another. This makes it possible to compose simple commands into more complex structures. A simple example:

> cat file.txt | grep "world" | wc -l 

The command above outputs the content of the file file.txt to grep, which matches lines containing the word “world”. These matching lines are then provided as input to wc (short for “word count”), which counts the lines (the -l flag tells wc to only count lines). So, the above command answers the question: How many lines in file.txt contain the word “world”?

Repetition

It is often useful to execute the same command multiple times for different values. This facilitates automation and batch jobs. The most basic construct is the for-loop, familiar from most programming languages. It takes a variable name and a list of entries to iterate over. The simplest example would be:

> for i in 1 2 3; do echo $i; done
1
2
3

The variable name given is i and it can be referenced using $i. The list consists of the three numbers 1, 2, and 3. For each item in the list, we execute the command echo $i. As expected, the result is that the shell prints the numbers 1, 2, and 3.
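
As a side note, Bash can generate such a numeric list for you using brace expansion, so you don’t have to type the numbers out:

> for i in {1..3}; do echo $i; done
1
2
3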

We can do something slightly more useful:

> for i in `ls *`; do echo $i; done

The above command is a complicated way of listing all files in the current directory. The output of the command ls * makes up the list we are looping over (notice the backticks). Each entry, i.e. each filename, is then echoed back in the shell.
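
One caveat worth knowing: looping over the output of ls breaks if any filename contains spaces, because the shell splits the output on whitespace. Letting the shell expand the glob directly is a safer way to achieve the same thing:

> for i in *; do echo "$i"; done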

There is also a while command that loops until a condition is no longer true. An example would look like this:

> i=1; while [ $i -lt 4 ]; do echo $i; i=$((i+1)); done
1
2
3

The loop condition is true while the variable i is less than 4. The variable i is initially set to 1 and is incremented, using arithmetic expansion, each time the loop body executes. The semicolons separate the individual commands, which run in sequence.
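
Bash also offers a C-style for loop that manages the counter for you; this is equivalent to the while loop above:

> for ((i=1; i<4; i++)); do echo $i; done
1
2
3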

Processing Data

There is another set of commands that are useful for processing data during batch operations. We start with cut. The cut command allows extracting specific values from a (structured) line of text. Here is an example:

> echo "Hello terrible world" | cut -d " " -f 1,3 
Hello world 

We specify that the delimiter is a space (using the -d flag), and that we want the first and third fields (using the -f flag). cut can also extract specific byte or character ranges.
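
For example, the -c flag selects a character range (here, characters 2 through 4):

> echo "Hello" | cut -c 2-4
ell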

Another useful command is read, which can assign the fields of a line of text to named variables.

> read f1 f2 f3 <<< "Hello terrible world"; echo $f1 $f3
Hello world 

The <<< operator indicates a “here string”, which I will not go into here, but essentially it provides a way to specify input to the read command inline, directly in your script, without necessarily reading the contents from a file.
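
As a quick illustration, a here string can feed any command that reads standard input, not just read:

> grep -c "world" <<< "Hello terrible world"
1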

Together, cut and read allow us to filter data sets, and extract data into variables that we can reference in our scripts. Here is a contrived example:

> read f1 f2 <<< `echo "Hello terrible world" | cut -d " " -f 1,3`; echo $f1 $f2
Hello world 

Example: Images

Let’s put our knowledge to practice. ImageMagick is a set of command-line tools for processing images. ImageMagick provides a tool called identify that shows details about a particular image.

> identify image1.png 
image1.png PNG 240x135 240x135+0+0 8-bit sRGB 15.3KB 0.000u 0:00.009

We see the image name, the image type, the image dimensions, and some other metadata, each piece of information separated by a space (good to know, right?). If we are only interested in the image name and the dimensions we can do this:

> identify image1.png | cut -d " " -f 1,3 
image1.png 240x135

Need to move things around?

> read image type dim <<< `identify image1.png | cut -d " " -f 1,2,3`; echo $type $image $dim
PNG image1.png 240x135

If you have a directory full of PNG images, you can do this:

> for i in `ls *.png`; do identify $i | cut -d " " -f 1,3; done
image1.png 240x135
image2.png 240x135

You can perhaps imagine how easy it is to batch-convert a folder full of images using the convert tool, also part of ImageMagick.
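
As a rough sketch, assuming you want half-size copies prefixed with small_ (the resize percentage and the naming scheme here are placeholders, pick whatever suits your needs), it could look something like this:

> for i in *.png; do convert "$i" -resize 50% "small_$i"; done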

As a bonus, here is an example where we use printf to re-format the output:

> for i in `ls *.png`; do read file size <<< `identify $i | cut -d " " -f 1,3`; printf "%-20s %s\n" "$file" "$size"; done
image1.png           240x135
image2.png           240x135

Example: Database

Here is another example that I’ve come across in my work, where scripting is very useful. Let’s assume we have a MySQL table with summary reports for a set of data records. Each report, among other things, has a unique ID (column id) and a count (column count) for the number of records in that report.

If you execute a MySQL SELECT query from the MySQL prompt, you get a table-like structure:

mysql> SELECT id,count FROM `report`;
+-----+--------+
| id  | count  |
+-----+--------+
|   1 | 100000 |
|   2 |  10000 |
|   3 |   1000 |
+-----+--------+

However, if you execute the same query from the command line and redirect the result to a file, you get a tab-separated format, which is easier to handle given our newly acquired skills for filtering and processing such data.

> mysql -e "SELECT id,count FROM \`report\`" > data.txt
> cat data.txt
id      count
1       100000
2       10000
3       1000

Let’s further assume we have a Web service that calculates some statistics for each report, given the report’s ID, and we want to process a large number of reports. The first thing to do is to remove the first line of the file, which contains the column headers. For the sake of simplicity, this can be done manually using a text editor.
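
If you would rather stay in the shell, the tail command can strip the header instead; here is one way, going through a temporary file since you cannot redirect a file onto itself:

> tail -n +2 data.txt > tmp.txt && mv tmp.txt data.txt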

Now we can read each line of the file using a somewhat familiar command:3

> while read i; do echo $i; done < data.txt 
1 100000
2 10000
3 1000

The last part of the command, < data.txt, sends the contents of the file as input to read, causing each line in turn to be read and associated with the variable i.

Now we can break up the data for each line:

> while read i; do read id count <<< $i; echo "updating: $id; count: $count"; done < data.txt
updating: 1; count: 100000
updating: 2; count: 10000
updating: 3; count: 1000
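
As an aside, read already splits its input on whitespace, so the inner read is not strictly necessary; this one-liner produces the same output:

> while read id count; do echo "updating: $id; count: $count"; done < data.txt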

Now, of course, instead of simply echoing the report ID we can make a request to the Web service using curl, something like this:

> curl -X POST -d "{\"id\": 1}" http://localhost/api/update/ --header "Content-Type:application/json" 

Better yet, put this command in a script:

> cat update.sh
#!/bin/sh
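# usage: ./update.sh <host> <report-id>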
curl -X POST -d "{\"id\": $2}" http://$1/api/update/ --header "Content-Type:application/json" 
> chmod 755 update.sh 

This script can be called like this:

> ./update.sh localhost 1
{ status: OK }

The assumption is that this call to the Web service running at http://localhost/api/update/ will update the statistics for the report referenced by the ID (here: 1). The “{ status: OK }” output on the console is the result from the Web service call, here showing that the request was successful.

Putting all this together we can now do the following:

> while read i; do read id count <<< $i; echo "updating: $id; count: $count"; ./update.sh localhost $id; done < data.txt
updating: 1; count: 100000
{ status: OK }
updating: 2; count: 10000
{ status: OK }
updating: 3; count: 1000
{ status: OK }

Conclusions

The result of all this is that, with a few simple shell commands, we have automated the extraction of information from a MySQL database and its processing through a Web service.

There is so much more that can be done, but this has hopefully given an indication of the power of knowing a few simple commands, and how easy it is to become more productive using the shell and the command line.

  1. Most recently I’ve used the 13” MacBook Pro, which is a great choice for a lot of situations.

  2. It is also very useful to know a text editor that can run in the shell, rather than one that needs a windowing server to display its interface. This is particularly useful in cloud infrastructures and in any kind of remote-login scenario where it might be the only option. My current recommendation is Vim.

  3. For this to work properly, the “IFS” variable value needs to be inspected or, better yet, cleared, which can be done like so: unset IFS.