• ☆ Deploying Jekyll with Git

    When developing software, and especially Web applications, I’ve found that thinking about the best deployment method early on is usually a good thing. Solving the deployment situation answers the question: What’s the best way to make the latest version of my software available to users?

    The answer very much depends on the software being developed, and the tools used to do so. I’ve recently played with a tool called Jekyll, so I’ll talk about that.


    Jekyll is a simple but powerful tool for creating Web sites. It’s a little different from Web application frameworks like Django, or other similar frameworks that are available.1 A traditional Web application is typically backed by a database and supports dynamic content generation. When a request comes in from a Web browser the application first needs to determine what the request is for. Then it calls the code in the application that is responsible for handling the request. The application might fetch some data from a database, and then use a template engine to iterate over the data to generate the final page content. Finally, the result is returned and sent over the wire to the requesting browser. The backend server is not just returning static files, but runs application code that handles requests and returns results based on the request and the current state of the database.

    Jekyll performs many of the same steps, but ahead of time: all the resources available to be requested are pre-generated. For this reason, Jekyll and similar frameworks are often referred to as static Web site generators. Keep in mind that Jekyll is very different from hand-coding your own HTML pages. Jekyll provides an advanced template system, enables data-driven page generation, has built-in support for Markdown and Sass, and can incrementally re-compile pages when their content changes.

    The whole point of a static Web site generator is that it pre-generates the content to be served, i.e. the Web pages. This greatly simplifies content deployment. The entire Web site can be deployed as simple resources on a Web server (e.g. nginx), just like images or other static content. There is no need to run a separate application on the backend server to interpret requests and determine the exact content to return.
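    For example, with nginx the generated site can be served like any other static content. A minimal sketch (the server name and root path are placeholders for your own setup):

```nginx
server {
    listen 80;
    server_name lovelysite.com;

    # Point root at the deployed copy of Jekyll's _site output
    root /var/www/lovelysite.com;
    index index.html;

    location / {
        # Plain static file serving; no backend application involved
        try_files $uri $uri/ =404;
    }
}
```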

    What does static mean?

    The word “static” sounds limiting. Really, the word “static” should only be used to describe how the resulting Web site is deployed. This does not mean that you cannot create data-driven Web sites, or that you cannot provide dynamic experiences to users. There just isn’t any dynamism on the backend, per se.

    Much of the dynamism on the Web today comes from the client-side, taking place directly in the browser using JavaScript and CSS. Client-side frameworks like AngularJS and Ember help develop rich user experiences, and other JavaScript frameworks make it easy to integrate dynamic content like maps (e.g. Leaflet).

    Jekyll also supports data files that can be parsed and used during content generation. These data files can be created by hand, by tools, or generated from databases. But keep in mind that when the data changes, the Web pages that serve that data need to be re-generated.
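    A sketch of what that can look like (the file names and data here are hypothetical): a YAML file in the _data directory becomes available to templates as site.data, and a Liquid loop renders it.

```shell
# Create a data file that Jekyll exposes as site.data.team
mkdir -p _data
cat > _data/team.yml <<'EOF'
- name: Ada
  role: Developer
- name: Grace
  role: Admiral
EOF

# A page that iterates over the data with a Liquid loop
cat > team.html <<'EOF'
---
title: Team
---
<ul>
{% for member in site.data.team %}
  <li>{{ member.name }}: {{ member.role }}</li>
{% endfor %}
</ul>
EOF
```

    Re-running jekyll build after the data file changes regenerates the pages that use it.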

    Finally, using Jekyll to generate the content of a Web site or application does not preclude using a separate companion backend application to handle data storage and management, e.g. via an HTTP-based API. I have not tried it myself, but this setup could provide for a nice separation between Web page content and backend data services.

    Creating and building your first Jekyll Web site is easy:2

    > jekyll new lovelysite.com
    > cd lovelysite.com
    > jekyll build 

    The generated site will end up in a directory called _site.

    Deploying with Git

    Jekyll is a very interesting and powerful framework, but I really wanted to discuss deployment scenarios using Git. The basic idea we want to achieve is the following:

    1. Keep all content version controlled using Git
    2. Keep a remote Git repository on the deployment server
    3. To re-deploy updated content, simply push changes to the remote Git repository

    Git has two kinds of repositories: “bare” and “non-bare”. The main difference is that bare repositories do not contain a working directory. That is, you cannot directly see the files that are part of that repository.3 Another minor difference is the convention that the name of the root directory of bare repositories ends with .git. Pushing changes to a non-bare repository is not recommended: the target already contains a working directory, and the push can leave it out of sync with the repository contents unless you know what you are doing.

    We want to have:

    • A bare repository on the deployment server
    • A non-bare clone of the repository on our local machine where we do work

    We log in to our remote server to set up our bare repository:

    remote> mkdir -p ~/git/lovelysite.com.git
    remote> cd ~/git/lovelysite.com.git
    remote> git init --bare --shared .

    Notice the .git ending on the directory name. Then on our local machine:

    local> mkdir -p ~/git/lovelysite.com
    local> jekyll new ~/git/lovelysite.com
    local> cd ~/git/lovelysite.com
    local> git init . 
    local> git add .
    local> git commit -m "first release"
    local> git remote add origin user@remoteserver.com:~/git/lovelysite.com.git
    local> git push --set-upstream origin master

    For the uninitiated Git user, this might seem like a lot of strange commands. But we did the following: We created an empty bare Git repository on the remote machine to hold the content of our site. We then created the initial content on our local machine using Jekyll (it creates sample pages that we can modify later). We made the Jekyll site a non-bare Git repository, added all the files in the current directory to be committed, and then committed them using a simple commit message: “first release”. Next we told Git about our remote Git repository, found at ~/git/lovelysite.com.git on the machine remoteserver.com that we can access using the user user. In the last step we pushed our first commit to the remote server (--set-upstream tells Git to remember the link between our local repository and the remote one).

    Automatic deployment with Git hooks

    Almost done. The last step is to configure the remote repository to re-build the entire Jekyll site when new content is pushed into that repository. This is done using Git hooks. Hooks are just scripts that are run at certain times in the Git workflow process. We will use the post-receive hook that runs after Git has done all its work. For bare Git repositories, hooks are stored in a folder called hooks in the root repository directory. Make sure that there is a file called post-receive with the following contents and that it is executable.

    remote> cat ~/git/lovelysite.com.git/hooks/post-receive
    #!/bin/sh
    # Adjust these three paths to match your setup
    GIT_REPO=$HOME/git/lovelysite.com.git
    TMP_GIT_CLONE=$HOME/tmp/lovelysite.com
    PUBLIC_WWW=$HOME/webapps/lovelysite
    git clone $GIT_REPO $TMP_GIT_CLONE
    jekyll build -s $TMP_GIT_CLONE -d $PUBLIC_WWW
    rm -Rf $TMP_GIT_CLONE
    remote> chmod 755 ~/git/lovelysite.com.git/hooks/post-receive

    The assumption here is that $HOME/webapps/lovelysite is the location where your Web provider tells you to put the contents of your site.

    Now, when we push new content, the post-receive script is run. The script creates a temporary clone of the repository with a working directory, and then uses that to build the site. Once the site is built and deployed, the temporary clone is removed.

    With this setup, your deployment strategy becomes:

    local> git add [files to update]
    local> git commit -m "added new content"
    local> git push 

    It doesn’t get much simpler than that: revision control and deployment, closely integrated.4

    1. There are many Web application frameworks available, in a range of different languages and with different architectural styles and patterns.

    2. Jekyll first has to be installed and the jekyll binary has to be on your $PATH. See Jekyll’s Web site for instructions.

    3. There is a relationship between bare and non-bare repositories. Each non-bare repository has a .git folder that contains an object database with all the revisions, along with other metadata and scripts. Next to the .git folder are all the files in the working directory. A bare repository, on the other hand, does not have a working directory; instead, the contents that would live in a non-bare repository’s .git folder sit directly in the root repository directory.

    4. In an ideal world, you also want to have a non-deployment remote repository that you can push to without causing a re-deployment. You can for example host this on GitHub or in your Dropbox folder. Make this remote repository your default remote repository, and name your remote deployment repository deploy. Now your deployment strategy becomes: git push deploy master.
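    The remote shuffle described in this footnote can be sketched as a dry run in a throwaway repository (the paths and the GitHub URL are placeholders; git remote add does not contact the URL):

```shell
# Dry run in a fresh repository
cd "$(mktemp -d)"
git init -q .

# The deployment remote, as set up earlier under the name "origin"
git remote add origin /path/to/deployment/repo.git

# Rename it to "deploy", then add the non-deployment default remote
git remote rename origin deploy
git remote add origin git@github.com:user/lovelysite.com.git

# "git push" now goes to GitHub; "git push deploy master" re-deploys
git remote -v
```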

  • → Data is a public good

    Kathryn Sullivan, the US National Oceanic and Atmospheric Administration (NOAA) administrator:

    The big data revolution could lead to currently unimagined uses for the data we receive from satellites. Entrepreneurs could come up with new applications and ideas for mashing up data. But the data itself should, I believe, be regarded as a public good. How to guarantee this, in a world where public budgets are squeezed and space exploration is becoming increasingly affordable for private players, is a question that deserves serious thought and active engagement.

    I certainly agree that data is a public good, and unrestricted access to data is important to fuel new discoveries and put it to new valuable uses.

    Sullivan brings up a good question about the balance between government-funded and privately funded satellite programs (or access to space in general). The private sector is certainly contributing to exploring space and the Earth, which is good for the short term, but will it also serve us in the long term?

    SpaceX is resupplying the International Space Station (ISS) and Planet Labs has raised well over $100 million in order to take high-resolution (5 meters or less) images of the entire globe.

  • ☆ Increased productivity with shell scripting

    In the past 20 years or so I’ve used and played with different computers and operating systems, but for the last 10 years I’ve almost exclusively used a Mac, both at home, and at work when I can get away with it (and I have).1

    The Mac is a great productivity tool for my work-related tasks, which include software development, application deployment, database management and that sort of thing. One big reason the Mac is so productive for me is that the core of the Mac operating system, OS X, is POSIX-compliant and provides a command-line interface, or CLI for short.

    The CLI is powered by a Unix shell that interprets the commands you enter and executes them. There are many different shell interpreters but the default one on OS X is Bash.

    Bash provides a lot of powerful features, but I’ve found that learning just a few simple commands and tricks can go a long way to becoming more productive, especially in automating repetitive tasks.

    The Basics

    First of all, there are a lot of basic shell commands that are needed to do almost anything useful. Examples include: ls, cp, mv, more, grep, tail, cat, echo, wc, sleep.2 I will not describe these commands here but it’s easy to find examples of how they work on the web, or you can read the documentation using the man command, e.g.:

    > man mv

    Another useful shell feature is the “pipe”, a method for directing the output of one command to become the input of another. This makes it possible to compose commands into more complex structures. A simple example:

    > cat file.txt | grep "world" | wc -l 

    The command above outputs the content of the file file.txt to grep, which matches lines containing the word “world”. These matching lines are then provided as input to wc (short for “word count”), which counts the lines (the -l flag tells wc to only count lines). So, the above command answers the question: How many lines in file.txt contain the word “world”?
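    As an aside, grep can do the counting itself with the -c flag, so the same question can be answered without the pipeline (the sample file here is made up for illustration):

```shell
# A sample file.txt so the example is self-contained
printf 'hello world\nnothing here\nworld peace\n' > file.txt

# Count lines containing "world" -- equivalent to:
#   cat file.txt | grep "world" | wc -l
grep -c "world" file.txt   # prints 2
```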


    It is often useful to execute the same command multiple times for different values. This facilitates automation and batch jobs. The simplest construct is the for-loop, familiar from most programming languages. The command takes a variable name, and a list of entries to iterate over. A minimal example:

    > for i in 1 2 3; do echo $i; done

    The variable name given is i and it can be referenced using $i. The list consists of the three numbers 1, 2, and 3. For each item in the list, we execute the command echo $i. As expected, the result is that the shell prints the numbers 1, 2, and 3.

    We can do something slightly more useful:

    > for i in `ls *`; do echo $i; done

    The above command is a complicated way of listing all files in the current directory. The result from the command ls * makes up the list we are looping over (notice the back ticks). Then each entry, i.e. each filename, is echoed back in the shell.
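    As an aside, the shell can expand the file list itself, without invoking ls; a glob in the for-loop produces the same result and also copes with filenames containing spaces:

```shell
# List all files in the current directory, letting the shell expand the glob
for i in *; do echo "$i"; done
```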

    There is also a while command that loops until a condition is no longer true. An example would look like this:

    > i=1; while [ $i -lt 4 ]; do echo $i; i=$((i+1)); done

    The loop condition is true while the variable i is less than 4. The variable i is initially set to 1 and is incremented each time the loop body is executed. The semi-colon separates two commands, executing the second after the first.

    Processing Data

    There is another set of commands that are useful for processing data during batch operations. We start with cut. The cut command allows extracting specific values from a (structured) line of text. Here is an example:

    > echo "Hello terrible world" | cut -d " " -f 1,3 
    Hello world 

    We specify that the delimiter is a space, and that we want the first and third fields (using the -f flag). cut can also find specific byte or character ranges.
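    For example, the -c flag selects a character range instead of delimited fields:

```shell
# Select the first five characters of the line
echo "Hello terrible world" | cut -c 1-5   # prints "Hello"
```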

    Another useful command is read, which can assign variable names to fields in a line of text.

    > read f1 f2 f3 <<< "Hello terrible world"; echo $f1 $f3
    Hello world 

    The <<< operator indicates a “here string”, which I will not go into here, but essentially it provides a way to specify input to the read command inline, directly in your script, without necessarily reading the contents from a file.

    Together, cut and read allow us to filter data sets, and extract data into variables that we can reference in our scripts. Here is a contrived example:

    > read f1 f2 <<< `echo "Hello terrible world" | cut -d " " -f 1,3`; echo $f1 $f2
    Hello world 

    Example: Images

    Let’s put our knowledge into practice. ImageMagick is a set of command-line tools for processing images. Among them is identify, which shows details about a particular image.

    > identify image1.png 
    image1.png PNG 240x135 240x135+0+0 8-bit sRGB 15.3KB 0.000u 0:00.009

    We see the image name itself, the image type, the image dimensions and some other metadata, each piece of information separated by a space (good to know, right?). If we are only interested in the image name and the dimensions we can do this:

    > identify image1.png | cut -d " " -f 1,3 
    image1.png 240x135

    Need to move things around?

    > read image type dim <<< `identify image1.png | cut -d " " -f 1,2,3`; echo $type $image $dim
    PNG image1.png 240x135

    If you have a directory full of PNG images, you can do this:

    > for i in `ls *.png`; do identify $i | cut -d " " -f 1,3; done
    image1.png 240x135
    image2.png 240x135

    You can perhaps imagine how easy it is to batch-convert a folder full of images using the convert tool, also part of ImageMagick.
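    A sketch of such a batch conversion, assuming ImageMagick's convert is on the PATH: the parameter expansion ${i%.png} strips the .png suffix so we can append .jpg for the output name.

```shell
# Convert every PNG in the current directory to JPEG
# (assumes ImageMagick's convert is installed)
for i in *.png; do
    [ -e "$i" ] || continue          # skip if the glob matched nothing
    # ${i%.png} strips the .png suffix; .jpg is appended for the output name
    convert "$i" "${i%.png}.jpg"
done
```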

    Bonus example where we use printf to re-format the output:

    > for i in `ls *.png`; do read file size <<< `identify $i | cut -d " " -f 1,3`; printf "%-20s %s\n" "$file" "$size"; done
    image1.png           240x135
    image2.png           240x135

    Example: Database

    Here is another example that I’ve come across in my work where scripting is very useful. Let’s assume we have a MySQL table with summary reports for a set of data records. Each report, among other things, has a unique ID (column id) and a count (column count) for the number of records in that report.

    If you execute a MySQL SELECT query from the MySQL prompt, you get a table-like structure:

    mysql> SELECT id,count FROM `report`;
    +-----+--------+
    | id  | count  |
    +-----+--------+
    |   1 | 100000 |
    |   2 |  10000 |
    |   3 |   1000 |
    +-----+--------+

    However, if you execute the same query from the command-line and re-direct the result to a file, you get a space-separated format, which is easier to handle given our newly acquired skills of filtering and processing such data.

    > mysql -e "SELECT id,count FROM \`report\`" > data.txt
    > cat data.txt
    id      count
    1       100000
    2       10000
    3       1000

    Let’s further assume we have a Web service that calculates some statistics for each report, given the report’s ID, and we want to process a large number of reports. The first thing to do is to remove the first line in the file, which contains the column headers. For the sake of simplicity, this can be done manually using a text editor.
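    The header removal can also be scripted: tail -n +2 prints everything from the second line onwards (sed '1d' does the same). The data file is recreated here so the example is self-contained:

```shell
# Recreate data.txt as produced by the mysql command above
printf 'id\tcount\n1\t100000\n2\t10000\n3\t1000\n' > data.txt

# Strip the header line, then replace the original file
tail -n +2 data.txt > data_noheader.txt
mv data_noheader.txt data.txt
```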

    Now we can read each line of the file using a somewhat familiar command:3

    > while read i; do echo $i; done < data.txt 
    1 100000
    2 10000
    3 1000

    The last part of the command, < data.txt, sends the contents of the file as input to read, causing each line in turn to be read and associated with the variable i.

    Now we can break up the data for each line:

    > while read i; do read id count <<< $i; echo "updating: $id; count: $count"; done < data.txt
    updating: 1; count: 100000
    updating: 2; count: 10000
    updating: 3; count: 1000

    Now, of course, instead of simply echoing the report ID we can make a request to the Web service using curl, something like this:

    > curl -X POST -d "{\"id\": 1}" http://localhost/api/update/ --header "Content-Type:application/json" 

    Better yet, put this command in a script:

    > cat update.sh
    curl -X POST -d "{\"id\": $2}" http://$1/api/update/ --header "Content-Type:application/json" 
    > chmod 755 update.sh 

    This script can be called like this:

    > ./update.sh localhost 1
    { status: OK }

    The assumption is that this call to the Web service running at http://localhost/api/update/ will update the statistics for the report referenced by the ID (here: 1). The “{ status: OK }” output on the console is the result from the Web service call, here showing that the request was successful.

    Putting all this together we can now do the following:

    > while read i; do read id count <<< $i; echo "updating: $id; count: $count"; ./update.sh localhost $id; done < data.txt
    updating: 1; count: 100000
    { status: OK }
    updating: 2; count: 10000
    { status: OK }
    updating: 3; count: 1000
    { status: OK }


    The result of all this is that we have automated the extraction of information from a MySQL database and performed data processing using a Web service with a few simple shell commands.

    There is so much more that can be done, but hopefully this gives an indication of the power of knowing a few simple commands, and how easy it is to become more productive using the shell and the command line.

    1. Most recently I’ve used the 13” MacBook Pro, which is a great choice for a lot of situations.

    2. It is also very useful to know a text editor that can be run in the shell, rather than needing a windowing server to display the interface. This is in particular useful in cloud infrastructures and in any kind of remote login scenario where this might be the only option. My current recommendation is Vim.

    3. For this to work properly the “IFS” variable value needs to be inspected, or best, cleared, which can be done like so: unset IFS.

  • → Climate Modeling

    What if humans continue to release greenhouse gases into the atmosphere at a rising rate? How will this affect global climate? How will the biosphere (life on Earth) respond to these changes? Actually, these questions were first asked more than 100 years ago by Swedish physicist Svante Arrhenius.

    Interesting write-up by David Herring (from 1999) about climate modeling using computer models, different model types (simulation vs. data-driven), their role and function, caveats and their potential to help us understand the future.

  • → Bangladesh Confronts Climate Change

    River deltas around the globe are particularly vulnerable to the effects of rising seas, and wealthier cities like London, Venice and New Orleans also face uncertain futures. But it is the poorest countries with the biggest populations that will be hit hardest, and none more so than Bangladesh, one of the most densely populated nations in the world. In this delta, made up of 230 major rivers and streams, 160 million people live in a place one-fifth the size of France and as flat as chapati, the bread served at almost every meal.


    Bangladesh relies almost entirely on groundwater for drinking supplies because the rivers are so polluted. The resultant pumping causes the land to settle. So as sea levels are rising, Bangladesh’s cities are sinking, increasing the risks of flooding. Poorly constructed sea walls compound the problem.

    The country’s climate scientists and politicians have come to agree that by 2050, rising sea levels will inundate some 17 percent of the land and displace about 18 million people, Dr. Rahman said.


    Rising seas are increasingly intruding into rivers, turning fresh water brackish. Even routine flooding then leaves behind salt deposits that can render land barren.


    Mr. Karim estimated that as many as 50 million Bangladeshis would flee the country by 2050 if sea levels rose as expected.

    Fascinating story in The New York Times about the devastation climate change, rising sea levels and coastal erosion have on a low-lying and poor country like Bangladesh.
