• Poor man's service monitoring using HAProxy

    There are many tools and services available to monitor your web services. Pingdom, for example, is a popular service, but you can also write your own custom shell scripts to ensure your services are up and healthy.

    A compromise between a service you can pay for and a completely custom solution is to rely on existing tools to do the job for you. For example, why not let a load balancer with support for health checks monitor your web services? HAProxy is a great load balancer with a built-in UI for displaying the status of the services you are (supposedly) load balancing between. Of course, in our case we are not load balancing between anything, because we are not sending requests to the service instances. We just want to know whether the services are up or not. Doesn’t this look great?

    At least this gives you something to look at with clear color-coded indications if something is wrong with your services.

    However, you may want to be notified if something goes down, so how do you accomplish that? Luckily, HAProxy can export this list as CSV using an HTTP endpoint (/;csv). So if you wanted to, you could just curl that endpoint, grep the result for “DOWN”, and send a push notification to your phone for each match using something like Pushover.
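    As an illustration, here is a small Python sketch of such a check. The Pushover token, user key, and stats URL are placeholders you would replace with your own:

```python
import csv
import urllib.parse
import urllib.request

def find_down_services(stats_csv):
    """Parse HAProxy's CSV stats export and return the (proxy, server)
    pairs whose status starts with DOWN."""
    # The header line starts with "# pxname,svname,..."; strip the "# ".
    lines = stats_csv.lstrip("# ").splitlines()
    return [(row["pxname"], row["svname"])
            for row in csv.DictReader(lines)
            if row.get("status", "").startswith("DOWN")]

def notify(message):
    """Send a push notification via Pushover (token/user are placeholders)."""
    data = urllib.parse.urlencode({
        "token": "YOUR_APP_TOKEN",
        "user": "YOUR_USER_KEY",
        "message": message,
    }).encode()
    urllib.request.urlopen("https://api.pushover.net/1/messages.json", data)

# Example usage, e.g. from cron (assumes the stats endpoint on port 1936):
#   stats = urllib.request.urlopen("http://localhost:1936/;csv").read().decode()
#   for proxy, server in find_down_services(stats):
#       notify(proxy + "/" + server + " is DOWN")
```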

    Here is an example HAProxy configuration for this setup (file: /etc/haproxy/haproxy.cfg):

    global
        log 127.0.0.1 local0 notice
        maxconn 100
        user haproxy
        group haproxy
    defaults
        log global
        mode http
        option httplog
        option dontlognull
        retries 3
        option redispatch
    frontend frontend
        bind *:80
        mode http
    backend tomcat-servers
        mode http
        balance roundrobin
        option httpclose
        option forwardfor
        option httpchk GET /health_check/ 
        # nodes to monitor (server name and address below are placeholders)
        server tomcat1 10.0.0.11:8080 check fall 3 rise 2 maxconn 10
        # more services here...
    backend docker-hosts
        mode http
        balance roundrobin
        option httpclose 
        option forwardfor
        option httpchk GET /health_check/ 
        # nodes to monitor (server name and address below are placeholders)
        server docker1 10.0.0.21:8080 check fall 3 rise 2 maxconn 10
        # more services here...
    listen stats *:1936
        stats enable
        stats uri /
        stats hide-version

    Of course, you want to install and configure HAProxy using Ansible. And if you get creative you can be smart about which hosts from your Ansible inventory to include to be monitored, and how to categorize the services in HAProxy to better understand what is down when something goes wrong.
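    As a sketch, the core Ansible tasks for this could look something like the following (the role layout, template name, and handler name are assumptions):

```yaml
# Illustrative tasks for an haproxy role
- name: Install HAProxy
  apt:
    name: haproxy
    state: present

- name: Deploy HAProxy configuration
  template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg
  notify: restart haproxy
```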

    The key to this particular configuration is the httpchk option, which uses a custom endpoint (/health_check/) that needs to be implemented by the services being monitored. As long as the services return a valid HTTP 200 response for this endpoint, everything will look good. The implementation of this endpoint can be as simple as immediately returning a response, or it can exercise the service in some way, for example by querying any configured databases.
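    As a minimal sketch, such an endpoint could be a tiny WSGI application like this (the path mirrors the httpchk option above; port and response body are assumptions):

```python
def health_check_app(environ, start_response):
    """Respond 200 on /health_check/ so HAProxy's httpchk passes."""
    if environ.get("PATH_INFO") == "/health_check/":
        # A real service might exercise its dependencies here,
        # e.g. run a cheap query against any configured databases.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]

# To serve it standalone:
#   from wsgiref.simple_server import make_server
#   make_server("", 8080, health_check_app).serve_forever()
```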

    What’s your poor man’s web service monitoring setup?

  • Ansible playbooks and mastering task execution

    In my attempt to promote the usage of Ansible, I have often tried to succinctly explain some of its core concepts. In particular, I’ve tried to communicate a few simple features that I personally have gotten a lot of leverage out of:

    1. Roles
    2. Tags
    3. Host limits

    Master these simple concepts and you can do powerful things with Ansible.

    Roles can create some confusion when learning Ansible. I’ve always talked about them as an abstraction for task reuse. But that’s not very clear. It’s easier to think of them as reusable tasks executing against hosts in a context. That context can provide files, templates, variables, and handlers that the role’s tasks can refer to and make use of during a play. But remember that, in the end, applying a role to a host is still all about the tasks.

    That brings us to two main ways of slicing and dicing the execution of tasks against hosts during a play. This is particularly useful when you have large inventories, and master playbooks for entire deployments (playbooks that in turn include playbooks for specific services/applications).1 Let’s say you have a master playbook site.yml and can do:

    $ ansible-playbook -i prod site.yml 

    That’s great because you can quickly deploy your entire system with a single command. It’s also easy to remember. But you rarely need to deploy your entire system.

    By mastering the usage of tags and host limits, you can precisely target the execution of specific tasks across your entire deployment. I use that all the time. Visually, this is how I think about it:

    All hosts in the play (across the bottom in the graphic above) can have tasks executed against them. By default, all applicable tasks get executed against all hosts in the play. However, within a given play, exactly which tasks get executed against which hosts can be controlled using tags and host limits.

    If all tasks in your roles have tags with a sensible naming convention, it’s easy to only run tasks in a specific role.2

    $ ansible-playbook -i prod site.yml -t "mysql-server"
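    For illustration, tagged tasks in a hypothetical mysql-server role could look like this (the module arguments are assumptions):

```yaml
# roles/mysql-server/tasks/main.yml (illustrative)
- name: Install MySQL server
  apt:
    name: mysql-server
    state: present
  tags:
    - mysql-server

- name: Update MySQL configuration
  template:
    src: my.cnf.j2
    dest: /etc/mysql/my.cnf
  notify: restart mysql
  tags:
    - mysql-server
    - mysql-server-config
```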

    This play might only be applicable to specific servers anyway. You can easily limit the hosts to be considered with limits, like so:

    $ ansible-playbook -i prod site.yml -t "mysql-server" -l "mysql-servers"

    Alternatively, consider this example, re-deploying all Tomcat applications:

    $ ansible-playbook -i prod site.yml -t "tomcat-war" -l "tomcat-servers"

    Now you have limited the play to a specific subset of hosts, and a subset of tasks. This gives you a lot of freedom and control. Below are some more examples that I use all the time.

    Pull all Docker images across the deployment infrastructure:

    $ ansible-playbook -i prod site.yml -t "docker-images-pull" -l "docker-hosts"

    Deploy all Docker containers:

    $ ansible-playbook -i prod site.yml -t "docker-containers" -l "docker-hosts"

    Re-deploy a specific application on Tomcat:

    $ ansible-playbook -i prod site.yml -t "tomcat-war" -l "front-end-applications"

    Re-configure your front-end application load balancer:

    $ ansible-playbook -i prod site.yml -t "haproxy" -l "front-end-applications"

    Spend a few moments learning these concepts. I can guarantee you will find them useful.

    1. As recommended in Ansible’s best practices documentation.

    2. I like to tag all tasks in a role with the name of the role, and then provide additional tags with the role name as a prefix. For example, I might give the following tags to a task in the role mysql-server that updates the configuration: mysql-server, mysql-server-config.

  • Programming and Consistency

    In software development there is a tension between obsessing over a single line of code, and shipping. Focus too much on perfect code and you will never ship a product, but shipping without caring about the underlying code and its structure introduces the risk of poor product quality (bugs, inefficiencies) and builds up long-term technical debt. This tension is not specific to software development; it applies equally well to other kinds of creative work, like writing a paper or a book.

    In the end, shipping becomes most important, because otherwise the value of all your efforts is to some extent squandered.1

    Perfect is the enemy of good.

    Not every line or block of code has to be perfect, but there is great value in consistency. I believe consistency is a highly valuable trait in a programmer. What do I mean by consistency? Consistency can mean many different things, ranging from coding style to the choice of libraries and architectural designs.

    Here are a few examples of poor stylistic (syntactical) decisions:

    • Sometimes indenting your code with tabs and sometimes with spaces, or, if using spaces, sometimes using 2 spaces and sometimes 4
    • Sometimes writing if(var==2){ and sometimes if (var == 2) {, or any combination in between
    • If semicolons are optional at the end of statements, sometimes including them and sometimes not
    • If braces are optional for single-statement blocks, sometimes using them and sometimes not2
    • Sometimes horizontally aligning variables and sometimes not3

    You get the point. But why does it matter, you say? It’s not going to affect the running code. Am I just being pedantic? Perhaps, but I strongly believe in syntactical and stylistic consistency. It’s important for readability, which is important for maintainability, which in turn is important for evolving and improving the code in the long run. Readability is more than just avoiding writing “too smart” code; it’s also about style.

    Examples of consistency in library choices (unless there are specific needs) include avoiding using two different libraries for handling JSON in the same program, or using two different libraries for making HTTP requests. Sometimes a language, or framework, comes with built-in capabilities to make HTTP requests. If the built-in library is used in some situations but an external library is used in others, it requires programmers to be familiar with the quirks of both libraries. Be pragmatic, but think twice before adding a second way to make HTTP requests, or for anything else.

    Examples of architectural decisions can be as simple as having consistent API designs, with common formats and structures for handling errors and providing error messages.

    To me consistency in programming and software development is just common sense, but I’m surprised how many programmers don’t seem to care about it.

    To be consistent, you have to be detail-oriented. It’s impossible to be consistent if you pay no attention to details.

    The kind of detail-orientation I’ve given examples of above is not necessarily a requirement in deciding whether a person can, for example, design scalable and effective software architectures. However, I do believe attention to detail is somewhat of a canary in the coal mine for determining whether a developer will be able to effectively develop, organize and maintain large-scale code-bases and architectures.

    The need for attention to detail applies not only to the code itself, but also to the documentation, the issue tracking, the tests, and the deployment and orchestration scripts.

    Douglas Rushkoff writes in The New York Times about grammar and clear thinking:

    [A]n employee who can write properly is far more valuable and promotable than one whose ambiguous text is likely to create confusion, legal liability and embarrassment. Moreover, a thinking citizen deserves the basic skills required to make sense through language, and to parse the sense and nonsense of others.

    Writing has to be clear, otherwise it causes confusion. This also applies to software development and architectural design, choice of tools and libraries, and, yes, the style of a single line of code.

    1. However, there is also inherent value in the work itself because you always learn something along the way.

    2. I’m looking at you Java.

    3. Actually, never align your variables horizontally.

  • Ansible and Amazon EC2

    Ansible is a great software orchestration tool that I’ve enjoyed using for a while. A core concept in Ansible is the inventory: a description of all the hosts available for deployments. The inventory is not only a list of hosts, but provides ways to group hosts into logical named units. The names of the groups can describe what role the hosts play in your architecture, or their geographical location or something else. A simple inventory can look like this:

    # file: prod
    [mysql-servers]
    mysql1

    [web-applications]
    web1
    web2

    This inventory has one host in the mysql-servers group, and two hosts in the web-applications group. This communicates that there is one host for running a MySQL server and two hosts for running a front-end web application.

    Now we can write a simple playbook to configure the MySQL hosts:

    # file: mysql-servers.yml
    - hosts: mysql-servers
      roles:
        - mysql-server

    Notice how we reference the logical inventory group mysql-servers in the playbook. We can then configure the MySQL hosts by running the Ansible playbook:

    $ ansible-playbook -i prod mysql-servers.yml 

    This is fairly straightforward, and the naming makes everything easy to understand. Importantly, the playbook is generic and can be reused for a different deployment environment where we also need MySQL servers, as long as the new inventory file contains a similar organization and grouping as the first one, using the same names.

    Amazon EC2

    Amazon Elastic Compute Cloud (EC2) is a popular platform for deploying applications and services. When managing a larger infrastructure, in particular a dynamic one, it can get difficult to manually manage the inventory file and keep track of all the different hosts. Ansible supports dynamic inventories that help with exactly this problem. In short, instead of referencing a simple text inventory file when running playbooks, you can give Ansible a script as the inventory. The task of this script is to figure out the current state of the infrastructure and return the inventory in a particular format that Ansible understands (a JSON object).

    Ansible provides some example scripts for common platforms, including Amazon’s EC2. This script (ec2.py) returns all your EC2 hosts using different kinds of grouping concepts, including EC2 regions, instance types, and EC2 instance tags (key-value pairs). If I create an EC2 instance with the tags Name=mysql1 and Group=mysql-servers, the Ansible EC2 inventory script would return something like this:

    $ ec2.py --list
    {
        "us-east-1": [...],
        "tag_Name_mysql1": [...],
        "type_m3_medium": [...],
        "ec2": [...],
        "tag_Group_mysql_servers": [...]
    }

    Now I can reference all my instances in a particular region, or by instance type, or by my tags. But to configure my MySQL servers on EC2 using my very simple and supposedly reusable playbook from above, I need to change it to this:

    # file: mysql-servers.yml
    - hosts: tag_Group_mysql_servers
      roles:
        - role: mysql-server

    However, I don’t want to have a separate set of playbooks for different platforms. There are several options when it comes to resolving this issue. One solution is to define a text-based inventory that does the mapping for us, something like this:

    # file: prod-mapping
    [mysql-servers:children]
    tag_Group_mysql_servers

    Now you can reference mysql-servers in your playbooks and it will include all the hosts in the tag_Group_mysql_servers group. However, this has to be done for all groups, so you have to do some work in managing and maintaining the mapping.

    The solution I have taken so far is to have a very simple script in front of the bundled ec2.py inventory script that allows us to augment the grouping of hosts. I’ve called this script ec2-augment.py and it can be configured to create additional host groups based on the instance tags; for example, for the group tag_Group_mysql_servers it adds the same group of hosts under the name mysql-servers. So, if my script is used as the inventory script, it would return the following:

    $ ec2-augment.py --list
    {
        "us-east-1": [...],
        "tag_Name_mysql1": [...],
        "type_m3_medium": [...],
        "ec2": [...],
        "mysql-servers": [...],
        "tag_Group_mysql_servers": [...]
    }
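    The augmentation itself can be sketched as a thin wrapper around ec2.py (the path to ec2.py and the tag prefix are assumptions):

```python
import json

TAG_PREFIX = "tag_Group_"

def augment(inventory):
    """For every tag_Group_* group, add an alias group with the prefix
    stripped and underscores turned into dashes (mysql-servers, etc.)."""
    aliases = {}
    for group, hosts in inventory.items():
        if group.startswith(TAG_PREFIX):
            aliases[group[len(TAG_PREFIX):].replace("_", "-")] = hosts
    inventory.update(aliases)
    return inventory

# As an inventory script it would delegate to the bundled ec2.py:
#   import subprocess
#   raw = subprocess.check_output(["./ec2.py", "--list"])
#   print(json.dumps(augment(json.loads(raw.decode()))))
```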

    This will work as long as I tag all my EC2 instances with the key Group and the value of the group name I want. For example, Group=mysql-servers, or Group=web-applications. This is easy to do consistently if you create all EC2 instances using the Ansible ec2 module.
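    For example, a task using the ec2 module to launch a tagged instance might look something like this (the key name, AMI, and instance type are placeholders):

```yaml
# Illustrative: launch an instance tagged for the mysql-servers group
- name: Launch a MySQL server instance
  ec2:
    key_name: mykey
    instance_type: m3.medium
    image: ami-123456
    region: us-east-1
    instance_tags:
      Name: mysql1
      Group: mysql-servers
    wait: yes
```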

    I can now run my existing playbooks against EC2:

    $ ansible-playbook -i ec2-augment.py mysql-servers.yml 

    I’ve published the current version of the script as a Gist.

  • Software Technology (2015)

    The past 12 months or so have been interesting for me from a software technology perspective. I’ve been trying to balance learning and introducing new technologies into the software stack for production systems I work on, while at the same time staying conservative enough to not introduce too many complexities and to minimize day-to-day operational issues.

    As documented in Scaling Pinterest, via Marco.org, choices should often fall on well-known, liked, battle-tested, and performant tools. There is a good case to be made for adopting technologies that have had at least 3-5 years to mature and self-select in the marketplace, especially if they are to be used in production systems with requirements on stability and uptime.

    But it’s not only technology choices that affect a system’s runtime that matter. Choices about technology and tools for infrastructure management, as well as development environments and tools, also play an important role. Below is a highlight of some of the technologies I’ve used in the past few months that I believe I will be using on a regular basis over the next few years for a range of products and solutions.


    Ansible

    Ansible is a powerful software orchestration and management platform. Ansible can be used to issue commands on machines (or hosts) over SSH. The hosts can be local or remote, physical or virtual machines, and can be accessed via username and password, or more commonly using a PKI. Commands are issued via tasks, and can be used to configure hosts for specific purposes, or to directly run services or applications. Ansible provides a large library of modules for performing common tasks on hosts. The true power of Ansible shows when you need to perform repeatable tasks on a large set of hosts.

    There are relatively few core concepts that need to be understood in order to be proficient with Ansible.1 Once this is accomplished, a great deal of power is suddenly at your fingertips.

    I use Ansible to set up all new cloud VMs that I create with a consistent configuration. Using the powerful Ansible roles concept, I can easily configure machines for particular purposes, for example, MySQL servers, Redis instances, Tomcat containers, and lots of other things. I’ve used Ansible for complex deployments of service-oriented applications in production. I’ve come to rely on Ansible on a daily basis, and it’s actually fun to use.

    One of the simplest tasks is to issue single “ad-hoc” commands against a host or a set of hosts. This can be extremely useful, and often powerful. I often use it to restart a service (Tomcat) or check some state (running processes). Just to show an example, this command stops all Tomcat servers in the specified inventory of hosts:

    $ ansible -i dev tomcat-servers -s -m "service" -a "name=tomcat7 state=stopped"

    I once had a rogue Java process running on one of 20+ VMs that I had launched. At least, that’s what I suspected. That’s a lot of VMs to log into to investigate. So, instead I simply issued the following Ansible command:

    $ ansible -i dev all -m "shell" -a "ps aux | grep java"

    The output quickly told me where the process was running so I could stop it.

    A best practice suggestion is to define your entire infrastructure in a master site.yml Ansible playbook. This is not only useful for deploying everything in a new environment, but this general-purpose playbook can also be used to perform very specific actions using the powerful concepts of tags and host limits.2 A command like this could configure and deploy all Tomcat applications:

    $ ansible-playbook -i prod site.yml --limit tomcat-servers

    Combining host limits with tags can narrow down the tasks executed even further, for example to only re-deploy the Tomcat applications (without necessarily configuring everything). That could look something like this:

    $ ansible-playbook -i prod site.yml --limit tomcat-servers -t "tomcat-war"

    These simple examples show how you can organize your Ansible playbooks to be able to configure an entirely new environment, and deploy your entire infrastructure, but also perform cross-cuts to execute very specific tasks, against a limited set of hosts.

    Software will always need to be deployed on some hosts that need to be configured, and other infrastructure (for example, databases) needs to be put in place. Ansible is a great and elegant tool for all these tasks, so I expect to be using Ansible for a long time.3


    Docker

    Docker is a virtualization project and set of tools that leverage capabilities of the Linux kernel to run processes in independent containers. Containers can roughly be seen as lightweight virtual machines, providing process isolation (from the host machine) and resource controls (memory, CPU, I/O, network).

    Process isolation and sandboxing is certainly a core feature of containers, but from a practical day-to-day perspective, far more important (to me) are two tangible benefits of containerizing your applications:

    1. Dependency management
    2. Application interface

    By dependency management I mean in the same way you would use Python virtual environments to manage and separate dependencies for different applications. Often a developer will have all application dependencies correctly configured in their environment, but will forget to document or update the setup scripts that install the dependencies in a fresh environment. This has the obvious downsides of breaking deployments and delaying integration. If applications are shared, tested and delivered as containers, these issues can often be avoided.

    Providing a consistent application interface to start and stop applications can greatly simplify deployments and application lifecycle management (e.g. restarts). I’ve been responsible for complex deployments of service-oriented applications where different components (services) are written in completely different languages and frameworks. As such, each component has a different process for managing its lifecycle: Grails applications running on Tomcat, Python web services using WSGI, Java command-line interfaces, Python scripts, shell scripts etc. No single application interface is particularly difficult by itself, but taken all together it gets somewhat unwieldy. How do you start and stop 8 different Java CLI programs running on different VMs? How do you shut down only a subset of the processes on a subset of VMs? How do you easily restart two Python web services and a Java process that are coordinating their activities?

    With containers and Docker, the launching of an application is described in the relevant Dockerfile. Once the Docker images are created (and published on Docker Hub), the lifecycle management of the applications is identical regardless of the underlying application or service implementation details (including choice of programming languages or frameworks).
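    As a sketch, a Dockerfile for one of these components could be as simple as this (the file names are assumptions; the CMD line hides the language-specific launch details behind the common container interface):

```dockerfile
FROM python:3
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "service.py"]
```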

    Now everything is a container that can be started, restarted and stopped. The simplicity of that common application interface is a huge benefit. And how do you manage the containers across your deployment infrastructure? With Ansible, of course.


    Redis

    Redis is a multi-purpose in-memory data structure store. Data can be persisted to disk and Redis can be used as a database, but I’ve never used it like that. I’ve used it as a key-value cache, and as a message broker. Redis is great as a simple cache, but one of the benefits of Redis is that it is general-purpose, allowing the same technology to be used in different scenarios. If Redis can solve multiple needs for a project, then I can reduce the number of technologies that I incorporate into the overall software stack.

    As I mentioned, I’ve used Redis as part of a message system, more specifically for task processing using queues. A queue is simply an ordered list with first-in-first-out (FIFO) characteristics. Push tasks into one side of the queue, and take tasks out from the other side. JSON makes it easy to serialize and store complex structured task descriptions in the queue.
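    Sketched with the redis-py client (the queue name and client construction are assumptions), pushing JSON tasks on one side and blocking on the other side gives FIFO processing:

```python
import json

def enqueue(r, queue, task):
    """Serialize the task as JSON and push it onto the left of the list."""
    r.lpush(queue, json.dumps(task))

def dequeue(r, queue, timeout=0):
    """Block until a task is available at the right end (FIFO order)."""
    item = r.brpop(queue, timeout)  # returns a (queue, value) pair, or None
    return json.loads(item[1]) if item else None

# Usage with a real connection (assumes a local Redis server):
#   import redis
#   r = redis.Redis()
#   enqueue(r, "tasks", {"action": "resize", "id": 42})
#   task = dequeue(r, "tasks")
```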

    There are also other uses for Redis that would be interesting to explore in the future, including sorted sets for statistics applications. For example, consider the following commands to track the occurrence of terms (in some data source):

    > ZINCRBY myzset 1 "my-word"
    "1"
    > ZRANGE myzset 0 -1 WITHSCORES
    1) "my-word"
    2) "1"
    > ZINCRBY myzset 2 "my-word"
    "3"
    > ZRANGE myzset 0 -1 WITHSCORES
    1) "my-word"
    2) "3"

    These statistics can be stored ephemerally in Redis, and then queried and later stored in an OLAP system. I’m sure there are other interesting situations where Redis would be a good fit.


    OpenStack

    I’ve created VMs for various needs for a while, but this year I really started to make heavy use of cloud infrastructures, mainly centered around OpenStack. I don’t know the ins and outs of OpenStack, but I’ve become familiar with the OpenStack Horizon Dashboard interface, which is serviceable. I’m a user of cloud infrastructure, but it’s still useful to know some of the basics of how OpenStack works, and in particular how to use the compute capabilities provided by its Nova component. For instance, there is a command-line interface to Nova for managing compute instances (VMs) that I’ve found useful when setting up infrastructures.

    As a cloud user, the important part is how to manage and organize a large set of resources, and for that Ansible has been the key for me. Next I want to explore the specifics of Amazon’s EC2, but again, the details are not so important because I will be using Ansible there too.


    Python

    I’ve played with Python for many years, and I really like it. To me, it’s an elegant language and it’s easy to get things done with it. There is no shortage of 3rd party modules (libraries) and frameworks, and documentation is usually pretty good. The sooner you move to Python 3 the better, probably (because there will be no Python 2.8).

    Of course, make sure you use virtual environments to keep your dependencies organized. In fact, that’s how I install the Ansible CLI, and the OpenStack CLI set of tools like the Nova client.

    What I’ve used Python for a lot lately is writing simple system management and monitoring scripts. For example, to ensure that all deployed web services are running and responding to simple HTTP requests. Another example is to ensure that task queues in Redis are not growing faster than they can be processed for an extended period of time. Yet another example is a script to organize component-specific documentation files before building a system-wide documentation website. Python makes it easy to write well-organized and maintainable code.4 Furthermore, the script dependencies are documented in requirements files so that they can easily be installed in a new virtual environment, or in a Docker image, which is of course how the Python scripts are deployed.
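    The web-service check can be sketched in a few lines of standard-library Python (the URLs to check are assumptions):

```python
import urllib.request

def is_up(url, timeout=5):
    """Return True if the URL answers an HTTP GET with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# Example, with a hypothetical list of deployed services to verify:
#   for url in ["http://service1/health_check/", "http://service2/health_check/"]:
#       print(url, "OK" if is_up(url) else "DOWN")
```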

    This brings things around: I’ve really enjoyed writing containerized Python 3 scripts for monitoring processing queues stored in Redis (and performing other tasks) that have been deployed on an OpenStack cloud using Ansible.


    Git is of course great for source code management and versioning. I wanted to mention Jekyll again and what a great choice it is for information-centric websites. Since writing Deploying Jekyll with Git, I’ve used Jekyll for organizing the documentation for a large-scale application, which has worked out really well. Webpages in Jekyll are written in Markdown, which is also how I write this blog. Writing in Markdown keeps your content in a simple, non-proprietary format, while making it easy to publish to the Web.

    Mac OS X is an excellent platform for developers. It’s a modern operating system with great application support, while also providing access to all the greatest Linux-based tools from the command-line, often installable via Homebrew. I write a lot of code in TextMate 2, use LaunchBar 6 for clipboard management, and TextExpander 5 for improved typing efficiency.

    1. Some of these concepts include inventories and hosts, playbooks and tasks, variables, roles and modules.

    2. Of course the site.yml playbook is only a set of includes of more specific playbooks, which can also be run separately.

    3. There are of course other orchestration platforms besides Ansible, but I have not found them as approachable or elegant.

    4. Although of course it requires some diligence on the part of the programmer.
