Enter, The NATS

Reading Time: 11 minutes

Deliver a message from point A to point B as fast as possible. This has been a requirement for humankind for a long time. History tells us there are records from around 900 B.C. of a Royal Post system in China, used to carry written documents by an intermediary from one person or place to another. As you can imagine, since then the way we deliver a message has changed a lot, not only the envelope but also the means of transmission.

Nowadays, in the era of communication between machines, there is still the need to deliver messages, and as fast as possible too. In some cases you need speed, in other cases you prefer to be sure about the reception of the delivered message besides the speed, and you may have many other needs. The systems that deliver messages were simply called queue systems just a couple of decades ago, but today they are frequently called message brokers.

There are several message brokers on the market today. In this post, you will read about NATS. Engage!

nats.io logo

NATS.io is a simple, secure and high-performance open source messaging system for cloud-native applications, IoT messaging, and microservices architectures.

nats.io

That is the definition of NATS from the official web page (nats.io), but it is not the definition I would choose. I would say that NATS is a simple and lightweight message broker.

To give more context about this message broker: NATS is an open-source project licensed under Apache 2.0, it is written in Go, and client libraries are available for approximately a dozen languages, many of which are supported by the Synadia team. Here’s the full list of NATS clients. NATS was born in 2011, and its latest stable version at the time of publishing this post, 2.1.7, was released on May 14th, 2020.

NATS as an incubator project in the Cloud Native Computing Foundation

Something that has become increasingly important for a system during the last years is being cloud-native. This post will not go into that concept, but let’s spend some time explaining what the Cloud Native Computing Foundation (CNCF) is, since NATS is an incubator project in this foundation.

Cloud Native Computing Foundation logo. License.

The fact that a certain project is under this foundation could give you some guarantees that the technology is not something that will just be dropped or left unmaintained at some point. Keeping a project in the CNCF requires a lot of effort from many maintainers around the world, who are available to poke when you face problems in production.

Follow the white rabbit

As mentioned before, NATS is one of the message brokers on the market nowadays, and here comes the main (and difficult) question to be answered: which message broker should I use for my project? The answer is, as often happens in computer science, it depends. In this case, the answer becomes more accurate once we put your use case on the table. You have to follow the white rabbit: your use case. The use case you are handling will determine the message broker to be used. Because we are discussing NATS, let’s read some of the use cases where NATS makes sense (a minimal client sketch follows the list):

  • You need to send messages that must be read by one single receiver.
  • You will move many small messages across the system.
  • Fire and forget, meaning that the message is introduced in the queue and “somebody” will take it at some point, not caring at all if, in fact, it was read by any service.
  • There is no need for message persistence.
  • No need for real-time delivery.
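To make the single-receiver and fire-and-forget points more concrete, here is a minimal sketch using the official Go client (nats.go); the subject orders and the queue group workers are just illustrative names, not something NATS requires.

package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to a local NATS server, e.g. the Docker container started below.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Queue subscription: each message on "orders" is delivered to only one
	// member of the "workers" queue group, i.e. a single receiver.
	if _, err := nc.QueueSubscribe("orders", "workers", func(m *nats.Msg) {
		log.Printf("received: %s", string(m.Data))
	}); err != nil {
		log.Fatal(err)
	}

	// Fire and forget: Publish returns once the message is handed to the client
	// library; there is no persistence and no delivery acknowledgement.
	if err := nc.Publish("orders", []byte("a small message")); err != nil {
		log.Fatal(err)
	}

	// Flush pushes the buffered message to the server, and the short sleep just
	// gives the async handler a moment to run in this toy example.
	if err := nc.Flush(); err != nil {
		log.Fatal(err)
	}
	time.Sleep(500 * time.Millisecond)
}

Subscribers in the same queue group share the work, so each message is handled by exactly one of them, which is how NATS covers the single-receiver case without persistence.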

Sparring program

Enough talking! Let’s put our hands on the keyboard and play a bit with this.

In this section, you will find the follow-up and commands to run the NATS Server with Docker and Kubernetes. Being able to run NATS directly in Docker is worth knowing for when you need some “raw” setup, for example during the implementation and testing phase of your project. Running it on Kubernetes is pretty much a must these days, even more so when we talk about a cloud-native technology, and definitely when you move to a more sophisticated testing environment for your project or go into production.

The Dojo Sparring Program
Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. By PhanthomZtryker 

Let’s start with something simple, run a single Docker container for NATS.

$ docker run -d --name nats-main -p 4222:4222 -p 6222:6222 -p 8222:8222 nats

Immediately, the container is created and ready to listen for clients to connect on port 4222:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f9e55d4bc74e nats "/nats-server --conf…" 5 seconds ago Up 3 seconds 0.0.0.0:4222->4222/tcp, 0.0.0.0:6222->6222/tcp, 0.0.0.0:8222->8222/tcp nats-main

Some usage statistics:

$ docker stats
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
f9e55d4bc74e nats-main 0.06% 3.422MiB / 11.4GiB 0.03% 10.7kB / 0B 4.1kB / 0B 10
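Port 8222 is the NATS HTTP monitoring port, so a quick way to check that the server is alive is to hit the /varz endpoint, which returns general server information as JSON:

$ curl -s http://localhost:8222/varz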

Just these commands will help you a lot to speed up your local development or to rapidly test this technology. From here, and in many cases it can be really helpful for some tests, you can very easily create a docker-compose file (see the sketch below). However, when you go to staging or production environments, deploying single containers can bring some complexity when you try to connect the rest of the pieces around NATS and also to manage NATS itself as a service. This is where Kubernetes comes into the light.
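For reference, a docker-compose.yml equivalent to the docker run command above could look like this minimal sketch (the service name nats-main is just carried over from the container name used before):

version: "3"
services:
  nats-main:
    image: nats
    ports:
      - "4222:4222"   # client connections
      - "6222:6222"   # cluster routes
      - "8222:8222"   # HTTP monitoring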

In the official documentation, there is a lot of information about how to deploy NATS via Helm charts or using the Kubernetes Operator. In order to get more visibility into how NATS is deployed on Kubernetes and to know exactly what we are deploying, this article takes a more manual route. Let’s deploy resource by resource. The goal: to have 3 pods running behind an exposed service.

We will manually deploy 3 kinds of Kubernetes resources, based on the official simple NATS setup:

  • ConfigMap, with the small amount of configuration needed
  • StatefulSet, to define the pod(s)
  • Service, with all the ports exposed

In the official Kubernetes resources, NATS uses the StatefulSet kind instead of Deployment to define the pods. See this simple Stack Overflow explanation about the difference between the two.

configmap.yaml. It’s important to notice that the NATS configuration resides in the nats.conf file; as in any other conventional application, it is mounted as a volume by all the deployed pods and used to run the nats-server binary. In this configmap.yaml we can see that it overrides the default values for the pid_file and http (port for monitoring purposes) properties. Also, in the routes section you can see entries like nats://nats-0.nats.default.svc:6222, where default is the name of the Kubernetes namespace (see the StatefulSet below for reference). This routes list could be replaced by something more sophisticated, such as an environment variable whose value is injected from outside. We could wonder: “What happens if I have to scale up the number of NATS pods? Do I have to edit this ConfigMap as well?” It’s useful to have the initial list of NATS nodes from the beginning, to avoid issues during the initial connection. From there, scaling up or down is not an issue and NATS is ready for it.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nats-config
data:
  nats.conf: |
    pid_file: "/var/run/nats/nats.pid"
    http: 8222
    cluster {
      port: 6222
      routes [
        nats://nats-0.nats.default.svc:6222
        nats://nats-1.nats.default.svc:6222
        nats://nats-2.nats.default.svc:6222
      ]
      cluster_advertise: $CLUSTER_ADVERTISE
      connect_retries: 30
    }

service.yaml. Here we can see that the ports are defined and that this is a ClusterIP service, and in fact a headless one (clusterIP: None), which means it gets no cluster IP of its own, is only reachable from within the cluster, and its DNS name resolves directly to the pod addresses (this is what route entries like nats-0.nats.default.svc rely on). This approach makes sense in a production environment because most of the time NATS will not be exposed to the outside directly but will be reached through some kind of exposed gateway that provides, for example, authentication. It is also important to mention the port named metrics (7777) in this Kubernetes service, which is the one used to expose the NATS metrics to be consumed, for example, by Prometheus via the prometheus-nats-exporter.

Last but not least, the StatefulSet (next section) comes with a sidecar container (and the required annotations), which is needed to export the NATS metrics in Prometheus format for your observability setup. Some sugar that will make your life easier in production.

apiVersion: v1
kind: Service
metadata:
  name: nats
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "7777"
    prometheus.io/scrape: "true"
  labels:
    app: nats
spec:
  selector:
    app: nats
  clusterIP: None
  ports:
  - name: client
    port: 4222
  - name: cluster
    port: 6222
  - name: monitor
    port: 8222
  - name: metrics
    port: 7777
  - name: leafnodes
    port: 7422
  - name: gateways
    port: 7522

statefulset.yaml. One important thing about this resource is that some of the container ports also set a hostPort in the YAML file. Why is this important? Because the StatefulSet defines 3 replicas (3 pods), and it is not possible to schedule more than one pod claiming the same hostPort on the same node. That’s why host ports are usually not fixed in a Deployment: you let the cluster handle the internal communication and the distribution of the pods across the nodes. In production you will have more than 1 node. Actually, you will have more than 3 nodes, so you will not face the scheduling error that you can see if you deploy this on minikube:

Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.

It takes around 20 seconds for this deployment to spin up. For ramping up the whole deployment that is maybe not too much; however, I would expect less time, not only because the Docker image is only ~10 MB (Alpine-based) but also because this is a cloud-native application. Nevertheless, I’d say the spin-up time that really matters is the one during the scaling process.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
  labels:
    app: nats
spec:
  selector:
    matchLabels:
      app: nats
  replicas: 3
  serviceName: "nats"
  template:
    metadata:
      labels:
        app: nats
    spec:
      # Common volumes for the containers
      volumes:
      - name: config-volume
        configMap:
          name: nats-config
      - name: pid
        emptyDir: {}

      # Required to be able to HUP signal and apply config reload
      # to the server without restarting the pod.
      shareProcessNamespace: true

      #################
      #               #
      #  NATS Server  #
      #               #
      #################
      terminationGracePeriodSeconds: 60
      containers:
      - name: nats
        image: nats:2.1.7-alpine3.11
        ports:
        - containerPort: 4222
          name: client
          hostPort: 4222
        - containerPort: 7422
          name: leafnodes
          hostPort: 7422
        - containerPort: 6222
          name: cluster
        - containerPort: 8222
          name: monitor
        - containerPort: 7777
          name: metrics
        command:
         - "nats-server"
         - "--config"
         - "/etc/nats-config/nats.conf"

        # Required to be able to define an environment variable
        # that refers to other environment variables.  This env var
        # is later used as part of the configuration file.
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CLUSTER_ADVERTISE
          value: $(POD_NAME).nats.$(POD_NAMESPACE).svc
        volumeMounts:
          - name: config-volume
            mountPath: /etc/nats-config
          - name: pid
            mountPath: /var/run/nats

        # Liveness/Readiness probes against the monitoring
        #
        livenessProbe:
          httpGet:
            path: /
            port: 8222
          initialDelaySeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: 8222
          initialDelaySeconds: 10
          timeoutSeconds: 5

        # Gracefully stop NATS Server on pod deletion or image upgrade.
        #
        lifecycle:
          preStop:
            exec:
              # Using the alpine based NATS image, we add an extra sleep that is
              # the same amount as the terminationGracePeriodSeconds to allow
              # the NATS Server to gracefully terminate the client connections.
              #
              command: ["/bin/sh", "-c", "/nats-server -sl=ldm=/var/run/nats/nats.pid && /bin/sleep 60"]

      ##############################
      #                            #
      #  NATS Prometheus Exporter  #
      #                            #
      ##############################
      - name: metrics
        image: synadia/prometheus-nats-exporter:0.6.2
        args:
        - -connz
        - -routez
        - -subz
        - -varz
        - -prefix=nats
        - -use_internal_server_id
        - -DV
        - http://localhost:8222/
        ports:
        - containerPort: 7777
          name: metrics

To deploy NATS on the Kubernetes cluster, just run the following:

$ kubectl apply -f configmap.yaml
$ kubectl apply -f statefulset.yaml
$ kubectl apply -f service.yaml
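Once the resources are applied, a few standard kubectl commands are enough to verify that the three pods are running behind the headless service (output omitted here):

$ kubectl rollout status statefulset nats
$ kubectl get pods -l app=nats
$ kubectl get svc nats
$ kubectl logs nats-0 -c nats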

To finish this chapter, it is necessary to remark that, when deploying any service on Kubernetes, it’s important to start from the official artifacts delivered by the entity behind the technology. Of course, each company can have different needs that drive you to tune these official resources; that’s why in this article we have inspected the Kubernetes resources one by one.


Bullet time

As has been mentioned in previous sections of this article, going to production means, in most cases, deploying NATS in a Kubernetes cluster. Of course, there are other options not related to the container world, but here we will stick with Kubernetes and the NATS Server.

Cameras ready to record bullet time effect
An Attempt At Bullet Time

In a production environment it is recommended to deploy at least 3 instances (pods) of the NATS Server to have high availability. Also, from some conversations in the Slack channel, it is recommended to assign 2 CPUs per NATS pod via the resources.requests property. By default, the Helm chart does not come with affinity or anti-affinity policies, which means that to improve high availability by having each of the pods on a separate Kubernetes node, you will have to configure the values.yaml or the statefulset.yaml files yourself (a sketch follows below). Affinity and anti-affinity rules are something important that you will have to think about, not only for NATS but for any other production system you deploy in the Kubernetes world.
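As a sketch of what that could look like in the statefulset.yaml shown earlier (only the 2-CPU request comes from the Slack recommendation; the memory value is just illustrative), you could add fragments along these lines:

# In the pod template spec: never schedule two NATS pods on the same node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: nats
      topologyKey: kubernetes.io/hostname

# In the nats container: reserve the resources up front.
resources:
  requests:
    cpu: "2"
    memory: 1Gi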

Something important to notice as well, when we go into production, is that the Helm chart does not come with a Horizontal Pod Autoscaler (HPA). At the time of writing this article, no official documentation was found that explains why the HPA resource is not officially provided in the Helm chart. However, you can write your own HPA if it’s needed.
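A minimal sketch of such an HPA, assuming CPU-based scaling of the nats StatefulSet (the thresholds and replica counts here are arbitrary, not a NATS recommendation):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nats
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: nats
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70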

Observability is the last (or the first) stone to place before putting NATS into production. The people behind NATS are aware of this, and both a Prometheus exporter and Grafana dashboards (see also here and here) are provided by the community. The Prometheus exporter comes as a sidecar container, as was mentioned in the previous chapter. Prometheus plus Grafana, both part of the CNCF too, are the de facto standard these days, so it makes sense that NATS and the community put effort in this direction. A proof of this is that NATS invested time in creating a webinar about monitoring NATS with Prometheus and Grafana.

Will I be able to fly?

Performance is something to really take care of. The NATS community has published nats-bench to measure the performance of your NATS setup. With this tool, you will get a performance baseline that will help you know when you have to scale up or down (see the example run below).
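As an illustration of how a run could look (check the tool’s help output for the exact flags in your version; the numbers here are arbitrary), the following publishes 100,000 messages of 128 bytes with one publisher and one subscriber:

$ nats-bench -s nats://localhost:4222 -np 1 -ns 1 -n 100000 -ms 128 benchmark.subject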

In addition to the previous tool, the NATS community provides another tool to measure a very important parameter for any message broker or distributed system: latency. Using the latency-test available on GitHub, you will get a good estimation of the msgs/sec, bandwidth/sec, and the minimum, median, and maximum latencies that your NATS setup can deliver for your use case.

Something you should keep in mind when using NATS as a message broker to deliver your messages is that NATS performs better with many small messages per unit of time than with fewer, bigger messages. From my experience with NATS, you will have to put the effort into the services that write and read the information on the NATS queues. Striking a balance between the size of the messages and the number of messages will bring better performance to your system as a whole. NATS will perform really well with these few tips.

The last part of this section is dedicated to some numbers from a NATS deployment working in production for a recent use case that I’m aware of. This use case involves 3 NATS pods with 2 queues. Below, you will find Table 1 with some information about the messages sent through the NATS system. Table 2 displays information about the CPU and memory consumption from the StatefulSet point of view. Finally, Table 3 shows some results obtained from the NATS Server Grafana Dashboard.

AVG Number Messages In | AVG Message Size | Max Message Size | Min Message Size
249                    | 7.6 KB           | 280 KB           | 0.139 KB
Table 1: Messages information

The AVG Number Messages In is calculated using the Prometheus query rate(nats_varz_in_msgs[10m]).
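The exporter sidecar configured earlier (with -varz and -prefix=nats) exposes similar gauges for CPU and memory, so queries along the following lines could back Tables 2 and 3; the exact metric names are worth double-checking against your own /metrics output:

avg(nats_varz_cpu)
avg(nats_varz_mem)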

CPU        | Memory
0.18 Cores | 1.3 GB
Table 2: NATS Statefulset Consumption

AVG Server CPU Usage | AVG Memory Usage
6%                   | 200 MB
Table 3: NATS CPU and Memory Consumption

Let It All Go, Free Your Mind

NATS is an open-source project, and that means there is an open-source community supporting it and collaborating on it, a community you can collaborate with as well.

The principal places where the community gathers are GitHub and the Slack channel. On GitHub, you will find all the projects around the NATS technology, from the nats-server to the nats-streaming-server, passing through the official Helm charts for NATS. The Slack channel is a great place to share concerns and to ask for help and advice. People like Waldemar Quevedo (@wallyqs) or Derek Collison (@derekcollison) are really friendly and always willing to help or provide guidelines. Even on the official NATS Twitter account you will be able to find help, use cases, and announcements (really important to get notice of upgrades or bug fixes).

It is nothing new that a strong community, with a clear set of reference people playing the “user satisfaction” role, is one of the keystones of any open-source project. Any technology can be difficult to adopt, but with an open-minded and strong community, the project will have more chances to spread and succeed.

To be continued

In this post you have read a high-level overview of NATS, the cloud-native message broker, addressing the importance of being part of the Cloud Native Computing Foundation (CNCF), mentioning the main use cases where NATS makes sense, doing some hands-on work to set up and run NATS, and providing advice for going to production.

Definitely, NATS is a technology that must be considered in the analysis and selection of a message broker for your project, always taking your use case into account.

There are many other technologies that are building bonds with NATS, and that is something that will be addressed in future articles on this web page.
