Adopting a distributed key/value store
Organizing information is one of the most important challenges in the current era, where massive amounts of data are managed. Looking for a distributed key/value store for your system is not easy.
In this article, we will define some desirable features for these kinds of systems and then address some of the options in the current market: TiKV, etcd, and Consul. Engage!
Everything starts with the WHY (as somebody once said). When you have to choose a technology, the use case must be what drives you on that journey. Based on the use case, some features are key and others are simply not needed.
The effort invested in studying the use case is directly proportional to the amount of time you will save looking for the technology that satisfies it. In general, though, you will look for a key/value store in scenarios like:
- Distributed configuration store
- Feature flagging
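As a tiny illustration of the feature-flagging use case, here is a sketch of flags read from a key/value store. The `FeatureFlags` class and the `flags/` key layout are my own invention for the example, and the plain dict stands in for any of the distributed stores discussed below:

```python
# Minimal sketch: feature flags backed by a key/value store.
# The dict stands in for a real KV client (TiKV, etcd, Consul);
# FeatureFlags and the "flags/" prefix are illustrative only.

class FeatureFlags:
    def __init__(self, store):
        self.store = store  # any mapping-like KV client

    def is_enabled(self, flag: str, default: bool = False) -> bool:
        # Missing keys fall back to a default, so a store outage or an
        # undefined flag never crashes the caller.
        value = self.store.get(f"flags/{flag}")
        if value is None:
            return default
        return value.lower() == "true"

store = {"flags/new-checkout": "true", "flags/dark-mode": "false"}
flags = FeatureFlags(store)
print(flags.is_enabled("new-checkout"))           # True
print(flags.is_enabled("dark-mode"))              # False
print(flags.is_enabled("missing", default=True))  # True
```

The interesting property is that flipping a flag is just a key write on the store, with no redeploy of the application.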
Important features for adoption
Let’s base our analysis on this set of features which, from my point of view, are desirable for any key/value store system.
| Feature | Description |
| --- | --- |
| Kubernetes operator | The project offers an official Kubernetes operator |
| Official clients | The project offers official clients to communicate with the server side |
| TLS | Allows communication with the clients via TLS |
| Backup and restore | Capability to provide backup and restore mechanisms |
| Hierarchical keys | The storage system permits hierarchy on the keys |
| Watch for changes | Capability to detect, from the client side, changes to the values stored on the system |
| Prometheus metrics | The project provides a way to export metrics to be scraped by Prometheus |
| GitHub stars | Number of stars the project has on GitHub |
| Contributors | Number of people working on the project on GitHub |
As I said before, this is the set of features I find interesting for these systems, based on my experience. From here, let’s explore three options we have on the market today: TiKV, etcd, and Consul.
TiKV

TiKV is a distributed transactional key-value database, based on the design of Google Spanner and HBase, but simpler to manage and without dependencies on any distributed filesystem; it graduated from the Cloud Native Computing Foundation in 2020. Let’s describe this project based on the list of features mentioned in the previous section.
All projects under the CNCF should be easy to deploy in cloud-based environments. TiKV offers an official Kubernetes operator, with some basic examples in its repository, which makes adoption easier for newcomers. I miss some more complex examples, such as TLS configuration. The Helm chart behind this operator comes with a Deployment, RBAC, a Service Account, and an HPA.
This project provides an official Java client (which uses gRPC for communication), like many other projects do, to manage the information within the TiKV system. The existence of production-ready clients for communicating with a system is key for fast adoption by companies and partners.
TiKV comes with the TLS feature OOTB and its configuration is explained here. However, this TLS configuration is not implemented in the Kubernetes operator, which to me is a really important point nowadays, when we are all moving to cloud-native deployments.
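For reference, TiKV’s TLS setup is driven by the `[security]` section of the server configuration file. A minimal sketch might look like this (the certificate paths are placeholders for your own files):

```toml
# Sketch of TiKV server TLS configuration; paths are placeholders.
[security]
ca-path = "/etc/tikv/certs/ca.pem"
cert-path = "/etc/tikv/certs/server.pem"
key-path = "/etc/tikv/certs/server-key.pem"
```

The same three files (CA, certificate, key) are what clients need in order to verify and authenticate against the cluster.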
Since version 4.0, TiKV provides the capability to make backups. This is an important feature for any kind of storage system, because they are meant to persist data over time, not act as a cache. While writing this article, I was not able to find much information about how to configure this feature, apart from this video by Jay Lee (PingCAP).
Regarding hierarchies, TiKV does not provide such a feature. All keys live in the same flat key space, which means the only way to get hierarchies is to encode the keys in some way that simulates them.
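That encoding approach can be sketched as follows. The in-memory dict stands in for TiKV’s flat key space, and the function names and `/` separator are my own conventions for the example:

```python
# Sketch: simulating hierarchies on a flat key space by encoding the
# path into the key and listing "children" with a prefix scan.

SEP = "/"

def put(store: dict, path: str, value: str) -> None:
    store[path] = value

def list_children(store: dict, prefix: str):
    """Return the names one level below `prefix`, like a directory listing."""
    prefix = prefix.rstrip(SEP) + SEP
    children = set()
    for key in store:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            children.add(rest.split(SEP, 1)[0])
    return sorted(children)

store = {}
put(store, "config/db/host", "10.0.0.1")
put(store, "config/db/port", "5432")
put(store, "config/cache/ttl", "60")
print(list_children(store, "config"))     # ['cache', 'db']
print(list_children(store, "config/db"))  # ['host', 'port']
```

On a real TiKV cluster the prefix scan would map to a range scan over the ordered key space, which is why this trick stays efficient.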
The watch-for-changes feature could be really important for some use cases. So far, TiKV does not offer a watch mechanism, although there is an open issue here.
Regarding observability, the TiKV monitoring framework adopts two open-source projects: Prometheus and Grafana. TiKV encourages the usage of those systems and provides documentation to set them up. As with TLS, observability does not come OOTB in the Kubernetes operator.
In terms of community, TiKV is fairly active on the Internet: it has 8.7K stars on GitHub, the team publishes blog posts on a roughly monthly basis, and their Twitter account posts often. They also provide a Slack channel for developers.
etcd

etcd is introduced as a distributed, reliable key-value store for the most critical data of a distributed system. The main features advertised on their web page are simplicity, hierarchical key-value storage, and watch for changes. etcd is also an important part of the Kubernetes project itself.
You can find several client-side implementations for connecting to etcd clusters. On the official web page, you can see client libraries for Java, Go, Python, and C/C++.
etcd supports TLS connectivity. This guarantees your cluster an extra security measure. As in many projects, configuring TLS communication is no simple task. Here you have the official documentation, but I would also recommend this interesting article on setting up an etcd cluster with TLS authentication enabled.
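To give an idea of what is involved, this is a sketch of starting a single etcd node with client-facing TLS using etcd’s standard TLS flags (the names and URLs shown are placeholders for your own setup):

```shell
# Sketch: etcd with TLS on the client port; certificate paths are
# placeholders, and --client-cert-auth additionally requires clients
# to present certificates signed by the trusted CA.
etcd --name node1 \
  --cert-file=/etc/etcd/server.crt \
  --key-file=/etc/etcd/server.key \
  --trusted-ca-file=/etc/etcd/ca.crt \
  --client-cert-auth \
  --listen-client-urls=https://127.0.0.1:2379 \
  --advertise-client-urls=https://127.0.0.1:2379
```

Peer-to-peer traffic has an analogous set of `--peer-*` flags, which is where most of the extra ceremony in a full cluster setup comes from.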
Regarding backups in etcd, after some research it’s not clear to me that it provides this functionality out of the box. I found some providers of these services that offer that possibility, but it looks like something implemented by those companies. It is true that etcd allows you to configure the data-dir in which the data is stored, so we could put some mechanism in place for backing up that directory.
By definition, etcd is a hierarchical key-value store. The primary API of etcd is a hierarchical key space, which gives you the capability to store information at different levels. The key space consists of directories and keys, which are generically referred to as “nodes”.
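With the current v3 API, that hierarchy is in practice expressed through key prefixes, which `etcdctl` can query directly. A minimal sketch against a running cluster (the key names are illustrative):

```shell
# Store two "children" under the config/db/ prefix, then read the
# whole "directory" back with a prefix query.
etcdctl put config/db/host 10.0.0.1
etcdctl put config/db/port 5432
etcdctl get --prefix config/db/
```

Deleting a whole subtree works the same way, with `etcdctl del --prefix config/db/`.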
The capability of watching changes to a key-value pair is one of the important features of etcd. The Watch API provides an event-based interface for asynchronously monitoring changes to keys.
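From the command line, a watch is a one-liner; this sketch blocks and prints an event for every change under the given prefix (the prefix itself is illustrative):

```shell
# Stream PUT/DELETE events for any key under config/ until interrupted.
etcdctl watch --prefix config/
```

The client libraries expose the same Watch API programmatically, which is what you would use to reconfigure a service at runtime.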
Each etcd server provides local monitoring information on its client port through HTTP endpoints. The monitoring data is useful for both system health checking and cluster debugging, as you can imagine, and this is critical for any project when we plan to move it to production. The metrics exposed are compatible with Prometheus, as you can see in the monitoring etcd section of the official web page.
In terms of community, I must confess that I’m a bit surprised. etcd is a keystone for projects like Kubernetes, but places like the official web page, the blog, and the Twitter account look a bit unattended. The open-source community is really important for these kinds of projects, and the lack of movement should raise an alarm when considering its adoption in your stack.
Consul

Consul is a service mesh solution providing a full-featured control plane with service discovery, configuration, and segmentation functionality. Nowadays, HashiCorp does not promote the key/value datastore as one of the primary use cases, but you can still find it under “key features” on their official overview page.
I could not find an official Kubernetes operator for Consul. However, HashiCorp provides an official Helm chart, which brings more confidence about the company’s commitment to the cloud use case, as well as about taking this technology to production.
In terms of libraries and SDKs, Consul offers a wide set of languages in which communication with the Consul API is already implemented and ready to use. It’s important to notice that not all of them are officially supported, and some rely on the effort of individual contributors. These are the cases where you have to look carefully, to understand which SDK is the right one for your system, always thinking about support for when the problems come (because they will come).
TLS communication for a Consul cluster is fully supported. HashiCorp has very good documentation about it, in my opinion. In most use cases, going to production will imply using TLS in your communications, and with good documentation this step is easier for everybody.
Consul supports backups via what they call the snapshot feature. The KV store can be accessed through the consul kv CLI subcommands, the HTTP API, and the Consul UI. The data store itself is located on the Consul servers, in the data directory. To ensure data is not lost in the event of a complete outage, use the consul snapshot feature to back up the data.
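In its simplest form, that looks like this (the filename is a placeholder; in production you would ship the snapshot somewhere off the cluster):

```shell
# Take an atomic, point-in-time snapshot of Consul server state
# (KV data included), then restore it into a cluster.
consul snapshot save backup.snap
consul snapshot restore backup.snap
```

Snapshots are taken from the Raft log, so they capture a consistent view of the whole server state, not just the KV tree.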
In Consul, storing data in a hierarchy is possible. In a very natural way, which could be seen as directory exploration, you can store and read information at different levels.
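A quick sketch with the `consul kv` subcommands (key names are illustrative):

```shell
# Keys containing slashes behave like paths; -recurse lists the subtree.
consul kv put config/db/host 10.0.0.1
consul kv put config/db/port 5432
consul kv get -recurse config/
```

The Consul UI renders the same slash-separated keys as a browsable folder tree.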
Watches are a way of specifying a view of data (e.g. list of nodes, KV pairs, health checks) which is monitored for updates. Watches can be configured as part of the agent’s configuration, causing them to run once the agent is initialized. Reloading the agent configuration allows for adding or removing watches dynamically. Alternatively, the watch command enables a watch to be started outside of the agent. This can be used by an operator to inspect data in Consul or to easily pipe data into processes without being tied to the agent lifecycle.
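A key watch declared in the agent configuration might look like the following sketch (the watched key and the handler script path are placeholders):

```json
{
  "watches": [
    {
      "type": "key",
      "key": "config/db/host",
      "handler_type": "script",
      "args": ["/usr/local/bin/on-change.sh"]
    }
  ]
}
```

Whenever the key changes, the agent invokes the handler with the new data on stdin, which makes it easy to wire configuration changes to process reloads.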
Consul has been around for a while and is a strong option on the open-source side. As we have mentioned several times, having a strong community is crucial for projects where the paid option is not considered.
Conclusion

After all this feature-based analysis, the conclusion is the one you expected: it depends on your use case. However, we can draw some high-level conclusions. TiKV looks more database-oriented than the other systems analyzed in this article, and only etcd and Consul support watching for changes on the key/values.
My recommendations for you would be: base your selection on the features you really need for your use case, don’t plan on implementing the missing features on your own, and ensure that support for the technology you choose is covered.