Docker containers have reshaped the way people think about developing, deploying, and maintaining software. Drawing on the native isolation capabilities of modern operating systems, containers support VM-like separation of concerns, but with far less overhead and far greater flexibility of deployment than hypervisor-based virtual machines.
Containers are so lightweight and flexible, they have given rise to new application architectures. The new approach is to package the different services that constitute an application into separate containers, and to deploy those containers across a cluster of physical or virtual machines. This gives rise to the need for container orchestration—a tool that automates the deployment, management, scaling, networking, and availability of container-based applications.
Enter Kubernetes. This open source project spun out of Google automates the process of deploying and managing multi-container applications at scale. While Kubernetes works mainly with Docker, it can also work with any container system that conforms to the Open Container Initiative (OCI) standards for container image formats and runtimes. And because Kubernetes is open source, with relatively few restrictions on how it can be used, it can be used freely by anyone who wants to run containers, most anywhere they want to run them.
Kubernetes vs. Docker
Kubernetes doesn’t replace Docker, but augments it. However, Kubernetes does replace some of the higher-level technologies that have emerged around Docker.
One such technology is Docker Swarm, an orchestrator bundled with Docker. It’s still possible to use Swarm instead of Kubernetes, but Docker Inc. has chosen to make Kubernetes part of the Docker Community and Docker Enterprise editions going forward.
Not that Kubernetes is a drop-in replacement for Swarm. Kubernetes is significantly more complex than Swarm, and requires more work to deploy. But again, the work is intended to provide a big payoff in the long run—a more manageable, resilient application infrastructure. For development work, and smaller container clusters, Docker Swarm presents a simpler choice.
Related video: What is Kubernetes?
In this 90-second video, learn about Kubernetes, the open-source system for automating containerized applications, from one of the technology’s inventors, Joe Beda, founder and CTO at Heptio.
Kubernetes container management
High-level languages, like Python or C#, provide the user with abstractions and libraries so they can focus on accomplishing the tasks at hand, instead of getting mired down in details of memory management.
Kubernetes works the same way with container orchestration. It provides high-level abstractions for managing groups of containers that allow Kubernetes users to focus on how they want applications to run, rather than worrying about specific implementation details. The behaviors they need are decoupled from the components that provide them.
Kubernetes is designed to automate and simplify a number of container management tasks. Some of its key functions:
Kubernetes deploys multi-container applications
Many applications don’t live in just one container. They’re built out of a bundle of containers—a database here, a web front end there, perhaps a caching server. Microservices are constructed in this fashion as well, typically drawing on separate databases for each service and web protocols and APIs to tie services together. Although there are long-term advantages to building apps as microservices, it comes with a lot of near-term heavy lifting.
Kubernetes reduces the amount of work needed to implement such applications. You tell Kubernetes how to compose an app out of a set of containers, and Kubernetes handles the nitty-gritty of rolling them out, keeping them running, and keeping the components in sync with each other.
Kubernetes scales containerized apps
Apps need to be able to ramp up and down to suit demand, to balance incoming load, and make better use of physical resources. Kubernetes has provisions for doing all these things, and for doing them in an automated, hands-off way.
Kubernetes rolls out new versions of apps without downtime
Part of the appeal of a container-based application development workflow is to enable continuous integration and delivery. Kubernetes has mechanisms for allowing graceful updates to new versions of container images, including rollbacks if something goes awry.
Kubernetes provides networking, service discovery, and storage
Kubernetes handles many other fiddly details of container-based apps. Getting containers to talk to each other, handling service discovery, and providing persistent storage to containers from various providers (e.g., Amazon’s EBS) are all handled through Kubernetes and its APIs.
Kubernetes provides monitoring, logging, and debugging
Containers have been notoriously opaque to work with. Kubernetes provides services for obtaining live information about the states of containers and container groups, and details about developer actions within a cluster.
Kubernetes allows for non-destructive customizations
Kubernetes clusters may need tweaking for their target environments, but some tweaks make future upgrades hard. One of Kubernetes’s native features, Custom Resource Definitions, allows a developer to extend and customize Kubernetes’s APIs without breaking compatibility with stock Kubernetes.
Kubernetes operates in any environment
Kubernetes isn’t tied to a specific cloud environment or technology. It can run wherever there is support for containers, which means public clouds, private stacks, virtual and physical hardware, and a single developer’s laptop are all places for Kubernetes to play. Kubernetes clusters can also run on any mix of the above. This even includes mixes of Windows and Linux systems.
How Kubernetes works
Kubernetes’s architecture makes use of various concepts and abstractions. Some of these are variations on existing, familiar notions, but others are specific to Kubernetes.
Kubernetes clusters
The highest-level Kubernetes abstraction, the cluster, refers to the group of machines running Kubernetes (itself a clustered application) and the containers managed by it. A Kubernetes cluster must have a master, the system that commands and controls all the other Kubernetes machines in the cluster. A highly available Kubernetes cluster replicates the master’s facilities across multiple machines. But only one master at a time runs the job scheduler and controller-manager.
Kubernetes nodes and pods
Each cluster contains Kubernetes nodes. Nodes might be physical machines or VMs. Again, the idea is abstraction: whatever the app is running on, Kubernetes handles deployment on that substrate. It is also possible to ensure that certain containers run only on VMs or only on bare metal.
Nodes run pods, the most basic Kubernetes objects that can be created or managed. Each pod represents a single instance of an application or running process in Kubernetes, and consists of one or more containers. Kubernetes starts, stops, and replicates all containers in a pod as a group. Pods keep the user’s attention on the application, rather than on the containers themselves. Details about how Kubernetes needs to be configured, from the state of pods on up, is kept in Etcd, a distributed key-value store.
Pods are created and destroyed on nodes as needed to conform to the desired state specified by the user in the pod definition. Kubernetes provides an abstraction called a controller for dealing with the logistics of how pods are spun up, rolled out, and spun down. Controllers come in a few different flavors depending on the kind of application being managed. For instance, the recently introduced “StatefulSet” controller is used to deal with applications that need persistent state. Another kind of controller, the deployment, is used to scale an app up or down, update an app to a new version, or roll back an app to a known-good version if there’s a problem.
Kubernetes services
Because pods live and die as needed, we need a different abstraction for dealing with the application lifecycle. An application is supposed to be a persistent entity, even when the pods running the containers that comprise the application aren’t themselves persistent. To that end, Kubernetes provides an abstraction called a service.
A service describes how a given group of pods (or other Kubernetes objects) can be accessed via the network. As the Kubernetes documentation puts it, the pods that constitute the back end of an application might change, but the front end shouldn’t have to know about that or track it. Services make this possible.
A few more pieces internal to Kubernetes round out the picture. The scheduler parcels out workloads to nodes so that they’re balanced across resources and so that deployments meet the requirements of the application definitions. The controller manager ensures the state of the system—applications, workloads, etc.—matches the desired state defined in Etcd’s configuration settings.
It’s important to keep in mind that none of the low-level mechanisms used by containers, like Docker itself, are replaced by Kubernetes. Rather, Kubernetes provides a larger set of abstractions for using them for the sake of keeping apps running at scale.
How Kubernetes eases containerized applications
Because Kubernetes introduces new abstractions and concepts, and because the learning curve for Kubernetes is high, it’s only normal to ask what the long-term payoffs are for using Kubernetes. Here’s a rundown of some of the specific ways running apps inside Kubernetes becomes easier.
Kubernetes manages app health, replication, load balancing, and hardware resource allocation for you
One of the most basic duties Kubernetes takes off your hands is the busywork of keeping an application up, running, and responsive to user demands. Apps that become “unhealthy,” or don’t conform to the definition of health you describe for them, can be automatically healed.
Another benefit Kubernetes provides is maximizing the use of hardware resources including memory, storage I/O, and network bandwidth. Applications can have soft and hard limits set on their resource usage. Many apps that use minimal resources can be packed together on the same hardware; apps that need to stretch out can be placed on systems where they have room to groove. And again, rolling out updates across a cluster, or rolling back if updates break, can be automated.
Kubernetes eases the deployment of preconfigured applications with Helm charts
Package managers such as Debian Linux’s APT and Python’s Pip save users the trouble of manually installing and configuring an application. This is especially handy when an application has multiple external dependencies.
Helm is something like a package manager for Kubernetes. Many popular software applications must run in Kubernetes as multiple, ganged-together containers. Helm provides a definition mechanism, a “chart,” that describes how a given piece of software can be run as a group of containers inside Kubernetes.
You can create your own Helm charts from scratch, and you might have to if you’re building a custom app to be deployed internally. But if you’re using a popular application that has a common deployment pattern, there is a good chance someone has already composed a Helm chart for it and published it by way of the Kubeapps.com directory.
Kubernetes simplifies management of storage, secrets, and other application-related resources
Containers are meant to be immutable; whatever you put into them isn’t supposed to change. But applications need state, meaning they need a reliable way to deal with external storage volumes. That’s made all the more complicated by the way containers live, die, and are reborn across the lifetime of an app.
Kubernetes has abstractions to allow containers and apps to deal with storage in the same decoupled way as other resources. Many common kinds of storage, from Amazon EBS volumes to plain old NFS shares, can be accessed via Kubernetes storage drivers, called volumes. Normally, volumes are bound to a specific pod, but a volume subtype called a “Persistent Volume” can be used for data that needs to live on independently of any pod.
Containers often need to work with “secrets”—credentials like API keys or service passwords that you don’t want hardwired in a container or stashed openly on a disk volume. While third-party solutions are available for this, like Docker secrets and HashiCorp Vault, Kubernetes has its own mechanism for natively handling secrets, although it does need to be configured with care. For instance, Etcd must be configured to use SSL/TLS when sending information including secrets between nodes, rather than in plaintext.
Kubernetes applications can run in hybrid and multi-cloud environments
One of the long-standing dreams of cloud computing is to be able to run any app in any cloud, or any mix of clouds public or private. This isn’t just to avoid vendor lock-in, but also to take advantage of features specific to individual clouds.
Kubernetes provides a set of primitives, collectively known as federation, for keeping multiple clusters in sync with each other across multiple regions and clouds. For instance, a given app deployment can be kept consistent between multiple clusters, and different clusters can share service discovery so that a back-end resource can be accessed from any cluster. Federation can also be used to create highly available or fault-tolerant Kubernetes deployments, whether or not you’re spanning multiple cloud environments.
Federation is still relatively new to Kubernetes. Not all API resources are supported across federated instances yet, and upgrades don’t yet have automatic testing infrastructure. But these shortcomings are slated to be addressed in future versions of Kubernetes.
Where to get Kubernetes
Kubernetes is available in such a wide variety of environments, and by so many delivery mechanisms, that the best way to figure out where to get it is by use case.
- If you want to do it all yourself: The source code, and pre-built binaries for most common platforms, can be downloaded from the GitHub repository for Kubernetes.
- If you’re using Docker Community or Docker Enterprise: Docker’s most recent editions come with Kubernetes as a pack-in. This is ostensibly the easiest way for container mavens to get a leg up with Kubernetes, since it comes by way of a product you’re almost certainly already familiar with.
- If you’re deploying on-prem or in a private cloud: Chances are good the infrastructure you’re using for your private cloud or for your on-premises systems has Kubernetes built in. For instance, CoreOS Tectonic pairs a container-centric Linux distribution with Kubernetes, providing an automated way to set up and deploy Kubernetes on one’s hardware of choice. Red Hat Enterprise Linux Atomic Host and Mirantis Cloud Platform, a distribution of OpenStack, also provide Kubernetes in a standard-issue, supported offering.
- If you’re deploying in a public cloud: The three major public cloud vendors now offer Kubernetes as a service. Google Cloud Platform offers Google Kubernetes Engine. Microsoft Azure offers the Azure Container Service, with Kubernetes as one of several possible choices of orchestrator. And Amazon recently announced support for Kubernetes in conjunction with its existing Elastic Container Service, now in preview and set to go live sometime in 2018.