How To Provision a Kubernetes Cluster Using CoreOS
Introduction
Kubernetes, often abbreviated as k8s, is a system designed to manage applications built within containers across a cluster of nodes. It handles the entire life cycle of a containerized application, including deployment and scaling. Kubernetes was designed within Google and released as open source.
With AWS, Red Hat, Microsoft, IBM, Mirantis OpenStack, and VMware (and the list keeps growing) working to integrate Kubernetes into their platforms, going through this tutorial will provide you with a working cluster on Digital Ocean, a strong foundation, and a fundamental understanding of a framework that is here to stay.
In this tutorial, we will give step-by-step instructions on how to create a single-controller/multi-worker Kubernetes cluster on CoreOS hosted by Digital Ocean. This system will allow us to group related services together for deployment as a unit on a single host using what Kubernetes calls "Pods". Kubernetes also provides health checking functionality, high availability, and efficient usage of resources through schedulers.
This tutorial was tested with Kubernetes v1.1.2. Keep in mind that this software changes frequently.
Prerequisites and goals
We will provision each component of our Kubernetes cluster as part of this tutorial, no existing architecture is required. Experience with Docker is expected, experience with Systemd and CoreOS is a plus, but each concept is introduced and explained as part of this tutorial. If you are not familiar with CoreOS, it may be helpful to review some basic information about the CoreOS system.
After a high-level overview of the Kubernetes Architecture, we will configure our client machine to work with our Digital Ocean resources from the terminal using Doctl and Jq. Once this is done, we will be able to quickly and repeatedly provision our droplets with cloud-config
files. This allows us to declaratively customize network configuration, Systemd units, and other OS-level items. We will also ensure a Public Key Infrastructure (PKI) is available by going through the instructions to set up a local Certificate Authority.
We will then provision an Etcd cluster to reliably store metadata across a cluster of machines. Etcd provides a great way to store configuration data reliably for Kubernetes. Thanks to the watch support provided by Etcd, coordinating components can be notified very quickly of changes. This component is crucial to our Kubernetes cluster.
With the help of our Etcd cluster, we will also configure Flannel, a network fabric layer that provides each machine with an individual subnet for container communication. This satisfies a fundamental requirement for running a Kubernetes cluster. Docker will be configured to use this networking layer for its containers.
We will provision our Kubernetes controller Droplet and, to ensure the security of our Kubernetes cluster, generate the required certificates for communication between Kubernetes components using openssl and securely transfer them to each Droplet using scp. Next, we will configure the command line client utility, kubectl, to work with our cluster from our client machine.
Finally, we will provision worker nodes pointing to the controller nodes and deploy the internal cluster DNS through the DNS add-on. We will have a fully functional Kubernetes cluster allowing us to deploy our workloads and easily add worker nodes as required with the cloud-config
files created through this tutorial.
Working through this tutorial may take you a few hours, but it will give you a good understanding of the moving pieces of your cluster and set you up for success in the long run.
The structure and idea for this tutorial were taken from the Getting started with CoreOS and Kubernetes Guide and updated with detailed step-by-step instructions for Digital Ocean. Let's get started.
Kubernetes Architectural Overview
In this section we will give an overview of the Kubernetes Architecture. For a more detailed look, refer to the official Kubernetes documentation.
At a high level, we need to differentiate between the services that run on every node, referred to as node agents (kubelet, ...), the controller services (APIs, scheduler, ...) that make up the cluster-level control plane, and the distributed storage solution (Etcd).
A crucial component which runs on every node is the kubelet. The kubelet is responsible for what's running on each individual Droplet and for making sure it keeps running. The kubelet controls the container runtime; in this tutorial Docker provides the container runtime and must also run on each node. Docker takes care of the details of downloading images and running containers. The kubelet registers nodes with the cluster, sends events and status updates, and reports the resource utilization of the node.
To facilitate routing between containers as well as simplify service discovery, each node also runs the kube-proxy. The proxy is a simple network proxy and load balancer which can do simple TCP and UDP stream forwarding (round robin) across a set of back ends. The proxy is a crucial part for the Kubernetes services model. The proxy communicates with the controller services to keep up to date. See the Kubernetes' services FAQ for more details.
Worker node services are configured to be managed from the controller services: on worker nodes, these services register with the controller nodes, while on controller nodes they are typically bootstrapped locally.
The first controller service we will highlight is the API server. The API server serves up the Kubernetes API through a REST interface. It is intended to be a CRUD-y service, with most/all business logic implemented in separate components or in plug-ins. It is responsible for validating requests and updating the corresponding objects in Etcd. The API server is stateless and will be the main component replicated and load balanced across controller nodes in a High Availability configuration.
The second controller service to highlight is the scheduler. The scheduler is responsible for assigning workloads to nodes in the cluster. This component watches the API Server and uses the binding API to apply its scheduling decisions. The scheduler is pluggable and support for multiple cluster schedulers and even user-provided schedulers is expected, but not available yet in version 1.1.2.
All other cluster-level functions are performed by the controller manager component at the time of writing. This component embeds the core control loops shipped with Kubernetes. Each controller is an active manager that watches the shared state of the cluster through the API Server and makes changes attempting to move the observed state towards the desired state. These controllers may eventually be split into separate components in future Kubernetes versions to make them independently pluggable.
As the scheduler and controller manager components modify cluster state, only one instance of each can run within the cluster. In High Availability configurations, a process of master election is required for these components. We will explain and apply master election for these components as part of this tutorial; however, we will only provision one controller node and no control plane load balancer. Setting up the control plane load balancers and the appropriate TLS artifacts is left as an exercise for the reader.
Below is a high level diagram of these Kubernetes components in a High Availability set-up.
Etcd Controller Nodes Worker Nodes
+--------------------+ +--------------------+
| | | |
+--------------------+ +---+ API Server <---------+ +------------------------+ Kubelet |
| | | | | | | | |
| Etcd cluster <----------+ | Controller Manager*| | | | Docker |
| | | | | | | |
| | | Scheduler* | | | | Proxy |
| | | | | | | |
| | | Kubelet | | | | |
| | | | | | | |
| | | Docker | | | | |
| | | | +-+--v---------------+ | |
| | | Proxy | | | | |
| | | | | Control Plane | | |
| | +--------------------+ | | +--------------------+
| | | Load Balancer |
+-^--^---------------+ +--------------------+ | | +--------------------+
| | | | | | | |
| +------------------------------+ API Server <-------+ <--------+ Kubelet |
| | | | | | |
| | Kubelet | | | | Docker |
| | | | | | |
| | Docker | | | | Proxy |
| | | | | | |
| | Proxy | | | | |
| | | +-+--^---------------+ | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| +--------------------+ | | +--------------------+
| | |
| +--------------------+ | | +--------------------+
| | | | | | |
+---------------------------------+ API Server <---------+ +------------------------+ Kubelet |
| | | |
| Kubelet | | Docker |
| | | |
| Docker | | Proxy |
| | | |
| Proxy | | |
| | | |
| | | |
| | | |
| | | |
| | | |
+--------------------+ +--------------------+
Refer to the official diagram for a more detailed breakdown: http://kubernetes.io/v1.1/docs/design/architecture.png?raw=true
Step 1 — Configuring Our Client Machine
As the first step in this tutorial we will ensure our client machine is correctly configured to complete all subsequent steps.
The default folder for storing Kubernetes cluster certificates and config-related files is $HOME/.kube/
. For this tutorial, we will store our cluster configuration and certificates in this folder, ensure the folder exists:
- mkdir ~/.kube
We will be using the Digital Ocean Control Tool (Doctl) as well as the Command-line JSON processor Jq to manage our Digital Ocean resources from our terminal. This will allow us to quickly repeat commands and automate our Kubernetes cluster setup further down the line.
We will set up Doctl and Jq as well as introduce the basics on how to use these tools within this step.
At the end of this step, a correctly configured client environment is expected. If you skip this step, first ensure that you have:
- Configured your environment to create and destroy Droplets in a single Digital Ocean region from the terminal. Ensure you set the $region and $DIGITALOCEAN_API_KEY variables for the rest of this tutorial.
- Created the SSH key for all Droplets in our cluster. Ensure the private key is loaded with your SSH agent and the public key is stored as a Digital Ocean resource named k8s-key.
Follow the sub-steps to achieve this.
Setting up Doctl
To use Doctl from your terminal and follow the Kubernetes cluster configuration in this tutorial, you will need to generate a Personal Access Token with write permissions through the Digital Ocean Control Panel. Refer to the How To Use the DigitalOcean API v2 tutorial for information on how to do this, and continue with these steps once you have your Personal Access Token ready.
For all of the steps in this tutorial, we will assign our token to a variable called DIGITALOCEAN_API_KEY
. For example, by running the following command in bash (replace the highlighted text with your own token):
- export DIGITALOCEAN_API_KEY=77e027c7447f468068a7d4fea41e7149a75a94088082c66fcf555de3977f69d3
Review the latest Doctl release and choose the right binary archive for your environment:
Operating System | Binary |
---|---|
OSX | darwin-amd64-doctl.tar.bz2 |
Linux | linux-amd64-doctl.tar.bz2 |
Windows | windows-amd-64-doctl.zip |
For example, to download the archive for the 0.0.16
release (used in this tutorial) to your home directory on a Linux 64-bit host, run the following commands in your terminal:
- curl -Lo ~/doctl.tar.bz2 https://github.com/digitalocean/doctl/releases/download/0.0.16/linux-amd64-doctl.tar.bz2
Next, we need to extract the downloaded archive. We will also need to add doctl
to a location included in our PATH
environment variable, /usr/bin
or /opt/bin
for example. The following command will extract doctl
directly to /usr/bin
making it available for all users on a Linux host. This command requires sudo
rights:
- tar xjf ~/doctl.tar.bz2 -C /usr/bin
Finally, validate that doctl
has been downloaded successfully by confirming the installed version:
- doctl --version
If you followed the steps above, this should return:
Outputdoctl version 0.0.16
Finding help about Doctl
An overview of Doctl and several usage examples are available on the Doctl GitHub repository. Additionally, invoking Doctl without any arguments will print out usage instructions as well. Note that every Digital Ocean resource type (droplet, sshkey, ...) has a corresponding Doctl command. Every command has subcommands to manage the resource as well as instructions available through the help
subcommand or the --help
flag.
For example, to review the available commands for droplet
resources, run:
- doctl droplet help
This should return:
OutputNAME:
doctl droplet - Droplet commands. Lists by default.
USAGE:
doctl droplet [global options] command [command options] [arguments...]
VERSION:
0.0.16
COMMANDS:
create, c Create droplet.
list, l List droplets.
find, f <Droplet name> Find the first Droplet whose name matches the first argument.
destroy, d [--id | <name>] Destroy droplet.
reboot [--id | <name>] Reboot droplet.
power_cycle [--id | <name>] Powercycle droplet.
shutdown [--id | <name>] Shutdown droplet.
poweroff, off [--id | <name>] Power off droplet.
poweron, on [--id | <name>] Power on droplet.
password_reset [--id | <name>] Reset password for droplet.
resize [--id | <name>] Resize droplet.
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--help, -h show help
Notice that the droplet
command will list
droplets by default.
To get more information about the droplet
create
command, run:
- doctl droplet help create
This should return:
OutputNAME:
create - Create droplet.
USAGE:
command create [command options] [arguments...]
OPTIONS:
--domain, -d Domain name to append to the hostname. (e.g. server01.example.com)
--add-region Append region to hostname. (e.g. server01.sfo1)
--user-data, -u User data for creating server.
--user-data-file, --uf A path to a file for user data.
--ssh-keys, -k Comma seperated list of SSH Key names. (e.g. --ssh-keys Work,Home)
--size, -s "512mb" Size of Droplet.
--region, -r "nyc3" Region of Droplet.
--image, -i "ubuntu-14-04-x64" Image slug of Droplet.
--backups, -b Turn on backups.
--ipv6, -6 Turn on IPv6 networking.
--private-networking, -p Turn on private networking.
--wait-for-active Don't return until the create has succeeded or failed.
As another example, to list all Droplet sizes provided by Digital Ocean run:
- doctl droplet size list
Setting up your SSH Keys with Doctl
Every CoreOS droplet that you will provision for your Kubernetes cluster, will need to have at least one SSH public key installed during its creation process. The key(s) will be installed to the core
user's authorized keys file, and you will need the corresponding private key(s) to log in to your CoreOS server.
If you do not already have any SSH keys associated with your Digital Ocean account, create one now by following steps one and two of this tutorial: How To Use SSH Keys with Digital Ocean Droplets. You may opt to use Doctl to add the new SSH keys to your account rather than copying the SSH Keys into the Digital Ocean control panel manually. This can be achieved by passing in your DIGITALOCEAN_API_KEY
environment variable as the --api-key
to Doctl and adding the public key of your newly created SSH key with the following command:
- doctl --api-key $DIGITALOCEAN_API_KEY keys create <key-name> <path-to-public-key>
Note: Doctl will automatically try to use the $DIGITALOCEAN_API_KEY
env variable as the --api-key
if it exists and we do not need to explicitly pass it in every time. We will omit this in future Doctl commands.
Add your private key to your SSH agent on your client machine, using ssh-agent
as follows:
- ssh-add <path-to-private-key>
or use the -i <path-to-private-key>
flag each time connecting to your droplets over ssh
if you do not have a running ssh-agent
.
For example, in this tutorial we will store our key pair in our home directory as ~/.ssh/id_k8s
and upload the public key as k8s-key
to our Digital Ocean account. Combining all the above steps together would look like this:
Output# Generate the key pair
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/demo/.ssh/id_rsa):/home/demo/.ssh/id_k8s
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/demo/.ssh/id_k8s.
Your public key has been saved in /home/demo/.ssh/id_k8s.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 demo@a
The key's randomart image is:
+--[ RSA 2048]----+
| .oo. |
| . o.E |
| + . o |
| . = = . |
| = S = . |
| o + = + |
| . o + o . |
| . o |
| |
+-----------------+
# Upload public key to Digital Ocean account as k8s-key
doctl keys create k8s-key /home/demo/.ssh/id_k8s.pub
# Add private key to SSH Agent
ssh-add ~/.ssh/id_k8s
Managing Droplets with Doctl
To verify that your account and keys are set up correctly, we will create a new CoreOS Alpha droplet named "do-test"
from the terminal.
Note: Creating Droplets incurs charges on your Digital Ocean account for as long as they exist. We will destroy the test Droplets created in this step once we are done with them.
For the remainder of this tutorial, we will be creating all droplets within the same Digital Ocean region. Choose your region and store it into a variable called $region
. Review the list of all available regions by running the doctl region
command first.
- doctl region
For example, we will use the Amsterdam 2 region for the rest of this tutorial. Choose the region most appropriate for your case:
- export region="ams2"
Now create the Droplet with the following command:
- doctl droplet create \
- --image "coreos-alpha" \
- --size "512mb" \
- --region "$region" \
- --private-networking \
- --ssh-keys k8s-key \
- "do-test"
With the above command, we created a "512mb"
droplet, in the region of our choice, requesting a private_ip
and adding our ssh-key (k8s-key
) to the droplet for remote access. Once the command completes, Doctl returns information about the new Digital Ocean resource that was just created.
First, confirm you can list all your droplets and their status with the following command:
- doctl droplet list
This should output a list similar to below:
OutputID Name IP Address Status Memory Disk Region
8684261 do-test.ams2 198.211.118.106 new 512MB 20GB ams2
Note that, to speed up its usage, Doctl has several shortcuts. For example, the shortcut for the droplet
command is d
. Moreover, the default action for the droplet
command is list
, allowing us to re-write the above command as follows:
- doctl d
This returns the same results as before. On Linux you can watch
this list to capture when the droplet status changes from new
to active
(which will take the same amount of time it would take when provisioning through the web control panel).
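For example, assuming the watch utility is installed on your client machine, the following refreshes the Droplet listing every five seconds until you interrupt it with CTRL+C:
- watch -n 5 doctl d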
Once the CoreOS droplet has fully been provisioned and its status changed from new
to active
, ensure your SSH Key was added correctly by connecting as the core
user, run:
- ssh core@198.211.118.106
Replace the IP highlighted above by the public IP of your droplet, as listed by the previous doctl d
command. As this is the first time you connect to this server, you may be prompted to confirm the fingerprint of the server:
OutputThe authenticity of host '198.211.118.106 (198.211.118.106)' can't be established.
ED25519 key fingerprint is SHA256:wp/zkg0UQifNYrxEsMVg2AEawqSVpRS+3mBAQ6TBNlU.
Are you sure you want to continue connecting (yes/no)?
For more information about accessing your Droplet remotely, see How To Connect To Your Droplet with SSH.
You should now be connected to a fully functional CoreOS droplet running in Digital Ocean.
Press CTRL+D
or enter exit
to log out of your Droplet, but keep the do-test
droplet running to complete the exercises in the next section.
Working with Doctl responses
By default, Doctl will return yaml responses, but it is possible to change the format of the response with the -f
flag. Using the json
format will allow us to easily act on the data returned by Doctl through the Command-line JSON processor Jq.
Jq comes installed on several Linux distributions (e.g., CoreOS). However, to download and set up Jq manually, review the latest releases and choose the right release for your environment:
Operating System | Binary |
---|---|
OSX | jq-osx-amd64 |
Linux | jq-linux64 |
Windows | jq-win64.exe |
For example, to download the 1.5
release from a shell directly to your /usr/bin/
directory on a Linux 64-bit host (which requires sudo
rights), run:
- curl -Lo /usr/bin/jq https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64
This will make the jq
command available for all users.
Validate Jq has been downloaded successfully by confirming the installed version:
- jq --version
If you followed the steps above, this should return:
Outputjq-1.5
Using the -f json
argument for Doctl together with Jq allows us to, for example, extract the number of CPUs a droplet has:
- doctl -f json d find do-test.$region | jq '.vcpus'
In the above command, the find
command for droplets (shortcut f
) returns all properties of a droplet matching the name provided, including the droplet's vcpus
property. This json
data is passed on to Jq with an argument to only return the vcpus
property to us.
Another example of using Jq to manipulate the data returned by Doctl is given next, extracting the raw public_key
for an existing Digital Ocean SSH Key, the key named k8s-key
in our example:
- doctl -f json keys f k8s-key | jq --raw-output '.public_key'
With output similar to:
Outputssh-rsa AAAAB3Nza... user@host
By default Jq will format strings as json strings, but using the --raw-output
flag (shortcut -r
), as can be seen above, will make Jq write strings directly to standard output. This is very useful for our scripts.
Finally, the real power of Jq becomes evident when we need to retrieve an array of network interfaces (ipv4
) assigned to a droplet, filter the array based on a property .type
with possible values "public"
or "private"
and extract the raw value of the ip_address
property.
We'll break this down as follows. Notice first that the following command will return an array of all the IPv4 network interfaces assigned to a droplet:
- doctl -f json d f do-test.$region | jq -r '.networks.v4[]'
Which will return a result similar to the following text block:
Output{
"ip_address": "10.129.73.216",
"netmask": "255.255.0.0",
"gateway": "10.129.0.1",
"type": "private"
}
{
"ip_address": "198.211.118.106",
"netmask": "255.255.192.0",
"gateway": "198.211.128.1",
"type": "public"
}
Next, we direct Jq to apply a filter to the array of network interfaces based on the type
property using the select
statement and only return the .ip_address
property of the filtered network interface:
- doctl -f json d f do-test.$region | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'
The above command effectively returns the private ip_address
of our droplet directly to standard out. We will use this command often to store droplet IP addresses into environment variables. The output of the command may look like:
Output10.129.73.216
Finally, destroy your do-test
droplet with the following command:
- doctl d d do-test.$region
Which will output:
OutputDroplet do-test.ams2 destroyed.
For a full explanation of all the features of Jq, kindly refer to the Jq manual.
Using Doctl to configure CoreOS Droplets with cloud-config
For a short introduction to cloud-config
files, kindly refer to the section on writing cloud-config files of the Getting Started with CoreOS series. We will explain every aspect of cloud-config
files we rely on as we write our own in this tutorial.
One of the most useful aspects of cloud-config
files is that they allow you to define a list of arbitrary Systemd units to start after booting. To understand these coreos.units
better, refer to the understanding Systemd units and unit files tutorial. We will walk you through many Systemd unit examples within this tutorial.
We will heavily rely on these config files to manage the configuration of our droplets, giving us a way to consistently provision our Kubernetes clusters on Digital Ocean in the future. It is important however to note that cloud-config
files are not intended as a replacement for configuration management tools such as Chef, Puppet, Ansible, Salt, or Terraform, and we may benefit more from adopting one of these tools in the long run.
Please ensure you use the CoreOS validator to validate any cloud-config
file you write as part of this tutorial. Ensuring the config files are valid prior to creating the droplets will help avoid frustration and time loss. Also refer to the general troubleshooting tutorial for CoreOS on Digital Ocean when faced with CoreOS issues.
For this tutorial, we will be passing cloud-config
files through the --user-data-file
option (shortcut --uf
) when creating droplets from the terminal with Doctl.
To see how this works, follow the below steps to create a Droplet with a custom motd
and automatic reboots switched off, as an exercise.
First, create a test.yaml
file in your working directory, with the content as follows.
#cloud-config
write_files:
- path: "/etc/motd"
permissions: "0644"
owner: "root"
content: |
Good news, everyone!
coreos:
update:
group: alpha
reboot-strategy: off
The write_files
directive defines a set of files to create on the local filesystem. For each file, we specify the absolute location on disk through the path
key and the data to be written through the content
key. The coreos.update.*
parameters manipulate settings related to how CoreOS instances are updated; setting the reboot-strategy
to off
will instruct the CoreOS reboot manager (Locksmith) to disable rebooting after updates are applied.
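Before creating a Droplet with this file, it is worth validating it. Besides the web-based CoreOS validator mentioned earlier, if you have access to a CoreOS host you can validate locally with the coreos-cloudinit binary (assuming the -validate and -from-file flags shipped with your CoreOS release):
- coreos-cloudinit -validate -from-file=test.yaml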
Create a new droplet named ccfg-test
, using this test.yaml
file with the following command (this command will take about a minute to complete, please be patient):
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf test.yaml ccfg-test
We are using the d
shortcut to manage our droplet
resources and c
as a shortcut for create
. The --wait-for-active
flag will ensure Doctl waits for the droplet to become active before returning control to our terminal, which is why we had to wait.
Once Doctl has returned control, you may need to give your Droplet some more time to boot and load the SSH daemon before attempting to connect.
Try to connect via the public ip of this droplet with the following one-liner.
- ssh core@`doctl d | grep ccfg-test | awk '{print $3}'`
In this case we are using the droplet listing command piped into grep
to filter down to the droplet we just created and we capture the third column, which is the public IP, using awk
. The result should be similar to below, confirming our motd
was written correctly once we confirm the authenticity of our new droplet:
OutputThe authenticity of host '37.139.21.41 (37.139.21.41)' can't be established.
ED25519 key fingerprint is SHA256:VtdI6P5sRqvQC0dGWE1ffLYTq1yBIWoRFdWc6qcm+04.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '37.139.21.41' (ED25519) to the list of known hosts.
Good news, everyone!
If you are prompted for a password, ensure your SSH Agent has the private key associated with your k8s-key
loaded and try again.
If you happened to destroy a droplet directly prior to creating the one that you are connecting to, you may see a warning like this:
Output@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
...
If this is the case, your new droplet probably has the same IP address as the old, destroyed droplet, but it has a different host SSH key. This is fine, and you can remove the warning by deleting the old droplet's host key from your system with this command (replacing the highlighted IP address with your droplet's public IP):
- ssh-keygen -R 37.139.21.41
Now try connecting to your server again.
Finally, destroy this test droplet with a command similar to below:
- doctl d d ccfg-test.$region
Note: At the time of writing, user-provided cloud-config
files cannot be modified once a droplet has been created. To change the cloud-config
the droplets need to be re-created. Take this into consideration when writing cloud-config files which limit ssh access to certain user accounts as these may be reset after every reboot.
With the above commands in our toolbox, we are ready to start a highly automated Kubernetes configuration on Digital Ocean.
Step 2 — Initializing The Kubernetes Cluster PKI
In this tutorial we will configure the Kubernetes API Server to use client certificate authentication to enable encryption and prevent traffic interception and man-in-the-middle attacks. This means it is necessary to have a Certificate Authority (CA) which will be trusted as the root authority for the cluster and use it to generate the proper credentials. The necessary assets can also be generated from an existing Public Key Infrastructure (PKI), if already available.
For this tutorial we will use Self-Signed certificates. Every certificate is created by submitting Certificate Signing Requests (CSRs) to a CA. A CSR contains information identifying whom the certificate request is for, including the public key associated with the private key of the requesting party. The CA will sign the CSR, effectively returning what is from then on referred to as "the certificate".
For a detailed overview of OpenSSL, refer to the OpenSSL Essentials guide on Digital Ocean.
Initialize Cluster Root CA
Generate the private key for your root certificate into the default $HOME/.kube
folder, which we should have created as part of our client machine setup. The following OpenSSL command generates a 2048-bit RSA private key:
- openssl genrsa -out ~/.kube/ca-key.pem 2048
This ca-key.pem
private key will be used to generate the self-signed ca.pem
certificate which will be trusted by all your Kubernetes components, as well as every Worker node and Administrator key pair. This key needs to be closely guarded and kept in a secure location for future use.
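Since this private key must remain secret, you may also want to restrict its file permissions on your client machine, for example:
- chmod 600 ~/.kube/ca-key.pem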
Next, use the private key to generate the self-signed root ca.pem
certificate with the following openssl command:
- openssl req -x509 -new -nodes -key ~/.kube/ca-key.pem -days 10000 -out ~/.kube/ca.pem -subj "/CN=kube-ca"
The ca.pem
certificate will be used as the root certificate to verify the authenticity of certificates by every component within your Kubernetes cluster. You will copy this file to the controller and worker Droplets as well as to the Administrator clients.
Confirm your Root CA assets exist in the expected location:
- ls -1 ~/.kube/ca*.pem
Output similar to:
Output/home/demo/.kube/ca.pem
/home/demo/.kube/ca-key.pem
We now have all the necessary ingredients to generate certificates for all of our cluster components. We will come back to openssl
to generate each as required.
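As a preview of how we will use openssl later, signing a certificate for a cluster component generally involves generating a key, creating a Certificate Signing Request, and signing it with our CA. The file names and subject below are purely illustrative; the exact assets needed by each component are generated in the later steps of this tutorial:
- openssl genrsa -out ~/.kube/admin-key.pem 2048
- openssl req -new -key ~/.kube/admin-key.pem -out ~/.kube/admin.csr -subj "/CN=kube-admin"
- openssl x509 -req -in ~/.kube/admin.csr -CA ~/.kube/ca.pem -CAkey ~/.kube/ca-key.pem -CAcreateserial -out ~/.kube/admin.pem -days 365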
Step 3 — Provisioning The Data Storage Back End
Whether you are using Swarm with Docker overlay networks or Kubernetes, a data storage back end is required for the infrastructure metadata.
Kubernetes uses Etcd for data storage and for cluster consensus between different software components. Etcd is a distributed key value store that provides a reliable way to store data across a cluster of machines. It is open source and available on GitHub. We will introduce the minimum concepts necessary to set up Etcd for our Kubernetes cluster; the full Etcd documentation is available here.
Your Etcd cluster will be heavily utilized since all cluster objects are stored in it and every scheduling decision is recorded there. It is recommended that you run a multi-droplet cluster to gain maximum performance and reliability of this important part of your Kubernetes cluster. Of your Kubernetes components, you should only give the kube-apiserver
component read/write access to Etcd. You do not want the Etcd cluster used by Kubernetes exposed to every node in your cluster (or worse, to the Internet at large), because access to Etcd is equivalent to root in your cluster.
For development and testing environments, a single droplet running Etcd and shared between the Kubernetes API server and Flannel will suffice.
For production environments, it is highly recommended that Etcd is run as a dedicated cluster, separate from the Kubernetes components. Use the CoreOS cluster architecture overview as well as the official Etcd clustering guide to bootstrap a new Etcd cluster on Digital Ocean. If you do not have an existing Etcd cluster, you can bootstrap a fresh Etcd cluster on Digital Ocean either by:
- Using the public Etcd discovery service,
- Deploying your own private Etcd Discovery service or
- Using DNS discovery
Additionally, refer to the official Etcd guides on securing your Etcd cluster and for a full overview of the Etcd configuration flags.
In this tutorial, instead of being slowed down and distracted by generating new discovery URLs and bootstrapping Etcd, it's easier to start a single Etcd node. Since the full Etcd daemon isn't running on all of the machines, we'll gain a little bit of extra CPU and RAM to play with. However, for ease of configuration of all the cluster services, we will run a local Etcd in proxy mode on every Worker node (this daemon will listen on localhost and proxy all requests to the Etcd node). This allows us to configure every cluster component through the local Etcd proxy.
If you already have an Etcd cluster and wish to skip this step, ensure that you have set the $ETCD_PEER
environment variable to your Etcd cluster before proceeding with the rest of this tutorial.
Deploying with a single Etcd node
Since we're only using a single Etcd node, there is no need to include a discovery token. There isn't any high availability for Etcd in this configuration, but that's assumed to be OK for development and testing. Provision this Droplet first so you can configure the rest with its IP address.
Etcd is configurable through command-line flags and environment variables. To start Etcd automatically using custom settings with Systemd, we may store manually created Systemd drop-ins under: /etc/systemd/system/etcd2.service.d/
. Systemd drop-ins are a method for appending or overriding parameters of a Systemd unit without having to re-define the whole unit.
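For illustration only, a hand-written drop-in that overrides a single Etcd setting could look like the snippet below (we will not create this file by hand; the value shown is an example assumption):
# /etc/systemd/system/etcd2.service.d/30-custom.conf
[Service]
Environment=ETCD_NAME=etcd-01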
Alternatively, we may use the coreos.etcd2.*
parameters in our cloud-config
file to let CoreOS automatically generate the Etcd drop-ins on startup for us.
Note: cloud-config
generated drop-ins are stored under /run/systemd/system/etcd2.service.d/
.
#cloud-config
coreos:
etcd2:
name: "etcd-01"
advertise-client-urls: "http://$private_ipv4:2379"
listen-client-urls: "http://$private_ipv4:2379, http://127.0.0.1:2379"
listen-peer-urls: "http://$private_ipv4:2380, http://127.0.0.1:2380"
#bootstrapping
initial-cluster: "etcd-01=http://$private_ipv4:2380"
initial-advertise-peer-urls: "http://$private_ipv4:2380"
As we will use a single region for our Kubernetes cluster, we configure our Etcd instance to listen for incoming requests on the private_ip
and localhost
only. This may give us a little protection from the public Internet, but not from other droplets within the same region. For a production setup, it is recommended to follow the official Etcd guides on securing your Etcd cluster.
We set the -listen-client-urls
flag to listen for client traffic and -listen-peer-urls
flag to listen for peer traffic coming from Etcd proxies running on other cluster nodes. We use the $private_ipv4
substitution variable made available by the Digital Ocean metadata service in our cloud-config
files. We use the IANA-assigned Etcd ports 2379
for client traffic and 2380
for peer traffic.
Note: Several Etcd applications, such as SkyDNS, still rely on Etcd's legacy port 4001
. We did not configure Etcd to listen on this port, but you may need to do this to support older Etcd applications in your infrastructure.
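If you do need to support such applications, one option (not used in this tutorial) is to add the legacy port to the client listener URLs, along these lines:
listen-client-urls: "http://$private_ipv4:2379, http://$private_ipv4:4001, http://127.0.0.1:2379, http://127.0.0.1:4001"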
Our Etcd node will advertise itself with its private_ip
to clients as we define -advertise-client-urls
to override the default of localhost. This is important to avoid loops for our Etcd proxy running on our worker nodes. We are also required to configure a name for our Etcd instance to override the default name used for static bootstrapping. To bootstrap our single-node Etcd cluster, we directly provide the initial
clustering flags -initial-cluster
and -initial-advertise-peer-urls
as we do not rely on cluster discovery.
Next, we tell Systemd to start our Etcd service on boot by providing a unit definition for the etcd2
service in the same cloud-config
file. As this component is crucial and we only have a single node, we also turn off the CoreOS reboot-strategy so the Droplet is not automatically rebooted after updates.
Combining all of the above, our cloud-config
file for our Etcd Droplet should look as follows:
-
- #cloud-config
-
- coreos:
- etcd2:
- name: "etcd-01"
- advertise-client-urls: "http://$private_ipv4:2379"
- listen-client-urls: "http://$private_ipv4:2379, http://127.0.0.1:2379"
- listen-peer-urls: "http://$private_ipv4:2380, http://127.0.0.1:2380"
- #bootstrapping
- initial-cluster: "etcd-01=http://$private_ipv4:2380"
- initial-advertise-peer-urls: "http://$private_ipv4:2380"
- units:
- - name: "etcd2.service"
- command: "start"
- update:
- group: alpha
- reboot-strategy: off
Validate your cloud-config file with the CoreOS Validator, then create your etcd-01
droplet with the following Doctl command:
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf etcd-01.yaml etcd-01
Again, we wait for the droplet creation to be completed before proceeding. When the Droplet is active, give it some time to start the SSH daemon, then connect:
- ssh core@`doctl d | grep etcd-01 | awk '{print $3}'`
Confirm Etcd is running:
- systemctl status etcd2
This should return output similar to:
Output● etcd2.service - etcd2
Loaded: loaded (/usr/lib64/systemd/system/etcd2.service; disabled; vendor preset: disabled)
Drop-In: /run/systemd/system/etcd2.service.d
└─20-cloudinit.conf
Active: active (running) since Sat 2015-11-11 23:19:13 UTC; 6min ago
Main PID: 841 (etcd2)
CGroup: /system.slice/etcd2.service
└─841 /usr/bin/etcd2
Nov 11 23:19:13 etcd-01.ams2 systemd[1]: Started etcd2.
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: added local member ce2a822cea30bfca [http://10.129.69.201:2379] to cluster 7e27652122e8b2ae
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca is starting a new election at term 1
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca became candidate at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca became leader at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: setting up the initial cluster version to 2.2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: set the initial cluster version to 2.2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: published {Name:etcd-01 ClientURLs:[http://10.129.69.201:2379]} to cluster 7e27652122e8b2ae
Confirm the cluster is healthy:
- etcdctl cluster-health
This should return output similar to:
Outputmember ce2a822cea30bfca is healthy: got healthy result from http://10.129.69.201:2379
cluster is healthy
Close the connection to the droplet and note down the Etcd endpoints your Kubernetes components will use: http://10.129.69.201:2379
for clients and http://10.129.69.201:2380
for peers, in the example above.
To script this assignment, we can use Jq to extract the private_ip
property of the droplet and format the result as required:
- export ETCD_PEER=`doctl -f json d f etcd-01.$region | jq -r '.networks.v4[] | select(.type == "private") | "http://\(.ip_address):2380"'`
Refer to [Working with Doctl responses](#) of Step 1 in this tutorial for a full explanation of the above command and confirm:
- echo $ETCD_PEER
This should return output similar to:
Outputhttp://10.129.69.201:2380
We will point our other Droplets to this Etcd cluster through their cloud-config
files. Start by creating a new file in your working directory named cloud-config-controller.yaml
. This will be the template for our Controller Droplets. Add the Etcd proxy configuration with the placeholder for ETCD_PEER (we will replace the placeholder at a later stage). We will keep adding snippets of configuration to this file as we go through each step, and we will provide a full listing of the final file as part of this tutorial. Add the Etcd proxy configuration to the file now:
- #cloud-config
-
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
When we provision a cluster, we will script the substitution of the ETCD_PEER
placeholder with a sed
command similar to the one below:
- sed -e "s|ETCD_PEER|${ETCD_PEER}|g;" cloud-config-controller.yaml > kube-controller-01.yaml
Note: Because our variable value includes forward slashes, we are using sed
with the pipeline "|" character as separator for the "s" command instead of the more common forward slash. Whichever character follows the "s" command is used as the separator by sed
.
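To confirm the substitution worked, you can check that the placeholder no longer appears in the generated file:
- grep ETCD_PEER kube-controller-01.yaml || echo "ETCD_PEER placeholder replaced"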
We will proceed with reviewing the networking requirements for Kubernetes and how we achieve them on Digital Ocean in the next step.
Step 4 — Configuring The Network Fabric Layer
As explained in the introduction of this tutorial, Kubernetes has the fundamental networking requirement of ensuring that all containers are routable without network translation or port brokering on the hosts. In other words, this means every Droplet is required to have its own IP range within the cluster. To achieve this on Digital Ocean, Flannel will be used to provide an overlay network across multiple Droplets and configure Docker to use this networking layer for its containers.
Flannel runs an agent, flanneld
, on each host which is responsible for allocating a subnet lease out of a pre-configured address space. When enabling Flannel on CoreOS, CoreOS will ensure Docker is automatically configured to use the Flannel overlay network for its containers.
Flannel uses Etcd to store the network configuration, the subnets allocated to each host, and auxiliary data (such as host IPs). The Etcd storage back end for Flannel should be run separately from the Kubernetes storage back end. To reduce the complexity in this tutorial, however, we will configure Flannel to share the external Etcd cluster with Kubernetes; this is acceptable for testing and development only. By default, Flannel looks up its configuration under the /coreos.com/network/config
key within Etcd. To run Flannel on each node in a consistent way, we are required to publish the Flannel configuration to Etcd under this key.
At the bare minimum, the configuration must provide Flannel with an IP range (subnet) that it should use for the overlay network. The IP subnet used by Flannel should not overlap with the public and private IP ranges used by the Digital Ocean Droplets; 10.2.0.0/16
is the IP range used in this tutorial. This /16 range will be assigned to the entire overlay network and used by containers and Pods across the cluster Droplets. By default, Flannel will allocate a /24 to each Droplet. This default, along with the minimum and maximum subnet IP addresses, is overridable in the configuration.
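For illustration, a more explicit Flannel configuration that also overrides the per-host subnet length and the allocatable range could look like the following (the SubnetLen, SubnetMin, and SubnetMax values are example assumptions, not values used in this tutorial):
{
  "Network": "10.2.0.0/16",
  "SubnetLen": 24,
  "SubnetMin": "10.2.10.0",
  "SubnetMax": "10.2.90.0",
  "Backend": { "Type": "vxlan" }
}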
The forwarding of packets by Flannel is achieved using one of several strategies that are known as back ends. In this tutorial we will configure Flannel to use the vxlan
back end which is built on the performant in-kernel VXLAN tunneling protocol to encapsulate the packets for the overlay network.
If we were to use the etcdctl
utility, which is shipped with CoreOS, directly from the terminal of any of our Droplets to publish this configuration, it would look like this:
- etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
With the above command, etcdctl
uses localhost, which on our hosts will be the Etcd daemon running in proxy mode, forwarding the configuration to our Etcd storage back end.
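Similarly, you could verify the published configuration at any time from any Droplet running Etcd or an Etcd proxy with:
- etcdctl get /coreos.com/network/config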
In our cloud-config-controller.yaml
we now add the requirement that the flanneld.service
needs to be started as well as add a Systemd drop-in. In this case we're using the Systemd drop-in to append a pre-condition to the start command of the flanneld
service, making sure the Flannel configuration is published prior to starting Flannel:
- #cloud-config
-
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
Note: Any services that run Docker containers must come after the flanneld.service
definition in our cloud-config
file and should include Requires=flanneld.service
, After=flanneld.service
, and Restart=always|on-failure
directives. These directives are necessary because flanneld.service
may fail due to Etcd not being available yet. Flannel will keep restarting and it is important for Docker based services to also keep trying until Flannel is up.
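As a hypothetical sketch of these directives (the service and image below are placeholders, not part of this tutorial), such a unit could look like:
[Unit]
Description=Example Docker-based service
Requires=flanneld.service
After=flanneld.service
[Service]
Restart=on-failure
ExecStart=/usr/bin/docker run --rm busybox sleep 3600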
In order for Flannel to manage the Pod network in the cluster, Docker needs to be configured to use the correct IP range for the Docker bridge. All we need to do is require that flanneld
is running prior to Docker starting and Flannel will handle the Docker configuration for us.
We do this with another Systemd drop-in; in this case we append two dependency rules to our Docker service to ensure it is only started after the Flannel service:
- #cloud-config
-
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
Step 5 — Downloading The Kubernetes Artifacts
Overview of the Kubernetes Artifacts
At the time of writing, the Official Kubernetes guidelines require the Kubernetes binaries and Docker images wrapping those binaries to be downloaded as part of the full Kubernetes release archive available in the Kubernetes repository on GitHub.
At the same time, all Kubernetes artifacts are also stored on the kubernetes-release
bucket on Google cloud storage for every release.
To confirm the current stable release of Kubernetes run:
- curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt
This returned v1.1.2 at the time of writing.
To list all Kubernetes release binaries stored in the Google cloud storage bucket for a particular release, you can either use the Python 2.7-based gsutil from the Google SDK as follows:
- gsutil ls -R gs://kubernetes-release/release/v1.1.2/bin/linux/amd64
Or without Python, directly talking to the Google cloud platform API using curl
& jq
(Jq ref):
- curl -sL https://www.googleapis.com/storage/v1/b/kubernetes-release/o?prefix='release/v1.1.2/bin/linux/amd64' | jq '"https://storage.googleapis.com/kubernetes-release/\(.items[].name)"'
A combined binary is provided as the Hyperkube. This Hyperkube is an all-in-one binary, allowing you to run any Kubernetes component as a subcommand of the hyperkube
command. The Hyperkube is also made available within a Docker image. The Dockerfile used to build this image can be reviewed here.
The plan for the Kubernetes release process is to publish the Kubernetes images on the Google Container Registry, under the google_containers
repository: gcr.io/google_containers/hyperkube:$TAG
, where TAG is the latest stable release tag (i.e.: v1.1.2
).
For example, we would obtain the Hyperkube image with the following command:
- docker pull gcr.io/google_containers/hyperkube:v1.1.2
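Subject to the availability caveat in the note that follows, each Kubernetes component could then be invoked as a subcommand of the hyperkube binary inside this image; for example (illustrative only):
- docker run gcr.io/google_containers/hyperkube:v1.1.2 /hyperkube kubelet --version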
Note: At the time of writing, Kubernetes images were not yet being pushed to the Google Container Registry as part of the release process. Any available images were pushed as a one-off. Refer to the following support ticket for an updated status of the release process.
Moreover, as the Hyperkube combines all binaries, is based on debian:jessie
and includes additional packages such as iptables
(required by the kube-proxy
), its size is considerable:
Outputgcr.io/google_containers/hyperkube v1.1.2 aa1283b0c02d 2 weeks ago 254.3 MB
As a result, for this tutorial, we will run the kube-proxy
binary outside of a container, the same way we run the kubelet
or any system daemon. For the kube-apiserver
, kube-controller-manager
and kube-scheduler
it is recommended to run these within containers and we will take a closer look at the available Docker images to do so now.
As can be seen in the listings earlier, tarred repositories for Docker images wrapping the Kubernetes binaries are also available:
binary_name | base image | size |
---|---|---|
kube-apiserver | busybox | 47.97 MB |
kube-controller-manager | busybox | 40.12 MB |
kube-scheduler | busybox | 21.44 MB |
Assuming you have access to a Docker daemon, we can curl
and load
these Docker images with the following commands.
- curl -sLo ./kube-apiserver.tar https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-apiserver.tar
- docker load -i ./kube-apiserver.tar
We have loaded the kube-apiserver
image in this example.
The Kubernetes build script tags these images as gcr.io/google_containers/$binary_name:$md5_sum
, which uses the md5_sum instead of the version. To easily run a container from this image, or push the image to a private registry for bootstrapping, we may re-tag the images with commands similar to the following:
- #get md5 via docker_tag
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-apiserver.docker_tag)"
- #re-tag
- docker tag -f "gcr.io/google_containers/kube-apiserver:${docker_tag}" "kube-apiserver:1.1.2"
In the above commands, we first get the md5_sum from the cloud storage bucket and then use it to re-tag the image. We can automate this for all the Kubernetes containers and will do so to pre-load the Kubernetes images.
Pre-Loading Kubernetes images
We will curl
and load
each image individually; see Pre-pulling images for details on how pre-pulled images may affect the Kubelet Pod definitions if we were to use the latest
tag. The following script combines the commands described above for each binary:
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- #"kube-proxy"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
-
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- echo "downloading ${binary} ${docker_tag}"
- curl -Lo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- echo "loading docker image"
- docker load -i ${temp_dir}/${binary}.tar
- echo "tagging docker image as ${binary} ${tag}"
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done
-
- echo "cleaning up temp dir"
- rm -rf "${temp_dir}"
The build scripts provided on the Kubernetes repository were used as a reference to create this script.
If you have a Docker registry available, you may modify this script to push these images to a repository there and simplify the provisioning of your cluster nodes significantly. In that case, the cluster nodes can simply pull the images from your registry and we do not need to include the above script as part of our controller Droplet configuration. For this tutorial, we assume such a registry is not available and we will embed the above script into the cloud-config
file of every controller node.
Insert this script into the cloud-config-controller.yaml
with a write_files
directive and make it executable. To reduce the output to the Systemd journal, we make the curl command silent and strip the echo
commands. We inserted the write_files
directive before the CoreOS configuration directives for clarity. We will execute the script through a oneshot
service, which we will add to the cloud-config
after this code listing:
- #cloud-config
-
- write_files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
We will now instruct cloudinit
to run this script during initialization of the Droplet by adding a oneshot
service for it to our cloud-config
. Systemd normally considers a oneshot service inactive once its process exits; setting RemainAfterExit=yes tells Systemd to keep treating the service as active (and successful) after the script completes. The script also requires the network and Docker services to be available, and we add these as dependencies to our Service unit. Add the unit definition to the end of the cloud-config-controller.yaml
file (line 48 and below):
- #cloud-config
-
- write_files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: start
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
A note about the Kubernetes Up and Running images:
Non-official images are also made available by Kelsey Hightower for his Kubernetes Up And Running book through his kuar repository (backed by the Kubernetes Up And Running Google Cloud Storage bucket). We may list the contents of the kuar bucket using the repositories/library key to find the images available within the repository, as follows:
- gsutil ls gs://kuar/repositories/library/
Or without Python, using curl
, jq
and grep
to grab all v1.1.2 Kubernetes tagged images:
- curl -sL https://www.googleapis.com/storage/v1/b/kuar/o?prefix='repositories/library/kube' | jq .items[].name | grep tag_1.1.2
The kuar
images are very easy to use with a single Docker command using the registry endpoint: b.gcr.io/kuar/$binary_name:$version
, for example: to run the kube-apiserver
:
- docker run b.gcr.io/kuar/kube-apiserver:1.1.2
This is much easier than manually curl-ing, load-ing, re-tag-ging and run-ning the images, but keep in mind that these are not the official Kubernetes images and their availability is not guaranteed. We will not be using these images as part of this tutorial.
Step 6 — The Kubernetes Controller Systemd Services
Most of the Kubernetes controller configuration will be done through cloud-config
, aside from placing the TLS assets on disk. The cloud-config
we are writing will take into account the possibility to have load-balanced controller nodes for High Availability in the future. How this affects our configuration will be discussed in detail in the [Controller Services set up: Master Election](#) section.
Note: For better security, the TLS assets should not be stored in the cloud-config itself. If you do prefer to transfer the TLS assets as part of the cloud-config, refer to this CoreOS tutorial for an example of storing TLS assets within the cloud-config file.
We will now introduce every Kubernetes component and its configuration to incrementally add to our controller Droplet's cloud-config
file.
The Kubernetes Kubelet Service
As seen in the Architectural overview section, Kubernetes is made up of several components. One such fundamental component is the kubelet
. The kubelet
is responsible for what's running on each individual Droplet within your cluster. You can think of it as a process watcher like systemd
, but focused on running containers. It has one job: given a set of containers to run, make sure they are all running.
The unit of execution Kubernetes works with is the Pod, not an individual container. A Pod is a collection of containers and volumes sharing the same execution environment. The containers within a Pod share a single IP, in our case this IP is provided by Docker within the Flannel overlay network. Pods are defined by a JSON or YAML file called a Pod manifest.
Within a Kubernetes cluster, the kubelet
functions as a local agent that watches for Pod specs via the Kubernetes API server. The kubelet
is also responsible for registering a node with a Kubernetes cluster, sending events and pod status, and reporting resource utilization.
While the kubelet
plays an important role in a Kubernetes cluster, it also works well in standalone mode - outside of a Kubernetes cluster. With the kubelet
running in standalone mode we will be able to use containers to distribute our binaries, monitor container resource utilization through the built-in support for cAdvisor and establish resource limits for the daemon services. The kubelet
provides a convenient interface for managing containers on a local system, allowing us to update our controller services by updating the containers without rebuilding our unit files. To achieve this, the kubelet
supports the configuration of a manifest directory, which is monitored for pod manifests every 20 seconds by default.
We will use our controller's cloud-config
to configure & start the kube-kubelet.service
in standalone mode on our controller Droplet. We will run the kube-proxy
service in the same way. Next, we will deploy all the Kubernetes cluster control services using a Pod manifest placed in the manifest
directory of the controller as soon as the TLS assets become available. The kubelet will start and make sure all containers within the Pod keep running, just as if the Pod was submitted via the API. The cool trick here is that we don't have an API running yet, but the Pod will function in the exact same way, which simplifies troubleshooting later on.
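Once the controller Droplet is up later in this tutorial, a quick way to see the standalone kubelet at work is to SSH in and ask it what it is running. The read-only port and endpoint below are the kubelet defaults at the time of writing, so treat this as an optional, illustrative check:
# Ask the kubelet's read-only endpoint which Pods it is currently managing
curl -s http://127.0.0.1:10255/pods
# Or list the containers Docker is running for the kubelet (Kubernetes prefixes their names with k8s_)
docker ps | grep k8s_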
Note: Only CoreOS Alpha or Beta images ship with the Kubernetes kubelet; the Stable channel has never contained a release that includes the kubelet. If a Droplet was booted from the Beta or Alpha channel and then moved to the Stable channel, it will lose the kubelet when it updates to the stable release.
At the time of writing, the CoreOS Alpha image on Digital Ocean has the following version:
Output$ cat /etc/lsb-release
DISTRIB_ID=CoreOS
DISTRIB_RELEASE=891.0.0
And the Kubelet bundled with this is:
Output$ kubelet --version=true
Kubernetes v1.1.2+3085895
If you are required to use the CoreOS stable channel or need a different Kubelet version, you may curl
the kubelet
binary as part of the cloud-config
using the paths identified in [Step 5 — Downloading The Kubernetes Artifacts](#) of this tutorial.
...
ExecStartPre=/usr/bin/curl -sLo /opt/bin/kubelet -z /opt/bin/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubelet
ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet
...
Note: The -z option of curl downloads the file only if it is newer than a given date or, as used here, newer than an existing local file. If the local file does not exist yet, curl prints a warning, as shown below.
OutputWarning: Illegal date format for -z/--timecond (and not a file name).
Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.
Wherever we curl
binaries with the -z
option as part of a Systemd unit, these warnings will show in the journal and can safely be ignored.
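If you would like to see this behaviour for yourself before wiring it into a unit, you can run the same download twice against a scratch path (the /tmp/kubelet path below is used purely for illustration):
# First run: the local file does not exist yet, so curl prints the time-condition warning and downloads the binary
curl -Lo /tmp/kubelet -z /tmp/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubelet
# Second run: the remote file is not newer than the local copy, so nothing is transferred
curl -Lo /tmp/kubelet -z /tmp/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubelet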
Running the Kubelet in standalone mode
The parameters we will pass to the kubelet are as follows; we will break them down one by one next:
kubelet \
--api-servers=http://127.0.0.1:8080 \
--register-node=false \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local
The kubelet will communicate with the API server through localhost, as specified with the --api-servers flag, but it will not register our controller node for cluster work because we set the --register-node=false flag. This ensures our controller Pods will not be affected by Pods scheduled by users within the cluster. As mentioned earlier, to run the kubelet in standalone mode we need to point it to a manifest directory. We set the kubelet manifest directory via the --config flag, which will be /etc/kubernetes/manifests in our setup. To facilitate routing between Droplets, we also override the hostname with the Droplet public IP through the --hostname-override flag.
CoreOS Linux ships with reasonable defaults for the kubelet, which have been optimized for security and ease of use. However, we are going to loosen the security restrictions in order to enable support for privileged containers through the --allow-privileged=true
flag.
Service Discovery and Kubernetes Services
To enable service discovery within the Kubernetes cluster, we need to provide our kubelet
with the service IP for the cluster DNS component as well as the DNS domain. The kubelet
will pass this on as the DNS server and DNS search suffix to each container running within the cluster. In this tutorial we will deploy DNS as a service within our Kubernetes cluster through the cluster DNS add-on in [Step 11 — Deploying Kubernetes-ready applications](#). Kubernetes uses cluster Virtual IPs (VIPs) for all services defined within the cluster. Routing to these VIPs is handled by the Kubernetes proxy components and VIPs are not required to be routable between nodes.
We configure Kubernetes to use the 10.3.0.0/24
IP range for all services. Each service will be assigned a cluster IP in this range. This range must not overlap with any IP ranges assigned to Pods as configured in our Flannel overlay network, or the Digital Ocean public and private IP ranges. The API server will take the first IP in that range (10.3.0.1
) by itself and we will configure the DNS service to take the static IP of 10.3.0.10
. Modify these values to mirror your own configuration.
We must pass on this DNS service IP to the kubelet
via the --cluster-dns
flag and the DNS domain via the --cluster-domain
flag.
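Once the DNS add-on is deployed later in this tutorial, you can confirm that the kubelet injected these settings into your containers. The checks below are illustrative and assume a container image that ships the usual resolver tools:
# Inside any container started by Kubernetes, the kubelet's --cluster-dns and --cluster-domain values end up in resolv.conf
cat /etc/resolv.conf
# From an image that includes nslookup, resolve the API service through the cluster DNS service IP
nslookup kubernetes.default.svc.cluster.local 10.3.0.10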
If the kubelet is bundled with CoreOS (Alpha/Beta), it is located at /usr/bin/kubelet. If you manually download it (Stable) to another path (/opt/bin/kubelet, for example), make sure to update the paths in the snippet below. Prior to starting the kubelet service, we also ensure the manifests and ssl directories exist on the host using an ExecStartPre directive prefixed with "-", which tells Systemd that failure of the command is tolerated.
We combine all the information above in the systemd service unit file for running the kubelet. We add a dependency on the docker.service
and make sure the unit restarts on failure. Here is the relevant cloud-config
snippet:
...
units:
- name: "kube-kubelet.service"
command: "start"
content: |
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
Requires=docker.service
After=docker.service
[Service]
ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
ExecStart=/usr/bin/kubelet \
--api-servers=http://127.0.0.1:8080 \
--register-node=false \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local
Restart=always
RestartSec=10
...
Adding this to our existing cloud-config-controller.yaml
gives us the following new contents (changes from line 60 onwards):
- #cloud-config
-
- write-files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: start
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service
- After=docker.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=http://127.0.0.1:8080 \
- --register-node=false \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local
- Restart=always
- RestartSec=10
With this configuration, all stateless controller services will be managed through the Pod manifests dropped into the kubelet's manifest folder (/etc/kubernetes/manifests). After configuring the kube-proxy service next, we will go through the structure of a Pod manifest and the Pod manifest for each controller service. We will finalize the controller configuration section with an overview of the full Kubernetes controller Pod manifests.
The Kubernetes Proxy Service
All nodes should run kube-proxy. (Running kube-proxy on a "controller" node is not strictly required, but being consistent is easier.) The proxy is responsible for directing traffic destined for specific services and pods to the correct location, and it communicates with the API server periodically to stay up to date.
Unlike the kubelet
, the kube-proxy
binary is currently not shipped with any CoreOS release and we will always need to download it. The URL to download the binary is described in [Step 5 — Downloading The Kubernetes Artifacts](#) of this tutorial. We will curl
the binary from this URL prior to starting the service by providing the following ExecStartPre
directives within the [Service]
section of our Systemd unit:
ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
We will also delay the kube-proxy
daemon from trying to connect to the API server until the kube-apiserver
service has started with the following ExecStartPre
directive:
ExecStartPre=/bin/bash -c "until /usr/bin/curl http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
Both the controller and worker nodes in your cluster will run the proxy. The following kube-proxy
parameters will be defined in our systemd service unit:
- --master=http://127.0.0.1:8080: The address of the Kubernetes API server for our Kubernetes controller node. In the section below, we will configure our kube-apiserver to bind to the network of the host and be reachable on the loopback interface.
- --proxy-mode=iptables: The proxy mode for our kube-proxy. At the time of writing the following two options are valid: userspace (older, stable) or iptables (experimental). If the iptables mode is selected, but the system's kernel or iptables versions are insufficient, kube-proxy always falls back to the userspace proxy (see the quick check after this list).
- --hostname-override=$public_ipv4: To facilitate routing without DNS resolution.
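Once kube-proxy is running, you can see which mode it ended up in by inspecting the NAT rules it programs on the Droplet. The exact chain names differ between the userspace and iptables proxy modes, so the command below is a loose, illustrative check:
# kube-proxy creates KUBE-* chains in the nat table; KUBE-PORTALS-* chains indicate the userspace fallback
sudo iptables -t nat -L -n | grep -i '^Chain KUBE'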
Add the kube-proxy Systemd Service Unit definition to your cloud-config-controller.yaml
. We insert this before the Kubelet Service to ensure the kube-proxy
binary is downloaded and started as soon as possible.
- #cloud-config
-
- write-files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: start
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
- - name: "kube-proxy.service"
- command: start
- content: |
- [Unit]
- Description=Kubernetes Proxy
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target
- After=network-online.target
- [Service]
- ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
- ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
- # wait for kube-apiserver to be up and ready
- ExecStartPre=/bin/bash -c "until /usr/bin/curl http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
- ExecStart=/opt/bin/kube-proxy \
- --master=http://127.0.0.1:8080 \
- --proxy-mode=iptables \
- --hostname-override=$public_ipv4
- TimeoutStartSec=10
- Restart=always
- RestartSec=10
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service
- After=docker.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=http://127.0.0.1:8080 \
- --register-node=false \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local
- Restart=always
- RestartSec=10
Note: At the time of writing, the kube-proxy binary is 18.3MB, while the Docker-wrapped image based on Debian with the iptables package installed is over 180MB. Downloading the kube-proxy binary takes less than 10 seconds and is therefore the method used in this tutorial, as opposed to running the proxy in a privileged Hyperkube container.
Note: By setting TimeoutStartSec to 10, Systemd will fail the kube-proxy service if it hasn't started after 10 seconds, but it will be restarted after the specified RestartSec timeout. We may notice these failures in the journal until the kube-apiserver has started; until then, these warnings can be safely ignored. The cloudinit utility will only continue with the next service after Systemd has failed the kube-proxy once.
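If you want to keep an eye on these restarts while the Droplet boots, you can follow the unit's journal from the controller Droplet; the messages stop once the API server answers on port 8080:
# Follow the kube-proxy unit logs; expect "waiting for API server to come online..." until kube-apiserver is up
journalctl -u kube-proxy.service -f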
For the full overview of all kube-proxy
arguments, refer to the official Kubernetes documentation.
Step 7 — The Kubernetes Controller Pods
For every controller node, stateless and stateful services will be managed by the kubelet using Pods. All stateless controller services will be managed through the Kubernetes manifest files dropped into the kubelet's manifest folder (/etc/kubernetes/manifests). All stateful controller services will be defined in Kubernetes manifest files managed by stateless master elector Pods. To understand this configuration, we need a good understanding of Kubernetes manifest files.
Our kubelet will be used to manage our controller services within containers based on manifest files. In this section we take a closer look at the structure of these files; you can refer back to this section whenever you need a refresher on manifest files.
Introduction to Kubernetes Manifest files
Kubernetes manifests can be written using YAML or JSON, but only YAML provides the ability to add comments. All of the manifests accepted and returned by the server have a schema, identified by the kind
and apiVersion
fields. These fields are required for proper decoding of the object.
The kind
field takes a string that identifies the schema of an object, in our case we are writing a manifest to create Pod objects, as such we write kind: Pod
in our Pod manifest.
The apiVersion field takes a string that identifies the API group and version of the schema of an object. API groups allow the Kubernetes API to be broken down into modular groups which can be enabled or disabled individually and versioned separately, and they give third parties the ability to develop Kubernetes plug-ins without naming conflicts. At the time of writing there are only two API groups:
- The "core" group, which currently consists of the original monolithic Kubernetes v1 API. This API group is simply omitted and specified only by it's version, for example:
apiVersion: v1
- The "extensions" group, which is the first API group introduced with v1.1. The
extensions
API group is still inv1beta1
at the time of writing, as such this API group is specified asapiVersion: extensions/v1beta1
. Resources within theextensions
API group can be enabled or disabled through the--runtime-config
flag passed on to the apiserver. For example, to disableHorizontalPodAutoscalers
andJobs
we may set--runtime-config=extensions/v1beta1/horizontalpodautoscalers=false,extensions/v1beta1/jobs=false
.
For a more detailed explanation of the Kubernetes API, refer to the API documentation.
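You can also ask a running API server which groups and versions it serves. Against the insecure localhost port of the controller we configure in this tutorial, the discovery endpoints can be queried as follows (an optional, illustrative check):
# The "core" group is served under /api, all other API groups under /apis
curl -s http://127.0.0.1:8080/api
curl -s http://127.0.0.1:8080/apis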
Once the schema has been specified, Pod manifests mainly consist of the following key structures:
- A metadata structure for describing the pod and its labels
- A spec structure for describing volumes, and a list of containers that will run in the Pod.
The name
and namespace
of the metadata
structure are generally user provided. The name has to be unique within the namespace specified. An empty namespace is equivalent to the default
namespace. In our case, we will scope our Pods related to the Kubernetes system environment to the kube-system
namespace. We will combine all stateless controller service containers within one Pod and call it the kube-controller
Pod.
Every Pod spec structure must have a list of containers with a minimum of one container. Each container in a Pod must have a unique name. For example, the API service container may be named identically to its binary name: kube-apiserver. Next, we may specify the image for the container, the command run within the container (equivalent to the Docker image entrypoint array), the args passed on to the container process (equivalent to the Docker image cmd array), the volumeMounts, ...
A special requirement for our controller service containers is that they need to use the host's network namespace. This can be achieved by setting hostNetwork: true in the spec structure of our controller Pod manifest.
Thus, this is how our Pod manifest for the controller services starts:
apiVersion: v1
kind: Pod
metadata:
name: kube-controller
namespace: kube-system
spec:
hostNetwork: true
containers:
- name: "kube-apiserver"
...
Master Election
By using a single Kubernetes controller Droplet, we have a single point of failure in our infrastructure. To ensure high availability we will need to run multiple controller nodes at some point. Every stateless Kubernetes component, such as the kube-apiserver
, can be scaled across multiple controller nodes without concern. However, there are components which modify the state of the cluster, such as the kube-controller-manager
and the kube-scheduler
. Of these components, only one instance may modify the state at a time. To achieve this, we need a way to ensure only one instance of each of these components is running, which is done by setting up master election per component.
At the time of writing, master election is not integrated within the kube-controller-manager
and kube-scheduler
, but is planned to be added in the future. Until then, a powerful and generic master election utility called the Podmaster is recommended.
The Podmaster is a small (8MB) utility written in Go that uses Etcd's atomic CompareAndSwap
functionality to implement master election. The first controller node to reach the Etcd cluster wins the race and becomes the master node for that service, marking itself as such with an expiring key identifying the service. The Podmaster will then periodically extend its service key. If a Podmaster finds the service key it monitors has expired, it attempts to take over by setting itself as the new master for that service. If it is the current master, the Podmaster copies the manifest of its service into the manifests directory of its host, ensuring a single instance of the service is always running within the cluster. If the Podmaster finds it is no longer the master, it removes the manifest file from the manifests directory of its host, ensuring the kubelet will no longer run the service on that controller node.
A Podmaster instance may run for each service requiring master election, each instance takes the key identifying the service as well as a source manifest file and a destination manifest file. The Podmaster itself will run inside a container and a Docker image wrapping the Podmaster can be pulled from the Google Container Registry under the gcr.io/google_containers/podmaster
repository. At the time of writing there is only 1 tag: 1.1
.
Even though we are only creating one controller node in this tutorial, we will set up the master election for the controller manager and scheduler service by storing their manifest files under the /srv/kubernetes/manifests
path and letting Podmaster instances copy the manifest files to the /etc/kubernetes/manifests
path on the elected master node.
In a single-controller deployment, the Podmaster will simply ensure that the kube-scheduler
and kube-controller-manager
run on the current node. In a multi-controller deployment, the Podmaster will be responsible for ensuring no additional instances are started, unless a machine dies, in which case the Podmaster will ensure new instances are started on one of the other controller nodes.
As our Podmasters depend on Kubernetes volumes, we will see the full Podmaster configurations after defining the Kubernetes volumes and kube-apiserver
Pod manifests.
Kubernetes Volumes
At its core, a Kubernetes volume is just a directory, possibly with some data in it, which is accessible to the containers in a Pod. How that directory comes to be, the medium that backs it, and the contents of it are determined by the particular volume type used. Each volume type is backed by a Kubernetes volume plug-in.
For our controller services, we ensure the Pod is tied to our controller node, and we will use HostPath
type volumes. HostPath
type volumes represent a pre-existing file or directory on the host machine that is directly exposed to the container. They are generally used for system agents or other privileged things that are allowed to see the host machine.
We will place our API server certificates, once generated, at the following pre-defined paths on the host:
- File:
/etc/kubernetes/ssl/ca.pem
- File:
/etc/kubernetes/ssl/apiserver.pem
- File:
/etc/kubernetes/ssl/apiserver-key.pem
The address of the controller node is required to generate these API server certificates. On Digital Ocean, this address is not known in advance. Therefore, we will generate the certificates and securely copy them over in a separate step after we provision our controller Droplet, but we will prepare our cloud-config and the volumes defined in our Pod manifests to expect these certificates at these pre-defined host paths.
Every volume requires a name which is unique within the Pod. The name is how we reference the volumes when we mount them into the Pod containers.
We define the following volume collection as part of our controller Pod manifest:
- a HostPath volume to provision the Kubernetes TLS credentials from the parent directory /etc/kubernetes/ssl.
- a HostPath volume for the list of "well-known" CA certificates, which, under CoreOS, is located under the read-only /usr/share/ca-certificates path.
- a HostPath volume exposing /srv/kubernetes/manifests as the source of the manifest files the Podmaster uses for master election.
- a HostPath volume giving the Podmaster access to the host manifest folder /etc/kubernetes/manifests, where it will store the destination manifest files.
spec:
volumes:
- hostPath:
path: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
- hostPath:
path: /usr/share/ca-certificates
name: ssl-certs-host
- hostPath:
path: /srv/kubernetes/manifests
name: manifest-src
- hostPath:
path: /etc/kubernetes/manifests
name: manifest-dst
The kube-apiserver
The first controller service we will configure in our controller Pod manifest is the API server. The API server is where most of the magic happens. It is stateless by design: it takes in API requests, processes them, stores the result in Etcd if needed, and then returns the result of the request. The API server will run on every controller Droplet and its Pod manifest will be placed directly in the kubelet manifest folder.
In this tutorial we are using individual Docker images wrapping each Kubernetes binary. In our Pod manifest we specify this binary as the entrypoint for the container through the command array, together with all of its arguments.
Below is the kube-apiserver container spec for our controller Pod; we will go through each argument in detail right after:
containers:
- name: "kube-apiserver"
image: "kube-apiserver:1.1.2"
command:
- "kube-apiserver"
- "--etcd-servers=http://127.0.0.1:2379"
- "--bind-address=0.0.0.0"
- "--secure_port=443"
- "--advertise-address=$public_ipv4"
- "--service-cluster-ip-range=10.3.0.0/24"
- "--service-node-port-range=30000-37000"
- "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- "--allow-privileged=true"
- "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
ports:
- containerPort: 443
hostPort: 443
name: https
- containerPort: 8080
hostPort: 8080
name: local
volumeMounts:
- mountPath: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
readOnly: true
- mountPath: /etc/ssl/certs
name: ssl-certs-host
readOnly: true
As highlighted in the [Kubernetes manifest files](#) section of this tutorial, our kube-controller
Pod uses the host's network namespace and each container running within the Pod can reach the host services, such as the Etcd proxy, over localhost.
- --etcd-servers=[]: By design, the kube-apiserver component is the only Kubernetes component communicating with Etcd. We specify the location of the Etcd cluster through the --etcd-servers=[] flag, which takes a comma-separated list of Etcd servers to watch. In this tutorial we bind an Etcd proxy for the cluster to the loopback interface of each Droplet, so the Etcd cluster can be reached through http://127.0.0.1:2379. Also note that by default, Kubernetes objects are stored under the /registry key in Etcd. We could prefix this path by also setting the --etcd-prefix="/foo" flag, but won't do this for this tutorial.
- --bind-address=0.0.0.0: The IP address on which the API server listens for requests. We explicitly configure our API server to listen on all interfaces of the host.
- --secure-port=443: To enable HTTPS with authentication and authorization we need to set this flag.
- --advertise-address=$public_ipv4: The IP address on which to advertise the apiserver to members of the cluster. This address must be reachable by the rest of the cluster. If blank, the --bind-address will be used, which would not work in our set up.
- --service-cluster-ip-range=10.3.0.0/24: A required CIDR notation IP range from which to assign service cluster IPs. See the [Running the kubelet in standalone mode](#) section for more details on how this is used within Kubernetes; we use 10.3.0.0/24 for Kubernetes Services within this tutorial. Modify these values to mirror your own configuration.
- --service-node-port-range=30000-37000: A port range to reserve for services with NodePort visibility. If we do not specify this range we will not be able to run some of the Kubernetes service examples using nodePort.
- --admission-control=[]: In Kubernetes, API requests need to pass through a chain of admission controllers after authentication and authorization but prior to being accepted and executed. Admission controllers are chained plug-ins, and many advanced features in Kubernetes require an admission control plug-in to be enabled in order to properly support the feature. As a result, a Kubernetes API server that is not properly configured with the right set of admission control plug-ins is an incomplete server and will not support all the features you expect. The recommended set of admission controllers for Kubernetes 1.0 is NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota. We would like to highlight the NamespaceLifecycle plug-in, which ensures that API requests in a non-existent Namespace are rejected. Because of this, we will be required to manually create the kube-system namespace used by our controller services once our kube-apiserver is available, or our other nodes won't be able to discover them.
- --allow-privileged=true: We have to explicitly allow privileged containers to run in our cluster.
- --tls-cert-file="/etc/kubernetes/ssl/apiserver.pem": The certificate used for SSL/TLS connections to the API server. We will generate the apiserver certificate containing host identities (DNS name, IP, ...) and securely copy it to our controller Droplet in a separate step. If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to /var/run/kubernetes. If you intend to use this approach, be sure to provide a volume for /var/run/kubernetes/ as well.
- --tls-private-key-file="/etc/kubernetes/ssl/apiserver-key.pem": The API server private key matching the --tls-cert-file we generated.
- --client-ca-file="/etc/kubernetes/ssl/ca.pem": The trusted certificate authority. Kubernetes will check all incoming HTTPS requests for a client certificate signed by this trusted CA. Any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
- --service-account-key-file="/etc/kubernetes/ssl/apiserver-key.pem": Used to verify ServiceAccount tokens. We explicitly set this to the same private key as our --tls-private-key-file flag. If unspecified, --tls-private-key-file is used.
Refer to the full kube-apiserver reference for a full overview of all API server flags.
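After the controller Droplet is provisioned and the TLS assets are in place (covered in the following steps), a couple of requests against the insecure localhost port confirm the API server is up. These are optional, illustrative checks run from the controller Droplet itself:
# The insecure localhost port should answer on /healthz and /version
curl -s http://127.0.0.1:8080/healthz
curl -s http://127.0.0.1:8080/version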
Creating The kube-system namespace
As soon as the kube-apiserver is available, we need to create the kube-system namespace used by our controller services, or our cluster nodes won't be able to discover them. In this section we define the Systemd unit responsible for this.
We wait until the kube-apiserver service has started, in the same way our kube-proxy service was configured to wait:
ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
The command to create the namespace using the Kubernetes API is:
- curl -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
We are passing in the manifest as a JSON string in this case.
Putting the command into a oneshot Systemd unit that depends on a successful start of the kubelet service gives us the following unit definition:
coreos:
units:
- name: "create-kube-system-ns.service"
command: "start"
content: |
[Unit]
Description=Create the kube-system namespace
Documentation=https://github.com/kubernetes/kubernetes
Requires=kube-kubelet.service
After=kube-kubelet.service
[Service]
ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
ExecStart=/usr/bin/curl -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
RemainAfterExit=yes
Type=oneshot
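Once this unit has run, you can confirm the namespace exists with a request such as the following (an optional, illustrative check from the controller Droplet):
# Should return the Namespace object created by create-kube-system-ns.service
curl -s http://127.0.0.1:8080/api/v1/namespaces/kube-system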
The kube-podmaster
In this section we add our Podmaster containers to our kube-controller
Pod manifest. As mentioned in the [Controller Services: Master Election](#) section of this tutorial, the kube-scheduler
and kube-controller-manager
services require master election. We will create 1 Podmaster container for each component requiring master election and define the Pod manifests in the following sections.
We will go into the contents of the kube-scheduler.yaml
Pod manifest and kube-controller-manager.yaml
Pod manifest after finalizing this kube-controller.yaml
Pod manifest.
As the kube-controller
Pod shares the host network, our Podmaster containers can reach the Etcd cluster via the localhost Etcd proxy. To ease the setup, we overwrite the hostname the Podmaster stores in the master reservation with the Droplet public IP by setting the --whoami
flag. The Droplet IP is always routable without the need for DNS services. We mount the manifest-src
volume as a read only volume within the Podmaster containers. The manifest-dst
volume is the path monitored by the Kubelet and needs to be writable by the Podmaster.
Here is the Podmaster container managing the master election for the kube-scheduler service:
containers:
- name: "scheduler-elector"
image: "gcr.io/google_containers/podmaster:1.1"
args:
- "--whoami=$public_ipv4"
- "--etcd-servers=http://127.0.0.1:2379"
- "--key=scheduler"
- "--source-file=/src/manifests/kube-scheduler.yaml"
- "--dest-file=/dst/manifests/kube-scheduler.yaml"
volumeMounts:
- mountPath: /src/manifests
name: manifest-src
readOnly: true
- mountPath: /dst/manifests
name: manifest-dst
For the kube-scheduler
our Podmaster sets the value of the scheduler
key in Etcd to record which controller Droplet is the master. We point this Podmaster to the kube-scheduler
Pod manifest source and destination files.
For the kube-controller-manager
the master elector looks almost identical, apart from the key, source and destination manifest files. The key used for the kube-controller-manager
is controller
and the kube-controller-manager.yaml
Pod manifest file is used instead.
containers:
- name: "controller-manager-elector"
image: "gcr.io/google_containers/podmaster:1.1"
args:
- "--whoami=$public_ipv4"
- "--etcd-servers=http://127.0.0.1:2379"
- "--key=controller"
- "--source-file=/src/manifests/kube-controller-manager.yaml"
- "--dest-file=/dst/manifests/kube-controller-manager.yaml"
volumeMounts:
- mountPath: /src/manifests
name: manifest-src
readOnly: true
- mountPath: /dst/manifests
name: manifest-dst
Combining kube-controller Pod manifest snippets
Combining all the kube-controller.yaml
snippets above into a single kube-controller Pod manifest:
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- - hostPath:
- path: /srv/kubernetes/manifests
- name: manifest-src
- - hostPath:
- path: /etc/kubernetes/manifests
- name: manifest-dst
- containers:
- - name: "kube-apiserver"
- image: "kube-apiserver:1.1.2"
- command:
- - "kube-apiserver"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--bind-address=0.0.0.0"
- - "--secure_port=443"
- - "--advertise-address=$public_ipv4"
- - "--service-cluster-ip-range=10.3.0.0/24"
- - "--service-node-port-range=30000-37000"
- - "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- - "--allow-privileged=true"
- - "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- - "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- - "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- ports:
- - containerPort: 443
- hostPort: 443
- name: https
- - containerPort: 8080
- hostPort: 8080
- name: local
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - name: "scheduler-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=scheduler"
- - "--source-file=/src/manifests/kube-scheduler.yaml"
- - "--dest-file=/dst/manifests/kube-scheduler.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- - name: "controller-manager-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=controller"
- - "--source-file=/src/manifests/kube-controller-manager.yaml"
- - "--dest-file=/dst/manifests/kube-controller-manager.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
The kube-controller Pod Pre-conditions
The kube-apiserver requires the TLS assets to be in place; if they are not, the container will die shortly after starting and the kubelet will create a new container every 5 minutes until the container stays up. To keep the error logs and dead containers to a minimum during first boot, we prefer to hold off on putting the kube-controller Pod manifest in the kubelet manifest directory until the kube-apiserver TLS assets are available. Until then, we will use the write-files directive to create the kube-controller Pod manifest under the /srv/kubernetes/manifests/ path.
We will use a Systemd unit to monitor the /etc/kubernetes/ssl
path and copy the kube-controller
manifest file to the kubelet manifest directory as soon as the TLS assets are detected.
The following loop sleeps until all 3 TLS assets required on the controller node are available:
- until [ `ls -1 /etc/kubernetes/ssl/{apiserver,apiserver-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo "waiting for TLS assets...";sleep 5; done
Putting this into a oneshot Systemd unit that starts as soon as the kubelet is ready gives us the following unit definition:
...
coreos:
units:
- name: "tls-ready.service"
command: "start"
content: |
[Unit]
Description=Ensure TLS assets are ready
Requires=kube-kubelet.service
After=kube-kubelet.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{apiserver,apiserver-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
ExecStart=/usr/bin/cp /srv/kubernetes/manifests/kube-controller.yaml /etc/kubernetes/manifests/
...
We will now proceed with defining the master elected kube-scheduler
and kube-controller-manager
Pod manifests which will also be stored under the /srv/kubernetes/manifests
path.
The kube-controller-manager Pod manifest
The controller manager embeds the core control loops within Kubernetes such as the replication controller, endpoints controller, namespace controller and serviceaccount controller. In short, a control loop watches the shared state of the cluster through the kube-apiserver
and makes changes attempting to move the current state towards the desired state.
For example, if you increased the replica count for a replication controller, the controller manager would generate a scale up event, which would cause a new Pod to get scheduled in the cluster. The controller manager communicates with the API to submit these events.
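For example, once kubectl is configured against the cluster in a later step, scaling a replication controller is a single command; the controller manager notices the changed desired state and creates or removes Pods accordingly. The my-nginx name below is only a placeholder:
# Raise the desired replica count; the controller manager reconciles the cluster towards it
kubectl scale rc my-nginx --replicas=3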
We start writing this Pod manifest in exactly the same way as our kube-controller Pod manifest, but with its own unique name in the kube-system namespace:
apiVersion: v1
kind: Pod
metadata:
name: kube-controller-manager
namespace: kube-system
spec:
hostNetwork: true
containers:
- name: "kube-controller-manager"
...
This Pod also shares the network with the host (hostNetwork: true
), allowing the containers running within to access the kube-apiserver
through localhost as well as exposing themselves to the kubelet over localhost.
We define volumes for the SSL certificates and the list of "well-known" CA certificates stored on the host, so we can mount these into the Pod containers:
spec:
...
volumes:
- hostPath:
path: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
- hostPath:
path: /usr/share/ca-certificates
name: ssl-certs-host
Our kube-controller-manager
is called with the following arguments:
kube-controller-manager \
--master=http://127.0.0.1:8080 \
--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem \
--root-ca-file=/etc/kubernetes/ssl/ca.pem
We provide the address of the kube-apiserver
via the --master=http://127.0.0.1:8080
flag. We provide the private key (to sign service account tokens) and our Kubernetes cluster root CA certificate for inclusion in service account tokens via the --service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
and --root-ca-file=/etc/kubernetes/ssl/ca.pem
flags respectively.
We are also adding a livenessProbe to our Pod manifest. This is a diagnostic performed periodically by the kubelet on a container. The LivenessProbe hints to the kubelet when a container is unhealthy. If the LivenessProbe fails, the kubelet will kill the container and the container will be subjected to its RestartPolicy
. If RestartPolicy
is not set, the default value is Always
. The default state of Liveness before the initial delay is Success
. The state of Liveness for a container when no probe is provided is assumed to be Success
.
The httpGet handler used in our livenessProbe performs an HTTP GET against the provided IP address on a specified port and path; the probe succeeds when the response has a status code greater than or equal to 200 and less than 400. Note that the default port used by kube-controller-manager is 10252, and the Kubernetes "healthz" package registers a handler on the /healthz path that serves 200s.
This gives us the following container spec for our kube-controller-manager
container:
spec:
...
containers:
- name: "kube-controller-manager"
image: "kube-controller-manager:1.1.2"
command:
- "kube-controller-manager"
- "--master=http://127.0.0.1:8080"
- "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
livenessProbe:
httpGet:
host: 127.0.0.1
path: /healthz
port: 10252
initialDelaySeconds: 15
timeoutSeconds: 1
volumeMounts:
- mountPath: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
readOnly: true
- mountPath: /etc/ssl/certs
name: ssl-certs-host
readOnly: true
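You can also exercise the same endpoint the livenessProbe uses by hand from the controller Droplet once the Pod is running; it simply returns ok:
# The same request the kubelet's livenessProbe performs against the controller manager
curl -s http://127.0.0.1:10252/healthz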
Refer to the official kube-controller-manager reference for a full overview of all arguments.
Combining the above snippets together, the full kube-controller-manager.yaml
Pod manifest file will look as follows:
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller-manager
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- containers:
- - name: "kube-controller-manager"
- image: "kube-controller-manager:1.1.2"
- command:
- - "kube-controller-manager"
- - "--master=http://127.0.0.1:8080"
- - "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10252
- initialDelaySeconds: 15
- timeoutSeconds: 1
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
The kube-scheduler Pod manifest
The scheduler is the last major piece of our control services. It monitors the API for unscheduled pods, finds them a machine to run on, and communicates the decision back to the API.
The full kube-scheduler.yaml Pod manifest file introduces no new concepts: it performs a liveness probe against the scheduler's default port of 10251, requires no volumes, and looks as follows:
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-scheduler
- namespace: kube-system
- spec:
- hostNetwork: true
- containers:
- - name: "kube-scheduler"
- image: "kube-scheduler:1.1.2"
- command:
- - "kube-scheduler"
- - "--master=http://127.0.0.1:8080"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10251
- initialDelaySeconds: 15
- timeoutSeconds: 1
Refer to the official kube-scheduler reference for a full overview of all arguments.
Step 8 — Finalizing the Controller Cloud Config and Provisioning the Controller Droplet
We will embed the Pod manifest files constructed in Step 7 into our controller cloud-config
file.
Embedding all Pod manifests into the Controller cloud-config
We will store each manifest under the following paths:
- /srv/kubernetes/manifests/kube-scheduler.yaml
- /srv/kubernetes/manifests/kube-controller-manager.yaml
- /srv/kubernetes/manifests/kube-controller.yaml
This is achieved through the write-files
directive highlighted earlier.
- #cloud-config
-
- write-files:
- - path: "/srv/kubernetes/manifests/kube-scheduler.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-scheduler
- namespace: kube-system
- spec:
- hostNetwork: true
- containers:
- - name: "kube-scheduler"
- image: "kube-scheduler:1.1.2"
- command:
- - "kube-scheduler"
- - "--master=http://127.0.0.1:8080"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10251
- initialDelaySeconds: 15
- timeoutSeconds: 1
- - path: "/srv/kubernetes/manifests/kube-controller-manager.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller-manager
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- containers:
- - name: "kube-controller-manager"
- image: "kube-controller-manager:1.1.2"
- command:
- - "kube-controller-manager"
- - "--master=http://127.0.0.1:8080"
- - "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10252
- initialDelaySeconds: 15
- timeoutSeconds: 1
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - path: "/srv/kubernetes/manifests/kube-controller.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- - hostPath:
- path: /srv/kubernetes/manifests
- name: manifest-src
- - hostPath:
- path: /etc/kubernetes/manifests
- name: manifest-dst
- containers:
- - name: "kube-apiserver"
- image: "kube-apiserver:1.1.2"
- command:
- - "kube-apiserver"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--bind-address=0.0.0.0"
- - "--secure_port=443"
- - "--advertise-address=$public_ipv4"
- - "--service-cluster-ip-range=10.3.0.0/24"
- - "--service-node-port-range=30000-37000"
- - "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- - "--allow-privileged=true"
- - "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- - "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- - "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- ports:
- - containerPort: 443
- hostPort: 443
- name: https
- - containerPort: 8080
- hostPort: 8080
- name: local
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - name: "scheduler-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=scheduler"
- - "--source-file=/src/manifests/kube-scheduler.yaml"
- - "--dest-file=/dst/manifests/kube-scheduler.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- - name: "controller-manager-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=controller"
- - "--source-file=/src/manifests/kube-controller-manager.yaml"
- - "--dest-file=/dst/manifests/kube-controller-manager.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
-
The Final Controller cloud-config with all CoreOS Units
To finally create the controller Droplet, we will combine all above cloud-config
snippets into a single cloud-config
file:
- write-files snippets:
  - the /opt/bin/pull-kube-images.sh script to pre-load the Kubernetes Docker images
  - the /srv/kubernetes/manifests/kube-scheduler.yaml Pod manifest source for the kube-scheduler
  - the /srv/kubernetes/manifests/kube-controller-manager.yaml Pod manifest source for the kube-controller-manager
  - the /srv/kubernetes/manifests/kube-controller.yaml Pod manifest to start the kube-apiserver, controller-manager-elector and scheduler-elector
- the etcd2.service snippet to start a local Etcd proxy; notice the ETCD_PEER placeholder.
- the flanneld.service snippet to start the overlay network daemon, with a drop-in to configure the network subnet
- the docker.service drop-in snippet to add the flannel dependency
- the kube-kubelet.service snippet running the kubelet in standalone mode
- the kube-proxy.service snippet running the kube-proxy service
- the pull-kube-images.service snippet running the script to pre-load the Kubernetes Docker images
- the create-kube-system-ns.service snippet creating the kube-system namespace as soon as the API server is available
Several of these services depend on the TLS assets, which we generate as soon as the IP addresses are known for our Droplet.
In a multi-controller set-up, every controller node may be created using this cloud-config, although the flanneld drop-in and the create-kube-system-ns.service unit only need to run once within the cluster and are not required on subsequent controller nodes.
As we are running a single controller node, we are also turning off CoreOS updates and reboots in our cloud-config
.
- #cloud-config
-
- write-files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- - path: "/srv/kubernetes/manifests/kube-scheduler.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-scheduler
- namespace: kube-system
- spec:
- hostNetwork: true
- containers:
- - name: "kube-scheduler"
- image: "kube-scheduler:1.1.2"
- command:
- - "kube-scheduler"
- - "--master=http://127.0.0.1:8080"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10251
- initialDelaySeconds: 15
- timeoutSeconds: 1
- - path: "/srv/kubernetes/manifests/kube-controller-manager.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller-manager
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- containers:
- - name: "kube-controller-manager"
- image: "kube-controller-manager:1.1.2"
- command:
- - "kube-controller-manager"
- - "--master=http://127.0.0.1:8080"
- - "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10252
- initialDelaySeconds: 15
- timeoutSeconds: 1
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - path: "/srv/kubernetes/manifests/kube-controller.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- - hostPath:
- path: /srv/kubernetes/manifests
- name: manifest-src
- - hostPath:
- path: /etc/kubernetes/manifests
- name: manifest-dst
- containers:
- - name: "kube-apiserver"
- image: "kube-apiserver:1.1.2"
- command:
- - "kube-apiserver"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--bind-address=0.0.0.0"
- - "--secure_port=443"
- - "--advertise-address=$public_ipv4"
- - "--service-cluster-ip-range=10.3.0.0/24"
- - "--service-node-port-range=30000-37000"
- - "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- - "--allow-privileged=true"
- - "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- - "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- - "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- ports:
- - containerPort: 443
- hostPort: 443
- name: https
- - containerPort: 8080
- hostPort: 8080
- name: local
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - name: "scheduler-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=scheduler"
- - "--source-file=/src/manifests/kube-scheduler.yaml"
- - "--dest-file=/dst/manifests/kube-scheduler.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- - name: "controller-manager-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=controller"
- - "--source-file=/src/manifests/kube-controller-manager.yaml"
- - "--dest-file=/dst/manifests/kube-controller-manager.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: "start"
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: "start"
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: "start"
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
- - name: "kube-proxy.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Proxy
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target
- After=network-online.target
- [Service]
- ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
- ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
- # wait for kube-apiserver to be up and ready
- ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
- ExecStart=/opt/bin/kube-proxy \
- --master=http://127.0.0.1:8080 \
- --proxy-mode=iptables \
- --hostname-override=$public_ipv4
- TimeoutStartSec=10
- Restart=always
- RestartSec=10
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service
- After=docker.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=http://127.0.0.1:8080 \
- --register-node=false \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local
- Restart=always
- RestartSec=10
- - name: "tls-ready.service"
- command: "start"
- content: |
- [Unit]
- Description=Ensure TLS assets are ready
- Requires=kube-kubelet.service
- After=kube-kubelet.service
- [Service]
- Type=oneshot
- RemainAfterExit=yes
- ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{apiserver,apiserver-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
- ExecStart=/usr/bin/cp /srv/kubernetes/manifests/kube-controller.yaml /etc/kubernetes/manifests/
- - name: "create-kube-system-ns.service"
- command: "start"
- content: |
- [Unit]
- Description=Create the kube-system namespace
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=kube-kubelet.service
- After=kube-kubelet.service
- [Service]
- ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
- ExecStart=/usr/bin/curl -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
- RemainAfterExit=yes
- Type=oneshot
- update:
- group: alpha
- reboot-strategy: off
Create the Controller Droplet
Validate your cloud-config file, then create your kube-controller-01 Droplet with the following Doctl commands.
First, ensure your ETCD_PEER environment variable is still set from the [Deploy the data storage back end](#) section of this tutorial:
- $ echo $ETCD_PEER
- http://10.129.69.201:2380
If not, set it to the private_ip of your single-node Etcd cluster:
- export ETCD_PEER=`doctl -f json d f etcd-01.$region | jq -r '.networks.v4[] | select(.type == "private") | "http://\(.ip_address):2380"'`
Substitute the ETCD_PEER placeholder in the above cloud-config-controller.yaml template file with the following command:
- sed -e "s|ETCD_PEER|${ETCD_PEER}|g;" cloud-config-controller.yaml > kube-controller.yaml
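As a quick optional sanity check, confirm no ETCD_PEER placeholder remains in the generated file (this is a generic shell check, not part of the original workflow; it prints a confirmation message when nothing is found):
- grep -n "ETCD_PEER" kube-controller.yaml || echo "placeholder replaced"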
And send the command to create the droplet:
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf kube-controller.yaml kube-controller-01
Note: running free -m on a 512mb Droplet shows only 12mb of free memory after all controller services have started; it may be better to use a 1024mb Droplet to fully test Kubernetes.
We are waiting for the droplet to be flagged as active before proceeding. Once the Doctl command completes, the Droplet configuration is returned. As it usually takes more time for the Droplet to return its public and private ip addresses, we need to re-query the Droplet configuration. We will cache the json string returned in the $CONTROLLER_JSON
environment variable for subsequent commands:
- CONTROLLER_JSON=`doctl -f 'json' d f kube-controller-01.$region`
We parse the private and public IPs out as explained in the [Working with doctl responses](#) section of this tutorial.
- CONTROLLER_PUBLIC_IP=`echo $CONTROLLER_JSON | jq -r '.networks.v4[] | select(.type == "public") | .ip_address'`
- CONTROLLER_PRIVATE_IP=`echo $CONTROLLER_JSON | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'`
Confirm values were populated correctly:
- echo $CONTROLLER_PUBLIC_IP && echo $CONTROLLER_PRIVATE_IP
You may monitor the initialization process driven by cloud-config
by connecting to the Droplet:
- ssh core@$CONTROLLER_PUBLIC_IP
and follow the oem-cloudinit
service running the cloud-config
:
- journalctl -u oem-cloudinit -f
Once the oem-cloudinit service has reached tls-ready.service, it will wait for us to provide the TLS assets. Press CTRL+C to stop following the log, then confirm the Etcd proxy is running:
- systemctl status etcd2
Confirm the Flannel service started:
- systemctl status flanneld
If Flannel started, confirm it was able to retrieve its configuration from Etcd:
- cat /run/flannel/subnet.env
The Docker daemon options for the overlay network generated by Flannel are stored under /run/flannel_docker_opts.env
:
- cat /run/flannel_docker_opts.env
Confirm all services are running:
- systemctl status tls-ready
- systemctl status docker
- systemctl status pull-kube-images
Confirm the Kubernetes Docker images have all been loaded by running:
- docker images | grep kube
Confirm all files have been written to disk:
- ls -l /opt/bin/
- ls -l /srv/kubernetes/manifests/
Monitor when the kubelet launches the containers (which will happen as soon as we copy the TLS assets over):
- watch -n 1 'docker ps --format="table {{.Image}}\t{{.ID}}\t{{.Status}}\t{{.Ports}}" -a'
If the oem-cloudinit
failed, review the cloud-config
stored by the Digital Ocean Metadata Service:
- curl -sL 169.254.169.254/metadata/v1/user-data | less
If you find a mistake in the cloud-config
, your only option is to delete and re-create the Droplet.
Generating and Transferring the kube-apiserver TLS Assets
The address of the controller node is required for the API Server certificate. In most cases this will be the publicly routable IP or hostname of the controller cluster. Worker nodes must be able to reach the controller node(s) via this address on port 443. Additionally, external clients (such as an administrator using kubectl) will also need access, since this will run the Kubernetes API endpoint.
If you will be running a highly available control plane consisting of multiple controller nodes, the host name in the certificate should ideally point at a network load balancer that sits in front of the controller nodes. Alternatively, a DNS name can be configured which will resolve to the controller node IPs. In either case, the certificate which is generated next needs to have the correct CommonName and/or SubjectAlternativeNames.
Ensure you have populated the $CONTROLLER_PUBLIC_IP
, $region
and $CONTROLLER_PRIVATE_IP
variables:
- echo $region && echo $CONTROLLER_PUBLIC_IP && echo $CONTROLLER_PRIVATE_IP
which should show output similar to:
Output$ echo $region && echo $CONTROLLER_PUBLIC_IP && echo $CONTROLLER_PRIVATE_IP
ams2
188.166.252.4
10.130.158.66
The API Server will take the first IP in the Kubernetes Service IP range. In this tutorial we are using the 10.3.0.0/24 IP range for the cluster services (see [Running the Kubelet in standalone mode](#)). The IP used by the apiserver service within Kubernetes is thus 10.3.0.1 and needs to be included in the API server certificate. If you are using a different Service IP range, update the value in the configuration file below.
Now we are ready to prepare the openssl config file (see CoreOS OpenSSL tutorial).
- cat > openssl.cnf <<EOF
- [req]
- req_extensions = v3_req
- distinguished_name = req_distinguished_name
- [req_distinguished_name]
- [ v3_req ]
- basicConstraints = CA:FALSE
- keyUsage = nonRepudiation, digitalSignature, keyEncipherment
- subjectAltName = @alt_names
- [alt_names]
- DNS.1 = kube-controller-01.$region
- IP.1 = 10.3.0.1
- IP.2 = $CONTROLLER_PUBLIC_IP
- IP.3 = $CONTROLLER_PRIVATE_IP
- EOF
Generate the API server private key (apiserver-key.pem
) which is needed to create the signing request:
- openssl genrsa -out ~/.kube/apiserver-key.pem 2048
Generate the Certificate Signing Request (CSR):
- openssl req -new -key ~/.kube/apiserver-key.pem -out apiserver.csr -subj "/CN=kube-apiserver" -config openssl.cnf
And finally, use the Certificate Authority to generate the signed API Server certificate (apiserver.pem
):
- openssl x509 -req -in apiserver.csr \
- -CA "$HOME/.kube/ca.pem" \
- -CAkey "$HOME/.kube/ca-key.pem" \
- -CAcreateserial \
- -out "$HOME/.kube/apiserver.pem" \
- -days 365 \
- -extensions v3_req \
- -extfile openssl.cnf
Note: the above command does not work in git-for-windows due to Windows path conversions. It is recommended to copy apiserver.csr and openssl.cnf to ~/.kube/ and run the command from within the ~/.kube/ directory (without the "$HOME/.kube/" parts).
Copy the necessary certificates to the controller node. The core
user does not have write permissions to /etc/kubernetes/ssl
directly, thus we store the files in the home directory first.
- scp ~/.kube/apiserver-key.pem ~/.kube/apiserver.pem ~/.kube/ca.pem core@$CONTROLLER_PUBLIC_IP:~
Move the certificates from the Home directory to the /etc/kubernetes/ssl
path and fix the permissions by executing the following commands over ssh:
- ssh core@$CONTROLLER_PUBLIC_IP <<EOF
- sudo mkdir -p /etc/kubernetes/ssl/
- sudo mv ~core/*.pem /etc/kubernetes/ssl/
- sudo chown root:root /etc/kubernetes/ssl/*.pem
- sudo chmod 600 /etc/kubernetes/ssl/*-key.pem
- EOF
Troubleshooting: Review the certificate contents with the following command:
- openssl x509 -text -noout -in apiserver.pem
As soon as the certificates are available it will take just a few minutes for all the Controller services to start running.
The kube-proxy service will start as soon as the kube-apiserver is available. As we specified the iptables proxy mode, it will try to flush the userspace chains from iptables, which do not exist; this shows up in the log files but can be ignored:
Output$ journalctl -u kube-proxy -f
...
Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-HOST": exit status 1: iptables: No chain/target/match by that name.
Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-CONTAINER": exit status 1: iptables: No chain/target/match by that name.
If you started the docker ps -a
watch
on the controller, you should notice all containers being created by the kubelet.
We can confirm the apiserver
authenticates itself with the certificate we provided and requires client authentication using curl
. As the self-signed root CA used by the cluster is not trusted by our client, we need to pass it in:
- curl -s https://$CONTROLLER_PUBLIC_IP/api/v1/namespaces --cacert ~/.kube/ca.pem -v
The -v
flag allows us to see the verbose log of communication between our client and the apiserver. As we did not present our client certificate, the server responds with unauthorized
.
Output...
* successfully set certificate verify locations:
* CAfile: /home/demo/.kube/ca.pem
CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
...
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-SHA
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=kube-apiserver
* start date: Dec 15 08:15:44 2015 GMT
* expire date: Dec 14 08:15:44 2016 GMT
* subjectAltName: 188.166.252.4 matched
* issuer: CN=kube-ca
* SSL certificate verify ok.
...
< HTTP/1.1 401 Unauthorized
< Content-Type: text/plain; charset=utf-8
< Date: Wed, 16 Dec 2015 00:05:36 GMT
< Content-Length: 13
<
Unauthorized
...
We will need to authenticate by presenting a client certificate signed by our Kubernetes root CA; we will generate an admin certificate in the Administrator setup section of this tutorial.
At this stage we can configure our client to communicate with our Kubernetes controller. Although we do not have any worker nodes and won't be able to start a workload yet, this will ensure our configuration is working so far.
Note: We may re-use the same controller cloud-config files to spin up a cluster of controller Droplets with a load balancer in front of them. In that case, our apiserver certificate should have included all necessary IP addresses (such as the load balancer IP) for proper TLS authentication.
Step 9 — Setting Up The Kubernetes Cluster Administrator
Generate the Cluster Administrator Keypair
Every administrator needs a private key, which we generate using openssl as follows:
- openssl genrsa -out ~/.kube/admin-key.pem 2048
Using this private key, the administrator creates a Certificate Signing Request (CSR):
- openssl req -new -key ~/.kube/admin-key.pem -out admin.csr -subj "/CN=kube-admin"
To be authorized to connect to the Kubernetes apiserver, this admin.csr
needs to be sent to and processed by the Kubernetes Cluster root CA to generate the signed admin.pem
certificate:
- openssl x509 -req -in admin.csr -CA ~/.kube/ca.pem -CAkey ~/.kube/ca-key.pem -CAcreateserial -out ~/.kube/admin.pem -days 365
From now on, the administrator can use the admin-key.pem private key and the signed admin.pem certificate to connect to the Kubernetes cluster.
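Optionally, before using the new certificate, you can confirm it chains back to the cluster CA (openssl should report OK):
- openssl verify -CAfile ~/.kube/ca.pem ~/.kube/admin.pem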
Test the freshly signed admin certificate by passing it in to the curl
command we used earlier:
- curl -s https://$CONTROLLER_PUBLIC_IP/api/v1/namespaces --cacert ~/.kube/ca.pem --cert ~/.kube/admin.pem --key ~/.kube/admin-key.pem
Now authenticated, this should return a json
response containing all namespaces within our cluster. You can use Jq to simplify the output:
- curl -s https://$CONTROLLER_PUBLIC_IP/api/v1/namespaces \
- --cacert ~/.kube/ca.pem --cert ~/.kube/admin.pem --key ~/.kube/admin-key.pem \
- | jq .items[].metadata.name
Instead of talking to the API directly, we will download and configure the command line tool kubectl
.
Download Kubectl
As highlighted in [Step 5 — Downloading The Kubernetes Artifacts](#) of this tutorial, the kubectl
binary can be downloaded from the Google cloud storage bucket.
For 64bit Linux clients:
- sudo curl -Lo /opt/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubectl
- sudo chmod +x /opt/bin/kubectl
For 64bit OSX clients:
- sudo curl -Lo /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/darwin/amd64/kubectl
- sudo chmod +x /usr/local/bin/kubectl
For 64bit Windows clients (tested for this tutorial using the git-for-windows bash):
- curl -Lo /usr/bin/kubectl.exe https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/windows/amd64/kubectl.exe
Before we can use kubectl
, we need to understand how configuration is managed within Kubernetes. This section is also important for our worker node configuration as we will use these concepts to simplify the worker setup.
Introduction to Kubeconfig files
Several tools, Docker for example, rely on command-line flags and environment variables to configure the environment:
DOCKER_HOST=tcp://192.168.99.101:2376
DOCKER_CERT_PATH=/home/demo/.docker/machines/.client
DOCKER_TLS_VERIFY=1
DOCKER_MACHINE_NAME=dev
When users have to work with multiple environments which require a different configuration however, managing several environment variables to define a single configuration becomes cumbersome, even more so when the combination of clusters and users allow for many different configurations as is the case with Kubernetes.
Docker opted to facilitate environment management by creating the docker-machine env
command. This tool generates the necessary shell commands allowing users to easily switch the server their client talks to. The commands generated by docker-machine
in turn need to support each shell (bash/fish/cmd/PowerShell/..) users may be using and ideally also auto-detect the shell in use.
For Kubernetes, kubeconfig files were created instead to store the environment definitions such as authentication and connection details as well as provide a mechanism to easily switch between multiple clusters and multiple user credentials. Kubernetes components were written to read the configuration from these config files including functionality for merging multiple configurations based on certain rules.
On one side, kubeconfig files store connection information for clusters in an associative array of name->cluster
entries. A cluster
entry consists of information such as the server
to connect to, the api-version
of the cluster and the certificate-authority
for the cluster or a flag to skip verification of the authority which signed the server certificate (insecure-skip-tls-verify
).
On the other side, kubeconfig files also store user credentials in a second associative array of name->user entries. A user entry defines the user's authentication mechanism, which may be one of the following (a brief kubectl sketch follows this list):
- Authentication through a client certificate,
- Basic authentication with username and password or
- Authentication through a bearer token
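For illustration only, here is a hedged sketch of how each mechanism could be declared with kubectl config set-credentials; the user names and secrets are hypothetical, and this tutorial only uses the certificate-based variant:
- # hypothetical examples of the three mechanisms
- kubectl config set-credentials cert-user --client-certificate=admin.pem --client-key=admin-key.pem
- kubectl config set-credentials basic-user --username=demo --password=secret
- kubectl config set-credentials token-user --token=0123456789abcdef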
Decoupling users from clusters provides the ability to define cross cluster users only once. A user
entry and a cluster
entry combine to make up a context
. Several such (cluster
, user
) pairs are then defined in a third associative array of name->context
entries. Context entries also provide a namespace
field to specify the Kubernetes namespace
to be used for that context. The current-context
may be set to define the context in use.
To declare the above components, kubeconfig files are written in YAML and similar to Pod manifests start with a versioned schema definition:
apiVersion: v1
kind: Config
...
In the next step we will manually write out a kubeconfig file to fully understand these concepts. We will also be using the kubectl
tool to more easily manipulate kubeconfig files, with a series of kubectl config
subcommands. Refer to the official Kubernetes kubectl config
documentation for full details.
As mentioned, kubeconfig files also define a way multiple configurations may be merged together along with override options specified from the command line. See the loading and merging rules section of the Kubernetes documentation for a technical overview of these rules. We will only define a single kubeconfig file in this tutorial.
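For illustration only, merging can be triggered by listing several kubeconfig files in the KUBECONFIG environment variable; the second file here is hypothetical:
- KUBECONFIG=$HOME/.kube/config:$HOME/.kube/staging-config kubectl config view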
Configure Kubectl
Now that we have a theoretical understanding of what a kubeconfig file is made up of, we will first manually write our default kubeconfig file (~/.kube/config).
Define the Digital Ocean cluster we just created as do-cluster
:
apiVersion: v1
kind: Config
clusters:
- name: do-cluster
cluster:
certificate-authority: ca.pem
server: https://$CONTROLLER_PUBLIC_IP
Note: Relative paths are supported; in this tutorial we store our certificates in the same directory as our kubeconfig file (~/.kube/). Modify these values to mirror your own configuration.
Next, we define the admin user and specify the associated TLS assets we generated for certificate-based authentication:
...
users:
- name: admin
user:
client-certificate: admin.pem
client-key: admin-key.pem
Followed by the definition of the context combining these two, which we will name the do-cluster-admin
context:
...
contexts:
- name: do-cluster-admin
context:
cluster: do-cluster
user: admin
As we did not specify a namespace
for our context, the default
namespace will be used.
We may set this as our current context in our kubeconfig file by adding the current-context: do-cluster-admin
setting at the end.
Using the cat command to combine all the above snippets with a heredoc for variable substitution, we write out the file as follows:
cat > ~/.kube/config<<EOF
apiVersion: v1
kind: Config
clusters:
- name: do-cluster
cluster:
certificate-authority: ca.pem
server: https://$CONTROLLER_PUBLIC_IP
users:
- name: admin
user:
client-certificate: admin.pem
client-key: admin-key.pem
contexts:
- name: do-cluster-admin
context:
cluster: do-cluster
user: admin
current-context: do-cluster-admin
EOF
If the kubeconfig file is not passed in to kubectl
through the --kubeconfig
flag, it will first look for a kubeconfig
file in the current directory as well as the $KUBECONFIG
environment variable. If none of these are set, kubectl
will use the default ~/.kube/config
file we just created.
We may also generate the above file using kubectl with the following 4 commands:
Set the "do-cluster" entry:
- kubectl config set-cluster do-cluster --server=https://$CONTROLLER_PUBLIC_IP --certificate-authority=$HOME/.kube/ca.pem
Set the "admin" user entry:
- kubectl config set-credentials admin --client-key=$HOME/.kube/admin-key.pem --client-certificate=$HOME/.kube/admin.pem
Set the "do-cluster-admin" context:
- kubectl config set-context do-cluster-admin --cluster=do-cluster --user=admin
Set the current-context
:
- kubectl config use-context do-cluster-admin
Confirm your configuration was successful with the following command:
- kubectl version
If everything worked so far, this should return output similar to:
OutputClient Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.2", GitCommit:"3085895b8a70a3d985e9320a098e74f545546171", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.2", GitCommit:"3085895b8a70a3d985e9320a098e74f545546171", GitTreeState:"clean"}
We may confirm the pods running in the kube-system namespace with the following command:
- kubectl get pods --namespace=kube-system
Expected output looks like:
OutputNAME READY STATUS RESTARTS AGE
kube-controller-188.166.252.4 3/3 Running 0 3h
kube-controller-manager-188.166.252.4 1/1 Running 0 3h
kube-scheduler-188.166.252.4 1/1 Running 0 3h
This indicates that all 3 containers (kube-apiserver, scheduler-elector and controller-manager-elector) of the kube-controller pod, as well as the kube-controller-manager and kube-scheduler pods, are running.
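We can also confirm that the API server claimed the first IP of the Service IP range (10.3.0.1 in this tutorial), the same IP we added to the API server certificate; the built-in kubernetes service in the default namespace should be listed with that cluster IP:
- kubectl get services --namespace=default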
We now have our Etcd data store and first controller droplet ready, but we do not have any worker nodes yet to schedule workloads on.
- kubectl get nodes
Returns an empty collection of worker nodes. We will spin up our first worker node next.
Step 10 — Provisioning The Kubernetes Worker Droplets
Configuring the workers is significantly less complicated, and we can reuse much of the controller's configuration.
Ensure your DIGITALOCEAN_API_KEY
, ETCD_PEER
, CONTROLLER_PUBLIC_IP
, CONTROLLER_PRIVATE_IP
and region
environment variables are set for the next steps.
The Etcd, Flannel & Docker services
As mentioned in the Etcd configuration section, we are running an Etcd daemon in proxy mode on each droplet for Flannel to access. As we are sharing our Etcd cluster between Flannel and Kubernetes we should note that exposing your Kubernetes data back end to each node is a bad practice. For production environments we should use a separate Etcd cluster to store the Flannel meta data configuration. If Flannel was not used to configure the overlay network, Etcd access would not be needed on the worker nodes at all.
Our cloud-config section for the Etcd proxy daemon, as we saw before, looks like this:
coreos:
etcd2:
proxy: on
listen-client-urls: http://localhost:2379
initial-cluster: "etcd-01=ETCD_PEER"
units:
- name: "etcd2.service"
command: "start"
We need to ensure Flannel starts and add a drop-in for Docker to depend on Flannel (Flannel will use localhost to retrieve its network configuration):
coreos:
units:
- name: "flanneld.service"
command: "start"
- name: "docker.service"
command: "start"
drop-ins:
- name: 40-flannel.conf
content: |
[Unit]
Requires=flanneld.service
After=flanneld.service
The kubelet service
In order to facilitate secure communication between Kubernetes components, kubeconfig
can also be used to define authentication settings for the kubelet. In this case, the kubelet and proxy are reading this configuration to communicate with the API. Refer back to the [Introduction to kubeconfig files](#) section of this tutorial for a detailed explanation of the kubeconfig specification.
Very similar to our previous kubeconfig file, we define a single cluster, in this case called "local", with a certificate-authority path, and a single user called "kubelet" with certificate-based authentication through a worker private key and signed certificate. The combination of this user and cluster is defined as the "kubelet-context" and set as the current-context.
-
- apiVersion: v1
- kind: Config
- clusters:
- - name: local
- cluster:
- certificate-authority: /etc/kubernetes/ssl/ca.pem
- users:
- - name: kubelet
- user:
- client-certificate: /etc/kubernetes/ssl/worker.pem
- client-key: /etc/kubernetes/ssl/worker-key.pem
- contexts:
- - name: kubelet-context
- context:
- cluster: local
- user: kubelet
- current-context: kubelet-context
Our kubelet parameters are:
kubelet \
--api-servers=https://CONTROLLER_PUBLIC_IP \
--register-node=true \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
The --api-servers flag now uses the https protocol (port 443), and we use a placeholder for CONTROLLER_PUBLIC_IP to keep this cloud-config file generic so we can re-use it when creating multiple clusters. We pass in the kubeconfig we described above with the --kubeconfig flag. Our worker nodes are registered to receive work by specifying the --register-node=true flag. We still configure our kubelet to monitor a local directory for Pod manifests, although we will not be using this at this point. The remaining parameters are identical to the controller Droplet's configuration.
Similar to our controller setup, we define a Systemd unit to wait for the Worker TLS assets to be in place and require the kube-kubelet
service to depend on this unit.
coreos:
units:
- name: "tls-ready.service"
command: "start"
content: |
[Unit]
Description=Ensure TLS assets are ready
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/bin/mkdir -p /etc/kubernetes/ssl
ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{worker,worker-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
Add the tls-ready.service dependency to the kube-kubelet and kube-proxy services:
coreos:
units:
- name: "kube-kubelet.service"
command: "start"
content: |
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
Requires=docker.service tls-ready.service
After=docker.service tls-ready.service
[Service]
ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
ExecStart=/usr/bin/kubelet \
--api-servers=https://CONTROLLER_PUBLIC_IP \
--register-node=true \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
Restart=always
RestartSec=10
The kube-proxy Service
coreos:
units:
- name: "kube-proxy.service"
command: "start"
content: |
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/kubernetes/kubernetes
Requires=network-online.target tls-ready.service
After=network-online.target tls-ready.service
[Service]
ExecStartPre=-/usr/bin/mkdir -p /opt/bin
ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
ExecStart=/opt/bin/kube-proxy \
--master=https://CONTROLLER_PUBLIC_IP \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
--proxy-mode=iptables \
--hostname-override=$public_ipv4
Restart=always
RestartSec=10
The Final Worker cloud-config with all CoreOS Units
This cloud-config combines the worker-kubeconfig.yaml write-file described above with the following unit snippets:
- etcd2.service: starts a local Etcd proxy; notice the ETCD_PEER placeholder
- flanneld.service: starts the overlay network daemon, with a drop-in to configure the network subnet
- docker.service: a drop-in to add the flannel dependency
- tls-ready.service: blocks other units until the TLS assets for the worker have been put in place
- kube-kubelet.service: runs the kubelet, registering with our controller node
- kube-proxy.service: runs the kube-proxy service
-
- #cloud-config
-
- write-files:
- - path: /etc/kubernetes/worker-kubeconfig.yaml
- permissions: '0644'
- content: |
- apiVersion: v1
- kind: Config
- clusters:
- - name: local
- cluster:
- certificate-authority: /etc/kubernetes/ssl/ca.pem
- users:
- - name: kubelet
- user:
- client-certificate: /etc/kubernetes/ssl/worker.pem
- client-key: /etc/kubernetes/ssl/worker-key.pem
- contexts:
- - name: kubelet-context
- context:
- cluster: local
- user: kubelet
- current-context: kubelet-context
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: "start"
- - name: "docker.service"
- command: "start"
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "tls-ready.service"
- command: "start"
- content: |
- [Unit]
- Description=Ensure TLS assets are ready
- Requires=docker.service
- After=docker.service
- [Service]
- Type=oneshot
- RemainAfterExit=yes
- ExecStartPre=-/usr/bin/mkdir -p /etc/kubernetes/ssl
- ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{worker,worker-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
- - name: "kube-proxy.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Proxy
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target tls-ready.service
- After=network-online.target tls-ready.service
- [Service]
- ExecStartPre=-/usr/bin/mkdir -p /opt/bin
- ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
- ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
- ExecStart=/opt/bin/kube-proxy \
- --master=https://CONTROLLER_PUBLIC_IP \
- --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
- --proxy-mode=iptables \
- --hostname-override=$public_ipv4
- Restart=always
- RestartSec=10
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service tls-ready.service
- After=docker.service tls-ready.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=https://CONTROLLER_PUBLIC_IP \
- --register-node=true \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local \
- --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
- Restart=always
- RestartSec=10
Use the above template to generate the worker node cloud config for this Digital Ocean cluster:
- sed -e "s|ETCD_PEER|${ETCD_PEER}|g;s|CONTROLLER_PUBLIC_IP|${CONTROLLER_PUBLIC_IP}|g;" cloud-config-worker.yaml > kube-worker.yaml
And send the command to create the Droplet:
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf kube-worker.yaml kube-worker-01
Note: running free -m after freshly starting all Kubernetes services on a 512mb Worker Droplet shows 129mb free; consider using 1024mb Droplets.
We refresh the Droplet configuration and cache the json string returned in the $WORKER_JSON environment variable for subsequent commands.
- WORKER_JSON=`doctl -f 'json' d f kube-worker-01.$region`
-
We parse the private and public IPs out as explained in the [Working with doctl responses](#) section of this tutorial.
- WORKER_PUBLIC_IP=`echo $WORKER_JSON | jq -r '.networks.v4[] | select(.type == "public") | .ip_address'`
- WORKER_PRIVATE_IP=`echo $WORKER_JSON | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'`
Confirm the values were populated correctly:
- echo $WORKER_PUBLIC_IP && echo $WORKER_PRIVATE_IP
Troubleshooting: as with the controller, you may monitor the worker's initialization with:
- ssh core@$WORKER_PUBLIC_IP
- journalctl -u oem-cloudinit -f
- watch -n 1 'docker ps --format="table {{.Image}}\t{{.ID}}\t{{.Status}}\t{{.Ports}}" -a'
Generating and Transferring the Worker TLS assets
As it is recommended to generate a unique certificate per worker, we will do so and transfer it to our worker droplet now.
The IP addresses and fully qualified hostnames of all worker nodes will be needed. The certificates generated for the worker nodes will need to reflect how requests will be routed to those nodes. In most cases this will be a routable IP and/or a routable hostname. These will be unique per worker; when you see them used below, consider it a loop and do that step for each worker.
This procedure generates a unique TLS certificate for every Kubernetes worker node in your cluster. While unique certificates are less convenient to generate and deploy, they do provide stronger security assurances and the most portable installation experience across multiple cloud-based and on-premises Kubernetes deployments.
We will use a common openssl configuration file for all workers. The certificate output will be customized per worker based on environment variables used in conjunction with the configuration file. Create the file worker-openssl.cnf on your local machine with the following contents.
-
- [req]
- req_extensions = v3_req
- distinguished_name = req_distinguished_name
- [req_distinguished_name]
- [ v3_req ]
- basicConstraints = CA:FALSE
- keyUsage = nonRepudiation, digitalSignature, keyEncipherment
- subjectAltName = @alt_names
- [alt_names]
- IP.1 = $ENV::WORKER_IP
Generate the private key for our first Worker Droplet:
- openssl genrsa -out ~/.kube/worker-01-key.pem 2048
Generate the Certificate Signing Request, substituting the WORKER_IP environment variable:
- WORKER_IP=${WORKER_PRIVATE_IP} openssl req -new -key ~/.kube/worker-01-key.pem -out worker-01.csr -subj "/CN=kube-worker-01" -config worker-openssl.cnf
Generate the signed worker certificate:
- WORKER_IP=${WORKER_PRIVATE_IP} openssl x509 -req -in worker-01.csr \
- -CA "$HOME/.kube/ca.pem" \
- -CAkey "$HOME/.kube/ca-key.pem" \
- -CAcreateserial \
- -out "$HOME/.kube/worker-01.pem" \
- -days 365 \
- -extensions v3_req \
- -extfile worker-openssl.cnf
Note: the above command does not work in git-for-windows due to Windows path conversions. It is recommended to copy worker-01.csr and worker-openssl.cnf to ~/.kube/ and run the command from within the ~/.kube/ directory (without the "$HOME/.kube/" parts).
Copy the necessary certificates to the worker node. We store the files in the home directory first.
- scp ~/.kube/worker-01-key.pem ~/.kube/worker-01.pem ~/.kube/ca.pem core@$WORKER_PUBLIC_IP:~
Move the certificates from the home directory to the /etc/kubernetes/ssl path, fix the permissions and create links to match our generic kubeconfig (which expects /etc/kubernetes/ssl/worker-key.pem instead of /etc/kubernetes/ssl/worker-01-key.pem) by executing the following commands over ssh:
- ssh core@$WORKER_PUBLIC_IP <<EOF
- sudo mkdir -p /etc/kubernetes/ssl/
- sudo mv ~core/*.pem /etc/kubernetes/ssl/
- sudo chown root:root /etc/kubernetes/ssl/*.pem
- sudo chmod 600 /etc/kubernetes/ssl/*-key.pem
- sudo ln -s /etc/kubernetes/ssl/worker-01.pem /etc/kubernetes/ssl/worker.pem
- sudo ln -s /etc/kubernetes/ssl/worker-01-key.pem /etc/kubernetes/ssl/worker-key.pem
- EOF
As soon as the certificates are available it will take just a few minutes for the kubelet and kube-proxy to start running on the worker and register with the Controller.
We can verify by running kubectl get nodes:
- kubectl get nodes
Which should show output as follows:
OutputNAME LABELS STATUS AGE
128.199.203.205 kubernetes.io/hostname=128.199.203.205 Ready 9m
We may repeat the above steps to create additional Worker Droplets with their own TLS assets; a scripted sketch of those repeated certificate steps follows below. We now have a working Kubernetes cluster, ready to start running our containerized applications. To facilitate application deployment, however, it is recommended to run a few cluster services, and we will proceed to do so in the next step.
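The following is a hedged sketch of how those per-worker certificate steps could be scripted for additional workers; it assumes Droplets named kube-worker-02 and kube-worker-03 were already created from the same kube-worker.yaml, and re-uses only commands shown earlier in this tutorial:
- for i in 02 03; do
-   W_JSON=`doctl -f json d f kube-worker-${i}.$region`
-   W_PRIVATE_IP=`echo $W_JSON | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'`
-   W_PUBLIC_IP=`echo $W_JSON | jq -r '.networks.v4[] | select(.type == "public") | .ip_address'`
-   # generate a key, CSR and signed certificate for this worker
-   openssl genrsa -out ~/.kube/worker-${i}-key.pem 2048
-   WORKER_IP=$W_PRIVATE_IP openssl req -new -key ~/.kube/worker-${i}-key.pem -out worker-${i}.csr -subj "/CN=kube-worker-${i}" -config worker-openssl.cnf
-   WORKER_IP=$W_PRIVATE_IP openssl x509 -req -in worker-${i}.csr -CA ~/.kube/ca.pem -CAkey ~/.kube/ca-key.pem -CAcreateserial -out ~/.kube/worker-${i}.pem -days 365 -extensions v3_req -extfile worker-openssl.cnf
-   # ship the TLS assets and put them in place, as we did for kube-worker-01
-   scp ~/.kube/worker-${i}-key.pem ~/.kube/worker-${i}.pem ~/.kube/ca.pem core@$W_PUBLIC_IP:~
-   ssh core@$W_PUBLIC_IP "sudo mkdir -p /etc/kubernetes/ssl && sudo mv ~core/*.pem /etc/kubernetes/ssl/ && sudo chown root:root /etc/kubernetes/ssl/*.pem && sudo chmod 600 /etc/kubernetes/ssl/*-key.pem && sudo ln -s /etc/kubernetes/ssl/worker-${i}.pem /etc/kubernetes/ssl/worker.pem && sudo ln -s /etc/kubernetes/ssl/worker-${i}-key.pem /etc/kubernetes/ssl/worker-key.pem"
- done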
Step 11 — Running Kubernetes Cluster Services
Several cluster services are provided as cluster add-ons (UI/Dashboard, Image Registry, DNS, ...). Deploying these add-ons is optional, but availability of some of these services is often expected by Kubernetes users. A full listing of all supported add-ons can be found within the Kubernetes GitHub repository at kubernetes/cluster/addons/.
Add-ons are built on the same Kubernetes components as user-submitted jobs (Pods, Replication Controllers and Services); however, cluster add-ons are expected to specify the kubernetes.io/cluster-service: "true" label.
One such cluster add-on facilitates the discovery of services running within Kubernetes. We will first define the problem and the options Kubernetes provides to solve it.
When Pods depend on each other (for example, front end services may depend on back end services), mechanisms need to be in place to enable service discovery. Within Kubernetes, Pods are short-lived objects and their IPs change over time due to crashes or scheduling changes. Because of this, addressing Pods directly is difficult, so Kubernetes introduced the concept of Service objects to solve this problem. Service objects are long-lived objects which get a static virtual IP within the cluster, usually referred to as their clusterIP, to address sets of Pods internally or externally to the cluster. This clusterIP is stable as long as the Service object exists. Kubernetes sets up a load balancer forwarding traffic through this clusterIP to the Service EndPoints, unless you explicitly disable the load balancer (by setting clusterIP to None) and work with the list of Service EndPoints directly. Such Services without a clusterIP are called Headless. Service objects may also be created for services running outside of the Kubernetes cluster (by omitting the Pod selector) as long as you manually create the EndPoint definitions for these external services. Full details on how to do this are available within the official Kubernetes documentation.
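As a minimal sketch of these concepts (the names my-service and app: my-app are hypothetical, chosen to match the DNS examples later in this step), a simple Service manifest could be written as follows; uncommenting the clusterIP: None line would turn it into a Headless Service:
- cat > my-service.yaml <<EOF
- apiVersion: v1
- kind: Service
- metadata:
-   name: my-service
- spec:
-   selector:
-     app: my-app
-   ports:
-   - name: http
-     protocol: TCP
-     port: 80
-     targetPort: 8080
-   # clusterIP: None
- EOF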
Once Service objects have been defined, Kubernetes provides 2 ways of finding them:
- Through Environment variables, or
- Using DNS
Upon Pod creation, the kubelet adds a set of environment variables for each active Service within the same namespace, similar to how Docker links worked. These environment variables enforce an ordering requirement as any Service that a Pod wants to access must be created before the Pod itself and may require applications to be modified before they can run within Kubernetes. If we use DNS to discover services, we do not have these restrictions, but we are required to deploy the DNS cluster add-on.
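For illustration, a Pod created after the hypothetical my-service above exists would see environment variables roughly like the following (the names follow the documented {SVCNAME}_SERVICE_HOST/_PORT convention; the IP shown is made up):
MY_SERVICE_SERVICE_HOST=10.3.0.15
MY_SERVICE_SERVICE_PORT=80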
As part of this tutorial we ensure our Kubernetes cluster is integrated with DNS for Service discovery by deploying the DNS add-on with kubectl
.
DNS Integration with Kubernetes
When enabled, the DNS add-on for Kubernetes will assign a DNS name for every Service object defined in the Kubernetes cluster.
At the time of writing, the DNS protocol implementation for the DNS add-on is provided by SkyDNS. SkyDNS is configured as a slave to the Kubernetes API Server with custom logic implemented in a bridge component called Kube2sky. SkyDNS itself is only a thin layer over Etcd to translate Etcd keys and values to the DNS protocol. In this way, SkyDNS can be as highly available and stable as the underlying Etcd cluster. We will have a closer look at how each of these 3 components work together and how we will deploy them as a single Pod.
We will create a Replication Controller to run the DNS Pod and a Service to expose its ports.
Our Replication Controller manifest starts, as we saw earlier, with the schema definition and a metadata
section:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v9
namespace: kube-system
...
A Replication Controller can be thought of as a process supervisor, except that it supervises multiple Pods across multiple nodes instead of individual processes on a single node. The Replication Controller creates Pods from a template and uses labels and selectors to monitor the actual running Pods. The selector finds Pods within the cluster by label; the labels we'll use for this Replication Controller are the k8s-app and version labels. We specify these labels, together with the kubernetes.io/cluster-service: "true" label required for cluster add-ons, in the PodTemplateSpec, and attach them to the Replication Controller itself as well.
At the time of writing we are using version 9 and refer to the DNS add-on as the kube-dns
app. By default Replication Controllers will run 1 replica, but we explicitly set the replicas
field to 1 for clarity in our spec
. This looks as follows in our manifest file:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v9
namespace: kube-system
labels:
k8s-app: kube-dns
version: v9
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v9
template:
metadata:
labels:
k8s-app: kube-dns
version: v9
kubernetes.io/cluster-service: "true"
spec:
volumes:
...
We added the labels to the Replication Controller object in its metadata
field. In the ReplicationControllerSpec we set the labels for the Pod template.metadata
and also set the replicas
and selector
values. Let's look at the volumes and containers defined in the PodTemplateSpec next.
We will only define a volume for Etcd. Giving Etcd a volume, outside of the union filesystem used by container runtimes such as Docker, will ensure optimal performance by reducing filesystem overhead. As the data is just a scratch space and it's fine to lose the data when the Pod is rescheduled on a different Node, it is sufficient to use an EmptyDir-type volume:
...
volumes:
- name: etcd-storage
emptyDir: {}
...
Let's look at the actual definitions of the containers in the Pod template. We see a container for each of the 3 components described earlier as well as an ExecHealthz sidecar container:
- Etcd - the storage for SkyDNS
- Kube2sky - the glue between SkyDNS and Kubernetes
- SkyDNS - the DNS server
- ExecHealthz - sidecar container for health monitoring, see details below.
The Etcd instance used by the DNS add-on is best run separately from the Etcd cluster used by the Kubernetes API Services. For simplicity we run Etcd within the same Pod as our SkyDNS and Kube2sky components. This is sufficient considering the DNS add-on only requires a small subset of everything Etcd has to offer.
For the Etcd container we will use the busybox-based image available on the Google Container Registry; refer to the kubernetes/cluster/images/etcd repository on GitHub to see the full details of how that image is made.
...
- name: etcd
image: gcr.io/google_containers/etcd:2.0.9
resources:
limits:
cpu: 100m
memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
...
We run this container with the etcd-storage volume mounted and used as the Etcd data-dir. We configure Etcd to listen on localhost for connections on both the IANA-assigned 2379 port and the legacy 4001 port; this is required because Kube2sky and SkyDNS still connect to port 4001 by default.
This spec also applies resource limits which define an upper bound on the maximum amount of resources that will be made available to this container. Resource limits are crucial to enable the scheduling components within Kubernetes to be effective. Without a definition of the required resources, schedulers can do little more than round robin assignments. The CPU resource is defined in Compute Units per second (KCU) and in this case the unit is milli-KCUs, where 1 KCU will roughly be equivalent to a single CPU hyperthreaded core for some recent x86 processor. The memory resource is defined in bytes. For a full overview of Resource management within Kubernetes, refer to the official Kubernetes resource guidelines.
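As a side note, you can inspect the capacity the scheduler works against on a node (the exact output varies by Kubernetes version, and the node name is its public IP because of the --hostname-override flag we used):
- kubectl describe node $WORKER_PUBLIC_IP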
The Kube2sky container uses another Busybox-based image made available on the Google Container Registry; refer to the kubernetes/cluster/addons/dns/kube2sky repository on GitHub to see the source for that image.
...
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.11
resources:
limits:
cpu: 100m
memory: 50Mi
args:
- -domain=cluster.local
...
The Kube2sky Docker image has the Entrypoint
set to /kube2sky
, thus we only need to pass on the -domain
under which we want all DNS names to be hosted through the args
array. This should match our kubelet configuration, which we set to cluster.local in this tutorial; modify this value to mirror your own configuration.
Kube2sky discovers and authenticates with the Kubernetes API Service through environment variables and secrets mounted into the container by the kubelet; we will have a closer look at these in [Step 12 — Running Kubernetes Cluster Services](#). Once authenticated and connected, Kube2sky watches the Kubernetes API Service for changes in Service objects and publishes those changes to Etcd for SkyDNS. SkyDNS supports A and AAAA records to handle "legacy" services. With A/AAAA records, the port number must be known by the client because that information is not in the returned records. Given we defined our cluster domain as cluster.local, the keys created by Kube2sky and served by SkyDNS will have the following DNS naming scheme:
<service_name>.<namespace_name>.svc.cluster.local
For example: for a Service called "my-service" in the "default" namespace, an A record for my-service.default.svc.cluster.local
is created. Other Pods within the same default namespace should be able to find the service simply by doing a name lookup for my-service; Pods which exist in other namespaces must use the fully qualified name.
For Service objects which define named ports, Kube2sky ensures SRV records are created with the following naming scheme:
_<port_name>._<port_protocol>.<service_name>.<namespace_name>.svc.cluster.local
For example, if the Service called "my-service" in the "default" namespace has a port named "http" with a protocol of TCP, you can do a DNS SRV query for "_http._tcp.my-service.default.svc.cluster.local" to discover the port number for "http".
We will confirm the above DNS records are served correctly after we have deployed the DNS add-on to our cluster.
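As a preview (this will only work once both the DNS add-on and the busybox test Pod created at the end of this step are running), such a record can be resolved from inside the cluster with a command like:
- kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local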
The skynetservices/skydns image based on Alpine Linux is available on the Docker Hub at about 19MB and comes with dig
. The official kubernetes/cluster/addons/dns/skydns add-on uses a busybox based image at about 41MB without dig
. The discussion as to which image should be used in the long run can be followed on GitHub. In this tutorial we opt to use the skynetservices/skydns
image as the version tags are slightly more intuitive:
...
- name: skydns
image: skynetservices/skydns:2.5.3a
resources:
limits:
cpu: 100m
memory: 50Mi
args:
- -machines=http://localhost:4001
- -addr=0.0.0.0:53
- -domain=cluster.local.
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
...
The EntryPoint for both images is /skydns/
, thus we only need to pass in 3 arguments. We point SkyDNS to the Etcd instance running within the Pod through the -machines
flag. We define the address we want SkyDNS to bind to through the -addr
flag and we specify the domain we want SkyDNS to serve records within through the -domain
flag. We also expose the port SkyDNS is bound to on the Pod through named ports for both TCP and UDP protocols.
To monitor the health of the container with liveness probes, we run a health server as a sidecar container using the ExecHealthz utility. By running a sidecar container, we do not make these liveness probes dependent on the container runtime to execute commands directly in the SkyDNS container (which would also require those binaries to be available within the container image). Instead, our sidecar container provides the /healthz HTTP endpoint. This usage of a sidecar container illustrates very well the concept of creating single-purpose, re-usable components and the power of Pods to bundle them. This is one of the fundamental features of Kubernetes Pods, and you may reuse these Kubernetes components for your own application setup.
The ExecHealthz image available on the Google Container Registry uses Busybox as a base image. We use the nslookup
utility bundled with Busybox for liveness probes as dig
is not available in this image.
Add the ExecHealthz container with the following container spec:
...
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
resources:
limits:
cpu: 10m
memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
...
Our health check does a simple probe for the Kubernetes API service which, as discussed above, SkyDNS should serve under the kubernetes.default.svc.cluster.local
DNS record.
We can now add the liveness and readiness probes via this sidecar health server to report on the health status of our SkyDNS container:
- name: skydns
image: skynetservices/skydns:2.5.3a
resources:
limits:
cpu: 100m
memory: 50Mi
args:
- -machines=http://localhost:4001
- -addr=0.0.0.0:53
- -domain=cluster.local.
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
readinessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 1
timeoutSeconds: 5
The full manifest of the Replication Controller for the kube-dns-v9 add-on is listed next for your reference; we will look at the manifest for the Service right after.
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v9
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v9
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v9
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v9
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd:2.0.9
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        - -domain=cluster.local
      - name: skydns
        image: skynetservices/skydns:2.5.3a
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        - -machines=http://localhost:4001
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 1
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default
The kube-dns Service will expose the DNS Pod internally to the cluster on the fixed IP we assigned for our DNS server. This clusterIP
has to match the value we passed to all our kubelets previously, which is 10.3.0.10
in this tutorial. Modify this value to mirror your own configuration. The full Service definition is listed below:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.3.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
In our Service object metadata we attach the same k8s-app
label as our Pods and Replication Controller as well as the necessary labels for Kubernetes add-on services. In our ServiceSpec our selector, used to route traffic to Pods with matching labels, only specifies the k8s-app
label. This does not specify the version, allowing us to do rolling updates of our DNS add-on in the future; see the Rolling Update Example for more details. Finally, we also define named ports for the DNS service on both TCP and UDP protocols. We will later confirm that SRV records exist for these named ports of the kube-dns service in the kube-system namespace.
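Since this clusterIP must match the --cluster-dns flag passed to each kubelet, you may want to double-check one of your workers before moving on. A minimal sketch, assuming the CoreOS default core user and that the kubelet flags are visible in the process list as configured in the earlier worker setup (substitute a worker's public IP):
- ssh core@$WORKER_PUBLIC_IP 'ps -ef | grep kube[l]et | tr " " "\n" | grep -E "cluster-(dns|domain)"'
You should see --cluster-dns=10.3.0.10 and --cluster-domain=cluster.local (or whichever values you chose) in the output.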
Note: Multiple yaml documents can be concatenated with the ---
separator. We may simplify the management of multiple resources by grouping them together in the same file separated by ---, or we may simply pass multiple -f
arguments to the kubectl create
command. The official Managing Deployments Guide notes:
The resources will be created in the order they appear in the file. Therefore, it's best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the replication controller(s).
For this tutorial, use kubectl with multiple -f
arguments:
- kubectl create -f ./skydns-svc.yaml -f ./skydns-rc.yaml
Then wait for the DNS add-on to start running:
- kubectl get pods --namespace=kube-system | grep kube-dns-v9
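Once the Pod reports Running, you can optionally query the sidecar's /healthz endpoint yourself to see the same check the kubelet's probes perform. A minimal sketch, assuming the Busybox wget applet is available in the exechealthz image (the image is Busybox-based, as noted above):
- dns_pod=$(kubectl --namespace=kube-system get po | grep kube-dns | awk '{print $1}')
- kubectl --namespace=kube-system exec $dns_pod -c healthz -- wget -qO- http://localhost:8080/healthz
A successful HTTP response here means the nslookup check configured through the -cmd argument is passing.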
Create a Busybox Pod to test DNS resolution from within the cluster using the following Pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
This busybox container will sleep for one hour before exiting and being restarted by the kubelet; we will use it to run nslookup
commands from within the cluster. Create the Pod:
- kubectl create -f busybox.yaml
Note that we are creating a standalone Pod here: no Replication Controller manages it, but the restartPolicy: Always setting ensures the kubelet restarts its container whenever it exits. After a few seconds, confirm the Pod is running:
- kubectl get pods busybox
When the Pod is running, output will look as follows:
OutputNAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 14s
From your client's terminal, run a DNS lookup inside the busybox Pod with the following command:
kubectl exec busybox -- nslookup kubernetes.default
The expected output should look as follows:
OutputServer: 10.3.0.10
Address 1: 10.3.0.10
Name: kubernetes.default
Address 1: 10.3.0.1
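You can run a few more lookups from the same Pod to convince yourself the add-on behaves as expected; a quick sketch (the first name resolves via the svc.cluster.local search path in the Pod's resolv.conf, and external names should resolve as well, provided SkyDNS can reach an upstream nameserver):
- kubectl exec busybox -- nslookup kube-dns.kube-system
- kubectl exec busybox -- nslookup www.digitalocean.com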
If you are using the skynetservices/skydns:2.5.3a
image, you may use the dig
binary within to confirm the SRV records for the named ports are served as expected (the nslookup utility bundled in the busybox Pod does not support SRV queries).
To do this, get the name of the kube-dns
Pod created by the kube-dns
Replication Controller (Pod names are dynamic and change when they are restarted):
dns_pod=`kubectl --namespace=kube-system get po | grep kube-dns | awk '{ print $1}'`
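Instead of grep, you may also rely on the k8s-app=kube-dns label we attached to the Pod template; a roughly equivalent sketch using a label selector (tail simply skips the header row of the output):
- dns_pod=$(kubectl --namespace=kube-system get po -l k8s-app=kube-dns | tail -n +2 | awk '{print $1}')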
Open an interactive shell into the SkyDNS container of your kube-dns pod:
- kubectl --namespace=kube-system exec $dns_pod -c skydns -it sh
We specify the container we want to execute commands in through the -c
option of the kubectl exec
command.
Use dig
to query the SRV record for the port named dns using the UDP protocol and sed
to only print the response from the ANSWER SECTION
to the Query Time
lines:
- dig @localhost SRV _dns._udp.kube-dns.kube-system.svc.cluster.local | sed -n '/ANSWER SECTION:/,/Query time/ p'
We are using sed with the -n option to suppress automatic printing of lines; we specify a range of regular expression patterns (/ANSWER SECTION:/,/Query time/) and instruct sed to print only the lines within this range with the p command.
The expected output should look as follows:
Output;; ANSWER SECTION:
_dns._udp.kube-dns.kube-system.svc.cluster.local. 30 IN SRV 10 100 53 kube-dns.kube-system.svc.cluster.local.
;; ADDITIONAL SECTION:
kube-dns.kube-system.svc.cluster.local. 30 IN A 10.3.0.10
;; Query time: 3 msec
As you can see, using the SRV records created by the kube-dns add-on, we are able to get the port as well as the IP.
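If you prefer to let dig trim its own output instead of piping through sed, the standard +noall, +answer, and +additional options give roughly the same result. A small sketch, run from the same SkyDNS container shell; the second query assumes the SRV record for the TCP named port follows the usual _port-name._protocol convention:
- dig @localhost SRV _dns._udp.kube-dns.kube-system.svc.cluster.local +noall +answer +additional
- dig @localhost SRV _dns-tcp._tcp.kube-dns.kube-system.svc.cluster.local +noall +answer +additional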
Refer also to the Official DNS Integration documentation and the DNS Add-on repository.
Step 12 — Deploying Kubernetes-ready applications
You should now have a Kubernetes cluster set up and be able to deploy Kubernetes-ready applications.
To better understand the inner workings of Kubernetes from a Pod's perspective, we may use the Kubernetes Pod Inspector application by Kelsey Hightower. With the yaml file below, which combines both the Service and the Replication Controller, you can quickly deploy and expose this application on your cluster:
apiVersion: v1
kind: Service
metadata:
  name: inspector
  labels:
    app: inspector
spec:
  type: NodePort
  selector:
    app: inspector
  ports:
  - name: http
    nodePort: 31000
    port: 80
    protocol: TCP

---

apiVersion: v1
kind: ReplicationController
metadata:
  name: inspector-stable
  labels:
    app: inspector
    track: stable
spec:
  replicas: 1
  selector:
    app: inspector
    track: stable
  template:
    metadata:
      labels:
        app: inspector
        track: stable
    spec:
      containers:
      - name: inspector
        image: b.gcr.io/kuar/inspector:1.0.0
As seen previously, we provide our Replication Controller with the necessary labels and use the b.gcr.io/kuar/inspector:1.0.0
image. Note that we are exposing the inspector application by telling Kubernetes to open port 31000
on every worker node (this will work if you ran the API server with --service-node-port-range=30000-37000
as shown in Step 6 of this tutorial).
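The manifests above still need to be submitted to the cluster. Assuming you saved the combined yaml above as inspector.yaml (the filename is only an example), create both resources with:
- kubectl create -f ./inspector.yaml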
Expected Output:
OutputYou have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:31000) to serve traffic.
See http://releases.k8s.io/release-1.1/docs/user-guide/services-firewalls.md for more details.
service "inspector" created
replicationcontroller "inspector-stable" created
We can now point our web browser to http://$WORKER_PUBLIC_IP:31000/env
on any worker node to reach the Inspector Pod and view all environment variables published by the kubelet. We can also visit http://$WORKER_PUBLIC_IP:31000/mnt?path=/var/run/secrets/kubernetes.io/serviceaccount
to see the secrets mounted into the Pod. To see how Kubernetes-ready applications can use these, refer to the InClusterConfig function of the Kubernetes client helper library and the KubeClient Setup section of Kube2Sky as an example implementation.
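If you prefer to stay in the terminal, the same endpoints can be fetched with curl; substitute the public IP of any of your workers for $WORKER_PUBLIC_IP (the quotes around the second URL keep the shell from interpreting the query string):
- curl http://$WORKER_PUBLIC_IP:31000/env
- curl "http://$WORKER_PUBLIC_IP:31000/mnt?path=/var/run/secrets/kubernetes.io/serviceaccount"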
You may now proceed to set up a multi-tier web application by following the Guestbook Example from the official Kubernetes documentation to visualize how the various Kubernetes components fit together.
Conclusion
Following this tutorial, you have created a fully functional Kubernetes cluster. This gives you a great management and scheduling interface for working with services in logical groupings. Because you used many Kubernetes concepts to set up the cluster itself, you now have a solid understanding of the core concepts and deployment workflow of Kubernetes. To review all the Kubernetes concepts, refer to the official Kubernetes Concept Guide.
You probably noticed that the steps above were still very manual, but the cloud-config files you created are flexible enough to let you automate the process.
Deleting your Kubernetes Cluster
If you decide you no longer want to run this cluster (or want to start from scratch), the commands to tear it down are below:
Note: These commands destroy your cluster and all the data it contains, without any backups; they are irreversible.
Repeat for every controller Droplet:
- doctl d d kube-controller-01.$region
Repeat for every worker Droplet:
- doctl d d kube-worker-01.$region
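If you created several workers, a small shell loop saves some typing; a sketch assuming workers numbered kube-worker-01 through kube-worker-03 (adjust the list to match the Droplets you actually provisioned):
- for n in 01 02 03; do doctl d d kube-worker-$n.$region; done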
Delete the Etcd Droplet(s) holding all Kubernetes data (repeat for every Etcd member):
- doctl d d etcd-01.$region
Delete the apiserver and worker certificates, as they are tied to the IPs of the Droplets, but keep the admin and CA certificates:
- rm ~/.kube/apiserver*.{pem,csr}
- rm ~/.kube/worker*.{pem,csr}
- rm *.srl
- rm *openssl.cnf