How To Provision a Kubernetes Cluster Using CoreOS
Introduction
Kubernetes, often abbreviated as k8s, is a system designed to manage applications built within containers across a cluster of nodes. It handles the entire life cycle of a containerized application, including deployment and scaling. Kubernetes was designed within Google and released as open source.
With AWS, Red Hat, Microsoft, IBM, Mirantis OpenStack, and VMware (and the list keeps growing) working to integrate Kubernetes into their platforms, going through this tutorial will provide you with a working cluster on Digital Ocean, a strong foundation, and a fundamental understanding of a framework that is here to stay.
In this tutorial, we will give step-by-step instructions on how to create a single-controller/multi-worker Kubernetes cluster on CoreOS hosted by Digital Ocean. This system will allow us to group related services together for deployment as a unit on a single host using what Kubernetes calls "Pods". Kubernetes also provides health checking functionality, high availability, and efficient usage of resources through schedulers.
This tutorial was tested with Kubernetes v1.1.2. Keep in mind that this software changes frequently.
Prerequisites and goals
We will provision each component of our Kubernetes cluster as part of this tutorial, no existing architecture is required. Experience with Docker is expected, experience with Systemd and CoreOS is a plus, but each concept is introduced and explained as part of this tutorial. If you are not familiar with CoreOS, it may be helpful to review some basic information about the CoreOS system.
After a high-level overview of the Kubernetes Architecture, we will configure our client machine to work with our Digital Ocean resources from the terminal using Doctl and Jq. Once this is done, we will be able to quickly and repeatedly provision our droplets with cloud-config
files. This allows us to declaratively customize network configuration, Systemd units, and other OS-level items. We will also ensure a Public Key Infrastructure (PKI) is available by going through the instructions to set up a local Certificate Authority.
We will then provision an Etcd cluster to reliably store metadata across a cluster of machines. Etcd provides a great way to store configuration data reliably for Kubernetes. Thanks to the watch support provided by Etcd, coordinating components can be notified very quickly of changes. This component is crucial to our Kubernetes cluster.
With the help of our Etcd cluster, we will also configure Flannel, a network fabric layer that provides each machine with an individual subnet for container communication. This satisfies a fundamental requirement for running a Kubernetes cluster. Docker will be configured to use this networking layer for its containers.
We will provision our Kubernetes controller Droplet and, to ensure the security of our Kubernetes cluster, generate the required certificates for communication between Kubernetes components using openssl and securely transfer them to each Droplet using scp. Next, we will configure the command line client utility, kubectl, to work with our cluster from our client machine.
Finally, we will provision worker nodes pointing to the controller nodes and deploy the internal cluster DNS through the DNS add-on. We will have a fully functional Kubernetes cluster allowing us to deploy our workloads and easily add worker nodes as required with the cloud-config
files created through this tutorial.
Working through this tutorial may take you a few hours, but it will give you a good understanding of the moving pieces of your cluster and set you up for success in the long run.
The structure and idea for this tutorial were taken from the Getting started with CoreOS and Kubernetes Guide and updated with detailed step-by-step instructions for Digital Ocean. Let's get started.
Kubernetes Architectural Overview
In this section we will give an overview of the Kubernetes Architecture. For a more detailed look, refer to the official Kubernetes documentation.
At a high level, we need to differentiate between the services that run on every node, referred to as node agents (kubelet, ...), the controller services (APIs, scheduler, ...) that make up the cluster-level control plane, and the distributed storage solution (Etcd).
A crucial component which runs on every node is the kubelet. The kubelet is responsible for what's running on each individual Droplet and for making sure it keeps running. The kubelet controls the container runtime; in this tutorial Docker provides the container runtime and must also run on each node. Docker takes care of the details of downloading images and running containers. The kubelet registers nodes with the cluster, sends events and status updates, and reports the resource utilization of the node.
To facilitate routing between containers as well as simplify service discovery, each node also runs the kube-proxy. The proxy is a simple network proxy and load balancer which can do simple TCP and UDP stream forwarding (round robin) across a set of back ends. The proxy is a crucial part for the Kubernetes services model. The proxy communicates with the controller services to keep up to date. See the Kubernetes' services FAQ for more details.
Worker node services are configured to be managed from the controller services: on worker nodes, these services register with the controller nodes, while on controller nodes they are typically bootstrapped locally.
The first controller service we will highlight is the API server. The API server serves up the Kubernetes API through a REST interface. It is intended to be a CRUD-y service, with most/all business logic implemented in separate components or in plug-ins. It is responsible for validating requests and updating the corresponding objects in Etcd. The API server is stateless and will be the main component replicated and load balanced across controller nodes in a High Availability configuration.
The second controller service to highlight is the scheduler. The scheduler is responsible for assigning workloads to nodes in the cluster. This component watches the API Server and uses the binding API to apply its scheduling decisions. The scheduler is pluggable and support for multiple cluster schedulers and even user-provided schedulers is expected, but not available yet in version 1.1.2.
All other cluster-level functions are performed by the controller manager component at the time of writing. This component embeds the core control loops shipped with Kubernetes. Each controller is an active manager that watches the shared state of the cluster through the API Server and makes changes attempting to move the observed state towards the desired state. These controllers may eventually be split into separate components in future Kubernetes versions to make them independently pluggable.
As the scheduler and controller manager components modify cluster state, only one instance of each can run within the cluster. In High Availability configurations, a process of master election is required for these components. We will explain and apply master election for these components as part of this tutorial; however, we will only provision one controller node and no control plane load balancer. Setting up the control plane load balancers and the appropriate TLS artifacts is left as an exercise for the reader.
Below is a high level diagram of these Kubernetes components in a High Availability set-up.
Etcd Controller Nodes Worker Nodes
+--------------------+ +--------------------+
| | | |
+--------------------+ +---+ API Server <---------+ +------------------------+ Kubelet |
| | | | | | | | |
| Etcd cluster <----------+ | Controller Manager*| | | | Docker |
| | | | | | | |
| | | Scheduler* | | | | Proxy |
| | | | | | | |
| | | Kubelet | | | | |
| | | | | | | |
| | | Docker | | | | |
| | | | +-+--v---------------+ | |
| | | Proxy | | | | |
| | | | | Control Plane | | |
| | +--------------------+ | | +--------------------+
| | | Load Balancer |
+-^--^---------------+ +--------------------+ | | +--------------------+
| | | | | | | |
| +------------------------------+ API Server <-------+ <--------+ Kubelet |
| | | | | | |
| | Kubelet | | | | Docker |
| | | | | | |
| | Docker | | | | Proxy |
| | | | | | |
| | Proxy | | | | |
| | | +-+--^---------------+ | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| +--------------------+ | | +--------------------+
| | |
| +--------------------+ | | +--------------------+
| | | | | | |
+---------------------------------+ API Server <---------+ +------------------------+ Kubelet |
| | | |
| Kubelet | | Docker |
| | | |
| Docker | | Proxy |
| | | |
| Proxy | | |
| | | |
| | | |
| | | |
| | | |
| | | |
+--------------------+ +--------------------+
Refer to the official diagram for a more detailed breakdown: http://kubernetes.io/v1.1/docs/design/architecture.png?raw=true
Step 1 — Configuring Our Client Machine
As the first step in this tutorial we will ensure our client machine is correctly configured to complete all subsequent steps.
The default folder for storing Kubernetes cluster certificates and config-related files is $HOME/.kube/
. For this tutorial, we will store our cluster configuration and certificates in this folder, ensure the folder exists:
- mkdir ~/.kube
We will be using the Digital Ocean Control Tool (Doctl) as well as the Command-line JSON processor Jq to manage our Digital Ocean resources from our terminal. This will allow us to quickly repeat commands and automate our Kubernetes cluster setup further down the line.
We will set up Doctl and Jq as well as introduce the basics on how to use these tools within this step.
At the end of this step, a correctly configured client environment is expected. If you skip this step, first ensure that you have:
- Configured your environment to create and destroy Droplets in a single Digital Ocean region from the terminal. Ensure you set the $region and $DIGITALOCEAN_API_KEY variables for the rest of this tutorial.
- Created the SSH key for all Droplets in our cluster. Ensure the private key is loaded with your SSH agent and the public key is stored as a Digital Ocean resource named k8s-key.
Follow the sub-steps to achieve this.
Setting up Doctl
To use Doctl from your terminal and follow the Kubernetes cluster configuration in this tutorial, you will need to generate a Personal Access Token with write permissions through the Digital Ocean Control Panel. Refer to the How To Use the DigitalOcean API v2 tutorial for information on how to do this, and continue with these steps once you have your Personal Access Token ready.
For all of the steps in this tutorial, we will assign our token to a variable called DIGITALOCEAN_API_KEY
. For example, by running the following command in bash (replace the highlighted text with your own token):
- export DIGITALOCEAN_API_KEY=77e027c7447f468068a7d4fea41e7149a75a94088082c66fcf555de3977f69d3
Review the latest Doctl release and choose the right binary archive for your environment:
Operating System | Binary |
---|---|
OSX | darwin-amd64-doctl.tar.bz2 |
Linux | linux-amd64-doctl.tar.bz2 |
Windows | windows-amd-64-doctl.zip |
For example, to download the archive for the 0.0.16
release (used in this tutorial) to your home directory on a Linux 64-bit host, run the following commands in your terminal:
- curl -Lo ~/doctl.tar.bz2 https://github.com/digitalocean/doctl/releases/download/0.0.16/linux-amd64-doctl.tar.bz2
Next, we need to extract the downloaded archive. We will also need to add doctl
to a location included in our PATH
environment variable, /usr/bin
or /opt/bin
for example. The following command will extract doctl
directly to /usr/bin
making it available for all users on a Linux host. This command requires sudo
rights:
- tar xjf ~/doctl.tar.bz2 -C /usr/bin
Finally, validate that doctl
has been downloaded successfully by confirming the installed version:
- doctl --version
If you followed the steps above, this should return:
Outputdoctl version 0.0.16
Finding help about Doctl
An overview of Doctl and several usage examples are available on the Doctl GitHub repository. Additionally, invoking Doctl without any arguments will print out usage instructions as well. Note that every Digital Ocean resource type (droplet, sshkey, ...) has a corresponding Doctl command. Every command has subcommands to manage the resource as well as instructions available through the help
subcommand or the --help
flag.
For example, to review the available commands for droplet
resources, run:
- doctl droplet help
This should return:
OutputNAME:
doctl droplet - Droplet commands. Lists by default.
USAGE:
doctl droplet [global options] command [command options] [arguments...]
VERSION:
0.0.16
COMMANDS:
create, c Create droplet.
list, l List droplets.
find, f <Droplet name> Find the first Droplet whose name matches the first argument.
destroy, d [--id | <name>] Destroy droplet.
reboot [--id | <name>] Reboot droplet.
power_cycle [--id | <name>] Powercycle droplet.
shutdown [--id | <name>] Shutdown droplet.
poweroff, off [--id | <name>] Power off droplet.
poweron, on [--id | <name>] Power on droplet.
password_reset [--id | <name>] Reset password for droplet.
resize [--id | <name>] Resize droplet.
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--help, -h show help
Notice that the droplet
command will list
droplets by default.
To get more information about the droplet
create
command, run:
- doctl droplet help create
This should return:
OutputNAME:
create - Create droplet.
USAGE:
command create [command options] [arguments...]
OPTIONS:
--domain, -d Domain name to append to the hostname. (e.g. server01.example.com)
--add-region Append region to hostname. (e.g. server01.sfo1)
--user-data, -u User data for creating server.
--user-data-file, --uf A path to a file for user data.
--ssh-keys, -k Comma seperated list of SSH Key names. (e.g. --ssh-keys Work,Home)
--size, -s "512mb" Size of Droplet.
--region, -r "nyc3" Region of Droplet.
--image, -i "ubuntu-14-04-x64" Image slug of Droplet.
--backups, -b Turn on backups.
--ipv6, -6 Turn on IPv6 networking.
--private-networking, -p Turn on private networking.
--wait-for-active Don't return until the create has succeeded or failed.
As another example, to list all Droplet sizes provided by Digital Ocean run:
- doctl droplet size list
Setting up your SSH Keys with Doctl
Every CoreOS droplet that you will provision for your Kubernetes cluster, will need to have at least one SSH public key installed during its creation process. The key(s) will be installed to the core
user's authorized keys file, and you will need the corresponding private key(s) to log in to your CoreOS server.
If you do not already have any SSH keys associated with your Digital Ocean account, create one now by following steps one and two of this tutorial: How To Use SSH Keys with Digital Ocean Droplets. You may opt to use Doctl to add the new SSH keys to your account rather than copying the SSH Keys into the Digital Ocean control panel manually. This can be achieved by passing in your DIGITALOCEAN_API_KEY
environment variable as the --api-key
to Doctl and adding the public key of your newly created SSH key with the following command:
- doctl --api-key $DIGITALOCEAN_API_KEY keys create <key-name> <path-to-public-key>
Note: Doctl will automatically try to use the $DIGITALOCEAN_API_KEY
env variable as the --api-key
if it exists and we do not need to explicitly pass it in every time. We will omit this in future Doctl commands.
Add your private key to your SSH agent on your client machine, using ssh-agent
as follows:
- ssh-add <path-to-private-key>
or use the -i <path-to-private-key>
flag each time connecting to your droplets over ssh
if you do not have a running ssh-agent
.
For example, in this tutorial we will store our key pair in our home directory as ~/.ssh/id_k8s
and upload the public key as k8s-key
to our Digital Ocean account. Combining all the above steps together would look like this:
Output# Generate the key pair
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/demo/.ssh/id_rsa):/home/demo/.ssh/id_k8s
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/demo/.ssh/id_k8s.
Your public key has been saved in /home/demo/.ssh/id_k8s.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 demo@a
The key's randomart image is:
+--[ RSA 2048]----+
| .oo. |
| . o.E |
| + . o |
| . = = . |
| = S = . |
| o + = + |
| . o + o . |
| . o |
| |
+-----------------+
# Upload public key to Digital Ocean account as k8s-key
doctl keys create k8s-key /home/demo/.ssh/id_k8s.pub
# Add private key to SSH Agent
ssh-add ~/.ssh/id_k8s
Managing Droplets with Doctl
To verify that your account and keys are set up correctly, we will create a new CoreOS Alpha droplet named "do-test"
from the terminal.
Note: Creating Droplets incurs charges on your Digital Ocean account for as long as they exist. We will destroy the test Droplets created in this step once we are done with them.
For the remainder of this tutorial, we will be creating all droplets within the same Digital Ocean region. Choose your region and store it into a variable called $region
. Review the list of all available regions by running the doctl region
command first.
- doctl region
For example, we will use the Amsterdam 2 region for the rest of this tutorial. Choose the region most appropriate for your case:
- export region="ams2"
Now create the Droplet with the following command:
- doctl droplet create \
- --image "coreos-alpha" \
- --size "512mb" \
- --region "$region" \
- --private-networking \
- --ssh-keys k8s-key \
- "do-test"
With the above command, we created a "512mb"
droplet, in the region of our choice, requesting a private_ip
and adding our ssh-key (k8s-key
) to the droplet for remote access. Once the command completes, Doctl returns information about the new Digital Ocean resource that was just created.
First, confirm you can list all your droplets and their status with the following command:
- doctl droplet list
This should output a list similar to below:
OutputID Name IP Address Status Memory Disk Region
8684261 do-test.ams2 198.211.118.106 new 512MB 20GB ams2
Note that, to speed up its usage, Doctl has several shortcuts. For example, the shortcut for the droplet
command is d
. Moreover, the default action for the droplet
command is list
, allowing us to re-write the above command as follows:
- doctl d
This returns the same results as before. On Linux you can watch
this list to capture when the droplet status changes from new
to active
(which will take the same amount of time it would take when provisioning through the web control panel).
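For example, assuming the watch utility is installed on your client machine, the following refreshes the Droplet listing every five seconds until you interrupt it with CTRL+C:
- watch -n 5 doctl d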
Once the CoreOS droplet has fully been provisioned and its status changed from new
to active
, ensure your SSH Key was added correctly by connecting as the core
user, run:
- ssh core@198.211.118.106
Replace the IP highlighted above by the public IP of your droplet, as listed by the previous doctl d
command. As this is the first time you connect to this server, you may be prompted to confirm the fingerprint of the server:
OutputThe authenticity of host '198.211.118.106 (198.211.118.106)' can't be established.
ED25519 key fingerprint is SHA256:wp/zkg0UQifNYrxEsMVg2AEawqSVpRS+3mBAQ6TBNlU.
Are you sure you want to continue connecting (yes/no)?
For more information about accessing your Droplet remotely, see How To Connect To Your Droplet with SSH.
You should now be connected to a fully functional CoreOS droplet running in Digital Ocean.
Press CTRL+D
or enter exit
to log out of your Droplet, but keep the do-test
droplet running to complete the exercises in the next section.
Working with Doctl responses
By default, Doctl will return yaml responses, but it is possible to change the format of the response with the -f
flag. Using the json
format will allow us to easily act on the data returned by Doctl through the Command-line JSON processor Jq.
Jq comes installed on several Linux distributions (e.g., CoreOS). However, to download and set up Jq manually, review the latest releases and choose the right release for your environment:
Operating System | Binary |
---|---|
OSX | jq-osx-amd64 |
Linux | jq-linux64 |
Windows | jq-win64.exe |
For example, to download the 1.5
release from a shell directly to your /usr/bin/
directory on a Linux 64-bit host (which requires sudo
rights), run:
- curl -Lo /usr/bin/jq https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64
This will make the jq
command available for all users.
Validate Jq has been downloaded successfully by confirming the installed version:
- jq --version
If you followed the steps above, this should return:
Outputjq-1.5
Using the -f json
argument for Doctl together with Jq allows us to, for example, extract the number of CPUs a droplet has:
- doctl -f json d find do-test.$region | jq '.vcpus'
In the above command, the find
command for droplets (shortcut f
) returns all properties of a droplet matching the name provided, including the droplet's vcpus
property. This json
data is passed on to Jq with an argument to only return the vcpus
property to us.
Another example of using Jq to manipulate the data returned by Doctl is given next, extracting the raw public_key
for an existing Digital Ocean SSH Key, the key named k8s-key
in our example:
- doctl -f json keys f k8s-key | jq --raw-output '.public_key'
With output similar to:
Outputssh-rsa AAAAB3Nza... user@host
By default Jq will format strings as json strings, but using the --raw-output
flag (shortcut -r
), as can be seen above, will make Jq write strings directly to standard output. This is very useful for our scripts.
Finally, the real power of Jq becomes evident when we need to retrieve an array of network interfaces (ipv4
) assigned to a droplet, filter the array based on a property .type
with possible values "public"
or "private"
and extract the raw value of the ip_address
property.
We'll break this down as follows. Notice first that the following command will return an array of all the IPv4 network interfaces assigned to a droplet:
- doctl -f json d f do-test.$region | jq -r '.networks.v4[]'
Which will return a result similar to the following text block:
Output{
"ip_address": "10.129.73.216",
"netmask": "255.255.0.0",
"gateway": "10.129.0.1",
"type": "private"
}
{
"ip_address": "198.211.118.106",
"netmask": "255.255.192.0",
"gateway": "198.211.128.1",
"type": "public"
}
Next, we direct Jq to apply a filter to the array of network interfaces based on the type
property using the select
statement and only return the .ip_address
property of the filtered network interface:
- doctl -f json d f do-test.$region | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'
The above command effectively returns the private ip_address
of our droplet directly to standard out. We will use this command often to store droplet IP addresses into environment variables. The output of the command may look like:
Output10.129.73.216
Finally, destroy your do-test
droplet with the following command:
- doctl d d do-test.$region
Which will output:
OutputDroplet do-test.ams2 destroyed.
For a full explanation of all the features of Jq, kindly refer to the Jq manual.
Using Doctl to configure CoreOS Droplets with cloud-config
For a short introduction to cloud-config
files, kindly refer to the section on writing cloud-config files of the Getting Started with CoreOS series. We will explain every aspect of cloud-config
files we rely on as we write our own in this tutorial.
One of the most useful aspects of cloud-config
files is that they allow you to define a list of arbitrary Systemd units to start after booting. To understand these coreos.units
better, refer to the understanding Systemd units and unit files tutorial. We will walk you through many Systemd unit examples within this tutorial.
We will heavily rely on these config files to manage the configuration of our droplets, giving us a way to consistently provision our Kubernetes clusters on Digital Ocean in the future. It is important however to note that cloud-config
files are not intended as a replacement for configuration management tools such as Chef, Puppet, Ansible, Salt, or Terraform, and we may benefit more from adopting one of these tools in the long run.
Please ensure you use the CoreOS validator to validate any cloud-config
file you write as part of this tutorial. Ensuring the config files are valid prior to creating the droplets will help avoid frustration and time loss. Also refer to the general troubleshooting tutorial for CoreOS on Digital Ocean when faced with CoreOS issues.
For this tutorial, we will be passing cloud-config
files through the --user-data-file
option (shortcut --uf
) when creating droplets from the terminal with Doctl.
To see how this works, follow the below steps to create a Droplet with a custom motd
and automatic reboots switched off, as an exercise.
First, create a test.yaml
file in your working directory, with the content as follows.
#cloud-config
write_files:
- path: "/etc/motd"
permissions: "0644"
owner: "root"
content: |
Good news, everyone!
coreos:
update:
group: alpha
reboot-strategy: off
The write_files
directive defines a set of files to create on the local filesystem. For each file, we specify the absolute location on disk through the path
key and the data to be written through the content
key. The coreos.update.*
parameters manipulate settings related to how CoreOS instances are updated; setting the reboot-strategy
to off
will instruct the CoreOS reboot manager (Locksmith) to disable rebooting after updates are applied.
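Before creating a Droplet with this file, it is worth validating it. Besides the web-based CoreOS validator mentioned earlier, if you have access to a CoreOS host you can validate locally with the coreos-cloudinit binary (assuming the -validate and -from-file flags shipped with your CoreOS release):
- coreos-cloudinit -validate -from-file=test.yaml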
Create a new droplet named ccfg-test
, using this test.yaml
file with the following command (this command will take about a minute to complete, please be patient):
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf test.yaml ccfg-test
We are using the d
shortcut to manage our droplet
resources and c
as a shortcut for create
. The --wait-for-active
flag will ensure Doctl waits for the droplet to become active before returning control to our terminal, which is why we had to wait.
Once Doctl has returned control, you may need to give your Droplet some more time to boot and load the SSH daemon before attempting to connect.
Try to connect via the public ip of this droplet with the following one-liner.
- ssh core@`doctl d | grep ccfg-test | awk '{print $3}'`
In this case we are using the droplet listing command piped into grep
to filter down to the droplet we just created and we capture the third column, which is the public IP, using awk
. The result should be similar to below, confirming our motd
was written correctly once we confirm the authenticity of our new droplet:
OutputThe authenticity of host '37.139.21.41 (37.139.21.41)' can't be established.
ED25519 key fingerprint is SHA256:VtdI6P5sRqvQC0dGWE1ffLYTq1yBIWoRFdWc6qcm+04.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '37.139.21.41' (ED25519) to the list of known hosts.
Good news, everyone!
If you are prompted for a password, ensure your SSH Agent has the private key associated with your k8s-key
loaded and try again.
If you happened to destroy a droplet directly prior to creating the one that you are connecting to, you may see a warning like this:
Output@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
...
If this is the case, your new droplet probably has the same IP address as the old, destroyed droplet, but it has a different host SSH key. This is fine, and you can remove the warning by deleting the old droplet's host key from your system with this command (replacing the highlighted IP address with your droplet's public IP):
- ssh-keygen -R 37.139.21.41
Now try connecting to your server again.
Finally, destroy this test droplet with a command similar to below:
- doctl d d ccfg-test.$region
Note: At the time of writing, user-provided cloud-config
files cannot be modified once a droplet has been created. To change the cloud-config
the droplets need to be re-created. Take this into consideration when writing cloud-config files which limit ssh access to certain user accounts as these may be reset after every reboot.
With the above commands in our toolbox, we are ready to start a highly automated Kubernetes configuration on Digital Ocean.
Step 2 — Initializing The Kubernetes Cluster PKI
In this tutorial we will configure the Kubernetes API Server to use client certificate authentication to enable encryption and prevent traffic interception and man-in-the-middle attacks. This means it is necessary to have a Certificate Authority (CA) which will be trusted as the root authority for the cluster and use it to generate the proper credentials. The necessary assets can also be generated from an existing Public Key Infrastructure (PKI), if already available.
For this tutorial we will use Self-Signed certificates. Every certificate is created by submitting Certificate Signing Requests (CSRs) to a CA. A CSR contains information identifying whom the certificate request is for, including the public key associated with the private key of the requesting party. The CA will sign the CSR, effectively returning what is from then on referred to as "the certificate".
For a detailed overview of OpenSSL, refer to the OpenSSL Essentials guide on Digital Ocean.
Initialize Cluster Root CA
Generate the private key for your root certificate into the default $HOME/.kube
folder, which we should have created as part of our client machine setup. The following OpenSSL command generates a 2048-bit RSA private key:
- openssl genrsa -out ~/.kube/ca-key.pem 2048
This ca-key.pem
private key will be used to generate the self-signed ca.pem
certificate which will be trusted by all your Kubernetes components, as well as every Worker node and Administrator key pair. This key needs to be closely guarded and kept in a secure location for future use.
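Since this private key must remain secret, you may also want to restrict its file permissions on your client machine, for example:
- chmod 600 ~/.kube/ca-key.pem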
Next, use the private key to generate the self-signed root ca.pem
certificate with the following openssl command:
- openssl req -x509 -new -nodes -key ~/.kube/ca-key.pem -days 10000 -out ~/.kube/ca.pem -subj "/CN=kube-ca"
The ca.pem
certificate will be used as the root certificate to verify the authenticity of certificates by every component within your Kubernetes cluster. You will copy this file to the controller and worker Droplets as well as to the Administrator clients.
Confirm your Root CA assets exist in the expected location:
- ls -1 ~/.kube/ca*.pem
Output similar to:
Output/home/demo/.kube/ca.pem
/home/demo/.kube/ca-key.pem
We now have all the necessary ingredients to generate certificates for all of our cluster components. We will come back to openssl
to generate each as required.
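As a preview of how we will use openssl later, signing a certificate for a cluster component generally involves generating a key, creating a Certificate Signing Request, and signing it with our CA. The file names and subject below are purely illustrative; the exact assets needed by each component are generated in the later steps of this tutorial:
- openssl genrsa -out ~/.kube/admin-key.pem 2048
- openssl req -new -key ~/.kube/admin-key.pem -out ~/.kube/admin.csr -subj "/CN=kube-admin"
- openssl x509 -req -in ~/.kube/admin.csr -CA ~/.kube/ca.pem -CAkey ~/.kube/ca-key.pem -CAcreateserial -out ~/.kube/admin.pem -days 365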
Step 3 — Provisioning The Data Storage Back End
Whether you are using Swarm with Docker overlay networks or Kubernetes, a data storage back end is required for the infrastructure metadata.
Kubernetes uses Etcd for data storage and for cluster consensus between different software components. Etcd is a distributed key value store that provides a reliable way to store data across a cluster of machines. It is open source and available on GitHub. We will introduce the minimum concepts necessary to set up Etcd for our Kubernetes cluster; the full Etcd documentation is available here.
Your Etcd cluster will be heavily utilized since all cluster objects are stored in it and every scheduling decision is recorded there. It is recommended that you run a multi-droplet cluster to gain maximum performance and reliability of this important part of your Kubernetes cluster. Of your Kubernetes components, you should only give the kube-apiserver
component read/write access to Etcd. You do not want the Etcd cluster used by Kubernetes exposed to every node in your cluster (or worse, to the Internet at large), because access to Etcd is equivalent to root in your cluster.
For development and testing environments, a single droplet running Etcd and shared between the Kubernetes API server and Flannel will suffice.
For production environments, it is highly recommended that Etcd is run as a dedicated cluster, separate from the Kubernetes components. Use the CoreOS cluster architecture overview as well as the official Etcd clustering guide to bootstrap a new Etcd cluster on Digital Ocean. If you do not have an existing Etcd cluster, you can bootstrap a fresh Etcd cluster on Digital Ocean either by:
- Using the public Etcd discovery service,
- Deploying your own private Etcd Discovery service or
- Using DNS discovery
Additionally, refer to the official Etcd guides on securing your Etcd cluster and for a full overview of the Etcd configuration flags.
In this tutorial, instead of being slowed down and distracted by generating new discovery URLs and bootstrapping Etcd, it's easier to start a single Etcd node. Since the full Etcd daemon isn't running on all of the machines, we'll gain a little bit of extra CPU and RAM to play with. However, for ease of configuration of all the cluster services, we will run a local Etcd in proxy mode on every Worker node (this daemon will listen on localhost and proxy all requests to the Etcd node). This allows us to configure every cluster component through the local Etcd proxy.
If you already have an Etcd cluster and wish to skip this step, ensure that you have set the $ETCD_PEER
environment variable to your Etcd cluster before proceeding with the rest of this tutorial.
Deploying with a single Etcd node
Since we're only using a single Etcd node, there is no need to include a discovery token. There isn't any high availability for Etcd in this configuration, but that's assumed to be OK for development and testing. Provision this Droplet first so you can configure the rest with its IP address.
Etcd is configurable through command-line flags and environment variables. To start Etcd automatically using custom settings with Systemd, we may store manually created Systemd drop-ins under: /etc/systemd/system/etcd2.service.d/
. Systemd drop-ins are a method for appending or overriding parameters of a Systemd unit without having to re-define the whole unit.
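For illustration only, a hand-written drop-in that overrides a single Etcd setting could look like the snippet below (we will not create this file by hand; the value shown is an example assumption):
# /etc/systemd/system/etcd2.service.d/30-custom.conf
[Service]
Environment=ETCD_NAME=etcd-01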
Alternatively, we may use the coreos.etcd2.*
parameters in our cloud-config
file to let CoreOS automatically generate the Etcd drop-ins on startup for us.
Note: cloud-config
generated drop-ins are stored under /run/systemd/system/etcd2.service.d/
.
#cloud-config
coreos:
etcd2:
name: "etcd-01"
advertise-client-urls: "http://$private_ipv4:2379"
listen-client-urls: "http://$private_ipv4:2379, http://127.0.0.1:2379"
listen-peer-urls: "http://$private_ipv4:2380, http://127.0.0.1:2380"
#bootstrapping
initial-cluster: "etcd-01=http://$private_ipv4:2380"
initial-advertise-peer-urls: "http://$private_ipv4:2380"
As we will use a single region for our Kubernetes cluster, we configure our Etcd instance to listen for incoming requests on the private_ip
and localhost
only. This may give us a little protection from the public Internet, but not from other droplets within the same region. For a production setup, it is recommended to follow the official Etcd guides on securing your Etcd cluster.
We set the -listen-client-urls
flag to listen for client traffic and -listen-peer-urls
flag to listen for peer traffic coming from Etcd proxies running on other cluster nodes. We use the $private_ipv4
substitution variable made available by the Digital Ocean metadata service in our cloud-config
files. We use the IANA-assigned Etcd ports 2379
for client traffic and 2380
for peer traffic.
Note: Several Etcd applications, such as SkyDNS, still rely on Etcd's legacy port 4001
. We did not configure Etcd to listen on this port, but you may need to do this to support older Etcd applications in your infrastructure.
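If you do need to support such applications, one option (not used in this tutorial) is to add the legacy port to the client listener URLs, along these lines:
listen-client-urls: "http://$private_ipv4:2379, http://$private_ipv4:4001, http://127.0.0.1:2379, http://127.0.0.1:4001"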
Our Etcd node will advertise itself with its private_ip
to clients as we define -advertise-client-urls
to override the default of localhost. This is important to avoid loops for our Etcd proxy running on our worker nodes. We are also required to configure a name for our Etcd instance to override the default name used for static bootstrapping. To bootstrap our single-node Etcd cluster, we directly provide the initial
clustering flags -initial-cluster
and -initial-advertise-peer-urls
as we do not rely on cluster discovery.
Next, we tell Systemd to start our Etcd service on boot by providing a unit definition for the etcd2
service in the same cloud-config
file. As this component is crucial and we only have a single node, we also turn off the CoreOS reboot-strategy so the Droplet is not automatically rebooted after updates.
Combining all of the above, our cloud-config
file for our Etcd Droplet should look as follows:
-
- #cloud-config
-
- coreos:
- etcd2:
- name: "etcd-01"
- advertise-client-urls: "http://$private_ipv4:2379"
- listen-client-urls: "http://$private_ipv4:2379, http://127.0.0.1:2379"
- listen-peer-urls: "http://$private_ipv4:2380, http://127.0.0.1:2380"
- #bootstrapping
- initial-cluster: "etcd-01=http://$private_ipv4:2380"
- initial-advertise-peer-urls: "http://$private_ipv4:2380"
- units:
- - name: "etcd2.service"
- command: "start"
- update:
- group: alpha
- reboot-strategy: off
Validate your cloud-config file with the CoreOS Validator, then create your etcd-01
droplet with the following Doctl command:
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf etcd-01.yaml etcd-01
Again, we wait for the droplet creation to be completed before proceeding. When the Droplet is active, give it some time to start the SSH daemon, then connect:
- ssh core@`doctl d | grep etcd-01 | awk '{print $3}'`
Confirm Etcd is running:
- systemctl status etcd2
This should return output similar to:
Output● etcd2.service - etcd2
Loaded: loaded (/usr/lib64/systemd/system/etcd2.service; disabled; vendor preset: disabled)
Drop-In: /run/systemd/system/etcd2.service.d
└─20-cloudinit.conf
Active: active (running) since Sat 2015-11-11 23:19:13 UTC; 6min ago
Main PID: 841 (etcd2)
CGroup: /system.slice/etcd2.service
└─841 /usr/bin/etcd2
Nov 11 23:19:13 etcd-01.ams2 systemd[1]: Started etcd2.
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: added local member ce2a822cea30bfca [http://10.129.69.201:2379] to cluster 7e27652122e8b2ae
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca is starting a new election at term 1
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca became candidate at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca received vote from ce2a822cea30bfca at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: ce2a822cea30bfca became leader at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: raft.node: ce2a822cea30bfca elected leader ce2a822cea30bfca at term 2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: setting up the initial cluster version to 2.2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: set the initial cluster version to 2.2
Nov 11 23:19:13 etcd-01.ams2 etcd2[841]: published {Name:etcd-01 ClientURLs:[http://10.129.69.201:2379]} to cluster 7e27652122e8b2ae
Confirm the cluster is healthy:
- etcdctl cluster-health
This should return output similar to:
Outputmember ce2a822cea30bfca is healthy: got healthy result from http://10.129.69.201:2379
cluster is healthy
Close the connection to the droplet and note down the Etcd endpoints your Kubernetes components will use: http://10.129.69.201:2379
for clients and http://10.129.69.201:2380
for peers, in the example above.
To script this assignment, we can use Jq to extract the private_ip
property of the droplet and format the result as required:
- export ETCD_PEER=`doctl -f json d f etcd-01.$region | jq -r '.networks.v4[] | select(.type == "private") | "http://\(.ip_address):2380"'`
Refer to [Working with Doctl responses](#) of Step 1 in this tutorial for a full explanation of the above command and confirm:
- echo $ETCD_PEER
This should return output similar to:
Outputhttp://10.129.69.201:2380
We will point our other Droplets to this Etcd cluster through their cloud-config
files. Start by creating a new file in your working directory named cloud-config-controller.yaml
. This will be the template for our Controller Droplets. Add the Etcd proxy configuration with the placeholder for ETCD_PEER (we will replace the placeholder at a later stage). We will keep adding snippets of configuration to this file as we go through each step, and we will provide a full listing of the final file as part of this tutorial. Add the Etcd proxy configuration to the file now:
- #cloud-config
-
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
When we provision a cluster, we will script the substitution of the ETCD_PEER
placeholder with a sed
command similar to the one below:
- sed -e "s|ETCD_PEER|${ETCD_PEER}|g;" cloud-config-controller.yaml > kube-controller-01.yaml
Note: Because our variable value includes forward slashes, we are using sed
with the pipeline "|" character as separator for the "s" command instead of the more common forward slash. Whichever character follows the "s" command is used as the separator by sed
.
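To confirm the substitution worked, you can check that the placeholder no longer appears in the generated file:
- grep ETCD_PEER kube-controller-01.yaml || echo "ETCD_PEER placeholder replaced"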
We will proceed with reviewing the networking requirements for Kubernetes and how we achieve them on Digital Ocean in the next step.
Step 4 — Configuring The Network Fabric Layer
As explained in the introduction of this tutorial, Kubernetes has the fundamental networking requirement of ensuring that all containers are routable without network translation or port brokering on the hosts. In other words, this means every Droplet is required to have its own IP range within the cluster. To achieve this on Digital Ocean, Flannel will be used to provide an overlay network across multiple Droplets and configure Docker to use this networking layer for its containers.
Flannel runs an agent, flanneld
, on each host which is responsible for allocating a subnet lease out of a pre-configured address space. When enabling Flannel on CoreOS, CoreOS will ensure Docker is automatically configured to use the Flannel overlay network for its containers.
Flannel uses Etcd to store the network configuration, the subnets allocated to each host, and auxiliary data (such as host IPs). The Etcd storage back end for Flannel should be run separately from the Kubernetes storage back end. To reduce the complexity in this tutorial, however, we will configure Flannel to share the external Etcd cluster with Kubernetes; this is acceptable for testing and development only. By default, Flannel looks up its configuration under the /coreos.com/network/config
key within Etcd. To run Flannel on each node in a consistent way, we are required to publish the Flannel configuration to Etcd under this key.
At the bare minimum, the configuration must provide Flannel with an IP range (subnet) that it should use for the overlay network. The IP subnet used by Flannel should not overlap with the public and private IP ranges used by the Digital Ocean Droplets; 10.2.0.0/16
is the IP range used in this tutorial. This /16 range will be assigned to the entire overlay network and used by containers and Pods across the cluster Droplets. By default, Flannel will allocate a /24 to each Droplet. This default, along with the minimum and maximum subnet IP addresses, is overridable in the configuration.
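For illustration, a more explicit Flannel configuration that also overrides the per-host subnet length and the allocatable range could look like the following (the SubnetLen, SubnetMin, and SubnetMax values are example assumptions, not values used in this tutorial):
{
  "Network": "10.2.0.0/16",
  "SubnetLen": 24,
  "SubnetMin": "10.2.10.0",
  "SubnetMax": "10.2.90.0",
  "Backend": { "Type": "vxlan" }
}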
The forwarding of packets by Flannel is achieved using one of several strategies that are known as back ends. In this tutorial we will configure Flannel to use the vxlan
back end which is built on the performant in-kernel VXLAN tunneling protocol to encapsulate the packets for the overlay network.
If we were to use the etcdctl
utility, which is shipped with CoreOS, directly from the terminal of any of our Droplets to publish this configuration, it would look like this:
- etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
With the above command, etcdctl
uses localhost, which on our hosts will be the Etcd daemon running in proxy mode, forwarding the configuration to our Etcd storage back end.
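Similarly, you could verify the published configuration at any time from any Droplet running Etcd or an Etcd proxy with:
- etcdctl get /coreos.com/network/config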
In our cloud-config-controller.yaml
we now add the requirement that the flanneld.service
needs to be started as well as add a Systemd drop-in. In this case we're using the Systemd drop-in to append a pre-condition to the start command of the flanneld
service, making sure the Flannel configuration is published prior to starting Flannel:
- #cloud-config
-
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
Note: Any services that run Docker containers must come after the flanneld.service
definition in our cloud-config
file and should include Requires=flanneld.service
, After=flanneld.service
, and Restart=always|on-failure
directives. These directives are necessary because flanneld.service
may fail due to Etcd not being available yet. Flannel will keep restarting and it is important for Docker based services to also keep trying until Flannel is up.
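As a hypothetical sketch of these directives (the service and image below are placeholders, not part of this tutorial), such a unit could look like:
[Unit]
Description=Example Docker-based service
Requires=flanneld.service
After=flanneld.service
[Service]
Restart=on-failure
ExecStart=/usr/bin/docker run --rm busybox sleep 3600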
In order for Flannel to manage the Pod network in the cluster, Docker needs to be configured to use the correct IP range for the Docker bridge. All we need to do is require that flanneld
is running prior to Docker starting and Flannel will handle the Docker configuration for us.
We do this with another Systemd drop-in; in this case we append two dependency rules to our Docker service to ensure it is only started after the Flannel service:
- #cloud-config
-
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
Step 5 — Downloading The Kubernetes Artifacts
Overview of the Kubernetes Artifacts
At the time of writing, the Official Kubernetes guidelines require the Kubernetes binaries and Docker images wrapping those binaries to be downloaded as part of the full Kubernetes release archive available in the Kubernetes repository on GitHub.
At the same time, all Kubernetes artifacts are also stored on the kubernetes-release
bucket on Google cloud storage for every release.
To confirm the current stable release of Kubernetes run:
- curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt
This returned v1.1.2 at the time of writing.
To list all Kubernetes release binaries stored in the Google cloud storage bucket for a particular release, you can either use the Python 2.7-based gsutil from the Google SDK as follows:
- gsutil ls -R gs://kubernetes-release/release/v1.1.2/bin/linux/amd64
Or without Python, directly talking to the Google cloud platform API using curl
& jq
(Jq ref):
- curl -sL https://www.googleapis.com/storage/v1/b/kubernetes-release/o?prefix='release/v1.1.2/bin/linux/amd64' | jq '"https://storage.googleapis.com/kubernetes-release/\(.items[].name)"'
A combined binary is provided as the Hyperkube. This Hyperkube is an all-in-one binary, allowing you to run any Kubernetes component as a subcommand of the hyperkube
command. The Hyperkube is also made available within a Docker image. The Dockerfile used to build this image can be reviewed here.
The plan for the Kubernetes release process is to publish the Kubernetes images on the Google Container Registry, under the google_containers
repository: gcr.io/google_containers/hyperkube:$TAG
, where TAG is the latest stable release tag (i.e.: v1.1.2
).
For example, we would obtain the Hyperkube image with the following command:
- docker pull gcr.io/google_containers/hyperkube:v1.1.2
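Subject to the availability caveat in the note that follows, each Kubernetes component could then be invoked as a subcommand of the hyperkube binary inside this image; for example (illustrative only):
- docker run gcr.io/google_containers/hyperkube:v1.1.2 /hyperkube kubelet --version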
Note: At the time of writing, Kubernetes images were not yet being pushed to the Google Container Registry as part of the release process. Any available images were pushed as a one-off. Refer to the following support ticket for an updated status of the release process.
Moreover, as the Hyperkube combines all binaries, is based on debian:jessie
and includes additional packages such as iptables
(required by the kube-proxy
), its size is considerable:
Outputgcr.io/google_containers/hyperkube v1.1.2 aa1283b0c02d 2 weeks ago 254.3 MB
As a result, for this tutorial, we will run the kube-proxy
binary outside of a container, the same way we run the kubelet
or any system daemon. For the kube-apiserver
, kube-controller-manager
and kube-scheduler
it is recommended to run these within containers and we will take a closer look at the available Docker images to do so now.
As can be seen in the listings earlier, tarred repositories for Docker images wrapping the Kubernetes binaries are also available:
binary_name | base image | size |
---|---|---|
kube-apiserver | busybox | 47.97 MB |
kube-controller-manager | busybox | 40.12 MB |
kube-scheduler | busybox | 21.44 MB |
Assuming you have access to a Docker daemon, we can curl
and load
these Docker images with the following commands.
- curl -sLo ./kube-apiserver.tar https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-apiserver.tar
- docker load -i ./kube-apiserver.tar
We have loaded the kube-apiserver
image in this example.
The Kubernetes build script tags these images as gcr.io/google_containers/$binary_name:$md5_sum
, which uses the md5_sum instead of the version. To easily run a container from this image, or push the image to a private registry for bootstrapping, we may re-tag the images with commands similar to the following:
- #get md5 via docker_tag
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-apiserver.docker_tag)"
- #re-tag
- docker tag -f "gcr.io/google_containers/kube-apiserver:${docker_tag}" "kube-apiserver:1.1.2"
In the above commands, we first get the md5_sum from the cloud storage bucket and then use it to re-tag the image. We can automate this for all the Kubernetes containers and will do so to pre-load the Kubernetes images.
Pre-Loading Kubernetes images
We will curl
and load
each image individually; see Pre-pulling images for details on how pre-pulled images may affect the Kubelet Pod definitions if we were to use the latest
tag. The following script combines the commands described above for each binary:
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- #"kube-proxy"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
-
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- echo "downloading ${binary} ${docker_tag}"
- curl -Lo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- echo "loading docker image"
- docker load -i ${temp_dir}/${binary}.tar
- echo "tagging docker image as ${binary} ${tag}"
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done
-
- echo "cleaning up temp dir"
- rm -rf "${temp_dir}"
The build scripts provided on the Kubernetes repository were used as a reference to create this script.
If you have a Docker registry available, you may modify this script to push these images to a repository there and simplify the provisioning of your cluster nodes significantly. In that case, the cluster nodes can simply pull the images from your registry and we do not need to include the above script as part of our controller Droplet configuration. For this tutorial, we assume such a registry is not available and we will embed the above script into the cloud-config
file of every controller node.
Insert this script into the cloud-config-controller.yaml
with a write_files
directive and make it executable. To reduce the output to the Systemd journal, we make the curl command silent and strip the echo
commands. We inserted the write_files
directive before the CoreOS configuration directives for clarity. We will execute the script through a oneshot
service, which we will add to the cloud-config
after this code listing:
- #cloud-config
-
- write_files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
We will now instruct cloudinit
to run this script during initialization of the Droplet by adding a oneshot
service for it to our cloud-config
. Systemd normally considers a oneshot service inactive once its process exits; setting RemainAfterExit=yes tells Systemd to keep treating the service as active (and successful) after the script completes. The script also requires the network and Docker services to be available, and we add these as dependencies to our Service unit. Add the unit definition to the end of the cloud-config-controller.yaml
file (line 48 and below):
- #cloud-config
-
- write_files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: start
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
A note about the Kubernetes Up and Running images:
Non-official images are also made available by Kelsey Hightower for his Kubernetes Up And Running book through his kuar repository (backed by the Kubernetes Up And Running Google Cloud Storage bucket). We may list the contents of the kuar bucket using the repositories/library key to find the images available within the repository, as follows:
- gsutil ls gs://kuar/repositories/library/
Or without Python, using curl
, jq
and grep
to grab all v1.1.2 Kubernetes tagged images:
- curl -sL https://www.googleapis.com/storage/v1/b/kuar/o?prefix='repositories/library/kube' | jq .items[].name | grep tag_1.1.2
The kuar
images are very easy to use with a single Docker command using the registry endpoint: b.gcr.io/kuar/$binary_name:$version
, for example: to run the kube-apiserver
:
- docker run b.gcr.io/kuar/kube-apiserver:1.1.2
This is much easier than manually curl-ing, load-ing, re-tag-ging and run-ning the images, but keep in mind that these are not the official Kubernetes images and their availability is not guaranteed. We will not be using these images as part of this tutorial.
Step 6 — The Kubernetes Controller Systemd Services
Most of the Kubernetes controller configuration will be done through cloud-config
, aside from placing the TLS assets on disk. The cloud-config
we are writing will take into account the possibility to have load-balanced controller nodes for High Availability in the future. How this affects our configuration will be discussed in detail in the [Controller Services set up: Master Election](#) section.
Note: For better security, the TLS assets should not be stored in the cloud-config itself. If you do prefer to transfer the TLS assets as part of the cloud-config, refer to this CoreOS tutorial for an example of storing TLS assets within the cloud-config file.
We will now introduce every Kubernetes component and its configuration to incrementally add to our controller Droplet's cloud-config
file.
The Kubernetes Kubelet Service
As seen in the Architectural overview section, Kubernetes is made up of several components. One such fundamental component is the kubelet
. The kubelet
is responsible for what's running on each individual Droplet within your cluster. You can think of it as a process watcher like systemd
, but focused on running containers. It has one job: given a set of containers to run, make sure they are all running.
The unit of execution Kubernetes works with is the Pod, not an individual container. A Pod is a collection of containers and volumes sharing the same execution environment. The containers within a Pod share a single IP, in our case this IP is provided by Docker within the Flannel overlay network. Pods are defined by a JSON or YAML file called a Pod manifest.
Within a Kubernetes cluster, the kubelet
functions as a local agent that watches for Pod specs via the Kubernetes API server. The kubelet
is also responsible for registering a node with a Kubernetes cluster, sending events and pod status, and reporting resource utilization.
While the kubelet
plays an important role in a Kubernetes cluster, it also works well in standalone mode - outside of a Kubernetes cluster. With the kubelet
running in standalone mode we will be able to use containers to distribute our binaries, monitor container resource utilization through the built-in support for cAdvisor and establish resource limits for the daemon services. The kubelet
provides a convenient interface for managing containers on a local system, allowing us to update our controller services by updating the containers without rebuilding our unit files. To achieve this, the kubelet
supports the configuration of a manifest directory, which is monitored for pod manifests every 20 seconds by default.
We will use our controller's cloud-config
to configure & start the kube-kubelet.service
in standalone mode on our controller Droplet. We will run the kube-proxy
service in the same way. Next, we will deploy all the Kubernetes cluster control services using a Pod manifest placed in the manifest
directory of the controller as soon as the TLS assets become available. The kubelet will start and make sure all containers within the Pod keep running, just as if the Pod was submitted via the API. The cool trick here is that we don't have an API running yet, but the Pod will function in the exact same way, which simplifies troubleshooting later on.
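Once the controller Droplet is up later in this tutorial, a quick way to see the standalone kubelet at work is to SSH in and ask it what it is running. The read-only port and endpoint below are the kubelet defaults at the time of writing, so treat this as an optional, illustrative check:
# Ask the kubelet's read-only endpoint which Pods it is currently managing
curl -s http://127.0.0.1:10255/pods
# Or list the containers Docker is running for the kubelet (Kubernetes prefixes their names with k8s_)
docker ps | grep k8s_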
Note: Only CoreOS Alpha or Beta images ship with the Kubernetes kubelet; the Stable channel has never contained a release that includes the kubelet. If a Droplet was booted from the Beta or Alpha channel and then moved to the Stable channel, it will lose the kubelet when it updates to the stable release.
At the time of writing, the CoreOS Alpha image on Digital Ocean has the following version:
Output$ cat /etc/lsb-release
DISTRIB_ID=CoreOS
DISTRIB_RELEASE=891.0.0
And the Kubelet bundled with this is:
Output$ kubelet --version=true
Kubernetes v1.1.2+3085895
If you are required to use the CoreOS stable channel or need a different Kubelet version, you may curl
the kubelet
binary as part of the cloud-config
using the paths identified in [Step 5 — Downloading The Kubernetes Artifacts](#) of this tutorial.
...
ExecStartPre=/usr/bin/curl -sLo /opt/bin/kubelet -z /opt/bin/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubelet
ExecStartPre=/usr/bin/chmod +x /opt/bin/kubelet
...
Note: The -z option of curl downloads the file only if it is newer than a given date or, as used here, newer than an existing local file. If the local file does not exist yet, curl prints a warning, as shown below.
OutputWarning: Illegal date format for -z/--timecond (and not a file name).
Warning: Disabling time condition. See curl_getdate(3) for valid date syntax.
Wherever we curl
binaries with the -z
option as part of a Systemd unit, these warnings will show in the journal and can safely be ignored.
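If you would like to see this behaviour for yourself before wiring it into a unit, you can run the same download twice against a scratch path (the /tmp/kubelet path below is used purely for illustration):
# First run: the local file does not exist yet, so curl prints the time-condition warning and downloads the binary
curl -Lo /tmp/kubelet -z /tmp/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubelet
# Second run: the remote file is not newer than the local copy, so nothing is transferred
curl -Lo /tmp/kubelet -z /tmp/kubelet https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubelet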
Running the Kubelet in standalone mode
The parameters we will pass to the kubelet are as follows; we will break them down one by one next:
kubelet \
--api-servers=http://127.0.0.1:8080 \
--register-node=false \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local
The kubelet will communicate with the API server through localhost, as specified with the --api-servers flag, but it will not register our controller node for cluster work because we set the --register-node=false flag. This ensures our controller Pods will not be affected by Pods scheduled by users within the cluster. As mentioned earlier, to run the kubelet in standalone mode we need to point it to a manifest directory. We set the kubelet manifest directory via the --config flag, which will be /etc/kubernetes/manifests in our setup. To facilitate routing between Droplets, we also override the hostname with the Droplet public IP through the --hostname-override flag.
CoreOS Linux ships with reasonable defaults for the kubelet, which have been optimized for security and ease of use. However, we are going to loosen the security restrictions in order to enable support for privileged containers through the --allow-privileged=true
flag.
Service Discovery and Kubernetes Services
To enable service discovery within the Kubernetes cluster, we need to provide our kubelet
with the service IP for the cluster DNS component as well as the DNS domain. The kubelet
will pass this on as the DNS server and DNS search suffix to each container running within the cluster. In this tutorial we will deploy DNS as a service within our Kubernetes cluster through the cluster DNS add-on in [Step 11 — Deploying Kubernetes-ready applications](#). Kubernetes uses cluster Virtual IPs (VIPs) for all services defined within the cluster. Routing to these VIPs is handled by the Kubernetes proxy components and VIPs are not required to be routable between nodes.
We configure Kubernetes to use the 10.3.0.0/24
IP range for all services. Each service will be assigned a cluster IP in this range. This range must not overlap with any IP ranges assigned to Pods as configured in our Flannel overlay network, or the Digital Ocean public and private IP ranges. The API server will take the first IP in that range (10.3.0.1
) by itself and we will configure the DNS service to take the static IP of 10.3.0.10
. Modify these values to mirror your own configuration.
We must pass on this DNS service IP to the kubelet
via the --cluster-dns
flag and the DNS domain via the --cluster-domain
flag.
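Once the DNS add-on is deployed later in this tutorial, you can confirm that the kubelet injected these settings into your containers. The checks below are illustrative and assume a container image that ships the usual resolver tools:
# Inside any container started by Kubernetes, the kubelet's --cluster-dns and --cluster-domain values end up in resolv.conf
cat /etc/resolv.conf
# From an image that includes nslookup, resolve the API service through the cluster DNS service IP
nslookup kubernetes.default.svc.cluster.local 10.3.0.10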
If the kubelet is bundled with CoreOS (Alpha/Beta), it is located at /usr/bin/kubelet. If you manually download it (Stable) to another path (/opt/bin/kubelet, for example), make sure to update the paths in the snippet below. Prior to starting the kubelet service, we also ensure the manifests and ssl directories exist on the host using an ExecStartPre directive prefixed with "-", which tells Systemd that failure of the command is tolerated.
We combine all the information above in the systemd service unit file for running the kubelet. We add a dependency on the docker.service
and make sure the unit restarts on failure. Here is the relevant cloud-config
snippet:
...
units:
- name: "kube-kubelet.service"
command: "start"
content: |
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
Requires=docker.service
After=docker.service
[Service]
ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
ExecStart=/usr/bin/kubelet \
--api-servers=http://127.0.0.1:8080 \
--register-node=false \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local
Restart=always
RestartSec=10
...
Adding this to our existing cloud-config-controller.yaml
gives us the following new contents (changes from line 60 onwards):
- #cloud-config
-
- write-files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: start
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service
- After=docker.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=http://127.0.0.1:8080 \
- --register-node=false \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local
- Restart=always
- RestartSec=10
With this configuration, all stateless controller services will be managed through the Pod manifests dropped into the kubelet's manifest folder (/etc/kubernetes/manifests). After configuring the kube-proxy service next, we will go through the structure of a Pod manifest and the Pod manifest for each controller service. We will finalize the controller configuration section with an overview of the full Kubernetes controller Pod manifests.
The Kubernetes Proxy Service
All nodes should run kube-proxy. (Running kube-proxy on a "controller" node is not strictly required, but being consistent is easier.) The proxy is responsible for directing traffic destined for specific services and pods to the correct location, and it communicates with the API server periodically to stay up to date.
Unlike the kubelet
, the kube-proxy
binary is currently not shipped with any CoreOS release and we will always need to download it. The URL to download the binary is described in [Step 5 — Downloading The Kubernetes Artifacts](#) of this tutorial. We will curl
the binary from this URL prior to starting the service by providing the following ExecStartPre
directives within the [Service]
section of our Systemd unit:
ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
We will also delay the kube-proxy
daemon from trying to connect to the API server until the kube-apiserver
service has started with the following ExecStartPre
directive:
ExecStartPre=/bin/bash -c "until /usr/bin/curl http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
Both the controller and worker nodes in your cluster will run the proxy. The following kube-proxy
parameters will be defined in our systemd service unit:
- --master=http://127.0.0.1:8080: The address of the Kubernetes API server for our Kubernetes controller node. In the section below, we will configure our kube-apiserver to bind to the network of the host and be reachable on the loopback interface.
- --proxy-mode=iptables: The proxy mode for our kube-proxy. At the time of writing the following two options are valid: userspace (older, stable) or iptables (experimental). If the iptables mode is selected, but the system's kernel or iptables versions are insufficient, kube-proxy always falls back to the userspace proxy (see the quick check after this list).
- --hostname-override=$public_ipv4: To facilitate routing without DNS resolution.
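Once kube-proxy is running, you can see which mode it ended up in by inspecting the NAT rules it programs on the Droplet. The exact chain names differ between the userspace and iptables proxy modes, so the command below is a loose, illustrative check:
# kube-proxy creates KUBE-* chains in the nat table; KUBE-PORTALS-* chains indicate the userspace fallback
sudo iptables -t nat -L -n | grep -i '^Chain KUBE'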
Add the kube-proxy Systemd Service Unit definition to your cloud-config-controller.yaml
. We insert this before the Kubelet Service to ensure the kube-proxy
binary is downloaded and started as soon as possible.
- #cloud-config
-
- write-files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: start
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: start
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: start
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
- - name: "kube-proxy.service"
- command: start
- content: |
- [Unit]
- Description=Kubernetes Proxy
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target
- After=network-online.target
- [Service]
- ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
- ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
- # wait for kube-apiserver to be up and ready
- ExecStartPre=/bin/bash -c "until /usr/bin/curl http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
- ExecStart=/opt/bin/kube-proxy \
- --master=http://127.0.0.1:8080 \
- --proxy-mode=iptables \
- --hostname-override=$public_ipv4
- TimeoutStartSec=10
- Restart=always
- RestartSec=10
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service
- After=docker.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=http://127.0.0.1:8080 \
- --register-node=false \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local
- Restart=always
- RestartSec=10
Note: At the time of writing, the kube-proxy binary is 18.3MB, while the Docker-wrapped image based on Debian with the iptables package installed is over 180MB. Downloading the kube-proxy binary takes less than 10 seconds and is therefore the method used in this tutorial, as opposed to running the proxy in a privileged Hyperkube container.
Note: By setting TimeoutStartSec to 10, Systemd will fail the kube-proxy service if it hasn't started after 10 seconds, but it will be restarted after the specified RestartSec timeout. We may notice these failures in the journal until the kube-apiserver has started; until then, these warnings can be safely ignored. The cloudinit utility will only continue with the next service after Systemd has failed the kube-proxy once.
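If you want to keep an eye on these restarts while the Droplet boots, you can follow the unit's journal from the controller Droplet; the messages stop once the API server answers on port 8080:
# Follow the kube-proxy unit logs; expect "waiting for API server to come online..." until kube-apiserver is up
journalctl -u kube-proxy.service -f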
For the full overview of all kube-proxy
arguments, refer to the official Kubernetes documentation.
Step 7 — The Kubernetes Controller Pods
For every controller node, stateless and stateful services will be managed by the kubelet using Pods. All stateless controller services will be managed through the Kubernetes manifest files dropped into the kubelet's manifest folder (/etc/kubernetes/manifests). All stateful controller services will be defined in Kubernetes manifest files managed by stateless master elector Pods. To understand this configuration, we need a good understanding of Kubernetes manifest files.
Our kubelet will be used to manage our controller services within containers based on manifest files. In this section we take a closer look at the structure of these files; you can refer back to this section whenever you need a refresher on manifest files.
Introduction to Kubernetes Manifest files
Kubernetes manifests can be written using YAML or JSON, but only YAML provides the ability to add comments. All of the manifests accepted and returned by the server have a schema, identified by the kind
and apiVersion
fields. These fields are required for proper decoding of the object.
The kind
field takes a string that identifies the schema of an object, in our case we are writing a manifest to create Pod objects, as such we write kind: Pod
in our Pod manifest.
The apiVersion field takes a string that identifies the API group and version of the schema of an object. API groups allow the Kubernetes API to be broken down into modular groups which can be enabled or disabled individually and versioned separately, and they give third parties the ability to develop Kubernetes plug-ins without naming conflicts. At the time of writing there are only two API groups:
- The "core" group, which currently consists of the original monolithic Kubernetes v1 API. This API group is simply omitted and specified only by it's version, for example:
apiVersion: v1
- The "extensions" group, which is the first API group introduced with v1.1. The
extensions
API group is still inv1beta1
at the time of writing, as such this API group is specified asapiVersion: extensions/v1beta1
. Resources within theextensions
API group can be enabled or disabled through the--runtime-config
flag passed on to the apiserver. For example, to disableHorizontalPodAutoscalers
andJobs
we may set--runtime-config=extensions/v1beta1/horizontalpodautoscalers=false,extensions/v1beta1/jobs=false
.
For a more detailed explanation of the Kubernetes API, refer to the API documentation.
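You can also ask a running API server which groups and versions it serves. Against the insecure localhost port of the controller we configure in this tutorial, the discovery endpoints can be queried as follows (an optional, illustrative check):
# The "core" group is served under /api, all other API groups under /apis
curl -s http://127.0.0.1:8080/api
curl -s http://127.0.0.1:8080/apis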
Once the schema has been specified, Pod manifests mainly consist of the following key structures:
- A metadata structure for describing the pod and its labels
- A spec structure for describing volumes, and a list of containers that will run in the Pod.
The name
and namespace
of the metadata
structure are generally user provided. The name has to be unique within the namespace specified. An empty namespace is equivalent to the default
namespace. In our case, we will scope our Pods related to the Kubernetes system environment to the kube-system
namespace. We will combine all stateless controller service containers within one Pod and call it the kube-controller
Pod.
Every Pod spec structure must have a list of containers with a minimum of one container. Each container in a Pod must have a unique name. For example, the API service container may be named identically to its binary name: kube-apiserver. Next, we may specify the image for the container, the command run within the container (equivalent to the Docker image entrypoint array), the args passed on to the container process (equivalent to the Docker image cmd array), the volumeMounts, ...
A special requirement for our controller service containers is that they need to use the host's network namespace. This can be achieved by setting hostNetwork: true in the spec structure of our controller Pod manifest.
Thus, this is how our Pod manifest for the controller services starts:
apiVersion: v1
kind: Pod
metadata:
name: kube-controller
namespace: kube-system
spec:
hostNetwork: true
containers:
- name: "kube-apiserver"
...
Master Election
By using a single Kubernetes controller Droplet, we have a single point of failure in our infrastructure. To ensure high availability we will need to run multiple controller nodes at some point. Every stateless Kubernetes component, such as the kube-apiserver
, can be scaled across multiple controller nodes without concern. However, there are components which modify the state of the cluster, such as the kube-controller-manager
and the kube-scheduler
. Of these components, only one instance may modify the state at a time. To achieve this, we need a way to ensure only one instance of each of these components is running, which is done by setting up master election per component.
At the time of writing, master election is not integrated within the kube-controller-manager
and kube-scheduler
, but is planned to be added in the future. Until then, a powerful and generic master election utility called the Podmaster is recommended.
The Podmaster is a small (8MB) utility written in Go that uses Etcd's atomic CompareAndSwap
functionality to implement master election. The first controller node to reach the Etcd cluster wins the race and becomes the master node for that service, marking itself as such with an expiring key identifying the service. The Podmaster will then periodically extend its service key. If a Podmaster finds the service key it monitors has expired, it attempts to take over by setting itself as the new master for that service. If it is the current master, the Podmaster copies the manifest of its service into the manifests directory of its host, ensuring a single instance of the service is always running within the cluster. If the Podmaster finds it is no longer the master, it removes the manifest file from the manifests directory of its host, ensuring the kubelet will no longer run the service on that controller node.
A Podmaster instance may run for each service requiring master election, each instance takes the key identifying the service as well as a source manifest file and a destination manifest file. The Podmaster itself will run inside a container and a Docker image wrapping the Podmaster can be pulled from the Google Container Registry under the gcr.io/google_containers/podmaster
repository. At the time of writing there is only 1 tag: 1.1
.
Even though we are only creating one controller node in this tutorial, we will set up the master election for the controller manager and scheduler service by storing their manifest files under the /srv/kubernetes/manifests
path and letting Podmaster instances copy the manifest files to the /etc/kubernetes/manifests
path on the elected master node.
In a single-controller deployment, the Podmaster will simply ensure that the kube-scheduler
and kube-controller-manager
run on the current node. In a multi-controller deployment, the Podmaster will be responsible for ensuring no additional instances are started, unless a machine dies, in which case the Podmaster will ensure new instances are started on one of the other controller nodes.
As our Podmasters depend on Kubernetes volumes, we will see the full Podmaster configurations after defining the Kubernetes volumes and kube-apiserver
Pod manifests.
Kubernetes Volumes
At its core, a Kubernetes volume is just a directory, possibly with some data in it, which is accessible to the containers in a Pod. How that directory comes to be, the medium that backs it, and the contents of it are determined by the particular volume type used. Each volume type is backed by a Kubernetes volume plug-in.
For our controller services, we ensure the Pod is tied to our controller node, and we will use HostPath
type volumes. HostPath
type volumes represent a pre-existing file or directory on the host machine that is directly exposed to the container. They are generally used for system agents or other privileged things that are allowed to see the host machine.
We will place our API server certificates, once generated, at the following pre-defined paths on the host:
- File:
/etc/kubernetes/ssl/ca.pem
- File:
/etc/kubernetes/ssl/apiserver.pem
- File:
/etc/kubernetes/ssl/apiserver-key.pem
The address of the controller node is required to generate these API server certificates. On Digital Ocean, this address is not known in advance. Therefore, we will generate the certificates and securely copy them over in a separate step after we provision our controller Droplet, but we will prepare our cloud-config and the volumes defined in our Pod manifests to expect these certificates at these pre-defined host paths.
Every volume requires a name which is unique within the Pod. The name is how we reference the volumes when we mount them into the Pod containers.
We define the following volume collection as part of our controller Pod manifest:
- a HostPath volume to provision the Kubernetes TLS credentials from the parent directory /etc/kubernetes/ssl.
- a HostPath volume for the list of "well-known" CA certificates, which, under CoreOS, is located under the read-only /usr/share/ca-certificates path.
- a HostPath volume exposing /srv/kubernetes/manifests as the source of the manifest files the Podmaster uses for master election.
- a HostPath volume giving the Podmaster access to the host manifest folder /etc/kubernetes/manifests, where it will store the destination manifest files.
spec:
volumes:
- hostPath:
path: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
- hostPath:
path: /usr/share/ca-certificates
name: ssl-certs-host
- hostPath:
path: /srv/kubernetes/manifests
name: manifest-src
- hostPath:
path: /etc/kubernetes/manifests
name: manifest-dst
The kube-apiserver
The first controller service we will configure in our controller Pod manifest is the API server. The API server is where most of the magic happens. It is stateless by design: it takes in API requests, processes them, stores the result in Etcd if needed, and then returns the result of the request. The API server will run on every controller Droplet and its Pod manifest will be placed directly in the kubelet manifest folder.
In this tutorial we are using individual Docker images wrapping each Kubernetes binary. In our Pod manifest we specify this binary as the entrypoint for the container through the command array, together with all of its arguments.
Below is the kube-apiserver container spec for our controller Pod; we will go through each argument in detail right after:
containers:
- name: "kube-apiserver"
image: "kube-apiserver:1.1.2"
command:
- "kube-apiserver"
- "--etcd-servers=http://127.0.0.1:2379"
- "--bind-address=0.0.0.0"
- "--secure_port=443"
- "--advertise-address=$public_ipv4"
- "--service-cluster-ip-range=10.3.0.0/24"
- "--service-node-port-range=30000-37000"
- "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- "--allow-privileged=true"
- "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
ports:
- containerPort: 443
hostPort: 443
name: https
- containerPort: 8080
hostPort: 8080
name: local
volumeMounts:
- mountPath: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
readOnly: true
- mountPath: /etc/ssl/certs
name: ssl-certs-host
readOnly: true
As highlighted in the [Kubernetes manifest files](#) section of this tutorial, our kube-controller
Pod uses the host's network namespace and each container running within the Pod can reach the host services, such as the Etcd proxy, over localhost.
- --etcd-servers=[]: By design, the kube-apiserver component is the only Kubernetes component communicating with Etcd. We specify the location of the Etcd cluster through the --etcd-servers=[] flag, which takes a comma-separated list of Etcd servers to watch. In this tutorial we bind an Etcd proxy for the cluster to the loopback interface of each Droplet, so the Etcd cluster can be reached through http://127.0.0.1:2379. Also note that by default, Kubernetes objects are stored under the /registry key in Etcd. We could prefix this path by also setting the --etcd-prefix="/foo" flag, but won't do this for this tutorial.
- --bind-address=0.0.0.0: The IP address on which the API server listens for requests. We explicitly configure our API server to listen on all interfaces of the host.
- --secure-port=443: To enable HTTPS with authentication and authorization we need to set this flag.
- --advertise-address=$public_ipv4: The IP address on which to advertise the apiserver to members of the cluster. This address must be reachable by the rest of the cluster. If blank, the --bind-address will be used, which would not work in our set up.
- --service-cluster-ip-range=10.3.0.0/24: A required CIDR notation IP range from which to assign service cluster IPs. See the [Running the kubelet in standalone mode](#) section for more details on how this is used within Kubernetes; we use 10.3.0.0/24 for Kubernetes Services within this tutorial. Modify these values to mirror your own configuration.
- --service-node-port-range=30000-37000: A port range to reserve for services with NodePort visibility. If we do not specify this range we will not be able to run some of the Kubernetes service examples using nodePort.
- --admission-control=[]: In Kubernetes, API requests need to pass through a chain of admission controllers after authentication and authorization but prior to being accepted and executed. Admission controllers are chained plug-ins, and many advanced features in Kubernetes require an admission control plug-in to be enabled in order to properly support the feature. As a result, a Kubernetes API server that is not properly configured with the right set of admission control plug-ins is an incomplete server and will not support all the features you expect. The recommended set of admission controllers for Kubernetes 1.0 is NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota. We would like to highlight the NamespaceLifecycle plug-in, which ensures that API requests in a non-existent Namespace are rejected. Because of this, we will be required to manually create the kube-system namespace used by our controller services once our kube-apiserver is available, or our other nodes won't be able to discover them.
- --allow-privileged=true: We have to explicitly allow privileged containers to run in our cluster.
- --tls-cert-file="/etc/kubernetes/ssl/apiserver.pem": The certificate used for SSL/TLS connections to the API server. We will generate the apiserver certificate containing host identities (DNS name, IP, ...) and securely copy it to our controller Droplet in a separate step. If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to /var/run/kubernetes. If you intend to use this approach, be sure to provide a volume for /var/run/kubernetes/ as well.
- --tls-private-key-file="/etc/kubernetes/ssl/apiserver-key.pem": The API server private key matching the --tls-cert-file we generated.
- --client-ca-file="/etc/kubernetes/ssl/ca.pem": The trusted certificate authority. Kubernetes will check all incoming HTTPS requests for a client certificate signed by this trusted CA. Any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
- --service-account-key-file="/etc/kubernetes/ssl/apiserver-key.pem": Used to verify ServiceAccount tokens. We explicitly set this to the same private key as our --tls-private-key-file flag. If unspecified, --tls-private-key-file is used.
Refer to the full kube-apiserver reference for a full overview of all API server flags.
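After the controller Droplet is provisioned and the TLS assets are in place (covered in the following steps), a couple of requests against the insecure localhost port confirm the API server is up. These are optional, illustrative checks run from the controller Droplet itself:
# The insecure localhost port should answer on /healthz and /version
curl -s http://127.0.0.1:8080/healthz
curl -s http://127.0.0.1:8080/version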
Creating The kube-system namespace
As soon as the kube-apiserver is available, we need to create the kube-system namespace used by our controller services, or our cluster nodes won't be able to discover them. In this section we define the Systemd unit responsible for this.
We wait until the kube-apiserver service has started, in the same way our kube-proxy service was configured to wait:
ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
The command to create the namespace using the Kubernetes API is:
- curl -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
We are passing in the manifest as a JSON string in this case.
Putting the command into a oneshot Systemd unit that depends on a successful start of the kubelet service gives us the following unit definition:
coreos:
units:
- name: "create-kube-system-ns.service"
command: "start"
content: |
[Unit]
Description=Create the kube-system namespace
Documentation=https://github.com/kubernetes/kubernetes
Requires=kube-kubelet.service
After=kube-kubelet.service
[Service]
ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
ExecStart=/usr/bin/curl -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
RemainAfterExit=yes
Type=oneshot
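Once this unit has run, you can confirm the namespace exists with a request such as the following (an optional, illustrative check from the controller Droplet):
# Should return the Namespace object created by create-kube-system-ns.service
curl -s http://127.0.0.1:8080/api/v1/namespaces/kube-system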
The kube-podmaster
In this section we add our Podmaster containers to our kube-controller
Pod manifest. As mentioned in the [Controller Services: Master Election](#) section of this tutorial, the kube-scheduler
and kube-controller-manager
services require master election. We will create 1 Podmaster container for each component requiring master election and define the Pod manifests in the following sections.
We will go into the contents of the kube-scheduler.yaml
Pod manifest and kube-controller-manager.yaml
Pod manifest after finalizing this kube-controller.yaml
Pod manifest.
As the kube-controller
Pod shares the host network, our Podmaster containers can reach the Etcd cluster via the localhost Etcd proxy. To ease the setup, we overwrite the hostname the Podmaster stores in the master reservation with the Droplet public IP by setting the --whoami
flag. The Droplet IP is always routable without the need for DNS services. We mount the manifest-src
volume as a read only volume within the Podmaster containers. The manifest-dst
volume is the path monitored by the Kubelet and needs to be writable by the Podmaster.
Here is the Podmaster container managing the master election for the kube-scheduler service:
containers:
- name: "scheduler-elector"
image: "gcr.io/google_containers/podmaster:1.1"
args:
- "--whoami=$public_ipv4"
- "--etcd-servers=http://127.0.0.1:2379"
- "--key=scheduler"
- "--source-file=/src/manifests/kube-scheduler.yaml"
- "--dest-file=/dst/manifests/kube-scheduler.yaml"
volumeMounts:
- mountPath: /src/manifests
name: manifest-src
readOnly: true
- mountPath: /dst/manifests
name: manifest-dst
For the kube-scheduler
our Podmaster sets the value of the scheduler
key in Etcd to record which controller Droplet is the master. We point this Podmaster to the kube-scheduler
Pod manifest source and destination files.
For the kube-controller-manager
the master elector looks almost identical, apart from the key, source and destination manifest files. The key used for the kube-controller-manager
is controller
and the kube-controller-manager.yaml
Pod manifest file is used instead.
containers:
- name: "controller-manager-elector"
image: "gcr.io/google_containers/podmaster:1.1"
args:
- "--whoami=$public_ipv4"
- "--etcd-servers=http://127.0.0.1:2379"
- "--key=controller"
- "--source-file=/src/manifests/kube-controller-manager.yaml"
- "--dest-file=/dst/manifests/kube-controller-manager.yaml"
volumeMounts:
- mountPath: /src/manifests
name: manifest-src
readOnly: true
- mountPath: /dst/manifests
name: manifest-dst
Combining kube-controller Pod manifest snippets
Combining all the kube-controller.yaml
snippets above into a single kube-controller Pod manifest:
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- - hostPath:
- path: /srv/kubernetes/manifests
- name: manifest-src
- - hostPath:
- path: /etc/kubernetes/manifests
- name: manifest-dst
- containers:
- - name: "kube-apiserver"
- image: "kube-apiserver:1.1.2"
- command:
- - "kube-apiserver"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--bind-address=0.0.0.0"
- - "--secure_port=443"
- - "--advertise-address=$public_ipv4"
- - "--service-cluster-ip-range=10.3.0.0/24"
- - "--service-node-port-range=30000-37000"
- - "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- - "--allow-privileged=true"
- - "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- - "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- - "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- ports:
- - containerPort: 443
- hostPort: 443
- name: https
- - containerPort: 8080
- hostPort: 8080
- name: local
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - name: "scheduler-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=scheduler"
- - "--source-file=/src/manifests/kube-scheduler.yaml"
- - "--dest-file=/dst/manifests/kube-scheduler.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- - name: "controller-manager-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=controller"
- - "--source-file=/src/manifests/kube-controller-manager.yaml"
- - "--dest-file=/dst/manifests/kube-controller-manager.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
The kube-controller Pod Pre-conditions
The kube-apiserver requires the TLS assets to be in place; if they are not, the container will die shortly after starting and the kubelet will create a new container every 5 minutes until the container stays up. To keep the error logs and dead containers to a minimum during first boot, we prefer to hold off on putting the kube-controller Pod manifest in the kubelet manifest directory until the kube-apiserver TLS assets are available. Until then, we will use the write-files directive to create the kube-controller Pod manifest under the /srv/kubernetes/manifests/ path.
We will use a Systemd unit to monitor the /etc/kubernetes/ssl
path and copy the kube-controller
manifest file to the kubelet manifest directory as soon as the TLS assets are detected.
The following loop sleeps until all 3 TLS assets required on the controller node are available:
- until [ `ls -1 /etc/kubernetes/ssl/{apiserver,apiserver-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo "waiting for TLS assets...";sleep 5; done
Putting this into a oneshot Systemd unit that starts as soon as the kubelet is ready gives us the following unit definition:
...
coreos:
units:
- name: "tls-ready.service"
command: "start"
content: |
[Unit]
Description=Ensure TLS assets are ready
Requires=kube-kubelet.service
After=kube-kubelet.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{apiserver,apiserver-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
ExecStart=/usr/bin/cp /srv/kubernetes/manifests/kube-controller.yaml /etc/kubernetes/manifests/
...
We will now proceed with defining the master elected kube-scheduler
and kube-controller-manager
Pod manifests which will also be stored under the /srv/kubernetes/manifests
path.
The kube-controller-manager Pod manifest
The controller manager embeds the core control loops within Kubernetes such as the replication controller, endpoints controller, namespace controller and serviceaccount controller. In short, a control loop watches the shared state of the cluster through the kube-apiserver
and makes changes attempting to move the current state towards the desired state.
For example, if you increased the replica count for a replication controller, the controller manager would generate a scale up event, which would cause a new Pod to get scheduled in the cluster. The controller manager communicates with the API to submit these events.
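For example, once kubectl is configured against the cluster in a later step, scaling a replication controller is a single command; the controller manager notices the changed desired state and creates or removes Pods accordingly. The my-nginx name below is only a placeholder:
# Raise the desired replica count; the controller manager reconciles the cluster towards it
kubectl scale rc my-nginx --replicas=3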
We start writing this Pod manifest in exactly the same way as our kube-controller Pod manifest, but with its own unique name in the kube-system namespace:
apiVersion: v1
kind: Pod
metadata:
name: kube-controller-manager
namespace: kube-system
spec:
hostNetwork: true
containers:
- name: "kube-controller-manager"
...
This Pod also shares the network with the host (hostNetwork: true
), allowing the containers running within to access the kube-apiserver
through localhost as well as exposing themselves to the kubelet over localhost.
We define volumes for the SSL certificates and the list of "well-known" CA certificates stored on the host, so we can mount these into the Pod containers:
spec:
...
volumes:
- hostPath:
path: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
- hostPath:
path: /usr/share/ca-certificates
name: ssl-certs-host
Our kube-controller-manager
is called with the following arguments:
kube-controller-manager \
--master=http://127.0.0.1:8080 \
--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem \
--root-ca-file=/etc/kubernetes/ssl/ca.pem
We provide the address of the kube-apiserver
via the --master=http://127.0.0.1:8080
flag. We provide the private key (to sign service account tokens) and our Kubernetes cluster root CA certificate for inclusion in service account tokens via the --service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
and --root-ca-file=/etc/kubernetes/ssl/ca.pem
flags respectively.
We are also adding a livenessProbe to our Pod manifest. This is a diagnostic performed periodically by the kubelet on a container. The LivenessProbe hints to the kubelet when a container is unhealthy. If the LivenessProbe fails, the kubelet will kill the container and the container will be subjected to its RestartPolicy
. If RestartPolicy
is not set, the default value is Always
. The default state of Liveness before the initial delay is Success
. The state of Liveness for a container when no probe is provided is assumed to be Success
.
The httpGet handler used in our livenessProbe performs an HTTP GET against the provided IP address on a specified port and path; the probe succeeds when the response has a status code greater than or equal to 200 and less than 400. Note that the default port used by kube-controller-manager is 10252, and the Kubernetes "healthz" package registers a handler on the /healthz path that serves 200s.
This gives us the following container spec for our kube-controller-manager
container:
spec:
...
containers:
- name: "kube-controller-manager"
image: "kube-controller-manager:1.1.2"
command:
- "kube-controller-manager"
- "--master=http://127.0.0.1:8080"
- "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
livenessProbe:
httpGet:
host: 127.0.0.1
path: /healthz
port: 10252
initialDelaySeconds: 15
timeoutSeconds: 1
volumeMounts:
- mountPath: /etc/kubernetes/ssl
name: ssl-certs-kubernetes
readOnly: true
- mountPath: /etc/ssl/certs
name: ssl-certs-host
readOnly: true
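You can also exercise the same endpoint the livenessProbe uses by hand from the controller Droplet once the Pod is running; it simply returns ok:
# The same request the kubelet's livenessProbe performs against the controller manager
curl -s http://127.0.0.1:10252/healthz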
Refer to the official kube-controller-manager reference for a full overview of all arguments.
Combining the above snippets together, the full kube-controller-manager.yaml
Pod manifest file will look as follows:
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller-manager
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- containers:
- - name: "kube-controller-manager"
- image: "kube-controller-manager:1.1.2"
- command:
- - "kube-controller-manager"
- - "--master=http://127.0.0.1:8080"
- - "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10252
- initialDelaySeconds: 15
- timeoutSeconds: 1
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
The kube-scheduler Pod manifest
The scheduler is the last major piece of our control services. It monitors the API for unscheduled pods, finds them a machine to run on, and communicates the decision back to the API.
The full kube-scheduler.yaml Pod manifest file introduces no new concepts: it performs a liveness probe against the scheduler's default port of 10251, requires no volumes, and looks as follows:
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-scheduler
- namespace: kube-system
- spec:
- hostNetwork: true
- containers:
- - name: "kube-scheduler"
- image: "kube-scheduler:1.1.2"
- command:
- - "kube-scheduler"
- - "--master=http://127.0.0.1:8080"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10251
- initialDelaySeconds: 15
- timeoutSeconds: 1
Refer to the official kube-scheduler reference for a full overview of all arguments.
Step 8 — Finalizing the Controller Cloud Config and Provisioning the Controller Droplet
We will embed the Pod manifest files constructed in Step 7 into our controller cloud-config
file.
Embedding all Pod manifests into the Controller cloud-config
We will store each manifest under the following paths:
- /srv/kubernetes/manifests/kube-scheduler.yaml
- /srv/kubernetes/manifests/kube-controller-manager.yaml
- /srv/kubernetes/manifests/kube-controller.yaml
This is achieved through the write-files
directive highlighted earlier.
- #cloud-config
-
- write-files:
- - path: "/srv/kubernetes/manifests/kube-scheduler.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-scheduler
- namespace: kube-system
- spec:
- hostNetwork: true
- containers:
- - name: "kube-scheduler"
- image: "kube-scheduler:1.1.2"
- command:
- - "kube-scheduler"
- - "--master=http://127.0.0.1:8080"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10251
- initialDelaySeconds: 15
- timeoutSeconds: 1
- - path: "/srv/kubernetes/manifests/kube-controller-manager.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller-manager
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- containers:
- - name: "kube-controller-manager"
- image: "kube-controller-manager:1.1.2"
- command:
- - "kube-controller-manager"
- - "--master=http://127.0.0.1:8080"
- - "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10252
- initialDelaySeconds: 15
- timeoutSeconds: 1
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - path: "/srv/kubernetes/manifests/kube-controller.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- - hostPath:
- path: /srv/kubernetes/manifests
- name: manifest-src
- - hostPath:
- path: /etc/kubernetes/manifests
- name: manifest-dst
- containers:
- - name: "kube-apiserver"
- image: "kube-apiserver:1.1.2"
- command:
- - "kube-apiserver"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--bind-address=0.0.0.0"
- - "--secure_port=443"
- - "--advertise-address=$public_ipv4"
- - "--service-cluster-ip-range=10.3.0.0/24"
- - "--service-node-port-range=30000-37000"
- - "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- - "--allow-privileged=true"
- - "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- - "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- - "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- ports:
- - containerPort: 443
- hostPort: 443
- name: https
- - containerPort: 8080
- hostPort: 8080
- name: local
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - name: "scheduler-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=scheduler"
- - "--source-file=/src/manifests/kube-scheduler.yaml"
- - "--dest-file=/dst/manifests/kube-scheduler.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- - name: "controller-manager-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=controller"
- - "--source-file=/src/manifests/kube-controller-manager.yaml"
- - "--dest-file=/dst/manifests/kube-controller-manager.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
-
The Final Controller cloud-config with all CoreOS Units
To finally create the controller Droplet, we will combine all above cloud-config
snippets into a single cloud-config
file:
- write-files snippets:
  - the /opt/bin/pull-kube-images.sh script to pre-load the Kubernetes Docker images
  - the /srv/kubernetes/manifests/kube-scheduler.yaml Pod manifest source for the kube-scheduler
  - the /srv/kubernetes/manifests/kube-controller-manager.yaml Pod manifest source for the kube-controller-manager
  - the /srv/kubernetes/manifests/kube-controller.yaml Pod manifest to start the kube-apiserver, controller-manager-elector and scheduler-elector
- the etcd2.service snippet to start a local Etcd proxy; notice the ETCD_PEER placeholder.
- the flanneld.service snippet to start the overlay network daemon, with a drop-in to configure the network subnet
- the docker.service drop-in snippet to add the flannel dependency
- the kube-kubelet.service snippet running the kubelet in standalone mode
- the kube-proxy.service snippet running the kube-proxy service
- the pull-kube-images.service snippet running the script to pre-load the Kubernetes Docker images
- the create-kube-system-ns.service snippet creating the kube-system namespace as soon as the API server is available
Several of these services depend on the TLS assets, which we generate as soon as the IP addresses are known for our Droplet.
In a multi-controller set-up, every controller node may be created using this cloud-config, although the flanneld drop-in and the create-kube-system-ns.service unit only need to run once within the cluster and are not required on subsequent controller nodes.
As we are running a single controller node, we are also turning off CoreOS updates and reboots in our cloud-config
.
- #cloud-config
-
- write-files:
- - path: /opt/bin/pull-kube-images.sh
- permissions: '0755'
- content: |
- #!/bin/bash
- tag=1.1.2
- docker_wrapped_binaries=(
- "kube-apiserver"
- "kube-controller-manager"
- "kube-scheduler"
- )
- temp_dir="$(mktemp -d -t 'kube-server-XXXX')"
- for binary in "${docker_wrapped_binaries[@]}"; do
- docker_tag="$(curl -sL https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.docker_tag)"
- curl -sLo ${temp_dir}/${binary}.tar https://storage.googleapis.com/kubernetes-release/release/v${tag}/bin/linux/amd64/${binary}.tar
- docker load -i ${temp_dir}/${binary}.tar
- docker tag -f "gcr.io/google_containers/${binary}:${docker_tag}" "${binary}:${tag}"
- done;
- rm -rf "${temp_dir}";
- exit $?
- - path: "/srv/kubernetes/manifests/kube-scheduler.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-scheduler
- namespace: kube-system
- spec:
- hostNetwork: true
- containers:
- - name: "kube-scheduler"
- image: "kube-scheduler:1.1.2"
- command:
- - "kube-scheduler"
- - "--master=http://127.0.0.1:8080"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10251
- initialDelaySeconds: 15
- timeoutSeconds: 1
- - path: "/srv/kubernetes/manifests/kube-controller-manager.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller-manager
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- containers:
- - name: "kube-controller-manager"
- image: "kube-controller-manager:1.1.2"
- command:
- - "kube-controller-manager"
- - "--master=http://127.0.0.1:8080"
- - "--service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--root-ca-file=/etc/kubernetes/ssl/ca.pem"
- livenessProbe:
- httpGet:
- host: 127.0.0.1
- path: /healthz
- port: 10252
- initialDelaySeconds: 15
- timeoutSeconds: 1
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - path: "/srv/kubernetes/manifests/kube-controller.yaml"
- permissions: "0644"
- owner: "root"
- content: |
- apiVersion: v1
- kind: Pod
- metadata:
- name: kube-controller
- namespace: kube-system
- spec:
- hostNetwork: true
- volumes:
- - hostPath:
- path: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- - hostPath:
- path: /usr/share/ca-certificates
- name: ssl-certs-host
- - hostPath:
- path: /srv/kubernetes/manifests
- name: manifest-src
- - hostPath:
- path: /etc/kubernetes/manifests
- name: manifest-dst
- containers:
- - name: "kube-apiserver"
- image: "kube-apiserver:1.1.2"
- command:
- - "kube-apiserver"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--bind-address=0.0.0.0"
- - "--secure_port=443"
- - "--advertise-address=$public_ipv4"
- - "--service-cluster-ip-range=10.3.0.0/24"
- - "--service-node-port-range=30000-37000"
- - "--admission-control=NamespaceLifecycle,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota"
- - "--allow-privileged=true"
- - "--tls-cert-file=/etc/kubernetes/ssl/apiserver.pem"
- - "--tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- - "--client-ca-file=/etc/kubernetes/ssl/ca.pem"
- - "--service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem"
- ports:
- - containerPort: 443
- hostPort: 443
- name: https
- - containerPort: 8080
- hostPort: 8080
- name: local
- volumeMounts:
- - mountPath: /etc/kubernetes/ssl
- name: ssl-certs-kubernetes
- readOnly: true
- - mountPath: /etc/ssl/certs
- name: ssl-certs-host
- readOnly: true
- - name: "scheduler-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=scheduler"
- - "--source-file=/src/manifests/kube-scheduler.yaml"
- - "--dest-file=/dst/manifests/kube-scheduler.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- - name: "controller-manager-elector"
- image: "gcr.io/google_containers/podmaster:1.1"
- args:
- - "--whoami=$public_ipv4"
- - "--etcd-servers=http://127.0.0.1:2379"
- - "--key=controller"
- - "--source-file=/src/manifests/kube-controller-manager.yaml"
- - "--dest-file=/dst/manifests/kube-controller-manager.yaml"
- volumeMounts:
- - mountPath: /src/manifests
- name: manifest-src
- readOnly: true
- - mountPath: /dst/manifests
- name: manifest-dst
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: "start"
- drop-ins:
- - name: 50-network-config.conf
- content: |
- [Unit]
- Requires=etcd2.service
- [Service]
- ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.2.0.0/16", "Backend": {"Type": "vxlan"}}'
- - name: "docker.service"
- command: "start"
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "pull-kube-images.service"
- command: "start"
- content: |
- [Unit]
- Description=Pull and load all Docker wrapped Kubernetes binaries
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target docker.service
- After=network-online.target docker.service
- [Service]
- ExecStart=/opt/bin/pull-kube-images.sh
- RemainAfterExit=yes
- Type=oneshot
- - name: "kube-proxy.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Proxy
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target
- After=network-online.target
- [Service]
- ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
- ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
- # wait for kube-apiserver to be up and ready
- ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
- ExecStart=/opt/bin/kube-proxy \
- --master=http://127.0.0.1:8080 \
- --proxy-mode=iptables \
- --hostname-override=$public_ipv4
- TimeoutStartSec=10
- Restart=always
- RestartSec=10
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service
- After=docker.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=http://127.0.0.1:8080 \
- --register-node=false \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local
- Restart=always
- RestartSec=10
- - name: "tls-ready.service"
- command: "start"
- content: |
- [Unit]
- Description=Ensure TLS assets are ready
- Requires=kube-kubelet.service
- After=kube-kubelet.service
- [Service]
- Type=oneshot
- RemainAfterExit=yes
- ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{apiserver,apiserver-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
- ExecStart=/usr/bin/cp /srv/kubernetes/manifests/kube-controller.yaml /etc/kubernetes/manifests/
- - name: "create-kube-system-ns.service"
- command: "start"
- content: |
- [Unit]
- Description=Create the kube-system namespace
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=kube-kubelet.service
- After=kube-kubelet.service
- [Service]
- ExecStartPre=/bin/bash -c "until /usr/bin/curl -s http://127.0.0.1:8080; do echo \"waiting for API server to come online...\"; sleep 3; done"
- ExecStart=/usr/bin/curl -XPOST -d'{"apiVersion":"v1","kind":"Namespace","metadata":{"name":"kube-system"}}' "http://127.0.0.1:8080/api/v1/namespaces"
- RemainAfterExit=yes
- Type=oneshot
- update:
- group: alpha
- reboot-strategy: off
Create the Controller Droplet
Validate your cloud-config file, then create your kube-controller-01 Droplet with the following Doctl commands.
First, ensure your ETCD_PEER environment variable is still set from the [Deploy the data storage back end](#) section of this tutorial:
- $ echo $ETCD_PEER
- http://10.129.69.201:2380
If not, set it to the private_ip of your single-node Etcd cluster:
- export ETCD_PEER=`doctl -f json d f etcd-01.$region | jq -r '.networks.v4[] | select(.type == "private") | "http://\(.ip_address):2380"'`
Substitute the ETCD_PEER placeholder in the above cloud-config-controller.yaml template file with the following command:
- sed -e "s|ETCD_PEER|${ETCD_PEER}|g;" cloud-config-controller.yaml > kube-controller.yaml
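As a quick optional sanity check, confirm no ETCD_PEER placeholder remains in the generated file (this is a generic shell check, not part of the original workflow; it prints a confirmation message when nothing is found):
- grep -n "ETCD_PEER" kube-controller.yaml || echo "placeholder replaced"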
And send the command to create the droplet:
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf kube-controller.yaml kube-controller-01
Note: running free -m on a 512mb Droplet shows only 12mb of free memory after all controller services have started; it may be better to use a 1024mb Droplet to fully test Kubernetes.
We are waiting for the droplet to be flagged as active before proceeding. Once the Doctl command completes, the Droplet configuration is returned. As it usually takes more time for the Droplet to return its public and private ip addresses, we need to re-query the Droplet configuration. We will cache the json string returned in the $CONTROLLER_JSON
environment variable for subsequent commands:
- CONTROLLER_JSON=`doctl -f 'json' d f kube-controller-01.$region`
We parse the private and public IPs out as explained in the [Working with doctl responses](#) section of this tutorial.
- CONTROLLER_PUBLIC_IP=`echo $CONTROLLER_JSON | jq -r '.networks.v4[] | select(.type == "public") | .ip_address'`
- CONTROLLER_PRIVATE_IP=`echo $CONTROLLER_JSON | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'`
Confirm values were populated correctly:
- echo $CONTROLLER_PUBLIC_IP && echo $CONTROLLER_PRIVATE_IP
You may monitor the initialization process driven by cloud-config
by connecting to the Droplet:
- ssh core@$CONTROLLER_PUBLIC_IP
and follow the oem-cloudinit
service running the cloud-config
:
- journalctl -u oem-cloudinit -f
Once the oem-cloudinit service has reached tls-ready.service, it will wait for us to provide the TLS assets. Press CTRL+C to stop following the log, then confirm the Etcd proxy is running:
- systemctl status etcd2
Confirm the Flannel service started:
- systemctl status flanneld
If Flannel started, confirm it was able to retrieve its configuration from Etcd:
- cat /run/flannel/subnet.env
The Docker daemon options for the overlay network generated by Flannel are stored under /run/flannel_docker_opts.env
:
- cat /run/flannel_docker_opts.env
Confirm all services are running:
- systemctl status tls-ready
- systemctl status docker
- systemctl status pull-kube-images
Confirm the Kubernetes Docker images have all been loaded by running:
- docker images | grep kube
Confirm all files have been written to disk:
- ls -l /opt/bin/
- ls -l /srv/kubernetes/manifests/
Monitor when the kubelet launches the containers (which will happen as soon as we copy the TLS assets over):
- watch -n 1 'docker ps --format="table {{.Image}}\t{{.ID}}\t{{.Status}}\t{{.Ports}}" -a'
If the oem-cloudinit
failed, review the cloud-config
stored by the Digital Ocean Metadata Service:
- curl -sL 169.254.169.254/metadata/v1/user-data | less
If you find a mistake in the cloud-config
, your only option is to delete and re-create the Droplet.
Generating and Transferring the kube-apiserver TLS Assets
The address of the controller node is required for the API Server certificate. In most cases this will be the publicly routable IP or hostname of the controller cluster. Worker nodes must be able to reach the controller node(s) via this address on port 443. Additionally, external clients (such as an administrator using kubectl) will also need access, since this will run the Kubernetes API endpoint.
If you will be running a highly available control plane consisting of multiple controller nodes, the host name in the certificate should ideally point at a network load balancer that sits in front of the controller nodes. Alternatively, a DNS name can be configured which will resolve to the controller node IPs. In either case, the certificate which is generated next needs to have the correct CommonName and/or SubjectAlternativeNames.
Ensure you have populated the $CONTROLLER_PUBLIC_IP
, $region
and $CONTROLLER_PRIVATE_IP
variables:
- echo $region && echo $CONTROLLER_PUBLIC_IP && echo $CONTROLLER_PRIVATE_IP
which should show output similar to:
Output$ echo $region && echo $CONTROLLER_PUBLIC_IP && echo $CONTROLLER_PRIVATE_IP
ams2
188.166.252.4
10.130.158.66
The API Server will take the first IP in the Kubernetes Service IP range. In this tutorial we are using the 10.3.0.0/24 IP range for the cluster services (see [Running the Kubelet in standalone mode](#)). The IP used by the apiserver service within Kubernetes is thus 10.3.0.1 and needs to be included in the API server certificate. If you are using a different Service IP range, update the value in the configuration file below.
Now we are ready to prepare the openssl config file (see CoreOS OpenSSL tutorial).
- cat > openssl.cnf <<EOF
- [req]
- req_extensions = v3_req
- distinguished_name = req_distinguished_name
- [req_distinguished_name]
- [ v3_req ]
- basicConstraints = CA:FALSE
- keyUsage = nonRepudiation, digitalSignature, keyEncipherment
- subjectAltName = @alt_names
- [alt_names]
- DNS.1 = kube-controller-01.$region
- IP.1 = 10.3.0.1
- IP.2 = $CONTROLLER_PUBLIC_IP
- IP.3 = $CONTROLLER_PRIVATE_IP
- EOF
Generate the API server private key (apiserver-key.pem
) which is needed to create the signing request:
- openssl genrsa -out ~/.kube/apiserver-key.pem 2048
Generate the Certificate Signing Request (CSR):
- openssl req -new -key ~/.kube/apiserver-key.pem -out apiserver.csr -subj "/CN=kube-apiserver" -config openssl.cnf
And finally, use the Certificate Authority to generate the signed API Server certificate (apiserver.pem
):
- openssl x509 -req -in apiserver.csr \
- -CA "$HOME/.kube/ca.pem" \
- -CAkey "$HOME/.kube/ca-key.pem" \
- -CAcreateserial \
- -out "$HOME/.kube/apiserver.pem" \
- -days 365 \
- -extensions v3_req \
- -extfile openssl.cnf
Note: the above command does not work in git-for-windows due to Windows path conversions. It is recommended to copy apiserver.csr and openssl.cnf to ~/.kube/ and run the command from within the ~/.kube/ directory (without the "$HOME/.kube/" parts).
Copy the necessary certificates to the controller node. The core
user does not have write permissions to /etc/kubernetes/ssl
directly, thus we store the files in the home directory first.
- scp ~/.kube/apiserver-key.pem ~/.kube/apiserver.pem ~/.kube/ca.pem core@$CONTROLLER_PUBLIC_IP:~
Move the certificates from the Home directory to the /etc/kubernetes/ssl
path and fix the permissions by executing the following commands over ssh:
- ssh core@$CONTROLLER_PUBLIC_IP <<EOF
- sudo mkdir -p /etc/kubernetes/ssl/
- sudo mv ~core/*.pem /etc/kubernetes/ssl/
- sudo chown root:root /etc/kubernetes/ssl/*.pem
- sudo chmod 600 /etc/kubernetes/ssl/*-key.pem
- EOF
Troubleshooting: Review the certificate contents with the following command:
- openssl x509 -text -noout -in apiserver.pem
As soon as the certificates are available it will take just a few minutes for all the Controller services to start running.
The kube-proxy service will start as soon as the kube-apiserver is available. As we specified the iptables proxy mode, it will try to flush the userspace chains from iptables, which do not exist; this shows up in the log files but can be ignored:
Output$ journalctl -u kube-proxy -f
...
Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-HOST": exit status 1: iptables: No chain/target/match by that name.
Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-CONTAINER": exit status 1: iptables: No chain/target/match by that name.
If you started the docker ps -a
watch
on the controller, you should notice all containers being created by the kubelet.
We can confirm the apiserver
authenticates itself with the certificate we provided and requires client authentication using curl
. As the self-signed root CA used by the cluster is not trusted by our client, we need to pass it in:
- curl -s https://$CONTROLLER_PUBLIC_IP/api/v1/namespaces --cacert ~/.kube/ca.pem -v
The -v
flag allows us to see the verbose log of communication between our client and the apiserver. As we did not present our client certificate, the server responds with unauthorized
.
Output...
* successfully set certificate verify locations:
* CAfile: /home/demo/.kube/ca.pem
CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
...
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-SHA
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=kube-apiserver
* start date: Dec 15 08:15:44 2015 GMT
* expire date: Dec 14 08:15:44 2016 GMT
* subjectAltName: 188.166.252.4 matched
* issuer: CN=kube-ca
* SSL certificate verify ok.
...
< HTTP/1.1 401 Unauthorized
< Content-Type: text/plain; charset=utf-8
< Date: Wed, 16 Dec 2015 00:05:36 GMT
< Content-Length: 13
<
Unauthorized
...
We will need to authenticate by presenting a client certificate signed by our Kubernetes root CA; we will generate an admin certificate in the Administrator setup section of this tutorial.
At this stage we can configure our client to communicate with our Kubernetes controller. Although we do not have any worker nodes and won't be able to start a workload yet, this will ensure our configuration is working so far.
Note: We may re-use the same controller cloud-config files to spin up a cluster of controller Droplets with a load balancer in front of them. In that case, our apiserver certificate should have included all necessary IP addresses (such as the load balancer IP) for proper TLS authentication.
Step 9 — Setting Up The Kubernetes Cluster Administrator
Generate the Cluster Administrator Keypair
Every administrator needs a private key, which we generate using openssl as follows:
- openssl genrsa -out ~/.kube/admin-key.pem 2048
Using this private key, the administrator creates a Certificate Signing Request (CSR):
- openssl req -new -key ~/.kube/admin-key.pem -out admin.csr -subj "/CN=kube-admin"
To be authorized to connect to the Kubernetes apiserver, this admin.csr
needs to be sent to and processed by the Kubernetes Cluster root CA to generate the signed admin.pem
certificate:
- openssl x509 -req -in admin.csr -CA ~/.kube/ca.pem -CAkey ~/.kube/ca-key.pem -CAcreateserial -out ~/.kube/admin.pem -days 365
From now on, the administrator can use the admin-key.pem private key and the signed admin.pem certificate to connect to the Kubernetes cluster.
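Optionally, before using the new certificate, you can confirm it chains back to the cluster CA (openssl should report OK):
- openssl verify -CAfile ~/.kube/ca.pem ~/.kube/admin.pem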
Test the freshly signed admin certificate by passing it in to the curl
command we used earlier:
- curl -s https://$CONTROLLER_PUBLIC_IP/api/v1/namespaces --cacert ~/.kube/ca.pem --cert ~/.kube/admin.pem --key ~/.kube/admin-key.pem
Now authenticated, this should return a json
response containing all namespaces within our cluster. You can use Jq to simplify the output:
- curl -s https://$CONTROLLER_PUBLIC_IP/api/v1/namespaces \
- --cacert ~/.kube/ca.pem --cert ~/.kube/admin.pem --key ~/.kube/admin-key.pem \
- | jq .items[].metadata.name
Instead of talking to the API directly, we will download and configure the command line tool kubectl
.
Download Kubectl
As highlighted in [Step 5 — Downloading The Kubernetes Artifacts](#) of this tutorial, the kubectl
binary can be downloaded from the Google cloud storage bucket.
For 64bit Linux clients:
- sudo curl -Lo /opt/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kubectl
- sudo chmod +x /opt/bin/kubectl
For 64bit OSX clients:
- sudo curl -Lo /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/darwin/amd64/kubectl
- sudo chmod +x /usr/local/bin/kubectl
For 64bit Windows clients (tested for this tutorial using the git-for-windows bash):
- curl -Lo /usr/bin/kubectl.exe https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/windows/amd64/kubectl.exe
Before we can use kubectl
, we need to understand how configuration is managed within Kubernetes. This section is also important for our worker node configuration as we will use these concepts to simplify the worker setup.
Introduction to Kubeconfig files
Several tools, Docker for example, rely on command-line flags and environment variables to configure the environment:
DOCKER_HOST=tcp://192.168.99.101:2376
DOCKER_CERT_PATH=/home/demo/.docker/machines/.client
DOCKER_TLS_VERIFY=1
DOCKER_MACHINE_NAME=dev
When users have to work with multiple environments which require a different configuration however, managing several environment variables to define a single configuration becomes cumbersome, even more so when the combination of clusters and users allow for many different configurations as is the case with Kubernetes.
Docker opted to facilitate environment management by creating the docker-machine env
command. This tool generates the necessary shell commands allowing users to easily switch the server their client talks to. The commands generated by docker-machine
in turn need to support each shell (bash/fish/cmd/PowerShell/..) users may be using and ideally also auto-detect the shell in use.
For Kubernetes, kubeconfig files were created instead to store the environment definitions such as authentication and connection details as well as provide a mechanism to easily switch between multiple clusters and multiple user credentials. Kubernetes components were written to read the configuration from these config files including functionality for merging multiple configurations based on certain rules.
On one side, kubeconfig files store connection information for clusters in an associative array of name->cluster
entries. A cluster
entry consists of information such as the server
to connect to, the api-version
of the cluster and the certificate-authority
for the cluster or a flag to skip verification of the authority which signed the server certificate (insecure-skip-tls-verify
).
On the other side, kubeconfig files also store user credentials in a second associative array of name->user entries. A user entry defines the user's authentication mechanism, which may be one of the following (a brief kubectl sketch follows this list):
- Authentication through a client certificate,
- Basic authentication with username and password or
- Authentication through a bearer token
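For illustration only, here is a hedged sketch of how each mechanism could be declared with kubectl config set-credentials; the user names and secrets are hypothetical, and this tutorial only uses the certificate-based variant:
- # hypothetical examples of the three mechanisms
- kubectl config set-credentials cert-user --client-certificate=admin.pem --client-key=admin-key.pem
- kubectl config set-credentials basic-user --username=demo --password=secret
- kubectl config set-credentials token-user --token=0123456789abcdef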
Decoupling users from clusters provides the ability to define cross cluster users only once. A user
entry and a cluster
entry combine to make up a context
. Several such (cluster
, user
) pairs are then defined in a third associative array of name->context
entries. Context entries also provide a namespace
field to specify the Kubernetes namespace
to be used for that context. The current-context
may be set to define the context in use.
To declare the above components, kubeconfig files are written in YAML and similar to Pod manifests start with a versioned schema definition:
apiVersion: v1
kind: Config
...
In the next step we will manually write out a kubeconfig file to fully understand these concepts. We will also be using the kubectl
tool to more easily manipulate kubeconfig files, with a series of kubectl config
subcommands. Refer to the official Kubernetes kubectl config
documentation for full details.
As mentioned, kubeconfig files also define a way multiple configurations may be merged together along with override options specified from the command line. See the loading and merging rules section of the Kubernetes documentation for a technical overview of these rules. We will only define a single kubeconfig file in this tutorial.
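For illustration only, merging can be triggered by listing several kubeconfig files in the KUBECONFIG environment variable; the second file here is hypothetical:
- KUBECONFIG=$HOME/.kube/config:$HOME/.kube/staging-config kubectl config view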
Configure Kubectl
Now that we have a theoretical understanding of what a kubeconfig file is made up of, we will first manually write our default kubeconfig file (~/.kube/config).
Define the Digital Ocean cluster we just created as do-cluster
:
apiVersion: v1
kind: Config
clusters:
- name: do-cluster
cluster:
certificate-authority: ca.pem
server: https://$CONTROLLER_PUBLIC_IP
Note: Relative paths are supported; in this tutorial we store our certificates in the same directory as our kubeconfig file (~/.kube/). Modify these values to mirror your own configuration.
Next, we define the admin user and specify the associated TLS assets we generated for certificate-based authentication:
...
users:
- name: admin
user:
client-certificate: admin.pem
client-key: admin-key.pem
Followed by the definition of the context combining these two, which we will name the do-cluster-admin
context:
...
contexts:
- name: do-cluster-admin
context:
cluster: do-cluster
user: admin
As we did not specify a namespace
for our context, the default
namespace will be used.
We may set this as our current context in our kubeconfig file by adding the current-context: do-cluster-admin
setting at the end.
Using the cat command to combine all the above snippets with a heredoc for variable substitution, we write out the file as follows:
cat > ~/.kube/config<<EOF
apiVersion: v1
kind: Config
clusters:
- name: do-cluster
cluster:
certificate-authority: ca.pem
server: https://$CONTROLLER_PUBLIC_IP
users:
- name: admin
user:
client-certificate: admin.pem
client-key: admin-key.pem
contexts:
- name: do-cluster-admin
context:
cluster: do-cluster
user: admin
current-context: do-cluster-admin
EOF
If the kubeconfig file is not passed in to kubectl
through the --kubeconfig
flag, it will first look for a kubeconfig
file in the current directory as well as the $KUBECONFIG
environment variable. If none of these are set, kubectl
will use the default ~/.kube/config
file we just created.
We may also generate the above file using kubectl with the following 4 commands:
Set the "do-cluster" entry:
- kubectl config set-cluster do-cluster --server=https://$CONTROLLER_PUBLIC_IP --certificate-authority=$HOME/.kube/ca.pem
Set the "admin" user entry:
- kubectl config set-credentials admin --client-key=$HOME/.kube/admin-key.pem --client-certificate=$HOME/.kube/admin.pem
Set the "do-cluster-admin" context:
- kubectl config set-context do-cluster-admin --cluster=do-cluster --user=admin
Set the current-context
:
- kubectl config use-context do-cluster-admin
Confirm your configuration was successful with the following command:
- kubectl version
If everything worked so far, this should return output similar to:
OutputClient Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.2", GitCommit:"3085895b8a70a3d985e9320a098e74f545546171", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.2", GitCommit:"3085895b8a70a3d985e9320a098e74f545546171", GitTreeState:"clean"}
We may confirm the pods running in the kube-system namespace with the following command:
- kubectl get pods --namespace=kube-system
Expected output looks like:
OutputNAME READY STATUS RESTARTS AGE
kube-controller-188.166.252.4 3/3 Running 0 3h
kube-controller-manager-188.166.252.4 1/1 Running 0 3h
kube-scheduler-188.166.252.4 1/1 Running 0 3h
This indicates that all 3 containers (kube-apiserver, scheduler-elector and controller-manager-elector) of the kube-controller pod, as well as the kube-controller-manager and kube-scheduler pods, are running.
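We can also confirm that the API server claimed the first IP of the Service IP range (10.3.0.1 in this tutorial), the same IP we added to the API server certificate; the built-in kubernetes service in the default namespace should be listed with that cluster IP:
- kubectl get services --namespace=default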
We now have our Etcd data store and first controller droplet ready, but we do not have any worker nodes yet to schedule workloads on.
- kubectl get nodes
Returns an empty collection of worker nodes. We will spin up our first worker node next.
Step 10 — Provisioning The Kubernetes Worker Droplets
Configuring the workers is significantly less complicated, and we can reuse much of the controller's configuration.
Ensure your DIGITALOCEAN_API_KEY
, ETCD_PEER
, CONTROLLER_PUBLIC_IP
, CONTROLLER_PRIVATE_IP
and region
environment variables are set for the next steps.
The Etcd, Flannel & Docker services
As mentioned in the Etcd configuration section, we are running an Etcd daemon in proxy mode on each droplet for Flannel to access. As we are sharing our Etcd cluster between Flannel and Kubernetes we should note that exposing your Kubernetes data back end to each node is a bad practice. For production environments we should use a separate Etcd cluster to store the Flannel meta data configuration. If Flannel was not used to configure the overlay network, Etcd access would not be needed on the worker nodes at all.
Our cloud-config section for the Etcd proxy daemon, as we saw before, looks like this:
coreos:
etcd2:
proxy: on
listen-client-urls: http://localhost:2379
initial-cluster: "etcd-01=ETCD_PEER"
units:
- name: "etcd2.service"
command: "start"
We need to ensure Flannel starts and add a drop-in for Docker to depend on Flannel (Flannel will use localhost to retrieve its network configuration):
coreos:
units:
- name: "flanneld.service"
command: "start"
- name: "docker.service"
command: "start"
drop-ins:
- name: 40-flannel.conf
content: |
[Unit]
Requires=flanneld.service
After=flanneld.service
The kubelet service
In order to facilitate secure communication between Kubernetes components, kubeconfig
can also be used to define authentication settings for the kubelet. In this case, the kubelet and proxy are reading this configuration to communicate with the API. Refer back to the [Introduction to kubeconfig files](#) section of this tutorial for a detailed explanation of the kubeconfig specification.
Very similar to our previous kubeconfig file, we define a single cluster, in this case called "local", with a certificate-authority path, and a single user called "kubelet" with certificate-based authentication through a worker private key and signed certificate. The combination of this user and cluster is defined as the "kubelet-context" and set as the current-context.
-
- apiVersion: v1
- kind: Config
- clusters:
- - name: local
- cluster:
- certificate-authority: /etc/kubernetes/ssl/ca.pem
- users:
- - name: kubelet
- user:
- client-certificate: /etc/kubernetes/ssl/worker.pem
- client-key: /etc/kubernetes/ssl/worker-key.pem
- contexts:
- - name: kubelet-context
- context:
- cluster: local
- user: kubelet
- current-context: kubelet-context
Our kubelet parameters are:
kubelet \
--api-servers=https://CONTROLLER_PUBLIC_IP \
--register-node=true \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
The --api-servers flag now uses the https protocol (port 443), and we use a placeholder for CONTROLLER_PUBLIC_IP to keep this cloud-config file generic so we can re-use it when creating multiple clusters. We pass in the kubeconfig we described above with the --kubeconfig flag. Our worker nodes are registered to receive work by specifying the --register-node=true flag. We still configure our kubelet to monitor a local directory for Pod manifests, although we will not be using this at this point. The remaining parameters are identical to the controller Droplet's configuration.
Similar to our controller setup, we define a Systemd unit to wait for the Worker TLS assets to be in place and require the kube-kubelet
service to depend on this unit.
coreos:
units:
- name: "tls-ready.service"
command: "start"
content: |
[Unit]
Description=Ensure TLS assets are ready
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/bin/mkdir -p /etc/kubernetes/ssl
ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{worker,worker-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
Add the tls-ready.service dependency to the kube-kubelet and kube-proxy services:
coreos:
units:
- name: "kube-kubelet.service"
command: "start"
content: |
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
Requires=docker.service tls-ready.service
After=docker.service tls-ready.service
[Service]
ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
ExecStart=/usr/bin/kubelet \
--api-servers=https://CONTROLLER_PUBLIC_IP \
--register-node=true \
--allow-privileged=true \
--config=/etc/kubernetes/manifests \
--hostname-override=$public_ipv4 \
--cluster-dns=10.3.0.10 \
--cluster-domain=cluster.local \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
Restart=always
RestartSec=10
The kube-proxy Service
coreos:
units:
- name: "kube-proxy.service"
command: "start"
content: |
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/kubernetes/kubernetes
Requires=network-online.target tls-ready.service
After=network-online.target tls-ready.service
[Service]
ExecStartPre=-/usr/bin/mkdir -p /opt/bin
ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
ExecStart=/opt/bin/kube-proxy \
--master=https://CONTROLLER_PUBLIC_IP \
--kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
--proxy-mode=iptables \
--hostname-override=$public_ipv4
Restart=always
RestartSec=10
The Final Worker cloud-config with all CoreOS Units
This cloud-config combines the worker-kubeconfig.yaml write-file described above with the following unit snippets:
- etcd2.service: starts a local Etcd proxy; notice the ETCD_PEER placeholder
- flanneld.service: starts the overlay network daemon, with a drop-in to configure the network subnet
- docker.service: a drop-in to add the flannel dependency
- tls-ready.service: blocks other units until the TLS assets for the worker have been put in place
- kube-kubelet.service: runs the kubelet, registering with our controller node
- kube-proxy.service: runs the kube-proxy service
-
- #cloud-config
-
- write-files:
- - path: /etc/kubernetes/worker-kubeconfig.yaml
- permissions: '0644'
- content: |
- apiVersion: v1
- kind: Config
- clusters:
- - name: local
- cluster:
- certificate-authority: /etc/kubernetes/ssl/ca.pem
- users:
- - name: kubelet
- user:
- client-certificate: /etc/kubernetes/ssl/worker.pem
- client-key: /etc/kubernetes/ssl/worker-key.pem
- contexts:
- - name: kubelet-context
- context:
- cluster: local
- user: kubelet
- current-context: kubelet-context
- coreos:
- etcd2:
- proxy: on
- listen-client-urls: http://localhost:2379
- initial-cluster: "etcd-01=ETCD_PEER"
- units:
- - name: "etcd2.service"
- command: "start"
- - name: "flanneld.service"
- command: "start"
- - name: "docker.service"
- command: "start"
- drop-ins:
- - name: 40-flannel.conf
- content: |
- [Unit]
- Requires=flanneld.service
- After=flanneld.service
- - name: "tls-ready.service"
- command: "start"
- content: |
- [Unit]
- Description=Ensure TLS assets are ready
- Requires=docker.service
- After=docker.service
- [Service]
- Type=oneshot
- RemainAfterExit=yes
- ExecStartPre=-/usr/bin/mkdir -p /etc/kubernetes/ssl
- ExecStart=/bin/bash -c "until [ `ls -1 /etc/kubernetes/ssl/{worker,worker-key,ca}.pem 2>/dev/null | wc -l` -eq 3 ]; do echo \"waiting for TLS assets...\";sleep 5; done"
- - name: "kube-proxy.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Proxy
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=network-online.target tls-ready.service
- After=network-online.target tls-ready.service
- [Service]
- ExecStartPre=-/usr/bin/mkdir -p /opt/bin
- ExecStartPre=/usr/bin/curl -sLo /opt/bin/kube-proxy -z /opt/bin/kube-proxy https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/amd64/kube-proxy
- ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-proxy
- ExecStart=/opt/bin/kube-proxy \
- --master=https://CONTROLLER_PUBLIC_IP \
- --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml \
- --proxy-mode=iptables \
- --hostname-override=$public_ipv4
- Restart=always
- RestartSec=10
- - name: "kube-kubelet.service"
- command: "start"
- content: |
- [Unit]
- Description=Kubernetes Kubelet
- Documentation=https://github.com/kubernetes/kubernetes
- Requires=docker.service tls-ready.service
- After=docker.service tls-ready.service
- [Service]
- ExecStartPre=-/bin/bash -c "mkdir -p /etc/kubernetes/{manifests,ssl}"
- ExecStart=/usr/bin/kubelet \
- --api-servers=https://CONTROLLER_PUBLIC_IP \
- --register-node=true \
- --allow-privileged=true \
- --config=/etc/kubernetes/manifests \
- --hostname-override=$public_ipv4 \
- --cluster-dns=10.3.0.10 \
- --cluster-domain=cluster.local \
- --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml
- Restart=always
- RestartSec=10
Use the above template to generate the worker node cloud config for this Digital Ocean cluster:
- sed -e "s|ETCD_PEER|${ETCD_PEER}|g;s|CONTROLLER_PUBLIC_IP|${CONTROLLER_PUBLIC_IP}|g;" cloud-config-worker.yaml > kube-worker.yaml
And send the command to create the Droplet:
- doctl d c --wait-for-active \
- -i "CoreOS-alpha" \
- -s 512mb \
- -r "$region" \
- -p \
- -k k8s-key \
- -uf kube-worker.yaml kube-worker-01
Note: running free -m after freshly starting all Kubernetes services on a 512mb Worker Droplet shows 129mb free; consider using 1024mb Droplets.
We refresh the Droplet configuration and cache the json string returned in the $WORKER_JSON environment variable for subsequent commands.
- WORKER_JSON=`doctl -f 'json' d f kube-worker-01.$region`
-
We parse the private and public IPs out as explained in the [Working with doctl responses](#) section of this tutorial.
- WORKER_PUBLIC_IP=`echo $WORKER_JSON | jq -r '.networks.v4[] | select(.type == "public") | .ip_address'`
- WORKER_PRIVATE_IP=`echo $WORKER_JSON | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'`
Confirm the values were populated correctly:
- echo $WORKER_PUBLIC_IP && echo $WORKER_PRIVATE_IP
Troubleshooting: as with the controller, you may monitor the worker's initialization with:
- ssh core@$WORKER_PUBLIC_IP
- journalctl -u oem-cloudinit -f
- watch -n 1 'docker ps --format="table {{.Image}}\t{{.ID}}\t{{.Status}}\t{{.Ports}}" -a'
Generating and Transferring the Worker TLS assets
As it is recommended to generate a unique certificate per worker, we will do so and transfer it to our worker droplet now.
The IP addresses and fully qualified hostnames of all worker nodes will be needed. The certificates generated for the worker nodes will need to reflect how requests will be routed to those nodes. In most cases this will be a routable IP and/or a routable hostname. These will be unique per worker; when you see them used below, consider it a loop and do that step for each worker.
This procedure generates a unique TLS certificate for every Kubernetes worker node in your cluster. While unique certificates are less convenient to generate and deploy, they do provide stronger security assurances and the most portable installation experience across multiple cloud-based and on-premises Kubernetes deployments.
We will use a common openssl configuration file for all workers. The certificate output will be customized per worker based on environment variables used in conjunction with the configuration file. Create the file worker-openssl.cnf on your local machine with the following contents.
-
- [req]
- req_extensions = v3_req
- distinguished_name = req_distinguished_name
- [req_distinguished_name]
- [ v3_req ]
- basicConstraints = CA:FALSE
- keyUsage = nonRepudiation, digitalSignature, keyEncipherment
- subjectAltName = @alt_names
- [alt_names]
- IP.1 = $ENV::WORKER_IP
Generate the private key for our first Worker Droplet:
- openssl genrsa -out ~/.kube/worker-01-key.pem 2048
Generate the Certificate Signing Request, substituting the WORKER_IP environment variable:
- WORKER_IP=${WORKER_PRIVATE_IP} openssl req -new -key ~/.kube/worker-01-key.pem -out worker-01.csr -subj "/CN=kube-worker-01" -config worker-openssl.cnf
Generate the signed worker certificate:
- WORKER_IP=${WORKER_PRIVATE_IP} openssl x509 -req -in worker-01.csr \
- -CA "$HOME/.kube/ca.pem" \
- -CAkey "$HOME/.kube/ca-key.pem" \
- -CAcreateserial \
- -out "$HOME/.kube/worker-01.pem" \
- -days 365 \
- -extensions v3_req \
- -extfile worker-openssl.cnf
Note: the above command does not work in git-for-windows due to Windows path conversions. It is recommended to copy worker-01.csr and worker-openssl.cnf to ~/.kube/ and run the command from within the ~/.kube/ directory (without the "$HOME/.kube/" parts).
Copy the necessary certificates to the worker node. We store the files in the home directory first.
- scp ~/.kube/worker-01-key.pem ~/.kube/worker-01.pem ~/.kube/ca.pem core@$WORKER_PUBLIC_IP:~
Move the certificates from the home directory to the /etc/kubernetes/ssl path, fix the permissions and create links to match our generic kubeconfig (which expects /etc/kubernetes/ssl/worker-key.pem instead of /etc/kubernetes/ssl/worker-01-key.pem) by executing the following commands over ssh:
- ssh core@$WORKER_PUBLIC_IP <<EOF
- sudo mkdir -p /etc/kubernetes/ssl/
- sudo mv ~core/*.pem /etc/kubernetes/ssl/
- sudo chown root:root /etc/kubernetes/ssl/*.pem
- sudo chmod 600 /etc/kubernetes/ssl/*-key.pem
- sudo ln -s /etc/kubernetes/ssl/worker-01.pem /etc/kubernetes/ssl/worker.pem
- sudo ln -s /etc/kubernetes/ssl/worker-01-key.pem /etc/kubernetes/ssl/worker-key.pem
- EOF
As soon as the certificates are available it will take just a few minutes for the kubelet and kube-proxy to start running on the worker and register with the Controller.
We can verify by running kubectl get nodes:
- kubectl get nodes
Which should show output as follows:
OutputNAME LABELS STATUS AGE
128.199.203.205 kubernetes.io/hostname=128.199.203.205 Ready 9m
We may repeat the above steps to create additional Worker Droplets with their own TLS assets; a scripted sketch of those repeated certificate steps follows below. We now have a working Kubernetes cluster, ready to start running our containerized applications. To facilitate application deployment, however, it is recommended to run a few cluster services, and we will proceed to do so in the next step.
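The following is a hedged sketch of how those per-worker certificate steps could be scripted for additional workers; it assumes Droplets named kube-worker-02 and kube-worker-03 were already created from the same kube-worker.yaml, and re-uses only commands shown earlier in this tutorial:
- for i in 02 03; do
-   W_JSON=`doctl -f json d f kube-worker-${i}.$region`
-   W_PRIVATE_IP=`echo $W_JSON | jq -r '.networks.v4[] | select(.type == "private") | .ip_address'`
-   W_PUBLIC_IP=`echo $W_JSON | jq -r '.networks.v4[] | select(.type == "public") | .ip_address'`
-   # generate a key, CSR and signed certificate for this worker
-   openssl genrsa -out ~/.kube/worker-${i}-key.pem 2048
-   WORKER_IP=$W_PRIVATE_IP openssl req -new -key ~/.kube/worker-${i}-key.pem -out worker-${i}.csr -subj "/CN=kube-worker-${i}" -config worker-openssl.cnf
-   WORKER_IP=$W_PRIVATE_IP openssl x509 -req -in worker-${i}.csr -CA ~/.kube/ca.pem -CAkey ~/.kube/ca-key.pem -CAcreateserial -out ~/.kube/worker-${i}.pem -days 365 -extensions v3_req -extfile worker-openssl.cnf
-   # ship the TLS assets and put them in place, as we did for kube-worker-01
-   scp ~/.kube/worker-${i}-key.pem ~/.kube/worker-${i}.pem ~/.kube/ca.pem core@$W_PUBLIC_IP:~
-   ssh core@$W_PUBLIC_IP "sudo mkdir -p /etc/kubernetes/ssl && sudo mv ~core/*.pem /etc/kubernetes/ssl/ && sudo chown root:root /etc/kubernetes/ssl/*.pem && sudo chmod 600 /etc/kubernetes/ssl/*-key.pem && sudo ln -s /etc/kubernetes/ssl/worker-${i}.pem /etc/kubernetes/ssl/worker.pem && sudo ln -s /etc/kubernetes/ssl/worker-${i}-key.pem /etc/kubernetes/ssl/worker-key.pem"
- done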
Step 11 — Running Kubernetes Cluster Services
Several cluster services are provided as cluster add-ons (UI/Dashboard, Image Registry, DNS, ...). Deploying these add-ons is optional, but availability of some of these services is often expected by Kubernetes users. A full listing of all supported add-ons can be found within the Kubernetes GitHub repository at kubernetes/cluster/addons/.
Add-ons are built on the same Kubernetes components as user-submitted jobs (Pods, Replication Controllers and Services); however, cluster add-ons are expected to specify the kubernetes.io/cluster-service: "true" label.
One such cluster add-on facilitates the discovery of services running within Kubernetes. We will first define the problem and the options Kubernetes provides to solve it.
When Pods depend on each other (for example, front end services may depend on back end services), mechanisms need to be in place to enable service discovery. Within Kubernetes, Pods are short-lived objects and their IPs change over time due to crashes or scheduling changes. Because of this, addressing Pods directly is difficult, so Kubernetes introduced the concept of Service objects to solve this problem. Service objects are long-lived objects which get a static virtual IP within the cluster, usually referred to as their clusterIP, to address sets of Pods internally or externally to the cluster. This clusterIP is stable as long as the Service object exists. Kubernetes sets up a load balancer forwarding traffic through this clusterIP to the Service EndPoints, unless you explicitly disable the load balancer (by setting clusterIP to None) and work with the list of Service EndPoints directly. Such Services without a clusterIP are called Headless. Service objects may also be created for services running outside of the Kubernetes cluster (by omitting the Pod selector) as long as you manually create the EndPoint definitions for these external services. Full details on how to do this are available within the official Kubernetes documentation.
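As a minimal sketch of these concepts (the names my-service and app: my-app are hypothetical, chosen to match the DNS examples later in this step), a simple Service manifest could be written as follows; uncommenting the clusterIP: None line would turn it into a Headless Service:
- cat > my-service.yaml <<EOF
- apiVersion: v1
- kind: Service
- metadata:
-   name: my-service
- spec:
-   selector:
-     app: my-app
-   ports:
-   - name: http
-     protocol: TCP
-     port: 80
-     targetPort: 8080
-   # clusterIP: None
- EOF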
Once Service objects have been defined, Kubernetes provides 2 ways of finding them:
- Through Environment variables, or
- Using DNS
Upon Pod creation, the kubelet adds a set of environment variables for each active Service within the same namespace, similar to how Docker links worked. These environment variables enforce an ordering requirement as any Service that a Pod wants to access must be created before the Pod itself and may require applications to be modified before they can run within Kubernetes. If we use DNS to discover services, we do not have these restrictions, but we are required to deploy the DNS cluster add-on.
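For illustration, a Pod created after the hypothetical my-service above exists would see environment variables roughly like the following (the names follow the documented {SVCNAME}_SERVICE_HOST/_PORT convention; the IP shown is made up):
MY_SERVICE_SERVICE_HOST=10.3.0.15
MY_SERVICE_SERVICE_PORT=80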
As part of this tutorial we ensure our Kubernetes cluster is integrated with DNS for Service discovery by deploying the DNS add-on with kubectl
.
DNS Integration with Kubernetes
When enabled, the DNS add-on for Kubernetes will assign a DNS name for every Service object defined in the Kubernetes cluster.
At the time of writing, the DNS protocol implementation for the DNS add-on is provided by SkyDNS. SkyDNS is configured as a slave to the Kubernetes API Server with custom logic implemented in a bridge component called Kube2sky. SkyDNS itself is only a thin layer over Etcd to translate Etcd keys and values to the DNS protocol. In this way, SkyDNS can be as highly available and stable as the underlying Etcd cluster. We will have a closer look at how each of these 3 components work together and how we will deploy them as a single Pod.
We will create a Replication Controller to run the DNS Pod and a Service to expose its ports.
Our Replication Controller manifest starts, as we saw earlier, with the schema definition and a metadata
section:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v9
namespace: kube-system
...
A Replication Controller can be thought of as a process supervisor, except that it supervises multiple Pods across multiple nodes instead of individual processes on a single node. The Replication Controller creates Pods from a template and uses labels and selectors to monitor the actual running Pods. The selector finds Pods within the cluster by label; the labels we'll use for this Replication Controller are the k8s-app and version labels. We specify these labels, together with the kubernetes.io/cluster-service: "true" label required for cluster add-ons, in the PodTemplateSpec, and attach them to the Replication Controller itself as well.
At the time of writing we are using version 9 and refer to the DNS add-on as the kube-dns
app. By default Replication Controllers will run 1 replica, but we explicitly set the replicas
field to 1 for clarity in our spec
. This looks as follows in our manifest file:
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-dns-v9
namespace: kube-system
labels:
k8s-app: kube-dns
version: v9
kubernetes.io/cluster-service: "true"
spec:
replicas: 1
selector:
k8s-app: kube-dns
version: v9
template:
metadata:
labels:
k8s-app: kube-dns
version: v9
kubernetes.io/cluster-service: "true"
spec:
volumes:
...
We added the labels to the Replication Controller object in its metadata
field. In the ReplicationControllerSpec we set the labels for the Pod template.metadata
and also set the replicas
and selector
values. Let's look at the volumes and containers defined in the PodTemplateSpec next.
We will only define a volume for Etcd. Giving Etcd a volume, outside of the union filesystem used by container runtimes such as Docker, will ensure optimal performance by reducing filesystem overhead. As the data is just a scratch space and it's fine to lose the data when the Pod is rescheduled on a different Node, it is sufficient to use an EmptyDir-type volume:
...
volumes:
- name: etcd-storage
emptyDir: {}
...
Let's look at the actual definitions of the containers in the Pod template. We see a container for each of the 3 components described earlier as well as an ExecHealthz sidecar container:
- Etcd - the storage for SkyDNS
- Kube2sky - the glue between SkyDNS and Kubernetes
- SkyDNS - the DNS server
- ExecHealthz - sidecar container for health monitoring, see details below.
The Etcd instance used by the DNS add-on is best run separately from the Etcd cluster used by the Kubernetes API Services. For simplicity we run Etcd within the same Pod as our SkyDNS and Kube2sky components. This is sufficient considering the DNS add-on only requires a small subset of everything Etcd has to offer.
For the Etcd container we will use the busybox-based image available on the Google Container Registry; refer to the kubernetes/cluster/images/etcd repository on GitHub to see the full details of how that image is made.
...
- name: etcd
image: gcr.io/google_containers/etcd:2.0.9
resources:
limits:
cpu: 100m
memory: 50Mi
command:
- /usr/local/bin/etcd
- -data-dir
- /var/etcd/data
- -listen-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -advertise-client-urls
- http://127.0.0.1:2379,http://127.0.0.1:4001
- -initial-cluster-token
- skydns-etcd
volumeMounts:
- name: etcd-storage
mountPath: /var/etcd/data
...
We run this container with the etcd-storage volume mounted and used as the Etcd data-dir. We configure Etcd to listen on localhost for connections on both the IANA-assigned 2379 port and the legacy 4001 port; this is required because Kube2sky and SkyDNS still connect to port 4001 by default.
This spec also applies resource limits which define an upper bound on the maximum amount of resources that will be made available to this container. Resource limits are crucial to enable the scheduling components within Kubernetes to be effective. Without a definition of the required resources, schedulers can do little more than round robin assignments. The CPU resource is defined in Compute Units per second (KCU) and in this case the unit is milli-KCUs, where 1 KCU will roughly be equivalent to a single CPU hyperthreaded core for some recent x86 processor. The memory resource is defined in bytes. For a full overview of Resource management within Kubernetes, refer to the official Kubernetes resource guidelines.
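As a side note, you can inspect the capacity the scheduler works against on a node (the exact output varies by Kubernetes version, and the node name is its public IP because of the --hostname-override flag we used):
- kubectl describe node $WORKER_PUBLIC_IP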
The Kube2sky container uses another Busybox-based image made available on the Google Container Registry; refer to the kubernetes/cluster/addons/dns/kube2sky repository on GitHub to see the source for that image.
...
- name: kube2sky
image: gcr.io/google_containers/kube2sky:1.11
resources:
limits:
cpu: 100m
memory: 50Mi
args:
- -domain=cluster.local
...
The Kube2sky Docker image has the Entrypoint
set to /kube2sky
, thus we only need to pass on the -domain
under which we want all DNS names to be hosted through the args
array. This should match our kubelet configuration, which we set to cluster.local in this tutorial; modify this value to mirror your own configuration.
Kube2sky discovers and authenticates with the Kubernetes API Service through environment variables and secrets mounted into the container by the kubelet; we will have a closer look at these in [Step 12 — Running Kubernetes Cluster Services](#). Once authenticated and connected, Kube2sky watches the Kubernetes API Service for changes in Service objects and publishes those changes to Etcd for SkyDNS. SkyDNS supports A and AAAA records to handle "legacy" services. With A/AAAA records, the port number must be known by the client because that information is not in the returned records. Given we defined our cluster domain as cluster.local, the keys created by Kube2sky and served by SkyDNS will have the following DNS naming scheme:
<service_name>.<namespace_name>.svc.cluster.local
For example: for a Service called "my-service" in the "default" namespace, an A record for my-service.default.svc.cluster.local
is created. Other Pods within the same default namespace should be able to find the service simply by doing a name lookup for my-service; Pods which exist in other namespaces must use the fully qualified name.
For Service objects which define named ports, Kube2sky ensures SRV records are created with the following naming scheme:
_<port_name>._<port_protocol>.<service_name>.<namespace_name>.svc.cluster.local
For example, if the Service called "my-service" in the "default" namespace has a port named "http" with a protocol of TCP, you can do a DNS SRV query for "_http._tcp.my-service.default.svc.cluster.local" to discover the port number for "http".
We will confirm the above DNS records are served correctly after we have deployed the DNS add-on to our cluster.
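As a preview (this will only work once both the DNS add-on and the busybox test Pod created at the end of this step are running), such a record can be resolved from inside the cluster with a command like:
- kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local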
The skynetservices/skydns image based on Alpine Linux is available on the Docker Hub at about 19MB and comes with dig
. The official kubernetes/cluster/addons/dns/skydns add-on uses a busybox based image at about 41MB without dig
. The discussion as to which image should be used in the long run can be followed on GitHub. In this tutorial we opt to use the skynetservices/skydns
image as the version tags are slightly more intuitive:
...
- name: skydns
image: skynetservices/skydns:2.5.3a
resources:
limits:
cpu: 100m
memory: 50Mi
args:
- -machines=http://localhost:4001
- -addr=0.0.0.0:53
- -domain=cluster.local.
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
...
The EntryPoint for both images is /skydns/
, thus we only need to pass in 3 arguments. We point SkyDNS to the Etcd instance running within the Pod through the -machines
flag. We define the address we want SkyDNS to bind to through the -addr
flag and we specify the domain we want SkyDNS to serve records within through the -domain
flag. We also expose the port SkyDNS is bound to on the Pod through named ports for both TCP and UDP protocols.
To monitor the health of the container with liveness probes, we run a health server as a sidecar container using the ExecHealthz utility. By running a sidecar container, we do not make these liveness probes dependent on the container runtime to execute commands directly in the SkyDNS container (which would also require those binaries to be available within the container image). Instead, our sidecar container provides the /healthz HTTP endpoint. This usage of a sidecar container illustrates very well the concept of creating single-purpose, re-usable components and the power of Pods to bundle them. This is one of the fundamental features of Kubernetes Pods, and you may reuse these Kubernetes components for your own application setup.
The ExecHealthz image available on the Google Container Registry uses Busybox as a base image. We use the nslookup
utility bundled with Busybox for liveness probes as dig
is not available in this image.
Add the ExecHealthz container with the following container spec:
...
- name: healthz
image: gcr.io/google_containers/exechealthz:1.0
resources:
limits:
cpu: 10m
memory: 20Mi
args:
- -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
- -port=8080
ports:
- containerPort: 8080
protocol: TCP
...
Our health check does a simple probe for the Kubernetes API service which, as discussed above, SkyDNS should serve under the kubernetes.default.svc.cluster.local
DNS record.
We can now add the liveness and readiness probes via this sidecar health server to report on the health status of our SkyDNS container:
- name: skydns
image: skynetservices/skydns:2.5.3a
resources:
limits:
cpu: 100m
memory: 50Mi
args:
- -machines=http://localhost:4001
- -addr=0.0.0.0:53
- -domain=cluster.local.
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
readinessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 1
timeoutSeconds: 5
The full manifest of the Replication Controller for the kube-dns-v9 add-on is listed next for your reference; we will look at the manifest for the Service right after.
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v9
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v9
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v9
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v9
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd:2.0.9
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.11
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        - -domain=cluster.local
      - name: skydns
        image: skynetservices/skydns:2.5.3a
        resources:
          limits:
            cpu: 100m
            memory: 50Mi
        args:
        - -machines=http://localhost:4001
        - -addr=0.0.0.0:53
        - -domain=cluster.local.
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 1
          timeoutSeconds: 5
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local localhost >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default
The kube-dns Service will expose the DNS Pod internally to the cluster on the fixed IP we assigned for our DNS server. This clusterIP
has to match the value we passed to all our kubelets previously, which is 10.3.0.10
in this tutorial. Modify this value to mirror your own configuration. The full Service definition is listed below:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.3.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
In our Service object metadata we attach the same k8s-app
label as our Pods and Replication Controller as well as the necessary labels for Kubernetes add-on services. In our ServiceSpec our selector, used to route traffic to Pods with matching labels, only specifies the k8s-app
label. This does not specify the version, allowing us to do rolling updates of our DNS add-on in the future; see the Rolling Update Example for more details. Finally, we also define named ports for the DNS service on both TCP and UDP protocols. We will later confirm that SRV records exist for these named ports of the kube-dns service in the kube-system namespace.
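Since this clusterIP must match the --cluster-dns flag passed to each kubelet, you may want to double-check one of your workers before moving on. A minimal sketch, assuming the CoreOS default core user and that the kubelet flags are visible in the process list as configured in the earlier worker setup (substitute a worker's public IP):
- ssh core@$WORKER_PUBLIC_IP 'ps -ef | grep kube[l]et | tr " " "\n" | grep -E "cluster-(dns|domain)"'
You should see --cluster-dns=10.3.0.10 and --cluster-domain=cluster.local (or whichever values you chose) in the output.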
Note: Multiple yaml documents can be concatenated with the ---
separator. We may simplify the management of multiple resources by grouping them together in the same file separated by ---, or we may simply pass multiple -f
arguments to the kubectl create
command. The official Managing Deployments Guide notes:
The resources will be created in the order they appear in the file. Therefore, it's best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the replication controller(s).
For this tutorial, use kubectl with multiple -f
arguments:
- kubectl create -f ./skydns-svc.yaml -f ./skydns-rc.yaml
Then wait for the DNS add-on to start running:
- kubectl get pods --namespace=kube-system | grep kube-dns-v9
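Once the Pod reports Running, you can optionally query the sidecar's /healthz endpoint yourself to see the same check the kubelet's probes perform. A minimal sketch, assuming the Busybox wget applet is available in the exechealthz image (the image is Busybox-based, as noted above):
- dns_pod=$(kubectl --namespace=kube-system get po | grep kube-dns | awk '{print $1}')
- kubectl --namespace=kube-system exec $dns_pod -c healthz -- wget -qO- http://localhost:8080/healthz
A successful HTTP response here means the nslookup check configured through the -cmd argument is passing.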
Create a Busybox Pod to test DNS resolution from within the cluster using the following Pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
This busybox container will sleep for one hour before exiting and being restarted by the kubelet; we will use it to run nslookup
commands from within the cluster. Create the Pod:
- kubectl create -f busybox.yaml
Note that we are creating a standalone Pod here: no Replication Controller manages it, but the restartPolicy: Always setting ensures the kubelet restarts its container whenever it exits. After a few seconds, confirm the Pod is running:
- kubectl get pods busybox
When the Pod is running, output will look as follows:
OutputNAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 14s
From your client's terminal, run a DNS lookup inside the busybox Pod with the following command:
kubectl exec busybox -- nslookup kubernetes.default
The expected output should look as follows:
OutputServer: 10.3.0.10
Address 1: 10.3.0.10
Name: kubernetes.default
Address 1: 10.3.0.1
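You can run a few more lookups from the same Pod to convince yourself the add-on behaves as expected; a quick sketch (the first name resolves via the svc.cluster.local search path in the Pod's resolv.conf, and external names should resolve as well, provided SkyDNS can reach an upstream nameserver):
- kubectl exec busybox -- nslookup kube-dns.kube-system
- kubectl exec busybox -- nslookup www.digitalocean.com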
If you are using the skynetservices/skydns:2.5.3a
image, you may use the dig
binary within to confirm the SRV records for the named ports are served as expected (the nslookup utility bundled in the busybox Pod does not support SRV queries).
To do this, get the name of the kube-dns
Pod created by the kube-dns
Replication Controller (Pod names are dynamic and change when they are restarted):
dns_pod=`kubectl --namespace=kube-system get po | grep kube-dns | awk '{ print $1}'`
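Instead of grep, you may also rely on the k8s-app=kube-dns label we attached to the Pod template; a roughly equivalent sketch using a label selector (tail simply skips the header row of the output):
- dns_pod=$(kubectl --namespace=kube-system get po -l k8s-app=kube-dns | tail -n +2 | awk '{print $1}')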
Open an interactive shell into the SkyDNS container of your kube-dns pod:
- kubectl --namespace=kube-system exec $dns_pod -c skydns -it sh
We specify the container we want to execute commands in through the -c
option of the kubectl exec
command.
Use dig
to query the SRV record for the port named dns using the UDP protocol and sed
to only print the response from the ANSWER SECTION
to the Query Time
lines:
- dig @localhost SRV _dns._udp.kube-dns.kube-system.svc.cluster.local | sed -n '/ANSWER SECTION:/,/Query time/ p'
We are using sed with the -n option to suppress automatic printing of lines; we specify a range of regular expression patterns (/ANSWER SECTION:/,/Query time/) and instruct sed to print only the lines within this range with the p command.
The expected output should look as follows:
Output;; ANSWER SECTION:
_dns._udp.kube-dns.kube-system.svc.cluster.local. 30 IN SRV 10 100 53 kube-dns.kube-system.svc.cluster.local.
;; ADDITIONAL SECTION:
kube-dns.kube-system.svc.cluster.local. 30 IN A 10.3.0.10
;; Query time: 3 msec
As you can see, using the SRV records created by the kube-dns add-on, we are able to get the port as well as the IP.
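If you prefer to let dig trim its own output instead of piping through sed, the standard +noall, +answer, and +additional options give roughly the same result. A small sketch, run from the same SkyDNS container shell; the second query assumes the SRV record for the TCP named port follows the usual _port-name._protocol convention:
- dig @localhost SRV _dns._udp.kube-dns.kube-system.svc.cluster.local +noall +answer +additional
- dig @localhost SRV _dns-tcp._tcp.kube-dns.kube-system.svc.cluster.local +noall +answer +additional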
Refer also to the Official DNS Integration documentation and the DNS Add-on repository.
Step 12 — Deploying Kubernetes-ready applications
You should now have a Kubernetes cluster set up and be able to deploy Kubernetes-ready applications.
To better understand the inner workings of Kubernetes from a Pod's perspective, we may use the Kubernetes Pod Inspector application by Kelsey Hightower. With the yaml file below, which combines both the Service and the Replication Controller, you can quickly deploy and expose this application on your cluster:
apiVersion: v1
kind: Service
metadata:
  name: inspector
  labels:
    app: inspector
spec:
  type: NodePort
  selector:
    app: inspector
  ports:
  - name: http
    nodePort: 31000
    port: 80
    protocol: TCP

---

apiVersion: v1
kind: ReplicationController
metadata:
  name: inspector-stable
  labels:
    app: inspector
    track: stable
spec:
  replicas: 1
  selector:
    app: inspector
    track: stable
  template:
    metadata:
      labels:
        app: inspector
        track: stable
    spec:
      containers:
      - name: inspector
        image: b.gcr.io/kuar/inspector:1.0.0
As seen previously, we provide our Replication Controller with the necessary labels and use the b.gcr.io/kuar/inspector:1.0.0
image. Note that we are exposing the inspector application by telling Kubernetes to open port 31000
on every worker node (this will work if you ran the API server with --service-node-port-range=30000-37000
as shown in Step 6 of this tutorial).
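The manifests above still need to be submitted to the cluster. Assuming you saved the combined yaml above as inspector.yaml (the filename is only an example), create both resources with:
- kubectl create -f ./inspector.yaml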
Expected Output:
OutputYou have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:31000) to serve traffic.
See http://releases.k8s.io/release-1.1/docs/user-guide/services-firewalls.md for more details.
service "inspector" created
replicationcontroller "inspector-stable" created
We can now point our web browser to http://$WORKER_PUBLIC_IP:31000/env
on any worker node to reach the Inspector Pod and view all environment variables published by the kubelet. We can also visit http://$WORKER_PUBLIC_IP:31000/mnt?path=/var/run/secrets/kubernetes.io/serviceaccount
to see the secrets mounted into the Pod. To see how Kubernetes-ready applications can use these, refer to the InClusterConfig function of the Kubernetes client helper library and the KubeClient Setup section of Kube2Sky as an example implementation.
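If you prefer to stay in the terminal, the same endpoints can be fetched with curl; substitute the public IP of any of your workers for $WORKER_PUBLIC_IP (the quotes around the second URL keep the shell from interpreting the query string):
- curl http://$WORKER_PUBLIC_IP:31000/env
- curl "http://$WORKER_PUBLIC_IP:31000/mnt?path=/var/run/secrets/kubernetes.io/serviceaccount"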
You may now proceed to set up a multi-tier web application by following the Guestbook Example from the official Kubernetes documentation to visualize how the various Kubernetes components fit together.
Conclusion
Following this tutorial, you have created a fully functional Kubernetes cluster. This gives you a great management and scheduling interface for working with services in logical groupings. Because you used many Kubernetes concepts to set up the cluster itself, you now have a solid understanding of the core concepts and deployment workflow of Kubernetes. To review all the Kubernetes concepts, refer to the official Kubernetes Concept Guide.
You probably noticed that the steps above were still very manual, but the cloud-config files you created are flexible enough to let you automate the process.
Deleting your Kubernetes Cluster
If you decide you no longer want to run this cluster (or want to start from scratch), the commands to tear it down are below:
Note: These commands destroy your cluster and all the data it contains, without any backups; they are irreversible.
Repeat for every controller Droplet:
- doctl d d kube-controller-01.$region
Repeat for every worker Droplet:
- doctl d d kube-worker-01.$region
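If you created several workers, a small shell loop saves some typing; a sketch assuming workers numbered kube-worker-01 through kube-worker-03 (adjust the list to match the Droplets you actually provisioned):
- for n in 01 02 03; do doctl d d kube-worker-$n.$region; done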
Delete the Etcd Droplet(s) holding all Kubernetes data (repeat for every Etcd member):
- doctl d d etcd-01.$region
Delete the apiserver and worker certificates, as they are tied to the IPs of the Droplets, but keep the admin and CA certificates:
- rm ~/.kube/apiserver*.{pem,csr}
- rm ~/.kube/worker*.{pem,csr}
- rm *.srl
- rm *openssl.cnf