Recommended Cluster Architecture
There are three roles that can be assigned to nodes: etcd, controlplane and worker.
Separating Worker Nodes from Nodes with Other Roles
When designing your cluster(s), you have two options:
- Use dedicated nodes for each role. This ensures resource availability for the components needed for the specified role. It also strictly isolates network traffic between each of the roles according to the port requirements.
- Assign the etcd and controlplane roles to the same nodes. These nodes must meet the hardware requirements for both roles.
In either case, the worker role should not be used or added to nodes with the etcd or controlplane role.
Therefore, each node should have one of the following role configurations:
- etcd
- controlplane
- Both etcd and controlplane
- worker
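As an illustration, the four allowed configurations can be expressed as a small check. This is only a sketch: the function name `is_recommended_role_set` and the idea of passing a node's roles as a list of strings are assumptions made for the example, not part of any Rancher tooling.

```python
# Role combinations that the list above allows on a single node.
ALLOWED_ROLE_SETS = {
    frozenset({"etcd"}),
    frozenset({"controlplane"}),
    frozenset({"etcd", "controlplane"}),
    frozenset({"worker"}),
}


def is_recommended_role_set(roles):
    """Return True if the roles on one node match a recommended configuration."""
    return frozenset(roles) in ALLOWED_ROLE_SETS


print(is_recommended_role_set(["etcd", "controlplane"]))    # True
print(is_recommended_role_set(["controlplane", "worker"]))  # False
```

The second call returns False because the worker role is combined with a controlplane role on the same node.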
Recommended Number of Nodes with Each Role
The cluster should have:
- At least three nodes with the role etcd to survive losing one node. Increase this count for higher node fault tolerance, and spread the nodes across availability zones for even better fault tolerance.
- At least two nodes with the role controlplane for master component high availability.
- At least two nodes with the role worker for workload rescheduling upon node failure.
For more information on what each role is used for, refer to the section on roles for nodes in Kubernetes.
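The minimum counts above could be checked against a planned node list roughly as follows. This is a sketch under assumptions: the `nodes` mapping (hostname to list of roles), the hostnames, and the function name `role_shortfalls` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Minimum node counts per role, taken from the recommendations above.
MINIMUM_NODES = {"etcd": 3, "controlplane": 2, "worker": 2}


def role_shortfalls(nodes):
    """Return {role: missing_count} for every role below its recommended minimum."""
    counts = Counter(role for roles in nodes.values() for role in roles)
    return {
        role: minimum - counts[role]
        for role, minimum in MINIMUM_NODES.items()
        if counts[role] < minimum
    }


# Hypothetical plan: three etcd nodes, two controlplane nodes, one worker.
planned = {
    "node1": ["etcd", "controlplane"],
    "node2": ["etcd", "controlplane"],
    "node3": ["etcd"],
    "node4": ["worker"],
}
print(role_shortfalls(planned))  # {'worker': 1}
```

An empty result means the plan meets all three minimums.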
Number of Controlplane Nodes
Adding more than one node with the controlplane role makes every master component highly available.
Number of etcd Nodes
The number of nodes that you can lose at once while maintaining cluster availability is determined by the number of nodes assigned the etcd role. For a cluster with n members, the majority (quorum) that must remain available is (n/2)+1, with n/2 rounded down, so the cluster tolerates losing n minus that majority. Therefore, we recommend creating one etcd node in each of three different availability zones within a region to survive the loss of one availability zone. If you use only two zones, you can only survive the loss of the zone that does not hold the majority of etcd nodes.
| Nodes with etcd role | Majority | Failure Tolerance |
|---|---|---|
| 1 | 1 | 0 | 
| 2 | 2 | 0 | 
| 3 | 2 | 1 | 
| 4 | 3 | 1 | 
| 5 | 3 | 2 | 
| 6 | 4 | 2 | 
| 7 | 4 | 3 | 
| 8 | 5 | 3 | 
| 9 | 5 | 4 | 
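The table values follow directly from the majority rule, and the same arithmetic shows why two zones are not enough. The sketch below is illustrative only: the function names and the zone labels are assumptions, and `members_per_zone` is a hypothetical mapping from zone name to the number of etcd nodes placed there.

```python
def majority(n):
    """Smallest number of etcd members that forms a majority of n."""
    return n // 2 + 1


def failure_tolerance(n):
    """Number of etcd members that can be lost while a majority remains."""
    return n - majority(n)


def survives_any_zone_loss(members_per_zone):
    """Return True if a majority remains after losing any single zone."""
    total = sum(members_per_zone.values())
    return all(
        total - lost >= majority(total)
        for lost in members_per_zone.values()
    )


# Reproduce the rows of the table above.
for n in range(1, 10):
    print(n, majority(n), failure_tolerance(n))

print(survives_any_zone_loss({"zone-a": 1, "zone-b": 1, "zone-c": 1}))  # True
print(survives_any_zone_loss({"zone-a": 2, "zone-b": 1}))               # False
```

In the two-zone case, losing the zone with two members leaves one of three, which is below the majority of two. The loop also shows that an even member count tolerates no more failures than the odd count just below it.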
References:
- Official etcd documentation on optimal etcd cluster size
- Official Kubernetes documentation on operating etcd clusters for Kubernetes
Number of Worker Nodes
Adding more than one node with the worker role will make sure your workloads can be rescheduled if a node fails.
Why Production Requirements are Different for the Rancher Cluster and the Clusters Running Your Applications
You may have noticed that our Kubernetes Install instructions do not meet our definition of a production-ready cluster, as there are no dedicated nodes for the worker role. However, for your Rancher installation, this three-node cluster is valid, because:
- It allows one etcd node failure.
- It maintains multiple instances of the master components by having multiple controlplane nodes.
- No workloads other than Rancher itself should be created on this cluster.