Kubernetes: All You Need to Know!!!
Kubernetes keeps gaining momentum and has become the de facto orchestrator for clustered environments. There is a lot of traction around using Kubernetes to manage both on-premises clusters and public clouds. This blog walks through the newer Kubernetes jargon.
- Kubernetes offers three basic Service types for accessing pods: ClusterIP, NodePort and LoadBalancer.
- ClusterIP is used for internal pod-to-pod communication. NodePort exposes the service on a port of each node's IP for external connectivity, whereas LoadBalancer exposes an external IP and balances traffic between the pods behind it. LoadBalancer is the service type most widely used to reach applications running in pods.
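To make the three service types concrete, here is a minimal sketch that builds one Service manifest per type as a plain Python dict and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The service names, the "app: web" label and the port numbers are illustrative assumptions, not values from any real cluster.

```python
import json

SERVICE_TYPES = ("ClusterIP", "NodePort", "LoadBalancer")


def make_service(name, app_label, service_type, port, target_port):
    """Build a minimal Service manifest as a plain dict."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"unsupported service type: {service_type}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            # Traffic is forwarded to pods carrying this label.
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


# One manifest per type, all selecting the same (hypothetical) "web" pods.
services = [
    make_service(f"web-{t.lower()}", "web", t, 80, 8080) for t in SERVICE_TYPES
]

for svc in services:
    print(json.dumps(svc, indent=2))
```

Only the "type" field differs between the three manifests; how the service becomes reachable from outside is decided by Kubernetes and, for LoadBalancer, by the cloud provider.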
Problems faced in existing implementations:
- A new load balancer has to be provisioned for every service.
- If we run Kubernetes on bare metal with an external load balancer, there is no built-in way to integrate that load balancer with Kubernetes.
- Functions such as SSL termination cannot practically be run in every pod, so exposing HTTPS services directly from pods is not feasible.
Ingress provides a solution to all the problems listed above.
- Ingress acts as a reverse proxy for the services exposed by pods.
- It is implemented by a special type of controller, the Ingress controller, deployed within the cluster.
The Ingress controller is a daemon deployed as a Kubernetes pod, which watches the APIserver's ingress endpoint for updates to Ingress resources. An Ingress resource is an object that defines rules for handling traffic coming into the Kubernetes cluster. Common uses for Ingress include sharing port 80 or 443 among many different services, doing HTTP host-based or path-based routing to different services, and terminating SSL.
The Ingress controller applies those rules to the set of pods that actually handle the traffic. An Ingress can be implemented by different Ingress controllers, the most popular of which is the NGINX Ingress controller. Others, such as HAProxy and Traefik, are also popular.
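The host-based and path-based routing and SSL termination described above can be sketched as an Ingress manifest built in Python. The hostname, TLS secret name and backend service names below are hypothetical, and resolve_backend is only a simplified illustration of longest-prefix matching, not the code of any real Ingress controller.

```python
import json


def make_ingress(name, host, tls_secret, routes):
    """Build an Ingress manifest; routes maps a path prefix to (service, port)."""
    paths = [
        {
            "path": prefix,
            "pathType": "Prefix",
            "backend": {"service": {"name": svc, "port": {"number": port}}},
        }
        for prefix, (svc, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # SSL is terminated with the certificate stored in this Secret.
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [{"host": host, "http": {"paths": paths}}],
        },
    }


def resolve_backend(ingress, host, path):
    """Return the service name the longest matching prefix points to, or None."""
    best = None
    for rule in ingress["spec"]["rules"]:
        if rule["host"] != host:
            continue
        for entry in rule["http"]["paths"]:
            prefix = entry["path"]
            # Match whole path segments: "/api" matches "/api/x" but not "/apiary".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best["path"]):
                    best = entry
    return best["backend"]["service"]["name"] if best else None


# Hypothetical host, secret and services, used only for illustration.
ingress = make_ingress(
    "shop",
    "shop.example.com",
    "shop-tls",
    {"/": ("frontend", 80), "/api": ("api", 8080)},
)
print(json.dumps(ingress, indent=2))
print(resolve_backend(ingress, "shop.example.com", "/api/orders"))
```

Both services share one host and one TLS certificate, which is the point of Ingress: a single entry point instead of one load balancer per service.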
The most basic type of load balancing in Kubernetes is actually load distribution, which is easy to implement at the dispatch level. Kubernetes uses two methods of load distribution, both of which rely on a component called kube-proxy, which manages the virtual IPs used by services.
1) The default mode for kube-proxy is iptables, which allows fairly sophisticated rule-based IP management. The native method of load distribution in iptables mode is random selection: an incoming request goes to a randomly chosen pod within a service.
2) The older (and former default) kube-proxy mode is userspace, which uses round-robin load distribution: it allocates the next available pod on an IP list and then rotates the list.
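The difference between the two strategies can be illustrated with a small simulation. This is not kube-proxy code, only a sketch of random selection versus round-robin over a hypothetical list of pod IPs.

```python
import random
from collections import Counter
from itertools import cycle

# Hypothetical pod IPs behind a single service.
PODS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def random_picker(endpoints, seed=None):
    """Random selection, as in iptables mode: each request picks any pod."""
    rng = random.Random(seed)
    return lambda: rng.choice(endpoints)


def round_robin_picker(endpoints):
    """Round-robin, as in userspace mode: walk the list and start over."""
    rotation = cycle(endpoints)
    return lambda: next(rotation)


pick_random = random_picker(PODS, seed=42)
pick_rr = round_robin_picker(PODS)

random_counts = Counter(pick_random() for _ in range(300))
rr_counts = Counter(pick_rr() for _ in range(300))

print("random:     ", dict(random_counts))
print("round-robin:", dict(rr_counts))
```

Round-robin spreads 300 requests exactly evenly over three pods, while random selection is only even on average. Neither strategy looks at how busy a pod actually is, which is why this is load distribution rather than true load balancing.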
For true load balancing, the most flexible method is Ingress, which operates by means of a controller in a specialized Kubernetes pod.
- The controller works from an Ingress resource (a set of rules governing traffic) and runs a daemon which applies those rules.
- The controller has its own built-in features for load balancing.
- You can also include more complex load-balancing rules in an Ingress resource, which lets you take controller-specific load-balancing features into account.
As an alternative to Ingress, you can also use a service of the LoadBalancer type, which uses a cloud-based, external load balancer. LoadBalancer can only be used with specific cloud service providers, such as AWS, Azure, OpenStack, CloudStack, and Google Compute Engine, and the capabilities of the balancer are provider-dependent.
Containers are designed to be stateless. Running stateful applications such as databases in containers is challenging because pods lack stable naming and stable persistent storage. However, pods can be made stateful through volumes.
Current problems with stateful pods:
- Files in containers are ephemeral
- Container termination or crashes result in loss of data
- Stateful applications can't be run
- Files can't be shared between containers
Solution: the Kubernetes volumes subsystem, where we define a pod with a storage path configured in it. Storage can be remote, ephemeral, local, or (in future) provided through the Container Storage Interface.
(If we create a pod with a volume and attach a GCE persistent disk as its data path, all the data is stored in that location. Even if we delete the pod, the volume is detached first and then the pod is deleted. If we use the same data path again, the data from the previous pod instance is available again.)
Here we specify the volume storage path directly while creating the pod. This is not recommended, because the data path is hardcoded in the pod definition and could be vulnerable.
The life cycle of a volume can instead be made independent of the pod. This is achieved with persistent volumes and persistent volume claims.
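As a rough sketch of what such a hardcoded volume looks like, the snippet below builds a Pod manifest whose volume points straight at a GCE persistent disk. The pod name, image, disk name and mount path are assumptions made up for the example.

```python
import json


def pod_with_inline_volume(pod_name, image, disk_name, mount_path):
    """Build a Pod manifest that references a GCE persistent disk directly."""
    volume_name = f"{pod_name}-data"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": pod_name,
                    "image": image,
                    # Where the volume appears inside the container.
                    "volumeMounts": [
                        {"name": volume_name, "mountPath": mount_path}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": volume_name,
                    # The disk name is hardcoded in the pod definition itself.
                    "gcePersistentDisk": {"pdName": disk_name, "fsType": "ext4"},
                }
            ],
        },
    }


# Hypothetical names, for illustration only.
pod = pod_with_inline_volume("mysql", "mysql:5.7", "my-data-disk", "/var/lib/mysql")
print(json.dumps(pod, indent=2))
```

Anyone who writes this pod definition has to know the exact disk name, which is the coupling between pod and storage that the text warns about.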
- Persistent volumes and claims are API objects in Kubernetes.
- Persistent volumes are not namespaced and are managed by the cluster administrator.
For example, an administrator can define PVs of 100GB each and call the cloud provider's APIs to create the underlying volumes. Once these volumes are created, they are available for anybody to claim.
A Persistent Volume Claim (PVC) is a request for storage from a Persistent Volume. Kubernetes looks for a matching Persistent Volume and binds it to the PVC depending on the request. Now we can create a pod which, instead of referencing a volume directly, references the PVC.
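The PV, PVC and pod relationship can be sketched as three manifests plus a toy matching check. The names and sizes are hypothetical, and can_bind is a deliberately simplified assumption: the real binder also considers storage classes, selectors and volume modes.

```python
def parse_gi(quantity):
    """Parse a quantity such as '100Gi' into whole GiB (only 'Gi' is handled)."""
    if not quantity.endswith("Gi"):
        raise ValueError(f"only Gi quantities are supported here: {quantity}")
    return int(quantity[:-2])


def make_pv(name, size, access_modes, disk_name):
    """Cluster-scoped PersistentVolume: note there is no namespace."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": list(access_modes),
            "gcePersistentDisk": {"pdName": disk_name, "fsType": "ext4"},
        },
    }


def make_pvc(name, namespace, size, access_modes):
    """Namespaced PersistentVolumeClaim: a request for storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": list(access_modes),
            "resources": {"requests": {"storage": size}},
        },
    }


def can_bind(pv, pvc):
    """Simplified check: the PV must offer the modes and capacity requested."""
    modes_ok = set(pvc["spec"]["accessModes"]) <= set(pv["spec"]["accessModes"])
    offered = parse_gi(pv["spec"]["capacity"]["storage"])
    wanted = parse_gi(pvc["spec"]["resources"]["requests"]["storage"])
    return modes_ok and offered >= wanted


def pod_using_claim(pod_name, image, claim_name, mount_path):
    """Pod that references the claim instead of a concrete disk."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": pod_name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


# Hypothetical objects, for illustration only.
pv = make_pv("pv-100g", "100Gi", ["ReadWriteOnce"], "my-data-disk")
pvc = make_pvc("mysql-claim", "default", "20Gi", ["ReadWriteOnce"])
pod = pod_using_claim("mysql", "mysql:5.7", "mysql-claim", "/var/lib/mysql")
print(can_bind(pv, pvc))
```

The pod now only knows the claim name; which disk backs it is decided by the administrator and by the binding step.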
Introduction to StatefulSets:
StatefulSets bring the concept of ReplicaSets to stateful pods. (High availability [HA] of applications is a core concept of Kubernetes and is fundamentally achieved through ReplicaSets. We define the number of replicas we need in a ReplicaSet, and Kubernetes makes sure that many pods are running.)
StatefulSets enable running pods in “Clustered Mode”. They are ideal for deploying highly available database workloads. In general, they are valuable for applications that need:
- Stable and unique network identifiers.
- Stable and persistent Storage
- Ordered, graceful deployment and scaling
- Ordered, graceful deletion and termination
- StatefulSets depend on a headless Service for pod-to-pod communication. (A headless Service has no single cluster IP in front of the pods; it acts as a bridge between the clustered pods that are part of the StatefulSet.)
- Each pod gets a DNS name accessible to the other pods in the set.
- StatefulSets leverage Persistent Volumes and Persistent Volume Claims.
- Each pod name is suffixed with a predictable, consistent ordinal index, e.g. mysql-0, mysql-1.
- Pods are created sequentially, which is ideal for setting up a master/slave configuration.
- The identity of a pod is consistent regardless of the node it is scheduled on.
- Pods are terminated in LIFO order, i.e. in reverse order of creation.
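The stable naming described above follows a fixed pattern, which the sketch below reproduces: pod names are the StatefulSet name plus an ordinal, and each pod gets a DNS name under the headless Service. The StatefulSet name "mysql", the service name and the default cluster domain "cluster.local" are assumptions for the example.

```python
def pod_identities(statefulset, service, namespace, replicas,
                   cluster_domain="cluster.local"):
    """Return (pod name, DNS name) pairs in creation order."""
    identities = []
    for ordinal in range(replicas):
        pod_name = f"{statefulset}-{ordinal}"
        dns_name = f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod_name, dns_name))
    return identities


# Hypothetical three-replica MySQL StatefulSet behind a headless service "mysql".
identities = pod_identities("mysql", "mysql", "default", 3)

creation_order = [name for name, _ in identities]
termination_order = list(reversed(creation_order))  # LIFO

for name, dns in identities:
    print(name, "->", dns)
print("created:   ", creation_order)
print("terminated:", termination_order)
```

Because mysql-0 always comes up first and keeps its name, it is a natural candidate for the master, with the higher ordinals joining it as slaves.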
Proxies play a major role in Kubernetes. There are several different proxies you may encounter when using Kubernetes; typically only the first two types are used directly:
1) The kubectl proxy:
- runs on a user’s desktop or in a pod
- proxies from a localhost address to the Kubernetes APIserver
- client to proxy uses HTTP
- proxy to APIserver uses HTTPS
- locates APIserver
- adds authentication headers
2) The APIserver proxy:
- is a bastion built into the APIserver
- connects a user outside of the cluster to cluster IPs which otherwise might not be reachable
- runs in the APIserver process
- client to proxy uses HTTPS (or http if APIserver so configured)
- proxy to target may use HTTP or HTTPS as chosen by proxy using available information
- can be used to reach a Node, Pod, or Service
- does load balancing when used to reach a Service
3) The kube-proxy:
- runs on each node
- proxies UDP and TCP
- does not understand HTTP
- provides load balancing
- is just used to reach services
4) A proxy/load balancer in front of APIserver(s):
- existence and implementation vary from cluster to cluster (e.g. nginx)
- sits between all clients and one or more APIservers
- acts as a load balancer if there are several APIservers
5) Cloud load balancers on external services:
- are provided by some cloud providers (e.g. AWS ELB, Google Cloud Load Balancer)
- are created automatically when the Kubernetes service has type LoadBalancer
- use UDP/TCP only
- implementation varies from one cloud provider to another
Kubernetes users will typically not need to worry about anything other than the first two types. The cluster admin will typically ensure that the latter types are set up correctly.
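The first two proxy types are usually combined: kubectl proxy listens on localhost, and the APIserver proxy path behind it reaches a Service or Pod. The sketch below only builds the URLs; it assumes kubectl proxy is running on its default port 8001, and the namespace, service and pod names are hypothetical.

```python
from urllib.parse import quote

# Default address of a locally running "kubectl proxy" (an assumption here).
KUBECTL_PROXY = "http://localhost:8001"


def pods_url(namespace, base=KUBECTL_PROXY):
    """URL that lists the pods of a namespace through kubectl proxy."""
    return f"{base}/api/v1/namespaces/{quote(namespace)}/pods"


def service_proxy_url(namespace, service, port, path="", base=KUBECTL_PROXY):
    """URL that reaches a Service through the APIserver proxy."""
    return (
        f"{base}/api/v1/namespaces/{quote(namespace)}"
        f"/services/{quote(service)}:{port}/proxy/{path.lstrip('/')}"
    )


def pod_proxy_url(namespace, pod, port, path="", base=KUBECTL_PROXY):
    """URL that reaches a single Pod through the APIserver proxy."""
    return (
        f"{base}/api/v1/namespaces/{quote(namespace)}"
        f"/pods/{quote(pod)}:{port}/proxy/{path.lstrip('/')}"
    )


print(pods_url("default"))
print(service_proxy_url("default", "web", 80, "/healthz"))
print(pod_proxy_url("default", "mysql-0", 3306))
```

The client talks plain HTTP to localhost; kubectl proxy adds the authentication headers and forwards the request to the APIserver over HTTPS, as listed above.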
Kubernetes NetworkPolicy resources let you configure network access policies for the Pods.
- NetworkPolicy resources use labels to select pods and define rules which specify what traffic is allowed to the selected pods.
- By default, pods are non-isolated, meaning they accept traffic from any source. Pods become isolated through network policies: once any NetworkPolicy in a namespace selects a particular pod, that pod rejects any connection that is not allowed by some NetworkPolicy.
- If no policies exist in a namespace, then all ingress and egress traffic is allowed to and from pods in that namespace.
- You can create a policy that explicitly allows all ingress traffic in that namespace.
- You can create a “default” egress isolation policy for a namespace by creating a NetworkPolicy that selects all pods but does not allow any egress traffic from those pods.
- You can create a policy that explicitly allows all egress traffic in that namespace.
- You can create a “default” policy for a namespace which prevents all ingress AND egress traffic by creating a NetworkPolicy in that namespace that selects all pods and allows neither ingress nor egress traffic.
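The default policies listed above differ only in their policy types and in whether they carry an allow-everything rule. A minimal sketch that builds them as Python dicts follows; the policy names are arbitrary choices for the example.

```python
import json


def make_policy(name, policy_types, allow_all=False):
    """Build a namespace-wide NetworkPolicy.

    An empty podSelector selects every pod in the namespace. With no rules
    for a listed policy type, all traffic of that type is denied; a single
    empty rule ({}) allows all traffic of that type.
    """
    spec = {"podSelector": {}, "policyTypes": list(policy_types)}
    if allow_all:
        for policy_type in policy_types:
            spec[policy_type.lower()] = [{}]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": spec,
    }


allow_all_ingress = make_policy("allow-all-ingress", ["Ingress"], allow_all=True)
default_deny_egress = make_policy("default-deny-egress", ["Egress"])
allow_all_egress = make_policy("allow-all-egress", ["Egress"], allow_all=True)
default_deny_all = make_policy("default-deny-all", ["Ingress", "Egress"])

print(json.dumps(default_deny_all, indent=2))
```

Note that a NetworkPolicy only takes effect if the cluster's network plugin enforces it.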
A DaemonSet ensures that all (or some) nodes run a copy of a pod. When nodes are added to the cluster, pods are added to them. When nodes are removed, those pods are garbage collected. Deleting a DaemonSet cleans up the pods it created. Typical uses of a DaemonSet include:
- Running a cluster storage daemon, such as glusterd or Ceph, on each node.
- Running a log collection daemon, such as fluentd or logstash, on every node.
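A log collection DaemonSet of the kind mentioned above might be declared roughly as follows. The image tag is a placeholder, and desired_pods is a toy model of the one-pod-per-node behaviour, not the real controller; in particular the pod names it produces are made up.

```python
import json


def make_daemonset(name, image):
    """Build a minimal DaemonSet manifest."""
    labels = {"name": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # The selector must match the labels of the pod template.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def desired_pods(daemonset, nodes):
    """Toy model: exactly one pod per node currently in the cluster."""
    name = daemonset["metadata"]["name"]
    return {node: f"{name}-on-{node}" for node in nodes}


# Hypothetical image and node names.
daemonset = make_daemonset("fluentd", "fluentd:latest")
print(json.dumps(daemonset, indent=2))

nodes = ["node-a", "node-b"]
print(desired_pods(daemonset, nodes))

nodes.append("node-c")   # a node joins: it gets a pod
nodes.remove("node-a")   # a node leaves: its pod disappears
print(desired_pods(daemonset, nodes))
```

Unlike a Deployment, the manifest has no replica count: the number of pods follows the number of nodes.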
DaemonSets are optional, and there are various alternatives to them:
- Init scripts: It is possible to run daemon processes by directly starting them on a node using init.
- Bare Pods: It is possible to create Pods directly which specify a particular node to run on.
- Static Pods: Static Pods can be created by writing a file to a certain directory watched by Kubelet.
- Deployments: DaemonSets are similar to Deployments in that they both create Pods, and those Pods have processes which are not expected to terminate.
It is recommended to use a DaemonSet when it is important that a copy of a Pod always run on all or certain hosts.
Federation makes it easy to manage multiple Kubernetes clusters. It does so by providing two major building blocks:
- Sync resources across clusters: Federation provides the ability to keep resources in multiple clusters in sync.
- Cross cluster discovery: Federation provides the ability to auto-configure DNS servers and load balancers with backends from all clusters.
Federation enables high availability and avoids vendor lock-in. By spreading load across clusters and auto-configuring DNS servers and load balancers, federation minimises the impact of a cluster failure. It also makes it easier to migrate applications across clusters, which prevents cluster provider lock-in. Federation is only helpful with multiple clusters. Multiple clusters are needed to achieve the following:
- Low latency: Having clusters in multiple regions minimises latency by serving users from the cluster that is closest to them.
- Fault isolation: It might be better to have multiple small clusters rather than a single large cluster for fault isolation.
- Hybrid cloud: You can have multiple clusters on different cloud providers or in on-premises data centres.
However, federation can increase network cost if the clusters run in different regions of a cloud provider or on different cloud providers. A single error or bug in the federation layer can impact all clusters, and overall the federation project is relatively new.
Kubeless is a serverless framework designed to run on top of a Kubernetes cluster. It accepts commands to register, list and delete functions that can be run on the cluster. A function registered with Kubeless is available over the web under a specified name and can be triggered using HTTP requests. Serverless functions are well suited to running in containers because they are short-lived and stateless and do not need to publish services.
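A function of the kind Kubeless registers can be very small. The sketch below assumes the Python runtime's handler convention of a function taking an event and a context, with the request payload under event["data"]; the function name and the "name" field are invented for the example, and the exact payload shape may differ between Kubeless versions.

```python
def hello(event, context):
    """Return a greeting built from the request payload.

    Assumption: event is a dict whose "data" entry holds the request body,
    either as a parsed JSON object or as a plain string.
    """
    data = event.get("data") or {}
    if isinstance(data, dict):
        name = data.get("name", "world")
    else:
        name = str(data)
    return f"Hello, {name}!"


# Local invocation with a hand-made event, without any cluster involved.
print(hello({"data": {"name": "Kubernetes"}}, None))
```

The function keeps no state between calls and exposes no service of its own, which matches the properties listed above.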
All of these are core concepts of Kubernetes. Services and their discovery, ReplicaSets and Ingress make the Kubernetes ecosystem complete for reliable and stable deployment. Serverless computing concepts are taking Kubernetes to the next generation.