Advanced Troubleshooting in Kubernetes

First Steps in Troubleshooting

Check Status of Cluster:

Use kubectl get nodes to make sure that all nodes are in the ‘Ready’ state.
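
On a larger cluster it can help to list only the nodes that need attention. The filter below is a minimal sketch, assuming the default column layout of kubectl get nodes (NAME first, STATUS second); the function name not_ready_nodes is made up for illustration.

```shell
# Print the name and status of every node whose STATUS is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
#
# Usage: kubectl get nodes --no-headers | not_ready_nodes
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

Cordoned nodes report Ready,SchedulingDisabled, so this filter lists them as well.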

Review Recent Events:

kubectl get events -n <namespace> lists recent events in a namespace, such as scheduling failures, image pull errors, and container restarts.
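
Event lists get noisy quickly, so narrowing them to warnings is often a useful first cut. A small sketch, assuming core v1 events (which carry a type field and a lastTimestamp); the wrapper name warning_events is hypothetical.

```shell
# List only Warning events in a namespace, sorted by when they were last seen.
# $1 is the namespace; it defaults to "default" when omitted.
warning_events() {
  kubectl get events -n "${1:-default}" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

For example, warning_events kube-system shows only the warnings recorded in the kube-system namespace.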

Access Logs:

Use kubectl logs <pod-name> to look at recent logs and learn about possible problems.

Troubleshooting at the Node Level

Node Conditions:

Use kubectl describe node <node-name> and check the Conditions section. DiskPressure, MemoryPressure, or PIDPressure reporting True means the node is running short of that resource.
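
The same conditions can be read in a script-friendly form with a jsonpath query. The sketch below assumes the conditions have been printed as TYPE=STATUS lines; the function name pressure_conditions is hypothetical.

```shell
# Print the type of every node condition that signals a problem:
# any condition other than Ready whose status is True.
# Reads TYPE=STATUS lines on stdin.
#
# Usage:
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | pressure_conditions
pressure_conditions() {
  awk -F= '$1 != "Ready" && $2 == "True" { print $1 }'
}
```

An empty result means none of the pressure conditions is currently reported.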

Resource Usage:

Run top or free directly on the node to see current CPU and memory usage.

Kubelet Status:

Make sure the kubelet service is running on the node (for example with systemctl status kubelet), then use journalctl -u kubelet to look at its logs.

Troubleshooting at the Pod Level

Pod State:

Type kubectl describe pod <pod-name> to find out more about the state and events of a pod.

Container Inspections:

Use kubectl logs <pod-name> -c <container-name> to look at the logs for a particular container.

Pod Restart Problems:

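If a pod keeps restarting, kubectl describe pod <pod-name> shows the restart count along with the last state and exit code of each container, and kubectl logs <pod-name> --previous shows the logs of the container instance that crashed.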
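
To find which pods are restarting in the first place, the pod list can be filtered by restart count. A minimal sketch, assuming the default kubectl get pods columns (NAME, READY, STATUS, RESTARTS, AGE); the function name restarting_pods is hypothetical.

```shell
# Print name, status and restart count of pods that restarted at least N times.
# Reads `kubectl get pods --no-headers` output on stdin.
# $1 is the threshold N; it defaults to 1.
#
# Usage: kubectl get pods --no-headers | restarting_pods 3
restarting_pods() {
  awk -v min="${1:-1}" '$4 + 0 >= min + 0 { print $1, $3, $4 }'
}
```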

Network Troubleshooting

Service Discovery:

Use kubectl get endpoints to make sure services are properly pointing to pods.
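
A Service whose selector matches no ready pods shows <none> in the ENDPOINTS column. The filter below is a sketch that picks those out, assuming the default column layout (NAME first, ENDPOINTS second); the function name is hypothetical.

```shell
# Print the names of Services that currently have no endpoints.
# Reads `kubectl get endpoints --no-headers` output on stdin.
#
# Usage: kubectl get endpoints --no-headers | services_without_endpoints
services_without_endpoints() {
  awk '$2 == "<none>" { print $1 }'
}
```

For any Service it reports, comparing the Service's selector with the pod labels (kubectl get pods --show-labels) is usually the next step.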

Pod-to-Pod Communication:

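Run tools like ping or curl from inside a pod (via kubectl exec) to test whether it can reach other pods and services. Service cluster IPs often do not answer ping, so prefer curl against the service port.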

Network Policies:

Use kubectl get networkpolicy to look over the policies and make sure that the desired network traffic paths are allowed.

DNS Problems:

Make sure the CoreDNS or kube-dns pods are running and that service names resolve correctly from inside a pod.
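
Both checks can be sketched as two small helpers. They assume the DNS pods live in kube-system with the conventional k8s-app=kube-dns label, and that the busybox:1.28 image can be pulled; the pod name dns-test and both function names are hypothetical.

```shell
# Show the cluster DNS pods. CoreDNS and kube-dns deployments both
# conventionally carry the label k8s-app=kube-dns.
dns_pods() {
  kubectl get pods -n kube-system -l k8s-app=kube-dns
}

# Resolve a name from inside the cluster using a throwaway busybox pod.
# $1 is the name to look up; it defaults to the API server's Service.
dns_lookup() {
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.28 \
    -- nslookup "${1:-kubernetes.default}"
}
```

If dns_pods shows healthy pods but dns_lookup fails, the DNS pod logs (kubectl logs -n kube-system -l k8s-app=kube-dns) are a reasonable next place to look.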

Storage Troubleshooting

PV & PVC Status:

Use kubectl get pv,pvc to check the binding status of persistent volumes and claims.

Access Modes:

Make sure the access mode requested by the PVC is one of the access modes offered by the PV.

Storage Class Problems:

Make sure the PVC names the intended storage class and that the class's provisioner is running.

Mount Problems:

Use kubectl describe pod <pod-name> and look in the Events section for mount errors such as FailedMount or FailedAttachVolume.

Advanced Troubleshooting in Kubernetes

Kubernetes has many interacting components, and a failure in any of them can surface as a cluster, node, pod, network, or storage problem. Diagnosing these problems efficiently is an essential skill. Each scenario below creates a common problem on purpose and then shows the commands used to investigate it.

Preliminary Troubleshooting Steps

1. Examining Cluster Health

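Description: In this scenario, we’ll intentionally taint a node so that pods without a matching toleration can no longer be scheduled onto it, and then inspect its state. The node still reports Ready in kubectl get nodes; the taint appears under Taints in the kubectl describe node output. Replace test-node with the name of a node in your cluster.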

Scenario Creation:

kubectl taint nodes test-node key=value:NoSchedule

Troubleshooting:

kubectl get nodes
kubectl describe node test-node
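
The taint can also be read directly, and it should be removed once the exercise is done. A minimal sketch with two hypothetical helper names; the trailing "-" after the taint is kubectl's syntax for removing it.

```shell
# Print the taints on a node, one per line, as key=value:effect.
# $1 is the node name.
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Remove the taint added in this scenario; the trailing "-" means "delete".
# $1 is the node name.
remove_scenario_taint() {
  kubectl taint nodes "$1" key=value:NoSchedule-
}
```

For example, node_taints test-node followed by remove_scenario_taint test-node returns the node to its original scheduling behavior.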

2. Log Analysis

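Description: A simulated faulty application will be deployed, which exits immediately after logging an error. Because kubectl run uses the restart policy Always by default, the pod is restarted repeatedly and ends up in CrashLoopBackOff.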

Scenario Creation:

kubectl run faulty-app --image=busybox --command -- /bin/sh -c "echo 'Error: Something went wrong!' && exit 1"

Troubleshooting:
```
kubectl logs faulty-app
```

Node-level Troubleshooting

1. Node Resource Exhaustion

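Description: A pod requesting a large amount of memory will be scheduled. The request reserves that memory on the node for scheduling purposes even though the container itself uses very little, so it reduces what other pods can request; if no node has 800Mi unreserved, the pod stays Pending.

Scenario Creation:

kubectl run resource-hog --image=busybox --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"resource-hog","image":"busybox","command":["/bin/sh","-c","while true; do sleep 1; done"],"resources":{"requests":{"memory":"800Mi"}}}]}}'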

Troubleshooting:
```
kubectl describe node test-node
```

Pod-level Troubleshooting

1. Crashing Pods

Description: Investigate the reasons behind a crashing pod (this scenario was set up in the log analysis example above).

Troubleshooting:
```
kubectl describe pod faulty-app
```
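
In the describe output, the Last State section of the container shows the exit code of the crashed instance. The helper below is a rough sketch for reading that number, based on the common convention that codes above 128 mean "terminated by signal (code minus 128)"; the function name explain_exit_code is hypothetical.

```shell
# Translate a container exit code into a short hint.
# Codes above 128 follow the convention 128 + signal number.
#
# Usage (once the container has restarted at least once):
#   explain_exit_code "$(kubectl get pod faulty-app \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}
```

For faulty-app, which runs exit 1, this reports an application error; a code of 137 would instead point to signal 9 (SIGKILL), which is commonly seen when a container is killed for exceeding its memory limit.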

2. Pod Access

Description: Deploy a simple pod and access its shell, ensuring that there are no access-related issues.

Scenario Creation:

kubectl run simple-pod --image=busybox --command -- /bin/sh -c "sleep 3600"

Troubleshooting:
```
kubectl exec -it simple-pod -- /bin/sh
```

Network Troubleshooting

1. Networking Issues

Description: Create two pods in different namespaces, expose one of them through a Service, and attempt to reach it from the other.

Scenario Creation:

kubectl create namespace ns1
kubectl create namespace ns2
kubectl run nginx1 --image=nginx --namespace=ns1
kubectl run nginx2 --image=nginx --namespace=ns2
kubectl expose pod nginx2 --port=80 --namespace=ns2

Troubleshooting:

kubectl exec -it -n ns1 nginx1 -- curl nginx2.ns2.svc.cluster.local

2. Network Policies

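Description: Establish a network policy that blocks all incoming traffic to the pods in the current namespace and diagnose its impact. The policy only takes effect if the cluster's network plugin enforces NetworkPolicy.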

Scenario Creation:

kubectl apply -f- <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF

Troubleshooting:

kubectl get networkpolicies
kubectl describe networkpolicy block-all
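
Describing the policy shows what it selects, but the impact is easiest to see by probing a pod it covers. A minimal sketch, assuming the busybox image is available and the target serves HTTP; the pod name np-probe and the function name probe_http are hypothetical.

```shell
# Try an HTTP request from a throwaway busybox pod, giving up after 5 seconds.
# $1 is the URL to fetch.
# $2 is the namespace to run the probe in; it defaults to "default".
probe_http() {
  kubectl run np-probe --rm -i --restart=Never --image=busybox \
    -n "${2:-default}" -- wget -q -O - -T 5 "$1"
}
```

With the block-all policy in place, and a network plugin that enforces it, a probe against the IP of a pod in the policy's namespace should time out instead of returning a response. Deleting the policy (kubectl delete networkpolicy block-all) should make the same probe succeed again.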

Storage Troubleshooting

1. PV and PVC Binding

Description: Simulate a mismatch between the configurations of a PersistentVolume and a PersistentVolumeClaim.

Scenario Creation:

Create the PersistentVolume from a manifest:

kubectl apply -f- <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/tmp"
EOF

Create the PersistentVolumeClaim, requesting an access mode (ReadWriteMany) that the volume does not offer:

kubectl apply -f- <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF

Because the access modes do not match, the claim stays in the Pending state instead of binding to example-pv.

Troubleshooting:
```
kubectl describe pvc example-pvc
```
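
The mismatch can also be confirmed by comparing the access modes on both objects directly. A small sketch, assuming the claim requests a single access mode; the function name mode_offered is hypothetical.

```shell
# Succeed if the requested access mode is among the modes a volume offers.
# $1 is the requested mode; the remaining arguments are the offered modes.
#
# Usage:
#   pv_modes=$(kubectl get pv example-pv -o jsonpath='{.spec.accessModes[*]}')
#   pvc_mode=$(kubectl get pvc example-pvc -o jsonpath='{.spec.accessModes[0]}')
#   mode_offered "$pvc_mode" $pv_modes || echo "access mode mismatch"
mode_offered() {
  wanted=$1
  shift
  for mode in "$@"; do
    [ "$mode" = "$wanted" ] && return 0
  done
  return 1
}
```

In this scenario the claim asks for ReadWriteMany while the volume only offers ReadWriteOnce, so the check fails and the mismatch message is printed.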