The Reality of Production Incidents
It is 3 AM. Your pager rings. You type kubectl get pods. The screen shows your pods are Running. Yet your app is still dropping user traffic. Looking at app logs will not help much here. In large incidents, logs often stop at the container level. A Running status only means the main process is alive. It does not mean the entire system is working as expected. Skilled engineers do more than react to symptoms. They identify the root cause by examining the system layer by layer.
When you check these layers in order, you can resolve incidents much faster:
| Layer | What to Inspect |
|---|---|
| Container & Process | Exit codes, memory limits, JVM off-heap usage, and hidden OOM kills. |
| Runtime & Probes | CPU throttling, dead processes, and bad health checks. |
| Node & Eviction | Node memory and disk pressure, PodDisruptionBudgets, and eviction-driven crash loops. |
| Storage Layer | Volume locks, stuck storage, and ConfigMap bugs. |
| Network & Routing | DNS delays, full conntrack tables, and stale endpoint IPs. |
This guide shows you the exact commands and simple explanations to fix these problems.
Container & Process Level: Decoding Exit Codes and Memory
When a container dies, Linux saves an exit code. Read these codes carefully. They tell you exactly what the system did.
Exit Code 137: The OOMKilled Reality
Exit Code 137 means the kernel's OOM killer terminated your container because it exceeded its memory limit. Do not just add more memory. Find out why it happened first. The kernel picks its victim by an OOM score, which Kubernetes biases through oom_score_adj (you can read the value directly on the node, as shown after the list below).
- Guaranteed pods: The score is -997. They are very hard to kill.
- BestEffort pods: The score is 1000. They get killed first.
- Burstable pods: The score depends on memory requests. Warning: smaller memory requests actually give your pod a higher chance to be killed.
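To see how exposed a specific container is, you can read the adjustment directly on the node. A minimal sketch, assuming containerd with crictl installed on the node; the container ID and PID are whatever crictl reports in your environment:
# Find the container's main PID, then read the OOM score adjustment the kubelet assigned
crictl inspect <container-id> | grep -i '"pid"'
cat /proc/<pid>/oom_score_adj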
Before Kubernetes 1.28, if a child process used too much memory, only that child died and Kubernetes did not notice. Since Kubernetes 1.28 on cgroup v2 nodes, the kubelet enables the cgroup's group-kill behavior, so the whole container is killed if any process inside it breaches the limit.
If you want the old behavior back, Kubernetes 1.32 added a kubelet setting called singleProcessOOMKill. Set it to true to kill only the offending process, not the whole container. Kubernetes 1.36 also added Tiered Memory Protection: Guaranteed pods get strict protection using memory.min, and Burstable pods get soft protection using memory.low.
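A minimal sketch of opting a node back into per-process OOM kills, assuming the field name above and the default kubelet config path of /var/lib/kubelet/config.yaml; verify both against your distribution and version before rolling this out:
# Append the setting to the kubelet config and restart the kubelet (node-level change)
echo 'singleProcessOOMKill: true' | sudo tee -a /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet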
# Get the exact termination reason from the Kubernetes API
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
Exit Code 139: Segmentation Faults
Exit Code 139 means the app tried to use memory it does not own. This is usually a bug in C/C++ code. You need to check the core dump file to fix it.
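Where the dump lands depends on the node's core_pattern; the sketch below assumes systemd-coredump, which many distributions use by default:
# See where the kernel writes core dumps on the node
cat /proc/sys/kernel/core_pattern
# With systemd-coredump: list recent crashes and export one for analysis
coredumpctl list
coredumpctl dump <pid> -o app.core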
JVM Memory Misalignment and cgroup v2
Java apps often get killed even when the heap is only half full. This happens because the JVM uses memory outside the heap for metaspace, thread stacks, GC structures, and direct buffers. If you set a static -Xmx close to the container limit, that off-heap memory pushes the total over the limit and the system kills the pod.
Fix: Do not use the static -Xmx flag. Use -XX:MaxRAMPercentage=75.0 instead. This tells Java to size the heap from the container limit and leave roughly 25% for off-heap memory.
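A minimal sketch, assuming an OpenJDK-based image and an illustrative deployment name: JAVA_TOOL_OPTIONS is read by the JVM at startup, so no image rebuild is needed.
# Size the heap from the cgroup limit instead of a hard-coded -Xmx
kubectl set env deployment/<app-name> JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0"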
Runtime Limits & Probes: When the Pod Lies About Its Health
A running pod can still be broken. Bad health checks can restart a perfectly healthy app.
CPU Throttling-Induced Probe Timeouts
If a health check times out, your app might not be frozen. It might just be throttled. The kernel's CFS scheduler gives your container a fixed CPU quota per scheduling period (100 ms by default). If your app burns through the quota early, the kernel pauses it until the next period. If Kubernetes runs a health check during one of those pauses, the check fails and Kubernetes restarts a perfectly healthy app.
To check for this:
# Check cgroup v2 CPU throttle stats directly on the node
cat /sys/fs/cgroup/cpu.stat | grep throttled
# Or check per-container stats via containerd
crictl stats <container-id>
The Cascading Failure of Dependency-Checking Probes
Never check your database from a liveness probe. If the database goes offline for 5 seconds, all your pod liveness probes will fail at the same time. Kubernetes will kill and restart all your pods at once. When the database comes back, hundreds of pods will hit it at the exact same time. The database will crash again.
Rule: Liveness probes should only check if the app itself is stuck.
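A minimal sketch of the split, with illustrative names, paths, and ports: the liveness endpoint checks only in-process health, while readiness is the probe that is allowed to fail when a dependency is down.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    livenessProbe:
      httpGet: { path: /healthz, port: 8080 }   # in-process check only: is the app itself stuck?
      periodSeconds: 10
      timeoutSeconds: 2
    readinessProbe:
      httpGet: { path: /ready, port: 8080 }     # this probe may check the database
      periodSeconds: 5
EOF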
Exec Probes and Zombie Processes
Using shell scripts for health checks can leave dead "zombie" processes behind. If nothing reaps them, they slowly fill the node's process table.
Fix: Use an init system like dumb-init inside your container, or use simple HTTP probes instead.
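To see whether a container is already accumulating zombies, list its processes; this assumes procps is available in the image (for distroless images, attach an ephemeral debug container instead, as described later):
# Processes in state Z are zombies waiting to be reaped
kubectl exec <pod-name> -- ps -eo pid,stat,comm | awk '$2 ~ /^Z/'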
Node Pressure & Eviction: Debugging CrashLoopBackOff
CrashLoopBackOff means your pod keeps crashing, and Kubernetes waits longer and longer before each restart. If the logs are empty, the pod is probably not crashing from a code bug. The kubelet is evicting it.
Kubelet Evictions vs. Application Crashes
When a node runs out of memory or disk space, it kicks pods off. If your pod gets sent back to the exact same full node, it gets kicked off again. This looks like a crash loop. Stop looking at app logs. Look at the node status instead.
# Check why the previous container instance actually died
kubectl logs <pod-name> --previous
# Inspect node pressure conditions
kubectl describe node <node-name> | grep -A5 Conditions
# Find all evicted pods across the cluster
kubectl get pods --all-namespaces --field-selector status.phase=Failed
kubectl get events --all-namespaces --field-selector reason=Evicted
The PodDisruptionBudget Mask
A PodDisruptionBudget (PDB) keeps your app safe when you want to drain a node. But a node that is out of memory does not care about your PDB. It will kick the pod out immediately. Make sure your PDB allows at least one pod to go down, or it might hide bigger node problems.
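A minimal sketch of a budget that always leaves room for one voluntary disruption; the name and labels are illustrative:
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1          # never 0 -- that blocks every node drain
  selector:
    matchLabels:
      app: my-app
EOF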
Storage Layer: Deadlocks and Finalizers
Sometimes storage volumes make pods refuse to start or stop.
The ReadWriteOncePod (RWOP) Evolution
Old storage volumes used a mode called ReadWriteOnce (RWO). This locked the storage to one node. During an update, a new pod on a new node could get stuck. It would wait forever for the old node to let go of the storage.
Fix: In Kubernetes 1.29, ReadWriteOncePod (RWOP) became fully ready (GA). It locks the storage to exactly one pod in the whole cluster. Use this to stop pods from getting stuck.
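A minimal sketch of an RWOP claim, assuming an illustrative storage class and a CSI driver that supports the mode:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-rwop
spec:
  accessModes: ["ReadWriteOncePod"]   # exactly one pod in the whole cluster may mount it
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
EOF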
PVCs Stuck in Terminating
If a node dies suddenly, the storage might stay locked to it.
Fix: Check if the storage is detached in the cloud. Only after that, remove the finalizer to unlock it.
# Inspect stuck volume attachments
kubectl get volumeattachments
# Remove the protection finalizer -- only after confirming storage detached
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'
The ConfigMap subPath Trap
If you mount a ConfigMap as a full folder, Kubernetes updates the files automatically. But if you mount just one file using subPath, it will never update. This happens because Linux locks onto the file's ID (called an inode). When Kubernetes makes a new file, the ID changes, but the pod stays locked to the old one.
Fix: Mount the whole folder instead of using subPath.
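A minimal sketch with illustrative names: mounting the ConfigMap as a directory lets the kubelet's symlink swap propagate updates, while the commented subPath variant would pin the container to the old inode.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    volumeMounts:
    - name: app-config
      mountPath: /etc/app          # whole-directory mount: updates propagate
      # adding subPath: app.conf here would freeze the file at the old inode
  volumes:
  - name: app-config
    configMap:
      name: app-config
EOF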
Network Layer: DNS Delays and IP Exhaustion
Network bugs are very hard to find. They usually look like random timeouts.
The DNS 4-Query Tax (ndots:5)
Kubernetes sets ndots:5 in every pod's resolv.conf. This makes the resolver try the cluster search domains before treating a name as a real external address. If you look up api.external.com, the resolver tries three wrong cluster names first. This makes every external lookup slower.
Fix: Add a trailing dot to the address in your code (api.external.com.). A fully qualified name skips the extra searches.
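You can confirm what a pod actually received; workloads that mostly resolve external names can also lower ndots through spec.dnsConfig.options:
# Inspect the search path and ndots value inside the pod
kubectl exec <pod-name> -- cat /etc/resolv.conf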
Conntrack Exhaustion and the 5-Second Delay
If your DNS sometimes takes exactly 5 seconds, you are hitting a known Linux race. When your app sends the IPv4 and IPv6 lookups at the same instant over UDP, the connection tracker (nf_conntrack) races on inserting the two entries and drops one packet. Because UDP does not retransmit immediately, the resolver sits there for its full 5-second timeout before retrying.
# Check current conntrack table fill level on the node
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
Fix: Install NodeLocal DNSCache. It runs a caching DNS agent on every node, bypasses conntrack for its own traffic, and upgrades upstream queries to TCP, which removes the race entirely.
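The upstream manifests deploy the cache as a DaemonSet called node-local-dns in kube-system (your distribution may package it under a different name); confirm it is actually scheduled on every node:
kubectl get daemonset -n kube-system node-local-dns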
Subnet IP Exhaustion (CNI Limitations)
In cloud setups, pods get routable IP addresses from the VPC network. The CNI plugin keeps a warm pool of pre-allocated IPs on each node. If your subnet is small, it can run out of IPs before pods even start.
Fix: Put pod IP addresses in a different, larger network space than your main nodes.
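Pods stuck in Pending or ContainerCreating with sandbox errors usually point at the CNI rather than the scheduler; the event reason below is one common signature:
kubectl get events --all-namespaces --field-selector reason=FailedCreatePodSandBox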
Service-Level Routing: Beyond iptables
If your pod is healthy but you see 502 Bad Gateway errors, service routing is stale. kube-proxy in iptables mode evaluates rules sequentially and rewrites them wholesale, so on large clusters rule sync falls behind and traffic is sent to pod IPs that no longer exist. You can check which mode a cluster is running, as shown after the table below.
| Solution | Mechanism | Scale | Recommended For |
|---|---|---|---|
| kube-proxy (iptables) | Sequential rules | Up to ~1,000 nodes | Small clusters |
| kube-proxy (IPVS) | Fast hash-based lookups | Up to ~3,000 nodes | Medium clusters |
| Cilium (eBPF) | Deep kernel routing | Any scale | 2026 standard |
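On kubeadm-based clusters the active mode lives in the kube-proxy ConfigMap; clusters running Cilium in kube-proxy replacement mode will not have it at all:
kubectl get configmap -n kube-system kube-proxy -o yaml | grep "mode:"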
Observability & Advanced Debugging Tools
Ephemeral Containers for Distroless Images
Very safe container images do not have tools like curl or a shell. You cannot use kubectl exec. Instead, use an ephemeral container. This attaches a temporary container with tools right into your running pod:
kubectl debug -it <pod-name> \
  --image=nicolaka/netshoot \
  --target=<container-name>
If you need to check the node itself:
# Spawn a debug pod on the node itself, with the host filesystem mounted at /host
kubectl debug node/<node-name> -it --image=ubuntu -- bash
eBPF: Kernel-Level Tracing
eBPF lets you watch the system deep inside the kernel without changing your code. Old security tools checked data before it was fully loaded. Hackers could change the data and hide. This is called a TOCTOU attack. Modern eBPF tools like Tetragon use LSM hooks. LSM hooks read the data deep inside the kernel where it is safe from hackers.
The Clock Skew Anomaly in Distributed Tracing
If your tracing tools show a child task starting before its parent, your code is probably fine. This happens when the physical hardware clock on a worker node is out of sync (NTP drift).
Fix: Resynchronize the node's clock with NTP (for example with chrony).
# Debug NTP synchronization directly on the affected node
kubectl debug node/<node-name> -it --image=ubuntu -- \
  bash -c "apt-get update -q && apt-get install -y chrony && chronyc tracking"
The Live Debug Playbook
Use this table when your pager rings. Run these commands before guessing what is wrong.
| Symptom | First Commands to Run | Likely Root Cause |
|---|---|---|
| Exit Code 137 (OOMKilled) | dmesg \| grep -i oom; kubectl get pod -o jsonpath='{...reason}' | JVM heap + off-heap over cgroup limit; wrong QoS class |
| Exit Code 139 (SIGSEGV) | Generate core dump; check JNI library CPU architecture | Native C/C++ bug or wrong-arch shared library |
| CrashLoopBackOff (empty logs) | kubectl logs --previous; kubectl describe node \| grep Conditions | Node memory/disk pressure eviction loop |
| 502 Bad Gateway (pod healthy) | kubectl get endpoints <svc>; iptables -L -t nat \| grep <svc-ip> | Stale kube-proxy iptables rules routing to dead pod IP |
| DNS 5-second delay | cat /proc/sys/net/netfilter/nf_conntrack_count | conntrack UDP race; deploy NodeLocal DNSCache |
| Pods stuck in Pending | kubectl describe pod | grep Events | CNI subnet IP exhaustion; expand CIDR or use ENIConfig |
| PVC stuck in Terminating | kubectl get volumeattachments | Dead node holding VolumeAttachment; patch PVC finalizers |
| Trace spans out of order | kubectl debug node/<n> -- chronyc tracking | NTP clock drift on worker node; resync chronyd |
Building Systemic Reliability
Do not just look at application logs. Good engineers look at the whole system. Use this guide to find out if the problem is in the kernel, the network, or the hardware. By learning these simple patterns, you will stop guessing. You will find and fix the real problem the first time.

