Kubernetes manages container failures within Pods using a restartPolicy
defined in the Pod spec. This policy determines how Kubernetes reacts to
containers exiting due to errors or other reasons. For a broader view of
how containers move through their lifecycle, see Pod Lifecycle.
When a container fails, Kubernetes follows this sequence:
restartPolicy.restartPolicy.
This prevents rapid, repeated restart attempts from overloading the system.In practice, a CrashLoopBackOff is a condition or event that might be seen as output
from the kubectl command, while describing or listing Pods, when a container in the Pod
fails to start properly and then continually tries and fails in a loop.
In other words, when a container enters the crash loop, Kubernetes applies the exponential backoff delay mentioned in the Container restart policy. This mechanism prevents a faulty container from overwhelming the system with continuous failed start attempts.
The CrashLoopBackOff can be caused by issues like the following:
Failure result
as mentioned in the probes section.To investigate the root cause of a CrashLoopBackOff issue, a user can:
kubectl logs <name-of-pod> to check the logs of the container.
This is often the most direct way to diagnose the issue causing the crashes.kubectl describe pod <name-of-pod> to see events
for the Pod, which can provide hints about configuration or resource issues.When a container in your Pod stops, or experiences failure, Kubernetes can restart it. A restart isn't always appropriate; for example, init containers run only once (if successful), during Pod startup. You can configure restarts as a policy that applies to all Pods, or using container-level configuration (for example: when you define a sidecar container) or define container-level override.
The Kubernetes project recommends following cloud-native principles, including resilient design that accounts for unannounced or arbitrary restarts. You can achieve this either by failing the Pod and relying on automatic replacement, or you can design for container-level resilience. Either approach helps to ensure that your overall workload remains available despite partial failure.
The spec of a Pod has a restartPolicy field with possible values Always, OnFailure,
and Never. The default value is Always.
The restartPolicy for a Pod applies to app containers
in the Pod and to regular init containers.
Sidecar containers
ignore the Pod-level restartPolicy field: in Kubernetes, a sidecar is defined as an
entry inside initContainers that has its container-level restartPolicy set to Always.
For init containers that exit with an error, the kubelet restarts the init container if
the Pod level restartPolicy is either OnFailure or Always:
Always: Automatically restarts the container after any termination.OnFailure: Only restarts the container if it exits with an error (non-zero exit status).Never: Does not automatically restart the terminated container.The following table shows how containers behave under different restart policies and exit codes:
| Exit Code | restartPolicy: Always |
restartPolicy: OnFailure |
restartPolicy: Never |
Sidecar Containers |
|---|---|---|---|---|
| 0 (Success) | Restarts | Does not restart | Does not restart | Always restarts |
| Non-zero (Failure) | Restarts | Restarts | Does not restart | Always restarts |
The restart behavior is particularly important when choosing between Deployments and Jobs:
restartPolicy: Always (the only allowed value) to keep applications running continuouslyrestartPolicy: OnFailure or restartPolicy: Never to handle batch processing tasks appropriatelyrestartPolicy because they have their own container-level restartPolicy: AlwaysHere are concrete examples demonstrating the different restart behaviors:
Example 1: Web server with restartPolicy: Always (typical for Deployments)
apiVersion: v1
kind: Pod
metadata:
name: web-server
spec:
restartPolicy: Always # Container restarts regardless of exit code
containers:
- name: nginx
image: nginx:1.14.2
# If this container crashes or exits for any reason, it will be restarted
Example 2: Batch job with restartPolicy: OnFailure
apiVersion: batch/v1
kind: Job
metadata:
name: data-processor
spec:
template:
spec:
restartPolicy: OnFailure # Only restart on non-zero exit codes
containers:
- name: processor
image: busybox:1.28
command: ['sh', '-c', 'echo "Processing data..."; exit 0']
# Exit code 0: Job completes successfully, no restart
# Exit code 1+: Container restarts to retry the task
Example 3: One-time task with restartPolicy: Never
apiVersion: v1
kind: Pod
metadata:
name: migration-task
spec:
restartPolicy: Never # Never restart, regardless of exit code
containers:
- name: migrate
image: busybox:1.28
command: ['sh', '-c', 'echo "Running migration..."; exit 1']
# Even with exit code 1 (failure), the container will not restart
# The Pod will remain in Failed state
Sidecar containers have special restart behavior that differs from regular app containers:
restartPolicy: They use their own container-level restartPolicy field, which is always set to AlwaysExample: Pod with sidecar container
apiVersion: v1
kind: Pod
metadata:
name: app-with-sidecar
spec:
restartPolicy: OnFailure # Applies to main container only
initContainers:
- name: logging-sidecar # This is a sidecar container
image: fluent/fluent-bit:1.8
restartPolicy: Always # Sidecar always restarts regardless of exit code
# Provides logging services throughout Pod lifetime
containers:
- name: main-app # This follows Pod-level restartPolicy
image: nginx:1.14.2
# Will only restart on failure (non-zero exit) due to Pod's OnFailure policy
restartPolicy: OnFailure, the sidecar container will restart regardless of its exit code because sidecar containers always have restartPolicy: Always at the container level.When the kubelet is handling container restarts according to the configured restart
policy, that only applies to restarts that make replacement containers inside the
same Pod and running on the same node. After containers in a Pod exit, the kubelet
restarts them with an exponential backoff delay (10s, 20s, 40s, …), that is capped at
300 seconds (5 minutes). Once a container has executed for 10 minutes without any
problems, the kubelet resets the restart backoff timer for that container.
Sidecar containers and Pod lifecycle
explains the behaviour of init containers when specify restartPolicy field on it.
Kubernetes v1.35 [beta](enabled by default)If your cluster has the feature gate ContainerRestartRules enabled, you can specify
restartPolicy and restartPolicyRules on individual containers to override the Pod
restart policy. Container restart policy and rules applies to app containers
in the Pod and to regular init containers.
A Kubernetes-native sidecar container
has its container-level restartPolicy set to Always.
The container restarts will follow the same exponential backoff as pod restart policy described above. Supported container restart policies:
Always: Automatically restarts the container after any termination.OnFailure: Only restarts the container if it exits with an error (non-zero exit status).Never: Does not automatically restart the terminated container.Additionally, individual containers can specify restartPolicyRules. If the restartPolicyRules
field is specified, then container restartPolicy must also be specified. The restartPolicyRules
define a list of rules to apply on container exit. Each rule will consist of a condition
and an action. The supported condition is exitCodes, which compares the exit code of the container
with a list of given values. The supported action is Restart, which means the container will be
restarted. The rules will be evaluated in order. On the first match, the action will be applied.
If none of the rules’ conditions matched, Kubernetes fallback to container’s configured
restartPolicy.
For example, a Pod with OnFailure restart policy that have a try-once container. This allows
Pod to only restart certain containers:
apiVersion: v1
kind: Pod
metadata:
name: on-failure-pod
spec:
restartPolicy: OnFailure
containers:
- name: try-once-container # This container will run only once because the restartPolicy is Never.
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'echo "Only running once" && sleep 10 && exit 1']
restartPolicy: Never
- name: on-failure-container # This container will be restarted on failure.
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'echo "Keep restarting" && sleep 1800 && exit 1']
A Pod with Always restart policy with an init container that only execute once. If the init
container fails, the Pod fails. This allows the Pod to fail if the initialization failed,
but also keep running once the initialization succeeds:
apiVersion: v1
kind: Pod
metadata:
name: fail-pod-if-init-fails
spec:
restartPolicy: Always
initContainers:
- name: init-once # This init container will only try once. If it fails, the pod will fail.
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'echo "Failing initialization" && sleep 10 && exit 1']
restartPolicy: Never
containers:
- name: main-container # This container will always be restarted once initialization succeeds.
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'sleep 1800 && exit 0']
A Pod with Never restart policy with a container that ignores and restarts on specific exit codes. This is useful to differentiate between restartable errors and non-restartable errors:
apiVersion: v1
kind: Pod
metadata:
name: restart-on-exit-codes
spec:
restartPolicy: Never
containers:
- name: restart-on-exit-codes
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'sleep 60 && exit 0']
restartPolicy: Never # Container restart policy must be specified if rules are specified
restartPolicyRules: # Only restart the container if it exits with code 42
- action: Restart
exitCodes:
operator: In
values: [42]
Restart rules can be used for many more advanced lifecycle management scenarios. Note, restart rules are affected by the same inconsistencies as the regular restart policy. The kubelet restarts, container runtime garbage collection, intermitted connectivity issues with the control plane may cause the state loss and containers may be re-run even when you expect a container not to be restarted.
Kubernetes v1.35 [alpha](disabled by default)If your cluster has the feature gate RestartAllContainersOnContainerExits enabled, you can specify
RestartAllContainers as an action in restartPolicyRules at container level. When a container's exit
matches a rule with this action, the entire Pod is terminated and restarted in-place.
This "in-place" restart offers a more efficient way to reset a Pod's state compared to full deletion and recreation. This is especially valuable for workloads where rescheduling is costly, such as batch jobs or AI/ML training tasks.
When a RestartAllContainers action is triggered, the kubelet performs the following steps:
Fast Termination: All running containers in the Pod are terminated.
The configured terminationGracePeriodSeconds is not respected, and any configured preStop hooks
are not executed. This ensures a swift shutdown.
Preservation of Pod Resources: The Pod's essential resources are preserved:
emptyDir and mounted volumesPod Status Update: The Pod's status is updated with a PodRestartInPlace condition set to True.
This makes the restart process observable.
Full Restart Sequence: Once all containers are terminated, the PodRestartInPlace condition
is set to False, and the Pod begins the standard startup process:
A key aspect of this feature is that all containers are restarted, including those that
previously completed successfully or failed. The RestartAllContainers action overrides
any configured container-level or Pod-level restartPolicy.
This mechanism is useful in scenarios where a clean slate for all containers is necessary, such as:
init container sets up an environment that can become corrupted, this feature ensures
the setup process is re-executed.Consider a workload where a watcher sidecar is responsible for restarting the main application from a known-good state if it encounters an error. The watcher can exit with a specific code to trigger a full, in-place restart of the worker Pod.
apiVersion: v1
kind: Pod
metadata:
name: ml-worker
spec:
restartPolicy: Never # The pod itself should not restart unless explicitly told to.
initContainers:
- name: setup-environment
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'echo "Setting up environment"']
# This init container runs once to prepare the environment.
# It will run again after a RestartAllContainers action.
- name: watcher-sidecar
image: registry.k8s.io/busybox:1.27.2
# In a real-world scenario, this would be a dedicated watcher image.
# This command simulates the watcher exiting with a special code.
command: ['sh', '-c', 'sleep 60; exit 88']
restartPolicy: Always
restartPolicyRules:
- action: RestartAllContainers
exitCodes:
# Exit code 88 triggers a full pod restart.
operator: In
values: [88]
containers:
- name: main-application
image: registry.k8s.io/busybox:1.27.2
command: ['sh', '-c', 'echo "Application is running"; sleep 3600']
In this example:
restartPolicy is Never.watcher-sidecar runs a command and then exits with code 88.RestartAllContainers action.setup-environment init container and the main-application container,
is then restarted in-place. The pod keeps its UID, sandbox, IP, and volumes.Kubernetes v1.33 [alpha](disabled by default)With the alpha feature gate ReduceDefaultCrashLoopBackOffDecay enabled,
container start retries across your cluster will be reduced to begin at 1s
(instead of 10s) and increase exponentially by 2x each restart until a maximum
delay of 60s (instead of 300s which is 5 minutes).
If you use this feature along with the alpha feature
KubeletCrashLoopBackOffMax (described below), individual nodes may have
different maximum delays.
Kubernetes v1.35 [beta](enabled by default)With the feature gate KubeletCrashLoopBackOffMax enabled, you can
reconfigure the maximum delay between container start retries from the default
of 300s (5 minutes). This configuration is set per node using kubelet
configuration. In your kubelet configuration,
under crashLoopBackOff set the maxContainerRestartPeriod field between "1s" and
"300s". As described above in Container restart policy,
delays on that node will still start at 10s and increase exponentially by 2x
each restart, but will now be capped at your configured maximum. If the
maxContainerRestartPeriod you configure is less than the default initial value
of 10s, the initial delay will instead be set to the configured maximum.
See the following kubelet configuration examples:
# container restart delays will start at 10s, increasing
# 2x each time they are restarted, to a maximum of 100s
kind: KubeletConfiguration
crashLoopBackOff:
maxContainerRestartPeriod: "100s"
# delays between container restarts will always be 2s
kind: KubeletConfiguration
crashLoopBackOff:
maxContainerRestartPeriod: "2s"
If you use this feature along with the alpha feature
ReduceDefaultCrashLoopBackOffDecay (described above), your cluster defaults
for initial backoff and maximum backoff will no longer be 10s and 300s, but 1s
and 60s. Per node configuration takes precedence over the defaults set by
ReduceDefaultCrashLoopBackOffDecay, even if this would result in a node having
a longer maximum backoff than other nodes in the cluster.