Kubernetes CronJobs

Introduction to CronJobs

In Kubernetes, CronJobs are native objects that manage time-based work. They are essentially the Kubernetes equivalent of a cron entry in traditional systems. A CronJob creates Job objects on a time-based schedule; each Job in turn creates one or more Pods and tracks them until the required number of Pods complete successfully.

Defining a CronJob

Here’s a basic structure for a CronJob in YAML:

apiVersion: batch/v1  # CronJob is GA in batch/v1; batch/v1beta1 was removed in Kubernetes 1.25
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "*/1 * * * *"  # Every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-container
            image: my-image
          restartPolicy: OnFailure
  • schedule: A string that defines when the job should run, in the standard five-field cron format (minute, hour, day of month, month, day of week).
  • jobTemplate: The template for the Job created on each scheduled run. It has the same schema as a Job spec, minus apiVersion and kind.
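A few illustrative schedule values (this is a cheat sheet, not a single valid manifest; only the last line is an actual field assignment):

```yaml
# Cron format: "minute hour day-of-month month day-of-week"
#   "*/5 * * * *"  -> every 5 minutes
#   "0 2 * * *"    -> daily at 02:00
#   "0 9 * * 1-5"  -> 09:00 on weekdays (Mon-Fri)
#   "30 3 1 * *"   -> 03:30 on the first day of each month
schedule: "0 2 * * *"
```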

Other important properties

You can tune scheduling behavior with fields under the CronJob spec (startingDeadlineSeconds, concurrencyPolicy, the history limits) and job behavior with fields under jobTemplate.spec (completions, parallelism, backoffLimit, and so on). Here’s an example of how you can extend the earlier YAML configuration:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "*/1 * * * *"  # Every minute
  startingDeadlineSeconds: 100  # Max time to start a job if missed
  concurrencyPolicy: Forbid  # Options: Allow, Forbid, Replace
  successfulJobsHistoryLimit: 3  # Keep 3 successful job records
  failedJobsHistoryLimit: 1  # Keep 1 failed job record
  jobTemplate:
    spec:
      backoffLimit: 3  # Number of retries before marking the job as failed
      completions: 5  # Total number of successful completions needed
      parallelism: 2  # Number of pods running in parallel
      activeDeadlineSeconds: 200  # Max time for the job to run
      ttlSecondsAfterFinished: 3600  # Clean up finished jobs after 1 hour
      template:
        spec:
          containers:
          - name: my-container
            image: my-image
          restartPolicy: OnFailure

In this modified configuration:

  • startingDeadlineSeconds specifies the deadline in seconds for starting the job if it misses its scheduled time for any reason.
  • concurrencyPolicy determines how concurrent executions of the job are handled (Allow, Forbid, or Replace).
  • successfulJobsHistoryLimit and failedJobsHistoryLimit specify how many completed and failed jobs should be kept.
  • Inside the jobTemplate, completions is the number of Pods that must finish successfully for the job to be considered complete.
  • parallelism is the maximum number of Pods the job runs in parallel.
  • activeDeadlineSeconds sets the maximum duration the job may run before it is terminated.
  • backoffLimit is set to 3, meaning the job is retried up to 3 times before it is marked as failed.
  • ttlSecondsAfterFinished is a field of the Job spec, so it belongs under jobTemplate.spec; at 3600 seconds (1 hour), finished jobs (whether successful or failed) are automatically deleted one hour after completion.

The restartPolicy set in the Pod template of a Job (or of a CronJob’s jobTemplate) plays a crucial role in how Kubernetes handles container failures and terminations.

  • For Jobs and CronJobs, restartPolicy is particularly important. A common setting in Jobs is OnFailure, as you might want the job to retry if it fails.
  • Always is not a valid restartPolicy for Jobs and CronJobs since Jobs are intended to run to completion (either success or failure), and an always-restarting policy would contradict this behavior.
  • Never can be used if you want to ensure that no automatic restarts are attempted, which can be useful for debugging or in scenarios where a failure must be handled in a custom manner.
  • The combination of restartPolicy and backoffLimit in a job definition can define the job’s retry behavior. For example, if restartPolicy is set to OnFailure and backoffLimit is greater than 0, the job will be retried up to the specified limit if it fails.
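As a sketch of the Never-plus-backoffLimit interplay, here is a deliberately failing Job (the name and command are placeholders chosen for illustration):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: debug-job            # hypothetical name
spec:
  backoffLimit: 3            # up to 3 replacement pods are created on failure
  template:
    spec:
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "exit 1"]  # always fails, for illustration
      restartPolicy: Never   # each failure leaves a dead pod behind
```

With restartPolicy: Never, retries happen by creating new Pods rather than restarting containers in place, so each failed Pod remains available for kubectl logs and kubectl describe — handy when debugging.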

YAML differences between Jobs and CronJobs

Understanding these differences is crucial, especially for scenarios covered in the CKAD exam:

  1. Schedule: Only present in CronJob for defining the time-based schedule.
  2. Concurrency Policy and History Limits: Specific to CronJob for managing job execution and retention.
  3. JobTemplate: In CronJob, the job configuration is nested under jobTemplate, whereas in a Job, the configuration is directly under spec.
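For comparison, here is a minimal standalone Job: note the absence of a schedule and that the Pod template sits directly under spec rather than under jobTemplate:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  template:
    spec:
      containers:
      - name: my-container
        image: my-image
      restartPolicy: OnFailure
```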

Monitoring and Troubleshooting CronJobs

  1. List CronJobs: Use the command kubectl get cronjobs to list existing CronJobs.

  2. View CronJob Details: To see more details, use kubectl describe cronjob <cronjob-name>.

  3. View Logs: Since CronJobs create Jobs, which in turn create Pods, you first need to identify the specific Job and Pod. Run kubectl get jobs to find the Job, kubectl get pods to find the Pod it created (Pod names start with the Job name), and then kubectl logs <pod-name> to fetch the logs.

  4. Common Issues:

    • Misconfigured schedule
    • Image or command errors inside the container
    • Insufficient resources or quotas
    • Job running longer than expected
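Putting these commands together, a typical investigation session might look like the following sketch (my-cronjob and <pod-name> are placeholders, and a running cluster is assumed):

```
kubectl get cronjobs                         # is the CronJob present, and when did it last run?
kubectl describe cronjob my-cronjob          # schedule, last schedule time, events
kubectl get jobs                             # jobs created by the CronJob
kubectl get pods --selector=job-name=<job-name>   # pods belonging to one job
kubectl logs <pod-name>                      # container output
```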

Best Practices

  1. Idempotence: Ensure that the tasks being run are idempotent. If a CronJob fails and is retried, it shouldn’t create unintended side-effects.

  2. Concurrency: By default, CronJobs are allowed to run concurrently. Use the concurrencyPolicy field to adjust this if needed.

  3. Failure Handling: Utilize the restartPolicy to define what should happen if the job fails. Most of the time, OnFailure is a good choice.

  4. Cleanup Old Jobs: By default, the last 3 successful and the last 1 failed Jobs are retained (successfulJobsHistoryLimit defaults to 3, failedJobsHistoryLimit to 1), which can still clutter the cluster for frequent schedules. Tune .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit, or set ttlSecondsAfterFinished on the Job spec, to clean up old Jobs and their Pods sooner.

Exercises for CronJobs, Troubleshooting, and Imperative Commands

Exercise 1: Creating a CronJob Imperatively

Objective: Create a CronJob that runs every 5 minutes using the busybox image and the command echo "Hello from CronJob!".

  1. Use the imperative command to create a CronJob.
  2. Validate that the CronJob has been created.
  3. Observe the Pods created by the CronJob over a 10-minute period.

Solution:

kubectl create cronjob hello-cron --image=busybox --schedule="*/5 * * * *" -- echo "Hello from CronJob!"
kubectl get cronjobs
# Wait and check for pods periodically
kubectl get pods
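The same imperative command can also generate a manifest with --dry-run=client -o yaml; the output is roughly the following (kubectl names the container after the CronJob):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cron
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello-cron
            image: busybox
            command:
            - echo
            - Hello from CronJob!
          restartPolicy: OnFailure
```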

Exercise 2: Troubleshooting a Failing CronJob

Objective: Troubleshoot a CronJob whose Pods are created but never complete successfully.

  1. Here’s a CronJob YAML for a task that’s supposed to run every minute:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: faulty-cron
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-container
            image: busybox
            command:
            - "/bin/nonexistent"
          restartPolicy: OnFailure
  2. Apply this YAML to your cluster.
  3. Identify the issue that’s preventing the Pods from completing successfully.
  4. Fix the issue and verify.

Solution:

# Apply the CronJob
kubectl apply -f faulty-cron.yaml

# Check the jobs created by the CronJob
kubectl get jobs

# The pods are created but keep failing; inspect one of them
kubectl get pods
kubectl describe pod <pod-name>

# The pod's events and container status show the container failing to start:
# /bin/nonexistent does not exist in the busybox image, so every run
# errors out immediately.

# To fix, modify the YAML to have a valid command, e.g., ["echo", "Fixed!"]
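One possible fixed manifest, with the broken command replaced:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: faulty-cron
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: my-container
            image: busybox
            command: ["echo", "Fixed!"]   # valid command present in busybox
          restartPolicy: OnFailure
```

After re-applying, kubectl get pods should start showing Pods in the Completed state within a minute or two.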

Exercise 3: Using Imperative Commands for Troubleshooting

Objective: Delete all the Pods associated with a specific CronJob.

  1. Use the previous CronJob definition (hello-cron).
  2. Generate some Pods by waiting for 10 minutes.
  3. Use imperative commands to fetch all the Pods associated with hello-cron.
  4. Delete all these Pods using a single command.

Solution:

# List jobs to find the generated job names (hello-cron-<unique_id>)
kubectl get jobs

# Fetch all pods associated with one of those jobs
pods=$(kubectl get pods --selector=job-name=hello-cron-<unique_id> -o=jsonpath='{.items[*].metadata.name}')

# Delete these pods
kubectl delete pods $pods
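Alternatively, the intermediate variable can be skipped by deleting directly via the label selector:

```
kubectl delete pods --selector=job-name=hello-cron-<unique_id>
```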