Troubleshooting
#django
#celery
#redis

Celery Not Running in Django Production: Debugging Guide

A common production failure is that the Django web app is healthy, but Celery tasks stop running.

Problem statement

You may see emails not sending, background jobs piling up, periodic tasks never firing, or tasks queued in Redis but never consumed.

In production, this usually comes down to one of four things:

  1. the Celery worker is not running
  2. Celery Beat is not running
  3. the broker connection is broken
  4. tasks are being published, but the worker is not consuming the expected queue or is failing after receipt

The goal is to identify which layer is failing before changing configuration.

Quick answer

To debug Celery not running in Django production, check these in order:

  1. confirm whether the failure is worker, beat, broker, or task code
  2. check service status and logs
  3. verify the worker is using the correct Django settings and environment variables
  4. test broker connectivity from the worker host
  5. confirm tasks are registered and the worker is listening to the right queue
  6. if periodic tasks fail, make sure Beat runs as a separate production service
  7. after any fix, enqueue a known test task and verify it executes exactly once

Step-by-step solution

1. Confirm what is actually failing

Before restarting services, identify the failure mode.

  • Worker failure: queued tasks never run
  • Beat failure: scheduled tasks never get queued
  • Broker failure: workers cannot connect to Redis or RabbitMQ
  • Task code failure: worker receives tasks, but they crash

Also record your process model:

  • systemd
  • supervisord
  • Docker Compose
  • Kubernetes

That determines where logs and restart behavior live.

Verification

Check whether tasks are being queued at all. If you use Redis as a broker, queue growth can indicate publish succeeds but consume fails. If nothing is queued, the issue may be in Django task submission or Beat.
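With Redis as the broker, Celery's default queue is a Redis list (named "celery" unless you changed the queue name), so its depth can be read with LLEN. A minimal stdlib sketch, assuming an unauthenticated, non-TLS Redis reachable from the host; the host, port, and queue name are illustrative:

```python
import socket

def queue_depth(host, port, queue="celery", timeout=3.0):
    """Return the number of pending messages on a Celery queue by
    sending a raw RESP LLEN command to Redis (no auth/TLS assumed)."""
    cmd = f"*2\r\n$4\r\nLLEN\r\n${len(queue)}\r\n{queue}\r\n".encode()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        reply = sock.recv(64)  # integer reply, e.g. b":17\r\n"
    if not reply.startswith(b":"):
        raise RuntimeError(f"unexpected Redis reply: {reply!r}")
    return int(reply.split(b"\r\n", 1)[0][1:])
```

A depth that keeps growing suggests publishing works but consumption does not; a depth stuck at zero suggests nothing is being enqueued in the first place.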

2. Check whether Celery processes are running

systemd

sudo systemctl status celery
sudo systemctl status celery-beat
sudo systemctl is-enabled celery
sudo systemctl is-enabled celery-beat

Supervisor

sudo supervisorctl status
sudo supervisorctl restart celery
sudo supervisorctl restart celery-beat

Docker Compose

docker compose ps
docker compose logs worker --tail=100
docker compose logs beat --tail=100

Direct process check

ps aux | grep '[c]elery'

If the service is running, confirm it uses the right user, working directory, and virtualenv.

Example systemd unit

[Unit]
Description=Celery Worker
After=network.target

[Service]
Type=simple
User=django
Group=django
WorkingDirectory=/srv/myapp/current
EnvironmentFile=/etc/myapp/myapp.env
ExecStart=/srv/myapp/venv/bin/celery -A myapp worker --loglevel=INFO
Restart=always

[Install]
WantedBy=multi-user.target

Verification

If the service exits immediately, do not keep restarting it blindly. Go to logs first.

Rollback note

If a recent unit file change broke startup, restore the last known good unit or remove the bad override, then validate what systemd is actually loading:

sudo systemctl cat celery
sudo systemctl daemon-reload
sudo systemctl restart celery
sudo systemctl status celery --no-pager

3. Review logs first

For worker startup failures, logs usually show the cause quickly.

systemd journal

sudo journalctl -u celery -n 100 --no-pager
sudo journalctl -u celery-beat -n 100 --no-pager

Common failures to look for

  • ModuleNotFoundError
  • bad DJANGO_SETTINGS_MODULE
  • permission denied on project directory or socket/file paths
  • Redis authentication failure
  • RabbitMQ access refused
  • queue declaration errors
  • task import errors during startup

Increase log verbosity temporarily

sudo systemctl edit celery

Override ExecStart with:

[Service]
ExecStart=
ExecStart=/srv/myapp/venv/bin/celery -A myapp worker --loglevel=DEBUG

Then:

sudo systemctl daemon-reload
sudo systemctl restart celery

Use this only long enough to diagnose. Return to INFO after the issue is clear.

Verification

You want to see worker startup complete and queues listed without repeated connection retries.

4. Verify Django settings and environment variables

A very common reason a Celery worker does not run in production is that the web process and worker process are not using the same environment.

Check the worker environment for:

  • DJANGO_SETTINGS_MODULE
  • CELERY_BROKER_URL
  • CELERY_RESULT_BACKEND
  • SECRET_KEY only if your Django settings import path requires it; Celery itself does not normally need it directly
  • database credentials if tasks use the database
  • timezone settings if Beat is involved

Example environment file

DJANGO_SETTINGS_MODULE=myapp.settings.production
CELERY_BROKER_URL=redis://:strongpassword@redis.internal:6379/0
CELERY_RESULT_BACKEND=redis://:strongpassword@redis.internal:6379/1

For Docker:

docker compose exec worker env | sort
docker compose exec beat env | sort

Do not print secrets into shared logs or tickets.
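To fail fast when the worker environment is incomplete, a small check can be run under the same environment as the worker. This is a sketch; the variable names match the example file above, and it reports only names, never values:

```python
import os

REQUIRED = ("DJANGO_SETTINGS_MODULE", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")

def missing_env(required=REQUIRED):
    """Return the names of required variables that are unset or empty.
    Only names are reported, so secrets stay out of logs and tickets."""
    return [name for name in required if not os.environ.get(name)]

# Example: print(missing_env()) under the worker's environment; an empty
# list means every required variable is present.
```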

Verification

From the same virtualenv and same environment as the worker, run:

cd /srv/myapp/current
source /srv/myapp/venv/bin/activate
celery -A myapp report

This helps confirm the app loads and the broker configuration is what you expect.

5. Confirm broker connectivity and health

If broker connectivity is the cause, the worker may be healthy but unable to connect.

Redis checks

nc -zv redis.internal 6379

# If Redis does not require auth/TLS
redis-cli -h redis.internal -p 6379 ping

# If Redis requires auth
redis-cli -u 'redis://:strongpassword@redis.internal:6379/0' ping

If Redis requires authentication or TLS, test with the same connection settings the worker uses; a plain redis-cli ... ping can be misleading.

RabbitMQ or generic port reachability

nc -zv broker.internal 5672

Check:

  • hostname correctness
  • port
  • credentials
  • TLS requirements
  • firewall rules
  • DNS resolution on the worker host
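The same reachability test as nc -zv can be done from Python with only the standard library, which helps inside containers that lack nc. A sketch; host and port are illustrative:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within
    timeout. This proves reachability only; authentication and TLS
    are separate concerns."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: port_open("redis.internal", 6379) or port_open("broker.internal", 5672)
```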

If the broker restarted recently, investigate whether in-memory queues were lost or whether persistence is configured correctly.

Verification

Worker logs should stop showing reconnect loops and start showing ready or connected state.

6. Make sure the worker is consuming the right queue

A major reason tasks are queued but not running is queue mismatch.

If you route tasks to a custom queue but start the worker with a restrictive -Q queue list, those tasks will sit unconsumed indefinitely.

Check worker queue settings

# Run from the same virtualenv/environment as the worker
celery -A myapp inspect ping
celery -A myapp inspect active_queues
celery -A myapp inspect registered

If these commands return no replies, confirm the worker is running and that remote control is not disabled before concluding the queues or task registry are wrong.

If the worker is started with -Q, verify it includes the queue being used:

ExecStart=/srv/myapp/venv/bin/celery -A myapp worker -Q default,emails --loglevel=INFO

If no queue is explicitly set in task routing, confirm your default queue configuration is consistent.

Example Django settings

CELERY_TASK_DEFAULT_QUEUE = "default"
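If some tasks belong on a dedicated queue, the routing must be declared where the publisher sees it, and the worker's -Q list must include that queue. A hedged sketch of explicit routing in Django settings; the task path and queue name are illustrative:

```python
# settings.py: route one group of tasks to a dedicated queue; unrouted
# tasks fall back to CELERY_TASK_DEFAULT_QUEUE.
CELERY_TASK_ROUTES = {
    "myapp.emails.tasks.*": {"queue": "emails"},
}
```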

Verification

The task should appear on a queue the worker actually lists in active_queues.

7. Confirm task discovery and registration

If workers start but do not know about your tasks, they cannot execute them.

Typical Celery app setup

# myapp/celery.py
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings.production")

app = Celery("myapp")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

And in myapp/__init__.py:

from .celery import app as celery_app

__all__ = ("celery_app",)

Check that task modules exist under installed Django apps, usually as tasks.py.

Verification

celery -A myapp inspect registered

If expected tasks are missing, inspect startup logs for import-time exceptions.

8. Debug failed tasks that appear to be running

Sometimes tasks are received and then fail immediately.

Look for tracebacks involving:

  • database connectivity
  • missing migrations
  • SMTP credentials
  • cache backends
  • third-party APIs
  • serialization errors

If the task arguments are not serializable by your configured serializer, the task may never run correctly.

Also review retry behavior. A task that retries forever can look like a stuck system rather than an obvious failure.
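Celery's default serializer is JSON, so task arguments must be JSON-serializable; objects like datetime instances or model objects are not. A quick way to check arguments with only the standard library:

```python
import json
from datetime import datetime

def is_json_serializable(args):
    """Check whether task arguments would survive Celery's default
    JSON serializer."""
    try:
        json.dumps(args)
        return True
    except TypeError:
        return False

# A datetime argument fails; pass primitives (IDs, ISO strings) instead
# and rehydrate inside the task.
print(is_json_serializable({"user_id": 42, "signup": datetime(2026, 1, 1)}))  # False
print(is_json_serializable({"user_id": 42, "signup": "2026-01-01T00:00:00"}))  # True
```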

Verification

Find a log sequence showing:

  1. task received
  2. task started
  3. task succeeded or failed with traceback

9. If Celery Beat is not running

For periodic task failures, remember Beat should usually run as its own process.

sudo systemctl status celery-beat
docker compose logs beat --tail=100

Check:

  • schedule backend configuration
  • timezone consistency
  • that only one Beat instance runs in multi-server production, unless you intentionally use a clustered scheduler

If multiple Beat instances run against the same schedule, periodic tasks may duplicate.

Verification

You should see Beat emitting scheduled tasks at the expected times, and workers receiving them.

10. Deployment-specific fixes

systemd

Check:

  • ExecStart
  • WorkingDirectory
  • User
  • EnvironmentFile
  • Restart=always

Repeated crash loops can hide the real startup error. Inspect logs before relying on automatic restarts.

Docker Compose

Make sure worker and beat both receive the same app environment.

services:
  worker:
    image: myapp:latest
    command: celery -A myapp worker --loglevel=INFO
    restart: always
    env_file:
      - .env

  beat:
    image: myapp:latest
    command: celery -A myapp beat --loglevel=INFO
    restart: always
    env_file:
      - .env

Release workflow

A common issue is deploying new Django code without restarting worker and beat. The web app serves new code, but Celery still runs old code.

After deploy, restart background services explicitly.

If a release includes schema changes, apply migrations before workers execute tasks that depend on the new schema. Restarting workers onto new code before migrations complete can break task execution.

Verification

After deployment, confirm service start time matches the new release time.

11. Verify the fix safely

Enqueue a known task and check logs.

Example from Django shell

python manage.py shell

# then, inside the Django shell:
from myapp.tasks import healthcheck_task
result = healthcheck_task.delay()
print(result.id)

Then check worker logs and expected side effect.

Also verify that queue depth decreases and the task runs once, not repeatedly.

Rollback and recovery

If the new release broke Celery:

  1. restore the last working app release or service file
  2. restart worker and beat
  3. inspect queued, retried, or duplicated tasks before replaying anything

Be careful with retries and idempotency. Re-running payment, email, webhook, or external API tasks can create duplicate side effects. If tasks may have been partially processed, treat replay as a recovery operation, not a routine restart step.
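Replay is much safer when side-effecting tasks are guarded by an idempotency key. A minimal in-memory sketch of the pattern; in production the key set would live in a durable store such as a database row or Redis SETNX:

```python
processed_keys = set()  # stand-in for a durable store

def run_once(key, side_effect):
    """Execute side_effect at most once per idempotency key, so a
    replayed or duplicated task does not repeat its effect."""
    if key in processed_keys:
        return "skipped"
    processed_keys.add(key)
    side_effect()
    return "executed"

# Example: replaying the same payment task is a no-op the second time.
results = [run_once("payment:1234", lambda: None) for _ in range(2)]
print(results)  # ['executed', 'skipped']
```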

Explanation

Celery production failures are usually not random. They happen at boundaries: process manager, environment loading, broker connectivity, task routing, or code import.

This workflow works because it narrows the problem in the right order: first determine whether tasks are queued, then whether workers are alive, then whether workers can connect, then whether they know the task and queue. That is faster and safer than changing multiple settings at once.

If your deployment repeatedly requires manual checks after every release, this is a good point to convert the process into reusable scripts or templates. The first parts worth automating are service restarts, broker connectivity checks, and a post-deploy smoke test that enqueues a known task and verifies completion.
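A post-deploy smoke test needs to wait for the task's side effect with a deadline rather than sleeping a fixed time. A generic stdlib helper for that; the predicate is illustrative and would in practice check logs, a database row, or the result backend:

```python
import time

def wait_for(predicate, timeout=30.0, interval=1.0):
    """Poll predicate() until it returns a truthy value or the
    timeout expires. Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Example: wait_for(lambda: healthcheck_marker_exists(), timeout=60)
# where healthcheck_marker_exists is a hypothetical check you define.
```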

Edge cases / notes

  • Long-running tasks may be killed by memory pressure or process limits rather than failing cleanly.
  • Tasks can fail after deployment if database migrations were not applied before workers started using new code.
  • Separate staging and production brokers. Sharing one broker across environments causes confusing cross-environment task execution.
  • DNS issues or clock drift can affect broker reachability and periodic scheduling.
  • If you use Docker, verify the container command was not overridden by a shell script that exits early.
  • For periodic tasks, avoid running multiple Beat instances unless your scheduler backend is designed for it.
  • If inspect commands do not return results, do not assume the worker is down until you confirm remote control behavior and the worker environment.

For background on architecture, see how Celery works in Django production.

If you need a clean production setup, follow deploy Django with Celery and Redis and run Celery with systemd for Django production.

For operational recovery, use how to restart Django background workers safely in production.

FAQ

Why is Celery not running in Django production even though the web app works?

Because the web process and worker process are separate. Django can serve requests normally while the Celery worker is stopped, misconfigured, or unable to connect to the broker.

How do I tell whether the problem is Celery worker, Beat, or Redis/RabbitMQ?

Check whether tasks are being queued. If periodic tasks are never queued, Beat is likely the issue. If tasks queue but never run, inspect the worker. If the worker logs show connection failures, inspect the broker.

Why are Celery tasks queued but never executed in production?

Usually the worker is not running, is listening to the wrong queue, cannot deserialize the task, or does not have the task registered. Queue routing mismatches are especially common.

Should Celery Beat run in the same process as the worker in production?

No. In production, run Beat as its own service, separate from the worker. That makes restarts, monitoring, and failure isolation much clearer.

What should I restart after deploying Django code that changes Celery tasks?

Restart the Celery worker, and restart Beat if periodic task definitions or scheduling code changed. Otherwise, old worker processes may keep running stale code.

2026 · django-deployment.com - Django Deployment knowledge base