Gunicorn Worker Timeout in Django: Root Causes and Fixes
A Gunicorn worker timeout in Django production usually means a worker process stopped responding before Gunicorn’s timeout limit expired.
Problem statement
In logs, this often appears as WORKER TIMEOUT followed by a worker restart.
That is not the same as:
- a reverse proxy timeout from Nginx or Caddy
- a slow PostgreSQL query or lock by itself
- a failed deploy where Gunicorn never started cleanly
- an OOM kill from the host kernel
The real deployment problem is that requests are hanging or taking too long somewhere in the request path. Raising the timeout blindly can hide bad application behavior and increase recovery time under load.
Quick answer
If you see a Gunicorn worker timeout in Django, do this first:
- Confirm the timeout is coming from Gunicorn, not only from Nginx.
- Find whether the problem affects one route or the whole app.
- Check for slow views, expensive queries, external API calls, wrong worker settings, memory pressure, or static/media requests hitting Django.
- Fix the bottleneck first.
- Only increase Gunicorn timeout for known valid long requests, and keep proxy timeouts aligned.
- Verify with logs, controlled test requests, and resource checks.
- Keep a rollback path for config and deploy changes.
Step-by-step solution
1. Confirm that the timeout is coming from Gunicorn
Check recent Gunicorn logs:
sudo journalctl -u gunicorn -n 100 --no-pager
sudo journalctl -u gunicorn -f
Look for messages like:
[CRITICAL] WORKER TIMEOUT (pid:12345)
[INFO] Booting worker with pid:12346
Then compare with Nginx logs:
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log
If Nginx shows upstream failures around the same time, correlate the timestamps. A 504 from Nginx does not automatically mean Gunicorn timed out first.
Also test whether the issue is route-specific:
curl -I https://example.com/health/
curl -w "%{time_total}\n" -o /dev/null -s https://example.com/problem-endpoint/
If only one endpoint is slow, focus on application logic for that route. If all routes fail, check worker count, startup failures, CPU, memory, or database availability.
Verification check: confirm at least one of these is true before changing config:
- Gunicorn logs show WORKER TIMEOUT
- one or more requests stall until Gunicorn restarts a worker
- proxy logs clearly match Gunicorn failures by time and route
2. Triage common root causes
Slow Django views or blocking code
Look for views doing heavy work inline:
- report generation
- large queryset iteration
- file processing
- synchronous HTTP calls
- expensive template rendering
If DEBUG=False, you can still add targeted timing logs around known problem views:
import time
import logging
logger = logging.getLogger(__name__)
def report_view(request):
    start = time.monotonic()
    try:
        # expensive logic
        ...
    finally:
        logger.info("report_view duration=%.2fs", time.monotonic() - start)
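If several views need the same instrumentation, the pattern can be factored into a decorator. This is a sketch: the threshold value and the example view are illustrative, not part of any existing codebase.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def log_duration(threshold=1.0):
    """Log view duration; warn when it exceeds `threshold` seconds."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return view(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                level = logging.WARNING if elapsed > threshold else logging.INFO
                logger.log(level, "%s duration=%.2fs", view.__name__, elapsed)
        return wrapper
    return decorator

@log_duration(threshold=0.5)  # hypothetical threshold for this view
def report_view(request):
    ...  # expensive logic
    return "ok"
```

The decorator logs even when the view raises, because the timing happens in a finally block, which is exactly the case you care about during a timeout investigation.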
Long database queries, locks, or missing indexes
Check PostgreSQL for slow or blocked queries:
SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start ASC;
If a route triggers many ORM queries, reduce query count first with select_related(), prefetch_related(), or better filtering. If one query is slow, inspect its execution plan and indexing.
External API calls blocking workers
If a Django view calls another service synchronously, set request timeouts. Do not let calls hang indefinitely.
import requests
response = requests.get(
    "https://api.example.com/data",
    timeout=(3.05, 10),
)
Use connect and read timeouts explicitly. Add retries carefully, and only for idempotent operations.
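One way to bound both hangs and flaky upstreams is a shared session with explicit timeouts and limited retries for idempotent methods only. This is a sketch; parameter names follow urllib3 1.26+, and the URL is a placeholder.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """Session with bounded, exponential-backoff retries for GET/HEAD only."""
    retry = Retry(
        total=3,                           # give up after 3 retries
        backoff_factor=0.5,                # exponential backoff between attempts
        status_forcelist=(502, 503, 504),  # retry only transient upstream errors
        allowed_methods=frozenset({"GET", "HEAD"}),  # idempotent methods only
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# response = session.get("https://api.example.com/data", timeout=(3.05, 10))
```

Note that Retry bounds how many attempts are made, while timeout still bounds each individual attempt; you need both, or a retrying client can multiply a hang.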
Insufficient workers or wrong worker class
Inspect the running process:
ps aux | grep gunicorn
systemctl status gunicorn
ss -ltnp | grep ':8000'
Review whether you are using too few workers for your traffic pattern, or a sync worker model while doing blocking I/O heavily.
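Gunicorn's own documentation suggests roughly (2 × CPU cores) + 1 workers as a starting point for sync-style workers. A config file can compute that instead of hard-coding it; this is a sizing sketch to tune under real load, not a drop-in.

```python
# gunicorn.conf.py -- sizing sketch; validate under representative traffic
import multiprocessing

bind = "127.0.0.1:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # docs' starting point, not a law
worker_class = "gthread"   # threads help when requests mostly wait on I/O
threads = 2
timeout = 30
graceful_timeout = 30
```

Treat the formula as a baseline only: memory per worker, database connection limits, and whether requests are CPU-bound or I/O-bound all push the right number up or down.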
CPU or memory pressure
Check host health:
top
free -m
vmstat 1
dmesg -T | grep -i -E 'killed process|out of memory|oom'
If the host is swapping heavily or the kernel is killing workers, the problem may not be a Gunicorn timeout at all.
Large file handling or bad static/media routing
If Django is serving static or media files in production, fix that first. Static files should usually be served by Nginx or object storage, not by Django request workers.
Startup tasks, migrations, or heavy imports
If timeouts happen right after deploy or restart, check whether worker startup is slow because of:
- heavy import side effects
- app startup code doing network calls
- migrations run in the wrong phase
- cache warmups inside web startup
Background work inside web requests
If the request starts long-running work, move it to Celery, RQ, or another job system, then return a task status or completion callback path.
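The request/response split can be illustrated without committing to a specific queue. In this schematic sketch, enqueue_report and the in-memory JOBS dict are hypothetical stand-ins for your Celery or RQ integration and its task backend.

```python
import uuid

JOBS = {}  # stand-in for the real task backend (broker/result store)

def enqueue_report(params):
    """Hypothetical stand-in for delay()/enqueue() in Celery or RQ."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "queued", "params": params}
    return job_id

def start_report(params):
    """View-level logic: enqueue, then return immediately with a poll path."""
    job_id = enqueue_report(params)
    return {
        "status_code": 202,  # Accepted: work continues in the background
        "job_id": job_id,
        "poll": f"/reports/{job_id}/status/",
    }
```

The key property is that the web worker returns in milliseconds regardless of how long the report takes, so no request can hold a worker past the Gunicorn timeout.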
3. Inspect current Gunicorn runtime configuration
Review how Gunicorn is started. Common systemd example:
[Service]
User=django
Group=www-data
EnvironmentFile=/etc/myproject/gunicorn.env
WorkingDirectory=/srv/myproject/current
ExecStart=/srv/myproject/venv/bin/gunicorn \
    --bind 127.0.0.1:8000 \
    --workers 3 \
    --threads 2 \
    --timeout 30 \
    --graceful-timeout 30 \
    myproject.wsgi:application
Or a Gunicorn config file:
bind = "127.0.0.1:8000"
workers = 3
threads = 2
worker_class = "gthread"
timeout = 30
graceful_timeout = 30
max_requests = 1000
max_requests_jitter = 100
Check:
- timeout
- graceful_timeout
- workers
- threads
- worker_class
- bind address
- environment file path
- service user permissions
If you use an environment file, keep secrets readable only by the service user or root.
Verification check: document the current known-good config before editing it.
4. Fix application-level causes before raising timeout
Start with the slow endpoint, not the Gunicorn number.
- Reduce ORM query count.
- Add indexes for real slow queries.
- Remove blocking network calls from request-response flow.
- Move long-running work to a queue.
- Serve static and media outside Django.
A production Nginx example should include complete proxy headers and TLS directives:
upstream django_app {
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location /static/ {
        alias /srv/myproject/shared/static/;
    }

    location /media/ {
        alias /srv/myproject/shared/media/;
    }

    location / {
        proxy_pass http://django_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
        proxy_redirect off;
    }
}
If Django is behind Nginx and you rely on forwarded HTTPS headers, make sure Django is configured to trust the proxy correctly:
# settings.py
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
If the route is legitimately long-running, redesign it before simply waiting longer.
5. Apply safe Gunicorn changes when needed
If a request is valid but sometimes exceeds 30 seconds, a measured timeout increase may be reasonable.
Example:
timeout = 60
graceful_timeout = 30
workers = 3
threads = 2
worker_class = "gthread"
Use caution:
- sync workers are simple and predictable, but one blocked request ties up one worker.
- gthread can help if requests spend time waiting on I/O.
- async worker classes require compatible app behavior and should not be adopted casually during an incident.
Validate the exact config file you changed and keep a copy of the previous known-good version before restarting services.
Reload carefully:
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo systemctl status gunicorn
If your unit supports reload safely, confirm it has ExecReload first:
systemctl cat gunicorn
sudo systemctl reload gunicorn
Rollback: if restart makes behavior worse, restore the previous config and restart Gunicorn again.
6. Align reverse proxy timeout settings
Review Nginx timeouts so they are intentional, not arbitrary:
location / {
    proxy_pass http://django_app;
    proxy_connect_timeout 5s;
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}
Do not raise every timeout value together just to stop visible errors. That can turn a fast failure into a long outage per request.
A useful rule is:
- Gunicorn timeout should reflect the longest acceptable in-app request time.
- Nginx proxy_read_timeout should be compatible with that value.
- long-running work should still be moved out of the web request path when possible.
If you change Nginx, test and reload it explicitly:
sudo nginx -t
sudo systemctl reload nginx
7. Verify the fix
Re-test the affected endpoint:
curl -w "%{time_total}\n" -o /dev/null -s https://example.com/problem-endpoint/
curl -I https://example.com/health/
Then watch logs and host health:
sudo journalctl -u gunicorn -f
sudo tail -f /var/log/nginx/error.log
top
free -m
Validate:
- timeout errors dropped
- latency is acceptable
- workers are not restarting repeatedly
- CPU and memory stayed within limits
- database connections and query times are healthy
If possible, test in staging first with controlled load.
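The endpoint check can also be scripted so it is repeatable across incidents. This is a standard-library sketch; the URL, threshold, and timeout are placeholders for your endpoint and latency target.

```python
import time
import urllib.request

def timed(fn, *args, **kwargs):
    """Run `fn` and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

def check_endpoint(url, threshold=5.0, timeout=30):
    """Fetch `url` once; return (status, elapsed, within_threshold)."""
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    status, elapsed = timed(fetch)
    return status, elapsed, elapsed <= threshold

# status, elapsed, ok = check_endpoint("https://example.com/problem-endpoint/")
```

Running the same probe before and after a change gives you a like-for-like comparison instead of eyeballing curl output.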
8. Recovery and rollback
If the problem started after a deploy, roll back the release before broad timeout changes.
Safe recovery options:
- Revert the Gunicorn config file, systemd override, or service unit you changed.
- Revert the Nginx config too if proxy timeouts were changed during the incident.
- Test changed configs before reloading services.
- Restart or reload only the affected services.
- Roll back the app release if the issue began after code changes.
- Temporarily disable the heavy endpoint or traffic source if needed.
Example rollback flow:
sudo cp /etc/systemd/system/gunicorn.service.bak /etc/systemd/system/gunicorn.service
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo nginx -t
sudo systemctl reload nginx
That path is only an example. In many deployments, the rollback target may instead be:
- a Gunicorn config file
- a systemd drop-in under /etc/systemd/system/gunicorn.service.d/
- an application release symlink
- an Nginx site file under /etc/nginx/sites-available/
Confirm the previous stable behavior before making more changes.
When to script this
If your team handles repeated timeout incidents, convert the manual checks into a small runbook script: collect Gunicorn and Nginx logs, capture CPU and memory state, verify endpoint latency, and compare current timeout settings. A standard Gunicorn systemd unit and Nginx template also reduce configuration drift between servers.
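A minimal version of that runbook script might look like this. It is a sketch: the unit names and log paths are illustrative and must match your deployment before you rely on it.

```python
#!/usr/bin/env python3
"""Incident snapshot sketch: run the runbook checks and collect their output.
Commands, unit names, and log paths are illustrative placeholders."""
import subprocess

CHECKS = [
    ["journalctl", "-u", "gunicorn", "-n", "50", "--no-pager"],
    ["tail", "-n", "50", "/var/log/nginx/error.log"],
    ["free", "-m"],
    ["ps", "aux"],
]

def run_checks(checks):
    """Run each command, capturing its output or the failure reason."""
    results = {}
    for cmd in checks:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results[key] = proc.stdout or proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[key] = f"failed: {exc}"
    return results

if __name__ == "__main__":
    for name, output in run_checks(CHECKS).items():
        print(f"=== {name} ===\n{output}")
```

Capturing failures instead of crashing matters here: during an incident you want whatever evidence is available, even if one command is missing on a given host.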
Explanation
Gunicorn kills workers that stay silent longer than the configured timeout. This protects the service from permanently stuck processes, but it also exposes bad request design quickly.
Raising timeout can be correct for a known valid workload, but it often hides deeper issues:
- blocking database access
- unbounded external API waits
- too few workers
- heavy file handling in Django
- background work done inside a request
Concurrency choices matter too. With sync workers, a blocked request consumes the worker until completion or timeout. Threads can help when requests wait on I/O, but they do not fix expensive CPU-bound code. If your app regularly needs long-running processing, the better design is usually a queue worker plus status polling or async completion.
Timeout tuning should also match the reverse proxy and the rest of the stack. If Nginx waits 60 seconds but Gunicorn kills workers at 30, clients may see inconsistent failures. If both are set very high, users may wait too long while the server remains unhealthy.
Edge cases / notes
Timeout only during deploys or immediately after restart
Check for heavy imports, startup hooks, or app initialization making workers slow to become ready. Also verify migrations are not run inside web startup.
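A common fix is to defer expensive setup until first use instead of module import time. In this sketch, heavy_sdk is a hypothetical stand-in for whatever is slow to import or connect during startup.

```python
import functools

@functools.lru_cache(maxsize=1)
def get_client():
    """Build the expensive client on the first request, not at module import."""
    # from heavy_sdk import Client          # deferred, hypothetical import
    # return Client(api_key="...")          # hypothetical constructor
    return object()  # stand-in so the sketch runs
```

Because the result is cached, only the first request pays the setup cost, and workers become ready to accept traffic as soon as they boot.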
Timeout only on admin or report endpoints
These often contain expensive queries, exports, or aggregation logic. Treat them as candidates for background jobs.
Timeout only under traffic spikes
Look at worker count, memory limits, database pool exhaustion, and host CPU saturation. A healthy route can still time out under undersized capacity.
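A quick capacity sanity check helps here: with sync or gthread workers, in-flight requests are capped at workers × threads, so rough throughput is that cap divided by average request time. This is a back-of-envelope estimate with illustrative numbers, not a benchmark.

```python
# Example figures; substitute your own config values and measured latency.
workers, threads = 3, 2
avg_request_seconds = 0.25

max_concurrent = workers * threads                  # 6 request slots
approx_rps = max_concurrent / avg_request_seconds   # ~24 requests/second
```

If measured peak traffic exceeds that estimate, requests queue behind busy workers and healthy routes start timing out even though nothing in the code changed.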
Timeout in Docker or Kubernetes
Check container memory limits, probe settings, and whether the platform is restarting the container before Gunicorn logs show a clear timeout. Do not assume Gunicorn is the first failure point.
Host-level OOM kills
If dmesg shows OOM events, fix memory pressure first. Gunicorn may appear unstable when the kernel is killing workers underneath it.
Internal links
For background, see What Gunicorn Workers Do in Django Production.
If you need a baseline app server setup, read Deploy Django with Gunicorn and Nginx.
For long-running tasks, use How to Configure Celery for Long-Running Django Tasks.
For incident response workflow, see How to Read Django, Gunicorn, and Nginx Logs During Production Incidents.
FAQ
What does WORKER TIMEOUT mean in Gunicorn for Django?
It means a Gunicorn worker stopped responding within the configured timeout window, so Gunicorn killed and replaced it.
Should I just increase the Gunicorn timeout value?
Usually no. First confirm whether the request is slow because of app code, database queries, external APIs, worker starvation, or host pressure. Increase timeout only for known valid long-running requests.
Why does Gunicorn timeout happen only on one Django endpoint?
That usually points to route-specific logic such as expensive queries, file generation, blocking API calls, or report-style processing in the request path.
Can Nginx cause what looks like a Gunicorn worker timeout?
Yes. Nginx can return upstream timeout errors even when Gunicorn itself is not logging WORKER TIMEOUT. Always compare both log sources before changing settings.
When should long-running Django work move to a background job system?
When the task is not required to finish before returning the HTTP response, or when it regularly approaches app server timeout limits. Report generation, imports, exports, and third-party API workflows are common examples.