Gunicorn Worker Timeout in Django: Root Causes and Fixes
A Gunicorn worker timeout in Django production usually means a worker process stopped responding before Gunicorn’s timeout limit expired.
Problem statement
In logs, this often appears as WORKER TIMEOUT followed by a worker restart.
That is not the same as:
- a reverse proxy timeout from Nginx or Caddy
- a slow PostgreSQL query or lock by itself
- a failed deploy where Gunicorn never started cleanly
- an OOM kill from the host kernel
The real deployment problem is that requests are hanging or taking too long somewhere in the request path. Raising the timeout blindly can hide bad application behavior and increase recovery time under load.
Quick answer
If you see a Gunicorn worker timeout in Django, do this first:
- Confirm the timeout is coming from Gunicorn, not only from Nginx.
- Find whether the problem affects one route or the whole app.
- Check for slow views, expensive queries, external API calls, wrong worker settings, memory pressure, or static/media requests hitting Django.
- Fix the bottleneck first.
- Only increase Gunicorn timeout for known valid long requests, and keep proxy timeouts aligned.
- Verify with logs, controlled test requests, and resource checks.
- Keep a rollback path for config and deploy changes.
Step-by-step solution
1. Confirm that the timeout is coming from Gunicorn
Check recent Gunicorn logs:
sudo journalctl -u gunicorn -n 100 --no-pager
sudo journalctl -u gunicorn -f
Look for messages like:
[CRITICAL] WORKER TIMEOUT (pid:12345)
[INFO] Booting worker with pid:12346
Then compare with Nginx logs:
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log
If Nginx shows upstream failures around the same time, correlate the timestamps. A 504 from Nginx does not automatically mean Gunicorn timed out first.
Also test whether the issue is route-specific:
curl -I https://example.com/health/
curl -w "%{time_total}\n" -o /dev/null -s https://example.com/problem-endpoint/
If only one endpoint is slow, focus on application logic for that route. If all routes fail, check worker count, startup failures, CPU, memory, or database availability.
Verification check: confirm at least one of these is true before changing config:
- Gunicorn logs show WORKER TIMEOUT
- one or more requests stall until Gunicorn restarts a worker
- proxy logs clearly match Gunicorn failures by time and route
2. Triage common root causes
Slow Django views or blocking code
Look for views doing heavy work inline:
- report generation
- large queryset iteration
- file processing
- synchronous HTTP calls
- expensive template rendering
If DEBUG=False, you can still add targeted timing logs around known problem views:
import time
import logging
logger = logging.getLogger(__name__)
def report_view(request):
    start = time.monotonic()
    try:
        # expensive logic
        ...
    finally:
        logger.info("report_view duration=%.2fs", time.monotonic() - start)
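If several views need the same instrumentation, the pattern can be factored into a decorator. This is a sketch: the threshold value and the example view are illustrative, not part of any existing codebase.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def log_duration(threshold=1.0):
    """Log view duration; warn when it exceeds `threshold` seconds."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return view(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                level = logging.WARNING if elapsed > threshold else logging.INFO
                logger.log(level, "%s duration=%.2fs", view.__name__, elapsed)
        return wrapper
    return decorator

@log_duration(threshold=0.5)  # hypothetical threshold for this view
def report_view(request):
    ...  # expensive logic
    return "ok"
```

The decorator logs even when the view raises, because the timing happens in a finally block, which is exactly the case you care about during a timeout investigation.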
Long database queries, locks, or missing indexes
Check PostgreSQL for slow or blocked queries:
SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start ASC;
If a route triggers many ORM queries, reduce query count first with select_related(), prefetch_related(), or better filtering. If one query is slow, inspect its execution plan and indexing.
External API calls blocking workers
If a Django view calls another service synchronously, set request timeouts. Do not let calls hang indefinitely.
import requests
response = requests.get(
    "https://api.example.com/data",
    timeout=(3.05, 10),
)
Use connect and read timeouts explicitly. Add retries carefully, and only for idempotent operations.
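One way to bound both hangs and flaky upstreams is a shared session with explicit timeouts and limited retries for idempotent methods only. This is a sketch; parameter names follow urllib3 1.26+, and the URL is a placeholder.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """Session with bounded, exponential-backoff retries for GET/HEAD only."""
    retry = Retry(
        total=3,                           # give up after 3 retries
        backoff_factor=0.5,                # exponential backoff between attempts
        status_forcelist=(502, 503, 504),  # retry only transient upstream errors
        allowed_methods=frozenset({"GET", "HEAD"}),  # idempotent methods only
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# response = session.get("https://api.example.com/data", timeout=(3.05, 10))
```

Note that Retry bounds how many attempts are made, while timeout still bounds each individual attempt; you need both, or a retrying client can multiply a hang.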
Insufficient workers or wrong worker class
Inspect the running process:
ps aux | grep gunicorn
systemctl status gunicorn
ss -ltnp | grep ':8000'
Review whether you are using too few workers for your traffic pattern, or a sync worker model while doing blocking I/O heavily.
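Gunicorn's own documentation suggests roughly (2 × CPU cores) + 1 workers as a starting point for sync-style workers. A config file can compute that instead of hard-coding it; this is a sizing sketch to tune under real load, not a drop-in.

```python
# gunicorn.conf.py -- sizing sketch; validate under representative traffic
import multiprocessing

bind = "127.0.0.1:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # docs' starting point, not a law
worker_class = "gthread"   # threads help when requests mostly wait on I/O
threads = 2
timeout = 30
graceful_timeout = 30
```

Treat the formula as a baseline only: memory per worker, database connection limits, and whether requests are CPU-bound or I/O-bound all push the right number up or down.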
CPU or memory pressure
Check host health:
top
free -m
vmstat 1
dmesg -T | grep -i -E 'killed process|out of memory|oom'
If the host is swapping heavily or the kernel is killing workers, the problem may not be a Gunicorn timeout at all.
Large file handling or bad static/media routing
If Django is serving static or media files in production, fix that first. Static files should usually be served by Nginx or object storage, not by Django request workers.
Startup tasks, migrations, or heavy imports
If timeouts happen right after deploy or restart, check whether worker startup is slow because of:
- heavy import side effects
- app startup code doing network calls
- migrations run in the wrong phase
- cache warmups inside web startup
Background work inside web requests
If the request starts long-running work, move it to Celery, RQ, or another job system, then return a task status or completion callback path.
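The request/response split can be illustrated without committing to a specific queue. In this schematic sketch, enqueue_report and the in-memory JOBS dict are hypothetical stand-ins for your Celery or RQ integration and its task backend.

```python
import uuid

JOBS = {}  # stand-in for the real task backend (broker/result store)

def enqueue_report(params):
    """Hypothetical stand-in for delay()/enqueue() in Celery or RQ."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "queued", "params": params}
    return job_id

def start_report(params):
    """View-level logic: enqueue, then return immediately with a poll path."""
    job_id = enqueue_report(params)
    return {
        "status_code": 202,  # Accepted: work continues in the background
        "job_id": job_id,
        "poll": f"/reports/{job_id}/status/",
    }
```

The key property is that the web worker returns in milliseconds regardless of how long the report takes, so no request can hold a worker past the Gunicorn timeout.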
3. Inspect current Gunicorn runtime configuration
Review how Gunicorn is started. Common systemd example:
[Service]
User=django
Group=www-data
EnvironmentFile=/etc/myproject/gunicorn.env
WorkingDirectory=/srv/myproject/current
ExecStart=/srv/myproject/venv/bin/gunicorn \
    --bind 127.0.0.1:8000 \
    --workers 3 \
    --threads 2 \
    --timeout 30 \
    --graceful-timeout 30 \
    myproject.wsgi:application
Or a Gunicorn config file:
bind = "127.0.0.1:8000"
workers = 3
threads = 2
worker_class = "gthread"
timeout = 30
graceful_timeout = 30
max_requests = 1000
max_requests_jitter = 100
Check:
- timeout
- graceful_timeout
- workers
- threads
- worker_class
- bind address
- environment file path
- service user permissions
If you use an environment file, keep secrets readable only by the service user or root.
Verification check: document the current known-good config before editing it.
4. Fix application-level causes before raising timeout
Start with the slow endpoint, not the Gunicorn number.
- Reduce ORM query count.
- Add indexes for real slow queries.
- Remove blocking network calls from request-response flow.
- Move long-running work to a queue.
- Serve static and media outside Django.
A production Nginx example should include complete proxy headers and TLS directives:
upstream django_app {
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location /static/ {
        alias /srv/myproject/shared/static/;
    }

    location /media/ {
        alias /srv/myproject/shared/media/;
    }

    location / {
        proxy_pass http://django_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
        proxy_redirect off;
    }
}
If Django is behind Nginx and you rely on forwarded HTTPS headers, make sure Django is configured to trust the proxy correctly:
# settings.py
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
If the route is legitimately long-running, redesign it before simply waiting longer.
5. Apply safe Gunicorn changes when needed
If a request is valid but sometimes exceeds 30 seconds, a measured timeout increase may be reasonable.
Example:
timeout = 60
graceful_timeout = 30
workers = 3
threads = 2
worker_class = "gthread"
Use caution:
- sync workers are simple and predictable, but one blocked request ties up one worker.
- gthread can help if requests spend time waiting on I/O.
- async worker classes require compatible app behavior and should not be adopted casually during an incident.
Validate the exact config file you changed and keep a copy of the previous known-good version before restarting services.
Reload carefully:
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo systemctl status gunicorn
If your unit supports reload safely, confirm it has ExecReload first:
systemctl cat gunicorn
sudo systemctl reload gunicorn
Rollback: if restart makes behavior worse, restore the previous config and restart Gunicorn again.
6. Align reverse proxy timeout settings
Review Nginx timeouts so they are intentional, not arbitrary:
location / {
    proxy_pass http://django_app;
    proxy_connect_timeout 5s;
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}
Do not raise every timeout value together just to stop visible errors. That can turn a fast failure into a long outage per request.
A useful rule is:
- Gunicorn timeout should reflect the longest acceptable in-app request time.
- Nginx proxy_read_timeout should be compatible with that value.
- long-running work should still be moved out of the web request path when possible.
If you change Nginx, test and reload it explicitly:
sudo nginx -t
sudo systemctl reload nginx
7. Verify the fix
Re-test the affected endpoint:
curl -w "%{time_total}\n" -o /dev/null -s https://example.com/problem-endpoint/
curl -I https://example.com/health/
Then watch logs and host health:
sudo journalctl -u gunicorn -f
sudo tail -f /var/log/nginx/error.log
top
free -m
Validate:
- timeout errors dropped
- latency is acceptable
- workers are not restarting repeatedly
- CPU and memory stayed within limits
- database connections and query times are healthy
If possible, test in staging first with controlled load.
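The endpoint check can also be scripted so it is repeatable across incidents. This is a standard-library sketch; the URL, threshold, and timeout are placeholders for your endpoint and latency target.

```python
import time
import urllib.request

def timed(fn, *args, **kwargs):
    """Run `fn` and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

def check_endpoint(url, threshold=5.0, timeout=30):
    """Fetch `url` once; return (status, elapsed, within_threshold)."""
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    status, elapsed = timed(fetch)
    return status, elapsed, elapsed <= threshold

# status, elapsed, ok = check_endpoint("https://example.com/problem-endpoint/")
```

Running the same probe before and after a change gives you a like-for-like comparison instead of eyeballing curl output.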
8. Recovery and rollback
If the problem started after a deploy, roll back the release before broad timeout changes.
Safe recovery options:
- Revert the Gunicorn config file, systemd override, or service unit you changed.
- Revert the Nginx config too if proxy timeouts were changed during the incident.
- Test changed configs before reloading services.
- Restart or reload only the affected services.
- Roll back the app release if the issue began after code changes.
- Temporarily disable the heavy endpoint or traffic source if needed.
Example rollback flow:
sudo cp /etc/systemd/system/gunicorn.service.bak /etc/systemd/system/gunicorn.service
sudo systemctl daemon-reload
sudo systemctl restart gunicorn
sudo nginx -t
sudo systemctl reload nginx
That path is only an example. In many deployments, the rollback target may instead be:
- a Gunicorn config file
- a systemd drop-in under /etc/systemd/system/gunicorn.service.d/
- an application release symlink
- an Nginx site file under /etc/nginx/sites-available/
Confirm the previous stable behavior before making more changes.
When to script this
If your team handles repeated timeout incidents, convert the manual checks into a small runbook script: collect Gunicorn and Nginx logs, capture CPU and memory state, verify endpoint latency, and compare current timeout settings. A standard Gunicorn systemd unit and Nginx template also reduce configuration drift between servers.
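A minimal version of that runbook script might look like this. It is a sketch: the unit names and log paths are illustrative and must match your deployment before you rely on it.

```python
#!/usr/bin/env python3
"""Incident snapshot sketch: run the runbook checks and collect their output.
Commands, unit names, and log paths are illustrative placeholders."""
import subprocess

CHECKS = [
    ["journalctl", "-u", "gunicorn", "-n", "50", "--no-pager"],
    ["tail", "-n", "50", "/var/log/nginx/error.log"],
    ["free", "-m"],
    ["ps", "aux"],
]

def run_checks(checks):
    """Run each command, capturing its output or the failure reason."""
    results = {}
    for cmd in checks:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results[key] = proc.stdout or proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[key] = f"failed: {exc}"
    return results

if __name__ == "__main__":
    for name, output in run_checks(CHECKS).items():
        print(f"=== {name} ===\n{output}")
```

Capturing failures instead of crashing matters here: during an incident you want whatever evidence is available, even if one command is missing on a given host.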
Explanation
Gunicorn kills workers that stay silent longer than the configured timeout. This protects the service from permanently stuck processes, but it also exposes bad request design quickly.
Raising timeout can be correct for a known valid workload, but it often hides deeper issues:
- blocking database access
- unbounded external API waits
- too few workers
- heavy file handling in Django
- background work done inside a request
Concurrency choices matter too. With sync workers, a blocked request consumes the worker until completion or timeout. Threads can help when requests wait on I/O, but they do not fix expensive CPU-bound code. If your app regularly needs long-running processing, the better design is usually a queue worker plus status polling or async completion.
Timeout tuning should also match the reverse proxy and the rest of the stack. If Nginx waits 60 seconds but Gunicorn kills workers at 30, clients may see inconsistent failures. If both are set very high, users may wait too long while the server remains unhealthy.
Edge cases / notes
Timeout only during deploys or immediately after restart
Check for heavy imports, startup hooks, or app initialization making workers slow to become ready. Also verify migrations are not run inside web startup.
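A common fix is to defer expensive setup until first use instead of module import time. In this sketch, heavy_sdk is a hypothetical stand-in for whatever is slow to import or connect during startup.

```python
import functools

@functools.lru_cache(maxsize=1)
def get_client():
    """Build the expensive client on the first request, not at module import."""
    # from heavy_sdk import Client          # deferred, hypothetical import
    # return Client(api_key="...")          # hypothetical constructor
    return object()  # stand-in so the sketch runs
```

Because the result is cached, only the first request pays the setup cost, and workers become ready to accept traffic as soon as they boot.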
Timeout only on admin or report endpoints
These often contain expensive queries, exports, or aggregation logic. Treat them as candidates for background jobs.
Timeout only under traffic spikes
Look at worker count, memory limits, database pool exhaustion, and host CPU saturation. A healthy route can still time out under undersized capacity.
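A quick capacity sanity check helps here: with sync or gthread workers, in-flight requests are capped at workers × threads, so rough throughput is that cap divided by average request time. This is a back-of-envelope estimate with illustrative numbers, not a benchmark.

```python
# Example figures; substitute your own config values and measured latency.
workers, threads = 3, 2
avg_request_seconds = 0.25

max_concurrent = workers * threads                  # 6 request slots
approx_rps = max_concurrent / avg_request_seconds   # ~24 requests/second
```

If measured peak traffic exceeds that estimate, requests queue behind busy workers and healthy routes start timing out even though nothing in the code changed.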
Timeout in Docker or Kubernetes
Check container memory limits, probe settings, and whether the platform is restarting the container before Gunicorn logs show a clear timeout. Do not assume Gunicorn is the first failure point.
Host-level OOM kills
If dmesg shows OOM events, fix memory pressure first. Gunicorn may appear unstable when the kernel is killing workers underneath it.
Internal links
For background, see What Gunicorn Workers Do in Django Production.
If you need a baseline app server setup, read Deploy Django with Gunicorn and Nginx.
For long-running tasks, use How to Configure Celery for Long-Running Django Tasks.
For incident response workflow, see How to Read Django, Gunicorn, and Nginx Logs During Production Incidents.
FAQ
What does WORKER TIMEOUT mean in Gunicorn for Django?
It means a Gunicorn worker stopped responding within the configured timeout window, so Gunicorn killed and replaced it.
Should I just increase the Gunicorn timeout value?
Usually no. First confirm whether the request is slow because of app code, database queries, external APIs, worker starvation, or host pressure. Increase timeout only for known valid long-running requests.
Why does Gunicorn timeout happen only on one Django endpoint?
That usually points to route-specific logic such as expensive queries, file generation, blocking API calls, or report-style processing in the request path.
Can Nginx cause what looks like a Gunicorn worker timeout?
Yes. Nginx can return upstream timeout errors even when Gunicorn itself is not logging WORKER TIMEOUT. Always compare both log sources before changing settings.
When should long-running Django work move to a background job system?
When the task is not required to finish before returning the HTTP response, or when it regularly approaches app server timeout limits. Report generation, imports, exports, and third-party API workflows are common examples.