Fix Database Connection Errors in Django Production
Problem statement
Database connection errors in Django production usually appear right after a deploy, after a restart, or after an infrastructure change such as rotating secrets, moving PostgreSQL, enabling SSL, or changing firewall rules. The app may fail at startup, return 500 errors under traffic, or only break on database-backed views.
Common production symptoms include:
- psycopg2.OperationalError: could not connect to server
- connection refused
- connection timed out
- password authentication failed for user
- could not translate host name
- SSL errors such as no pg_hba.conf entry ... SSL off
- intermittent failures from connection exhaustion
The main risk is changing Django settings too early without proving whether the problem is credentials, DNS, networking, PostgreSQL readiness, SSL requirements, or connection limits.
Quick answer
Check these in order:
- identify the exact error from logs
- confirm Django is reading the expected production database settings
- test DNS and TCP reachability from the app host or container
- test a direct PostgreSQL login with the same host, user, database, and SSL mode
- verify PostgreSQL is listening, allows your source IP, and has available connections
Do the checks from the actual app runtime environment before editing DATABASES.
Step-by-step solution
Identify the exact database connection error
Start with logs. Do not guess from the browser error page.
Check Django and Gunicorn/Uvicorn logs
For systemd-managed Gunicorn:
journalctl -u gunicorn -n 100 --no-pager
For Docker:
docker logs <container_name> --tail 100
If PostgreSQL is self-hosted and managed by systemd:
journalctl -u "postgresql*" -n 100 --no-pager
Classify the error:
- refused: host reachable, but nothing accepts connections on that port
- timed out: packets are blocked or the route is broken
- authentication failed: username/password is wrong, or access is denied
- DNS failure: hostname is wrong or not resolvable from production
- SSL error: provider requires TLS or certificate validation is wrong
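The classification above can be sketched as a small lookup, which helps when triaging many log lines at once. This is a minimal sketch: the function name classify_db_error and the substring patterns are illustrative assumptions, not a complete list of libpq or psycopg2 messages.

```python
import re

# Ordered (pattern, label) pairs; the first match wins.
# Patterns are illustrative lowercase substrings, not an exhaustive catalog.
ERROR_CLASSES = [
    (r"connection refused", "refused"),
    (r"timed out|timeout expired", "timeout"),
    (r"password authentication failed", "authentication"),
    (r"could not translate host name", "dns"),
    (r"too many clients|remaining connection slots", "connection-limit"),
    (r"ssl|certificate", "ssl"),
]


def classify_db_error(message: str) -> str:
    """Map a raw error string to one of the coarse classes above."""
    text = message.lower()
    for pattern, label in ERROR_CLASSES:
        if re.search(pattern, text):
            return label
    return "unknown"
```

Treat the label as a starting hypothesis only. The saved error string remains the source of truth.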
Also note whether the failure happens:
- only at app startup
- only during migrations
- only during live traffic
- intermittently after running for a while
That distinction matters later.
Verification check: save the exact error string before changing anything.
Confirm Django is using the expected production database settings
A common cause is that the process is not loading the expected production environment variables.
Inspect DATABASES configuration safely
A typical Django PostgreSQL config:
import os
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": os.environ["DB_NAME"],
"USER": os.environ["DB_USER"],
"PASSWORD": os.environ["DB_PASSWORD"],
"HOST": os.environ["DB_HOST"],
"PORT": os.environ.get("DB_PORT", "5432"),
"CONN_MAX_AGE": int(os.environ.get("DB_CONN_MAX_AGE", "60")),
"OPTIONS": {
"sslmode": os.environ.get("DB_SSLMODE", "require"), # set explicitly per environment
},
}
}
Check where those values come from.
For systemd:
[Service]
EnvironmentFile=/etc/myapp/myapp.env
For Docker Compose:
services:
web:
environment:
DB_NAME: appdb
DB_USER: appuser
DB_PASSWORD: ${DB_PASSWORD}
DB_HOST: db.example.internal
DB_PORT: "5432"
DB_SSLMODE: require
Check which unit definition and environment files the service loads, then compare the expected non-secret values:
systemctl cat gunicorn
systemctl show gunicorn --property=EnvironmentFiles
Inside a container:
docker exec -it <app_container> sh
env | grep -E '^(DB_NAME|DB_USER|DB_HOST|DB_PORT|DB_SSLMODE)='
Do not print passwords into shared logs or paste them into tickets.
Verification check: confirm host, port, database name, username, and SSL mode match your intended production database.
Rollback note: if you edit env files, keep a copy of the last known-good version before restarting services.
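One way to run the verification check without exposing the password is a small helper that reports missing variables and masks secrets. This is a sketch that assumes the DB_* variable names from the example settings above; the helper name summarize_db_env is hypothetical.

```python
import os

# Variable names follow the example settings above; adjust to your project.
REQUIRED = ("DB_NAME", "DB_USER", "DB_PASSWORD", "DB_HOST")
OPTIONAL = ("DB_PORT", "DB_SSLMODE")
SECRET = {"DB_PASSWORD"}


def summarize_db_env(env):
    """Return (missing, summary); secret values are replaced with '<set>'."""
    missing = [name for name in REQUIRED if not env.get(name)]
    summary = {}
    for name in REQUIRED + OPTIONAL:
        value = env.get(name)
        if value is None:
            continue
        summary[name] = "<set>" if name in SECRET else value
    return missing, summary
```

Calling summarize_db_env(os.environ) inside the app container or service environment shows what that environment actually provides, which may differ from your login shell.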
Test connectivity from the application host
Test from the app host, not your laptop.
Check DNS resolution
getent hosts db.example.internal
If this fails, the hostname is wrong or not resolvable in that environment.
Test TCP reachability
nc -vz db.example.internal 5432
If the output reports succeeded, the port is reachable. If it times out, look at firewall or routing. If it is refused, PostgreSQL may not be running or may not be listening on that interface.
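If nc is not installed in the runtime, the same DNS and TCP check can be approximated with the Python standard library. This is a minimal sketch: the function name check_tcp and its return labels are assumptions chosen to mirror the error classes used in this guide.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'ok', 'dns', 'refused', 'timeout', or 'error: <detail>'."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        return "dns"          # hostname did not resolve
    except ConnectionRefusedError:
        return "refused"      # host answered, nothing listening on the port
    except socket.timeout:
        return "timeout"      # packets dropped or route broken
    except OSError as exc:
        return f"error: {exc}"
```

For example, check_tcp("db.example.internal", 5432) run from the app container gives the same signal as the nc command above. It only proves TCP reachability, not that PostgreSQL accepts your credentials.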
Use psql with the same parameters
Set the password only in the current shell:
export PGPASSWORD='your-password'
psql "host=db.example.internal port=5432 dbname=appdb user=appuser sslmode=require"
unset PGPASSWORD
Then run:
SELECT current_database(), current_user;
If you use Docker, run the test inside the container:
docker exec -it <app_container> sh
Then run the same getent, nc, and psql checks.
Verification check: if psql fails with the same error Django shows, the problem is below Django.
Verify the database server is accepting connections
If the app host cannot connect, inspect PostgreSQL.
Check PostgreSQL status and readiness
On the database host for a self-hosted database:
systemctl status postgresql
Run pg_isready from the app host or container when testing network reachability; run it on the DB host only when validating local service readiness.
pg_isready -h <db_host> -p 5432
Confirm PostgreSQL is listening correctly
On the database host:
ss -ltn | grep 5432
If you need to map the socket to a process, add the -p flag and run the command with elevated privileges, for example sudo ss -ltnp.
Check postgresql.conf for a deliberate listen address, for example:
listen_addresses = '10.0.0.10,127.0.0.1'
port = 5432
Avoid listen_addresses='*' unless the database is restricted by private networking and firewall rules.
Check access rules
For self-hosted PostgreSQL, review pg_hba.conf. Example:
host appdb appuser 10.0.0.0/24 scram-sha-256
For managed databases, review network access rules, security groups, VPC rules, or allowlists.
Verification check: after any access rule change, rerun pg_isready and the psql login from the app environment.
Rollback note: document any firewall or pg_hba.conf change so it can be reversed if you widened access too far.
Fix credential and authentication errors
If logs show authentication failure:
- confirm the username is correct
- confirm the password was rotated everywhere
- confirm the user can access the target database
Special characters often break .env parsing or shell commands if quoted incorrectly. Keep environment files simple and avoid trailing spaces.
Re-test with psql, but avoid storing the password in shell history. Prefer a temporary PGPASSWORD export in the current session rather than passing the secret inline in a saved command.
If needed, validate access from PostgreSQL:
SELECT current_database(), current_user;
If the login works to one database but not another, the user or database mapping is wrong.
Fix SSL and managed database connection issues
Managed PostgreSQL often requires SSL. A frequent production case is correct host and credentials, but missing sslmode.
Example Django config:
"OPTIONS": {
"sslmode": "require",
}
If your provider requires certificate verification:
"OPTIONS": {
"sslmode": "verify-full",
"sslrootcert": "/etc/ssl/certs/db-ca.pem",
}
Make sure the CA file exists and is readable by the app user.
If your provider requires client certificates, add the provider-specific sslcert and sslkey settings as documented for libpq and your PostgreSQL service.
Verification check: rerun psql with the same sslmode and certificate settings before restarting the app.
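A quick pre-check of the CA file can rule out the simplest failure before retrying psql. The sketch below uses a hypothetical helper named check_ca_file. It tests readability for the user running the script, so it should run as the app user, and it only looks for a PEM header rather than validating the certificate.

```python
import os


def check_ca_file(path: str) -> str:
    """Return 'ok', 'missing', 'unreadable', or 'not-pem' for a CA bundle path."""
    if not os.path.isfile(path):
        return "missing"
    # os.access reflects the permissions of the current process user.
    if not os.access(path, os.R_OK):
        return "unreadable"
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        if "-----BEGIN CERTIFICATE-----" not in handle.read():
            return "not-pem"
    return "ok"
```

A result of "ok" means the file is present and looks like PEM. Whether it is the right CA for your provider still has to be confirmed by the psql test with sslmode=verify-full.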
Fix connection limit and stale connection problems
If the app works briefly and then fails, check for exhausted connections.
In PostgreSQL:
SELECT count(*) FROM pg_stat_activity;
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
SHOW max_connections;
Review Django and Gunicorn together:
- each worker process can open database connections
- higher CONN_MAX_AGE can reduce reconnect churn, but also keeps connections open longer
- too many workers against a small database can exhaust max_connections
If Gunicorn is set too high for database capacity, reduce workers or add a pooler such as PgBouncer where appropriate.
This problem is common after scaling app workers without scaling database capacity.
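The worker-versus-capacity trade-off is simple arithmetic, sketched below. It assumes each worker thread can hold one persistent connection to the default database and that a few slots are reserved for superusers. The default of 3 matches PostgreSQL's default superuser_reserved_connections, but check your own server. The function name connection_budget is hypothetical.

```python
def connection_budget(app_hosts, workers_per_host, threads_per_worker,
                      max_connections, reserved=3):
    """Estimate peak app connections against what PostgreSQL can accept."""
    # Worst case: every thread in every worker on every host holds a connection.
    peak = app_hosts * workers_per_host * threads_per_worker
    available = max_connections - reserved
    return {"peak": peak, "available": available, "headroom": available - peak}


# 4 hosts x 9 sync workers against max_connections = 100
print(connection_budget(4, 9, 1, 100))
# {'peak': 36, 'available': 97, 'headroom': 61}
```

A negative headroom suggests reducing workers or adding a pooler. Other clients such as migrations, cron jobs, and monitoring also consume slots and are not counted in this estimate.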
Fix startup ordering and deploy timing issues
Sometimes the database is healthy, but the app starts before it is reachable.
Typical cases:
- PostgreSQL service still starting
- containerized database not yet ready
- migrations run before connectivity exists
- app health checks begin too early
A safe restart sequence is:
- confirm database readiness with pg_isready
- confirm direct login with psql
- restart the app service
Examples:
systemctl restart gunicorn
or
docker compose restart web
Avoid repeated blind restarts during an incident, especially with many app workers, because reconnect storms can make connection-limit problems worse.
If the issue started after deploy, verify whether a new release changed env loading, startup commands, or migration timing.
Also separate connectivity failures from schema problems: if the database connection succeeds but requests fail after deploy, check whether migrations were applied correctly before treating it as a connection issue.
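The readiness-before-restart ordering in this section can be sketched as a bounded wait with backoff, so a deploy script pauses for the database instead of restarting blindly. The names wait_until_ready and check are hypothetical; check is any callable you supply that returns True once the database answers.

```python
import time


def wait_until_ready(check, attempts=10, delay=1.0, max_delay=15.0,
                     sleep=time.sleep):
    """Call check() until it returns True.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    The delay doubles after each failure, capped at max_delay seconds.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return 0
```

One possible check wraps pg_isready through subprocess and treats exit code 0 as ready. The bounded attempt count matters: an unbounded loop would hide a real outage behind a hanging deploy.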
Verify the fix safely in production
Once the direct database test passes, verify from Django with a minimal read-only check.
Minimal query:
python manage.py shell -c "from django.db import connection; cursor = connection.cursor(); cursor.execute('SELECT 1'); print(cursor.fetchone())"
Optional database shell check, only if the runtime has the PostgreSQL client installed and the app environment is loaded correctly:
python manage.py dbshell
Additional deploy checks:
python manage.py check --deploy
Then test one real read-only app path or health endpoint if available. Watch logs and error rate for several minutes to catch intermittent failures.
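If no database-backed health endpoint exists yet, the core of one is a read-only probe like the sketch below. The function name db_is_alive is hypothetical. It accepts any object with a DB-API style cursor() method, which is assumed to include the object exposed by django.db.connection.

```python
def db_is_alive(connection) -> bool:
    """Run SELECT 1 through the given connection; return True only on success."""
    try:
        cursor = connection.cursor()
        try:
            cursor.execute("SELECT 1")
            row = cursor.fetchone()
        finally:
            cursor.close()
    except Exception:
        # Any failure (connect, auth, SSL, query) counts as not alive here;
        # log the exception in real code so the error class is not lost.
        return False
    return row is not None and row[0] == 1
```

Inside a Django view you might call db_is_alive(connection) after from django.db import connection, and return HTTP 200 or 503 accordingly. Keep the probe read-only so it is safe to hit repeatedly.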
Explanation
This runbook works because it isolates the layers in the order they fail:
- application logs identify the error class
- runtime config proves what Django is actually using
- host-level tests separate Django problems from network problems
- PostgreSQL checks confirm service readiness and access control
- capacity checks catch issues that only appear under worker load
The main alternative is jumping straight into Django settings changes, which often hides the real cause. For example, changing HOST will not fix a security group block, and changing passwords will not fix required SSL.
When to automate this
If your team repeats these checks during every deploy or incident, convert them into a reusable script or deployment template. Good first candidates are environment validation, pg_isready checks, a psql smoke test, and post-deploy health checks that confirm a database-backed endpoint works. That reduces risky manual changes during outages.
Edge cases and notes
- IPv4 vs IPv6: a hostname may resolve to IPv6 first in production, while PostgreSQL only listens on IPv4.
- Unix socket vs TCP: local environments may connect through a socket, but production uses TCP with different auth rules.
- Read replica mistakes: the app may be pointed at a replica that rejects writes or has different access controls.
- Container service names: db may work in Compose locally but not in another network or orchestration setup.
- Idle timeout layers: NAT gateways, proxies, or managed network layers can drop long-lived idle connections.
- Migrations after deploy: if the app boots before migrations finish, some errors may look like connection problems but are actually schema mismatches. Check both.
- Minimal container images: psql, nc, or dbshell may be missing even when database connectivity is fine. Install the client tools or test from a debug container in the same network.
- Keep DB changes minimal during incident response: avoid broad PostgreSQL tuning changes until basic connectivity is stable.
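For the IPv4 vs IPv6 note above, listing what a hostname resolves to from the app environment shows which address family a client is likely to try. This is a minimal standard-library sketch; the function name resolved_addresses is hypothetical.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolved_addresses(host, port=5432):
    """Return unique (family, address) pairs in the order the resolver gives them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    result = []
    for family, _type, _proto, _canonname, sockaddr in infos:
        entry = (FAMILY_NAMES.get(family, str(family)), sockaddr[0])
        if entry not in result:
            result.append(entry)
    return result
```

If resolved_addresses("db.example.internal") lists an IPv6 address first while PostgreSQL only listens on IPv4, compare the output with listen_addresses on the database host.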
Internal links
For safe secret and environment handling, see Django production environment variables.
If you need the full stack wiring, review deploy Django with PostgreSQL, Gunicorn, and Nginx and configure PostgreSQL for Django production.
If the issue started right after a release, use rollback a Django deployment safely to revert app or config changes without guessing.
For broader release hardening, review the Django deployment checklist for production.
FAQ
Why does Django connect locally but fail in production?
Local setups often use different hostnames, socket connections, relaxed auth, no SSL, or no firewall restrictions. Production usually adds remote networking, stricter access rules, secret loading, and managed database TLS requirements.
What does “connection refused” vs “timeout” mean?
connection refused usually means the host is reachable but nothing is listening on that port, or the service rejects the connection immediately. timeout usually means packets are being dropped by a firewall, route, or network policy.
Should I set CONN_MAX_AGE to a high value?
Not automatically. Persistent connections can reduce reconnect overhead, but they also keep connections open longer. Set it based on database capacity, worker count, and any idle timeout behavior in your network path.
How do I test database access without exposing credentials?
Run the test from the app host, export PGPASSWORD only in the current shell, use psql with the same connection parameters as Django, and unset the variable afterward. Avoid echoing secrets into logs, screenshots, shared terminals, or shell history.
When should I add a connection pooler?
Add a pooler when worker count, concurrency, or many short-lived requests make PostgreSQL connection limits a recurring issue. It is usually more useful after you have confirmed the problem is connection pressure rather than DNS, auth, or SSL misconfiguration.