Fix Database Connection Errors in Django Production

Problem statement

Database connection errors in Django production usually appear right after a deploy, after a restart, or after an infrastructure change such as rotating secrets, moving PostgreSQL, enabling SSL, or changing firewall rules. The app may fail at startup, return 500 errors under traffic, or only break on database-backed views.

Common production symptoms include:

psycopg2.OperationalError: could not connect to server
connection refused
connection timed out
password authentication failed for user
could not translate host name
SSL errors such as no pg_hba.conf entry ... SSL off
intermittent failures from connection exhaustion

The main risk is changing Django settings before proving whether the problem is credentials, DNS, networking, PostgreSQL readiness, SSL requirements, or connection limits.

Quick answer

Check these in order:

identify the exact error from logs
confirm Django is reading the expected production database settings
test DNS and TCP reachability from the app host or container
test a direct PostgreSQL login with the same host, user, database, and SSL mode
verify PostgreSQL is listening, allows your source IP, and has available connections

Do the checks from the actual app runtime environment before editing DATABASES.

Step-by-step solution

Identify the exact database connection error

Start with logs. Do not guess from the browser error page.

Check Django and Gunicorn/Uvicorn logs

For systemd-managed Gunicorn:

journalctl -u gunicorn -n 100 --no-pager

For Docker:

docker logs <container_name> --tail 100

If PostgreSQL is self-hosted and managed by systemd:

journalctl -u postgresql -n 100 --no-pager || journalctl -u "postgresql*" -n 100 --no-pager

Classify the error:

refused: host reachable, but nothing accepts connections on that port
timed out: packets are blocked or the route is broken
authentication failed: username/password is wrong, or access is denied
DNS failure: hostname is wrong or not resolvable from production
SSL error: provider requires TLS or certificate validation is wrong
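
The classification above can be sketched as a small helper for triaging a saved error string. This is a minimal sketch: the function name, category labels, and substring patterns are illustrative assumptions, not a complete list of PostgreSQL client messages.

```python
import re

# Ordered (category, pattern) pairs; the first match wins.
# The patterns are illustrative substrings, not an exhaustive catalogue.
ERROR_PATTERNS = [
    ("ssl", re.compile(r"\bssl\b|certificate", re.I)),
    ("authentication", re.compile(r"password authentication failed|no pg_hba\.conf entry", re.I)),
    ("dns", re.compile(r"could not translate host name|name or service not known", re.I)),
    ("refused", re.compile(r"connection refused", re.I)),
    ("timeout", re.compile(r"timed out|timeout expired", re.I)),
    ("exhausted", re.compile(r"too many clients|remaining connection slots", re.I)),
]


def classify_db_error(message):
    """Return a coarse category for a database connection error message."""
    for category, pattern in ERROR_PATTERNS:
        if pattern.search(message):
            return category
    return "unknown"
```

The SSL pattern is checked first so that a pg_hba.conf rejection mentioning SSL is treated as an SSL problem, which matches the symptom list earlier in this article.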

Also note whether the failure happens:

only at app startup
only during migrations
only during live traffic
intermittently after running for a while

That distinction matters later.

Verification check: save the exact error string before changing anything.

Confirm Django is using the expected production database settings

A common cause is that the process is not loading the expected production environment variables.

Inspect `DATABASES` configuration safely

A typical Django PostgreSQL config:

import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ["DB_NAME"],
        "USER": os.environ["DB_USER"],
        "PASSWORD": os.environ["DB_PASSWORD"],
        "HOST": os.environ["DB_HOST"],
        "PORT": os.environ.get("DB_PORT", "5432"),
        "CONN_MAX_AGE": int(os.environ.get("DB_CONN_MAX_AGE", "60")),
        "OPTIONS": {
            "sslmode": os.environ.get("DB_SSLMODE", "require"),  # set explicitly per environment
        },
    }
}

Check where those values come from.

For systemd:

[Service]
EnvironmentFile=/etc/myapp/myapp.env

For Docker Compose:

services:
  web:
    environment:
      DB_NAME: appdb
      DB_USER: appuser
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: db.example.internal
      DB_PORT: "5432"
      DB_SSLMODE: require

Verify which unit definition and environment files the service loads, then compare the non-secret values in those files with what you expect:

systemctl cat gunicorn
systemctl show gunicorn --property=EnvironmentFiles

Inside a container:

docker exec -it <app_container> sh
env | grep -E '^(DB_NAME|DB_USER|DB_HOST|DB_PORT|DB_SSLMODE)='

Do not print passwords into shared logs or paste them into tickets.

Verification check: confirm host, port, database name, username, and SSL mode match your intended production database.
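
One way to make that check repeatable is a small script that reports the non-secret settings and flags obvious problems without ever printing the password. This is a sketch under assumptions: the variable names follow the example config above, and summarize_db_env is a hypothetical helper, not part of Django.

```python
REQUIRED_KEYS = ("DB_NAME", "DB_USER", "DB_PASSWORD", "DB_HOST")
SECRET_KEYS = {"DB_PASSWORD"}
# Defaults mirror the example DATABASES config in this article.
OPTIONAL_DEFAULTS = {"DB_PORT": "5432", "DB_SSLMODE": "require"}


def summarize_db_env(env):
    """Return (summary, problems) for a mapping of environment variables.

    summary holds printable values, with secrets replaced by "<set>".
    problems lists required keys that are unset or padded with whitespace.
    """
    summary, problems = {}, []
    for key in REQUIRED_KEYS:
        value = env.get(key)
        if not value:
            problems.append(f"{key} is unset or empty")
            continue
        if value != value.strip():
            problems.append(f"{key} has leading or trailing whitespace")
        summary[key] = "<set>" if key in SECRET_KEYS else value
    for key, default in OPTIONAL_DEFAULTS.items():
        summary[key] = env.get(key, default)
    return summary, problems
```

Call it as summarize_db_env(os.environ) from the same service user or container as the app, so the result reflects what Django would actually read.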

Rollback note: if you edit env files, keep a copy of the last known-good version before restarting services.

Test connectivity from the application host

Test from the app host, not your laptop.

Check DNS resolution

getent hosts db.example.internal

If this fails, the hostname is wrong or not resolvable in that environment.

Test TCP reachability

nc -vz db.example.internal 5432

If nc reports succeeded, the port is reachable. If it times out, look at firewall rules or routing. If the connection is refused, PostgreSQL may be stopped or not listening on that interface.
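
If getent or nc is missing from the runtime image, the same two checks can be approximated with the Python standard library. The sketch below is an assumption-laden stand-in, not a replacement for those tools: probe_tcp is a hypothetical name, and it tries every address the hostname resolves to, so an IPv6-first result is visible.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Resolve host and attempt a TCP connect to each resolved address.

    Returns a list of (family, address, outcome) tuples, where outcome is
    "open", "refused", "timeout", or "error: <detail>". A DNS failure
    returns a single ("dns", host, "unresolved: <detail>") entry.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [("dns", host, f"unresolved: {exc}")]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "open"
        except ConnectionRefusedError:
            outcome = "refused"   # host answered, nothing listening
        except socket.timeout:
            outcome = "timeout"   # packets dropped or route broken
        except OSError as exc:
            outcome = f"error: {exc}"
        results.append((label, sockaddr[0], outcome))
    return results
```

For example, probe_tcp("db.example.internal", 5432) returns one entry per resolved address, which helps separate a DNS failure from a firewall timeout or a refused port.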

Use `psql` with the same parameters

Set the password only in the current shell:

export PGPASSWORD='your-password'
psql "host=db.example.internal port=5432 dbname=appdb user=appuser sslmode=require"
unset PGPASSWORD

Then run:

SELECT current_database(), current_user;

If you use Docker, run the test inside the container:

docker exec -it <app_container> sh

Then run the same getent, nc, and psql checks from that shell.

Verification check: if psql fails with the same error Django shows, the problem is below Django.
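
The psql test only proves something if it uses exactly the parameters Django uses. A minimal sketch of deriving the psql connection string from a Django-style settings dict follows. build_conninfo is a hypothetical helper; it deliberately omits the password and copies only the SSL-related OPTIONS keys named in this article.

```python
# OPTIONS keys that are also libpq connection parameters used in this article.
LIBPQ_SSL_KEYS = ("sslmode", "sslrootcert", "sslcert", "sslkey")


def _quote(value):
    """Quote a libpq keyword/value string value when it needs quoting."""
    if value and not any(ch in value for ch in " '\\"):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(db_settings):
    """Build a libpq connection string from a Django DATABASES entry.

    The password is never included; supply it through PGPASSWORD or a prompt.
    """
    params = {
        "host": db_settings["HOST"],
        "port": db_settings.get("PORT") or "5432",
        "dbname": db_settings["NAME"],
        "user": db_settings["USER"],
    }
    options = db_settings.get("OPTIONS", {})
    for key in LIBPQ_SSL_KEYS:
        if key in options:
            params[key] = options[key]
    return " ".join(f"{key}={_quote(str(value))}" for key, value in params.items())
```

With the example settings from earlier, the result is the same string passed to psql above, so a mismatch between the two is easy to spot.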

Verify the database server is accepting connections

If the app host cannot connect, inspect PostgreSQL.

Check PostgreSQL status and readiness

On the database host for a self-hosted database:

systemctl status postgresql

Run pg_isready from the app host or container when testing network reachability; run it on the DB host only when validating local service readiness.

pg_isready -h <db_host> -p 5432

Confirm PostgreSQL is listening correctly

On the database host:

ss -ltn | grep 5432

If you need to map the socket to a process, add the -p flag and run the command with elevated privileges, for example sudo ss -ltnp.

Check postgresql.conf for a deliberate listen address, for example:

listen_addresses = '10.0.0.10,127.0.0.1'
port = 5432

Avoid listen_addresses='*' unless the database is restricted by private networking and firewall rules.

Check access rules

For self-hosted PostgreSQL, review pg_hba.conf. Example:

host    appdb    appuser    10.0.0.0/24    scram-sha-256

For managed databases, review network access rules, security groups, VPC rules, or allowlists.
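
For the self-hosted case, a quick way to sanity-check a pg_hba.conf line is to test whether the app host's source IP falls inside the rule's CIDR range. The sketch below assumes simple host lines with a CIDR address, as in the example above; hba_entry_covers is a hypothetical helper and does not handle hostnames, separate netmask columns, or keywords such as samenet.

```python
import ipaddress


def _name_matches(field, name):
    """Match a pg_hba database or user column: 'all' or a comma-separated list."""
    return field == "all" or name in field.split(",")


def hba_entry_covers(hba_line, database, user, client_ip):
    """Return True if a simple pg_hba.conf host line matches the given client."""
    fields = hba_line.split()
    if len(fields) < 5 or fields[0] not in ("host", "hostssl", "hostnossl"):
        return False
    _type, db_field, user_field, address, _method = fields[:5]
    if not _name_matches(db_field, database) or not _name_matches(user_field, user):
        return False
    if address == "all":
        return True
    try:
        network = ipaddress.ip_network(address, strict=False)
    except ValueError:
        return False  # hostnames and keyword addresses are out of scope here
    ip = ipaddress.ip_address(client_ip)
    return ip.version == network.version and ip in network
```

PostgreSQL applies the first matching record in the file, so a match here only shows that the line could apply, not that an earlier line does not take precedence.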

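Verification check: after any access rule change, reload PostgreSQL so pg_hba.conf edits take effect (a listen_addresses change needs a full restart), then rerun pg_isready and the psql login from the app environment.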

Rollback note: document any firewall or pg_hba.conf change so it can be reversed if you widened access too far.

Fix credential and authentication errors

If logs show authentication failure:

confirm the username is correct
confirm the password was rotated everywhere
confirm the user can access the target database

Special characters in a password often break .env parsing or shell commands if quoted incorrectly. Keep environment files simple and avoid trailing spaces.

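Re-test with psql, but avoid storing the password in shell history. Prefer a temporary PGPASSWORD in the current session over a secret embedded in a saved command, and remember that a typed export line is itself recorded in history unless you read the value with a silent prompt (read -s in bash) or let psql prompt for it.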

If needed, validate access from PostgreSQL:

SELECT current_database(), current_user;

If the login works for one database but not another, the user's privileges or the access rule for that database are wrong.

Fix SSL and managed database connection issues

Managed PostgreSQL often requires SSL. A frequent production case is correct host and credentials, but missing sslmode.

Example Django config:

"OPTIONS": {
    "sslmode": "require",
}

If your provider requires certificate verification:

"OPTIONS": {
    "sslmode": "verify-full",
    "sslrootcert": "/etc/ssl/certs/db-ca.pem",
}

Make sure the CA file exists and is readable by the app user.
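
A small pre-flight check can catch a mistyped sslmode or an unreadable certificate file before the app restarts. This is a sketch, not a full validation of libpq behavior: check_ssl_options is a hypothetical helper, and it must run as the app user for the readability test to be meaningful.

```python
import os

VALID_SSLMODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}
CERT_FILE_KEYS = ("sslrootcert", "sslcert", "sslkey")


def check_ssl_options(options):
    """Return a list of problems found in the SSL part of a Django OPTIONS dict."""
    problems = []
    mode = options.get("sslmode")
    if mode is None:
        problems.append("sslmode is not set; libpq defaults to prefer")
    elif mode not in VALID_SSLMODES:
        problems.append(f"unknown sslmode: {mode}")
    for key in CERT_FILE_KEYS:
        path = options.get(key)
        # Each configured file must exist and be readable by the current user.
        if path and not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(f"{key} file is missing or unreadable: {path}")
    return problems
```

An empty list means the options are well formed, not that the server will accept them, so the psql test with the same settings is still needed.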

If your provider requires client certificates, add the provider-specific sslcert and sslkey settings as documented for libpq and your PostgreSQL service.

Verification check: rerun psql with the same sslmode and certificate settings before restarting the app.

Fix connection limit and stale connection problems

If the app works briefly and then fails, check for exhausted connections.

In PostgreSQL:

SELECT count(*) FROM pg_stat_activity;
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
SHOW max_connections;

Review Django and Gunicorn together:

each worker process can open database connections
higher CONN_MAX_AGE can reduce reconnect churn, but also keeps connections open longer
too many workers against a small database can exhaust max_connections

If Gunicorn is set too high for database capacity, reduce workers or add a pooler such as PgBouncer where appropriate.

This problem is common after scaling app workers without scaling database capacity.
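
The arithmetic behind that review is simple enough to write down. The sketch below assumes the worst case in which every worker thread holds one open connection per configured database, which is how persistent connections behave in Django; the function names and the example numbers are illustrative only.

```python
def worst_case_connections(instances, workers_per_instance, threads_per_worker=1, databases=1):
    """Upper bound on connections the app tier can hold open at once."""
    return instances * workers_per_instance * threads_per_worker * databases


def connection_headroom(max_connections, reserved, **app):
    """Connections left after the app's worst case and reserved slots.

    reserved covers superuser-reserved slots plus anything else that
    connects (monitoring, migrations, admin sessions). A negative result
    means the app tier alone can exhaust the server.
    """
    return max_connections - reserved - worst_case_connections(**app)
```

For example, 4 app instances with 9 workers and 2 threads each can hold 72 connections. Against max_connections of 100 with 3 reserved, that leaves a headroom of 25; doubling the threads to 4 drives it to -47, which is the point where a pooler or fewer workers becomes necessary.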

Fix startup ordering and deploy timing issues

Sometimes the database is healthy, but the app starts before it is reachable.

Typical cases:

PostgreSQL service still starting
containerized database not yet ready
migrations run before connectivity exists
app health checks begin too early

A safe restart sequence is:

confirm database readiness with pg_isready
confirm direct login with psql
restart the app service

Examples:

systemctl restart gunicorn

docker compose restart web

Avoid repeated blind restarts during an incident, especially with many app workers, because reconnect storms can make connection-limit problems worse.
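
Instead of restarting blindly, a deploy script can wait for the database with a bounded number of attempts and growing delays. The following is a minimal sketch: wait_until_ready and its parameters are hypothetical, and the probe is any callable you supply that returns True when the database accepts connections.

```python
import time


def wait_until_ready(probe, attempts=6, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call probe() until it returns True, backing off exponentially between tries.

    Returns the number of attempts used. Raises TimeoutError if every
    attempt fails, so the caller can stop the deploy instead of looping.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the wait between tries
    raise TimeoutError(f"database not ready after {attempts} attempts")
```

A probe could wrap pg_isready, which exits with status 0 when the server is accepting connections, or a TCP connect to the database port. Restart the app service only after the call returns.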

If the issue started after deploy, verify whether a new release changed env loading, startup commands, or migration timing.

Also separate connectivity failures from schema problems: if the database connection succeeds but requests fail after deploy, check whether migrations were applied correctly before treating it as a connection issue.

Verify the fix safely in production

Once the direct database test passes, verify from Django with a minimal read-only check.

Minimal query:

python manage.py shell -c "from django.db import connection; cursor = connection.cursor(); cursor.execute('SELECT 1'); print(cursor.fetchone())"

Optional database shell check, only if the runtime has the PostgreSQL client installed and the app environment is loaded correctly:

python manage.py dbshell

Additional deploy checks (these audit deployment settings such as security options, not database connectivity):

python manage.py check --deploy

Then test one real read-only app path or health endpoint if available. Watch logs and error rate for several minutes to catch intermittent failures.

Explanation

This runbook works because it isolates the layers in the order they fail:

application logs identify the error class
runtime config proves what Django is actually using
host-level tests separate Django problems from network problems
PostgreSQL checks confirm service readiness and access control
capacity checks catch issues that only appear under worker load

The main alternative is jumping straight into Django settings changes, which often hides the real cause. For example, changing HOST will not fix a security group block, and changing the password will not fix a missing sslmode when the server requires SSL.

When to automate this

If your team repeats these checks during every deploy or incident, convert them into a reusable script or deployment template. Good first candidates are environment validation, pg_isready checks, a psql smoke test, and post-deploy health checks that confirm a database-backed endpoint works. That reduces risky manual changes during outages.

Edge cases and notes

IPv4 vs IPv6: a hostname may resolve to IPv6 first in production, while PostgreSQL only listens on IPv4.
Unix socket vs TCP: local environments may connect through a socket, but production uses TCP with different auth rules.
Read replica mistakes: the app may be pointed at a replica that rejects writes or has different access controls.
Container service names: db may work in Compose locally but not in another network or orchestration setup.
Idle timeout layers: NAT gateways, proxies, or managed network layers can drop long-lived idle connections.
Migrations after deploy: if the app boots before migrations finish, some errors may look like connection problems but are actually schema mismatches. Check both.
Minimal container images: psql, nc, or dbshell may be missing even when database connectivity is fine. Install the client tools or test from a debug container in the same network.
Keep DB changes minimal during incident response: avoid broad PostgreSQL tuning changes until basic connectivity is stable.

Internal links

For safe secret and environment handling, see Django production environment variables.

If you need the full stack wiring, review deploy Django with PostgreSQL, Gunicorn, and Nginx and configure PostgreSQL for Django production.

If the issue started right after a release, use rollback a Django deployment safely to revert app or config changes without guessing.

For broader release hardening, review the Django deployment checklist for production.

FAQ

Why does Django connect locally but fail in production?

Local setups often use different hostnames, socket connections, relaxed auth, no SSL, or no firewall restrictions. Production usually adds remote networking, stricter access rules, secret loading, and managed database TLS requirements.

What does “connection refused” vs “timeout” mean?

connection refused usually means the host is reachable but nothing is listening on that port, or the service rejects the connection immediately. timeout usually means packets are being dropped by a firewall, route, or network policy.

Should I set `CONN_MAX_AGE` to a high value?

Not automatically. Persistent connections can reduce reconnect overhead, but they also keep connections open longer. Set it based on database capacity, worker count, and any idle timeout behavior in your network path.
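
As a rough illustration, the fragment below pairs a moderate CONN_MAX_AGE with connection health checks. It assumes Django 4.1 or later, where the CONN_HEALTH_CHECKS setting exists; the 60-second value and the DB_CONNECTION_REUSE name are placeholders for illustration, not recommendations.

```python
# Keys to merge into the DATABASES["default"] entry; values are illustrative.
DB_CONNECTION_REUSE = {
    # Reuse a connection for at most 60 seconds. Keeping this below any idle
    # timeout in the network path means stale connections are retired first.
    "CONN_MAX_AGE": 60,
    # Django 4.1+: verify a reused connection once per request before use,
    # and reopen it if the check fails.
    "CONN_HEALTH_CHECKS": True,
}
```

Every open connection still counts against max_connections, so size CONN_MAX_AGE together with worker count rather than in isolation.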

How do I test database access without exposing credentials?

Run the test from the app host, export PGPASSWORD only in the current shell, use psql with the same connection parameters as Django, and unset the variable afterward. Avoid echoing secrets into logs, screenshots, shared terminals, or shell history.

When should I add a connection pooler?

Add a pooler when worker count, concurrency, or many short-lived requests make PostgreSQL connection limits a recurring issue. It is usually more useful after you have confirmed the problem is connection pressure rather than DNS, auth, or SSL misconfiguration.