SUMMARY
This error occurs when a managing system cannot communicate with one of its sources or managed systems.
ISSUE
This issue appears in two forms:
When you click a system in the navigation pane, an error message appears: "Database on system with hostname ______ is not responding"
Alternatively, in the navigation pane of the legacy UI, an '!' may appear next to the system. Hovering over the system displays the tooltip "This system was incompletely loaded".
RESOLUTION
- From the command line on the target, ping the source system to verify that the target can reach it.
- Verify that the source system is listed as active (is_active) in the target's database. The following command sets is_active to true for every row in bp.systems, so it corrects the database if the source is listed as inactive:
psql -U postgres bpdb -c "update bp.systems set is_active='t'"
- Verify that you can connect to the source system's database and that the source system lists the target system as a manager. In the command below, replace HOSTNAME with the actual hostname of the managed system:
psql -U postgres bpdb -h HOSTNAME -c "select * from bp.managers"
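The three checks above can be bundled into one shell function and run on the managing (target) system. This is a minimal sketch, not a supplied tool: the function name check_managed_system is hypothetical, it assumes psql is on the PATH and that ping accepts -c (Linux/BSD style), and it only displays bp.systems instead of running the update, so you can decide whether the update is actually needed.

```shell
# Hypothetical helper (not part of the product): runs the resolution checks
# from the managing (target) system against one managed (source) system.
# Assumes psql is on the PATH and that ping accepts -c (Linux/BSD style).
check_managed_system() {
    managed_host="$1"
    if [ -z "$managed_host" ]; then
        echo "usage: check_managed_system HOSTNAME" >&2
        return 2
    fi

    # Check 1: network reachability (3 echo requests, output discarded).
    if ! ping -c 3 "$managed_host" >/dev/null 2>&1; then
        echo "FAIL: $managed_host does not answer ping" >&2
        return 1
    fi
    echo "OK: $managed_host answers ping"

    # Check 2: show bp.systems in the local bpdb so the is_active column can be
    # reviewed before deciding whether to run the update command.
    psql -U postgres bpdb -c "select * from bp.systems"

    # Check 3: query the managed system's database; its bp.managers table
    # should list this (target) system.
    if ! psql -U postgres bpdb -h "$managed_host" -c "select * from bp.managers"; then
        echo "FAIL: cannot query bpdb on $managed_host" >&2
        return 1
    fi
    echo "OK: bpdb on $managed_host is reachable"
}
```

For example, source the file and run check_managed_system HOSTNAME. A return code of 1 means either the ping or the remote query failed; the FAIL message on stderr says which.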
If you still cannot connect to the source database after completing all of these steps, there is likely an error in /usr/bp/data/pg_hba.conf on the source or in /usr/bp/data/pg_service.conf on the target.
Only the last part of pg_hba.conf is configurable. It should look like this:
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
host bpdb +bpexch,wguest 0.0.0.0/0 md5
hostssl bpdb postgres 172.17.3.1/32 trust
These lines are the minimum required; the file may also contain additional entries.
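Because the minimum entries are fixed, a quick comparison can show which ones are missing from the source's pg_hba.conf. The sketch below is illustrative only: the function name check_pg_hba is hypothetical, and it assumes each required entry sits on its own line without a trailing comment. Spacing between columns is normalized first, since the real file may use tabs or several spaces.

```shell
# Hypothetical helper: reports which minimum pg_hba.conf entries are missing.
# Columns may be separated by any mix of spaces and tabs in the real file, so
# each non-comment line is normalized to single spaces before comparing.
check_pg_hba() {
    hba_file="${1:-/usr/bp/data/pg_hba.conf}"
    if [ ! -r "$hba_file" ]; then
        echo "cannot read $hba_file" >&2
        return 2
    fi

    # $1 = $1 makes awk rebuild the record with single-space separators.
    normalized=$(awk '!/^[ \t]*#/ { $1 = $1; print }' "$hba_file")

    missing=0
    while IFS= read -r required; do
        if ! printf '%s\n' "$normalized" | grep -qxF -- "$required"; then
            echo "MISSING: $required"
            missing=1
        fi
    done <<'EOF'
local all all trust
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host bpdb +bpexch,wguest 0.0.0.0/0 md5
hostssl bpdb postgres 172.17.3.1/32 trust
EOF
    return "$missing"
}
```

Run it on the source system with no argument to check /usr/bp/data/pg_hba.conf, or pass another path. It prints one MISSING line per absent entry and returns 1 if any are missing; additional entries in the file are ignored.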
pg_service.conf should appear as follows:
[localhost]
user=postgres
connect_timeout=5
[connpooldb]
user=postgres
dbname=pgbouncer
host=localhost
port=6432
connect_timeout=5
[upsilon]
user=postgres
host=localhost
port=6432
sslmode=prefer
connect_timeout=30
[HOSTNAME]
user=postgres
host=HOSTNAME
connect_timeout=3
sslmode=prefer
As before, replace HOSTNAME with the hostname of the managed system. Make any needed corrections to these files, restart the database, and then repeat the steps above, starting with the is_active check, to resolve the issue.
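To confirm that the [HOSTNAME] entry on the target actually works, you can connect through it explicitly. This sketch relies on standard libpq behavior (the PGSERVICEFILE environment variable and a service= connection string passed to psql); pointing PGSERVICEFILE at /usr/bp/data/pg_service.conf is an assumption made for this test only, and the function name check_service_entry is hypothetical. Because the [HOSTNAME] section shown above sets no dbname, the sketch supplies dbname=bpdb itself.

```shell
# Hypothetical helper: confirms pg_service.conf has a [HOSTNAME] section for
# the managed system, then connects through that service definition.
check_service_entry() {
    svc_host="$1"
    svc_file="${2:-/usr/bp/data/pg_service.conf}"
    if [ -z "$svc_host" ]; then
        echo "usage: check_service_entry HOSTNAME [FILE]" >&2
        return 2
    fi

    # The section header must match the hostname exactly, e.g. [HOSTNAME].
    if ! grep -qxF -- "[$svc_host]" "$svc_file"; then
        echo "MISSING: no [$svc_host] section in $svc_file" >&2
        return 1
    fi

    # libpq takes user, host, sslmode and connect_timeout from the service
    # entry; dbname is given here because the [HOSTNAME] stanza omits it.
    PGSERVICEFILE="$svc_file" psql "service=$svc_host dbname=bpdb" -c "select * from bp.managers"
}
```

If the section exists but the query fails, the problem is more likely on the source side (pg_hba.conf or the network) than in pg_service.conf.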
CAUSE
This error occurs when communication is interrupted between the two systems, often due to network outages.
Damage to pg_hba.conf and pg_service.conf is usually caused by an improper dump and reload of the database.