Geo secondary replicate-geo-database command fails after 30 minutes
Overview
The gitlab-ctl replicate-geo-database command repeatedly fails partway through copying the database from the primary instance to the secondary instance, after running for approximately 30 minutes.
Description
The gitlab-ctl replicate-geo-database command is run on a Geo secondary to initialize a read-only database replica from the primary database. One of the steps performed by this command is to run the PostgreSQL pg_basebackup command. Depending on the size of the database and the speed of the network between sites, this step may take several hours to complete.
By default, the gitlab-ctl replicate-geo-database command times out after 30 minutes and fails with a non-specific error, leaving the secondary database partially copied and unusable.
Impacted offerings:
- GitLab Self-Managed
Resolution
Include the --backup-timeout=<timeout_seconds> option with the gitlab-ctl replicate-geo-database command to extend the default timeout and allow the database copy enough time to complete.
For example, to set the timeout to 12 hours (43200 seconds):
sudo gitlab-ctl \
  replicate-geo-database \
  --host=<primary_node_hostname> \
  --slot-name=<secondary_slot_name> \
  --sslmode=verify-ca \
  --force \
  --skip-backup \
  --backup-timeout=43200
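The --backup-timeout value is given in seconds, so the number of hours you want to allow has to be converted first (12 × 3600 = 43200). A minimal shell sketch of that arithmetic; the helper name hours_to_seconds is hypothetical, and it only handles whole hours:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a whole number of hours into the
# seconds value expected by --backup-timeout.
hours_to_seconds() {
  local hours="$1"
  echo $(( hours * 3600 ))
}

timeout="$(hours_to_seconds 12)"
# Print the option exactly as it would be passed to the command.
printf '%s\n' "--backup-timeout=${timeout}"   # prints --backup-timeout=43200
```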
Cause
The default timeout of 30 minutes (1800 seconds) is too short for larger databases, slower network connections between the primary and secondary sites, or both.
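To pick a timeout before running the command, a rough estimate is the database size divided by the sustained throughput between the sites, plus some headroom. The sketch below assumes the transfer runs at a roughly constant rate; the function name estimate_backup_timeout, its units (megabytes and megabits per second), and the 25% default margin are illustrative assumptions, not GitLab defaults:

```shell
#!/usr/bin/env bash
# Hypothetical estimator for a --backup-timeout value (in seconds).
# Assumes a roughly constant transfer rate between the sites.
#   size_mb:         database size in megabytes (10^6 bytes)
#   throughput_mbit: sustained throughput in megabits per second
#   margin_pct:      extra headroom as a percentage (default 25)
estimate_backup_timeout() {
  local size_mb="$1" throughput_mbit="$2" margin_pct="${3:-25}"
  # Megabytes -> megabits.
  local megabits=$(( size_mb * 8 ))
  # Ceiling division so the transfer time never rounds down.
  local seconds=$(( (megabits + throughput_mbit - 1) / throughput_mbit ))
  # Add the margin, again rounding up.
  echo $(( (seconds * (100 + margin_pct) + 99) / 100 ))
}

# Example: a 500 GB database over a 100 Mbit/s link.
estimate_backup_timeout 500000 100   # prints 50000
```

With these example figures the estimate (50000 seconds, about 14 hours) is far above the 1800-second default.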
Symptom
The gitlab-ctl replicate-geo-database command fails partway through the pg_basebackup step at around the 30-minute mark, with an error similar to:
1238492/6876734 kB (18%), 0/1 tablespace (...postgresql/data/base/16401/20498)
---- End output of PGPASSFILE=/var/opt/gitlab/postgresql/.pgpass /opt/gitlab/embedded/bin/pg_basebackup -h <primary_ip> -p 5432 -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -P -X stream -S <slot> ----
Ran PGPASSFILE=/var/opt/gitlab/postgresql/.pgpass /opt/gitlab/embedded/bin/pg_basebackup -h <primary_ip> -p 5432 -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -P -X stream -S <slot> returned
from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/mixlib-shellout-3.2.7/lib/mixlib/shellout.rb:270:in `run_command'
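The progress figure in this output can also help size the timeout: the copy had reached 1238492 of 6876734 kB when it stopped at roughly 1800 seconds, so extrapolating at the same rate approximates the total duration. A minimal sketch, assuming a constant transfer rate and that pg_basebackup had been running for about 1800 seconds when it failed; the function name projected_backup_seconds is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extrapolate the total pg_basebackup duration from
# the last "<copied>/<total> kB" progress figure in the command output.
# Assumes the copy ran at a roughly constant rate for elapsed_s seconds.
projected_backup_seconds() {
  local log_text="$1" elapsed_s="$2"
  local last copied total
  # Keep only the most recent progress figure.
  last="$(printf '%s\n' "$log_text" | grep -oE '[0-9]+/[0-9]+ kB' | tail -n 1)"
  [ -n "$last" ] || return 1
  copied="${last%%/*}"   # text before the slash
  total="${last#*/}"     # text after the slash, e.g. "6876734 kB"
  total="${total% kB}"
  [ "$copied" -gt 0 ] || return 1
  # Ceiling of total * elapsed_s / copied.
  echo $(( (total * elapsed_s + copied - 1) / copied ))
}

sample='1238492/6876734 kB (18%), 0/1 tablespace (...postgresql/data/base/16401/20498)'
projected_backup_seconds "$sample" 1800   # prints 9995
```

For this sample the projection is about 9995 seconds (roughly 2.8 hours), so a --backup-timeout comfortably above that, such as the 12-hour example in the Resolution, leaves room for variation in network speed.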