High number of 500 errors after zero-downtime upgrade
Description
- A high volume of 500 errors occurs after a zero-downtime upgrade of GitLab that spans two or more minor releases
- Logs show PG::UndefinedColumn or PG::UndefinedTable errors of the form ERROR:...does not exist
Environment
- Impacted offerings:
  - GitLab Self-Managed
- Impacted versions:
  - Any zero-downtime upgrade that skips a minor version
Solution
Cause
As stated in the documented requirements and considerations, zero-downtime upgrades can only be done one minor release at a time. If you skip releases, database modifications may run in the wrong sequence and leave the database schema in a broken state.
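As a rough illustration of the one-minor-release rule, the following sketch checks whether a planned upgrade skips a minor release. The function names `parse_version` and `skips_minor_release` are hypothetical helpers, not part of GitLab, and the sketch assumes both versions share the same major version; it does not model upgrades that cross a major release.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '17.3.5'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def skips_minor_release(current: str, target: str) -> bool:
    """Return True if upgrading from current to target skips a minor release.

    Assumption: both versions are in the same major series. Upgrades across
    a major version are outside the scope of this sketch.
    """
    cur_major, cur_minor = parse_version(current)
    tgt_major, tgt_minor = parse_version(target)
    if cur_major != tgt_major:
        raise ValueError("major version change is not handled by this sketch")
    return tgt_minor - cur_minor > 1
```

With this check, `skips_minor_release("17.3.5", "17.5.3")` is true because 17.4 is skipped, while an upgrade from 17.3.5 to any 17.4 patch release is not flagged.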
Additional information
You can trace the errors in the logs using the correlation ID returned in the UI. Current errors are logged in /var/log/gitlab/gitlab-rails/application_json.log.
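One possible way to pull the matching entries out of the log is sketched below. It assumes the log contains one JSON object per line with a `correlation_id` field, as in the sample entry in this article; the function name `entries_for_correlation_id` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, Iterator


def entries_for_correlation_id(lines: Iterable[str], correlation_id: str) -> Iterator[Dict]:
    """Yield parsed log entries whose correlation_id matches the given value."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON (for example, truncated writes).
            continue
        if isinstance(entry, dict) and entry.get("correlation_id") == correlation_id:
            yield entry


# Example usage (the correlation ID is the one shown in the UI):
#
# with open("/var/log/gitlab/gitlab-rails/application_json.log") as log:
#     for entry in entries_for_correlation_id(log, "01JEGT5324V0RNYWHB048BHXNG"):
#         print(entry.get("severity"), entry.get("message"))
```

Reading the file line by line keeps memory use low on large logs, and skipping unparseable lines avoids failing on a partially written entry.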
For example, if a zero-downtime upgrade was performed from 17.3.5 to 17.5.3, you would encounter an error related to the application_settings.sign_in_text_html column, which was removed in 17.4: gitlab-org/gitlab!161594.
The error in the log would show:
{"severity":"WARN","time":"YYYY-MM-DDTHH:MM:SSZ","correlation_id":"01JEGT5324V0RNYWHB048BHXNG","message":"Cached record for ApplicationSetting couldn't be loaded, falling back to uncached record: PG::UndefinedColumn: ERROR: column application_settings.sign_in_text_html does not exist\nLINE 1: ...ol\", \"application_settings\".\"usage_ping_enabled\", \"applicati...\n ^\n"}
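To see which schema object an error refers to, a small pattern match over the `message` field can help. The sketch below is an assumption-based illustration: the regular expression covers the `column ... does not exist` wording shown above and the `relation "..." does not exist` wording that PostgreSQL uses for missing tables, and `missing_schema_object` is a hypothetical helper name. Other message variants may not match.

```python
import re
from typing import Optional, Tuple

# Matches messages such as:
#   PG::UndefinedColumn: ERROR: column application_settings.sign_in_text_html does not exist
#   PG::UndefinedTable: ERROR:  relation "some_table" does not exist
SCHEMA_ERROR = re.compile(
    r'PG::(UndefinedColumn|UndefinedTable): ERROR:\s+'
    r'(?:column|relation) "?([^"\s]+)"? does not exist'
)


def missing_schema_object(message: str) -> Optional[Tuple[str, str]]:
    """Return (error class, object name) if the message reports a missing column or table."""
    match = SCHEMA_ERROR.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Applied to the message in the log entry above, this returns the error class `UndefinedColumn` and the name `application_settings.sign_in_text_html`, which can then be compared against the migrations of the skipped release.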