GitLab runners not picking up jobs after upgrade to GitLab 17.11.x
Description
- After upgrading to GitLab 17.11.x, registered runners stop picking up jobs despite showing as online.
- CI/CD pipeline jobs remain in `Pending` status indefinitely.
- The following error appears in the PostgreSQL logs:
ERROR: duplicate key value violates unique constraint "index_ci_runner_machines_on_runner_id_and_system_xid" DETAIL: Key (runner_id, system_xid)=(X, Y) already exists.
- Runners may also report 404 responses when attempting to request jobs.
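On an Omnibus installation, one way to confirm the PostgreSQL error is to search the bundled PostgreSQL log for the constraint name. The sketch below assumes the default Omnibus log location `/var/log/gitlab/postgresql/current`. The `count_conflicts` helper is a hypothetical name used for illustration and is not part of GitLab.

```shell
# Constraint name taken from the error message above.
PATTERN='index_ci_runner_machines_on_runner_id_and_system_xid'

# Hypothetical helper: print how many lines in the given log file mention
# the constraint. Prints 0 when there is no match.
count_conflicts() {
  grep -c -- "$PATTERN" "$1" || true
}

# Example (assumed default Omnibus log path; reading it usually requires root):
# sudo grep -c -- "$PATTERN" /var/log/gitlab/postgresql/current
```

A non-zero count suggests the instance is hitting this constraint violation. If the log lives elsewhere, pass that path to the helper instead.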
Environment
- Impacted offerings: GitLab Self-Managed
- Impacted versions: GitLab 17.11.0 to 18.0
Solution
- Create a database backup before proceeding.
- Connect to your PostgreSQL database:
- For Omnibus installations:
sudo gitlab-psql
- For external databases (e.g., RDS): Use your preferred database connection method, such as:
/opt/gitlab/embedded/bin/psql -p 5432 -h your-database-host -d your-database-name --username your-username
- Check how many problematic records exist in the `ci_runner_machines_archived` table by running this read-only query:
SELECT COUNT(*) FROM ci_runner_machines_archived WHERE NOT EXISTS ( SELECT 1 FROM ci_runner_machines WHERE ci_runner_machines.runner_id = ci_runner_machines_archived.runner_id AND ci_runner_machines.system_xid = ci_runner_machines_archived.system_xid);
- If the query returns a count greater than 0, delete the problematic records:
DELETE FROM ci_runner_machines_archived WHERE NOT EXISTS ( SELECT 1 FROM ci_runner_machines WHERE ci_runner_machines.runner_id = ci_runner_machines_archived.runner_id AND ci_runner_machines.system_xid = ci_runner_machines_archived.system_xid);
- Verify that registered runners begin picking up pending jobs.
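For a more cautious cleanup, the DELETE statement above can be run inside a transaction and rolled back first, so that psql reports how many rows would be removed without changing anything. The following is a minimal sketch for Omnibus installations, not an official GitLab procedure. The `cleanup_sql` function name is hypothetical, and the sketch assumes that `sudo gitlab-psql` reads SQL from standard input, as plain `psql` does.

```shell
# Hypothetical helper: print the cleanup SQL wrapped in a transaction.
# $1 is the closing statement: ROLLBACK (dry run, the default) or COMMIT.
cleanup_sql() {
  final="${1:-ROLLBACK}"
  case "$final" in
    ROLLBACK|COMMIT) ;;
    *) echo "usage: cleanup_sql [ROLLBACK|COMMIT]" >&2; return 1 ;;
  esac
  cat <<SQL
BEGIN;
DELETE FROM ci_runner_machines_archived
WHERE NOT EXISTS (
  SELECT 1 FROM ci_runner_machines
  WHERE ci_runner_machines.runner_id = ci_runner_machines_archived.runner_id
    AND ci_runner_machines.system_xid = ci_runner_machines_archived.system_xid
);
${final};
SQL
}

# Dry run: psql prints "DELETE <n>", then the change is rolled back.
# cleanup_sql ROLLBACK | sudo gitlab-psql

# Apply the change once <n> matches the count from the read-only query:
# cleanup_sql COMMIT | sudo gitlab-psql
```

For external databases, the same output could be piped into the `psql` connection command shown earlier in place of `sudo gitlab-psql`.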
Cause
During the upgrade to GitLab 17.11.x, a database migration issue can leave conflicting entries in the `ci_runner_machines` and `ci_runner_machines_archived` tables. This results in unique constraint violations when runners try to register for jobs.
The issue occurs because the `ci_runner_machines_archived` table contains records with the same `(runner_id, system_xid)` key pairs that exist in the `ci_runner_machines` table. This violates the unique constraint `index_ci_runner_machines_on_runner_id_and_system_xid`.
Additional Information
- The error can occur after upgrading from GitLab 17.8.x or earlier to GitLab 17.11.x.
- If you're using an external PostgreSQL database (such as AWS RDS), you might need to check the database logs in your database management console to see these specific errors.