Docker Runner jobs fail with "No space left on device" error
Description
CI/CD jobs using Docker-based executors can fail with the error message "No space left on device". The error can appear at various stages of the job, but most often during script execution.
Environment
Impacted offerings:
- GitLab Dedicated
- GitLab Self-Managed
Solution
- Check disk space and inode usage in the build container and the Runner host:
  df -h
  df -i
- Also check the Runner host specifically for /var or /var/lib/docker, most likely where the Docker overlay filesystem is mounted:
  # Check disk space
  df -h /var/lib/docker
  # Check inodes
  df -i /var/lib/docker
- Perform a Docker prune on the Runner instance host (if necessary)
- Check the CI/CD job log for which command or job sub-stage receives the error. Investigate whether the job is reporting a disk space problem on a remote system, or on the Runner instance host.
  - Check if the error message indicates which service/command failed
  - Verify the remote service has sufficient disk space
  - Check logs of the remote service for space-related issues
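The log-inspection and prune steps above can be sketched as two small shell helpers. This is only an illustration: the function names find_enospc and docker_cleanup, the DRY_RUN switch, and the saved log file are assumptions, not part of GitLab Runner or Docker. The docker system df and docker system prune commands are standard Docker CLI commands. Pruning deletes unused data, so the sketch prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch only: helper names, DRY_RUN, and the log file are hypothetical.

# Print every line of a saved job log that contains the ENOSPC message,
# with two lines of leading context so the failing command is visible.
find_enospc() {
    grep -n -B 2 'No space left on device' "$1"
}

# Show Docker's disk usage, then prune unused data on the Runner host.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
docker_cleanup() {
    for cmd in 'docker system df' 'docker system prune --force'; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $cmd"
        else
            $cmd
        fi
    done
}
```

For example, find_enospc job.log shows where the message first appears in a downloaded job log, and DRY_RUN=0 docker_cleanup runs the cleanup on the Runner host once you have confirmed that the host is the system that is out of space.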
Cause
The "No space left on device" error (ENOSPC) is returned by the Linux kernel when a write operation fails because it cannot allocate space on the filesystem. This can happen due to:
- Physical disk space exhaustion
- Inode exhaustion
- Filesystem quotas being reached
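To tell the first two causes apart, the block and inode figures from df can be compared against a threshold. The following is a minimal sketch: the function name exhaustion_kind and the default threshold of 100% are illustrative assumptions, and it assumes a Linux df that accepts -P and -i together. It does not detect quotas.

```shell
# Sketch: classify why writes to a path might fail with ENOSPC.
# exhaustion_kind and its default threshold are illustrative choices.
exhaustion_kind() {
    path=$1
    limit=${2:-100}   # usage percentage treated as "exhausted"

    # With -P each filesystem is reported on one line; field 5 is the
    # use percentage (for blocks, or for inodes when -i is given).
    blocks=$(df -P "$path" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    inodes=$(df -Pi "$path" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }')

    # Some filesystems report "-" for inodes; treat non-numeric values as 0.
    case $blocks in ''|*[!0-9]*) blocks=0 ;; esac
    case $inodes in ''|*[!0-9]*) inodes=0 ;; esac

    if [ "$blocks" -ge "$limit" ]; then
        echo blocks
    elif [ "$inodes" -ge "$limit" ]; then
        echo inodes
    else
        echo none
    fi
}
```

Running exhaustion_kind /var/lib/docker 95 on the Runner host prints blocks, inodes, or none. A result of none while jobs still fail suggests a quota, or that the error comes from a different host.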
In containerized environments, this error can be more complex to diagnose because:
- The error might originate from the Runner instance host rather than the container
- Docker's overlay filesystem manages space differently than traditional filesystems
- Remote Docker services can propagate their own space-related errors back to the build job
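Because the error often originates on the Runner host, it helps to confirm which host directory actually holds Docker's data before checking it. The sketch below asks the Docker daemon for its root directory and falls back to /var/lib/docker when the daemon cannot be queried. The function name docker_root is hypothetical; docker info --format with the DockerRootDir field is part of the Docker CLI.

```shell
# Sketch: print the directory where the Docker daemon stores its data.
# Falls back to the conventional default if docker is missing or the
# daemon does not answer.
docker_root() {
    root=$(docker info --format '{{.DockerRootDir}}' 2>/dev/null) || root=
    echo "${root:-/var/lib/docker}"
}
```

On the Runner host, df -h "$(docker_root)" and df -i "$(docker_root)" then report on the filesystem that backs the overlay storage, even when Docker has been configured with a non-default data directory.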
Additional Information
Docker-based executors are those that interact with Docker containers, such as:
- docker
- docker-autoscaler
- docker+machine
Docker containers share the host's kernel; they do not run their own kernel as virtual machines do. When you see this error, it is therefore most likely coming from the host's Linux kernel rather than from within the container itself. Which Linux host returns the error depends on which command or job sub-stage failed, and whether it relies on a remote system.
- Example: If a job runs terraform commands against a remote TFE backend, and a command downloads large files (such as providers), the remote backend needs sufficient disk space. Diagnosis can be even more complex if the remote backend is a cluster of multiple hosts and the commands run in a Dockerized environment.
A CI/CD job has the following data inside its container, which requires space on the host:
- Downloaded and extracted cache and artifacts
- Cloned repository
- Docker image data
- Dependencies
- Any data that is written/downloaded by the job's script(s)
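To see which of these items takes the most space, a job script can list the largest entries in its working directory. This is a minimal sketch: the function name largest_entries is hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU, BSD, and BusyBox do). Sizes are printed in KiB, largest first, and hidden entries such as .git are included.

```shell
# Sketch: list the N largest entries directly under a directory (KiB).
largest_entries() {
    dir=${1:-.}
    count=${2:-10}
    # find (rather than a * glob) so that dot-directories are included.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}
```

Inside a job, largest_entries "$CI_PROJECT_DIR" 5 prints the five largest files or directories in the cloned project, which usually shows whether the cache, dependencies, or script output is responsible.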