Published : 2022-01-12

Remove old GitLab CI job logs

GitLab stores one log file for each job created in CI/CD. These logs are plain files and are treated as artifacts of the job.

Artifacts on a filesystem

On a standard filesystem store, a simple find command is enough to remove logs older than a given number of days:

find <gitlab-artifact-dir> -mtime +365 -name job.log -delete
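Since -delete cannot be undone, it can help to preview the matches first. The sketch below is only an illustration: preview_old_job_logs is a hypothetical helper name, the demo directory is a throwaway one, and touch -d assumes GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper (not part of GitLab): print job.log files older than
# a given number of days instead of deleting them.
preview_old_job_logs() {
  local dir="$1"
  local days="${2:-365}"
  find "${dir}" -type f -name job.log -mtime +"${days}" -print
}

# Demo on a throwaway directory (GNU touch is assumed for -d).
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/old" "${demo_dir}/new"
touch -d "400 days ago" "${demo_dir}/old/job.log"
touch "${demo_dir}/new/job.log"

# Only the 400-day-old file should be listed.
preview_old_job_logs "${demo_dir}" 365

rm -rf "${demo_dir}"
```

Once the listed files look right, the same find expression can be run with -delete in place of -print.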

Artifacts on S3-like storage

This is a bit more complicated, since we cannot use find here. Instead, we will use s3cmd and some bash to process the files on S3.

The following script is commented; I won’t explain how it works beyond those comments :)

#!/bin/bash

# Cutoff timestamp (epoch seconds), computed once; requires GNU date
olderThan=$(date -d "-365 days" +%s)

# List only the root folder for memory efficiency
s3cmd -c .s3cfg ls s3://<bucket-name>/  | while read -r line; do

  # Retrieve folder name
  subFolder=$(echo "${line}" | awk '{print $2}')
  echo "Working on: ${subFolder}"

  # List all files in the folder
  s3cmd -c .s3cfg ls --recursive "${subFolder}" | while read -r lineS3File; do

    # Convert the date and time columns of the listing to epoch seconds
    createDate=$(echo "${lineS3File}" | awk '{print $1" "$2}')
    createDate=$(date -d "${createDate}" +%s)

    # Process only files older than 365 days
    if [[ $createDate -lt $olderThan ]]; then

      fileName=$(echo "${lineS3File}" | awk '{print $4}')

      # Retrieve file name from full path and ensure it's a job.log file
      if [[ "${fileName##*/}" == "job.log" ]]; then

        echo "Removing job.log file older than 1 year: $fileName"
        s3cmd -c .s3cfg rm "${fileName}" && echo "Removed."

      fi
    fi
  done
done
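
The filtering logic of the script can be tried out without touching the bucket. The sketch below is a minimal illustration, not a replacement for the script: is_expired_job_log is a hypothetical helper, the sample line and bucket name are made up, and it assumes each listing line has the form "date time size path" and that GNU date is available.

```shell
#!/bin/bash
# Hypothetical helper: succeed when one listing line refers to a job.log
# whose timestamp is older than the cutoff (epoch seconds).
# Assumed line format: "2020-01-05 10:22   1234   s3://bucket/path/job.log"
is_expired_job_log() {
  local line="$1"
  local cutoff="$2"
  local day time path modified

  # The last variable receives the rest of the line, so the path is kept whole.
  read -r day time _ path <<< "${line}"

  # Keep only job.log files
  [[ "${path##*/}" == "job.log" ]] || return 1

  # Convert the listing timestamp to epoch seconds (GNU date)
  modified=$(date -d "${day} ${time}" +%s) || return 1
  [[ "${modified}" -lt "${cutoff}" ]]
}

cutoff=$(date -d "-365 days" +%s)
sample="2020-01-05 10:22      1234   s3://example-bucket/ab/cd/job.log"

if is_expired_job_log "${sample}" "${cutoff}"; then
  echo "would remove: ${sample##* }"
fi
```

Printing "would remove" instead of calling s3cmd rm gives a dry run, which is a cheap way to check the cutoff before deleting anything.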