
Deleting Zero Byte Files in Ceph

Summary

I have been battling a long-standing issue on a subset of the Ceph clusters my team looks after: an RGW (RADOS Gateway) bucket containing millions of zero-byte objects. Each time a node was added to or removed from the cluster, or the PG autoscaler kicked in, IOPS would slow to a crawl while these zero-byte objects were rebalanced. The bucket held only 5 MB of data, but somewhere between 30 and 50 million objects.
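A quick way to see that mismatch is the bucket stats output, which reports both size and object count. A hedged example (field names can vary slightly across Ceph releases):

radosgw-admin bucket stats --bucket=dlp-events-old | jq '.usage'   # shows size and num_objects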

I tried a few different options to delete the bucket, including:

radosgw-admin bucket rm --bucket=dlp-events-old --purge-objects
s3cmd -c bucket-config.cfg del s3://dlp-events-old --recursive --force

Leaving these running in screen over a weekend had no effect.

Solution

My solution to this was to get a listing of every file in the rados pool used for rgw, carve out the files that pertained to the bucket I wanted to delete, and iterate over those files to remove them. My process for getting the list of files looked like this:

radosgw-admin bucket stats --bucket=dlp-events-old | jq -r '.marker'   # Get the bucket's marker
rados -p rook-rgw.buckets.data ls > all_obj   # Get a list of all objects
sed -i '/^<marker from first command>/!d' all_obj   # Drop every line that doesn't start with the marker
split -d -l 1000000 all_obj dlp-events-   # Split all_obj into 1-million-line chunks with numeric suffixes (00, 01, ...)

This was the prep work. The rados command quickly lists every object in the pool backing rgw, and every object belonging to the bucket is prefixed with that bucket's marker. I then used sed to remove any lines that weren't targeted for deletion. Lastly, I split the file into smaller chunks so I could loop over the contents with a simple bash script.
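Since the sed -i edit rewrites all_obj in place, a non-destructive preview first can save a re-listing. A minimal sketch, with a hypothetical marker value standing in for the real one:

_marker='c0ffee42-1111-2222-3333-444444444444.1234.1'   # hypothetical marker
grep -c "^$_marker" all_obj            # how many objects belong to the bucket
grep "^$_marker" all_obj | head -3     # spot-check a few object names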

Next, the bash script.

#!/bin/bash
# Delete every object listed in the chunk file given as $1.
_chunk=$1
_counter=0

# Read line by line so object names containing whitespace are handled safely.
while IFS= read -r _l; do
  rados -p rook-rgw.buckets.data rm "$_l"
  (( _counter++ ))
  # Print a timestamped progress line every 50,000 deletions.
  if (( _counter % 50000 == 0 )); then echo "$(date -u) $_counter"; fi
done < "$_chunk"

I then ran this script in parallel, passing each million-line chunk to it.

for _i in {00..40}; do ./script.sh dlp-events-$_i > dlp-events-$_i.log & done

I could then monitor the progress of each iteration by checking its log file: for every 50,000 files deleted, the script printed the UTC time and the counter value.
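For example, tail can follow all of the chunk logs at once:

tail -f dlp-events-*.log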

Lastly, I needed to delete the index related to the bucket.

radosgw-admin bi purge --bucket=dlp-events-old --yes-i-really-mean-it
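To confirm the index objects were removed, the index pool can be listed directly. This is a sketch that assumes the index pool follows the same naming convention as the data pool; RGW index objects are typically named .dir.<marker>.<shard>:

rados -p rook-rgw.buckets.index ls | grep '^\.dir\.<marker from first command>'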

At this point all the data related to the bucket is gone, but the storage bucket itself still exists, and in its current state it can't be deleted. Attempting to delete the bucket throws an error saying the bucket doesn't exist. That wasn't true, because running radosgw-admin bucket list still listed the bucket.
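To illustrate the mismatch (a reconstruction, not exact output; the error text varies by Ceph version):

radosgw-admin bucket rm --bucket=dlp-events-old    # fails, claiming the bucket does not exist
radosgw-admin bucket list | grep dlp-events-old    # ...yet the bucket is still listed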

My fix here was to force the bucket to be re-sharded with:

radosgw-admin reshard add --bucket=dlp-events-old --num-shards 1 --yes-i-really-mean-it
radosgw-admin reshard process

Only then could I delete the bucket with:

radosgw-admin bucket rm --bucket=dlp-events-old

Finish Line

This was a lengthy process that took 2-3 days per Ceph cluster I worked on. At the start, I created a new bucket and renamed the old one so that the bucket's users could carry on without me impacting them. This was definitely a fun challenge, and I managed to solve it without resorting to the extreme measure of wiping the entire cluster and starting over from zero.
