Bulk Delete Duplicate Files Using czkawka’s Results File

The czkawka duplicate finder works pretty well. I ran it against my NAS because I want to migrate data from that aging device to a new one, and figured I would take the time to clean up the many duplicate files first. I have a previous post from the other day that details my journey with it so far and using it against NAS devices.

The one thing I can’t figure out in czkawka is whether there is a way to, for each group of duplicates, delete everything but the first file listed in that group. What I’ve had to do instead is manually put checkmarks on the list. Fine and dandy for maybe a couple hundred duplicates, but I have about 15k duplicates in about 12k groups, accounting for about 100 GB of wasted space. That’s a LOT of clicking.

czkawka lets me save a list of the duplicate files it found. Here’s a sample of that results file’s structure.

-------------------------------------------------Files with same hashes-------------------------------------------------
Found 14934 duplicated files which in 12706 groups which takes 96.99 GiB.

---- Size 1.76 GiB (1888739918) - 2 files
C:\nas_photography\1\IMG_7467.MOV
C:\nas_photography\1\New folder\DCIM\924BJHKG\IMG_7466.JPG

---- Size 748.34 MiB (784696092) - 2 files
C:\nas_photography\1\IMG_7524.MOV
C:\nas_photography\1\New folder\DCIM\924BJHKG\IMG_7523.JPG

---- Size 688.21 MiB (721643510) - 2 files
C:\nas_photography\1\IMG_7656.MOV
C:\nas_photography\1\New folder\DCIM\924BJHKG\IMG_7655.JPG
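
As a quick sanity check before scripting against this file, the number of group headers should match the group count in czkawka’s summary line. Assuming the saved results file is named results_duplicates.txt:

grep -c '^---- Size' results_duplicates.txt   # should print 12706, the group count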

Here’s my attempt to “automate” this. Each group starts with a “---- Size” header, the file paths follow one per line, and a blank line ends the group, so keeping the first file in each group means skipping the first path after each header and generating a delete command for every path after that. I think I put about an hour of work into this (mostly fiddling with WSL trying to get the Windows share mount permissions working, which I never did get working).

#!/bin/bash

# Parse czkawka's results file and emit a delete command for every file in a
# group except the first one listed. "count" tracks our position within the
# current group: 1 = group header line, 2 = first (kept) file, >2 = duplicates.
count=0
while IFS= read -r line
do
  line="${line%$'\r'}"        # strip any Windows CR so the blank-line check works
  if [[ "$line" =~ ^---- ]]; then
    count=1                   # "---- Size ..." starts a new group
  elif [[ -z "$line" ]]; then
    count=0                   # blank line ends the current group
  else
    ((count++))
    if [ "$count" -gt 2 ]; then
      echo del /q \""$line"\" >> windows_delete_duplicates.bat # For Windows
      # echo rm -f \""$line"\" >> linux_delete_duplicates.sh # For Linux
    fi
  fi
done < results_duplicates.txt
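
For what it’s worth, the same parse can be done in a single awk pass. This is just a sketch under the same assumptions as the script above (a results_duplicates.txt in the format shown earlier); review the generated .bat before running it from cmd.exe:

awk '
  { sub(/\r$/, "") }                                   # strip any Windows CR
  /^----/ { n = 1; next }                              # group header: reset counter
  /^$/    { n = 0; next }                              # blank line: group is over
  n       { if (++n > 2) printf "del /q \"%s\"\n", $0 }
' results_duplicates.txt > windows_delete_duplicates.bat

Either way, the output is a plain batch file, so it’s easy to eyeball (or at least spot-check) the delete list before committing to it.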