You meant well and intended to be a good file custodian, but somewhere along the way things got out of hand and you’ve got duplicate photos galore. Don’t let the fear of losing important photos keep you from deleting them; read on as we show you how to clean up safely.
Deleting duplicate files, especially important ones like personal photos, makes a lot of people quite anxious (and rightfully so). Nobody wants to be the one to realize that they deleted all the photos of their child’s first birthday party during a hard drive purge gone wrong.
In this tutorial we’re going to show you how to go beyond the limited reach of tools that simply compare file names and file sizes. Instead we’ll be using a program that combines that kind of comparison with actual image analysis to help you weed out not just perfect 1:1 file duplicates but also those piles of resized-for-email images, cropped images, and other modified images that might be cluttering up your hard drive.
What Do I Need?
For this tutorial you’ll need the following:
- VisiPics (Windows XP or above / WINE compatible)
- An internal or external hard drive to back up the entire collection you’ll be cleaning
We can’t emphasize the second entry in the list enough; it’s reckless to unleash any file-weeding application upon your files without a proper backup in place to restore files in case of error (user, application, or otherwise).
Backing Up Your Files and Best Practices
We just mentioned this, but it’s important enough to merit a separate entry in the guide. You must back up your files before continuing. Ideally this means copying all your image directories (no matter how cluttered or poorly organized they are) onto an external hard drive which can be disconnected from the primary machine during the image weeding process. At minimum you should copy the image directories to another hard drive within your machine and/or to another directory on the disk you’re working on.
Whatever you choose to do (or can do, based on the hardware you have on hand) you should not proceed unless there is, at minimum, a copy of every photo you’re working with in a location that will not be touched by the application we’re using.
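If you’d rather script the backup than drag folders around, the copy-then-verify idea can be sketched in a few lines of Python. This is a minimal sketch and not part of VisiPics; the function name `back_up_and_verify` and any paths you pass to it are our own illustrative assumptions.

```python
import filecmp
import shutil
from pathlib import Path


def back_up_and_verify(source: Path, backup: Path) -> list[str]:
    """Copy `source` to `backup`, then return relative paths that failed verification.

    Hypothetical helper for illustration; an empty list means every file
    in `source` has a byte-identical copy in `backup`.
    """
    # copytree raises an error if `backup` already exists, so an older
    # backup is never silently overwritten.
    shutil.copytree(source, backup)

    problems = []
    for original in source.rglob("*"):
        if not original.is_file():
            continue
        relative = original.relative_to(source)
        copy = backup / relative
        # shallow=False compares file contents, not just size and timestamp.
        if not copy.is_file() or not filecmp.cmp(original, copy, shallow=False):
            problems.append(str(relative))
    return problems
```

Calling it with your picture folder and a folder on the backup drive should return an empty list; anything it does return is a file worth copying again before you go further.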
In addition to making sure you’re only working with one set of files (and the other is properly backed up) the other critical thing you want to do is to decide which directory is going to be the home directory and which directory is going to be the dupe directory.
Let’s say, for example, that you have a pile of photos in C:\Pictures\ and C:\Picture Dump\. Any duplicate file finder you use will find the dupes in either directory. What you don’t want to do is to start deleting duplicates from both directories as this breaks apart the sets/collections you have.
Say there is a folder called 2011 Birthday in both directories, with the same files in each. If you don’t pay attention to the process and delete 5 dupes from the first 2011 Birthday folder and 5 dupes from the second one, you’ll end up with a split collection that is even messier than the original pile of dupes you had on your hands.
Always check to see if there is a cluster of duplicate files and remove as many of them as you can from the duplicate directory, while leaving the home directory’s files intact. This way, when you’re done, you’ll have the least amount of work to do reincorporating the remaining files in the secondary directory into your now dupe-free and mostly clean home directory.
Before continuing, ensure your files are backed up and that you have established which directory is going to be your home directory—the place where the files will remain untouched while the duplicates elsewhere will be purged.
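The home-directory rule is easy to express in code. The sketch below is our own illustration, not something VisiPics does internally: it only catches byte-for-byte identical files (via SHA-256), and it only ever lists copies that live in the dupe directory, never anything in the home directory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dupes_outside_home(home: Path, dupe_dir: Path) -> list[Path]:
    """List files in `dupe_dir` whose exact contents also exist in `home`.

    Nothing in `home` is ever returned, so the home collection stays whole.
    """
    home_digests = {file_digest(p) for p in home.rglob("*") if p.is_file()}
    return sorted(
        p for p in dupe_dir.rglob("*")
        if p.is_file() and file_digest(p) in home_digests
    )
```

The function only reports candidates; it deletes nothing. Resized or cropped copies will slip past a hash comparison like this, which is exactly the gap an image-analysis tool fills.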
Install and Configure VisiPics
VisiPics is a small, free, and easy to install app. Simply download it, run the installer, and accept the license agreement. Once the installer is done the application will launch.
To configure VisiPics you need to specify which directories you wish to scan and how strictly you wish VisiPics to compare the files. VisiPics is not a simple duplicate file-finder: it doesn’t restrict itself to simply comparing names, file sizes, or file hashes. VisiPics specifically uses image analysis algorithms to compare photos and will (depending on the settings you select) even offer two photos as duplicates that are different sizes and resolutions but otherwise the same image.
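To get a feel for how two files of different sizes can still count as the same image, here is a minimal sketch of one well-known technique, the average hash. We don’t know which algorithms VisiPics actually uses, so treat this purely as an illustration. It assumes each image is already loaded as a grid of grayscale values (0 to 255) at least 8 pixels on each side, and every function name here is our own.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid (list of rows) to size x size by block averaging.

    Assumes the grid is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        out_row = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            out_row.append(sum(block) / len(block))
        small.append(out_row)
    return small


def average_hash(pixels, size=8):
    """Return a list of bits: 1 where a shrunken cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def looks_like_duplicate(a, b, max_distance=5):
    """Treat two images as duplicates when their hashes differ in few bits.

    max_distance is an arbitrary illustrative threshold, loosely analogous
    to a strictness setting.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because both images are reduced to the same tiny grid before comparison, a 32x32 picture and its 16x16 resize end up with matching hashes, while a genuinely different picture does not.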
First, let’s pick our directories. For the purpose of this demonstration we’ll be selecting two directories that we know have duplicate files in them. In our My Documents folder we have a folder called \Picture Dump\. We took this folder and copied the images to the E:\ drive to create our duplicate set. By clicking on File –> Add Folder (or by using the folder browser pane and the Add Arrow button) we can easily add the two folders to VisiPics.
Now would be a good time to mention that VisiPics has a Project function which allows you to save all your settings in between sessions. If you’ve spent a bit of time selecting folders (or later, tweaking settings), you’ll definitely want to take a moment to go to File –> Save Project and secure the resulting VSP project file in a place where it won’t get accidentally deleted.
Once you have your folders selected, you can then move the folders up or down in the list in order to create prioritization for the auto-select tool. Your home directory should be the directory at the top—use the up and down arrows at the right side of the folder list to change the position of the folders. You can see the rules for Auto-Select by clicking on the Auto-Select tab. The default is to select uncompressed files, lower resolution files, and smaller files, first. You can uncheck any of these options to alter the behavior of the duplicate finder. Note: Auto-Select will never actually automatically select files unless you click the Auto-Select button.
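The prioritization logic can be pictured as a sort. The sketch below is our guess at one reasonable way to combine folder order with the resolution and file-size rules; the article doesn’t document how VisiPics weighs them internally, and the compression rule is left out for brevity. `Candidate` and `choose_keeper` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    folder_rank: int  # 0 = the folder at the top of the list (home directory)
    pixels: int       # width * height
    size_bytes: int


def choose_keeper(group):
    """Split one group of duplicates into (keeper, flagged).

    Assumed ordering: highest-priority folder wins, then higher
    resolution, then larger file. Everything else is flagged.
    """
    keeper = min(group, key=lambda c: (c.folder_rank, -c.pixels, -c.size_bytes))
    flagged = [c for c in group if c is not keeper]
    return keeper, flagged
```

With this ordering the home directory’s copy is always kept, even if a sharper copy sits in the dupe directory; moving `folder_rank` to the end of the tuple would favor image quality over location instead.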
Once you have the directories picked out and prioritized, you can run your initial test run. No files will be deleted; this test run simply allows you to see if you need to adjust your filter settings for better results. Go ahead and press the green play arrow in the middle of the interface panel to begin the process. Depending on how many files you have, this may take anywhere from a few minutes to an hour or more for large collections of 20,000+ files.
In the case of our test run, we have two directories, one on the C drive and one on the E drive. We purposely altered some of the files on the E drive (reduced the file size, altered the dimensions, and so on) to double check VisiPics’ search algorithms. VisiPics found all the duplicate files, including the files with different sizes, resolutions, and file names.
More importantly, when we used the Auto-Select button, it accurately picked out the duplicate files from the non-prioritized directory first while still respecting the Auto-Select rules that instructed it to also flag the lower-quality files for deletion.
Now that you have your files scanned, and you’ve hit Auto-Select to see the files that are VisiPics’ best choices, you have several options. You can bulk delete or move the files all at once by clicking the Move and Delete buttons in the Actions section located on the right hand side of the interface. We’d recommend, however, not firing off the Delete button until you’ve taken a moment to look over the results and confirm that the files are the ones you want deleted.
Move allows you to take all the duplicate files and move them somewhere new, essentially creating a backup of the dupes. If you’re pretty sure VisiPics has selected the best files but you want to err on the side of caution, move the files to a secondary directory or drive.
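The same move-instead-of-delete habit is worth keeping if you ever script your own cleanup. Here is a minimal sketch, separate from VisiPics, that relocates flagged files into a holding folder while preserving their sub-folder layout so they are easy to put back. The `quarantine` name and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def quarantine(files, root: Path, holding: Path) -> list[Path]:
    """Move each file under `root` into `holding`, keeping its relative path.

    Refuses to overwrite anything already sitting in `holding`.
    """
    moved = []
    for path in files:
        target = holding / path.relative_to(root)
        if target.exists():
            raise FileExistsError(f"{target} is already in the holding folder")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Once you’ve lived with the cleaned-up collection for a while and nothing is missing, the holding folder can be deleted in one go.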
Finally, the safest way to use VisiPics is to go down the list and check each file by hand. While this is the surest way to ensure there are no accidental deletions, on a large collection it is very time consuming. If you’re trying to sort out a mess of 15,000 duplicate photos we’d recommend using the Move function to back them up (or relying on the original backup you created earlier in the tutorial) and simply checking the first few hundred images to ensure VisiPics has sorted them according to your settings. After the initial check, let the application handle deleting the dupes.
If you do opt to hand-check the entire list of files, we’d strongly suggest taking advantage of the previously mentioned Save Project function so that you can save the entire process at any point and return to it later without having to rescan or reflag your photos.
Regardless of how much hand-checking or automation you use, when you’re done you’ll have a tidied directory with the highest quality versions of your images—without a duplicate in sight.
Have a tip, trick, or tool for ferreting out duplicate files? Share your knowledge in the comments below.
Jason Fitzpatrick is a warranty-voiding DIYer and all-around geek. When he's not documenting mods and hacks he's doing his best to make sure a generation of college students graduate knowing they should put their pants on one leg at a time and go on to greatness, just like Bruce Dickinson. You can follow him on Google+ if you'd like.
- Published 06/26/12