Photo Jockey HELP
Tip # 57
(Duplicates) SEARCHING FOR DUPLICATE IMAGES
If you have lots of photos and you do lots of editing and moving and copying photos, it's possible that you could end up with a bunch of duplicate images. If you are interested in knowing where all your duplicate images are, then this is the function you need.
From the main menu, click on the FILE menu, then click on the "Search For Duplicate Images" menu item. There is also a button on the Quick Tools panel for quick access.
General Description
This tool will search any part of your system for duplicate images. There are two types of duplicates that can be found.
Identical: These are images that are physically the same. They are EXACT copies of each other. The file size, date and time will be the same, and the image dimensions will be the same.
Appearance: These duplicates are NOT identical, although they will APPEAR the same. The file size, date and time can be different, and the image dimensions can be different.
Once you have located your duplicate images, you can then delete the ones you no longer want. You can also rename and move these duplicates.
DUPLICATES THAT ARE NOT DUPLICATES:
HINT: If you have used mounted drives/volumes or assigned network folders to drive letters or used the old obsolete SUBST command to fake a drive letter pointing to a real folder on a real drive, you have the possibility of seeing duplicates that are NOT duplicates.
For example:
Y: Points to \\computerA\photos\bestpics
Z: Points to \\computerA\bestpics
If ComputerA has a bunch of shared folders and 2 of the shared folders really point to the same folder, then Y: and Z: are really the same thing.
Also, if you have for example:
R: Points to C:\MyPhotos\GoodStuff (via the SUBST)
Then R: is really the same thing as the folder C:\MyPhotos\GoodStuff.
In any case, you do NOT want to see duplicates reported that show files in Y: being duplicated on the Z: drive. Both entries point to the same file, so if you delete one of them, the other disappears too. So, even though you THINK you are getting rid of a duplicate, you are in fact deleting a single image that is NOT duplicated. And you will need a backup in order to recover that file.
SOLUTION TO PROBLEM:
So, Photo Jockey implements a technology that allows it to KNOW which of the folders that are being compared are really pointing to the SAME PHYSICAL data. We call these MIRRORED folders. Photo Jockey will eliminate all mirrored folders from the searching process. Thus you will never have the situation above of false reporting. False reporting is where it thinks 2 images are duplicated and that one of them can be safely deleted (when in fact 2 images are pointing to the SAME IMAGE and neither of them can be deleted without deleting the other).
A complete list of the ignored MIRROR FOLDERS (if any) is shown in the search results report list.
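To make the idea of mirrored folders concrete, here is a minimal sketch in Python. It is NOT Photo Jockey's actual technology, just one way to illustrate the idea. The function name `split_mirrored` is made up for this example, and it assumes that resolving each path with `os.path.realpath` is enough to reveal that two paths point to the same place. Whether that also covers mapped network drives and SUBST drive letters depends on your platform and Python version.

```python
import os

def split_mirrored(folders):
    """Return (unique, mirrored) for a list of folder paths.

    Two paths count as mirrors when they resolve to the same real location.
    Hypothetical helper for illustration only.
    """
    seen = {}       # resolved location -> first path that reached it
    unique = []
    mirrored = []   # (ignored path, path it mirrors)
    for folder in folders:
        # realpath follows symbolic links and collapses ".." segments;
        # normcase makes the comparison case-insensitive on Windows.
        key = os.path.normcase(os.path.realpath(folder))
        if key in seen:
            mirrored.append((folder, seen[key]))
        else:
            seen[key] = folder
            unique.append(folder)
    return unique, mirrored
```

Only the folders in `unique` would then be searched, and the `mirrored` list corresponds to the ignored folders shown in the report.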
SEARCHING OPTIONS
Before you click the "Start Search" button, you can select various options to perform the search in the way you want.
Where To Search:
Where to search: This option allows you to specify WHERE in your system to search for duplicate images. All selections can search sub-folders as well.
Current Folder: uses the folder that the main Photo Jockey screen is displaying.
Current Drive: uses the current drive that the main Photo Jockey screen is displaying.
All Local Drives: uses all drive letters (A-Z) that are physically attached to the system. Network drives are IGNORED.
All Drives(Local/Network): uses all drive letters (A-Z).
Specific Drives: you are able to select WHICH DRIVES TO SEARCH. You are also allowed to select which network folders to search as well, if you want to search certain folders connected via your network.
Specific Folder: you are able to choose which drive/folder to search.
My Documents: The main folder in which it's a good idea to store all of your personal files. Makes it easier for backup purposes and organizational purposes too.
Use Exclusions: will allow you to specify what folders to exclude from any searches. This is useful if you have some folders that contain many duplicates that you don't really care about, OR if you have some folders with LOTS of images that you don't care if they are duplicated elsewhere. By excluding these folders you save time in the search process and reduce the size of the list of duplicates. Click the "Exclusion List" button to edit the list of excluded folders.
Search SubFolders Too: Normally, you would WANT this option checked. This allows the current folder and its sub-folders to be searched. This "can" produce a larger results list than if you don't check it. Un-checking this option restricts the search to just a SINGLE folder. Typically the only reason you would have this option un-checked is if you are searching by "Image Appearance" and you are only interested in images looking the same in ONE folder. If you are NOT searching by "Image Appearance", it is UNLIKELY that you will find any duplicates in a single folder.
Compare Tagged: This allows you to determine if any of the tagged files are duplicated in any of the areas where you are searching. This is useful if you have a main repository of images that you periodically add images to, and you don't want to add any duplicate images to it. Also, the searching goes a little faster with this method.
Specific Drives: If you had chosen "Specific Drives" from the top list of choices, then this button option will appear. This allows you to specify which drives and or network folders you would like to search for duplicates. There is a pulldown list next to this button to show you what drives you have selected.
Custom Folder: If you had chosen "Specific Folder" then this option will appear. This allows you to specify what drive/folder you would like to search for duplicates.
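The way "Search SubFolders Too" and "Use Exclusions" narrow down the search can be pictured with a short Python sketch. This is only an illustration under assumptions: the function `folders_to_search` and its parameters are hypothetical, and it assumes that excluding a folder also excludes everything beneath it.

```python
import os

def folders_to_search(root, include_subfolders=True, exclusions=()):
    """Yield folders under root, skipping excluded folders and their children.

    Hypothetical helper; assumes an excluded folder hides its sub-folders too.
    """
    def norm(path):
        return os.path.normcase(os.path.abspath(path))

    excluded = {norm(p) for p in exclusions}
    for current, subdirs, _files in os.walk(root):
        if norm(current) in excluded:
            subdirs[:] = []   # the starting folder itself is excluded
            continue
        yield current
        if not include_subfolders:
            break             # only the single top folder was requested
        # Prune excluded children before os.walk descends into them
        subdirs[:] = [s for s in subdirs
                      if norm(os.path.join(current, s)) not in excluded]
```

Because excluded folders are pruned before the walk descends into them, none of their files are ever visited, which is why exclusions save time as well as shortening the results list.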
How To Search:
How to search: This option allows you to specify how duplicates are found. You can choose from the following methods:
Quick: Duplicate Name/Date/Time/Size
Quick: Duplicate Date/Time/Size
Slow: Duplicate Image Appearance
Slower: Duplicate EXACT copies of each other (Dup Finder)
Name/Date/Time/Size Duplicates: Finds all images that match on name, date, time & size. Typically these matching images will be IDENTICAL images.
Date/Time/Size Duplicates: The images found by this method will more than likely be IDENTICAL. It will find duplicates on your system even if they were renamed, because they will still match on Date, Time & Size.
Images Appearing Duplicate: Searches based on how the image appears. So, if you had a low quality JPG image of small dimension and a high quality BMP image of large dimension (of the same scene), this method will show the 2 images as duplicates. Brightness and rotation can be different and they will STILL be shown as duplicates!
EXACT Copies of each other (Dup Finder): This is great for finding all duplicates that REALLY ARE EXACT COPIES of each other. The files are the same size and contain the same data. A typical use for this is to find all the duplicated downloaded images from your camera. Most people re-download their images by mistake and thus have lots of dups on their hard drive that are in fact the same files.
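The two "Quick" methods above boil down to grouping files by a key. Here is a minimal sketch, assuming each file is described by a small record with `path`, `name`, `mtime` and `size` fields. Those field names and the function `group_quick` are made up for this example and say nothing about how Photo Jockey stores its data.

```python
from collections import defaultdict

def group_quick(files, use_name=True):
    """Group file records that share name (optional), modified time and size.

    Each record is a dict with 'path', 'name', 'mtime' and 'size' keys.
    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for f in files:
        key = (f["mtime"], f["size"])
        if use_name:
            # Name/Date/Time/Size: the file name is part of the key as well
            key = (f["name"].lower(),) + key
        groups[key].append(f["path"])
    # Only keys shared by two or more files are candidate duplicates
    return [paths for paths in groups.values() if len(paths) > 1]
```

With `use_name=False`, a renamed copy lands in the same group as the original, which is the difference between the two Quick methods. Neither method looks inside the files, so a match is only "more than likely" identical.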
Image Appearance:
How Close To Match: This option is used to determine how close 2 images should appear before being considered duplicates of each other. Exact Match will only show dups that are nearly the same. Close Match will find images that even have slight differences. For example, one image may have a copyright notice on it and another may not, and they will still be considered duplicates. Loose Match will match against more images because the strictness is reduced.
Allow Brightness Differences: Let's say you have some images that are duplicates, but they have different brightnesses. Normally these duplicate images would not be found. This is because the images are too different to be considered a match. If you use this option, then images of different brightnesses but same basic appearance will now be reported as duplicates. By default this is checked.
Allow Wrong Coloring: Let's say you have some images that are duplicates, but are NOT the same coloring. For example: one is black & white, another is in color, and a third is in color but tinted differently. This is the option you want checked to be able to find those types of duplicates. Basically when the image comparison is being done, the color portion of the image is not compared. If you are searching thousands of images, this option can cause quite a few false dups. These are reported duplicates that don't even look similar.
Close Dimensions: This limits the duplicates found. In order to be considered duplicates, 2 images' dimensions must be within 20% of each other. For example: If you had 3 images that looked the same and were 640x480 and 700x520 and 320x240, then only the 640x480 and 700x520 would be found as dups. This is because the 320x240 was too different in image dimension. This can be a useful option.
Only Rotated Matches: Let's say that you have a bunch of images that you rotated or flipped or mirrored. And you want to be able to find these duplicates as well. If you were to do a search that also searched if rotated, mirrored, flipped, then the search would take at least 8 times as long. Since this is not desirable, we've provided a simpler and faster method. By finding duplicates ONLY if they were rotated or mirrored or flipped, the search time is dramatically reduced. And the found duplicates list is much smaller. You can then delete the images you no longer want and then re-do the search WITHOUT this option checked to find all the rest of the normal duplicates.
Clear All Fingerprints: This removes all fingerprints from the database. Of course, the next time you click "Search", it will have to take the time to rebuild the fingerprints. Again, once rebuilt, any further searching will go much faster.
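Two of the options above, "Close Dimensions" and "Only Rotated Matches", involve a little arithmetic that a short Python sketch can make concrete. Both functions are hypothetical illustrations. `close_dimensions` assumes that "within 20%" is measured against the larger of the two values, which is one reasonable reading of the rule. `orientations` lists the 4 rotations of an image plus the mirrored version of each, 8 variants in total, which is presumably where the "at least 8 times as long" figure comes from.

```python
def close_dimensions(a, b, tolerance=0.20):
    """True when width and height of a and b differ by at most `tolerance`.

    Assumption: the difference is measured relative to the larger value.
    """
    return all(min(x, y) >= max(x, y) * (1 - tolerance) for x, y in zip(a, b))

def orientations(pixels):
    """Return the 8 rotated/mirrored variants of a grid of pixel values."""
    variants = []
    grid = [list(row) for row in pixels]
    for _ in range(4):
        variants.append(grid)
        variants.append([row[::-1] for row in grid])     # mirrored left-right
        grid = [list(row) for row in zip(*grid[::-1])]   # rotate 90 degrees clockwise
    return variants

# The example from the text: 640x480 and 700x520 are close, 320x240 is not
print(close_dimensions((640, 480), (700, 520)))
print(close_dimensions((640, 480), (320, 240)))
```

A search that allows rotated matches would have to compare against every one of those 8 variants, while a normal search compares only the first.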
Exact Duplicates (Dup Finder):
Searching for Exact Duplicates is useful for many things. A good example is people who copy images from their camera memory card and then forget to erase the memory card. They then take a bunch more photos and download those new photos PLUS THE OLD PHOTOS onto their system. They now have duplicate images that are exact copies of each other (the old photos are on their system twice). Some smart downloading software will rename the downloaded files so that the ones that are duplicates have different names than the ones already on their system. For example: IMG_0920.JPG and IMG_0920B.JPG might possibly be the result of such a scenario. So, looking for identical names isn't going to find every duplicate. Also, it's possible that as these images are downloaded to your system, the date/time stamp is set to the current date/time. That means that IMG_0920.JPG might have a date of 07/04/2007 and IMG_0920B.JPG might have a date of 07/28/2007. This means you can't just look for files that have the same date/time stamp.
Filenames can be different: This is useful in detecting duplicate files even though their filenames may be different. See the example given above about digital camera owners.
Time stamps can be different: This is useful in detecting duplicate files even though their file date & time stamps may be different. See the example given above about digital camera owners.
Search movies as well: This is useful if you want to find duplicate movies too, instead of just duplicate images.
Only search images larger than 640x480: If your search is finding tons of duplicates and you want to narrow the duplicates down to images that take up the most space, you can limit the search to just larger images instead of any sized image.
JPG Images containing EXIF Information: This is useful in detecting duplicate files ONLY IF they contain EXIF Information. This is great if your results are showing you tons of duplicate sets of images of photos you didn't even take. You should then checkmark this option and re-run the search. Then only images containing Exif information will be shown. Practically all digital camera images have Exif information stored in them. So, by checking this option, you should only see duplicate images created by digital cameras. However, it's possible these images were NOT taken by your camera.
Clear All Fingerprints: This removes all fingerprints from the database. Of course, the next time you click "Start Search", it will have to take the time to rebuild the fingerprints. Again, once rebuilt, any further searching will go much faster.
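Finding exact copies when both the filename and the time stamp can differ means comparing the data itself. One common way to do that, sketched below in Python, is to group files by size first and then by a checksum of their contents. This is an assumption about how such a search could work, not a description of what Dup Finder does internally. The function name `find_exact_copies` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def find_exact_copies(paths):
    """Group files whose size and contents match, ignoring names and dates."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_digest = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an exact copy
        for p in same_size:
            h = hashlib.sha256()
            with open(p, "rb") as fh:
                # Read in 1 MB chunks so large files are not loaded whole
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            by_digest[(size, h.hexdigest())].append(p)
    return [group for group in by_digest.values() if len(group) > 1]
```

In the camera example, IMG_0920.JPG and IMG_0920B.JPG would end up in the same group even though their names and dates differ, because only the size and the data are compared.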
START SEARCH BUTTON:
This button searches for duplicates based on the various options you have set. During the search process, images are fingerprinted. The fingerprinting takes time, but once images are fingerprinted, they will NOT need to be fingerprinted again the next time you do a search. So, the first search will be slower than subsequent searches. After the searching is done, a list of any found duplicates will be displayed on the right side of the window in the Search Results panel.
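Why the second search is faster can be shown with a small caching sketch in Python. It is only an illustration: the function `get_fingerprint`, the dictionary used as the "database", and the idea of re-fingerprinting a file when its size or modified time changes are all assumptions made for this example.

```python
import os

def get_fingerprint(path, cache, compute):
    """Return the fingerprint for path, reusing the cached value when possible.

    cache   -- dict standing in for the fingerprint database (assumption)
    compute -- function that does the slow fingerprinting of one file
    """
    st = os.stat(path)
    stamp = (st.st_size, st.st_mtime_ns)
    entry = cache.get(path)
    if entry is not None and entry[0] == stamp:
        return entry[1]            # file unchanged: skip the slow step
    fingerprint = compute(path)    # first time, or the file was modified
    cache[path] = (stamp, fingerprint)
    return fingerprint
```

Clearing all fingerprints corresponds to emptying the cache, after which every file has to go through the slow step once more.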
SEARCH RESULTS PANEL
This panel shows a list of all found duplicates. The duplicates are shown in groups. As you click on a filename, the image is shown in a preview window. In the preview window, you can DELETE or MOVE or RENAME the file. If you DBL-CLICK the file, you can bring up a Preview Window of the image.
You can also easily use the thumbnail list that appears at the bottom to view the images that are considered duplicates.
SAVE LIST
This allows you to save your results list if you don't want to review all the duplicates right away. Once you save the results, you can later load the results and continue with your review of duplicates.
LOAD LIST
If you had saved your results, you can later load your results to continue to review all of your duplicates.
FIND
If the results list is large and it is difficult to find a particular filename, then you can use the FIND option to locate it.
CUT LINES
If your results list is large, then as you review your list, it would be nice to remove lines from the list that you no longer care to look at. As you review your duplicates and delete the ones you no longer need and rename and move the files you need to rename and move, then you can remove those lines from the results list so that your list can become smaller and easier to manage.
IDENTICAL
This checks to see if 2 selected files are IDENTICAL. You can highlight 2 images using the CTRL keys. In order to be identical, they can't just look the same, they have to be the same byte for byte. It would have to be as though you had copied the file.
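A byte-for-byte check like this is easy to reproduce outside of Photo Jockey. Here is a minimal Python sketch using the standard `filecmp` module. The wrapper name `are_identical` is made up for this example.

```python
import filecmp

def are_identical(path_a, path_b):
    """True only when both files contain exactly the same bytes."""
    # shallow=False makes filecmp compare the file contents instead of
    # trusting matching size and modification time alone.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Two images that merely look the same, such as a JPG and a BMP of the same scene, would return False here.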
SIDE-BY-SIDE
This opens up 2 preview windows side-by-side for 2 selected files. You can highlight 2 images using the CTRL keys. There is a 3rd window that shows up on the bottom. It shows you the differences between the 2 images. This is very handy in determining which of 2 files is of better quality so you know which one to delete. Typically you would delete the one of lower quality.
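One simple way to picture a "differences" view is a per-pixel subtraction. The Python sketch below assumes two grayscale images of the same dimensions, each given as a grid of values from 0 to 255. How Photo Jockey actually builds its difference window is not documented here, so treat `difference_image` as a hypothetical illustration.

```python
def difference_image(a, b):
    """Per-pixel absolute difference of two same-sized grayscale grids (0-255)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    # 0 means the pixels match; larger values mean a bigger difference
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
```

In such a view, areas where the two files agree come out black, and areas that differ show up brighter.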
TIPS
How to speed up search:
Pre-Build Database: If you are searching by image appearance, it would be a good idea to do a complete search the first time. ie: Search by all local drives. This will take the LONGEST time to complete (maybe several hours if you have lots of images). But all your subsequent searches will then go much faster.
Exclude Folders From Search: Also, you can EXCLUDE more folders from the search, that are of no interest to you, by using the Exclusion list. In the "Where To Search" panel, you NEED to check the "Use Exclusions" checkbox. Then you NEED to click on the "Exclusion List" button and add all the folders to the list that you DO NOT WANT to have searched. Then click the search button and the search will go faster.
CONSIDERATIONS
Files Not Searched:
In order to produce a results list that is not filled with a bunch of duplicates that you don't care about, there are certain file types that are NOT searched. These are *.ICO, *.EMF and *.WMF. This is because these file types are not used for photos. They are mostly very small ICONS and clipart. Also, if you are searching by "Image Appearance", then images with dimensions less than 64x64 will be ignored. This is because typically photos are much larger.
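These rules can be written down as a small filter. The Python sketch below is an illustration only. The function `should_search` is hypothetical, and it assumes an image counts as "less than 64x64" when either its width or its height is under 64 pixels, which the text does not spell out.

```python
import os

SKIPPED_EXTENSIONS = {".ico", ".emf", ".wmf"}   # file types listed above
MIN_APPEARANCE_SIZE = 64                        # pixels, Image Appearance only

def should_search(filename, width, height, by_appearance):
    """Decide whether a file takes part in the duplicate search."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in SKIPPED_EXTENSIONS:
        return False
    # Assumption: either dimension under 64 makes the image too small
    if by_appearance and (width < MIN_APPEARANCE_SIZE or height < MIN_APPEARANCE_SIZE):
        return False
    return True
```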
Files Not Really Duplicates:
Because of the method used to determine if files are duplicates, it's possible that you will get some duplicates reported that in fact don't even look similar. When searching by "Image Appearance" this happens because the logic is built for speed, and in order to process so fast, you can end up with a few that shouldn't have matched (false positives). The percentage of this happening is very low, so if you see it happen, don't worry. Basically, the method of checking works best on photos. It doesn't work as well on images with lots of solid colors, such as line art. If you have a bunch of line art or non-photos, these can show up as false dups. If you use the "Allow Wrong Coloring" option, this can increase false dups too. It's just a side-effect of being able to find more true duplicates. If you are searching by "Date, Time & Size", it's possible to get false dups when working with groups of images whose dates & times are the SAME. Some companies set all of the files in a package to have the same date & time. This increases the likelihood of false dups.
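To see how a fast appearance comparison can report images that don't really match, consider a well-known fingerprinting technique called an average hash. This is NOT necessarily the method Photo Jockey uses. It is shown here in Python only because it is simple and has the same kind of side-effect. The function names and the tiny 2x2 "images" are made up for this example.

```python
def average_hash(gray):
    """Fingerprint a small grayscale grid: one bit per pixel,
    set when the pixel is brighter than the image's average."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    """Number of bits that differ between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))

photo    = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]     # same picture, brightness raised by 20
washed   = [[100, 120], [130, 90]]    # a much flatter, different-looking image

# Both comparisons give a distance of 0, so both pairs count as "matches"
print(hamming(average_hash(photo), average_hash(brighter)))
print(hamming(average_hash(photo), average_hash(washed)))
```

The first match is the wanted behavior, since a brightness change does not alter the fingerprint. The second is a false dup: the fingerprint throws away so much detail that two different images end up looking the same to the comparison.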
Files Not Really Different Files:
If you are on a network or have a redundant folder that is being searched, it is possible that the results list would show 2 images as being duplicates, when in fact the 2 different images shown are actually pointing to the VERY SAME FILE. To be clearer, let's say you have drive X: mapped to C:\MyPhotos\Paris and you do your search. The results list could show 2 files being duplicates:
C:\MyPhotos\Paris\IMG001.jpg
X:\IMG001.jpg
You may think at first glance that these are 2 different files that appear to be duplicates of each other. And you may decide to delete one of them. Since they both point to the same file on your drive, if you delete one of them, then they will both be gone. If you notice this, use your RECYCLE BIN to recover the mistakenly deleted file.
Searching By Appearance Is Slow:
This will happen the first time you do a search. This is because all the files being searched have to be fingerprinted. Once fingerprinted, the search will go much faster. This is because the fingerprints are saved into a database for later retrieval.
Duplicates NOT Found:
If you know that certain files should appear in the results list and they do not, then some of your options may not be set properly. For example, you may be searching the wrong areas (Where To Search). Also, if you have "Compare Tagged" checked, then it limits the results to any duplicates found that happen to match any of your tagged files. Also, if you have "Use Exclusions" checked, then maybe your exclusion list contains the folders that have the duplicates you want to find. Also, if you are searching by "Image Appearance" and you have "Close Dimensions" or "Only Rotated Matches" checked, then your results list will be filtered to be smaller than what it would have been otherwise.