Notes on exiftool
Introduction
These are my notes on using exiftool to organize a jumbled mess of digital images. They are written as a personal memory jogger and are unlikely to be of value to anyone else.
General
Before getting to exiftool, here are a couple of handy one-liners.
Print all of the file extensions present in $DIR, one per line:
find "$DIR" -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' \
    | sort -u
Print all of the file extensions present in $DIR, one per line, along with a count of the number of files with that extension:
find "$DIR" -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' \
    | sort | uniq -c | sort -rn
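The one-liners above treat JPG and jpg as different extensions and print the bare file name for files that have no extension. As a rough sketch of a variant that folds case and labels extensionless files, something like the following might be used. The function name ext_counts and the "(none)" label are my own inventions, not part of any tool.

```shell
# Hypothetical helper: count files per lowercased extension under a directory.
# Files whose names contain no dot are reported as "(none)".
ext_counts() {
    find "$1" -type f \
        | sed -e 's/.*\///' \
        | awk -F. 'NF > 1 { print tolower($NF); next } { print "(none)" }' \
        | sort | uniq -c | sort -rn
}
```

It is called as ext_counts "$DIR". Hidden files such as .DS_Store are still counted under the text after the dot, just as in the original one-liners.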
Organizing Photos Based on Date
If a photo's ultimate location will be based in some way on a date, the date to be used in determining the location will have to come from somewhere. The most obvious approach is to use one of the dates that should be present in the photo's EXIF data. I use DateTimeOriginal, which is the date on which an image was taken -- either by a traditional film camera or a digital camera.
CreateDate is subtly different -- it is the date that the digital image was created. This is either the date that a digital photo was taken or the date that a traditional photo was scanned.
Unfortunately, not all images will have a DateTimeOriginal or even a CreateDate. Therefore, before we can reorganize, our first step is to ensure that each image has a valid DateTimeOriginal.
A Cautionary Word
If duplicate detection is a part of your workflow, give ample thought to choosing the best time to organize by date. My experience is that this is best done as the last or nearly last step, because the process I describe here changes EXIF data. A photo with changed EXIF data will no longer be detected as a duplicate by most detection tools (e.g., rmlint, fim, fdupes), which compare file contents. On the other hand, if your detection tool looks at image similarity, perhaps using a perceptual hash, this may not be a concern.
Getting a Date to Use
If DateTimeOriginal does not exist, where will a date come from? Sources of possible dates, in increasing order of desirability (for my purposes), are: MDItemFSCreationDate, GPSDateTime, and CreateDate.
Note that MDItemFSCreationDate is macOS-specific. The drive containing the images must be indexed by Spotlight for any of the MDItem tags to be available; otherwise, exiftool will return file not found errors. FileCreateDate is apparently the Windows equivalent. Unix has traditionally not maintained a file creation time.
With exiftool, the most recent valid tag assignment is the final value of the tag. My scheme for determining DateTimeOriginal follows. DateTimeOriginal is initially set to the time the image file was created, as we know this exists. Then, if a GPS time is available, it is used. Finally, if CreateDate is available, it is used.
exiftool -r -if '(not $DateTimeOriginal)' \
    '-DateTimeOriginal<MDItemFSCreationDate' \
    '-DateTimeOriginal<GPSDateTime' \
    '-DateTimeOriginal<CreateDate' \
    "$PHOTODIR"
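The "last valid assignment wins" rule can be pictured with a small shell sketch. This only imitates the selection logic and does not call exiftool. The function name last_valid and the sample timestamps are made up for illustration.

```shell
# Hypothetical helper: print the last non-empty argument, imitating how the
# last valid assignment on the exiftool command line becomes the final value.
last_valid() {
    result=
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            result=$candidate
        fi
    done
    printf '%s\n' "$result"
}

# Arguments in command-line order: file creation date, GPS time, CreateDate.
# Here there is no GPS time, so CreateDate (the last valid value) is printed.
last_valid "2001:02:03 04:05:06" "" "2000:07:04 12:00:00"
```

If only the first argument is non-empty, it is printed, which corresponds to falling back to the file creation date.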
Next, a quick sanity check on dates. I have encountered image files that, for unknown reasons, have a DateTimeOriginal newer than the FileModifyDate (e.g., DateTimeOriginal of 7/4/2000 and FileModifyDate of 1/1/2000). The following corrects any such illogical dates:
exiftool -r -if '($FileModifyDate lt $DateTimeOriginal)' \
    '-DateTimeOriginal<FileModifyDate' \
    "$PHOTODIR"
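The lt operator in the -if expression is Perl's string comparison. It works for dates because EXIF-style timestamps (YYYY:MM:DD HH:MM:SS) sort chronologically when compared as text. A minimal sketch of the same comparison in shell follows; date_before is a hypothetical helper name, and the sketch does not normalize time zone suffixes.

```shell
# Hypothetical helper: succeed (exit 0) when timestamp $1 sorts before $2.
# Appending "" forces awk to compare the values as strings, not numbers.
date_before() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a "") < (b "")) }'
}

# FileModifyDate earlier than DateTimeOriginal: the illogical case.
if date_before "2000:01:01 00:00:00" "2000:07:04 00:00:00"; then
    echo "modify date precedes original date"
fi
```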
If either of the previous steps modifies a file, the modified file keeps the original name and the original file has _original appended to its name. For example, if foo.jpg is modified, exiftool will leave in its wake two files: foo.jpg and foo.jpg_original.
It is possible to tell exiftool to remove any _original files, but I prefer to do this as a separate step, after I have a chance to examine the results.
find "$PHOTODIR" -name \*_original -delete
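Since -delete is irreversible, it may be worth listing the backups first. A small sketch, where list_originals is a hypothetical helper name:

```shell
# Hypothetical helper: list, without deleting, every *_original backup file
# found under the given directory.
list_originals() {
    find "$1" -type f -name '*_original' -print
}
```

Running list_originals "$PHOTODIR" shows exactly what the find command above would remove, apart from any non-regular files, which this sketch skips.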
Harmonizing File Extensions
Often, a single image type may be present with different file name extensions. E.g., one may have photos ending in both .jpg and .jpeg. I chose to enforce consistent extensions.
The following command will convert .jpeg to .jpg. We'll ignore case conversion for the moment, because my macOS is configured to be case insensitive.
exiftool -r -filename=%f.jpg -ext jpeg "$PHOTODIR"
If you have a case sensitive filesystem, the following would take care of both changing jpeg to jpg and lowercasing any uppercase extensions:
exiftool -r -filename=%f.jpg -ext jpg -ext jpeg \
    -if '$filename!~/\.jpg$$/' "$PHOTODIR"
exiftool can be used to do this, but there are other tools (e.g., find, the Perl rename module) that will likely be much faster if extension renaming is your only goal.
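As a rough sketch of the find-based alternative, the following renames .jpeg files (in any letter case) to .jpg. The function name harmonize_jpeg is hypothetical. It assumes a find that supports -iname (GNU and BSD both do) and file names without embedded newlines.

```shell
# Hypothetical helper: rename *.jpeg (any case) to *.jpg under a directory.
# An existing target is never overwritten; such files are reported and skipped.
harmonize_jpeg() {
    find "$1" -type f -iname '*.jpeg' | while IFS= read -r old; do
        new="${old%.*}.jpg"
        if [ -e "$new" ]; then
            echo "skipped (target exists): $old" >&2
        else
            mv "$old" "$new"
        fi
    done
}
```

It is called as harmonize_jpeg "$PHOTODIR". Unlike the exiftool command for case sensitive filesystems, this sketch leaves files such as FOO.JPG alone.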
Reorganize
These examples reorganize photos by placing them into a new directory structure with the following overall format: new/ext/yyyy/mm/dd/hh/file.ext. As written, the commands move the files, because exiftool renames a file in place when FileName is written; adding the -o . option described under Other Invocations should produce copies instead. Note that DateTimeOriginal, which we set earlier, is used to determine the directory structure for each image.
Notice that some of the parameters in the expression supplied to '-d' require two '%' signs, one which escapes the other. Also note the use of %le, which will convert a file extension to lowercase, if necessary. See <http://owl.phy.queensu.ca/~phil/exiftool/filename.html>.
If you'd like to try a dry run before doing anything, the following invocation only prints to stdout what it would do -- it does not actually copy, move, rename, or otherwise modify the filesystem.
exiftool -r '-testname<DateTimeOriginal' \
    -d 'new/%%le/%Y/%m/%d/%H/%%f%%-c.%%le' "$PHOTODIR"
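To make the -d format string concrete, here is a sketch of the mapping it describes, written as plain string manipulation. It does not call exiftool and ignores the copy-number suffix, so it covers only the case where no name collision occurs. The function name dest_path and the sample values are hypothetical.

```shell
# Hypothetical helper mirroring 'new/%%le/%Y/%m/%d/%H/%%f.%%le'.
# Arguments: a DateTimeOriginal value and a file name.
dest_path() {
    dto=$1             # e.g. "2000:07:04 12:34:56"
    name=$2            # e.g. "IMG_0001.JPG"
    base=${name%.*}    # file name without its extension (%f)
    ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')   # %le
    year=$(printf '%s' "$dto" | cut -c1-4)
    month=$(printf '%s' "$dto" | cut -c6-7)
    day=$(printf '%s' "$dto" | cut -c9-10)
    hour=$(printf '%s' "$dto" | cut -c12-13)
    printf 'new/%s/%s/%s/%s/%s/%s.%s\n' \
        "$ext" "$year" "$month" "$day" "$hour" "$base" "$ext"
}

dest_path "2000:07:04 12:34:56" "IMG_0001.JPG"
# prints: new/jpg/2000/07/04/12/IMG_0001.jpg
```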
This command will actually reorganize things:
exiftool -r '-filename<DateTimeOriginal' \
    -d 'new/%%le/%Y/%m/%d/%H/%%f%%-8c.%%le' "$PHOTODIR"
A new structure containing the photos will be created in ./new.
Be sure to check in both the new and original directories for any files that were left behind due to exiftool errors.
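One quick way to spot leftovers is to compare file counts before and after the run. A minimal sketch, with count_files as a hypothetical helper name:

```shell
# Hypothetical helper: print the number of regular files under a directory.
# tr strips the padding that some wc implementations add.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

For example, echo "source: $(count_files "$PHOTODIR") new: $(count_files new)" gives a rough picture. Any _original backups still present will be included in the source count.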
Other Invocations
- Make a copy of all images in the directory ./SRC and place them in the directory ./DEST, maintaining the subdirectory structure.
exiftool -r -o . -directory=DEST/%d SRC
The -o . option sets the directory to ./ and also forces copies of the images to be made, rather than changing them in place. This directory setting is subsequently overridden by the -directory= option, which sets the output directory to ./DEST with the relative path of the image under ./SRC appended.
E.g., the original image ./SRC/foo/bar/baz.jpg is copied to ./DEST/SRC/foo/bar/baz.jpg. I'm sure there's a way to remove the unnecessary SRC subdirectory in the output, but I haven't attempted to determine how to do this.
To better understand how this works, consider another example.
exiftool -r -o . -directory=DEST/%d `pwd`/SRC
This has a different result than the previous example. The fully qualified path is used for the subdirectory structure under ./DEST. If $PWD is /Users/khe/example, then the original image ./SRC/foo/bar/baz.jpg will be copied to ./DEST/Users/khe/example/SRC/foo/bar/baz.jpg.
- Find all duplicate files, where duplicate is defined as a file name containing '-nnnnnnnn' (a dash followed by eight digits) preceding the three-character extension.
exiftool -r -FileName -if '$FileName =~ /[0-9]{8}\..../' dupes/src-new/jpg/1970
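The expression above does not itself require the dash or anchor the match at the end of the name, so it can also match other names that contain eight digits followed by a dot. If only file names are of interest, a stricter check can be sketched with find and grep. The function name find_numbered_copies is hypothetical.

```shell
# Hypothetical helper: list files whose names end in -NNNNNNNN.ext, where
# NNNNNNNN is eight digits and ext is exactly three characters.
find_numbered_copies() {
    find "$1" -type f | grep -E -- '-[0-9]{8}\.[^./]{3}$'
}
```

It is called as find_numbered_copies dupes/src-new/jpg/1970 and prints nothing when no such files exist.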
- Get image size information.
# dimensions of every image
exiftool -r -T -progress: -imagesize -directory \
    -filename photos > sizes.all
# counts of image sizes, ordered by frequency
sed 's/x/ /' < sizes.all \
    | awk '{printf "%12d %6d %6d\n", $1*$2, $1, $2}' \
    | sort -n | uniq -c | sort -n > sizes.frequency
# counts of image sizes, ordered by image size
sort -n -k 2 sizes.frequency > sizes.size
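To see what the frequency pipeline produces without running exiftool, it can be fed a few made-up records. In this sketch size_frequency is a hypothetical wrapper around the same sed/awk/sort steps, and the three input lines are invented stand-ins for sizes.all.

```shell
# Hypothetical helper: read exiftool -T lines (ImageSize first, e.g. 640x480)
# on stdin and print "count pixels width height", least frequent size first.
size_frequency() {
    sed 's/x/ /' \
        | awk '{printf "%12d %6d %6d\n", $1*$2, $1, $2}' \
        | sort -n | uniq -c | sort -n
}

# Three invented records: two 640x480 images and one 100x100 image.
printf '640x480\tphotos/a\tone.jpg\n640x480\tphotos/b\ttwo.jpg\n100x100\tphotos/a\tthree.jpg\n' \
    | size_frequency
```

The output lists the 100x100 size with a count of 1, followed by 640x480 with a count of 2.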
- exiftool processing based on image size/dimensions.
# images less than 256x512 in size
exiftool -if '$imagesize and ($imagewidth<256 and $imageheight<512)' \
    -filename -r -T src-new
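If a sizes.all file already exists from the earlier step, the same size filter can be approximated offline with awk instead of rescanning every image. This is a sketch that assumes the tab-separated ImageSize, Directory, FileName layout produced by -T above. The function name small_images is hypothetical.

```shell
# Hypothetical helper: from exiftool -T lines (ImageSize, Directory, FileName,
# tab-separated) on stdin, print the paths of images smaller than 256x512.
small_images() {
    awk -F '\t' '$1 ~ /^[0-9]+x[0-9]+$/ {
        split($1, wh, "x")            # wh[1] = width, wh[2] = height
        if (wh[1] + 0 < 256 && wh[2] + 0 < 512) print $2 "/" $3
    }'
}
```

It is called as small_images < sizes.all. Lines whose first column is not of the form WIDTHxHEIGHT are ignored, which plays the role of the $imagesize test in the exiftool condition.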