A Digital Image Workflow Tool

This note describes some simple software I've written called photos, which collects the photos and videos I take and stores them in a repository for further organization and backup.

As of 2016, I take the vast majority of my photos and videos (collectively, images) with my iPhone, although I still take a few photos with a DSLR, etc. Due to many painful experiences in the past, I refuse to rely upon Apple's photo management software (i.e., iPhoto, Photos, etc.) for storage and management, and I have found no third-party software that I trust for long-term use. Because of this, I created photos.

photos is intended to be used to archive images immediately after they are taken (or soon thereafter). It is not well-suited for use as a tool to scour one's hard drive, archiving any and all images that it finds. See my liberator project for an explanation of why this is the case.

photos only adds images to the repository -- it never deletes or otherwise modifies the original photos being archived. In a nutshell, my archival workflow is:

  • Run photos to archive images from a flash card or from Apple Photos / iCloud.
  • Confirm that desired images were archived to the repository.
  • Back up the repository both locally and to the cloud.
  • Confirm the backups were successful.
  • Only when the previous steps are complete do I reformat flash cards or delete iCloud photos.
  • Optionally organize the repository into albums.

Features

The feature list for the photos software is short and simple:

  • Manual or automated collection of photos and videos, collectively referred to as images, from either Apple's Photos or any user-specified location.
  • Collected images are stored in a repository on a single machine and later shared and backed up to the cloud. In other words, there is a single master copy of the images located on a local filesystem -- photos is not a cloud-based management application.
  • When images are initially stored in the repository, they are organized by either EXIF date (for photos) or the file creation date (for videos).
  • Subsequent to their initial storage, images can be reorganized into an arbitrary directory hierarchy. Directories in the hierarchy may correspond to albums shared via a third-party service.
  • Within a directory, image names must be unique. If there is a filename collision when storing images in the repository, a sequence number is appended to the original file name so as to create a unique name.
  • Once archived, images may be arbitrarily renamed by the user, though extensions are important.
  • Photos are expected to be 'well formed', i.e., with proper EXIF data.[LIB]
  • When duplicate photos or videos are encountered, they are not added to the repository. Duplicate images are defined as those images with identical content (i.e., filename, date, time, etc., are not considered).
  • The repository can be synced to cloud storage services such as Amazon Prime Photos[APP], Google Photos, Backblaze, Dropbox, etc.
[LIB] For dealing with problematic photos, trivially different duplicate photos, identification of thumbnail images without corresponding photos, etc., I have written additional photo software called liberator which uses perceptual hashes to identify photos based on image similarity. See ... for more information.
[APP] I personally chose Amazon Prime Photos (APP) because a) it is included in Amazon Prime membership, b) it offers unlimited photo storage at the original resolution, c) it supports the creation of albums based on the directory structure in which photos are stored, and d) it offers sharing of albums. Prime Photos is fairly spartan and seems to fall short in many areas when compared to Flickr and others, but it's the only service I found which covered all of the non-negotiable fundamentals I required at a reasonable price.
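
The filename-collision rule in the feature list can be illustrated with a minimal Python sketch. The function name unique_name and the "_1", "_2" suffix format are assumptions made for illustration; this note does not say how photos actually formats the sequence number.

```python
from pathlib import Path


def unique_name(directory: Path, filename: str) -> Path:
    """Return a path inside `directory` that does not collide with an existing file.

    Assumption: the sequence number is appended to the stem as "_1", "_2", ...
    and the extension is preserved. The real format used by photos may differ.
    """
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    # Keep trying higher sequence numbers until the name is free.
    while candidate.exists():
        candidate = directory / f"{stem}_{n}{suffix}"
        n += 1
    return candidate
```

With this sketch, a second IMG_0001.jpg arriving in the same directory would be stored under a new name rather than overwriting the first.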

Repository Structure

The repository location and structure used by photos are configurable. This section describes the repository under the default configuration.

The repository is located in ~/Media, which contains directories for photos and videos. The process of collecting images and storing them in the repository is termed 'archiving'. During archiving, images are stored by year and month (configurable). After the first archive operation, the repository might look like the following:

~/Media/
    photo/
        2016/
            01/
                IMG_4351.jpg
                IMG_4352.jpg
                IMG_4353.jpg
                ...
            02/
            ...
            12/
                IMG_4789.jpg
                IMG_4790.jpg
                IMG_4791.jpg
                ...
        2015/
            ...
        2014/
            ...
        ...
    video/
        ...
    log/
    info.db

Archiving stores images in the leaves of the repository tree, organized by year and month. Duplicates are not archived.
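
As an illustration of the default year/month layout, here is a small Python sketch that builds the leaf directory for an image from its date. The function name archive_dir and its parameters are hypothetical; how photos reads the EXIF date or the file creation date is not shown here, so the date is simply passed in.

```python
from datetime import datetime
from pathlib import Path


def archive_dir(repo: Path, kind: str, taken: datetime) -> Path:
    """Build the leaf directory for an image, e.g. <repo>/photo/2016/12.

    `kind` is "photo" or "video", matching the default directory names.
    `taken` is assumed to be the EXIF date (photos) or the file creation
    date (videos), obtained elsewhere.
    """
    return repo / kind / f"{taken.year:04d}" / f"{taken.month:02d}"
```

For example, a photo taken on 25 December 2016 would land in photo/2016/12 under the repository root.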

About Duplicate Images

photos will not archive duplicate images. Two or more photos are considered to be identical when their image content is identical; photos that differ by one pixel or more are considered to be unique. File names, times, and other file metadata, as well as EXIF information, are not used when comparing photos. Duplicate videos are identified in a similar manner.

When photos processes an image for archiving, the SHA1 hash of the image content is calculated and converted into a digest. If a previously archived image has the same SHA1 digest, the photo currently being processed is skipped as a duplicate. Skipped photos are logged; more about that later.
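
The duplicate check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch takes the image content as bytes that have already been separated from EXIF and other metadata, since how photos extracts that content is not described here.

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return the SHA1 hex digest of the given image content."""
    return hashlib.sha1(content).hexdigest()


def archive_if_new(content: bytes, seen: set[str]) -> bool:
    """Return True if the content is new; False if it is a duplicate.

    `seen` holds the digests of previously archived images (hypothetical
    in-memory stand-in for whatever photos uses internally).
    """
    digest = content_digest(content)
    if digest in seen:
        return False  # same digest already archived: skip as a duplicate
    seen.add(digest)
    return True
```

Because only the digest of the content is compared, two files with different names or dates but identical content count as the same image, while a single differing byte yields a different digest.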

This behavior can have unintended consequences. For example, assume that there are two copies of an image which differ only in their EXIF data. Perhaps one image is the true original and the other has invalid EXIF data for some reason. If the latter is archived before the former, the former will be rejected as a duplicate and the image with the invalid EXIF data will remain in the repository. This is the behavior that I desire, but it may not be suitable for you. See my liberator project for more details.

In the interest of efficiency, the SHA1 digests that are calculated are subsequently stored in the file ~/Media/info.db.
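
The format of info.db is not described in this note. Purely as an illustration of persisting digests so they need not be recomputed, the following Python sketch assumes a SQLite database with a hypothetical digests table; the table name, columns, and function names are all assumptions.

```python
import sqlite3


def open_index(path: str) -> sqlite3.Connection:
    """Open (or create) a digest index. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS digests (sha1 TEXT PRIMARY KEY, path TEXT)"
    )
    return conn


def remember(conn: sqlite3.Connection, sha1: str, path: str) -> bool:
    """Record a digest; return False if it was already present."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO digests (sha1, path) VALUES (?, ?)", (sha1, path)
    )
    conn.commit()
    # rowcount is 1 when a row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1
```

Using the digest as the primary key lets the database itself reject a second image with the same content.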

Repository Organization

Subsequent to archiving, the user may choose to further organize the images. For example, assume that the user wished to place IMG_4789.jpg and IMG_4790.jpg in a Christmas directory which could be used by a cloud service (e.g., APP) to create a shareable album.

This could be done by creating the directory ~/Media/photo/2016/Christmas and moving the chosen photos into that directory.

~/Media/
    photo/
        2016/
            01/
                IMG_4351.jpg
                IMG_4352.jpg
                IMG_4353.jpg
                ...
            02/
            ...
            12/
                IMG_4791.jpg
                ...
            Christmas/
                IMG_4789.jpg
                IMG_4790.jpg
        2015/
        2014/
        ...

Alternatively, one could create the directory ~/Media/photo/2016/12/Christmas and move the chosen photos into that directory.

The choice of image organization is up to the user and post-archive directory structures are completely arbitrary. Some additional ways to organize Christmas photos are:

Create ~/Media/photo/Christmas
To contain all Christmas photos, regardless of year.
Create ~/Media/photo/Christmas/2016
To contain all 2016 Christmas photos.

You get the idea.

Photos should be moved rather than copied; otherwise, duplicate images will result. More about duplicate images later.
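
Reorganizing by moving rather than copying can be done by hand or with a short script. The Python sketch below is one way to do it; the function name move_to_album is hypothetical and is not part of photos.

```python
import shutil
from pathlib import Path


def move_to_album(album: Path, photos: list[Path]) -> None:
    """Move (not copy) the given photos into the album directory."""
    album.mkdir(parents=True, exist_ok=True)
    for photo in photos:
        target = album / photo.name
        # Refuse to overwrite: names must be unique within a directory.
        if target.exists():
            raise FileExistsError(target)
        shutil.move(str(photo), str(target))
```

For the Christmas example, the album would be ~/Media/photo/2016/Christmas and the photos would be IMG_4789.jpg and IMG_4790.jpg from ~/Media/photo/2016/12.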

At this point, it should be obvious that any given image can only appear in a single album. This is a significant limitation for many users. I am able to live with this limitation in exchange for the simplicity it brings.

Duplicates

photos will not archive duplicate photos.

TODO: Complete this document.