I can't help myself ... I'm still trying to pack too much "information" into what I write, it's still too geeky by far, so I'll try and adhere to the spirit of my recent pledge by writing a really short summary and then my longer "mad scientist"/business version with all the gory details.
The Really Really Short Version
I'm changing our agency image filenames for really good reasons. What our photographers do will continue unchanged. Without re-compressing them, we'll rebuild our images with cleaned-up metadata and reload them to PhotoShelter and to Amazon S3 for backup.
The "Long Winded", gory details version
I'm trying to clean up our images and image metadata so that Picade continues to be an industry leader in using metadata, to eliminate problems with Digital Asset Management systems (I'm going to have to skip the D__ acronym - the blogging software doesn't like it and it won't display this posting otherwise!!!), and to eliminate problems with our photographers getting properly paid and protecting their work. Now, while we have a relatively few thousand images online, is the time to do this, rather than later when we have hundreds of thousands or even millions of images online.
Digital computers are relatively rigid, and essentially do exactly what they are told to do (even if that is not exactly what their owners intend) every time they do it with the "same input", and therefore sometimes "do not like" input they are given to deal with because of rigid two-valued "Boolean logic" embedded in their programming that does not accept input variations created by humans.
Humans, on the other hand, frequently do things differently every time they do them. They are much more flexible than computers and "recognize" close matches to an expected situation (so-called "fuzzy logic"), and as a result they sometimes make mistakes when working within rigid specifications or guidelines for input to computers.
Picade photographers and Picade customers are humans and make human-type mistakes, so we have to mitigate the results of these mistakes when working with Picade computer programs that get their input from these human beings. Further, Picade customers and vendors use their own separate computer systems and programs, which get their input from human beings and from Picade and whose requirements are unknown to Picade, and we have to mitigate problems with those systems and our images as best we can.
One of the principal business plan requirements for Picade to exist and function successfully in the current marketplace was to eliminate as much as possible of the agency "business overhead" of human employees by having the photographers of Picade do as much of the agency work as possible, having "automatic" computer processes handle as much of the rest as possible, thereby leaving essentially only a basic management, marketing, and negotiation process to be "hired out" to paid human beings.
Principal among the agency "work" items to be handled by individual photographers was the preparation of images for the website and delivery to end customers who would license the images, something that in other smaller agencies is responsible for as much as 75-80% of the total agency "overhead".
This requires the Picade photographers to properly prepare their images for final delivery to customers, embedding in the images sufficient copyright licensing and creator identification metadata to prevent "orphan works" issues, as well as automating image metadata input into Picade's and customers' DAM computer systems, both to enable the Picade website search routines to function and to enable proper management of the images within a marketing and licensing context.
Among other things, this process is extremely equitable in sharing costs and work among Picade photographers: everybody's images benefit exactly the same, and no one bears the cost burden of processing or preparing images for another photographer who uploads more images than they do, because some photographers by the nature of their work generate very few images, while others generate very large numbers of images.
There have been some significant overall agency workflow errors resulting from this approach. Some have been mitigated by redundancy built into the original specifications for Picade photographers; others were not foreseen to be a problem, or became a problem because of subsequent events, and have required significant manual processing to overcome.
Currently the image filename we have Picade photographers create for images uploaded to Picade's PhotoShelter MU account is composed of three parts and is supposed to fit a rigid, 21 total characters long filename format plus the ".jpg" filetype suffix:
- The 4 character Picade photographer Id
- A single character agency exclusivity flag: ['E'|'N']
- Exactly 16 characters for the photographer's personal ImageId, padded on the left as needed with zeros, using only the specific character set [0..9,A..Z,'-']
- The four character ".jpg" filetype suffix
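The format above is rigid enough to check mechanically. The following is only a minimal sketch of such a check, not Picade's actual code: the function names are made up for illustration, and it assumes the photographer Id uses the characters [0-9A-Z], which this post does not actually specify.

```python
import re

# Upload filename: 4-char photographer Id + 'E'/'N' flag + 16-char ImageId + ".jpg".
# Assumption: the photographer Id is drawn from [0-9A-Z]; the post does not
# spell out the Id's character set.
UPLOAD_NAME = re.compile(r"([0-9A-Z]{4})([EN])([0-9A-Z-]{16})\.jpg")


def pad_image_id(personal_id):
    """Left-pad a photographer's personal ImageId with zeros to 16 characters."""
    return personal_id.rjust(16, "0")


def parse_upload_name(filename):
    """Return (photographer_id, exclusivity_flag, image_id), or None if malformed."""
    match = UPLOAD_NAME.fullmatch(filename)
    return match.groups() if match else None
```

A raw camera filename such as "DSC_0001.jpg" fails this check (wrong length, and an underscore is not in the approved character set), which is the kind of upload that currently has to be caught by hand.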
The issues with the images
Now at 7000 images uploaded to PhotoShelter MU for Picade there have already been:
- A single case of an image tagged with an incorrect 4 character photographer Id in the filename - if this information is not correct we can never properly pay the photographer for their share of licensed uses!
- 45 cases where the image filename was the photographer's own raw filename with no preparation at all (including not having the photographer's Id and exclusivity flag character embedded - see item #1 above about payment)
- Nearly a thousand images where the image filename contained characters other than the approved character set of [0..9,A..Z,'-'] or was incorrectly formatted, potentially causing problems with end users' internal DAM systems, and definitely with Picade's.
- Nearly 1500 images where the embedded photographer contact metadata was incorrectly formatted or missing.
- About 400 images that were uploaded more than once, in one case, 5 times
A Picade photographer may upload an image to Picade's MU account on PhotoShelter more than once and we have to properly deal with this both in our own record keeping and on PhotoShelter to track metadata changes.
On PhotoShelter, when a photographer uploads images to their personal account, they are given the option to "overwrite" a previously uploaded image of the same filename.
When a photographer copies an image multiple times from their personal account on PhotoShelter to Picade's MU account on PhotoShelter, they do not get the same "overwrite" option, with the result that Picade will have multiple copies of the same image in its MU account that will appear multiple times in search query results for customers, and that requires substantial human effort to first identify, then fix the multiple uploaded image issues.
The Picade photographers may also make errors in naming the file (viz: the filename is too long or too short, may contain "disallowed" characters, or even be improperly formatted), or make changes in the filename for a given image because of changes in agency exclusivity (viz: making an 'Exclusive' image 'Non-Exclusive' or vice-versa, with a matching change in the 5th character of their filename, but with the actual image remaining the same).
Actually determining that a given image is uploaded under different filenames, or different images are uploaded under the same filename based solely on the visual aspects of the image is very difficult and consumes a lot of computer resources and programming to do automatically, and as we get more and more images we will in fact likely have to implement such a solution, but at present we are still using the standard issue "Mark I, Mod I, Eyeball, Human" to do so.
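Short of comparing images visually, two of the cases described above can at least be flagged from the filenames alone: the same filename uploaded more than once, and the same image re-uploaded with only the exclusivity flag changed. The sketch below is a hypothetical helper, not part of our workflow; it assumes the filenames already follow the upload format, and it cannot detect the same image uploaded under an unrelated name.

```python
from collections import defaultdict


def find_candidate_duplicates(filenames):
    """Group upload filenames that are identical once the 5th character
    (the 'E'/'N' exclusivity flag) is ignored.

    Returns a dict mapping the flag-less key to the list of matching
    filenames, only for keys that occur more than once.
    """
    groups = defaultdict(list)
    for name in filenames:
        key = name[:4] + name[5:]  # photographer Id + ImageId + suffix, flag dropped
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Anything this reports still needs a human look, since two different images could in principle share a filename.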
What we will be doing about the issues
As a result of these issues and other considerations external to Picade's and Picade photographers' workflow, it is now necessary to change and simplify the filenames of the images we present to the general public for licensing consideration. This requires renaming the images, and it also simplifies some issues with the existing problematic photographer image naming: we can control the exact character set and length of a delivered filename so as not to cause problems for an end customer or for ourselves.
We will not change Picade photographer requirements for naming image files for upload to Picade: these requirements contain necessary information redundancy to help eliminate human errors and need to be preserved.
On the other hand, for online imageid display and delivery to customers and for internal marketing management we are going to be renaming all Picade images in the following manner:
- The 4 character Picade photographer Id
- A single character agency exclusivity flag: ['E'|'N']
- A 7 character unique numeric ImageId encoded in base 32 that uses a specially selected character set to eliminate multiple human and computer input related errors. The character set we will use is "0123456789ABCDEFGHJKLMNPQRSTVWXY". A six character "number" in base 32 using this character set can encode 1,073,741,824 (32^6) unique values before we run out of numbers, and a seven character "number" can encode 34,359,738,368 (32^7) unique values: it will be a very long time before we exceed those limits!!!
- The four character ".jpg" (or possibly in the future: ".tif") filetype suffix
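For the curious, here is a minimal sketch of how a sequence number could be turned into a 7 character ImageId with the character set quoted above (which leaves out the letters I, O, U and Z). It assumes a plain positional base 32 encoding, zero-padded on the left; the function names are made up for illustration, and the way Picade actually assigns and encodes ImageIds may differ.

```python
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTVWXY"  # character set quoted in the post
BASE = len(ALPHABET)  # 32


def encode_image_id(number, width=7):
    """Encode a non-negative integer as a fixed-width base 32 string."""
    if not 0 <= number < BASE ** width:
        raise ValueError("number does not fit in %d base-32 characters" % width)
    digits = []
    for _ in range(width):
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant character first


def decode_image_id(text):
    """Decode a base 32 ImageId back to an integer.

    Raises ValueError if the text contains a character outside ALPHABET.
    """
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value
```

Under this assumption, encode_image_id(6888) gives "00006P8", and decode_image_id("00006P8") gives back 6888.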
That will mean that we will maintain internally the Picade photographer's own originally created and uploaded image filename in our Digital Asset Management system as image metadata, but will present a different, more easily read one to customers that will eliminate certain types of common errors. We will also make available to Picade photographers a cross reference between the ImageId that they uploaded and the new ImageIds that Picade will be utilizing, and all reports of licensing to photographers from Picade will contain both references.
For example, a recently uploaded image:
uploaded to Picade's PhotoShelter MU account by Picade photographer Richard Anderson as the 6888th unique image for Picade will become internally:
and will eventually get downloaded to end users as:
www.picade.com-1028EA000757.jpg
or
www.picade.com-1028EA000757.tif
This image file renaming alone will require that we rewrite the embedded
metadata in the images (without JPEG image re-compression/re-encoding, by using the
Bert Bos wrtjpgxmp tool I recently wrote about) to account for the embedded
imageid title metadata attribute.
At the same time we will take the opportunity to write into each image a
uniformly formatted set of photographer contact and licensing metadata
to correct missing or poorly formatted metadata, and to eliminate Picade
specific metadata or camera and software processing metadata (added by
some Raw image processing programs: 1100 lines of unneeded metadata text
in the case of an image recently uploaded), and default
whitespace padding from the images delivered to customers and stored online.
Among other things this will save a few thousand bytes of storage and bandwidth
for each image stored online, which in the long run really adds up.
We will upload these rewritten and slimmed down images to our backup archive
on Amazon S3, and finally to Picade's PhotoShelter MU account after first
deleting the existing photographer named images (or subsequent Picade renamed images),
thereby cleaning up online issues with our images on PhotoShelter in terms
of duplicates, and guaranteeing proper image metadata for delivery to customers.
This is a lot of extra work to do over what we originally envisioned
when Picade was founded, but it will result in far fewer problems with
our Digital Asset Management system, our customers' Digital Asset Management systems, and with PhotoShelter's
systems.
Once we implement this as part of our standard agency workflow, and get
the current relatively small set of images re-processed and re-uploaded,
it will not add a lot of overhead work for the future. I would hate to have
to think about reprocessing hundreds of thousands of images, or even
millions (which of course is where we want to get to as an agency <GGGGG>).
-30-
Michael Beasley (who just wants to get back out shooting pictures!)