On the virtual hat tree in my virtual office there are many virtual hats that I wear, which is a way of saying that I often function in many different professional roles while working for clients (which includes myself: the hat of a client is on that virtual hat tree too).
Today I'm going to be putting on several of those hats to begin to talk about the new world that we as working photographers now find ourselves in, and especially what it means with respect to backup and reliability. It's long and geeky and technical but fundamental, and I'll try to make it as simple, understandable, and painless as possible so we don't have to repeat it. Unfortunately, I'm not a Hemingway, so bear with me please!
For example (wearing the hat of the Chief Financial Officer of the firm "Me, Myself, and I, LLC"): clients do not like it when you really mess up and lose or fail to produce something that is important to them, and they tend not to pay their invoices and to spread the word around that you are not a reliable business provider. This generally makes your business owner(s) unhappy, because they tend to be losing money, and your business employee(s) unhappy as well, because their paychecks tend to be thinner or non-existent.
As a result (grabbing the hat of the location-oriented photographer from the tree) I tend to shoot with multiple camera bodies, bracket exposures, and when shooting film I shoot test Polaroids, shoot the same setup on multiple bodies and multiple rolls of film, snip test rolls, and split the take and process half the take before committing the other half of the take to the tender mercies of the underpaid and overworked guy back in the darkroom (who might be me wearing yet another hat).
When shooting still life in the studio (grabbing another hat), I tend to leave the set standing in the studio until the film finally comes back from the lab the way I want it, or I shoot brackets and do sheet tests or roll snip tests and split processing of the film take just as if I were on a location shoot.
As working photographers, we all do these things, because we've learned (sometimes the extremely hard way) that they are a functional necessity for getting paid and continuing to be able to work.
(Grabbing the hat, slide rule, and pocket protector of the graying nerd ÜberGeek) Why, as photographers, should we work any differently in the digital world than we do in the analog film world?
I'm not saying anything new when I say that one of the major paradigm shifts in our universe as working photographers has been the arrival in full force of digital photography and all the new and changing tools and technologies that are inherent in that new way of working and creating. It is not going to go away, much as many of us might like it to: it is a permanent change in the way we have to work, and we need to deal with it appropriately.
At another time, I will talk about digital shooting workflows to implement similar safeguards to the practices mentioned above, but today I'm going to begin to talk about what we do with our images after we create them and we have them "back in the office/studio/...": filing them away so we can reliably get them back at a later time.
Back in the halcyon days of the 1990's, Brian Seed published a subscription newsletter on the stock photography industry called Stock Photo Report, and I wrote some initial tutorial articles on digital imaging, with the idea we would try and get ourselves and Brian's subscribers ahead of the coming digital imaging wave. We also tried to start a new subscription newsletter or magazine called Digital Imaging, but we were ahead of our time and it did not succeed in getting off the ground.
In 1992, and later in 1998, Brian wrote in his newsletter about "Archival and Long Term Storage of Photos", and mentioned things like the dangers of formaldehyde and hydrochloric acid vapors from filing enclosures and what they could do to your color film.
In the 1998 article Brian mentioned US$400 CD writers and US$1/disc CD-ROM media storing 650 MB of digital images, saying "Digital imaging has come of age ...", and offered some guidance on how to store and handle writable CD-ROMs for long term use as a way of protecting your digital images. You all know what media costs are now, and how much larger the media are, and how many more "things" we now create and store digitally.
(Putting on the hat and long white robe of a "Geezer"/prophet/sage/seer/"Consultant" for the coming "sermon on the mount") Digital images are very different from physical images in many ways, and can theoretically last indefinitely with no degradation, but one of the ways that they are not as good, not as archival as physical images is that they are infinitely more "fragile" and much easier to lose or destroy than a physical piece of film.
Two aspects of that last statement are key to our (and your own) stock photography business:
- The "volatility" of digital files (both images and other appropriate software or "data").
- Finding digital files on your storage media once they have been created and saved.
I will address the second item in yet another posting, but I want to focus now on the first.
We all should know by now how easy it is to delete a digital file accidentally, or to have digital disk media fail for a variety of reasons and become "unreadable". With the increasing capacity of storage media available in a given small physical volume, it is also easy to physically lose through theft, fire, flood or other hazard significant parts of your digital image "archive" in a single incident.
Like many, I came to photography professionally from a different initial starting place: I was going to be a professional physicist working on the extreme outer edges of high energy particle physics.
Physics is about understanding and mathematically describing how the Universe seems to work, and one of the requirements for working in that field was and is a very good understanding and use of computers: your "experiments" might involve literally trillions of billions (10^12 × 10^9 = 10^21 = 1,000,000,000,000,000,000,000) of "events" that would give you maybe millions of "event data points" to work with to maybe find a single "event" that would prove or disprove a theoretical assumption about how the universe actually seems to work. To sift through the data from such an experiment in a relatively efficient manner, you have to effectively and efficiently use a computer.

My first 'personal computer' at Columbia University, later at Tulane, then IIT: the IBM 650 with 2000 words of 12,500 RPM magnetic drum memory (random access time of 2.496 ms!), photo courtesy IBM.
As a result, I started working with computers very early on, literally decades before most photographers started using them other than as a photographic subject, and very early learned the fundamental tenets of backup:
- All digital files are very "volatile" and easily destroyed or deleted.
- All digital filesystems and digital storage media that contain them are both volatile AND degrade over time.
- Digital storage media containing your digital files can be lost, destroyed, or become unreadable.
- The hardware used to access digital files and digital storage media may become obsolete over time, or fail, and therefore become unavailable for use.
- Software (including operating systems) used to create, read, and write digital files on digital media is dependent on the hardware used at the time, and may itself change over time in such a manner that it can no longer reliably read files previously created by it.
- Only files that are backed up can possibly be recovered if damaged, lost, or destroyed.
The answer to the first three items above is common sense redundancy: multiple "backup" copies of important digital files on multiple media located in appropriate physically separate location(s). How many "backups" you make, how you make them, how you organize and catalog them, and how and where you store them depends on what risks you are trying to avoid or mitigate and how easily you want to be able to find and retrieve a specific backed up file.
Items 4 and 5 are less commonly thought about by most people, but should enter into your backup planning as well: you may want to have spare current hardware and software, or to move or "migrate" your "static" backups to new hardware and/or software as time passes so that the files themselves are still accessible (easily, if at all) at a later date or if you have catastrophic hardware failure.
Item 6 is also common sense: no backup "system" or workflow is going to help you recover a file, if the file is not backed up in the first place by the constant, consistent, and appropriate application of your backup system/workflow to your digital work.
Besides the work I am currently doing building Picade's business systems and websites, I've recently been building some new server systems for myself and clients. As some of you may know or have already guessed from the preceding, I'm very vocal about system reliability and recovery planning, and about backup of critical files (like our Picade archive of master digital image files): a favorite saying of mine is "belt, suspenders, and hold onto your pants with both hands" in describing backup systems and system recovery planning.
What follows is based on practical real world experience rather than theory in keeping my trousers up.
For servers and for workstations where functional reliability and instant availability for business use are essential, for storage I use RAID 1 [disk mirroring], preferably hardware based, for my operating system and virtual memory cache filesystems, and hardware RAID 5+"online hot spare" for my data filesystems that hold anything that I cannot afford to lose or that would take longer than a single day to recreate in total from scratch.
As reliable as these two RAID type filesystems are at preventing data loss from the usual types of computer hardware failures, they will not protect you from catastrophic system hardware failure (think lightning strike), fire, theft, or the 747 or asteroid that falls out of the sky and onto your house or studio. They also will not protect you when, in a moment of weariness, you overwrite some irreplaceable file: oops!
That means I make backups of the critical stuff: periodically both in an automated fashion onto other systems on my network, and onto some form of portable media and store them somewhere "offsite" so that anything that takes out my office will not destroy my files.
Offsite is relative: if you are worried about an earthquake, a "Dirty Bomb" terrorist strike, or a Chernobyl type event contaminating a major metropolitan area or region, you want to have your offsite backups hundreds if not thousands of miles away, otherwise a few blocks away may suffice. (A sotto voce note: for Picade, we have three primary imagefile archive locations: our office in Chicago, and at PhotoShelter's facilities in New York and San Francisco, and I may add at least two more for good measure in the near future.)
When creating backups, I only use the highest quality media, and I always verify them by reading back the backup and comparing the backed up items to the originals: if they compare 100% identically you've got an acceptable backup, anything less is worthless because you can't later trust the backup to be able to reliably retrieve anything if any single item is not reliably retrieved.
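The read-back-and-compare step can be sketched in a few lines of Python. This is only a minimal illustration, not the tool I actually use: the function names `file_digest` and `verify_backup` are made up for the example, and it assumes the backup is a plain copy of a directory tree that can be mounted and read like any other folder.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_root, backup_root):
    """Compare every file under original_root with its copy in backup_root.

    Returns a list of (relative_path, problem) tuples. An empty list means
    every original file was found in the backup with identical contents.
    """
    original_root = Path(original_root)
    backup_root = Path(backup_root)
    problems = []
    for original in sorted(original_root.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_root)
        copy = backup_root / relative
        if not copy.is_file():
            problems.append((relative.as_posix(), "missing from backup"))
        elif file_digest(original) != file_digest(copy):
            problems.append((relative.as_posix(), "contents differ"))
    return problems
```

Anything other than an empty list back from `verify_backup` means the backup fails the "100% identical" test and should be redone rather than trusted.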
The offsite verified backups are stored in a sheltered, dust and moisture free "shirt sleeves" environment along with the complete minimum necessary software and hardware to use the backups in restoring a system. Periodically, the backups are evaluated again to make sure they are still retrievable, and if necessary, migrated to new media or systems or converted for new software environments.
Thus endeth the sermon on backup.
Notice that I didn't talk about operating systems, manufacturers, or technology save in the most generic manner possible: the abridged sermon on backup is applicable to EVERY computer system that we use, even our iPods or other personal media players!
In the past, I tended to use tape based backup systems: 9-track, QIC-150, Exabyte 8505, DAT, DLT, and LTO Ultrium in both single tape and autoloader "library" systems.
Unfortunately, tape systems, which have been the stalwart of backup since the dawn of the computer age, are rapidly falling behind the size of the data systems that need to be backed up completely, and the amount of data that must be backed up in a single day. Today it is not terribly uncommon to find individual creatives who have disk filesystems of several terabytes, and who need to back up more data every day than a tape drive can write to and read back from tape in 24 hours.
Tape is slow, and frankly, not very reliable: a few tens or a hundred passes of the media over the read/write heads and you begin to have media flaking and "dropouts" where you lose information permanently. Accessing a backup on tape is very slow because tape is a linear, sequential medium rather than a random access one, so access to the last file written to tape from the head of the tape is always dramatically slower than accessing a file at the head of the tape.
The latest high speed, high capacity tape drives and libraries necessary to keep up with the burgeoning disk systems now cost nearly as much as or more than the computers that they back up, and the media cost nearly as much as the hard disks as well: a basic HP LTO-3 400GB tape drive costs US$2,998.00, and a Maxell 400GB LTO-3 Ultrium tape to use in the drive costs US$69.95, but a Seagate Barracuda 7200.10 ATA-6 400GB hard drive can be had for only US$89.96 (Frys.com, 17Oct2006, I bought 10 of them for US$899.60 delivered).
The HP drive can read/write incompressible data (like a TIFF, PSD, or JPEG imagefile) at 80 megabytes/second, so writing a complete 400 GB hard drive to tape takes about 5,000 seconds and reading it back to verify takes another 5,000: a minimum of ~10,000 seconds (2 hrs 47 minutes) for a full verified backup, if the host system can sustain such a data transfer rate.
Since my existing 2TB (2 terabyte) RAID5 image archive systems [based on 7200 rpm SATA150 drives] can deliver a sustained raw read rate of about 64 MB/s [they are designed and configured for low cost and high reliability rather than maximum performance], and the OS and the backup software must do some work, it is more likely that backing up completely and verifying the 2 TB RAID archive to LTO-3 tape will take about 2 * 5 * 1.25 times as long as the above estimate for a single 400GB hard drive.
Put another way, it takes about 34 hrs 43 minutes minimum and 10 tape swaps for a full verified backup of the entire 2TB volume exclusive of the tape swap time, and means a person has to be available about every 3 hours to do the tape swap. Adding a tape library/autoloader to the tape drive to automate tape swapping will easily more than double the cost of the tape system (about US$7500 for an HP LTO-3 library system and drive).
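For anyone who wants to check the arithmetic or plug in their own numbers, here is a small Python sketch of the estimate. The rates and sizes are the ones quoted above; the names are invented for the example. The factor of 5 is 2 TB divided by 400 GB and the 1.25 is 80 MB/s divided by 64 MB/s, while the remaining factor of 2 is simply carried over from the multiplier as stated (the sketch treats it as a given allowance rather than deriving it).

```python
TAPE_RATE_MB_PER_S = 80        # LTO-3 rate for incompressible data
RAID_READ_RATE_MB_PER_S = 64   # sustained raw read rate of the RAID5 archive
DRIVE_SIZE_GB = 400            # one hard drive, also one tape
ARCHIVE_SIZE_GB = 2000         # the 2 TB RAID5 volume
STATED_EXTRA_FACTOR = 2        # taken as given from the text, not derived


def format_duration(seconds):
    """Format a number of seconds as hours and minutes, to the nearest minute."""
    hours, minutes = divmod(round(seconds / 60), 60)
    return f"{hours} h {minutes:02d} min"


# One 400 GB drive: one pass to write the tape, one pass to read it back.
one_pass_s = DRIVE_SIZE_GB * 1000 / TAPE_RATE_MB_PER_S
single_drive_verified_s = 2 * one_pass_s

# Scale up to the whole archive using the multiplier 2 * 5 * 1.25.
drive_count_factor = ARCHIVE_SIZE_GB / DRIVE_SIZE_GB
rate_factor = TAPE_RATE_MB_PER_S / RAID_READ_RATE_MB_PER_S
full_archive_s = (single_drive_verified_s * STATED_EXTRA_FACTOR
                  * drive_count_factor * rate_factor)

print("Single 400 GB drive:", format_duration(single_drive_verified_s))
print("Full 2 TB archive:  ", format_duration(full_archive_s))
```

Changing `RAID_READ_RATE_MB_PER_S` or `ARCHIVE_SIZE_GB` shows quickly how the backup window grows as the archive does.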
On the other hand, of course, we have the current crop of large capacity hard drives, viz: a 7200 rpm Seagate SATA2 750 GB hard drive (currently available for US$280). With these hard drives I can build a very capable host computer system holding 4TB of usable space for about US$3500, or upgrade my existing image archive RAID5 systems from 2TB to 4TB formatted capacity for about US$2240.
If I needed to back up my entire RAID5 system at once, given the cost of an "adequate" 2TB tape system, and what it would cost to duplicate my entire RAID system at 4TB capacity (including host computer system which I can use for other distributed processing tasks) AND upgrade my existing RAID5 system to 4 TB, for my needs it's a no-brainer: it costs less to use RAID5 hard drives for 4 TB capacity upgrade of my existing system AND for backup than to use the tape drive and carousel hardware for 2TB backup only.
Since most of the capacity of the RAID5 system as used by a stock photographer does not all change at once, it is both feasible and reasonable to do incremental backups of new or changed imagefiles to externally mounted hard drives that can be swapped to offsite storage: I currently use 400 GB Seagate drives for this purpose and can easily add new capacity incrementally at relatively low cost (say about US$150 for a Seagate 400 GB hard drive in an external USB2/Firewire400 enclosure). Total offsite backup cost for my 2 TB RAID5 in this manner is just about US$750, and for both offsite and local backup just US$1500, which is what I currently do. The only caveat is that you should do a complete read of the backup drive every six months or so to "exercise" the drive and keep it properly functional and accessible, according to the geeks at the drive manufacturers.
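An incremental backup of this kind can be as simple as the following Python sketch. It is an illustration under stated assumptions, not the software I run: the function name `incremental_backup` and the two-second timestamp tolerance (an allowance for external drives whose filesystems store coarse modification times) are made up for the example, and it decides that a file has changed purely from its size and modification time.

```python
import shutil
from pathlib import Path


def incremental_backup(source_root, backup_root, mtime_tolerance=2.0):
    """Copy files that are new or changed since the last run.

    A file is treated as unchanged when the backup copy has the same size
    and is not older than the source by more than mtime_tolerance seconds.
    Returns the relative paths of the files that were copied.
    """
    source_root = Path(source_root)
    backup_root = Path(backup_root)
    copied = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_root)
        target = backup_root / relative
        if target.is_file():
            src_stat = source.stat()
            dst_stat = target.stat()
            unchanged = (
                src_stat.st_size == dst_stat.st_size
                and src_stat.st_mtime <= dst_stat.st_mtime + mtime_tolerance
            )
            if unchanged:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also copies the modification time
        copied.append(relative.as_posix())
    return copied
```

Note that the sketch never deletes anything from the backup drive, and that a copy made this way still needs to be read back and compared against the originals before it counts as a backup.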
HTH.
Michael Beasley