Amazon Deep Archive is a Glacier. It’s not an iceberg for tape.
My news alert system was sent into orbit this week by announcements coming out of Amazon’s re:Invent conference in Las Vegas. I keep abreast of the latest headlines pertaining to the tape industry, and Amazon’s Glacier Deep Archive reveal certainly triggered a flood of articles, some of them pessimistic about tape’s prospects.
Look beneath the headlines
But before tape writes its obituary and signs off with a valedictory ‘so long and thanks for all the bits’, I think it’s worth taking a closer look at the details of Amazon’s presentation. If you look behind the headlines, I think things aren’t as dark for tape as they have been portrayed.
The headline that every commentator focused upon was the much lower cost of storing the data. Glacier Mark I is Amazon’s original cloud archival storage offering and costs $0.004/GB/month ($4/TB/month). Glacier Deep Archive (GDA) undercuts this by a significant amount, at $0.00099/GB/month (about $1/TB/month). That’s roughly a 75% reduction. Not too shabby.
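That percentage is easy to sanity-check. A throwaway sketch, using only the per-GB prices quoted above:

```python
# Quick check of the headline price cut (figures from the text).
glacier = 0.004    # $/GB/month, Glacier Mark I storage
deep    = 0.00099  # $/GB/month, Glacier Deep Archive storage
cut = 1 - deep / glacier
print(f"{cut:.0%}")  # 75%
```

So the announced price is, strictly, a 75.25% cut, which everyone (reasonably) rounds to 75%.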
On the other hand, every storage technology has been driving down the cost of actually holding the bits, and tape certainly hasn’t been standing still. With each new iteration of LTO tape technology, the cost per GB is substantially lowered. At launch, HPE’s LTO-8 List Price undercut the equivalent LTO-6 price by roughly the same margin that Deep Archive undercuts Glacier Mark I.
So I don’t really see anything new or different here, although I admit that “Amazon cloud storage service keeps pace with magnetic media areal density improvements” isn’t much of a headline.
But (and it’s a ‘But’ written in capital letters the size of those that look down from the Hollywood hills) what really matters, and why all the headlines presuming the demise of tape are acutely premature, are the retrieval costs!
Beware of building your archive over a sinkhole
As Glacier Deep Archive’s storage pricing pretty much confirms, storage from the point of view of a cloud provider is effectively a commodity. The profitability comes from charging customers to access their data, and it’s here that end users need to be careful when they consider their long term archival options. For the unwary, retrieval costs are the sinkhole into which the new, improved dream home of cloud storage will inevitably fall without careful analysis.
In reality, customers need to access their archive data with a degree of frequency. Not with the urgency of a backup application, but with a reasonable degree of flexibility. Amazon’s own TCO example for Glacier Mark I proposes 10% retrieval rate per month, although in HPE’s experience, the requirement to access archive data is typically much higher.
Amazon’s retrieval pricing is complicated because Glacier has three levels of service, but roughly, the cost of retrieving just 10% of the data using the middle, or “Standard” class is 25% of the cost of the storage. Increase the retrieval percentage to 30% and the cost of recovering your own data becomes 75% of the cost of storing it, as you can see from the example I’ve created below.
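The arithmetic behind those percentages can be sketched in a few lines. This assumes a “Standard” class retrieval fee of $0.01/GB, which is the figure implied by the 25%/75% ratios above, and ignores per-request charges:

```python
# Back-of-envelope sketch: monthly retrieval cost as a fraction of the
# monthly storage bill for Glacier Mark I. The $0.01/GB Standard retrieval
# fee is an assumption, inferred from the 25%/75% ratios in the text.
STORAGE_PER_GB_MONTH = 0.004   # $/GB/month, Glacier Mark I storage
RETRIEVAL_PER_GB = 0.01        # $/GB, assumed "Standard" retrieval fee

def retrieval_vs_storage(archive_gb, retrieval_fraction):
    """Ratio of monthly retrieval cost to monthly storage cost."""
    storage_cost = archive_gb * STORAGE_PER_GB_MONTH
    retrieval_cost = archive_gb * retrieval_fraction * RETRIEVAL_PER_GB
    return retrieval_cost / storage_cost

# The ratio is independent of archive size, so 1 PB is just an example.
print(round(retrieval_vs_storage(1_000_000, 0.10), 2))  # 0.25
print(round(retrieval_vs_storage(1_000_000, 0.30), 2))  # 0.75
```

Note that the archive size cancels out: whatever you store, pulling back 30% of it each month adds three-quarters of your storage bill on top.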
We don’t know anything about the retrieval costs for Glacier Deep Archive because Amazon said nothing at all about them. Maybe improvements in storage density will permit Amazon to reduce these charges. And then again, maybe they won’t. For now, assuming the same price structure, I believe that the combined cost of storage and retrieval for Glacier Deep Archive will still be more expensive than an equivalent tape solution over a ten-year period. The only difference is that an even higher percentage of cost will be in retrieval, not storage.
For archive TCO, consider tomorrow as well as today
And of course, we haven’t spoken about the real iceberg on the horizon: data growth. A 1 PB archive growing at 50% CAGR will be nearly 58 PB in ten years’ time. Retrieving just 10% of such an archive every month, at today’s retrieval prices, will cost hundreds of thousands of dollars a year. Proponents argue that moving archive data to the cloud reduces the burden on CAPEX budgets, but will OPEX budgets support six or even seven figure sums for retrieval? Storage may continue to be commoditised but cloud vendors have to make money somewhere.
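The growth projection is simple compounding, and it is worth seeing the numbers fall out. A minimal sketch, again assuming the $0.01/GB Standard retrieval fee inferred earlier and Amazon’s own 10%-per-month retrieval rate:

```python
# Sketch of the data-growth maths: 1 PB compounding at 50% CAGR for a
# decade, then the annual bill for retrieving 10% of it each month.
# The $0.01/GB retrieval fee is an assumption carried over from above.
GB_PER_PB = 1_000_000
RETRIEVAL_PER_GB = 0.01  # $/GB, assumed "Standard" retrieval fee

archive_pb = 1.0
for _ in range(10):      # ten years of 50% annual growth
    archive_pb *= 1.5

monthly = archive_pb * GB_PER_PB * 0.10 * RETRIEVAL_PER_GB
print(f"{archive_pb:.1f} PB after ten years")   # 57.7 PB after ten years
print(f"${monthly * 12:,.0f} per year")         # ≈ $692,000 per year
```

And that is before any growth in the retrieval percentage itself, or any per-request fees.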
One of the advantages tape will always have over a cloud service is that accessing 1% or 100% of an archive makes no difference. Retrieving just a fraction or pulling all of it basically costs the same amount of money. Even factoring in the cost of offsite storage and IT administration, LTO tape is still likely to be cheaper than services like Glacier and its successor.
(And I haven’t time or space to get into Early Deletion Fees, where you are charged extra if you delete data from the archive before it has been stored for 90 days.)
In summary, although Glacier Deep Archive is an interesting development, the ‘tape is dead’ headlines, like all those that preceded them, may well be premature. I agree that cloud services have value for what I call ‘short term archiving’, where you store (in comparison to a full archive) relatively small amounts of data to complement backup services and extend Recovery Point Objectives beyond the limitations of on-prem secondary storage. HPE’s CloudBank solution, which broadens the capabilities of the HPE StoreOnce array, is one example.
But when it comes to storing all of your archive data - petabytes and exabytes of infrequently accessed, but essential information - for a very long time, tape still seems to have a compelling cost advantage over cloud. And, with a restore promise of twelve hours, is it possible that, far from being the destroyer of tape, GDA will actually be using lots of it, in a big way?
Using tape to kill tape. Is that a headline or a logic puzzle? You be the judge.