Dodging the archive data TCO trap!

How using LTO tape can lower the long-term cost of archiving data


This article is part of a series examining the value of tape in the era of cloud and object storage. I concluded my last entry by stating that tape had compelling advantages over the cloud for archiving over periods measured in years and decades. In my next couple of posts, I want to look in detail at the evidence for that claim.

Now, critics are often quick to cite the upfront costs of tape hardware as a reason for moving their archive data to the cloud. And over a short timeframe, it's true that the cloud will always be cheaper than any alternative form of cold data storage - such as tape or Hadoop Distributed File System (HDFS). That's because of the almost complete absence of capital expenditure (CAPEX) costs. For sure, there is an upfront investment in putting the data in the cloud, and then a monthly cost for storing it there, but at the outset these charges are far lower than the tens of thousands of dollars required to purchase a tape library. It’s this flexibility that makes a service like HPE CloudBank such a good investment for the ‘deep backup’ role I discussed previously.

But obviously, the benefit of a CAPEX investment is that the customer owns the solution. There is no ongoing monthly rental fee, as there is with the cloud. And what analyses of cloud versus tape TCO often overlook is the potentially significant cost of getting that data back again.

Figure: Azure Archive Blob Tier price list. Source: Microsoft Azure Archive Blob Tier Price List, March 2019
One of the challenges with TCO comparisons is that there are many variables one can model, each influencing the result to a different degree. No two customers are ever alike. But if one considers just the primary cost of storing and retrieving data using a service like Azure, then within five years, regularly accessing a small percentage of archive data - let’s say 3% of 200 TB - will cumulatively cost more than accessing it from tape, assuming the archive grows at a typical 40% CAGR.

In the example above, the cumulative cost of storing and retrieving that 3% of data using the public cloud is more than $100k across five years. In comparison, an HPE StoreEver MSL6480 tape library using LTO-7 with 1 PB of storage will cost half that amount. And this takes into account ancillary tape costs like offsite vaulting and IT headcount to manage the process in-house instead of outsourcing.
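For readers who want to try this kind of comparison with their own numbers, here is a minimal Python sketch of the cumulative cloud cost. It is not the model behind the figures above. The function names are hypothetical, the retrieval cadence is assumed to be monthly (the frequency is not stated here), and every price in the usage section is a placeholder rather than a published Azure or HPE figure, so the output will not reproduce the totals quoted in this article.

```python
def cumulative_cloud_cost(start_tb, cagr, years,
                          storage_per_gb_month, retrieval_per_gb,
                          monthly_retrieval_fraction):
    """Return the running total cloud cost at the end of each year.

    Assumptions (not from the article): the archive steps up once a year,
    the same fraction of it is retrieved every month, and only storage and
    retrieval charges are counted.
    """
    running_total = 0.0
    totals = []
    for year in range(years):
        # Archive size during this year (decimal units: 1 TB = 1000 GB).
        capacity_gb = start_tb * 1000 * (1 + cagr) ** year
        storage = capacity_gb * storage_per_gb_month * 12
        retrieval = (capacity_gb * monthly_retrieval_fraction
                     * retrieval_per_gb * 12)
        running_total += storage + retrieval
        totals.append(running_total)
    return totals


def breakeven_year(cloud_totals, tape_total):
    """First year (1-based) in which cumulative cloud cost exceeds a fixed
    tape cost, or None if it never does within the modelled period."""
    for year, total in enumerate(cloud_totals, start=1):
        if total > tape_total:
            return year
    return None


if __name__ == "__main__":
    # Placeholder inputs only: substitute your provider's current price
    # list and your own tape quote before drawing any conclusions.
    totals = cumulative_cloud_cost(
        start_tb=200, cagr=0.40, years=5,
        storage_per_gb_month=0.002, retrieval_per_gb=0.02,
        monthly_retrieval_fraction=0.03,
    )
    for year, total in enumerate(totals, start=1):
        print(f"year {year}: ${total:,.0f}")
    print("break-even year vs. tape:",
          breakeven_year(totals, tape_total=50_000))
```

Rerunning the sketch with a higher monthly_retrieval_fraction shows how sensitive the cloud total is to the retrieval rate, the variable that TCO comparisons most often leave out.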

In reality, however, 3% is a pretty conservative estimate, since many customers need to retrieve far higher percentages of their data frequently. The gap is even wider if one increases the retrieval rate to a mere 5%. And obviously, the greater the volume of data, the greater the cost of retrieving it will become. Consider that 200 TB today will be over 1 PB in six years’ time, and over 4 PB in ten years, if it grows at 40% CAGR.
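The growth figures are plain compound growth and are easy to check. A minimal sketch, with a hypothetical function name: at 40% a year, 200 TB passes 1 PB after five annual increases and 4 PB after nine.

```python
def projected_capacity_tb(start_tb, cagr, annual_increases):
    """Archive size after a number of annual growth steps at a fixed CAGR."""
    return start_tb * (1 + cagr) ** annual_increases


if __name__ == "__main__":
    for steps in range(0, 11):
        size_tb = projected_capacity_tb(200, 0.40, steps)
        # Decimal units: 1 PB = 1000 TB.
        print(f"after {steps:2d} increases: {size_tb / 1000:.2f} PB")
```

Swapping in your own starting capacity and growth rate gives the future volumes on which any retrieval cost estimate should be based.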

A recent study by analyst firm ESG found that almost half of respondents continued to leverage tape for long-term storage. Perhaps the reason for this is that ESG’s independent analysis estimated that an LTO-8 tape solution could offer an expected cost of ownership 66% lower than an all-cloud alternative over a ten-year period.

“Over an extended period, LTO technology offers a reliable and cost-effective data retention solution compared with current generations of all-disk and all-cloud storage solutions. The latest generation, LTO-8, offers the reliability that organizations expect from tape media, while offering increased capacity and performance at a lower price per TB.” ESG, Quantifying the Economic Benefits of LTO-8 Technology

In summary, customers who intend to put all of their archive data into some kind of public cloud would be well advised to model the TCO for different retrieval rates based not just on the data they hold today, but on the data they expect to have in the future. In most cases, they will find that tape remains far and away the most cost-effective solution for the long-term storage and retrieval of the infrequently accessed, unstructured information that eventually constitutes about 60% of total business data.

