To some extent, the topics of the previous articles in this series (cybersecurity and total cost of ownership) align closely with recent studies investigating IT priorities. On the face of it, reliability might seem to be a bit of a given: businesses don’t back up data for fun, and it’s “data protection”, not “data ‘we’ll do the best we can, boss, but we can’t promise anything’”.
That’s one of the reasons I will sometimes describe HPE StoreEver as being in the insurance business. Or the “sleep soundly at night” business. Nobody wants to buy insurance but if you need it, and you haven’t got it, it’s too late!
So the subject of this blog is going to be tape reliability. If you are putting your data onto an HPE LTO tape cartridge, how confident can you be that you will get it back, whether that’s three months, three years or three decades from now?
Let's bust some myths!
First of all, let’s tackle the elephant in the room: a lot of people think tape is unreliable and prone to failure. The reasons for this are many and varied but even in 2022, it is possible to find blogs on the internet which state “According to a recent survey [my italics] of IT Managers, nearly half of the respondents experienced a high unsuccessful backup rate using tape drives”.
That “recent survey” quoted in 2022, however, was actually first cited in an Infostor report from November 2004! That's right: as recent as eighteen years ago. Back then, nobody had heard of fake news or alternative facts, but since then, claims about tape unreliability have increasingly come to resemble social media exaggeration.
A number of years ago, it was an easy Google search to uncover the apparently damning statistic ‘Gartner is reporting that 71% of tape restores fail’. Then, as now, Gartner was one of the leading technology analysts and naturally, such a potent fact made the argument for moving away from tape seem pretty conclusive.
Except it wasn’t. Because it wasn’t true! Gartner never said that 71% of tape restores fail. W. Curtis Preston wrote a typically expert and detailed blog thoroughly debunking the claim (but not before a great many observers, many with tape alternatives to sell, latched on to this bogus statistic as if it were the Holy Grail).
Around this time, the US National Energy Research Scientific Computing Center (NERSC) began to migrate its existing tape infrastructure to a newer tape archive. The team at NERSC migrated a total of 40,489 tape cartridges during this time, which involved reading 22,065,763 metres of tape – the same distance as flying from San Francisco to Tokyo, then on to Paris and finally Nova Scotia. The tapes ranged in age from two to twelve years.
During this massive migration process, the NERSC IT group led by Jason Hick observed the centre’s actual tape data reliability within its active archive. The findings flew in the face of conventional wisdom: 99.9991 percent of tapes were 100 percent readable, representing a 0.0009 percent error rate. Of the more than 40,000 tapes that were read, only 35 contained some data that couldn’t be accessed. The unreadable data accounted for only 178 metres of the 22,065,763 total metres of tape.
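For readers who like to check the sums, here is a minimal sketch in Python that uses only the four figures quoted above. It works out two different ratios: the share of cartridges that had any unreadable data, and the share of total tape length that could not be read. The variable names are mine, not NERSC's.

```python
# Figures quoted from the NERSC migration described above.
cartridges_read = 40_489
cartridges_with_unreadable_data = 35
metres_read = 22_065_763
metres_unreadable = 178

# Share of cartridges that contained at least some unreadable data.
cartridge_fraction = cartridges_with_unreadable_data / cartridges_read

# Share of the total tape length that could not be read back.
length_fraction = metres_unreadable / metres_read

print(f"Cartridges with any unreadable data: {cartridge_fraction:.4%}")
print(f"Tape length that was unreadable:     {length_fraction:.5%}")
print(f"Tape length that was readable:       {1 - length_fraction:.4%}")
```

Measured by cartridge, a little under 0.09 percent had any problem at all; measured by tape length, the unreadable share is under 0.001 percent.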
Fast forward to the present, however, and you can still find arguments based around these origin myths of tape’s unreliability. To spare embarrassment, I am not going to name my source but it comes from a 2022 disk vendor website.
“However, tapes are increasingly susceptible to contamination and other physical damage, since an LTO-8 tape is now three times as long and half as thick (thin) as the first generation at around 1 km in length and 5.6 µm in thickness and offers around five times the data density.”
UBER: no, not the car company, but media reliability!
Tape can sometimes seem like an old-fashioned technology so let me use an old-fashioned word to describe the previous quote: it’s gobbledygook.
It’s categorically untrue that because tapes are longer and thinner, they are more susceptible to damage. On the contrary, innovations in binder, lubricant and coating technologies mean that these longer, thinner tape cartridges are no less reliable than their ancestors. The physical attributes of longer and thinner are irrelevant to the discussion about Uncorrected Bit Error Rate (UBER), which is what we should be focusing upon when we are talking about reliability in data protection.
And when we do look at UBER, what we see is that not only has tape reliability been improving through the generations, but that LTO-9 tape reliability is orders of magnitude better than that of SATA hard disks. Here we need to get a little bit technical to explain the differences.
Ultimately, all storage technologies have a raw UBER. In magnetic recording, the read head checks the information that has just been written to the media. This is an analogue signal that gets digitised and the read channel processes the signal and tries to determine if it’s a zero or a one. As the bits become smaller (to strive for higher capacity), the signal gets noisier and the read channel makes more errors than normal. HPE StoreEver LTO technology engineers around this using Error Correction Codes (ECC). Tape has two ECCs, C1 and C2, which are orthogonal (or at right angles) to each other and these combine to make the error rate much better than these raw bit error rates. With LTO, the areal density is low enough to allow data to be written in relatively large blocks, which means larger and more efficient ECCs.
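To give a feel for why two checks at right angles to each other are so powerful, here is a deliberately simplified sketch in Python. It is a toy, not the LTO format: it uses single parity bits per row and per column rather than the far stronger codes a real drive uses, and every name in it is hypothetical. The point is only that when a row check and a column check both fail, their intersection pinpoints the bad bit so it can be repaired.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]

def row_parities(block: Grid) -> List[int]:
    """Parity (0 or 1) of each row: the 'first direction' check."""
    return [sum(row) % 2 for row in block]

def column_parities(block: Grid) -> List[int]:
    """Parity (0 or 1) of each column: the 'right angle' check."""
    return [sum(col) % 2 for col in zip(*block)]

def locate_single_error(
    block: Grid, stored_rows: List[int], stored_cols: List[int]
) -> Optional[Tuple[int, int]]:
    """Return (row, column) of a single flipped bit, or None if not exactly one."""
    bad_rows = [i for i, (now, then) in enumerate(zip(row_parities(block), stored_rows)) if now != then]
    bad_cols = [j for j, (now, then) in enumerate(zip(column_parities(block), stored_cols)) if now != then]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None

# "Write" a small block of data and record both sets of parity bits.
data = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
stored_rows = row_parities(data)
stored_cols = column_parities(data)

# Simulate a noisy read in which one bit flips.
received = [row[:] for row in data]
received[1][2] ^= 1

# The failing row check and failing column check intersect at the bad bit.
position = locate_single_error(received, stored_rows, stored_cols)
if position is not None:
    r, c = position
    received[r][c] ^= 1  # flip it back

print(position, received == data)  # (1, 2) True
```

A toy like this can only repair one flipped bit per block. The codes used on real tape are designed to cope with far more damage, but the principle of cross-checking in two directions is the same.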
Simply put, for every 10^20 bits that you write using HPE LTO-9, you can expect on average just one bit to be an unrecoverable error. Sometimes in shorthand, it can be difficult to get the true measure of a statistic so consider that 10^20 is:
1 unrecoverable error in 100000000000000000000 bits of data!
With a SATA HDD, because areal density is high and bit and block sizes are smaller, it’s significantly harder to get the same reliability, so the error rate is typically only one unrecoverable error in 10^15 bits. I appreciate this might not seem like a very large difference but consider that this is actually five orders of magnitude, which means LTO technology is 10 x 10 x 10 x 10 x 10 times more reliable than disk: that’s one hundred thousand times better! In terms of actual data being stored or written, 10^15 means that, on average, there will be an uncorrectable error on the disk for every 125 TB that you write. Whereas 10^20 is one bad bit in 12.5 EB, or 12.5 million TB. You are more likely to be eaten by a shark or win a multi-million dollar lottery payout than encounter an error that the LTO-9 ECC couldn’t resolve.
At these extremes of reliability, LTO technology is almost making unrecoverable errors in writing data (assuming all other operating factors - temperature, humidity, duty cycle - are within specification) a once-in-a-lifetime event!
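The conversion from "bits per error" to "data written per error" is easy to check for yourself. This short Python sketch assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes) and 8 bits per byte; the function name is purely illustrative.

```python
BITS_PER_BYTE = 8
TB = 10**12  # decimal terabyte, in bytes (assumed unit)
EB = 10**18  # decimal exabyte, in bytes (assumed unit)

def bytes_per_error(bits_per_error: int) -> int:
    """Average amount of data, in bytes, written per unrecoverable bit error."""
    return bits_per_error // BITS_PER_BYTE

hdd = bytes_per_error(10**15)  # typical SATA HDD figure quoted above
lto = bytes_per_error(10**20)  # HPE LTO-9 figure quoted above

print(f"SATA HDD: one error per {hdd / TB:,.0f} TB")
print(f"LTO-9:    one error per {lto / EB:,.1f} EB ({lto / TB:,.0f} TB)")
print(f"Ratio:    {lto // hdd:,} times more data per error")
```

The ratio works out to 100,000, which is the five orders of magnitude separating 10^15 from 10^20.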
Can you change the laws of physics?
Of course, there are other factors that affect the reliability of magnetic media. In one of the earlier articles in the series, I explained how LTO tape has a much lower areal density than a hard disk drive of comparable capacity. Both tape and hard disk store data in similar ways: namely encoding information representing 1’s or 0’s in small magnetic domains on a thin film of magnetic material - aluminium platter or plastic tape.
The thin film of magnetic material is made up of grains or particles and the 1’s and 0’s of digital information are represented by transitions in the magnetic state of these grains: north to south or south to north. In a perfect world, the transition would be a perfectly straight line, but because the magnetic media is made up of these small grains, the transition has roughness, which causes noise. So as engineers shrink the size of the bits to increase capacity, they also have to shrink the magnetic grains to maintain a constant Signal to Noise Ratio (SNR).
The challenge is that when engineers shrink the grains, there is less magnetisation energy so the encoded information becomes less stable over time - i.e. the 1’s and 0’s begin to flip their state, leading to data corruption. In magnetic media recording, this is known as the superparamagnetic limit. In the past, HDD and tape vendors have tuned the material so it has higher coercivity, which means there is more magnetisation energy in a small domain. Higher coercivity maintains the stability of the metal particles but also means it’s more difficult to write the data in the first place! What you give with one hand, you take with the other. Getting the balance right requires tremendous engineering precision and innovation!
But even with their undoubted technical expertise (and although I am a tape geek, the incredible engineering ‘know how’ of HDD manufacturers is no less extraordinary than that of my associates in the LTO Program) HDD technology has reached the point where the grains can’t be made smaller without having to increase coercivity by such a degree that you would not be able to write data anyway.
This is why new ‘booster’ technologies like HAMR and MAMR are now being talked about as a pathway to much higher areal densities and therefore higher capacity disk media. They enable HDD technology to overcome the challenge of the superparamagnetic limit but objectively, questions remain about how efficient and effective these approaches will be in delivering the much higher capacity points required in the era of big data. Generally speaking, tape has a longer shelf life for holding on to 1’s and 0’s because it has better coercivity potential than HDDs to begin with. For this reason, the capacity points mapped out in the LTO roadmap for future generations are much higher than those outlined by hard disk vendors. In the end, Scotty was right: you cannot change the laws of physics.
Since the early days of LTO technology, HPE has been tracking the performance of eight original LTO-1 cartridges that have been kept under archival storage conditions. These cartridges were all manufactured in June 2003, and a full capacity (100 GB native) backup of data was performed in July 2003, using an HP LTO-1 drive.
The cartridges have since been stored in the recommended archival conditions for 18 years, and periodically the original data on those tapes has been restored using an LTO-2 drive.
In tests conducted last year (2021), this venerable LTO-1 media displayed practically zero degradation in terms of performance and reliability, with significantly better results reported than the minimum threshold required for the HPE LTO Ultrium media brand specification in 2022. All data was successfully retrieved from all eight cartridges, with stable transfer rates throughout. In addition, the lower-level bit error rate (BER) showed that significant margin remained, with no discernible difference from the initial read error rates. This indicates there has been no detrimental effect from archiving the cartridges. In everyday terms, these 2003 tapes are indistinguishable from ones that were made last week.
I only wish I could say the same about my own personal ability to withstand the passage of time!
Finally, some might argue that the advent of cloud technology makes the reliability (or not) of individual media types irrelevant. I suppose that partly depends on how confident you are in your cloud vendor’s processes when it comes to always using the most reliable components and maintaining the same best practice for data protection that IT teams inside companies have traditionally been tasked with. At the end of the day, bad things will always happen to good computers - the rule of “stuff happens” - whether your data is on-prem or off-prem.
And as numerous examples have shown, any assumption that cloud technology is foolproof is likely to be tested by reality at some point. Even the largest cloud vendors suffer from hardware failure and human error: i.e. stuff happens. Although these outages are not always related to storage technology specifically, they illustrate that there has to be data protection beyond the cloud. If something as gigantic and central to the whole Internet as AWS, for example, can suffer downtime, then just about anything can be regarded as a potential point of failure. The point is not that tape should be seen as the most reliable storage technology, more that its supreme reliability capabilities ought to give users confidence that it can be deployed as the last line of data protection for petabytes of their archive data: secure, scalable, cost effective and - remember, we’re in the insurance business - extremely reliable.
In the next article, I’ll be looking at tape’s environmental credentials and discussing how deploying HPE StoreEver might help you reduce the amount of energy you consume when storing archive data. But in the meantime, feel free to give me feedback in the comments here on LinkedIn or by following me on Twitter @tapevine. Thank you once again for reading!