The Archive Superhighway

You can build it with LTO tape!

In my last article, I compared the total cost of ownership of tape with that of private cloud solutions, and examined why the challenges that data scientists face when managing and analysing data have little to do with the time it takes to access archive data, the one area where tape is often portrayed as deficient.

But in addition to its cost of ownership advantage, LTO tape also has some innovative tricks up its sleeve which defy conventional wisdom about speed or performance.

A haystack full of needles

On-prem object storage also requires a high performance medium like a flash disk or array to host and manage the index. Because data in an object storage system is not organised in directories, as with file-based storage, every file is self-referencing and can only be located using its metadata. Whilst this can create a very fluid and intuitive search capability, it means that the ability to find individual files depends entirely on the quality of the index. It's the metadata that allows the user to find the needle in the haystack, or rather, the right needle in a haystack of needles. And once the index contains millions or even billions of items, it is even more crucial that the metadata is hosted on high performance storage to avoid issues with latency or consistency. Consistency problems can arise if files are updated in different places before the system has had the opportunity to record the alteration to the original.
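To make the idea of locating objects purely by metadata concrete, here is a minimal sketch in Python. It is a toy, in-memory illustration only: the class name ObjectIndex, its methods and the sample metadata keys are all assumptions invented for this example, and no real object storage product works exactly like this.

```python
from collections import defaultdict


class ObjectIndex:
    """Toy metadata index for a flat (directory-less) object store."""

    def __init__(self):
        self.objects = {}                # object_id -> metadata dict
        self.by_tag = defaultdict(set)   # (key, value) -> set of object_ids

    def put(self, object_id, metadata):
        # Drop stale index entries first, so an update cannot leave the
        # index pointing at metadata the object no longer carries.
        for item in self.objects.get(object_id, {}).items():
            self.by_tag[item].discard(object_id)
        self.objects[object_id] = dict(metadata)
        for item in metadata.items():
            self.by_tag[item].add(object_id)

    def find(self, **criteria):
        # Intersect the ID sets of every requested key/value pair.
        sets = [self.by_tag.get(item, set()) for item in criteria.items()]
        return set.intersection(*sets) if sets else set(self.objects)


index = ObjectIndex()
index.put("a1", {"project": "seismic", "year": "2016"})
index.put("b2", {"project": "seismic", "year": "2017"})
print(index.find(project="seismic", year="2017"))
```

Every lookup goes through the index rather than a directory tree, which is why the index itself needs to live on fast storage once it holds millions of entries.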

The combination of flash and object storage may provide faster and simpler access to archive data than is possible with tape stored in an offline vault. But much depends on the type of data. If the user has larger individual files that are more conducive to streaming, rather than clusters of smaller files located at random within the dataset, a nearline LTO tape system accessed using the native OS via Linear Tape File System may be no more complex, and provide better retrieval performance, than an object server built using relatively inexpensive SATA HDDs. And in future, the performance gap between tape and low cost HDD may actually widen.

An often overlooked feature of the HDD and tape roadmaps is that while the Information Storage Industry Consortium (INSIC) projects that HDD performance gains will slow down and may be as low as 5% CAGR by 2025, LTO tape data rate growth will be in the region of 20-25% in the same timeframe. This makes tape better suited for retrieving huge quantities of unstructured data for transfer to a faster platform, like high performance disk or flash, for analysis. Simply put, to transfer very large amounts of data to or from a state of suspended animation, tape will be not only cheaper but also faster than the other alternatives that we currently know about.
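A quick calculation shows how fast those growth rates diverge. The sketch below is illustrative only: it compounds the 5% and 20% figures quoted above from a common baseline of 1.0, and the five-year horizon and the function names are assumptions made for the example, not INSIC data.

```python
def relative_rate(cagr, years):
    """Data rate after `years` of compound growth, relative to a 1.0 baseline."""
    return (1.0 + cagr) ** years


def gap_after(years, tape_cagr=0.20, hdd_cagr=0.05):
    """How far tape's data rate pulls ahead of HDD's over the period."""
    return relative_rate(tape_cagr, years) / relative_rate(hdd_cagr, years)


# Assumed five-year horizon, using the lower (20%) end of the tape range.
print(f"tape x{relative_rate(0.20, 5):.2f}, "
      f"HDD x{relative_rate(0.05, 5):.2f}, "
      f"gap x{gap_after(5):.2f}")
```

On those assumptions, tape's data rate advantage over HDD would roughly double in five years, and it would grow faster still at the 25% end of the range.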

Redundant Array of Independent Tape

And that innate advantage may be intensified thanks to a way of configuring tape that is now being deployed in some intensive high performance installations, such as Oak Ridge National Laboratory in the US. Although these deployments are not yet mainstream, in the future the data volumes and throughput being discussed will filter down into midrange applications, just as surely as Formula One technology eventually finds its way into the mass-automobile market. This advanced tape system is called RAIT, Redundant Array of Independent Tape.

RAIT improves the throughput of large sequential files by creating multiple parallel data streams into the tape subsystem and, as with RAID, it can also provide various degrees of fault tolerance for higher availability.

RAIT levels are implemented through software, and the choice of level depends on the number of tape drives in the array configuration; how critical drive recovery is in the event of a fault; and how vital it is to maximise tape throughput. The data transfer rate of each drive is limited to that of the slowest drive in the stripe, so ideally all RAIT drives will be of the same generation - e.g. all LTO-6, all LTO-7. As with RAID, RAIT can offer data redundancy without needing to create multiple copies. Striping and parity are the keys to RAID and RAIT implementations.
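Parity is what makes redundancy possible without a second full copy. The following is a minimal sketch of the general XOR parity technique used in RAID-style schemes, not the implementation of any particular RAIT software; the function names and the sample blocks are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (used to compute the parity block)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one lost block from the survivors plus the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data blocks, each destined for a different tape, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second tape is unreadable and rebuild its block.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

One extra tape's worth of parity protects the whole stripe against the loss of any single tape, which is far cheaper than duplicating every cartridge.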

With RAIT data striping, data is distributed over multiple tape devices, so that the data blocks are processed in parallel and sent to tapes in multiple drives simultaneously. This dramatically increases throughput. Meanwhile, all of these physical striped drives are virtualised so as to appear as a single drive. A RAIT stripe can extend across up to 16 drives depending on the software used, just like sixteen lanes of vehicles on a physical highway.
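Striping itself can be sketched in a few lines. This is a simplified, in-memory illustration with hypothetical function names and illustrative drive rates in MB/s; real RAIT software also has to handle parity placement, tape positioning and error recovery, none of which is shown here.

```python
def stripe(data, drive_count, block_size):
    """Deal fixed-size blocks round-robin across `drive_count` drives."""
    drives = [[] for _ in range(drive_count)]
    for n, offset in enumerate(range(0, len(data), block_size)):
        drives[n % drive_count].append(data[offset:offset + block_size])
    return drives


def unstripe(drives):
    """Reassemble the original byte stream from the per-drive block lists."""
    out = []
    for i in range(max(len(d) for d in drives)):
        for d in drives:
            if i < len(d):
                out.append(d[i])
    return b"".join(out)


def stripe_rate(drive_rates):
    """Aggregate rate when every drive is paced by the slowest one."""
    return len(drive_rates) * min(drive_rates)


payload = bytes(range(10))
lanes = stripe(payload, drive_count=3, block_size=2)
print(unstripe(lanes) == payload)

# Illustrative rates only: one slower drive holds back the whole stripe.
print(stripe_rate([300, 300, 160]), "vs", stripe_rate([300, 300, 300]))
```

The last line shows why drives of the same generation are preferable: with one slower drive in a three-drive stripe, the aggregate rate is three times the slowest drive's rate rather than the sum of all three.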

Azure Archive Blob Tier Price List

Tape’s advantages for building the archive superhighway

To summarise, if we assume that access to all of the data, all of the time, is the fundamental requirement, it's likely that a private cloud will be more convenient than tape or the public cloud. But such convenience comes at a price, since the acquisition cost of an object storage system will be several times that of the equivalent capacity stored on tape, for the server utilisation and redundancy reasons mentioned above.

And such expensive convenience may not be a prerequisite for the application anyway. For very large active archives, it may make most sense to use tape and private cloud in combination: an object storage solution at the front end of the system, underpinned by vast quantities of cheap, offline LTO tape, with data moved between platforms for analysis or retrieval as business needs dictate, using supercharged innovations like RAIT.

Incidentally, this is the principle behind hierarchical archive management solutions like HPE's Data Management Framework. The automated functionality of HPE DMF allows efficient utilisation of storage infrastructure by removing stale data from defined data tiers, and provides a virtual storage space that appears to be unlimited in size. Important data is automatically retrieved as needed, making storage look "bigger on the inside"... but already I'm pushing the limit of your coffee break! HPE DMF is another interesting and exciting topic for another day (but if you want more information, you will find it here!)
