Tape as Primary Storage for Large Scientific Data Sets
Research output: Book/Report › Ph.D. thesis › Research
Standard
Tape as Primary Storage for Large Scientific Data Sets. / Jensen, Klaus Birkelund Abildgaard.
The Niels Bohr Institute, Faculty of Science, University of Copenhagen, 2017.
RIS
TY - BOOK
T1 - Tape as Primary Storage for Large Scientific Data Sets
AU - Jensen, Klaus Birkelund Abildgaard
PY - 2017
Y1 - 2017
N2 - This work investigates how magnetic tape technology can be used to provide efficient and reliable low-cost storage for large scientific data sets. The low cost is a direct implication of the fact that power consumption can be reduced to a constant dependent only on the size of the tape system and infrastructure and not the size of the data set. Thus, one conclusion from this work is that data from Big Science facilities can in fact be stored on tape and thus avoid an enormous energy consumption for data storage. My work is focused on the challenges that face scientists working with ever-increasing data volumes from large-scale scientific experiments and facilities. I discuss what challenges are preventing tape from becoming a primary tier in the high performance computing data center for such data. The work includes a literature study on tape technology in general, the data sets it can be made to support and a survey of state-of-the-art tape storage systems. I describe and motivate Tapr, a highly extendable parallel I/O gateway and tape library management system optimized for high-throughput data streams with special semantic support for scientific data. I discuss the motivation as well as the core data models, transfer protocols, features such as disaster recovery, inline simulation and possible support for retrieval latency prediction to strengthen the adoption of tape in a High Performance Computing environment as a primary storage tier. I then describe in detail a storage backend for Tapr, a binary extension to the Linear Tape File System (LTFS) that provides higher scalability and superior fault-tolerance. The format extension is based on embedding and is completely backward compatible with existing LTFS software. I provide an overview of the inner workings of the binary index and the recovery log that is integral to the format. Finally, I present a framework for generic redundant streaming based on redundant I/O behaviors and describe possible extensions to it and how it can be integrated into Tapr as well as how several I/O behaviors can be composed into more powerful behaviors.
UR - https://soeg.kb.dk/permalink/45KBDK_KGL/1pioq0f/alma99122355144405763
M3 - Ph.D. thesis
BT - Tape as Primary Storage for Large Scientific Data Sets
PB - The Niels Bohr Institute, Faculty of Science, University of Copenhagen
ER -
ID: 200495953