Tape as Primary Storage for Large Scientific Data Sets

Research output: Book/Report › Ph.D. thesis › Research

Standard

Tape as Primary Storage for Large Scientific Data Sets. / Jensen, Klaus Birkelund Abildgaard.

The Niels Bohr Institute, Faculty of Science, University of Copenhagen, 2017.

Harvard

Jensen, KBA 2017, Tape as Primary Storage for Large Scientific Data Sets. The Niels Bohr Institute, Faculty of Science, University of Copenhagen. <https://soeg.kb.dk/permalink/45KBDK_KGL/1pioq0f/alma99122355144405763>

APA

Jensen, K. B. A. (2017). Tape as Primary Storage for Large Scientific Data Sets. The Niels Bohr Institute, Faculty of Science, University of Copenhagen. https://soeg.kb.dk/permalink/45KBDK_KGL/1pioq0f/alma99122355144405763

Vancouver

Jensen KBA. Tape as Primary Storage for Large Scientific Data Sets. The Niels Bohr Institute, Faculty of Science, University of Copenhagen, 2017.

Author

Jensen, Klaus Birkelund Abildgaard. / Tape as Primary Storage for Large Scientific Data Sets. The Niels Bohr Institute, Faculty of Science, University of Copenhagen, 2017.

Bibtex

@phdthesis{24f8e7562dbc4603af7a31f6735cb0d3,
title = "Tape as Primary Storage for Large Scientific Data Sets",
abstract = "This work investigates how magnetic tape technology can be used to provide efficient and reliable low-cost storage for large scientific data sets. The low cost is a direct implication of the fact that power consumption can be reduced to a constant dependent only on the size of the tape system and infrastructure and not the size of the data set. Thus, one conclusion from this work is that data from Big Science facilities can in fact be stored on tape and thus avoid an enormous energy consumption for data storage. My work is focused on the challenges that face scientists working with ever increasing data volumes from large-scale scientific experiments and facilities. I discuss what challenges are preventing tape from becoming a primary tier in the high performance computing data center for such data. The work includes a literature study on tape technology in general, the data sets it can be made to support and a survey of state-of-the-art tape storage systems. I describe and motivate Tapr, a highly extendable parallel I/O gateway and tape library management system optimized for high throughput data streams with special semantic support for scientific data. I discuss the motivation as well as the core data models, transfer protocols, features such as disaster recovery, inline simulation and possible support for retrieval latency prediction to strengthen the adoption of tape in a High Performance Computing environment as a primary storage tier. I then describe a storage backend for Tapr, a binary extension to the Linear Tape File System (LTFS) that provides higher scalability and superior fault-tolerance, in detail. The format extension is based on embedding and is completely backward compatible with existing LTFS software. I provide an overview of the inner workings of the binary index and the recovery log that is integral to the format. Finally, I present a framework for generic redundant streaming based on redundant I/O behaviors and describe possible extensions to it and how it can be integrated into Tapr as well as how several I/O behaviors can be composed into more powerful behaviors.",
author = "Jensen, {Klaus Birkelund Abildgaard}",
year = "2017",
language = "English",
publisher = "The Niels Bohr Institute, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Tape as Primary Storage for Large Scientific Data Sets

AU - Jensen, Klaus Birkelund Abildgaard

PY - 2017

Y1 - 2017

N2 - This work investigates how magnetic tape technology can be used to provide efficient and reliable low-cost storage for large scientific data sets. The low cost is a direct implication of the fact that power consumption can be reduced to a constant dependent only on the size of the tape system and infrastructure and not the size of the data set. Thus, one conclusion from this work is that data from Big Science facilities can in fact be stored on tape and thus avoid an enormous energy consumption for data storage. My work is focused on the challenges that face scientists working with ever increasing data volumes from large-scale scientific experiments and facilities. I discuss what challenges are preventing tape from becoming a primary tier in the high performance computing data center for such data. The work includes a literature study on tape technology in general, the data sets it can be made to support and a survey of state-of-the-art tape storage systems. I describe and motivate Tapr, a highly extendable parallel I/O gateway and tape library management system optimized for high throughput data streams with special semantic support for scientific data. I discuss the motivation as well as the core data models, transfer protocols, features such as disaster recovery, inline simulation and possible support for retrieval latency prediction to strengthen the adoption of tape in a High Performance Computing environment as a primary storage tier. I then describe a storage backend for Tapr, a binary extension to the Linear Tape File System (LTFS) that provides higher scalability and superior fault-tolerance, in detail. The format extension is based on embedding and is completely backward compatible with existing LTFS software. I provide an overview of the inner workings of the binary index and the recovery log that is integral to the format. Finally, I present a framework for generic redundant streaming based on redundant I/O behaviors and describe possible extensions to it and how it can be integrated into Tapr as well as how several I/O behaviors can be composed into more powerful behaviors.

AB - This work investigates how magnetic tape technology can be used to provide efficient and reliable low-cost storage for large scientific data sets. The low cost is a direct implication of the fact that power consumption can be reduced to a constant dependent only on the size of the tape system and infrastructure and not the size of the data set. Thus, one conclusion from this work is that data from Big Science facilities can in fact be stored on tape and thus avoid an enormous energy consumption for data storage. My work is focused on the challenges that face scientists working with ever increasing data volumes from large-scale scientific experiments and facilities. I discuss what challenges are preventing tape from becoming a primary tier in the high performance computing data center for such data. The work includes a literature study on tape technology in general, the data sets it can be made to support and a survey of state-of-the-art tape storage systems. I describe and motivate Tapr, a highly extendable parallel I/O gateway and tape library management system optimized for high throughput data streams with special semantic support for scientific data. I discuss the motivation as well as the core data models, transfer protocols, features such as disaster recovery, inline simulation and possible support for retrieval latency prediction to strengthen the adoption of tape in a High Performance Computing environment as a primary storage tier. I then describe a storage backend for Tapr, a binary extension to the Linear Tape File System (LTFS) that provides higher scalability and superior fault-tolerance, in detail. The format extension is based on embedding and is completely backward compatible with existing LTFS software. I provide an overview of the inner workings of the binary index and the recovery log that is integral to the format. Finally, I present a framework for generic redundant streaming based on redundant I/O behaviors and describe possible extensions to it and how it can be integrated into Tapr as well as how several I/O behaviors can be composed into more powerful behaviors.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/1pioq0f/alma99122355144405763

M3 - Ph.D. thesis

BT - Tape as Primary Storage for Large Scientific Data Sets

PB - The Niels Bohr Institute, Faculty of Science, University of Copenhagen

ER -

ID: 200495953