A thesis submitted October 2015 for the degree of Doctor of Philosophy and defended February 5, 2016.
The PhD School of Science
Faculty of Science
Niels Bohr Institute, eScience, University of Copenhagen
Parallel and Scalable Sparse Basic Linear Algebra Subprograms
Sparse basic linear algebra subprograms (BLAS) are fundamental building blocks for numerous scientific computations and graph applications. Compared with Dense BLAS, parallelization of Sparse BLAS routines entails extra challenges due to the irregularity of sparse data structures. This thesis proposes new fundamental algorithms and data structures that accelerate Sparse BLAS routines on modern massively parallel processors: (1) a new heap data structure named ad-heap, for faster heap operations on heterogeneous processors, (2) a new sparse matrix representation named CSR5, for faster sparse matrix-vector multiplication (SpMV) on homogeneous processors such as CPUs, GPUs and Xeon Phi, (3) a new CSR-based SpMV algorithm for a variety of tightly coupled CPU-GPU heterogeneous processors, and (4) a new framework and associated algorithms for sparse matrix-matrix multiplication (SpGEMM) on GPUs and heterogeneous processors.
The thesis compares the proposed methods with state-of-the-art approaches on six homogeneous and five heterogeneous processors from Intel, AMD and nVidia. Using in total 38 sparse matrices as a benchmark suite, the experimental results show that the proposed methods obtain significant performance improvement over the best existing algorithms.