Automatic Parallelization of Scientific Application
Research output: Book/Report › Ph.D. thesis › Research
Standard
Automatic Parallelization of Scientific Application. / Blum, Troels.
The Niels Bohr Institute, Faculty of Science, University of Copenhagen, 2015. 130 p.
RIS
TY - BOOK
T1 - Automatic Parallelization of Scientific Application
AU - Blum, Troels
PY - 2015
Y1 - 2015
N2 - In my PhD work I show that it is possible to run unmodified Python/NumPy code on modern GPUs. This is done by using the Bohrium runtime system to translate the NumPy array operations into an array-based bytecode sequence. Executing these bytecodes on two GPUs from different vendors shows great performance gains. Scientists working with computer simulations should be allowed to focus on their field of research and not spend excessive amounts of time learning exotic programming models and languages. With Bohrium we have achieved very promising results by starting out with a relatively simple approach. This has led to more specialized methods, as I have shown in the work on both specialized and parameterized kernels. Both have their benefits and recognizable use cases. We achieved clear performance benefits without any significant negative impact on overall application performance. Even in the cases where we were not able to gain any performance boost from specialization, the added cost of kernel generation and extra bookkeeping is minimal. Many of the lessons learned while developing and optimizing the Bohrium GPU vector engine have proven valuable in a broader perspective, which has made it possible to generalize the developments so that they benefit the complete Bohrium project.
UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122859001105763
M3 - Ph.D. thesis
BT - Automatic Parallelization of Scientific Application
PB - The Niels Bohr Institute, Faculty of Science, University of Copenhagen
ER -
ID: 153607512