PhD defence by Morten Nørgaard Larsen – Niels Bohr Institute - University of Copenhagen


PhD defence by Morten Nørgaard Larsen

Parallel Libraries to support High-Level Programming


The development of computer architectures over the last ten years has forced programmers to move from writing sequential programs towards writing parallel ones. The homogeneous multi-core architectures from the major CPU producers, Intel and AMD, have led this trend, but the introduction of the more exotic, though short-lived, heterogeneous CELL Broadband Engine (CELL-BE) architecture also contributed to the shift. Furthermore, the use of cluster computers built from commodity hardware, as well as specialized supercomputers, has greatly increased in both industry and academia. Finally, the growing use of graphics cards for general-purpose programming (GPGPU) means that programmers today must be able to write parallel programs that utilize not just a small number of computational cores but perhaps hundreds or even thousands. However, most programmers will agree that doing so is not a simple task, and for many non-computer scientists, such as chemists and physicists writing programs to simulate their experiments, the task can easily become overwhelming.

Over recent decades, a great deal of research effort has gone into tools that simplify writing parallel programs by raising the abstraction level, so that programmers can focus on implementing their algorithms rather than on the details of the underlying hardware. The outcomes range from automatic parallelization of sequential programs to presenting a cluster of machines as if it were a single machine. In between lie a number of tools that help programmers handle communication, share data, run loops in parallel, implement algorithms for mining huge amounts of data, and so on. Even though most of them perform well, almost all require programmers to learn a new programming language, or at least force them to adopt new methods and ways of writing code.

In the first part, this thesis focuses on simplifying the task of writing parallel programs, especially for the large group of non-computer scientists. I begin by presenting an extension, based on Communicating Sequential Processes (CSP), to a distributed shared memory (DSM) system for the CELL-BE. The extension consists of a channel model and a thread library for the CELL-BE's specialized computational units, enabling them to run multiple CSP processes. Overall, the CSP model requires the programmer to think somewhat differently, but in return the implemented algorithms perform very well, as the initial tests presented show.
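The core CSP idea above, processes that share nothing and communicate only over channels, can be illustrated with a small sketch. This is not the thesis's CELL-BE library; it is a minimal stand-in using Python threads as processes and a bounded queue as a channel, with all names illustrative.

```python
import threading
import queue

def producer(out_ch, n):
    # A CSP-style process: no shared state, communicates only via its channel.
    for i in range(n):
        out_ch.put(i * i)
    out_ch.put(None)  # sentinel signalling end of stream

def consumer(in_ch, results):
    # Receives until the sentinel arrives.
    while True:
        item = in_ch.get()
        if item is None:
            break
        results.append(item)

channel = queue.Queue(maxsize=1)  # bounded channel, forces rendezvous-like handoff
results = []
threads = [
    threading.Thread(target=producer, args=(channel, 5)),
    threading.Thread(target=consumer, args=(channel, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 1, 4, 9, 16]
```

Because each process owns its data and synchronization happens only at channel operations, there are no data races to reason about, which is what makes the model attractive on hardware like the CELL-BE's isolated computational units.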

In the second part of the thesis, I switch focus from the CELL-BE architecture to the more traditional x86 architecture and the Microsoft .NET framework. The .NET framework is not the first thing that comes to mind for scientific applications, but in the last couple of versions Microsoft has introduced a number of tools for writing parallel and high-performance code. The first section examines how programmers can run parts of a program, such as a loop, in parallel without programming the underlying hardware directly. The presented tool runs the loop body in parallel, including keeping consistent any shared data the programmer accesses within it. Doing so involves implementing a DSM system, together with the MESI consistency protocol, on top of the .NET framework. However, during implementation and testing it became clear that the lack of information about which shared data a method accesses greatly limits overall performance, and the overhead of building a DSM system and a consistency model on top of .NET proved too large.

The work is therefore repeated with another approach, which requires programmers to declare what data a method will access when executed. Inspired by CSP, I define a set of rules dictating how programmers should write a method, covering input parameters, output values, and accesses to shared data. These rules expose enough information to build a new tool that needs neither a DSM system nor a consistency model. Programmers can still simply invoke methods, and the tool transparently runs them in parallel on a platform consisting of workstations, servers, and cloud instances. Overall, this increases the effort required of the programmer but greatly improves performance, as the initial tests show.
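To make the DSM approach concrete, the MESI protocol it builds on tracks each cached block as Modified, Exclusive, Shared, or Invalid, and moves between these states on local and remote accesses. The following is a simplified transition table, not the thesis's .NET implementation, with event names chosen for illustration (a real protocol also distinguishes whether other sharers exist, e.g. a read miss may load into E rather than S).

```python
# Simplified MESI state transitions for one cache block.
# Events: local_read / local_write (this node), remote_read / remote_write (another node).
MESI_TRANSITIONS = {
    ("M", "local_read"):   "M",
    ("M", "local_write"):  "M",
    ("M", "remote_read"):  "S",  # write back dirty data, then share
    ("M", "remote_write"): "I",
    ("E", "local_read"):   "E",
    ("E", "local_write"):  "M",  # silent upgrade, no bus traffic needed
    ("E", "remote_read"):  "S",
    ("E", "remote_write"): "I",
    ("S", "local_read"):   "S",
    ("S", "local_write"):  "M",  # invalidate all other copies first
    ("S", "remote_write"): "I",
    ("I", "local_read"):   "S",  # read miss: fetch block (simplified; E if no sharers)
    ("I", "local_write"):  "M",  # write miss: fetch block with exclusive ownership
}

def next_state(state, event):
    # Events not listed leave the state unchanged (e.g. remote_read while I).
    return MESI_TRANSITIONS.get((state, event), state)

print(next_state("E", "local_write"))  # M
```

Every one of these transitions that crosses nodes costs a network round trip in a DSM setting, which is one reason the overhead of layering this on top of .NET proved too large.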
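The second approach, methods that explicitly declare their inputs and outputs and touch no undeclared shared state, can be sketched as follows. This is an illustrative Python analogue, not the thesis's .NET tool: because `simulate_step` receives everything it needs as parameters and returns its result, a runtime can dispatch calls to any worker without a DSM layer or consistency protocol.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_step(bounds):
    # Hypothetical worker method obeying the declared-access rules:
    # all input arrives via parameters, all output via the return value,
    # and no shared state is read or written.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

# The caller still just "invokes a method"; the runtime fans the calls
# out across workers (here a thread pool standing in for workstations,
# servers, and cloud instances) and gathers the results.
chunks = [(0, 250), (250, 500), (500, 750), (750, 1000)]
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(simulate_step, chunks))
total = sum(partials)
print(total)  # equals sum(i*i for i in range(1000))
```

The extra effort lies in structuring methods this way up front; the payoff is that the scheduler knows exactly what each call reads and writes, so no consistency machinery is needed at runtime.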