Andreas Haahr Larsen
A thesis for the degree of Doctor of Philosophy defended December 2018.
The PhD School of Science, Faculty of Science, X-ray and Neutron Science, Niels Bohr Institute, University of Copenhagen
Prof. Lise Arleth
Analytical tools for structure determination of protein complexes with small-angle scattering
Proteins perform a wide range of vital physiological tasks in a complex interplay with other biological components, such as signaling molecules, nucleotides and lipids. To better understand the role of the proteins, their structure must be surveyed, as their function and structure are strongly coupled. Advanced experimental techniques are needed to probe such biological nanostructures, and a key to better understanding therefore lies in the development of these techniques. Small-angle scattering (SAS) is one such technique and is successful at determining the low-resolution structure of proteins and protein complexes in solution. The current thesis deals with some of the recent challenges in biological SAS. One challenge is the investigation of membrane proteins. In vitro studies of membrane proteins require a system for solubilizing the proteins, and detergents are the most common choice. The scattering contribution from the detergents can be suppressed by contrast variation in small-angle neutron scattering (SANS) using specially synthesized "invisible" detergents, as developed in our research group. These have zero scattering contribution over the full q-range when measured in a D2O-based buffer. I have developed tools for fully exploiting this method. One challenge was to correctly include a layer of densely packed water around the proteins without adding water in the region occupied by the detergents. This and many other features are implemented in the program CaPP, developed during my PhD. CaPP also calculates the theoretical pair distance distribution function, p(r), as well as the scattering from protein structures in the Protein Data Bank (PDB) format. I show that the calculations in CaPP are rapid and accurate. Another issue we had to deal with when using the "invisible" detergents was protein aggregation, which may hinder correct structure determination from the data.
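As a rough illustration of what a p(r)-based scattering calculation involves, the sketch below treats each atom as a point scatterer with a scattering length b_i, histograms all pairwise distances r_ij weighted by b_i b_j to obtain p(r), and then evaluates the Debye-type sum I(q) = sum_i b_i^2 + 2 sum_k p(r_k) sin(q r_k)/(q r_k) over the histogram bins. This is a minimal sketch under those assumptions, not CaPP's implementation: the hydration layer, excluded-volume correction and atomic form factors are omitted, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pair_distance_distribution(coords, weights, bin_width=1.0):
    """Histogram of all pairwise distances r_ij (i < j), each pair weighted by b_i * b_j.

    coords: list of (x, y, z) positions, e.g. in Angstrom (point-scatterer assumption).
    weights: list of scattering lengths b_i (hypothetical input, no form factors).
    Returns bin centres r and the weighted histogram p.
    """
    pairs = [(math.dist(coords[i], coords[j]), weights[i] * weights[j])
             for i, j in combinations(range(len(coords)), 2)]
    r_max = max((d for d, _ in pairs), default=0.0)
    n_bins = int(r_max // bin_width) + 1
    p = [0.0] * n_bins
    for d, w in pairs:
        p[int(d // bin_width)] += w
    r = [(k + 0.5) * bin_width for k in range(n_bins)]  # bin centres
    return r, p


def intensity_from_pr(q_values, r, p, weights):
    """I(q) = sum_i b_i^2 + 2 * sum_k p(r_k) * sin(q r_k) / (q r_k)."""
    self_term = sum(b * b for b in weights)  # i == j terms of the Debye sum
    intensities = []
    for q in q_values:
        total = 0.0
        for r_k, p_k in zip(r, p):
            x = q * r_k
            total += p_k * (math.sin(x) / x if x != 0.0 else 1.0)
        intensities.append(self_term + 2.0 * total)
    return intensities


# Two point scatterers 5 units apart with b = 1 and b = 2.
r, p = pair_distance_distribution([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)], [1.0, 2.0])
print(intensity_from_pr([0.0, 0.1], r, p, [1.0, 2.0]))
```

Binning replaces each r_ij by its bin centre, so the intensity is approximate at high q; at q = 0 it reduces exactly to (sum_i b_i)^2, which is a convenient sanity check.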
To take protein aggregation into account, we applied and refined a method based on analytical structure factors. It is also essential to be able to assess whether one hypothesized model describes the data significantly better than another. The F-test was applied and proved useful in that context.
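A minimal sketch of an F-test for comparing two nested models, assuming each fit returns a chi-square value, the simpler model has k_simple free parameters, the more complex one has k_complex, and the data set has N points. It uses the common form F = [(chi2_simple - chi2_complex)/(k_complex - k_simple)] / [chi2_complex/(N - k_complex)]; the exact variant used in the thesis may differ. The p-value is the upper tail of the F distribution, evaluated through the regularized incomplete beta function (a continued-fraction evaluation in the style of Numerical Recipes) so that only the standard library is needed. All function names are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges fast.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_test(chi2_simple, k_simple, chi2_complex, k_complex, n_points):
    """F statistic and p-value for a nested-model comparison (hypothetical helper)."""
    d1 = k_complex - k_simple   # extra parameters in the complex model
    d2 = n_points - k_complex   # residual degrees of freedom of the complex model
    f_stat = ((chi2_simple - chi2_complex) / d1) / (chi2_complex / d2)
    # Upper tail P(F > f) = I_{d2/(d2 + d1 f)}(d2/2, d1/2); F < 0 is treated as p = 1.
    x = d2 / (d2 + d1 * max(f_stat, 0.0))
    p_value = regularized_incomplete_beta(d2 / 2.0, d1 / 2.0, x)
    return f_stat, p_value


print(f_test(chi2_simple=24.0, k_simple=1, chi2_complex=6.0, k_complex=3, n_points=5))
```

A small p-value (for example below 0.05) indicates that the extra parameters improve the fit more than would be expected by chance. Where SciPy is available, `scipy.stats.f.sf(F, d1, d2)` gives the same tail probability.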
Armed with these new tools and the "invisible" detergents, we studied three different membrane protein complexes: the AMPA-type glutamate receptor 2 (GluA2), the sarco/endoplasmic reticulum calcium ATPase (SERCA), and the holo-translocon (HTL). Both GluA2 and SERCA are key players in neurological diseases, which are still poorly understood. GluA2 was investigated in solution in different ligand-induced conformational states. Some of the investigated states had previously been solved at high resolution, and we verified that these compact forms were also the solution structures. Moreover, we discovered a more open form, resembling a previously determined electron microscopy structure. SERCA was investigated in a state with unknown structure. Our SANS data provided experimental evidence that SERCA was in an equilibrium between two known forms. For HTL, we established that the protein complex contained a lipid core. We also provided evidence for flexibility in the SecDF domain of HTL. A fourth protein system, α-synuclein (αSN), a key player in neurodegenerative diseases, was also studied. Under the right conditions, αSN forms fibrils, and we used SANS to study the dynamics of a hypothesized exchange between αSN monomers in solution and monomers in the fibrils. The SANS data moreover confirmed the existence of a layer of densely packed water around the fibrils, but also showed that this layer was neither denser nor more extended than the water layer formed around other proteins.
Finally, we developed a statistical tool that uses Bayesian statistics to include prior information about the investigated system in analytical modelling. The method was not yet mature enough to be applied to any of the scientific cases, but we showed that it is very promising. For example, the method automatically determines the most probable value of the regularization parameter that weighs the prior knowledge against the new SAS data. The Bayesian method also provides a good measure of the information content in the data.
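The abstract does not spell out the Bayesian method itself, so the sketch below swaps in a textbook example of the same idea: evidence maximization for a toy linear model with a single parameter theta, data y_i = theta * g_i plus Gaussian noise of standard deviation sigma_i, and a Gaussian prior theta ~ N(theta_0, 1/alpha). Here alpha plays the role of the regularization parameter weighing the prior against the data. Integrating out theta gives the log evidence in closed form, log Z(alpha) = 1/2 log[alpha/(alpha + beta)] - 1/2 (C + alpha theta_0^2) + (b + alpha theta_0)^2 / [2 (alpha + beta)] + const, with beta = sum g_i^2/sigma_i^2, b = sum g_i y_i/sigma_i^2 and C = sum y_i^2/sigma_i^2. The toy model, the function names and the search grid are assumptions made for illustration only.

```python
import math


def log_evidence(alpha, y, g, sigma, theta0):
    """Log marginal likelihood of the toy model, up to an alpha-independent constant."""
    beta = sum((gi / si) ** 2 for gi, si in zip(g, sigma))            # data precision on theta
    b = sum(gi * yi / si ** 2 for gi, yi, si in zip(g, y, sigma))
    c = sum((yi / si) ** 2 for yi, si in zip(y, sigma))
    precision = alpha + beta                                          # posterior precision
    linear = b + alpha * theta0
    return (0.5 * math.log(alpha / precision)
            - 0.5 * (c + alpha * theta0 ** 2)
            + linear ** 2 / (2.0 * precision))


def most_probable_alpha(y, g, sigma, theta0, log10_min=-4.0, log10_max=4.0, n=801):
    """Grid search over a log-spaced range of alpha for the maximum of the evidence."""
    step = (log10_max - log10_min) / (n - 1)
    grid = [10 ** (log10_min + k * step) for k in range(n)]
    return max(grid, key=lambda a: log_evidence(a, y, g, sigma, theta0))


# Four measurements of 2.0 with unit errors; the prior mean 0.0 disagrees with them.
print(most_probable_alpha([2.0] * 4, [1.0] * 4, [1.0] * 4, theta0=0.0))
```

For this toy model the optimum can also be written analytically as alpha = beta / (beta * Delta^2 - 1), with Delta = b/beta - theta_0, whenever beta * Delta^2 > 1; otherwise the evidence keeps rising with alpha and the grid search returns its upper limit, meaning the data give no reason to relax the prior.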
In conclusion, the thesis expands the borders of what can be "seen" with SAS by the development of new analytical and statistical tools as exemplified with four challenging scientific cases of biologically relevant protein complexes.