Metamorphic Protein Structure Prediction

Author(s): Alex Felleson, Melanie Gipson

Mentor(s): Amarda Shehu, Computer Science

Abstract

Predicting a protein’s structure from its amino acid sequence alone has been a priority in the field of computational biology. But what about proteins that have more than one distinct structure? We analyzed the qualities of a few of these metamorphic proteins and created a list for future researchers. We also analyzed the performance and bias of one of the top existing structure prediction algorithms, trRosetta, to determine how existing algorithms deal with metamorphic proteins. We hope our work will help with the future development of algorithms that take metamorphic proteins into account.

Video Transcript

Hi, I’m Melanie. And I’m Alex. And we’re part of Dr. Shehu’s lab here at George Mason studying metamorphic protein structure prediction. Proteins perform thousands of functions, from measuring out your circadian rhythm to helping a virus fuse with a host cell. To understand proteins (how they function, what can go wrong, and how to bioengineer new ones), we have to know what they look like. But it’s expensive, time-consuming, and sometimes impossible to determine protein structures experimentally. That’s where computational biology comes in. For the past 50 years, researchers have been working on algorithms that attempt to predict a protein’s structure given only the sequence of building blocks that make it up. Recently, this problem was considered solved. It’s great that we can finally reliably determine the structure of a protein computationally. But the issue is that the basic assumption of these algorithms, that a given sequence always folds into one and only one structure, does not hold for all proteins.
Metamorphic proteins can take on more than one shape, each with a different function, and interconvert back and forth between those shapes. Selecase is an enzyme that switches between active and inactive forms depending on its concentration, which is a rare method of self-regulation. Here is the predicted model from one of the top structure prediction algorithms, trRosetta. Because it’s designed to find only one best prediction per protein, using this algorithm to determine the structure of this enzyme would mean we’d remain unaware of the inactive structure and miss out on understanding how this enzyme regulates its activity. If we want to fully understand metamorphic proteins, we need to know all their possible structures. That is, if we discover a new protein (like the spike protein of the virus that causes Covid-19) and want to see its structure, it’d be nice if we had an algorithm that could tell us if the protein is likely metamorphic and, if so, predict all of its structures.
The first thing we did was assemble a list of known metamorphic proteins to share with the research community. This will be needed for training and testing new algorithms. We also analyzed how similar or different each protein’s multiple structures are from each other, which will determine how hard they are to predict. Here’s an example of a relatively easy metamorphic protein to deal with, Fragaceatoxin C. It has a water-soluble form, and when it comes into contact with a lipid membrane, the helix marked in yellow extends to form a pore that causes cell damage. Since the change is localized and relatively small and simple, it should be fairly easy for an algorithm to come up with both structures.
Second, we ran an analysis on the trRosetta prediction algorithm. Since it only attempts to predict one structure per protein, we looked at which structure it outputs when given the sequence of a metamorphic protein, and whether it has a bias towards a certain type of structure. Here’s trRosetta’s prediction for the pore-forming protein. As you can see, it best matches the structure where the helix is retracted. The same goes for the Selecase example we showed earlier: trRosetta’s prediction is closer to the more compact structure. The best way to summarize a protein structure’s shape is a measure of compactness called the radius of gyration. Here we have plotted the radius of gyration for each of the known structures of a protein with black dots connected by a dotted line, and the radii of trRosetta’s predictions are shown by triangles. trRosetta actually gives five predictions, and the one reported with the highest confidence is plotted in red. As you can see from this plot, the previous examples are representative of trRosetta’s performance: it has a bias towards more compact structures, often coming up with a prediction more compact than either of the actual known structures.
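For readers who want to try this kind of comparison themselves, the radius of gyration can be computed directly from atomic coordinates as the root-mean-square distance of the atoms from their center of mass. The following is only a minimal sketch under stated assumptions, not the code used in this project: the function names `radius_of_gyration` and `compactness_bias` are hypothetical, the coordinates are made-up toy values rather than real protein data, and every atom is given equal mass unless masses are supplied.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Root-mean-square distance of atoms from their center of mass.

    coords: list of (x, y, z) tuples, e.g. alpha-carbon positions.
    masses: optional list of weights; equal masses are assumed if omitted.
    """
    if not coords:
        raise ValueError("coords must contain at least one atom")
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)

    # Mass-weighted center of the structure.
    cx = sum(m * x for m, (x, _, _) in zip(masses, coords)) / total
    cy = sum(m * y for m, (_, y, _) in zip(masses, coords)) / total
    cz = sum(m * z for m, (_, _, z) in zip(masses, coords)) / total

    # Mass-weighted mean squared distance from that center.
    mean_sq = sum(
        m * ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
        for m, (x, y, z) in zip(masses, coords)
    ) / total
    return math.sqrt(mean_sq)


def compactness_bias(predicted_rg, known_rgs):
    """Place a predicted radius relative to the radii of the known structures."""
    if predicted_rg < min(known_rgs):
        return "more compact than all known structures"
    if predicted_rg > max(known_rgs):
        return "more extended than all known structures"
    return "within the range of known structures"


if __name__ == "__main__":
    # Toy coordinates (hypothetical): one stretched-out and one folded arrangement.
    extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
    compact = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
    prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

    known = [radius_of_gyration(extended), radius_of_gyration(compact)]
    print(compactness_bias(radius_of_gyration(prediction), known))
```

A smaller radius means a more compact structure, so a prediction whose radius falls below that of every known structure corresponds to the bias described above. With real data, the coordinates would come from the experimentally determined structure files and from the predicted models.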
If trRosetta or a similar algorithm is to be adapted to predict more than one structure for these proteins, this bias will need to be considered and corrected for. This is also important data because we need to show what the existing algorithms can and cannot do in order to demonstrate a need for new ones. Hopefully our analyses will help with the development of algorithms that take metamorphic proteins into account and expand the capacities of computational biology. Thanks for watching!
