An evolutionary approach outperforms a design approach in modeling protein sequence variation.

Over generations, evolution shapes proteins, leading to variation in their amino acid sequences both between and within species. Despite our ever-increasing knowledge of the physical constraints that guide protein structure, advanced modeling techniques don’t capture the site-specific variability observed in natural proteins. Bafflingly, complex models that account for physical influences on the positions of all atoms in a protein often perform worse than elementary models at recapitulating natural proteins’ variability.

In GENETICS, Jiang, Teufel, et al. provide evidence for a possible explanation: advanced modeling techniques don’t take into account the order of the steps by which protein sequences change. A popular protein-design suite called RosettaDesign, for example, deletes the amino acid side chains from a template structure, leaving only the peptide backbone, and then replaces them with new side chains all at once. After this dramatic step, additional changes are made to maximize the protein’s calculated stability.

Evolution works very differently. Sequence changes are usually made one amino acid residue at a time, meaning the effect of each alteration depends on how it fits with the existing sequence. Whether a sequence change will be fixed or lost depends on how it affects fitness, which is partly influenced by how it impacts protein stability—variations that make proteins prone to unfolding are typically not favorable.
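To make the one-residue-at-a-time idea concrete, here is a minimal sketch in Python. It is not the algorithm from the paper. The function `toy_stability` is a hypothetical stand-in for a real stability calculation, and the Metropolis-style acceptance rule, along with the `beta` and `seed` parameters, is a simplifying assumption rather than the fixation model the authors used. The score depends on neighboring residues, so the effect of a given substitution depends on the sequence it lands in.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # rough grouping, for illustration only


def toy_stability(seq):
    """Hypothetical stability proxy: count adjacent residue pairs that are
    both hydrophobic or both non-hydrophobic. Not a physical model."""
    return sum((a in HYDROPHOBIC) == (b in HYDROPHOBIC)
               for a, b in zip(seq, seq[1:]))


def evolve(start, steps, beta=2.0, seed=0):
    """Propose one substitution per step; keep it or revert it."""
    rng = random.Random(seed)
    seq = list(start)
    score = toy_stability(seq)
    for _ in range(steps):
        site = rng.randrange(len(seq))
        old_res = seq[site]
        seq[site] = rng.choice(AMINO_ACIDS)  # single-residue change
        new_score = toy_stability(seq)
        delta = new_score - score
        # Assumed rule: neutral or stabilizing changes are always kept;
        # destabilizing ones are kept with probability exp(beta * delta).
        if delta >= 0 or rng.random() < math.exp(beta * delta):
            score = new_score
        else:
            seq[site] = old_res  # the change is lost
    return "".join(seq), score
```

Calling `evolve("ACDEFGHIKL", steps=200)` returns the final sequence and its toy score. Because every proposal is judged against the current sequence, the path taken matters, which is the feature a replace-everything-at-once design step lacks.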

The group built a new algorithm that works more like evolution and tested it on the same natural proteins used for RosettaDesign. Its output differed from RosettaDesign’s in several ways. In almost every case, the evolved sequences resembled natural ones more closely than the designed sequences did. This might not, at first, seem surprising: the designed proteins started from a completely stripped peptide backbone and thus shouldn’t have been influenced as much by the natural starting sequences. But the researchers found that this wasn’t the reason. Even when they used a designed sequence as the starting template, the evolution-based simulation created sequences that better mimicked natural ones.

In protein design, the ability to build proteins with sequences unlike natural ones could be interpreted as a positive thing since it’s conceivable that these proteins would have a wider range of properties than those of proteins found in the wild. But despite the fact that the designed sequences had diverged more from the starting sequences, their site-specific variability was lower than that of the evolved sequences. This implies that, even though a greater number of sites were altered in the designed sequences, the changes were restricted to a smaller set of amino acid residues.
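The difference between divergence and site-specific variability can be shown with a small, made-up example. The sketch below assumes that variability is measured as the Shannon entropy of each alignment column, which is one common choice and not necessarily the exact measure used in the paper. The sequences are invented for illustration and are not data from the study.

```python
import math
from collections import Counter


def mean_divergence(seqs, start):
    """Average fraction of sites at which each sequence differs from start."""
    per_seq = [sum(a != b for a, b in zip(s, start)) / len(start) for s in seqs]
    return sum(per_seq) / len(per_seq)


def site_entropy(seqs):
    """Shannon entropy (in nats) of the residues observed at each site."""
    entropies = []
    for column in zip(*seqs):
        n = len(column)
        counts = Counter(column)
        entropies.append(-sum(c / n * math.log(c / n) for c in counts.values()))
    return entropies


# Invented toy alignments, for illustration only.
start = "AAAA"
designed = ["LLLL", "LLLL", "LLLL", "LLLL"]  # every site changed, same residue each time
evolved = ["AAGA", "ASAA", "TAAA", "AAAV"]   # few sites changed, different residues

designed_divergence = mean_divergence(designed, start)
evolved_divergence = mean_divergence(evolved, start)
designed_variability = site_entropy(designed)
evolved_variability = site_entropy(evolved)
```

In this toy case the designed set differs from the start at every site, yet every column holds a single residue, so its per-site entropy is zero. The evolved set changes only a quarter of its sites per sequence but has nonzero entropy at every position.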

RosettaDesign and similarly sophisticated software have facilitated major advances in protein design, such as developing new enzymes and previously unseen protein folds, and Jiang, Teufel, et al.’s findings don’t make these types of software obsolete. Different computational techniques fill different niches, and they evolve just as proteins do, with new variants continuously under development. By tweaking existing methods and studying the effects of new algorithms, we can improve how we use these techniques and perhaps develop new ones with even better fitness than their ancestors had.


Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation
Qian Jiang, Ashley I. Teufel, Eleisha L. Jackson, Claus O. Wilke
GENETICS 2018 208: 1387-1395



Nicole Haloupek is a freelance science writer and a recent graduate of UC Berkeley's molecular and cell biology PhD program.
