Biological Sciences

Water: the Key to Protein Folding


Milo Lin
Miller Institute for Basic Research in Science

University of California, Berkeley



Starting at the organism level and magnifying to the organ, cellular, and macromolecular scales, biology happens along the entire spectrum of sizes ranging from a meter to a billionth of a meter. Amazingly, life manages to be both complex and reproducible over this vast spectrum of lengths. The root of this complex robustness can be traced, at the smallest scale, to proteins, which are the cleaners, builders, motors, messengers and transporters of the cell. Proteins are molecular chains that acquire their specific functionality when folded into a unique three-dimensional shape, called the native fold.

Aside from the question of how proteins manage to accomplish their tasks at the molecular scale, there is an even deeper question: how do proteins find their native folds? If a protein were to try out all the possible ways to fold itself, it would take longer than the age of the universe to find the native fold. Yet, proteins typically fold within seconds, and many do so without the help of any cellular machinery. What is the physical basis for this remarkably fast self-organization?
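The scale of this puzzle can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the chain length of 100 units and the three shapes per unit are assumed round numbers rather than figures from this research, and the 10 nanoseconds per attempt is borrowed from the sampling time quoted later in this article.

```python
# Rough estimate of how long an exhaustive search over folds would take.
# All inputs are illustrative assumptions, not measured values.

STATES_PER_UNIT = 3        # assumed number of shapes each chain unit can adopt
CHAIN_LENGTH = 100         # assumed number of units in the protein chain
SECONDS_PER_TRIAL = 10e-9  # about 10 nanoseconds to try one new fold

# Age of the universe, roughly 13.8 billion years, converted to seconds.
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600


def exhaustive_search_seconds(chain_length,
                              states_per_unit=STATES_PER_UNIT,
                              seconds_per_trial=SECONDS_PER_TRIAL):
    """Time needed to try every combination of unit shapes once."""
    return (states_per_unit ** chain_length) * seconds_per_trial


search_time = exhaustive_search_seconds(CHAIN_LENGTH)
ratio = search_time / AGE_OF_UNIVERSE_S

print(f"Exhaustive search: {search_time:.2e} seconds")
print(f"That is {ratio:.2e} times the age of the universe")
```

Under these assumptions the exhaustive search outlasts the age of the universe by many orders of magnitude, and changing the assumed numbers within reason does not alter that conclusion.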


As a physicist, I am most interested in how general physical properties influence or constrain biological function and evolution. A good theory explains complex phenomena in terms of simple underlying laws, independently reproduces experiments and observations, and predicts the outcomes of future experiments. I use a combination of pen-and-paper theory and computer simulations of atomic motions to study the effects of physical forces on molecular behavior. The former gives very general quantitative insights into the relevant mechanisms, whereas the latter allows for computational “experiments” that visualize and uncover complex behavior at atomic resolution. The theory can therefore be compared with both simulation data and laboratory experimental data.


We have shown that water is the key to proteins’ ability to quickly find their native fold. Within each protein, there are regions of the protein chain that can form favorable interactions with water. These regions tend to remain on the exterior of the folded protein in order to make contact with water. Consequently, the other regions tend to bury themselves in the interior portion of the folded structure.

Proving an earlier conjecture, we showed mathematically that the number of possible ways a protein can fold itself while satisfying this constraint is small enough for a protein to try all of them within seconds, rather than needing longer than the age of the universe, as it would in the absence of water.

We derived a relation between the length of the protein chain and the number of distinct folds. By multiplying the number of distinct folds by the amount of time required for a random protein chain to find a new fold (known to be about 10 nanoseconds), we can also relate the protein length to the folding time, i.e. the expected time required to find the native fold. Because the folding time can be at most minutes (cellular function and maintenance depend on proteins folding quickly upon synthesis), this result also enabled the prediction of a universal “maximum length” of protein domains: proteins longer than this length limit cannot fold quickly enough, and so must be composed of modular subunits, called domains, which are themselves smaller than the length limit. This result is in agreement with experimental folding rates and explains the length distribution of protein domains. While evolution plays its part within this framework to select, for each protein, the specific shape of the native fold that can carry out the protein’s function, these findings imply that the length and time scales of protein folding are dictated primarily by universal physical constraints (i.e. contact with water).
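The multiplication described above can be sketched in a few lines. This is a minimal illustration that uses only the 10 nanosecond figure and the "at most minutes" limit stated in the text; it does not reproduce the derived relation between chain length and number of folds, and the function names and the one-minute budget are hypothetical choices made for the example.

```python
# Folding time = (number of distinct folds) x (time to find a new fold).
# Only the 10 ns figure comes from the text; names and the example
# budget below are illustrative assumptions.

SECONDS_PER_NEW_FOLD = 10e-9  # about 10 nanoseconds per new fold


def folding_time_seconds(num_distinct_folds):
    """Expected time to try every distinct fold once."""
    return num_distinct_folds * SECONDS_PER_NEW_FOLD


def max_folds_within(budget_seconds):
    """Largest number of distinct folds that fits in a time budget."""
    return budget_seconds / SECONDS_PER_NEW_FOLD


one_minute_limit = max_folds_within(60.0)  # assumed budget of one minute

print(f"One million folds take {folding_time_seconds(1_000_000):.3g} seconds")
print(f"A one-minute budget allows about {one_minute_limit:.3g} folds")
```

Read in reverse, a ceiling on folding time becomes a ceiling on the number of distinct folds, which is how a time limit can translate into a limit on chain length once the length-to-folds relation is known.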

I would like to add increasing levels of detail to this general framework, going beyond the size-dependence of protein folding. For example, there is empirical evidence that proteins with complicated fold topologies take longer to fold; can a more detailed theory of folding quantitatively explain how, in addition to length, the native fold topology influences folding rates? My current research also aims to elucidate the physical mechanism by which proteins perform their functions once they have folded.
