Introduction to Protein Folding

This is a basic description of what a protein is, what it becomes, and why molecular modelling matters, as I didn't really explain this in that earlier blog post.

Proteins make up much of a cell's structure and do much of its work. To build a protein, a cell uses the code stored in its DNA to link amino acids (AAs) together in a specific order; think of it as a list. How it gets this list (molecular genetics) isn't important for this post, only that we have one (each letter represents an amino acid):

1 MKMSRLCLSV ALLVLLGTLA ASTPGCDTSN QAKAQRPDFC LEPPYTGPCK ARMIRYFYNA
61 KAGLCQPFVY GGCRAKRNNF KSSEDCMRTC GGAIGPWENL

Pancreatic trypsin inhibitor precursor [Bos taurus], a very small protein (100 amino acids)
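To make the listing concrete, here is a minimal Python sketch that strips out the position numbers (1 and 61) and the spacing that groups residues in blocks of ten, leaving one continuous string. It then translates the first few one-letter codes into the standard three-letter abbreviations. The names `LISTING`, `THREE_LETTER` and `parse_listing` are illustrative choices, not part of any particular tool.

```python
from collections import Counter

# The numbered listing above, copied as text. The leading numbers are residue
# positions, and the spaces only group the residues in blocks of ten.
LISTING = """
1 MKMSRLCLSV ALLVLLGTLA ASTPGCDTSN QAKAQRPDFC LEPPYTGPCK ARMIRYFYNA
61 KAGLCQPFVY GGCRAKRNNF KSSEDCMRTC GGAIGPWENL
"""

# Standard one-letter amino acid codes mapped to their three-letter abbreviations.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def parse_listing(text: str) -> str:
    """Hypothetical helper: drop position numbers and whitespace, keep residue letters."""
    return "".join(ch for ch in text if ch.isalpha()).upper()

sequence = parse_listing(LISTING)
print(len(sequence))                                      # 100
print("-".join(THREE_LETTER[aa] for aa in sequence[:5]))  # Met-Lys-Met-Ser-Arg
print(Counter(sequence).most_common(3))                   # three most frequent residues
```

The result is simply a 100-character string; everything interesting about the protein comes from how that chain folds.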

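Each letter is the standard one-letter code for an amino acid (for example, G is glycine, abbreviated Gly). Once the amino acids are linked into a chain, they interact with one another and the chain folds into a final structure: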

The final protein (image source: http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index)

How the chain gets from the linear sequence to its tertiary (3D) structure is the process we are interested in, and it is studied using molecular modelling. This is where software needs to improve.

The process
The amino acids in the chain attract and repel one another, so the chain keeps rearranging itself, moving through a series of states (conformations) until the system reaches its lowest energy state. That final shape is the folded protein.
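As a loose illustration of "moving through a series of states until the system reaches its lowest energy state", the sketch below runs simulated annealing (Metropolis Monte Carlo with a slowly falling temperature) on a made-up one-dimensional energy function. It is a toy under stated assumptions: the `energy` function, step size and cooling schedule are all invented for illustration, whereas real folding simulations track thousands of atomic coordinates under a physical force field rather than a single number `x`.

```python
import math
import random

def energy(x: float) -> float:
    """Toy 1-D energy landscape with several local minima (hypothetical, not a force field)."""
    return 0.05 * (x - 3.0) ** 2 + math.sin(3.0 * x)

def anneal(start: float, steps: int = 20000, t_start: float = 2.0, t_end: float = 0.01,
           step_size: float = 0.3, seed: int = 0) -> tuple[float, float]:
    """Metropolis Monte Carlo with geometric cooling; returns the lowest-energy state seen."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    x, e = start, energy(start)
    best_x, best_e = x, e
    for i in range(steps):
        # Temperature falls geometrically from t_start to t_end over the run.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = anneal(start=-10.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Accepting some uphill moves while the temperature is high lets the search climb out of shallow local minima; as the temperature drops, it settles into a deep minimum instead.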

In a living cell, a protein reaches its end state after passing through an enormous number of states (a power of ten whose exponent depends on the length of the sequence), taking around 1-3 seconds. On current hardware, simulating the complete fold of a 150 AA sequence (quite small; some proteins are over 1000 AA) would take around 5 billion years, so software optimization and machine learning are needed, as well as better GPUs and wider use of them (Graphics Processing Units, as opposed to CPUs).
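The way the number of states depends on sequence length can be illustrated with the classic back-of-the-envelope calculation behind Levinthal's paradox. The sketch below assumes, purely for illustration, that each residue can take one of 3 conformations and that a search could test 10^13 conformations per second; both are assumptions, not measurements. It works in log10 so that the 1000-residue case does not overflow a float.

```python
import math

# Illustrative assumptions (not measured values):
STATES_PER_RESIDUE = 3          # conformations available to each residue
SAMPLES_PER_SECOND = 1e13       # conformations a search could visit per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def log10_conformations(n_residues: int, states: int = STATES_PER_RESIDUE) -> float:
    """log10 of states ** n_residues, the number of whole-chain conformations."""
    return n_residues * math.log10(states)

def log10_search_years(n_residues: int, states: int = STATES_PER_RESIDUE,
                       samples_per_second: float = SAMPLES_PER_SECOND) -> float:
    """log10 of the years needed to visit every conformation one at a time."""
    return (log10_conformations(n_residues, states)
            - math.log10(samples_per_second)
            - math.log10(SECONDS_PER_YEAR))

for n in (100, 150, 1000):
    print(f"{n:>5} residues: ~10^{log10_conformations(n):.0f} conformations, "
          f"~10^{log10_search_years(n):.0f} years to search exhaustively")
```

Under these assumptions a 150-residue chain has roughly 10^72 conformations, and visiting them one at a time would take roughly 10^51 years. Real proteins clearly do not search every state. Note that this exhaustive-search figure is a different quantity from the simulation-time estimate in the paragraph above.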

Current research simulates only very short stretches of a fold (around 10 picoseconds), which is still very useful, but simulating entire folds would be invaluable.
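Why only picoseconds? Molecular dynamics simulations advance in tiny fixed timesteps, typically 1-2 femtoseconds, and every step means recomputing the forces on every atom. The sketch below assumes a 2 fs timestep (a common choice, but an assumption here) and counts the steps needed for a 10 ps run versus a full fold lasting 1 second. The helper name `md_steps` is hypothetical.

```python
# Durations are expressed in femtoseconds (1 fs = 1e-15 s) so the arithmetic stays in integers.
FS_PER_PS = 1_000
FS_PER_S = 10 ** 15

def md_steps(duration_fs: int, timestep_fs: int = 2) -> int:
    """Number of integration steps needed to cover duration_fs at a fixed timestep."""
    return duration_fs // timestep_fs

short_run = md_steps(10 * FS_PER_PS)   # a 10 ps simulation
full_fold = md_steps(1 * FS_PER_S)     # a fold lasting 1 s
print(short_run)               # 5000
print(full_fold)               # 500000000000000
print(full_fold // short_run)  # 100000000000
```

A full fold needs about 10^11 times as many steps as a 10 ps run, which is why faster software and hardware matter so much here.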
