Protein Structure

The biological function of a protein is entirely determined by its three-dimensional (3D) structure. This makes knowledge of the structure of proteins fundamental for understanding and modifying their function.

The experimental techniques for determining the 3D structures of proteins (principally X-ray diffraction and NMR spectroscopy) are still somewhat limited, which makes experimental determination of structures very difficult or even impossible in many cases. On the other hand, advances in DNA sequencing techniques (Frangeul et al., 1999) have produced complete sequences for a large number of proteins, far more than those for which the 3D structure is currently known; this has been referred to as the "sequence-structure gap" (Rost & Sander, 1996). In spite of these difficulties, and because of the importance of knowing the 3D structure of proteins, many projects have been set up for the large-scale determination of as many protein structures as possible for complete proteomes (structural genomics; Sali, 1998).

Since Anfinsen's experiments (Anfinsen, 1973) it has been known that the 3D structure of a protein is, in general, determined solely by its amino-acid sequence. So far, however, attempts to derive an algorithm able to reproduce the 3D structure of a protein using its amino-acid sequence as input have been unsuccessful. This is mainly due to the large number of interactions among amino acids which determine the folding of a protein and make the system very complex. Moreover, the magnitude of these interactions is not always known and empirical values must be used (van Gunsteren et al., 1994).

Protein Structure Prediction

The importance of knowing how a protein folds, the difficulties in determining its structure experimentally and the lack of an algorithm for deriving this structure from the amino-acid sequence have given rise to theoretical approaches for predicting protein structure from sequence.

The methods for predicting protein structure can be divided into ab initio and non-ab initio, depending on whether they use the protein sequence as the only input or whether they also employ additional information. Another possible division of protein structure prediction methods can be made according to the representation of the protein they use for prediction: one-dimensional methods (1D) associate a value with each amino-acid (the solvent state, the secondary structure state, etc); 2D methods work with information in the form of pairs of residues (contacts, for example); finally, 3D methods use three-dimensional representations of proteins, generally the spatial coordinates of atoms or residues.
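
As a rough illustration of the 1D/2D/3D division, the sketch below holds one tiny peptide in all three forms and derives the 2D representation (residue contacts) from the 3D one. The sequence, coordinates, the 8 Å cut-off and the minimum sequence separation are hypothetical values chosen only for the example; they are not taken from any method described here.

```python
from itertools import combinations
from math import dist
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]

# Hypothetical 4-residue peptide used only for illustration.
sequence = "MKTA"

# 1D representation: one value per residue (here a secondary structure state).
secondary_structure = "CHHC"

# 3D representation: one (x, y, z) position per residue, in Angstroms (made up).
coords: List[Coord] = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]


def contact_pairs(coords: List[Coord],
                  cutoff: float = 8.0,
                  min_separation: int = 2) -> Set[Tuple[int, int]]:
    """2D representation: pairs of residues closer than `cutoff` in space.

    Pairs nearer than `min_separation` positions along the chain are skipped,
    since sequence neighbours are trivially close. Both parameters are
    assumptions made for this sketch.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_separation and dist(coords[i], coords[j]) <= cutoff
    }


contacts = contact_pairs(coords)
print(sorted(contacts))
```

The same protein is thus described by a string (1D), a set of residue pairs (2D) or a list of coordinates (3D), and each prediction method targets one of these forms.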

1D predictions

The 3D structure of a protein involves local regular conformations collectively known as secondary structure (Pauling & Corey, 1951). Two main types of secondary structure are found in proteins: the alpha-helix and the beta-strand. The secondary structure state of a residue is mainly determined locally, with the relevant information lying in the sequence neighbourhood of that residue (Rooman et al., 1990). This makes the problem less complex and has therefore allowed relatively accurate prediction methods to be developed. The accuracy of current secondary structure prediction methods is around 76% (Rost et al., 1994; Jones, 1999).

This situation is similar for solvent accessibility prediction and transmembrane helix prediction, where the accuracy is around 75% and 95% respectively.
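
To make the accuracy figures concrete, the following minimal sketch computes a per-residue accuracy for a secondary structure prediction, assuming the figures quoted are of this kind (the three-state version is commonly called Q3). The function name and the two 10-residue strings are hypothetical; H, E and C stand for helix, strand and other.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty prediction")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical example: 8 of the 10 residues are predicted correctly.
observed = "CHHHHCEEEC"
predicted = "CHHHCCEEEE"
print(q3_accuracy(predicted, observed))  # prints 80.0
```

The same per-residue measure could be applied to a two-state solvent accessibility prediction by changing the alphabet of states.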

Public servers for 1D predictions

2D predictions

Contacts predicted between residues can be used as distance constraints for predicting protein structure. The main method currently used for predicting residue contacts is based on "correlated mutations" (Göbel et al., 1994; Olmea & Valencia, 1997; Pazos et al., 1997b). Its accuracy is such that around 25% of contacts are correctly predicted. In spite of this low level of accuracy, these predicted contacts have been demonstrated to be very useful in filtering structural models (Olmea et al., 1999), driving ab initio simulations (Ortiz et al., 1999) and predicting protein interaction surfaces (Pazos et al., 1997).
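
The idea behind correlated mutations can be sketched as follows: two alignment positions are candidates for a contact when the pattern of changes between sequences at one position follows the pattern at the other. This is a simplified illustration, not the published method. It assumes an identity-based similarity (1 if two sequences carry the same residue, 0 otherwise), whereas published methods generally score residue similarity with a substitution matrix. The toy alignment and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pair_similarities(column: Sequence[str]) -> List[float]:
    """Similarity of every pair of sequences at one alignment column.

    Simplifying assumption: identical residues score 1.0, anything else 0.0.
    """
    return [1.0 if a == b else 0.0 for a, b in combinations(column, 2)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    norm_x = sqrt(sum((v - mean_x) ** 2 for v in x))
    norm_y = sqrt(sum((v - mean_y) ** 2 for v in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # e.g. a fully conserved column carries no covariation signal
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return covariance / (norm_x * norm_y)


def correlated_mutations(alignment: List[str]) -> Dict[Tuple[int, int], float]:
    """Correlation score for every pair of columns in a gap-free alignment."""
    columns = list(zip(*alignment))
    similarities = [pair_similarities(column) for column in columns]
    return {
        (i, j): pearson(similarities[i], similarities[j])
        for i, j in combinations(range(len(columns)), 2)
    }


# Toy alignment: columns 1 and 3 change together (K/E versus R/D).
alignment = ["AKLE", "AKLE", "ARLD", "ARID"]
scores = correlated_mutations(alignment)
best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))
```

In a real application the highest-scoring pairs would be taken as predicted contacts and, as described above, used as constraints or filters rather than trusted individually.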

Public servers for 2D predictions

3D predictions

Leaving aside ab initio methods (which are still not applicable to normal-sized proteins and which involve the problems mentioned above), the main techniques used nowadays for predicting the full 3D structure of proteins are homology modelling and fold recognition.

Homology modelling is based on the fact that similar sequences tend to fold into similar structures (Chothia & Lesk, 1986). If a sequence of unknown structure (the problem sequence) is similar to one whose structure is known (the template sequence), then the structure of the core of the template sequence is transferred to the problem sequence, and sidechains, loops and other problematic regions are then dealt with using different techniques. Whilst the structural models produced by this technique are very accurate, its range of applicability is limited, since obtaining a homology model for a sequence depends entirely on the existence of a protein of known structure with a similar sequence.
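
The core-transfer step can be sketched in a few lines. Given a pairwise alignment of the problem and template sequences, the coordinates of each template residue are copied to the problem residue aligned with it, and problem residues with no template equivalent are left empty for later loop modelling. This is a minimal sketch under simplifying assumptions: one C-alpha position per residue, "-" as the gap character, and a hypothetical alignment and coordinates. Real modelling programs do considerably more.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def transfer_core(aligned_problem: str,
                  aligned_template: str,
                  template_ca: List[Coord],
                  gap: str = "-") -> List[Optional[Coord]]:
    """Copy template C-alpha coordinates onto the aligned problem residues.

    Returns one entry per problem residue. None marks residues aligned to a
    gap in the template; these would have to be built by other techniques.
    """
    if len(aligned_problem) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    if len(template_ca) != len(aligned_template.replace(gap, "")):
        raise ValueError("need exactly one coordinate per template residue")

    model: List[Optional[Coord]] = []
    template_index = 0
    for problem_res, template_res in zip(aligned_problem, aligned_template):
        coord: Optional[Coord] = None
        if template_res != gap:
            coord = template_ca[template_index]
            template_index += 1
        if problem_res != gap:
            model.append(coord)  # template residues facing a gap are dropped
    return model


# Hypothetical alignment and made-up template coordinates (Angstroms).
aligned_problem = "MK-TAY"
aligned_template = "MKSL-Y"
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
               (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]

model = transfer_core(aligned_problem, aligned_template, template_ca)
print(model)
```

In this example the fourth problem residue (A) has no counterpart in the template, so it is returned as None; in the terms used above it belongs to the "problematic regions" handled after the core has been transferred.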

Fold recognition, also known as remote homology modelling or threading, is based on the fact that, even though similar sequences tend to fold into similar structures, the opposite is not necessarily true. There are proteins which have very different sequences but which still fold into similar structures, i.e. the space of structures is more restricted than the space of sequences. Threading algorithms therefore evaluate the fitness of the problem sequence for each of the known structures according to different fitness criteria (matching of secondary structure, matching of residue environments, etc.). The average accuracy of threading methods in finding the correct fold for a problem sequence is around 50%. Because of this low level of accuracy, post-processing the results (for example, filtering them using other predictions or known experimental data) can significantly improve the prediction (Pazos et al., 1999).
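
One of the fitness criteria mentioned above, matching of secondary structure, can be illustrated with a deliberately simple scoring scheme. The sketch below slides the predicted secondary structure of the problem sequence along each structure in a small fold library, without gaps, and ranks the folds by their best fraction of matching states. The ungapped comparison, the fold names and the secondary structure strings are all assumptions made for the example; actual threading programs use gapped alignments and combine several criteria.

```python
from typing import Dict, List, Tuple


def best_ungapped_match(query_ss: str, template_ss: str) -> float:
    """Best fraction of matching states over all ungapped placements.

    Returns 0.0 if the template is shorter than the query, because no full
    placement exists under this simplified scheme.
    """
    n = len(query_ss)
    if n == 0 or len(template_ss) < n:
        return 0.0
    best = 0
    for offset in range(len(template_ss) - n + 1):
        window = template_ss[offset:offset + n]
        matches = sum(q == t for q, t in zip(query_ss, window))
        best = max(best, matches)
    return best / n


def rank_folds(query_ss: str,
               fold_library: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score the query against every fold and sort from best to worst."""
    scores = {name: best_ungapped_match(query_ss, ss)
              for name, ss in fold_library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical predicted secondary structure for the problem sequence.
query_ss = "HHHHCCEEEE"

# Hypothetical fold library: fold name -> observed secondary structure.
fold_library = {
    "all_alpha": "CHHHHHHHHCCHHHHHHC",
    "alpha_beta": "CCHHHHCCEEEECC",
    "all_beta": "CEEEECCEEEEC",
}

ranking = rank_folds(query_ss, fold_library)
for name, score in ranking:
    print(name, score)
```

A ranked list of this kind is the natural input for the post-processing step described above, where candidate folds are filtered using other predictions or experimental data.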

Public servers for 3D predictions


  • Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science. 181, 223-230.

  • Chothia, C., & Lesk, A. M. (1986). The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823-826.

  • Frangeul, L., Nelson, K. E., Buchrieser, C., Danchin, A., Glaser, P., Kunst, F. (1999). Cloning and assembly strategies in microbial genome projects. Microbiology. 145, 2625-2634.

  • Göbel, U., Sander, C., Schneider, R., & Valencia, A. (1994). Correlated mutations and residue contacts in proteins. Proteins. 18, 309-317.

  • Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 292, 195-202.

  • Olmea, O., Rost, B., & Valencia, A. (1999). Effective use of sequence correlation and conservation in fold recognition. J Mol Biol. 293, 1221-1239.

  • Olmea, O., & Valencia, A. (1997). Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Folding & Design. 2, S25-S32.

  • Ortiz, A., Kolinski, A., Rotkiewicz, P., Ilkowski, B., & Skolnick, J. (1999). Ab initio folding of proteins using restraints derived from evolutionary information. Proteins. S3, 177-185.

  • Pauling, L., & Corey, R. B. (1951). Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc Natl Acad Sci USA. 37, 729-740.

  • Pazos, F., Helmer-Citterich, M., Ausiello, G., & Valencia, A. (1997). Correlated mutations contain information about protein-protein interaction. J Mol Biol. 271, 511-523.

  • Pazos, F., Olmea, O., & Valencia, A. (1997b). A graphical interface for correlated mutations and other structure prediction methods. Comp Appl Biol Sci. 13, 319-321.

  • Pazos, F., Rost, B., & Valencia, A. (1999). A platform for integrating threading results with protein family analyses. Bioinformatics. 15, 1062-1063.

  • Rooman, M. J., Rodriguez, J. & Wodak, S. J. (1990). Automatic definition of recurrent local structure motifs in proteins. J Mol Biol. 213(2), 327-336.

  • Rost, B., & Sander, C. (1996). Bridging the protein sequence-structure gap by structure prediction. Annu Rev Biophys Biomol Struct. 25, 113-136.

  • Rost, B., Sander, C., & Schneider, R. (1994). PHD - an automatic mail server for protein secondary structure prediction. Comp Appl Biol Sci. 10, 53-60.

  • Sali, A. (1998). 100,000 protein structures for the biologist. Nature Struct. Biol. 5, 1029-1032.

  • Van Gunsteren, W. F., Luque, F. J., Timms, D., & Torda, A. E. (1994). Molecular mechanics in biology: from structure to function, taking account of solvation. Annu Rev Biophys Biomol Struct. 23, 847-863.

2002 ALMA Bioinformatics, SL. All rights reserved.