G-factor

The G-factor provides a measure of how "normal", or alternatively how "unusual", a given stereochemical property is. In PROCHECK-NMR it is computed for each residue's phi-psi, chi1-chi2 and chi1 values. It is essentially just a log-odds score based on the observed distributions of these stereochemical parameters.

The standards of "normality" used here have been derived from an analysis of 163 non-homologous, high-resolution protein chains chosen from structures solved by X-ray crystallography to a resolution of 2.0Å or better and an R-factor no greater than 20%. No two of the 163 chains shared a sequence homology greater than 35%, and all atoms having zero occupancy were excluded from the analysis.

The analyses provided the observed distributions of phi-psi, chi1-chi2 and chi1 values for each of the 20 amino acid types. These distributions were then divided into cells. For example, each residue type's Ramachandran plot of phi-psi values was divided into 45 x 45 cells. The number of observations in each cell was used to calculate the probability of a given residue type having a given phi-psi combination. The probabilities were, in turn, used to compute a log-odds score for each cell. Unlike probabilities, which have to be multiplied together, log-odds scores can simply be summed, so meaningful averages can be taken over a set of residues.
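The cell-based calculation can be sketched as follows. This is only an illustration, not PROCHECK-NMR's own code: the exact reference distribution and any smoothing used by the program are not stated here, so the sketch assumes that each cell's probability is compared against a uniform distribution over the 45 x 45 cells and that a pseudocount of 1 is added to every cell to avoid taking the log of zero. The function names cell_index and log_odds_grid are hypothetical.

```python
import math

N_CELLS = 45                   # cells per axis, as described in the text
CELL_WIDTH = 360.0 / N_CELLS   # 8 degrees per cell


def cell_index(angle_deg):
    """Map a torsion angle in degrees to a cell index in [0, N_CELLS)."""
    wrapped = (angle_deg + 180.0) % 360.0   # shift [-180, 180) onto [0, 360)
    # min() guards against floating-point rounding at the upper edge
    return min(int(wrapped // CELL_WIDTH), N_CELLS - 1)


def log_odds_grid(observations, pseudocount=1.0):
    """Build a 45 x 45 log-odds grid from (phi, psi) pairs for one residue type.

    ASSUMPTION: the score is log(P_cell / P_uniform), with a pseudocount
    added to every cell. The real program may normalise differently.
    """
    counts = [[0] * N_CELLS for _ in range(N_CELLS)]
    for phi, psi in observations:
        counts[cell_index(phi)][cell_index(psi)] += 1

    total = len(observations) + pseudocount * N_CELLS * N_CELLS
    uniform = 1.0 / (N_CELLS * N_CELLS)
    return [
        [math.log(((c + pseudocount) / total) / uniform) for c in row]
        for row in counts
    ]
```

Under these assumptions, a cell that holds many observations receives a positive score and an empty cell receives a negative one, which matches the qualitative behaviour described in the text.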

When applied to a given residue, a low G-factor indicates that the property corresponds to a low-probability conformation. So, for example, residues falling in the disallowed regions of the Ramachandran plot will have a low (or very negative) G-factor. The same applies to residues with unfavourable chi1-chi2 and chi1 values.

Thus, if a protein has many residues with low G-factors, this suggests that something may be amiss with its overall geometry.
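As a rough illustration of how per-residue scores might be summarised for a whole structure, the sketch below averages a set of G-factors and lists the residues that fall below a cut-off. The function name summarise_g_factors, the residue labels and the threshold of -1.0 are all hypothetical, chosen only for the example; no particular cut-off is given in this section.

```python
from statistics import mean


def summarise_g_factors(scores, threshold=-1.0):
    """Return (mean G-factor, sorted labels of residues below the threshold).

    scores    : dict mapping a residue label to its G-factor
    threshold : ASSUMED illustrative cut-off, not a documented value
    """
    low = sorted(label for label, g in scores.items() if g < threshold)
    return mean(list(scores.values())), low


# Example with made-up values
example = {"ALA1": 0.5, "GLY2": -2.0, "SER3": 0.0}
average, low_residues = summarise_g_factors(example)
```

Because the scores are logarithmic, a plain arithmetic mean of this kind is a sensible summary, whereas a mean of raw probabilities would not be.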