Data/Parameter Ratio

Have you got enough data for your parameters?

In any least-squares fitting, it is important to make sure that the number of parameters used to fit the data is small enough that the data are not over-fitted.
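Over-fitting is easy to show with a toy least-squares problem. The sketch below is a generic illustration, not crystallographic code. It fits polynomials with an increasing number of parameters to six hypothetical points (a straight line plus small fixed noise) by solving the normal equations in exact rational arithmetic. The helper name polyfit_rss and the data are assumptions made for illustration.

```python
from fractions import Fraction

def polyfit_rss(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns the residual sum of squares."""
    n = degree + 1  # number of refined parameters
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i
    A = [[sum(Fraction(x) ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(Fraction(y) * Fraction(x) ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gauss-Jordan elimination; exact Fractions avoid any round-off
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
                b[r] -= f * b[col]
    coeffs = [b[i] / A[i][i] for i in range(n)]
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

# Hypothetical "observations": y = 2x + 1 plus small fixed noise
xs = list(range(6))
noise = [Fraction(s) for s in ("0.3", "-0.2", "0.1", "-0.4", "0.2", "0")]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

for d in (0, 1, 5):
    rss = polyfit_rss(xs, ys, d)
    print(f"degree {d}: {d + 1} parameters, "
          f"data/parameter = {len(xs) / (d + 1):.1f}, RSS = {float(rss):.4f}")
# degree 0: 1 parameters, data/parameter = 6.0, RSS = 68.7400
# degree 1: 2 parameters, data/parameter = 3.0, RSS = 0.3309
# degree 5: 6 parameters, data/parameter = 1.0, RSS = 0.0000
```

One parameter underfits. Two parameters describe the underlying line. Six parameters for six data points reproduce every point exactly, noise included, so a perfect residual here signals over-fitting rather than a better model.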

Underfitting, Just Right and Overfitting

The IUCr has therefore set out some guidelines for the minimum data/parameter ratio that must be fulfilled for a publishable structure.

CheckCif guideline for the minimum Data/Parameter ratio

In Olex2, we display this ratio using a small battery symbol, and the number of ‘bars’ (as well as the colour) indicates the status of your structure.

With modern instrumentation and ‘standard’ structures, this should never really be a problem, and the battery should remain green and fully charged at all times.

As you add more and more parameters to your refinement, you will notice this ratio decrease. Let’s consider this carefully:

  • For each atom, we need to refine at least four parameters: $x$, $y$, $z$ and $U_{iso}$.

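  • For each anisotropic atom, the single $U_{iso}$ is replaced by the six components $U_{11}$, $U_{22}$, $U_{33}$, $U_{12}$, $U_{13}$ and $U_{23}$ of the displacement ellipsoid, so an anisotropic atom needs nine parameters (five more than an isotropic one).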

  • If we refine hydrogen atoms freely, they also require four parameters each. But if we let them ride on their parent atom (constrained with AFIX), they require no parameters at all!

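  • We can also save parameters by using more constraints: fixing a six-membered ring as a rigid group with AFIX 66 requires only 6 positional parameters (three translations and three rotations of the group), while refining the six atoms freely requires 18 ($x$, $y$ and $z$ for each atom)!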
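The counting rules in the list above can be written down as a small bookkeeping sketch. The helpers count_parameters and ring_positional_parameters are hypothetical and not part of Olex2. They count atom parameters only, ignoring the overall scale factor and any other global parameters a real refinement also carries.

```python
# Parameters per atom, following the rules listed above (hypothetical helper, not Olex2 code)
PARAMS_PER_ATOM = {
    "iso": 4,       # x, y, z, U_iso
    "aniso": 9,     # x, y, z plus six U_ij components
    "free_h": 4,    # freely refined isotropic hydrogen: x, y, z, U_iso
    "riding_h": 0,  # riding (AFIX) hydrogen: derived from the parent atom
}

def count_parameters(n_iso=0, n_aniso=0, n_free_h=0, n_riding_h=0):
    """Atom parameters only; the scale factor and other global parameters are ignored."""
    return (n_iso * PARAMS_PER_ATOM["iso"]
            + n_aniso * PARAMS_PER_ATOM["aniso"]
            + n_free_h * PARAMS_PER_ATOM["free_h"]
            + n_riding_h * PARAMS_PER_ATOM["riding_h"])

def ring_positional_parameters(n_atoms=6, rigid_group=False):
    """Positional parameters of a ring: x, y, z per atom, or 3 translations + 3 rotations (AFIX 66)."""
    return 6 if rigid_group else 3 * n_atoms

# A hypothetical molecule: 20 anisotropic non-H atoms and 15 hydrogen atoms
print(count_parameters(n_aniso=20, n_riding_h=15))  # 180
print(count_parameters(n_aniso=20, n_free_h=15))    # 240
print(ring_positional_parameters(), ring_positional_parameters(rigid_group=True))  # 18 6
```

Switching the 15 hydrogen atoms from riding to free adds 60 parameters, which would lower the data/parameter ratio by a quarter for the same data set.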

There is some debate about how to count the data used in this calculation. At the time of writing, CheckCif uses the number of Laue-averaged reflections (based on the data in the FCF). In non-centrosymmetric space groups, Laue averaging merges each Friedel pair ($hkl$ and $\bar{h}\bar{k}\bar{l}$), which is usually kept as two separate observations in the refinement, so the number of data is roughly halved.
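The halving can be seen in the simplest non-centrosymmetric case, space group $P1$. Its Laue class is $\bar{1}$, so Laue averaging merges only Friedel pairs. The sketch below uses a hypothetical set of indices (every $hkl$ with $|h|, |k|, |l| \le 3$). It keeps one representative per Friedel pair and deliberately ignores any other symmetry equivalents present in higher-symmetry groups.

```python
from itertools import product

def friedel_merge(reflections):
    """Merge each hkl with its Friedel mate (-h, -k, -l); return one representative per pair."""
    unique = set()
    for h, k, l in reflections:
        mate = (-h, -k, -l)
        unique.add(max((h, k, l), mate))  # canonical choice: lexicographically larger index
    return unique

# Hypothetical data set: all indices with |h|, |k|, |l| <= 3, excluding 000
hkl = [r for r in product(range(-3, 4), repeat=3) if r != (0, 0, 0)]
print(len(hkl), len(friedel_merge(hkl)))  # 342 171
```

Since no reflection other than 000 is its own Friedel mate, every pair collapses to one entry and the count is exactly halved here. Real data sets are only roughly halved because of incomplete coverage.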

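In CifPlus, we calculate the ratio from the number of reflections above the significance threshold (_reflns_number_gt) and the number of refined parameters (_refine_ls_number_parameters):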

ratio = float(_reflns_number_gt)/float(_refine_ls_number_parameters)
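The formula above can be wrapped in a self-contained sketch that reads the two items from CIF text. The parser read_simple_cif_items is a hypothetical, deliberately minimal helper. It only picks up single-line "_tag value" items and does not handle loops, quoted strings or multi-line values, so a real CIF library should be used in practice. The numbers in the example are made up.

```python
def read_simple_cif_items(text):
    """Collect single-line '_tag value' items; loops, quoting and multi-line values are NOT handled."""
    items = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("_"):
            items[parts[0]] = parts[1]
    return items

def data_parameter_ratio(items):
    # Same formula as above: reflections above threshold / refined parameters
    return float(items["_reflns_number_gt"]) / float(items["_refine_ls_number_parameters"])

# Hypothetical CIF fragment
example_cif = """
data_example
_reflns_number_gt                 3120
_refine_ls_number_parameters      208
"""
items = read_simple_cif_items(example_cif)
print(round(data_parameter_ratio(items), 1))  # 15.0
```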