
# Failure of k-fold cross validation - The parity problem

Sometimes cross validation fails. Below, I give some intuition for an exercise that demonstrates a pitfall of leave-one-out cross validation.

For the original statement of the task, refer to this book, exercise 11.5.1.

> Failure of k-fold cross validation: Consider a case in which the label is chosen at random according to $P[y = 1] = P[y = 0] = 1/2$. Consider a learning algorithm that outputs the constant predictor $h(x) = 1$ if the parity of the labels on the training set is 1, and otherwise outputs the constant predictor $h(x) = 0$. Prove that the difference between the leave-one-out estimate and the true error in such a case is always 1/2.

The gist of the solution is as follows. Be advised: this is just for intuition; the rigorous proof can be found here.

Firstly, parity is defined as for binary numbers: a list of labels $\{x_0, \dots, x_n\}$ has parity $(\sum_{i=0}^{n} x_i) \bmod 2$.
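To make the definition concrete, here is a minimal Python sketch (the helper name `parity` is mine, not from the exercise):

```python
def parity(labels):
    """Parity of a list of binary labels: their sum mod 2."""
    return sum(labels) % 2

assert parity([1, 0, 0]) == 1  # odd number of 1's
assert parity([0, 0]) == 0     # even number of 1's
```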

Imagine the sample consists of 3 items with the following labels: $\{1, 0, 0\}$. This sample has parity 1. Now, when one item is taken out (leave-one-out) for cross validation, there are two cases:

  1. The training labels are now $\{1, 0\}$. The parity of this pair is still 1, so the predictor trained on it will always return 1. However, the validation set is $\{0\}$: the held-out label is 0, so the estimated error on this fold is 1.

  2. The training labels are now $\{0, 0\}$. The parity of this pair is 0, so the predictor will always return 0. The validation set is $\{1\}$: the held-out label is 1, so again the estimated error on this fold is 1.

In essence, the estimated error is 1 on every fold, so the leave-one-out estimate (the average over all folds) is 1.
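This is easy to verify by simulating the procedure. A minimal sketch of leave-one-out with the parity-based constant predictor from the exercise (the function name `loo_estimate` is mine):

```python
def loo_estimate(labels):
    """Leave-one-out error estimate of the parity-based constant predictor."""
    fold_errors = []
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        prediction = sum(train) % 2  # the constant predictor: parity of the training labels
        fold_errors.append(int(prediction != held_out))
    return sum(fold_errors) / len(fold_errors)

print(loo_estimate([1, 0, 0]))  # 1.0 -- every fold predicts the opposite of the held-out label
```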

Now, imagine the same for a much larger sample, say 1001 items with 501 1's and 500 0's. The sample parity is again 1, so the predictor always predicts 1, yet a freshly drawn label is 0 with probability 1/2; the true error of this (or any) constant predictor is therefore exactly 0.5. You may keep going ad infinitum.

Finally, since the leave-one-out estimate is 1.0 whenever the sample has parity 1 (as shown above), the difference between the estimate and the true error is $|1.0 - 0.5| = 0.5$. The complementary case works the same way: if the sample has parity 0, the predictor agrees with the held-out label on every fold, the estimate is 0.0, and the difference is again 0.5, as required in the exercise.
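Putting both cases together, a small simulation (building on the `loo_estimate` sketch above; the setup and name `gap` are mine) illustrates that the difference is exactly 0.5 regardless of which random labels are drawn:

```python
import random

def gap(n=1001, seed=0):
    """|LOO estimate - true error| for n random fair-coin labels."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    estimate = loo_estimate(labels)  # always 0.0 or 1.0, depending on the sample's parity
    true_error = 0.5                 # any constant predictor errs on a fresh label with prob. 1/2
    return abs(estimate - true_error)

print(gap())  # 0.5, for any seed and any n
```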
