Abstract
Probabilistic failure prediction models are commonly estimated from non-random samples of companies. The proportion of failed companies in such samples is often much larger than the proportion of failed companies in most real-world decision contexts. This so-called “choice-based sample bias” implies that the calculated failure probabilities will be biased to a greater or lesser degree. The purpose of the paper is to analyse this bias and its consequences for standard applications of probabilistic failure prediction models (for example, probit/logit analysis), and in particular to investigate whether the bias can be eliminated without re-estimating the underlying statistical model. It is shown that there is a straightforward linkage between sample-based probabilities of failure and the corresponding population-based probabilities. Given this linkage, sample-based probabilities can be adjusted for the “choice-based sample bias”, provided that sufficiently large samples of randomly selected failed companies and randomly selected surviving companies have been used to estimate the underlying model. Empirical observations in previous research are in line with the theoretical results of the paper.
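As an illustration of the kind of linkage described, the following is a minimal sketch of the standard Bayes-rule prior correction for samples drawn separately from failed and surviving companies. The abstract does not state the paper's own expression, so the formula used here is an assumption and may differ in form from the one derived in the paper. The function name `adjust_failure_probability` and its parameter names are hypothetical.

```python
def adjust_failure_probability(p_sample, sample_failure_share, population_failure_share):
    """Map a sample-based failure probability to a population-based one.

    Assumption: the standard Bayes-rule prior correction applies, i.e. failed
    and surviving companies were each sampled at random within their group,
    so only the mix of the two groups differs between sample and population.
    This is not necessarily the exact expression derived in the paper.
    """
    if not 0.0 <= p_sample <= 1.0:
        raise ValueError("p_sample must lie in [0, 1]")
    for share in (sample_failure_share, population_failure_share):
        if not 0.0 < share < 1.0:
            raise ValueError("failure shares must lie strictly between 0 and 1")

    # Reweight each outcome by the ratio of its population share to its sample share.
    failure_weight = population_failure_share / sample_failure_share
    survival_weight = (1.0 - population_failure_share) / (1.0 - sample_failure_share)

    weighted_failure = p_sample * failure_weight
    weighted_survival = (1.0 - p_sample) * survival_weight

    # Renormalise so that the two adjusted probabilities sum to one.
    return weighted_failure / (weighted_failure + weighted_survival)


if __name__ == "__main__":
    # Hypothetical numbers: a balanced estimation sample (50% failures) and a
    # population in which 2% of companies fail.
    print(adjust_failure_probability(0.5, 0.5, 0.02))
```

Under this assumed correction, a sample-based probability of 0.5 from a balanced sample corresponds to a population-based probability of about 0.02 when 2% of companies in the population fail. When the sample and population failure shares coincide, the probability is left unchanged.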