Sample-efficient evidence estimation of score based priors for model selection

Frederic Wang1    Katherine L. Bouman1

1California Institute of Technology



Abstract

The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements y to avoid severe bias. In Bayesian inverse problems, this can be achieved by evaluating the model evidence p(y | M) under different models M that specify the prior and then selecting the one with the highest value. Diffusion models are the state-of-the-art approach to solving inverse problems with a data-driven prior; however, directly computing the model evidence with respect to a diffusion prior is intractable. Furthermore, most existing model evidence estimators require either many pointwise evaluations of the unnormalized prior density or an accurate clean prior score. We propose an estimator of the model evidence of a diffusion prior that integrates over the time-marginals of posterior sampling methods. Our method leverages the large number of intermediate samples naturally obtained during the reverse diffusion sampling process to obtain an accurate estimate of the model evidence using only a handful of posterior samples (e.g., 20). We also demonstrate how to implement our estimator in tandem with recent diffusion posterior sampling methods. Empirically, our estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.



Citation

F. Wang and K. L. Bouman, “Sample-efficient evidence estimation of score based priors for model selection,” in International Conference on Learning Representations, 2026.


Results: MNIST phase retrieval inverse problem

We use our method to compute the model evidence of different MNIST digits under the Gaussian phase retrieval (y = |Ax| + ε, where A is i.i.d. Gaussian) and Fourier phase retrieval (y = |Fx| + ε, where F is the Fourier transform) inverse problems. For both inverse problems, we simulate measurements of all 10 MNIST digits and compute the evidence of each measurement under all 10 MNIST digit priors.
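The two forward models above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the real-valued Gaussian matrix, its size and scaling, the noise level, the random stand-in image, and the function names are all assumptions made for the example. The last lines check that noise-free Fourier magnitudes are unchanged by circular shifts and flips of the image.

```python
import numpy as np

def gaussian_phase_retrieval(x, A, sigma, rng):
    """Simulate y = |A x| + noise for an image x and a fixed matrix A."""
    y_clean = np.abs(A @ x.ravel())
    return y_clean + sigma * rng.standard_normal(y_clean.shape)

def fourier_phase_retrieval(x, sigma, rng):
    """Simulate y = |F x| + noise, with F taken to be the 2D DFT."""
    y_clean = np.abs(np.fft.fft2(x, norm="ortho"))
    return y_clean + sigma * rng.standard_normal(y_clean.shape)

rng = np.random.default_rng(0)

# Stand-in for a 28x28 MNIST digit (random pixels, not real data).
x = rng.random((28, 28))

# Hypothetical measurement matrix: 1024 rows, i.i.d. real Gaussian entries,
# scaled by 1/sqrt(n). The source does not specify size or scaling.
A = rng.standard_normal((1024, x.size)) / np.sqrt(x.size)

y_gauss = gaussian_phase_retrieval(x, A, sigma=0.05, rng=rng)
y_fourier = fourier_phase_retrieval(x, sigma=0.05, rng=rng)

# Noise-free Fourier magnitudes for the image, a circularly shifted copy,
# and a copy flipped along both axes.
mag = fourier_phase_retrieval(x, sigma=0.0, rng=rng)
mag_shift = fourier_phase_retrieval(np.roll(x, (3, 5), axis=(0, 1)), sigma=0.0, rng=rng)
mag_flip = fourier_phase_retrieval(np.flip(x), sigma=0.0, rng=rng)

print(np.allclose(mag, mag_shift), np.allclose(mag, mag_flip))
```

Because the magnitude discards phase, images that differ only by such a shift or flip produce identical noise-free Fourier measurements, which is why the measurements alone cannot tell them apart.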

Figure 1. Model evidences for Gaussian phase retrieval (left) and Fourier phase retrieval (right) for each (ground truth measurement, model) pair of MNIST digits. Our method assigns the highest evidence to the correct model in all cases. Posterior samples for the dotted matrix entries are shown below. For Gaussian phase retrieval, our method estimates higher evidence for models of visually similar digits, such as 4 and 9. For Fourier phase retrieval, the measurements are invariant to both translations and reflections, as seen in the posterior samples, so our method estimates high evidence for model 9 given a measurement of a 6.

Results: M87* black hole model selection and validation

We perform model selection by computing the evidence of the M87* observations from April 6, 2017 under five candidate priors: GRMHD, generic space images, RIAF, CelebA, and MNIST 0 digits.


Figure 2. Model evidence estimates on real M87* observations across 5 different priors. Our method concludes that, of these prior models, GRMHD is the most likely model. Unconditional and posterior samples for each prior are shown to the right.
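Once an evidence estimate is available for each candidate prior, the selection step itself is simple. The sketch below assumes the estimates are stored as log-evidences and that all candidate models are equally likely a priori. The model names and numbers are made-up placeholders, not results from the paper, and `select_model` is a hypothetical helper.

```python
import numpy as np

def select_model(log_evidence):
    """Pick the model with the highest log-evidence.

    Returns the best model name, log Bayes factors relative to the best
    model, and posterior model probabilities under a uniform model prior.
    """
    names = list(log_evidence)
    log_z = np.array([log_evidence[k] for k in names], dtype=float)

    best = names[int(np.argmax(log_z))]

    # log Bayes factor of each model against the best one (0 for the best).
    log_bf = {k: float(v - log_z.max()) for k, v in zip(names, log_z)}

    # Softmax of the log-evidences, shifted by the max for numerical stability.
    weights = np.exp(log_z - log_z.max())
    posterior = {k: float(w) for k, w in zip(names, weights / weights.sum())}
    return best, log_bf, posterior

# Placeholder log-evidence values for three hypothetical priors.
log_evidence = {"prior_a": -120.0, "prior_b": -135.5, "prior_c": -410.2}

best, log_bf, posterior = select_model(log_evidence)
print(best)
print(log_bf)
print(posterior)
```

Working in log space avoids underflow, since evidences of high-dimensional measurements are typically far too small to represent directly.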


We also compare the M87* evidence under the GRMHD prior to the evidence of simulated measurements of in-distribution and out-of-distribution images, as a proxy for whether the M87* black hole lies within the distribution of GRMHD black hole simulations.

Figure 3. GRMHD model validation results on M87* observations, obtained by comparing to the evidence of in-distribution measurements. Our method shows that the evidence of the M87* observations has a z-score of about -0.81 relative to the evidence distribution of GRMHD measurements, indicating that M87* is statistically consistent with the GRMHD distribution. The evidence of simulated measurements of out-of-distribution (OOD) images is also shown, demonstrating that measurements of OOD images have OOD evidence. The mean reconstruction and posterior samples of the OOD measurements are shown on the right.
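The z-score comparison in this caption can be illustrated with a short sketch. It assumes the z-score is taken over log-evidence values and uses the sample standard deviation; the source does not state either choice. The reference values below are made-up placeholders rather than the paper's numbers, and `evidence_z_score` is a hypothetical helper.

```python
import numpy as np

def evidence_z_score(log_z_obs, log_z_ref):
    """z-score of an observed log-evidence against a reference set.

    log_z_ref holds log-evidences of measurements simulated from images
    drawn from the prior (the in-distribution reference set).
    """
    ref = np.asarray(log_z_ref, dtype=float)
    # ddof=1 gives the sample standard deviation (an assumed choice).
    return float((log_z_obs - ref.mean()) / ref.std(ddof=1))

# Placeholder in-distribution log-evidences and one observed value.
reference = [-100.0, -104.0, -98.0, -102.0, -96.0]
observed = -103.0

z = evidence_z_score(observed, reference)
print(z)
```

A z-score of small magnitude means the observed evidence is typical of in-distribution measurements, while a strongly negative value would flag prior misfit.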
