Weighted samples: the Samples
, MCMCSamples
, and NestedSamples
data frames
The anesthetic.samples.Samples
data frame at its core is a
pandas.DataFrame
, and as such comes with all the pandas functionality.
The anesthetic.samples.MCMCSamples
and
anesthetic.samples.NestedSamples
data frames share all the
functionality of the more general Samples
class, but come with some
additional MCMC or nested sampling specific functionality.
from anesthetic import read_chains, make_2d_axes
samples = read_chains("../../tests/example_data/pc_250")
General weighted sample functionality
The important extension to the pandas.Series
and
pandas.DataFrame
classes, is that in anesthetic the data frames are
weighted (see anesthetic.weighted_pandas.WeightedSeries
and
anesthetic.weighted_pandas.WeightedDataFrame
), where the weights form
part of a pandas.MultiIndex
and are therefore retained when slicing.
Mean, standard deviation, median, quantiles, etc.
The weights are automatically taken into account in summary statistics such as mean, variance, covariance, skewness, kurtosis, standard deviation, median, quantiles, etc.
x0_median = samples.x0.median()
x1_mean = samples.x1.mean()
x1_std = samples.x1.std()
x2_min = samples.x2.min()
x2_95percentile = samples.x2.quantile(q=0.95)
fig, axes = make_2d_axes(['x0', 'x1', 'x2'], upper=False)
samples.plot_2d(axes, label=None)
axes.axlines({'x0': x0_median}, c='C1', label="median")
axes.axlines({'x1': x1_mean}, c='C2', label="mean")
axes.axspans({'x1': (x1_mean-x1_std, x1_mean+x1_std)}, c='C2', alpha=0.3, label="mean+-std")
axes.axspans({'x2': (x2_min, x2_95percentile)}, c='C3', alpha=0.3, label="95 percentile")
axes.iloc[0, 0].legend(bbox_to_anchor=(1, 1), loc='lower right')
axes.iloc[1, 1].legend(bbox_to_anchor=(1, 1), loc='lower right')
axes.iloc[2, 2].legend(bbox_to_anchor=(1, 1), loc='lower right')
Creating new from existing columns
We can define new parameters with relative ease. For example, given two
parameters x0
and x1
, we can compute the derived parameter
y = x0 * x1
:
samples['y'] = samples['x1'] * samples['x0']
samples.set_label('y', '$y=x_0 \\cdot x_1$')
samples.plot_2d(['x0', 'x1', 'y'])
MCMC statistics
Markov Chain Monte Carlo (short MCMC) samples as the name states come from Markov chains, and as such come with some MCMC specific properties and potential issues, e.g. correlation of successive steps, a burn-in phase, or questions of convergence.
We have an example data set (at the relative path
anesthetic/tests/example_data/
with the file root cb
) that emphasizes
potential MCMC issues. Note, while this was run with Cobaya, we had to actually put in some
effort to make Cobaya produce such a bad burn-in stage. With its usual
optimisation settings it normally produces much better results.
Chains
When MCMC data is read in, anesthetic automatically keeps track of multiple
chains that were run in parallel via the 'chain'
parameter. You can split
the chains into separate samples via the pandas.DataFrame.groupby()
method:
from anesthetic import read_chains, make_2d_axes
mcmc_samples = read_chains("../../tests/example_data/cb")
chains = mcmc_samples.groupby(('chain', '$n_\\mathrm{chain}$'), group_keys=False)
chain1 = chains.get_group(1)
chain2 = chains.get_group(2).reset_index(drop=True)
For this example MCMC run the initial burn-in phase is very apparent, as can be seen in the following two plots.
fig, ax = plt.subplots(figsize=(5, 3))
ax = chain1.x0.plot.line(alpha=0.7, label="Chain 1")
ax = chain2.x0.plot.line(alpha=0.7, label="Chain 2")
ax.set_ylabel(chain1.get_label('x0'))
ax.set_xlabel("sample")
ax.legend()
fig, axes = make_2d_axes(['x0', 'x1'], figsize=(5, 5))
chain1.plot_2d(axes, alpha=0.7, label="Chain 1")
chain2.plot_2d(axes, alpha=0.7, label="Chain 2")
axes.iloc[-1, 0].legend(bbox_to_anchor=(len(axes)/2, len(axes)), loc='lower center', ncol=2)
Remove burn-in
To get rid of the initial burn-in phase, you can use the
anesthetic.samples.MCMCSamples.remove_burn_in()
method:
mcmc_burnout = mcmc_samples.remove_burn_in(burn_in=0.1)
Positive burn_in
values are interpreted as the first samples to
remove, whereas negative burn_in
values are interpreted as the last
samples to keep. You can think of it in the usual python slicing mentality:
samples[burn_in:]
.
If 0 < abs(burn_in) < 1
then it is interpreted as a fraction of the total
number of samples in the respective chain.
To see how remove_burn_in
has removed the burn-in samples in both
chains, see the plot in the following section, alongside an assessment of
convergence.
Gelman–Rubin statistic
Another important issue when it comes to MCMC samples is assessing convergence. In anesthetic we have implemented the modified Gelman–Rubin statistic as described in Antony Lewis (2013). For the underlying (more theoretical) accounts of this statistic, see e.g. Gelman and Rubin (1992) and Brooks and Gelman (1998).
Provided you have an MCMC run containing multiple chains, you can compute the
Gelman–Rubin R-1
statistic using the
anesthetic.samples.MCMCSamples.Gelman_Rubin()
method:
Rminus1_old = mcmc_samples.Gelman_Rubin()
Rminus1_new = mcmc_burnout.Gelman_Rubin()
The following plot shows how remove_burn_in
gets rid of burn-in samples.
Note the stark difference in the Gelman–Rubin statistic, as listed in the
legend, depending on whether burn-in samples were removed or not.
fig, axes = make_2d_axes(['x0', 'x1'], figsize=(5, 5))
mcmc_samples.plot_2d(axes, alpha=0.7, label="Before burn-in removal, $R-1=%.3f$" % Rminus1_old)
mcmc_burnout.plot_2d(axes, alpha=0.7, label="After burn-in removal, $R-1=%.3f$" % Rminus1_new)
axes.iloc[-1, 0].legend(bbox_to_anchor=(len(axes)/2, len(axes)), loc='lower center')
Nested sampling statistics
Anesthetic really comes to the fore for nested sampling (for details on nested sampling we recommend John Skilling, 2006). We can do all of the above and more with the power that nested sampling chains provide.
from anesthetic import read_chains, make_2d_axes
nested_samples = read_chains("../../tests/example_data/pc")
nested_samples['y'] = nested_samples['x1'] * nested_samples['x0']
nested_samples.set_label('y', '$y=x_0 \\cdot x_1$')
Prior distribution
While MCMC explores effectively only the posterior bulk, nested sampling
explores the full parameter space, allowing us to calculate and plot not only
the posterior distribution, but also the prior distribution, which you can get with the anesthetic.samples.NestedSamples.prior()
method:
prior_samples = nested_samples.prior()
Note
Note that the .prior()
method is really just a shorthand for
.set_beta(beta=0)
, i.e. for setting the inverse temperature parameter
beta=0
(where 1/beta=kT
) in the
anesthetic.samples.NestedSamples.set_beta()
method, which allows you to
get the distribution at any temperature.
This allows us to plot both prior and posterior distributions together. Note,
how the prior is also computed for the derived parameter y
:
fig, axes = make_2d_axes(['x0', 'x1', 'y'])
prior_samples.plot_2d(axes, label="prior")
nested_samples.plot_2d(axes, label="posterior")
axes.iloc[-1, 0].legend(bbox_to_anchor=(len(axes)/2, len(axes)), loc='lower center', ncol=2)
Note, how the uniform priors on the parameters x0
and x1
lead to a
non-uniform prior on the derived parameter y
.
Note further the different colour gradient in the posterior contours and the
prior contours. While the iso-probability contour levels are defined by the
amount of probability mass they contain, the colours are assigned according to
the probability density in the contour. As such, the lower probability density
in the posterior tails is reflected in the lighter colour shading of the second
compared to the first contour level. In contrast, the uniform probability
density of the prior distributions of x0
and x1
is reflected in the
similar colour shading of both contour levels.
Bayesian statistics
Thanks to the power of nested sampling, we can compute Bayesian statistics from the nested samples, such as the following:
Bayesian (log-)evidence
anesthetic.samples.NestedSamples.logZ()
Kullback–Leibler (KL) divergence
anesthetic.samples.NestedSamples.D_KL()
Posterior average of the log-likelihood
anesthetic.samples.NestedSamples.logL_P()
(this connects Bayesian evidence with KL-divergence aslogZ = logL_P - D_KL
, allowing the interpretation of the Bayesian evidence as a trade-off between model fitlogL_P
and Occam penaltyD_KL
, see also our paper Hergt, Handley, Hobson, and Lasenby (2021))Gaussian model dimensionality
anesthetic.samples.NestedSamples.d_G()
(for more, see our paper Handley and Lemos (2019))All of the above in one go, using
anesthetic.samples.NestedSamples.stats()
By default (i.e. without passing any additional keywords) the mean values for these quantities are computed:
bayesian_means = nested_samples.stats()
Passing an integer number nsamples
will create a data frame of samples
reflecting the underlying distributions of the Bayesian statistics:
nsamples = 2000
bayesian_stats = nested_samples.stats(nsamples)
Since bayesian_stats
is an instance of anesthetic.samples.Samples
,
the same plotting functions can be used as for the posterior plots above.
Plotting the 2D distributions allows us to inspect the correlation between the
inferences:
fig, axes = make_2d_axes(['logZ', 'D_KL', 'logL_P', 'd_G'], upper=False)
bayesian_stats.plot_2d(axes);
for y, row in axes.iterrows():
for x, ax in row.items():
if x == y:
ax.set_title("%s$ = %.2g \\pm %.1g$"
% (bayesian_stats.get_label(x),
bayesian_stats[x].mean(),
bayesian_stats[x].std()),
fontsize='small')
Nested Sampling GUI
We can also set up an interactive plot, which allows us to replay a nested sampling run after the fact.
nested_samples.gui()