*************************************************************************************
Weighted samples: the ``Samples``, ``MCMCSamples``, and ``NestedSamples`` data frames
*************************************************************************************
The :class:`anesthetic.samples.Samples` data frame at its core is a
:class:`pandas.DataFrame`, and as such comes with all the pandas functionality.
The :class:`anesthetic.samples.MCMCSamples` and
:class:`anesthetic.samples.NestedSamples` data frames share all the
functionality of the more general ``Samples`` class, but come with some
additional MCMC or nested sampling specific functionality.
.. plot:: :context: close-figs
from anesthetic import read_chains, make_2d_axes
samples = read_chains("../../tests/example_data/pc_250")
General weighted sample functionality
=====================================
The important extension to the :class:`pandas.Series` and
:class:`pandas.DataFrame` classes, is that in anesthetic the data frames are
weighted (see :class:`anesthetic.weighted_pandas.WeightedSeries` and
:class:`anesthetic.weighted_pandas.WeightedDataFrame`), where the weights form
part of a :class:`pandas.MultiIndex` and are therefore retained when slicing.
Mean, standard deviation, median, quantiles, etc.
-------------------------------------------------
The weights are automatically taken into account in summary statistics such as
mean, variance, covariance, skewness, kurtosis, standard deviation, median,
quantiles, etc.
.. plot:: :context: close-figs
x0_median = samples.x0.median()
x1_mean = samples.x1.mean()
x1_std = samples.x1.std()
x2_min = samples.x2.min()
x2_95percentile = samples.x2.quantile(q=0.95)
.. plot:: :context: close-figs
fig, axes = make_2d_axes(['x0', 'x1', 'x2'], upper=False)
samples.plot_2d(axes, label=None)
axes.axlines({'x0': x0_median}, c='C1', label="median")
axes.axlines({'x1': x1_mean}, c='C2', label="mean")
axes.axspans({'x1': (x1_mean-x1_std, x1_mean+x1_std)}, c='C2', alpha=0.3, label="mean+-std")
axes.axspans({'x2': (x2_min, x2_95percentile)}, c='C3', alpha=0.3, label="95 percentile")
axes.iloc[0, 0].legend(bbox_to_anchor=(1, 1), loc='lower right')
axes.iloc[1, 1].legend(bbox_to_anchor=(1, 1), loc='lower right')
axes.iloc[2, 2].legend(bbox_to_anchor=(1, 1), loc='lower right')
Creating new from existing columns
-----------------------------------
We can define new parameters with relative ease. For example, given two
parameters ``x0`` and ``x1``, we can compute the derived parameter
``y = x0 * x1``:
.. plot:: :context: close-figs
samples['y'] = samples['x1'] * samples['x0']
samples.set_label('y', '$y=x_0 \\cdot x_1$')
samples.plot_2d(['x0', 'x1', 'y'])
|
MCMC statistics
===============
Markov Chain Monte Carlo (short MCMC) samples as the name states come from
Markov chains, and as such come with some MCMC specific properties and
potential issues, e.g. correlation of successive steps, a burn-in phase, or
questions of convergence.
We have an example data set (at the relative path
``anesthetic/tests/example_data/`` with the file root ``cb``) that emphasizes
potential MCMC issues. Note, while this was run with `Cobaya
`_, we had to actually put in some
effort to make Cobaya produce such a bad burn-in stage. With its usual
optimisation settings it normally produces much better results.
Chains
------
When MCMC data is read in, anesthetic automatically keeps track of multiple
chains that were run in parallel via the ``'chain'`` parameter. You can split
the chains into separate samples via the :meth:`pandas.DataFrame.groupby`
method:
.. plot:: :context: close-figs
from anesthetic import read_chains, make_2d_axes
mcmc_samples = read_chains("../../tests/example_data/cb")
chains = mcmc_samples.groupby(('chain', '$n_\\mathrm{chain}$'), group_keys=False)
chain1 = chains.get_group(1)
chain2 = chains.get_group(2).reset_index(drop=True)
For this example MCMC run the initial burn-in phase is very apparent, as can be
seen in the following two plots.
.. plot:: :context: close-figs
fig, ax = plt.subplots(figsize=(5, 3))
ax = chain1.x0.plot.line(alpha=0.7, label="Chain 1")
ax = chain2.x0.plot.line(alpha=0.7, label="Chain 2")
ax.set_ylabel(chain1.get_label('x0'))
ax.set_xlabel("sample")
ax.legend()
.. plot:: :context: close-figs
fig, axes = make_2d_axes(['x0', 'x1'], figsize=(5, 5))
chain1.plot_2d(axes, alpha=0.7, label="Chain 1")
chain2.plot_2d(axes, alpha=0.7, label="Chain 2")
axes.iloc[-1, 0].legend(bbox_to_anchor=(len(axes)/2, len(axes)), loc='lower center', ncol=2)
Remove burn-in
--------------
To get rid of the initial burn-in phase, you can use the
:meth:`anesthetic.samples.MCMCSamples.remove_burn_in` method:
.. plot:: :context: close-figs
mcmc_burnout = mcmc_samples.remove_burn_in(burn_in=0.1)
Positive ``burn_in`` values are interpreted as the *first* samples to
*remove*, whereas negative ``burn_in`` values are interpreted as the *last*
samples to *keep*. You can think of it in the usual python slicing mentality:
``samples[burn_in:]``.
If ``0 < abs(burn_in) < 1`` then it is interpreted as a fraction of the total
number of samples in the respective chain.
To see how ``remove_burn_in`` has removed the burn-in samples in both
chains, see the plot in the following section, alongside an assessment of
convergence.
Gelman--Rubin statistic
-----------------------
Another important issue when it comes to MCMC samples is assessing convergence.
In anesthetic we have implemented the modified Gelman--Rubin statistic as
described in `Antony Lewis (2013) `_. For the
underlying (more theoretical) accounts of this statistic, see e.g. `Gelman and
Rubin (1992) `_ and `Brooks and Gelman
(1998) `_.
Provided you have an MCMC run containing multiple chains, you can compute the
Gelman--Rubin ``R-1`` statistic using the
:meth:`anesthetic.samples.MCMCSamples.Gelman_Rubin` method:
.. plot:: :context: close-figs
Rminus1_old = mcmc_samples.Gelman_Rubin()
Rminus1_new = mcmc_burnout.Gelman_Rubin()
Rminus1_par = mcmc_burnout.Gelman_Rubin(per_param='par')
You can get the convergence per parameter by passing the keyword
``per_param='par'``. By passing ``per_param='cov'`` you will even get the
covariant part of the convergence of pairs of parameters.
The following plot shows how ``remove_burn_in`` gets rid of burn-in samples.
Note the stark difference in the Gelman--Rubin statistic, as listed in the
legend, depending on whether burn-in samples were removed or not.
.. plot:: :context: close-figs
fig, axes = make_2d_axes(['x0', 'x1'], figsize=(5, 5))
mcmc_samples.plot_2d(axes, alpha=0.7, label="Before burn-in removal, $R-1=%.3f$" % Rminus1_old)
mcmc_burnout.plot_2d(axes, alpha=0.7, label="After burn-in removal, $R-1=%.3f$" % Rminus1_new)
axes.iloc[-1, 0].legend(bbox_to_anchor=(len(axes)/2, len(axes)), loc='lower center')
.. note::
Unless you specify which parameters to compute the Gelman--Rubin statistic
for (by passing the keyword ``params``), anesthetic will use _all_
parameters in the data frame except those containing 'prior', 'chi2', or
'logL' in their name. So if you for example want to exclude derived
parameters, you should pass ``params`` directly.
|
Nested sampling statistics
==========================
Anesthetic really comes to the fore for nested sampling (for details on nested
sampling we recommend `John Skilling, 2006
`_). We can do all of the
above and more with the power that nested sampling chains provide.
.. plot:: :context: close-figs
from anesthetic import read_chains, make_2d_axes
nested_samples = read_chains("../../tests/example_data/pc")
nested_samples['y'] = nested_samples['x1'] * nested_samples['x0']
nested_samples.set_label('y', '$y=x_0 \\cdot x_1$')
Prior distribution
------------------
While MCMC explores effectively only the posterior bulk, nested sampling
explores the full parameter space, allowing us to calculate and plot not only
the posterior distribution, but also the prior distribution, which you can get with the :meth:`anesthetic.samples.NestedSamples.prior` method:
.. plot:: :context: close-figs
prior_samples = nested_samples.prior()
.. note::
Note that the ``.prior()`` method is really just a shorthand for
``.set_beta(beta=0)``, i.e. for setting the inverse temperature parameter
``beta=0`` (where ``1/beta=kT``) in the
:meth:`anesthetic.samples.NestedSamples.set_beta` method, which allows you to
get the distribution at any temperature.
This allows us to plot both prior and posterior distributions together. Note,
how the prior is also computed for the derived parameter ``y``:
.. plot:: :context: close-figs
fig, axes = make_2d_axes(['x0', 'x1', 'y'])
prior_samples.plot_2d(axes, label="prior")
nested_samples.plot_2d(axes, label="posterior")
axes.iloc[-1, 0].legend(bbox_to_anchor=(len(axes)/2, len(axes)), loc='lower center', ncol=2)
Note, how the uniform priors on the parameters ``x0`` and ``x1`` lead to a
non-uniform prior on the derived parameter ``y``.
Note further the different colour gradient in the posterior contours and the
prior contours. While the iso-probability contour levels are defined by the
amount of probability mass they contain, the colours are assigned according to
the probability density in the contour. As such, the lower probability density
in the posterior tails is reflected in the lighter colour shading of the second
compared to the first contour level. In contrast, the uniform probability
density of the prior distributions of ``x0`` and ``x1`` is reflected in the
similar colour shading of both contour levels.
Bayesian statistics
-------------------
.. role:: raw-html(raw)
:format: html
Thanks to the power of nested sampling, we can compute Bayesian statistics from
the nested samples, such as the following:
* Bayesian (log-)evidence :meth:`anesthetic.samples.NestedSamples.logZ`
* Kullback--Leibler (KL) divergence :meth:`anesthetic.samples.NestedSamples.D_KL`
* Posterior average of the log-likelihood
:meth:`anesthetic.samples.NestedSamples.logL_P`
:raw-html:`
`
(this connects Bayesian evidence with KL-divergence as
``logZ = logL_P - D_KL``, allowing the interpretation of the Bayesian
evidence as a trade-off between model fit ``logL_P`` and Occam penalty
``D_KL``, see also our paper `Hergt, Handley, Hobson, and Lasenby (2021)
`_)
* Gaussian model dimensionality :meth:`anesthetic.samples.NestedSamples.d_G`
:raw-html:`
`
(for more, see our paper `Handley and Lemos (2019)
`_)
* All of the above in one go, using :meth:`anesthetic.samples.NestedSamples.stats`
By default (i.e. without passing any additional keywords) the mean values for
these quantities are computed:
.. plot:: :context: close-figs
bayesian_means = nested_samples.stats()
Passing an integer number ``nsamples`` will create a data frame of samples
reflecting the underlying distributions of the Bayesian statistics:
.. plot:: :context: close-figs
nsamples = 2000
bayesian_stats = nested_samples.stats(nsamples)
Since ``bayesian_stats`` is an instance of :class:`anesthetic.samples.Samples`,
the same plotting functions can be used as for the posterior plots above.
Plotting the 2D distributions allows us to inspect the correlation between the
inferences:
.. plot:: :context: close-figs
fig, axes = make_2d_axes(['logZ', 'D_KL', 'logL_P', 'd_G'], upper=False)
bayesian_stats.plot_2d(axes);
for y, row in axes.iterrows():
for x, ax in row.items():
if x == y:
ax.set_title("%s$ = %.2g \\pm %.1g$"
% (bayesian_stats.get_label(x),
bayesian_stats[x].mean(),
bayesian_stats[x].std()),
fontsize='small')
Nested Sampling GUI
-------------------
We can also set up an interactive plot, which allows us to replay a nested
sampling run after the fact.
.. plot:: :context: close-figs
nested_samples.gui()