anesthetic package
Anesthetic: nested sampling post-processing.
anesthetic subpackages
anesthetic modules
anesthetic.boundary module
Boundary correction utilities.
- anesthetic.boundary.cut_and_normalise_gaussian(x, p, bw, xmin=None, xmax=None)[source]
Cut and normalise boundary correction for a Gaussian kernel.
- Parameters:
- x : array-like
locations for normalisation correction
- p : array-like
probability densities for normalisation correction
- bw : float
bandwidth of KDE
- xmin, xmax : float, optional, default=None
lower/upper prior bound
- Returns:
- p : np.array
corrected probabilities
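A minimal sketch of applying this correction to a naive Gaussian KDE near a hard lower bound (the data, grid, and bandwidth choice below are illustrative assumptions, not part of anesthetic):
>>> import numpy as np
>>> from scipy.stats import gaussian_kde
>>> from anesthetic.boundary import cut_and_normalise_gaussian
>>> data = np.abs(np.random.randn(1000))           # samples with a hard boundary at zero
>>> kde = gaussian_kde(data)
>>> x = np.linspace(0, 4, 200)                     # evaluation grid
>>> p = kde(x)                                     # uncorrected probability densities
>>> bw = np.sqrt(kde.covariance[0, 0])             # one reasonable bandwidth estimate
>>> p = cut_and_normalise_gaussian(x, p, bw, xmin=0)   # boundary-corrected densities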
anesthetic.convert module
Tools for converting to other outputs.
- anesthetic.convert.to_getdist(samples)[source]
Convert from anesthetic to getdist samples.
- Parameters:
- samples : anesthetic.samples.Samples
anesthetic samples to be converted
- Returns:
- getdist_samples : getdist.mcsamples.MCSamples
getdist equivalent samples
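A minimal conversion sketch, assuming getdist is installed and using a toy Samples object built from random draws (column names are illustrative):
>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.convert import to_getdist
>>> samples = Samples(np.random.randn(1000, 2), columns=['x0', 'x1'])
>>> getdist_samples = to_getdist(samples)          # getdist.mcsamples.MCSamples
>>> means = getdist_samples.getMeans()             # continue with getdist as usual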
anesthetic.kde module
Kernel density estimation tools.
These act as a wrapper around fastKDE, but could be replaced in future by alternative kernel density estimators
- anesthetic.kde.fastkde_1d(d, xmin=None, xmax=None)[source]
Perform a one-dimensional kernel density estimation.
Wrapper around fastkde.fastKDE. Boundary corrections implemented by reflecting boundary conditions.
- Parameters:
- d : np.array
Data to perform kde on
- xmin, xmax : float, optional, default=None
lower/upper prior bounds
- Returns:
- x : np.array
x-coordinates of kernel density estimates
- p : np.array
kernel density estimates
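A minimal sketch, assuming the optional fastkde dependency is installed (the data are illustrative):
>>> import numpy as np
>>> from anesthetic.kde import fastkde_1d
>>> data = np.abs(np.random.randn(10000))          # samples bounded below at zero
>>> x, p = fastkde_1d(data, xmin=0)                # boundary-corrected density estimate on a grid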
- anesthetic.kde.fastkde_2d(d_x, d_y, xmin=None, xmax=None, ymin=None, ymax=None)[source]
Perform a two-dimensional kernel density estimation.
Wrapper around fastkde.fastKDE. Boundary corrections implemented by reflecting boundary conditions.
- Parameters:
- d_x, d_y : np.array
x/y coordinates of data to perform kde on
- xmin, xmax, ymin, ymax : float, optional, default=None
lower/upper prior bounds in x/y coordinates
- Returns:
- x, y : np.array
x/y-coordinates of kernel density estimates. One-dimensional arrays.
- p : np.array
kernel density estimates. Two-dimensional array.
anesthetic.labelled_pandas module
Pandas DataFrame and Series with labelled columns.
- class anesthetic.labelled_pandas.LabelledDataFrame(*args, **kwargs)[source]
Bases: _LabelledObject, DataFrame
Labelled version of pandas.DataFrame.
- property T
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().
- Parameters:
- *args : tuple, optional
Accepted for compatibility with NumPy.
- copy : bool, default False
Whether to copy the data after transposing, even for DataFrames with a single dtype.
Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write
pd.options.mode.copy_on_write = True
- Returns:
- DataFrame
The transposed DataFrame.
See also
numpy.transpose
Permute the dimensions of a given array.
Notes
Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.
Examples
Square DataFrame with homogeneous dtype
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4
When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:
>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object
Non-square DataFrame with mixed dtypes
>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:
>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
- transpose(copy=False)[source]
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().
- Parameters:
- *args : tuple, optional
Accepted for compatibility with NumPy.
- copy : bool, default False
Whether to copy the data after transposing, even for DataFrames with a single dtype.
Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write
pd.options.mode.copy_on_write = True
- Returns:
- DataFrame
The transposed DataFrame.
See also
numpy.transpose
Permute the dimensions of a given array.
Notes
Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.
Examples
Square DataFrame with homogeneous dtype
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4
When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:
>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object
Non-square DataFrame with mixed dtypes
>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:
>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
- class anesthetic.labelled_pandas.LabelledSeries(*args, **kwargs)[source]
Bases: _LabelledObject, Series
Labelled version of pandas.Series.
- class anesthetic.labelled_pandas._LabelledObject(*args, **kwargs)[source]
Bases: object
Common methods for LabelledSeries and LabelledDataFrame.
- property at
- property loc
- anesthetic.labelled_pandas.ac(funcs, *args)[source]
Accessor function helper.
Given a list of callables funcs, and their arguments *args, evaluate each of these, catching exceptions, and then sort results by their dimensionality, smallest first. Return the non-exceptional result with the smallest dimensionality.
anesthetic.plot module
Lower-level plotting tools.
Routines for users wishing for more fine-grained control, e.g. make_1d_axes() and make_2d_axes() to create a set of axes and legend proxies.
- class anesthetic.plot.AxesDataFrame(data=None, index=None, columns=None, fig=None, lower=True, diagonal=True, upper=True, labels=None, ticks='inner', logx=None, logy=None, gridspec_kw=None, subplot_spec=None, *args, **kwargs)[source]
Bases: DataFrame
Anesthetic's axes version of pandas.DataFrame.
- Parameters:
- index : list(str)
Parameters to be placed on the y-axes.
- columns : list(str)
Parameters to be placed on the x-axes.
- fig : matplotlib.figure.Figure
- lower, diagonal, upper : bool, default=True
Whether to create 2D marginalised plots above or below the diagonal, or to create a 1D marginalised plot on the diagonal.
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- ticks : str, default='inner'
If 'outer', plot ticks only on the very left and very bottom. If 'inner', plot ticks also in inner subplots. If None, plot no ticks at all.
- logx, logy : list(str), optional
Lists of parameters to be plotted on a log scale on the x-axis or y-axis, respectively.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
Methods
axlines:
Add vertical and horizontal lines across all axes.
axspans:
Add vertical and horizontal spans across all axes.
scatter:
Add scatter points across all axes.
set_labels:
Set the labels for the axes.
set_margins:
Set margins across all axes.
tick_params:
Set tick parameters across all axes.
- axlines(params, lower=True, diagonal=True, upper=True, **kwargs)[source]
Add vertical and horizontal lines across all axes.
- Parameters:
- params : dict(array_like)
Dictionary of parameter labels and desired values. Can provide more than one value per label.
- lower, diagonal, upper : bool, default=True
Whether to plot the lines on the lower, diagonal, and/or upper triangle plots.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.axvline() or matplotlib.axes.Axes.axhline().
- axspans(params, lower=True, diagonal=True, upper=True, **kwargs)[source]
Add vertical and horizontal spans across all axes.
- Parameters:
- params : dict(array_like(2-tuple))
Dictionary of parameter labels and desired value tuples. Can provide more than one value tuple per label. Each value tuple provides the min and max value for an axis span.
- lower, diagonal, upper : bool, default=True
Whether to plot the spans on the lower, diagonal, and/or upper triangle plots.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.axvspan() or matplotlib.axes.Axes.axhspan().
- scatter(params, lower=True, upper=True, **kwargs)[source]
Add scatter points across all axes.
- Parameters:
- params : dict(array_like)
Dictionary of parameter labels and desired values. Can provide more than one value per label, but length has to match for all parameter labels.
- lower, upper : bool, default=True
Whether to plot the scatter points on the lower and/or upper triangle plots.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.scatter().
- set_labels(labels, **kwargs)[source]
Set the labels for the axes.
- Parameters:
- labels : dict
Dictionary of the axes labels.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.set_xlabel() or matplotlib.axes.Axes.set_ylabel().
- set_margins(m)[source]
Apply matplotlib.axes.Axes.set_xmargin() across all axes.
- tick_params(*args, **kwargs)[source]
Apply matplotlib.axes.Axes.tick_params() across all axes.
- class anesthetic.plot.AxesSeries(data=None, index=None, fig=None, ncol=None, labels=None, logx=None, gridspec_kw=None, subplot_spec=None, *args, **kwargs)[source]
Bases: Series
Anesthetic's axes version of pandas.Series.
- Parameters:
- index : list(str)
Parameters to be placed on the y-axes.
- fig : matplotlib.figure.Figure
- ncol : int
Number of axes columns. Decides after how many axes the AxesSeries is split to continue in a new row.
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- logx : list(str), optional
List of parameters to be plotted on a log scale.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
Methods
set_xlabels:
Set the labels for the x-axes.
tick_params:
Set tick parameters across all axes.
- static axes_series(index, fig, ncol=None, gridspec_kw=None, subplot_spec=None)[source]
Set up subplots for AxesSeries.
- set_xlabels(labels, **kwargs)[source]
Set the labels for the x-axes.
- Parameters:
- labels : dict
Dictionary of the axes labels.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.set_xlabel().
- tick_params(*args, **kwargs)[source]
Apply matplotlib.axes.Axes.tick_params() across all axes.
- anesthetic.plot.fastkde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot a 2d marginalised distribution as contours.
This functions as a wrapper around matplotlib.axes.Axes.contour() and matplotlib.axes.Axes.contourf() with a kernel density estimation (KDE) computation in-between. All remaining keyword arguments are passed onwards to both functions.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
The x and y coordinates of uniformly weighted samples to generate kernel density estimator.
- levels : list
Amount of mass within each iso-probability contour. Has to be ordered from outermost to innermost contour. Default: [0.95, 0.68]
- xmin, xmax, ymin, ymax : float, default=None
The lower/upper prior bounds in x/y coordinates.
- Returns:
- c : matplotlib.contour.QuadContourSet
A set of contour lines or filled regions.
- anesthetic.plot.fastkde_plot_1d(ax, data, *args, **kwargs)[source]
Plot a 1d marginalised distribution.
This functions as a wrapper around matplotlib.axes.Axes.plot(), with a kernel density estimation (KDE) computation provided by the package fastkde in-between. All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data : np.array
Uniformly weighted samples to generate kernel density estimator.
- xmin, xmax : float, default=None
lower/upper prior bound
- levels : list, optional
Values at which to draw iso-probability lines. Default: [0.95, 0.68]
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- facecolor : bool or string, default=False
If set to True then the 1d plot will be shaded with the value of the color kwarg. Set to a string such as 'blue', 'k', 'r', 'C1' etc. to define the color of the shading directly.
- Returns:
- lines : matplotlib.lines.Line2D
A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).
- anesthetic.plot.hist_plot_1d(ax, data, *args, **kwargs)[source]
Plot a 1d histogram.
This function is a wrapper around matplotlib.axes.Axes.hist(). All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data : np.array
Samples to generate histogram from
- weights : np.array, optional
Sample weights.
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- Returns:
- patches : list or list of lists
Silent list of individual patches used to create the histogram, or list of such lists if multiple input datasets.
- Other Parameters:
- **kwargs
matplotlib.axes.Axes.hist() properties
- anesthetic.plot.hist_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot a 2d marginalised distribution as a histogram.
This functions as a wrapper around matplotlib.axes.Axes.hist2d().
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
The x and y coordinates of uniformly weighted samples to generate a two-dimensional histogram.
- levels : list, default=None
Shade iso-probability contours containing these levels of probability mass. If None defaults to usual matplotlib.axes.Axes.hist2d() colouring.
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- Returns:
- c : matplotlib.collections.QuadMesh
A set of colors.
- anesthetic.plot.kde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot a 2d marginalised distribution as contours.
This functions as a wrapper around matplotlib.axes.Axes.contour() and matplotlib.axes.Axes.contourf() with a kernel density estimation (KDE) computation provided by scipy.stats.gaussian_kde in-between. All remaining keyword arguments are passed onwards to both functions.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
The x and y coordinates of uniformly weighted samples to generate kernel density estimator.
- weights : np.array, optional
Sample weights.
- levels : list, optional
Amount of mass within each iso-probability contour. Has to be ordered from outermost to innermost contour. Default: [0.95, 0.68]
- ncompress : int, str, default='equal'
Degree of compression.
If int: desired number of samples after compression.
If False: no compression.
If True: compresses to the channel capacity, equivalent to ncompress='entropy'.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
- nplot_2d : int, default=1000
Number of plotting points to use.
- bw_method : str, scalar or callable, optional
Forwarded to scipy.stats.gaussian_kde.
- Returns:
- c : matplotlib.contour.QuadContourSet
A set of contour lines or filled regions.
- anesthetic.plot.kde_plot_1d(ax, data, *args, **kwargs)[source]
Plot a 1d marginalised distribution.
This functions as a wrapper around matplotlib.axes.Axes.plot(), with a kernel density estimation computation provided by scipy.stats.gaussian_kde in-between. All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data : np.array
Samples to generate kernel density estimator.
- weights : np.array, optional
Sample weights.
- ncompress : int, str, default=False
Degree of compression.
If False: no compression.
If True: compresses to the channel capacity, equivalent to ncompress='entropy'.
If int: desired number of samples after compression.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
- nplot_1d : int, default=100
Number of plotting points to use.
- levels : list
Values at which to draw iso-probability lines. Default: [0.95, 0.68]
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- facecolor : bool or string, default=False
If set to True then the 1d plot will be shaded with the value of the color kwarg. Set to a string such as 'blue', 'k', 'r', 'C1' etc. to define the color of the shading directly.
- bw_method : str, scalar or callable, optional
Forwarded to scipy.stats.gaussian_kde.
- beta : int, float, default=1
The value of beta used to calculate the number of effective samples
- Returns:
- lines : matplotlib.lines.Line2D
A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).
- anesthetic.plot.make_1d_axes(params, ncol=None, labels=None, logx=None, gridspec_kw=None, subplot_spec=None, **fig_kw)[source]
Create a set of axes for plotting 1D marginalised posteriors.
- Parameters:
- params : list(str)
Names of parameters.
- ncol : int
Number of columns of the subplot grid. Default: ceil(sqrt(num_params))
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- logx : list(str), optional
List of parameters to be plotted on a log scale.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
- **fig_kw
All additional keyword arguments are passed to the matplotlib.pyplot.figure() call. Or directly pass the figure to plot on via the keyword 'fig'.
- Returns:
- fig : matplotlib.figure.Figure
New or original (if supplied) figure object.
- axes : anesthetic.plot.AxesSeries
Pandas array of axes objects.
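A minimal usage sketch (the parameter names and data are illustrative):
>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.plot import make_1d_axes
>>> params = ['x0', 'x1', 'x2']
>>> fig, axes = make_1d_axes(params, ncol=3)
>>> samples = Samples(np.random.randn(1000, 3), columns=params)
>>> axes = samples.plot_1d(axes)                   # fill the pre-made axes
>>> fig.savefig('posterior_1d.png')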
- anesthetic.plot.make_2d_axes(params, labels=None, lower=True, diagonal=True, upper=True, ticks='inner', logx=None, logy=None, gridspec_kw=None, subplot_spec=None, **fig_kw)[source]
Create a set of axes for plotting 2D marginalised posteriors.
- Parameters:
- params : lists of parameters
Can be either:
list(str) if the x and y axes are the same
[list(str), list(str)] if the x and y axes are different
Strings indicate the names of the parameters.
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- lower, diagonal, upper : logical, default=True
Whether to create 2D marginalised plots above or below the diagonal, or to create a 1D marginalised plot on the diagonal.
- ticks : str, default='inner'
Can be one of 'outer', 'inner', or None.
'outer': plot ticks only on the very left and very bottom.
'inner': plot ticks also in inner subplots.
None: plot no ticks at all.
- logx, logy : list(str), optional
Lists of parameters to be plotted on a log scale on the x-axis or y-axis, respectively.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
- **fig_kw
All additional keyword arguments are passed to the matplotlib.pyplot.figure() call. Or directly pass the figure to plot on via the keyword 'fig'.
- Returns:
- fig : matplotlib.figure.Figure
New or original (if supplied) figure object.
- axes : anesthetic.plot.AxesDataFrame
Pandas array of axes objects.
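A minimal corner-plot sketch on pre-made axes (the parameter names and data are illustrative):
>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.plot import make_2d_axes
>>> params = ['x0', 'x1', 'x2']
>>> fig, axes = make_2d_axes(params, upper=False)  # lower triangle and diagonal only
>>> samples = Samples(np.random.randn(1000, 3), columns=params)
>>> axes = samples.plot_2d(axes)
>>> axes.axlines({'x0': 0.0}, c='k', ls='--')      # mark a reference value on every panel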
- anesthetic.plot.normalize_kwargs(kwargs, alias_mapping=None, drop=None)[source]
Normalize kwarg inputs.
Works the same way as matplotlib.cbook.normalize_kwargs(), but additionally allows one to drop kwargs.
- anesthetic.plot.quantile_plot_interval(q)[source]
Interpret quantile q input to quantile plot range tuple.
- anesthetic.plot.scatter_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot samples from a 2d marginalised distribution.
This functions as a wrapper around matplotlib.axes.Axes.plot(), enforcing any prior bounds. All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
x and y coordinates of uniformly weighted samples to plot.
- ncompress : int, str, default='equal'
Degree of compression.
If int: desired number of samples after compression.
If False: no compression.
If True: compresses to the channel capacity, equivalent to ncompress='entropy'.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
- Returns:
- lines : matplotlib.lines.Line2D
A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).
anesthetic.samples module
Main classes for the anesthetic module.
- class anesthetic.samples.MCMCSamples(*args, **kwargs)[source]
Storage and plotting tools for MCMC samples.
Any new functionality specific to MCMC (e.g. convergence criteria etc.) should be put here.
- Parameters:
- data : np.array
Coordinates of samples. shape = (nsamples, ndims).
- columns : array-like
reference names of parameters
- weights : np.array
weights of samples.
- logL : np.array
loglikelihoods of samples.
- labels : dict or array-like
mapping from columns to plotting labels
- label : str
Legend label
- logzero : float, default=-1e30
The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.
- Gelman_Rubin(params=None, per_param=False)[source]
Gelman–Rubin convergence statistic of multiple MCMC chains.
Determine the Gelman–Rubin convergence statistic R-1 by computing and comparing the within-chain variance and the between-chain variance. This follows the routine as outlined in Lewis (2013), section IV.A.
Note that this requires more than one chain. To circumvent this, you could overwrite the 'chain' column, splitting the samples into two or more sets.
- Parameters:
- params : list(str)
List of column names (i.e. parameters) to be included in the convergence calculation. Default: all parameters (except those parameters that contain 'prior', 'chi2', or 'logL' in their names)
- per_param : bool or str, default=False
Whether to return the per-parameter convergence statistic R-1.
If False: returns only the total convergence statistic.
If True: returns the total convergence statistic and the per-parameter convergence statistic.
If 'par': returns only the per-parameter convergence statistic.
If 'cov': returns only the per-parameter covariant convergence statistic.
If 'all': returns the total convergence statistic and the per-parameter covariant convergence statistic.
- Returns:
- Rminus1 : float
Total Gelman–Rubin convergence statistic R-1. The smaller, the better converged. Aiming for Rminus1~0.01 should normally work well.
- Rminus1_par : pandas.DataFrame
Per-parameter Gelman–Rubin convergence statistic.
- Rminus1_cov : pandas.DataFrame
Per-parameter covariant Gelman–Rubin convergence statistic.
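A minimal sketch of the two-chain workaround described above, using toy MCMC samples (the data and the split into two pseudo-chains are purely illustrative):
>>> import numpy as np
>>> from anesthetic.samples import MCMCSamples
>>> samples = MCMCSamples(np.random.randn(2000, 2), columns=['x0', 'x1'])
>>> samples['chain'] = np.where(np.arange(len(samples)) < 1000, 1, 2)   # two pseudo-chains
>>> Rminus1 = samples.Gelman_Rubin(params=['x0', 'x1'])
>>> Rminus1, Rminus1_par = samples.Gelman_Rubin(params=['x0', 'x1'], per_param=True)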
- remove_burn_in(burn_in, reset_index=False, inplace=False)[source]
Remove burn-in samples from each MCMC chain.
- Parameters:
- burn_in : int or float or array_like
Fraction or number of samples to remove or keep:
if 0 < burn_in < 1: remove first fraction of samples
elif 1 < burn_in: remove first number of samples
elif -1 < burn_in < 0: keep last fraction of samples
elif burn_in < -1: keep last number of samples
elif type(burn_in)==list: different burn-in for each chain
- reset_index : bool, default=False
Whether to reset the index counter to start at zero or not.
- inplace : bool, default=False
Indicates whether to modify the existing array or return a copy.
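A short sketch of the burn-in conventions listed above, assuming an MCMCSamples object with a 'chain' column as in the previous sketch:
>>> trimmed = samples.remove_burn_in(0.3)          # drop the first 30% of each chain
>>> trimmed = samples.remove_burn_in(500)          # drop the first 500 samples of each chain
>>> trimmed = samples.remove_burn_in(-0.5)         # keep only the last 50% of each chain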
- class anesthetic.samples.NestedSamples(*args, **kwargs)[source]
Storage and plotting tools for Nested Sampling samples.
We extend the Samples class with the additional methods:
self.live_points(logL)
self.set_beta(beta)
self.prior()
self.posterior_points(beta)
self.prior_points()
self.stats()
self.logZ()
self.D_KL()
self.d()
self.recompute()
self.gui()
self.importance_sample()
- Parameters:
- data : np.array
Coordinates of samples. shape = (nsamples, ndims).
- columns : list(str)
reference names of parameters
- logL : np.array
loglikelihoods of samples.
- logL_birth : np.array or int
birth loglikelihoods, or number of live points.
- labels : dict
optional mapping from column names to plot labels
- label : str
Legend label. default: basename of root
- beta : float
thermodynamic inverse temperature. default: 1.
- logzero : float
The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf. default: -1e30
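A minimal sketch of working with a NestedSamples object, assumed to have been constructed from nested sampling output (e.g. via data, logL and logL_birth; the variable name is illustrative):
>>> stats = nested_samples.stats(nsamples=1000)    # 1000 posterior draws of the summary statistics
>>> logZ_mean, logZ_err = stats['logZ'].mean(), stats['logZ'].std()
>>> prior = nested_samples.set_beta(0)             # beta=0 recovers the prior distribution
>>> logZ_samples = nested_samples.logZ(1000)       # evidence samples alone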
- D_KL(nsamples=None, beta=None)[source]
Kullback–Leibler divergence.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- property beta
Thermodynamic inverse temperature.
- contour(logL=None)[source]
Convert contour from (index or None) to a float loglikelihood.
Convention is that live points are inclusive of the contour.
- Helper function for:
NestedSamples.live_points,
NestedSamples.dead_points,
NestedSamples.truncate.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number. If not provided, return the contour containing the last set of live points.
- Returns:
- logL : float
Loglikelihood of contour
- d_G(nsamples=None, beta=None)[source]
Bayesian model dimensionality.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- dead_points(logL=None)[source]
Get the dead points at a given contour.
Convention is that dead points are exclusive of the contour.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number to return dead points. If not provided, return the last set of dead points.
- Returns:
- dead_points : Samples
- Dead points at either:
contour logL (if input is float)
ith iteration (if input is integer)
last set of dead points if no argument provided
- importance_sample(logL_new, action='add', inplace=False)[source]
Perform importance re-weighting on the log-likelihood.
- Parameters:
- logL_new : np.array
New log-likelihood values. Should have the same shape as logL.
- action : str, default='add'
Can be any of {'add', 'replace', 'mask'}.
add: Add the new logL_new to the current logL.
replace: Replace the current logL with the new logL_new.
mask: treat logL_new as a boolean mask and only keep the corresponding (True) samples.
- inplace : bool, optional
Indicates whether to modify the existing array, or return a new frame with importance sampling applied. default: False
- Returns:
- samples : NestedSamples
Importance re-weighted samples.
- live_points(logL=None)[source]
Get the live points within a contour.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number to return live points. If not provided, return the last set of active live points.
- Returns:
- live_points : Samples
- Live points at either:
contour logL (if input is float)
ith iteration (if input is integer)
last set of live points if no argument provided
- logL_P(nsamples=None, beta=None)[source]
Posterior averaged loglikelihood.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- logX(nsamples=None)[source]
Log-Volume.
The log of the prior volume contained within each iso-likelihood contour.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
- Returns:
- if nsamples is None:
WeightedSeries like self
- elif nsamples is int:
WeightedDataFrame like self, columns range(nsamples)
- logZ(nsamples=None, beta=None)[source]
Log-Evidence.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- logdX(nsamples=None)[source]
Compute volume of shell of loglikelihood.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
- Returns:
- if nsamples is None:
WeightedSeries like self
- elif nsamples is int:
WeightedDataFrame like self, columns range(nsamples)
- logw(nsamples=None, beta=None)[source]
Log-nested sampling weight.
The logarithm of the (unnormalised) sampling weight log(L**beta*dX).
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw and returned (implementation convenience functionality)
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like:
WeightedDataFrame equal to nsamples
- elif beta is scalar and nsamples is None:
WeightedSeries like self
- elif beta is array-like and nsamples is None:
WeightedDataFrame like self, columns of beta
- elif beta is scalar and nsamples is int:
WeightedDataFrame like self, columns of range(nsamples)
- elif beta is array-like and nsamples is int:
WeightedDataFrame like self, MultiIndex columns the product of beta and range(nsamples)
- recompute(logL_birth=None, inplace=False)[source]
Re-calculate the nested sampling contours and live points.
- Parameters:
- logL_birth : array-like or int, optional
array-like: the birth contours.
int: the number of live points.
default: use the existing birth contours to compute nlive
- inplace : bool, default=False
Indicates whether to modify the existing array, or return a new frame with contours resorted and nlive recomputed
- set_beta(beta, inplace=False)[source]
Change the inverse temperature.
- Parameters:
- beta : float
Inverse temperature to set. (beta=0 corresponds to the prior distribution.)
- inplace : bool, default=False
Indicates whether to modify the existing array, or return a copy with the inverse temperature changed.
- stats(nsamples=None, beta=None)[source]
Compute Nested Sampling statistics.
Using nested sampling we can compute:
- logZ: Bayesian evidence
\[\log Z = \int L \pi d\theta\]
- D_KL: Kullback–Leibler divergence
\[D_{KL} = \int P \log(P / \pi) d\theta\]
- logL_P: posterior averaged log-likelihood
\[\langle\log L\rangle_P = \int P \log L d\theta\]
- d_G: Gaussian model dimensionality (or posterior variance of the log-likelihood)
\[d_G/2 = \langle(\log L)^2\rangle_P - \langle\log L\rangle_P^2\]
see Handley and Lemos (2019) for more details on model dimensionalities.
(Note that all of these are available as individual functions with the same signature.)
In addition to point estimates nested sampling provides an error bar or more generally samples from a (correlated) distribution over the variables. Samples from this distribution can be computed by providing an integer nsamples.
Nested sampling as an athermal algorithm is also capable of producing these as a function of inverse thermodynamic temperature beta. This is provided as a vectorised function. If nsamples is also provided a MultiIndex dataframe is generated.
These obey Occam's razor equation:
\[\log Z = \langle\log L\rangle_P - D_{KL},\]
which splits a model's quality logZ into a goodness-of-fit logL_P and a complexity penalty D_KL. See Hergt et al. (2021) for more detail.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if beta is scalar and nsamples is None:
Series, index ['logZ', 'd_G', 'D_KL', 'logL_P']
- elif beta is scalar and nsamples is int:
Samples, index range(nsamples), columns ['logZ', 'd_G', 'D_KL', 'logL_P']
- elif beta is array-like and nsamples is None:
Samples, index beta, columns ['logZ', 'd_G', 'D_KL', 'logL_P']
- elif beta is array-like and nsamples is int:
Samples, index pandas.MultiIndex the product of beta and range(nsamples), columns ['logZ', 'd_G', 'D_KL', 'logL_P']
- truncate(logL=None)[source]
Truncate the run at a given contour.
Returns the union of the live_points and dead_points.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number to truncate run. If not provided, truncate at the last set of dead points.
- Returns:
- truncated_run : NestedSamples
- Run truncated at either:
contour logL (if input is float)
ith iteration (if input is integer)
last set of dead points if no argument provided
- class anesthetic.samples.Samples(*args, **kwargs)[source]
Storage and plotting tools for general samples.
Extends the pandas.DataFrame by providing plotting methods and standardising sample storage.
- Example plotting commands include:
samples.plot_1d(['paramA', 'paramB'])
samples.plot_2d(['paramA', 'paramB'])
samples.plot_2d([['paramA', 'paramB'], ['paramC', 'paramD']])
- Parameters:
- data : np.array
Coordinates of samples. shape = (nsamples, ndims).
- columns : list(str)
reference names of parameters
- weights : np.array
weights of samples.
- logL : np.array
loglikelihoods of samples.
- labels : dict or array-like
mapping from columns to plotting labels
- label : str
Legend label
- logzero : float, default=-1e30
The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.
- importance_sample(logL_new, action='add', inplace=False)[source]
Perform importance re-weighting on the log-likelihood.
- Parameters:
- logL_new : np.array
New log-likelihood values. Should have the same shape as logL.
- action : str, default='add'
Can be any of {'add', 'replace', 'mask'}.
add: Add the new logL_new to the current logL.
replace: Replace the current logL with the new logL_new.
mask: treat logL_new as a boolean mask and only keep the corresponding (True) samples.
- inplace : bool, default=False
Indicates whether to modify the existing array, or return a new frame with importance sampling applied.
- Returns:
- samples : Samples/MCMCSamples/NestedSamples
Importance re-weighted samples.
- plot_1d(axes=None, *args, **kwargs)[source]
Create an array of 1D plots.
- Parameters:
- axes : plotting axes, optional
Can be:
list(str) or str
If a pandas.Series is provided as an existing set of axes, then this is used for creating the plot. Otherwise, a new set of axes is created using the list or lists of strings.
If not provided, then all parameters are plotted. This is intended for plotting a sliced array (e.g. samples[['x0', 'x1']].plot_1d()).
- kind : str, default='kde_1d'
What kind of plots to produce. Alongside the usual pandas options {'hist', 'box', 'kde', 'density'}, anesthetic also provides
'hist_1d': anesthetic.plot.hist_plot_1d()
'kde_1d': anesthetic.plot.kde_plot_1d()
'fastkde_1d': anesthetic.plot.fastkde_plot_1d()
Warning – while the other pandas plotting options {'line', 'bar', 'barh', 'area', 'pie'} are also accessible, these can be hard to interpret/expensive for Samples, MCMCSamples, or NestedSamples.
- logx : list(str), optional
Which parameters/columns to plot on a log scale. Needs to match if plotting on top of a pre-existing axes.
- label : str, optional
Legend label added to each axis.
- Returns:
- axes : pandas.Series of matplotlib.axes.Axes
Pandas array of axes objects
- plot_2d(axes=None, *args, **kwargs)[source]
Create an array of 2D plots.
To avoid interfering with y-axis sharing, one-dimensional plots are created on a separate axis, which is monkey-patched onto the argument ax as the attribute ax.twin.
- Parameters:
- axes : plotting axes, optional
- Can be:
list(str) if the x and y axes are the same
[list(str), list(str)] if the x and y axes are different
If a pandas.DataFrame is provided as an existing set of axes, then this is used for creating the plot. Otherwise, a new set of axes is created using the list or lists of strings.
If not provided, then all parameters are plotted. This is intended for plotting a sliced array (e.g. samples[['x0', 'x1']].plot_2d()). It is not advisable to plot an entire frame, as it is computationally expensive, and liable to run into linear algebra errors for degenerate derived parameters.
- kind/kinds : dict, optional
What kinds of plots to produce. Dictionary takes the keys ‘diagonal’ for the 1D plots and ‘lower’ and ‘upper’ for the 2D plots. The options for ‘diagonal’ are:
‘kde_1d’:
anesthetic.plot.kde_plot_1d()
‘hist_1d’:
anesthetic.plot.hist_plot_1d()
‘fastkde_1d’:
anesthetic.plot.fastkde_plot_1d()
‘kde’:
pandas.Series.plot.kde()
‘hist’:
pandas.Series.plot.hist()
‘box’:
pandas.Series.plot.box()
‘density’:
pandas.Series.plot.density()
The options for ‘lower’ and ‘upper’ are:
‘kde_2d’:
anesthetic.plot.kde_contour_plot_2d()
‘hist_2d’:
anesthetic.plot.hist_plot_2d()
‘scatter_2d’:
anesthetic.plot.scatter_plot_2d()
‘fastkde_2d’:
anesthetic.plot.fastkde_contour_plot_2d()
‘kde’:
pandas.DataFrame.plot.kde()
‘scatter’:
pandas.DataFrame.plot.scatter()
‘hexbin’:
pandas.DataFrame.plot.hexbin()
There are also a set of shortcuts provided in plot_2d_default_kinds:
'kde_1d': 1d kde plots down the diagonal
‘kde_2d’: 2d kde plots in lower triangle
‘kde’: 1d & 2d kde plots in lower & diagonal
‘hist_1d’: 1d histograms down the diagonal
‘hist_2d’: 2d histograms in lower triangle
‘hist’: 1d & 2d histograms in lower & diagonal
‘scatter_2d’: 2d scatter in lower triangle
- ‘scatter’: 1d histograms down diagonal
& 2d scatter in lower triangle
Feel free to add your own to this list! Default: {‘diagonal’: ‘kde_1d’, ‘lower’: ‘kde_2d’, ‘upper’:’scatter_2d’}
- diagonal_kwargs, lower_kwargs, upper_kwargs : dict, optional
kwargs for the diagonal (1D)/lower or upper (2D) plots. This is useful when there is a conflict of kwargs for different kinds of plots. Note that any kwargs directly passed to plot_2d will overwrite any kwarg with the same key passed to *_kwargs. Default: {}
- logx, logy : list(str), optional
Which parameters/columns to plot on a log scale for the x-axis and y-axis, respectively. Needs to match if plotting on top of a pre-existing axes.
- label : str, optional
Legend label added to each axis.
- Returns:
- axes : pandas.DataFrame of matplotlib.axes.Axes
Pandas array of axes objects
- plot_2d_default_kinds = {'default': {'diagonal': 'kde_1d', 'lower': 'kde_2d', 'upper': 'scatter_2d'}, 'fastkde': {'diagonal': 'fastkde_1d', 'lower': 'fastkde_2d'}, 'hist': {'diagonal': 'hist_1d', 'lower': 'hist_2d'}, 'hist_1d': {'diagonal': 'hist_1d'}, 'hist_2d': {'lower': 'hist_2d'}, 'kde': {'diagonal': 'kde_1d', 'lower': 'kde_2d'}, 'kde_1d': {'diagonal': 'kde_1d'}, 'kde_2d': {'lower': 'kde_2d'}, 'scatter': {'diagonal': 'hist_1d', 'lower': 'scatter_2d'}, 'scatter_2d': {'lower': 'scatter_2d'}}
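A short sketch of the kind shortcuts in practice, assuming a Samples object as in the earlier sketches (parameter names are illustrative):
>>> axes = samples.plot_2d(['x0', 'x1'])                     # default: kde_1d, kde_2d, scatter_2d
>>> axes = samples.plot_2d(['x0', 'x1'], kinds='hist')       # 1d and 2d histograms
>>> axes = samples.plot_2d(['x0', 'x1'],
...                        kinds={'diagonal': 'hist_1d', 'lower': 'kde_2d', 'upper': 'scatter_2d'})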
- property tex
- to_hdf(path_or_buf, key, *args, **kwargs)[source]
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
In order to add another pandas.DataFrame or Series to an existing HDF file please use append mode and a different key.
Warning
One can store a subclass of pandas.DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.
For more information see the user guide.
- Parameters:
- path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
- key : str
Identifier for the group in the store.
- mode : {'a', 'w', 'r+'}, default 'a'
Mode to open file:
'w': write, a new file is created (an existing file with the same name would be deleted).
'a': append, an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+': similar to 'a', but the file must already exist.
- complevel : {0-9}, default None
Specifies a compression level for data. A value of 0 or None disables compression.
- complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: 'blosc:blosclz'): {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy', 'blosc:zlib', 'blosc:zstd'}. Specifying a compression library which is not available issues a ValueError.
- append : bool, default False
For Table formats, append the input data to the existing.
- format : {'fixed', 'table', None}, default 'fixed'
Possible values:
'fixed': Fixed format. Fast writing/reading. Not-appendable, nor searchable.
'table': Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.
If None, pd.get_option('io.hdf.default_format') is checked, followed by fallback to "fixed".
- index : bool, default True
Write pandas.DataFrame index as a column.
- min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
- nan_rep : Any, optional
How to represent null values as str. Not allowed with append=True.
- dropna : bool, default False, optional
Remove missing values.
- data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns for more information. Applicable only to format='table'.
- errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled. See the errors argument for open for a full list of options.
- encoding : str, default "UTF-8"
See also
pandas.read_hdf
Read from HDF file.
pandas.DataFrame.to_orc
Write a pandas.DataFrame to the binary orc format.
pandas.DataFrame.to_parquet
Write a pandas.DataFrame to the binary parquet format.
pandas.DataFrame.to_sql
Write to a SQL table.
pandas.DataFrame.to_feather
Write out feather-format for pandas.DataFrames.
pandas.DataFrame.to_csv
Write out to a csv file.
Examples
>>> df = pandas.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                       index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')
We can add another object to the same file:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')
Reading from HDF file:
>>> pandas.read_hdf('data.h5', 'df')
   A  B
a  1  4
b  2  5
c  3  6
>>> pandas.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64
- anesthetic.samples.merge_nested_samples(runs)[source]
Merge one or more nested sampling runs.
- Parameters:
- runs : list(NestedSamples)
List or array-like of one or more nested sampling runs. If only a single run is provided, this recalculates the live points and as such can be used for masked runs.
- Returns:
- samples : NestedSamples
Merged run.
- anesthetic.samples.merge_samples_weighted(samples, weights=None, label=None)[source]
Merge sets of samples with weights.
Combine two (or more) samples so the new PDF is P(x|new) = weight_A P(x|A) + weight_B P(x|B). The number of samples and internal weights do not affect the result.
- Parameters:
- samples : list(NestedSamples) or list(MCMCSamples)
List or array-like of one or more MCMC or nested sampling runs.
- weights : list(double) or None
Weight for each run in samples (normalized internally). Can be omitted if samples are NestedSamples, then exp(logZ) is used as weight.
- label : str or None, default=None
Label for the new samples.
- Returns:
- new_samples : Samples
Merged (weighted) run.
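A minimal sketch of a weighted model average, assuming two NestedSamples runs run_A and run_B for competing models (variable names are illustrative):
>>> from anesthetic.samples import merge_samples_weighted
>>> combined = merge_samples_weighted([run_A, run_B])        # weights default to exp(logZ)
>>> combined = merge_samples_weighted([run_A, run_B], weights=[0.7, 0.3], label='model average')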
anesthetic.scripts module
Command-line scripts for anesthetic.
- anesthetic.scripts.gui(args=None)[source]
Launch the anesthetic GUI.
See anesthetic.gui.plot.RunPlotter for details.
anesthetic.testing module
Anesthetic testing utilities.
anesthetic.utils module
Data-processing utility functions.
- anesthetic.utils.adjust_docstrings(obj, pattern, repl, *args, **kwargs)[source]
Adjust the docstrings of a class using regular expressions.
After the first argument, the remaining arguments are identical to re.sub.
- Parameters:
- cls : class
class to adjust
- pattern : str
regular expression pattern
- repl : str
replacement string
- anesthetic.utils.compress_weights(w, u=None, ncompress=True)[source]
Compresses weights to their approximate channel capacity.
- anesthetic.utils.compute_insertion_indexes(death, birth)[source]
Compute the live point insertion index for each point.
For more detail, see Fowlie et al. (2020)
- Parameters:
- death, birth : array-like
list of birth and death contours
- Returns:
- indexes : np.array
live point index at which each live point was inserted
- anesthetic.utils.compute_nlive(death, birth)[source]
Compute number of live points from birth and death contours.
- Parameters:
- death, birth : array-like
list of birth and death contours
- Returns:
- nlive : np.array
number of live points at each contour
- anesthetic.utils.histogram(a, **kwargs)[source]
Produce a histogram for path-based plotting.
This is a cheap histogram. Necessary if one wants to update the histogram dynamically, and redrawing and filling is very expensive.
This has the same arguments and keywords as numpy.histogram(), but is normalised to 1.
- anesthetic.utils.histogram_bin_edges(samples, weights, bins='fd', range=None, beta='equal')[source]
Compute a good number of bins dynamically from weighted samples.
- Parameters:
- samples : array_like
Input data.
- weights : array-like
Array of sample weights.
- bins : str, default='fd'
String defining the rule used to automatically compute a good number of bins for the weighted samples:
'fd' : Freedman–Diaconis rule (modified for weighted data)
'scott' : Scott's rule (modified for weighted data)
'sqrt' : Square root estimator (modified for weighted data)
- range : (float, float), optional
The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second.
- beta : float, default='equal'
The value of beta>0 used to calculate the number of effective samples via neff().
- Returns:
- bin_edges : array of dtype float
The edges to pass to numpy.histogram().
- anesthetic.utils.insertion_p_value(indexes, nlive, batch=0)[source]
Compute the p-value from insertion indexes, assuming constant nlive.
Note that this function doesn't use scipy.stats.kstest() as the latter assumes continuous distributions.
For more detail, see Fowlie et al. (2020)
For a rolling test, you should provide the optional parameter batch!=0. In this case the test computes the p-value on consecutive batches of size nlive * batch, selects the smallest one and adjusts for multiple comparisons using a Bonferroni correction.
- Parameters:
- indexes : array-like
list of insertion indexes, sorted by death contour
- nlive : int
number of live points
- batch : float
batch size in units of nlive for a rolling p-value
- Returns:
- ks_result : dict
Kolmogorov-Smirnov test results:
D: Kolmogorov-Smirnov statistic
sample_size: sample size
p-value: p-value
if batch != 0:
iterations: bounds of batch with minimum p-value
nbatches: the number of batches in total
uncorrected p-value: p-value without Bonferroni correction
- anesthetic.utils.iso_probability_contours(pdf, contours=[0.95, 0.68])[source]
Compute the iso-probability contour values.
- anesthetic.utils.iso_probability_contours_from_samples(pdf, contours=[0.95, 0.68], weights=None)[source]
Compute the iso-probability contour values.
- anesthetic.utils.logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False)[source]
Compute the log of the sum of exponentials of input elements.
This function has the same call signature as scipy.special.logsumexp() and mirrors scipy's behaviour except for -np.inf input. If a and b are both -inf then scipy's function will output nan whereas here we use:
\[\lim_{x \to -\infty} x \exp(x) = 0\]
Thus, if a=-inf in log(sum(b * exp(a))) then we can set b=0 such that that term is ignored in the sum.
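A small sketch of the convention described above (the arrays are illustrative):
>>> import numpy as np
>>> from anesthetic.utils import logsumexp
>>> a = np.array([-np.inf, 0.0, 1.0])
>>> b = np.array([0.0, 1.0, 1.0])
>>> logsumexp(a, b=b)      # the term with a=-inf and b=0 is safely ignored in the sum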
- anesthetic.utils.match_contour_to_contourf(contours, vmin, vmax)[source]
Get needed vmin, vmax to match contour colors to contourf colors.
contourf uses the arithmetic mean of contour levels to assign colors, whereas contour uses the contour level directly. To get the same colors for contour lines as for contourf faces, we need some fiddly algebra.
- anesthetic.utils.mirror_1d(d, xmin=None, xmax=None)[source]
If necessary apply reflecting boundary conditions.
- anesthetic.utils.mirror_2d(d_x_, d_y_, xmin=None, xmax=None, ymin=None, ymax=None)[source]
If necessary apply reflecting boundary conditions.
- anesthetic.utils.neff(w, beta=1)[source]
Calculate effective number of samples.
Using the Huggins-Roy family of effective samples (https://aakinshin.net/posts/huggins-roy-ess/).
- Parameters:
- beta : int, float, str, default=1
The value of beta used to calculate the number of effective samples according to
\[N_{eff} = \bigg(\sum_{i=0}^n w_i^\beta \bigg)^{\frac{1}{1-\beta}}, \qquad w_i = \frac{w_i}{\sum_j w_j}\]
Beta can take any positive value. Larger beta corresponds to a greater compression such that:
\[\beta_1 < \beta_2 \Rightarrow N_{eff}(\beta_1) > N_{eff}(\beta_2)\]
Alternatively, beta can take one of the following strings as input:
If 'inf' or 'equal' is supplied (equivalent to beta=inf), then the resulting number of samples is the number of samples when compressed to equal weights, and given by:
\[w_i = \frac{w_i}{\sum_j w_j}, \qquad N_{eff} = \frac{1}{\max_i[w_i]}\]
If 'entropy' is supplied (equivalent to beta=1), then the estimate is determined via the entropy based calculation, also referred to as the channel capacity:
\[H = -\sum_i p_i \ln p_i, \qquad p_i = \frac{w_i}{\sum_j w_j}, \qquad N_{eff} = e^{H}\]
If 'kish' is supplied (equivalent to beta=2), then a Kish estimate is computed (Kish, Leslie (1965). Survey Sampling. New York: John Wiley & Sons, Inc. ISBN 0-471-10949-5):
\[N_{eff} = \frac{(\sum_i w_i)^2}{\sum_i w_i^2}\]
str(float) input gets converted to the corresponding float value.
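A quick numerical sketch of how beta controls the compression (the weights are illustrative):
>>> import numpy as np
>>> from anesthetic.utils import neff
>>> w = np.exp(-np.arange(1000) / 100.)            # strongly unequal weights
>>> neff(w, beta=1)                                # channel capacity ('entropy')
>>> neff(w, beta=2)                                # Kish estimate
>>> neff(w, beta='inf')                            # compression to equal weights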
- anesthetic.utils.quantile(a, q, w=None, interpolation='linear')[source]
Compute the weighted quantile for a one dimensional array.
- anesthetic.utils.sample_compression_1d(x, w=None, ncompress=True)[source]
Histogram a 1D set of weighted samples via subsampling.
This compresses the number of samples, combining weights.
- Parameters:
- x : array-like
x coordinate of samples for compressing
- w : pandas.Series, optional
weights of samples
- ncompress : int, default=True
Degree of compression.
If int: number of samples returned.
If True: compresses to the channel capacity (same as ncompress='entropy').
If False: no compression.
If str: determine number from the Huggins-Roy family of effective samples in neff() with beta=ncompress.
- Returns:
- x, w: array-like
Compressed samples and weights
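A minimal sketch of compressing weighted samples before histogramming (the data and weights are illustrative):
>>> import numpy as np
>>> from anesthetic.utils import sample_compression_1d
>>> x = np.random.randn(100000)
>>> w = np.random.rand(100000)                     # arbitrary sample weights
>>> x_c, w_c = sample_compression_1d(x, w, ncompress=1000)   # compress to roughly 1000 points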
- anesthetic.utils.scaled_triangulation(x, y, cov)[source]
Triangulation scaled by a covariance matrix.
- Parameters:
- x, y : array-like
x and y coordinates of samples
- cov : array-like, 2d
Covariance matrix for scaling
- Returns:
- matplotlib.tri.Triangulation
Triangulation with the appropriate scaling
- anesthetic.utils.triangular_sample_compression_2d(x, y, cov, w=None, n=1000)[source]
Histogram a 2D set of weighted samples via triangulation.
This defines bins via a triangulation of the subsamples and sums weights within triangles surrounding each point
- Parameters:
- x, yarray-like
x and y coordinates of samples for compressing
- covarray-like, 2d
Covariance matrix for scaling
- w
pandas.Series
, optional weights of samples
- nint, default=1000
number of samples returned.
- Returns:
- trimatplotlib.tri.Triangulation
with an appropriate scaling
- warray-like
Compressed samples and weights
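Assumed usage, compressing a weighted 2D sample onto roughly 1000 triangulated points:
import numpy as np
from anesthetic.utils import triangular_sample_compression_2d

np.random.seed(0)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
x, y = np.random.multivariate_normal([0, 0], cov, 50_000).T
w = np.random.rand(50_000)
tri, w_c = triangular_sample_compression_2d(x, y, cov, w, n=1000)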
anesthetic.weighted_labelled_pandas module
Pandas DataFrame with weights and labels.
- class anesthetic.weighted_labelled_pandas.WeightedLabelledDataFrame(*args, **kwargs)[source]
Bases: WeightedDataFrame, LabelledDataFrame
pandas.DataFrame with weights and labels.
- class anesthetic.weighted_labelled_pandas.WeightedLabelledSeries(*args, **kwargs)[source]
Bases: WeightedSeries, LabelledSeries
Series with weights and labels.
anesthetic.weighted_pandas module
Pandas DataFrame and Series with weighted samples.
- class anesthetic.weighted_pandas.WeightedDataFrame(*args, **kwargs)[source]
Weighted version of pandas.DataFrame.
- compress(ncompress=True, axis=0)[source]
Reduce the number of samples by discarding low-weights.
- Parameters:
- ncompressint, str, default=True
Degree of compression.
If True (default): reduce to the channel capacity (theoretical optimum compression), equivalent to ncompress='entropy'.
If > 0: desired number of samples after compression.
If <= 0: compress so that all remaining weights are unity.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
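A hedged sketch of the ncompress options; this assumes the WeightedDataFrame constructor accepts a weights keyword:
import numpy as np
from anesthetic.weighted_pandas import WeightedDataFrame

np.random.seed(0)
data = np.random.randn(10_000, 2)
weights = np.random.rand(10_000)
df = WeightedDataFrame(data, columns=['x', 'y'], weights=weights)  # weights kwarg assumed

df_entropy = df.compress()      # True (default): channel capacity
df_1000 = df.compress(1000)     # > 0: roughly 1000 samples
df_unit = df.compress(-1)       # <= 0: all remaining weights are unity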
- corr(method='pearson', skipna=True, *args, **kwargs)[source]
Compute pairwise correlation of columns, excluding NA/null values.
- Parameters:
- method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
- min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedDataFrame
Correlation matrix.
See also
WeightedDataFrame.corrwith
Compute pairwise correlation with another WeightedDataFrame or WeightedSeries.
WeightedSeries.corr
Compute the correlation between two WeightedSeries.
Notes
Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
Examples
>>> def histogram_intersection(a, b): ... v = np.minimum(a, b).sum().round(decimals=1) ... return v >>> df = pd.WeightedDataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], ... columns=['dogs', 'cats']) >>> df.corr(method=histogram_intersection) dogs cats dogs 1.0 0.3 cats 0.3 1.0
>>> df = pd.WeightedDataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], ... columns=['dogs', 'cats']) >>> df.corr(min_periods=3) dogs cats dogs 1.0 NaN cats NaN 1.0
- corrwith(other, axis=0, drop=False, method='pearson', *args, **kwargs)[source]
Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of WeightedDataFrame with rows or columns of WeightedSeries or WeightedDataFrame. WeightedDataFrames are first aligned along both axes before computing the correlations.
- Parameters:
- otherWeightedDataFrame, WeightedSeries
Object with which to compute correlations.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.
- dropbool, default False
Drop missing indices from result.
- method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays
and returning a float.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedSeries
Pairwise correlations.
See also
WeightedDataFrame.corr
Compute pairwise correlation of columns.
Examples
>>> index = ["a", "b", "c", "d", "e"] >>> columns = ["one", "two", "three", "four"] >>> df1 = pd.WeightedDataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns) >>> df2 = pd.WeightedDataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns) >>> df1.corrwith(df2) one 1.0 two 1.0 three 1.0 four 1.0 dtype: float64
>>> df2.corrwith(df1, axis=1) a 1.0 b 1.0 c 1.0 d 1.0 e NaN dtype: float64
- cov(*args, **kwargs)[source]
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a WeightedDataFrame. The returned data frame is the covariance matrix of the columns of the WeightedDataFrame.
Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to understand the relationship between different measures across time.
- Parameters:
- min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no nan is in the dataframe.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedDataFrame
The covariance matrix of the series of the WeightedDataFrame.
See also
WeightedSeries.cov
Compute covariance with another WeightedSeries.
pandas.core.window.ewm.ExponentialMovingWindow.cov
Exponential weighted sample covariance.
pandas.core.window.expanding.Expanding.cov
Expanding sample covariance.
pandas.core.window.rolling.Rolling.cov
Rolling sample covariance.
Notes
Returns the covariance matrix of the WeightedDataFrame’s time series. The covariance is normalized by N-ddof.
For WeightedDataFrames that have WeightedSeries that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member WeightedSeries.
However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
Examples
>>> df = pd.WeightedDataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], ... columns=['dogs', 'cats']) >>> df.cov() dogs cats dogs 0.666667 -1.000000 cats -1.000000 1.666667
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(1000, 5), ... columns=['a', 'b', 'c', 'd', 'e']) >>> df.cov() a b c d e a 0.998438 -0.020161 0.059277 -0.008943 0.014144 b -0.020161 1.059352 -0.008543 -0.024738 0.009826 c 0.059277 -0.008543 1.010670 -0.001486 -0.000271 d -0.008943 -0.024738 -0.001486 0.921297 -0.013692 e 0.014144 0.009826 -0.000271 -0.013692 0.977795
Minimum number of periods
This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(20, 3), ... columns=['a', 'b', 'c']) >>> df.loc[df.index[:5], 'a'] = np.nan >>> df.loc[df.index[5:10], 'b'] = np.nan >>> df.cov(min_periods=12) a b c a 0.316741 NaN -0.150812 b NaN 1.248003 0.191417 c -0.150812 0.191417 0.895202
- groupby(by=None, axis=_NoDefault.no_default, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]
Group WeightedDataFrame using a mapper or by a WeightedSeries of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters:
- bymapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or WeightedSeries is passed, the WeightedSeries or dict VALUES will be used to determine the groups (the WeightedSeries’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For WeightedSeries this parameter is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
- as_indexbool, default True
Return object with group labels as the index. Only relevant for WeightedDataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
- sortbool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original WeightedDataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
- group_keysbool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed WeightedSeries or WeightedDataFrame. Specify group_keys explicitly to include the group keys or not.
Changed in version 2.0.0: group_keys now defaults to True.
- observedbool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
- dropnabool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
- Returns:
- pandas.api.typing.WeightedDataFrameGroupBy
Returns a groupby object that contains information about the groups.
See also
pandas.DataFrame.resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.
Examples
>>> df = pd.WeightedDataFrame({'Animal': ['Falcon', 'Falcon', ... 'Parrot', 'Parrot'], ... 'Max Speed': [380., 370., 24., 26.]}) >>> df Animal Max Speed 0 Falcon 380.0 1 Falcon 370.0 2 Parrot 24.0 3 Parrot 26.0 >>> df.groupby(['Animal']).mean() Max Speed Animal Falcon 375.0 Parrot 25.0
Hierarchical Indexes
We can groupby different levels of a hierarchical index using the level parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... ['Captive', 'Wild', 'Captive', 'Wild']] >>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) >>> df = pd.WeightedDataFrame({'Max Speed': [390., 350., 30., 20.]}, ... index=index) >>> df Max Speed Animal Type Falcon Captive 390.0 Wild 350.0 Parrot Captive 30.0 Wild 20.0 >>> df.groupby(level=0).mean() Max Speed Animal Falcon 370.0 Parrot 25.0 >>> df.groupby(level="Type").mean() Max Speed Type Captive 210.0 Wild 185.0
We can also choose to include NA in group keys or not by setting dropna parameter, the default setting is True.
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]] >>> df = pd.WeightedDataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum() a c b 1.0 2 3 2.0 2 5
>>> df.groupby(by=["b"], dropna=False).sum() a c b 1.0 2 3 2.0 2 5 NaN 1 4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]] >>> df = pd.WeightedDataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum() b c a a 13.0 13.0 b 12.3 123.0
>>> df.groupby(by="a", dropna=False).sum() b c a a 13.0 13.0 b 12.3 123.0 NaN 12.3 33.0
When using .apply(), use group_keys to include or exclude the group keys. The group_keys argument defaults to True (include).
>>> df = pd.WeightedDataFrame({'Animal': ['Falcon', 'Falcon', ... 'Parrot', 'Parrot'], ... 'Max Speed': [380., 370., 24., 26.]}) >>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x) Max Speed Animal Falcon 0 380.0 1 370.0 Parrot 2 24.0 3 26.0
>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x) Max Speed 0 380.0 1 370.0 2 24.0 3 26.0
- kurt(axis=0, skipna=True, *args, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- kurtosis(*args, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- mean(axis=0, skipna=True, *args, **kwargs)[source]
Return the mean of the values over the requested axis.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.mean() 2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra']) >>> df a b tiger 1 2 zebra 2 3 >>> df.mean() a 1.5 b 2.5 dtype: float64
Using axis=1
>>> df.mean(axis=1) tiger 1.5 zebra 2.5 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']}, ... index=['tiger', 'zebra']) >>> df.mean(numeric_only=True) a 1.5 dtype: float64
- median(*args, **kwargs)[source]
Return the median of the values over the requested axis.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.median() 2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra']) >>> df a b tiger 1 2 zebra 2 3 >>> df.median() a 1.5 b 2.5 dtype: float64
Using axis=1
>>> df.median(axis=1) tiger 1.5 zebra 2.5 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']}, ... index=['tiger', 'zebra']) >>> df.median(numeric_only=True) a 1.5 dtype: float64
- quantile(q=0.5, axis=0, numeric_only=None, interpolation='linear', method=None)[source]
Return values at the given quantile over requested axis.
- Parameters:
- qfloat or array-like, default 0.5 (50% quantile)
Value between 0 <= q <= 1, the quantile(s) to compute.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
- numeric_onlybool, default False
Include only float, int or boolean data.
Changed in version 2.0.0: The default value of numeric_only is now False.
- interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
- method{‘single’, ‘table’}, default ‘single’
Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). When ‘table’, the only allowed interpolation methods are ‘nearest’, ‘lower’, and ‘higher’.
- Returns:
- WeightedSeries or WeightedDataFrame
If q is an array, a WeightedDataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles.
If q is a float, a WeightedSeries will be returned where the index is the columns of self and the values are the quantiles.
See also
pandas.core.window.rolling.Rolling.quantile
Rolling quantile.
numpy.percentile
Numpy function to compute the percentile.
Examples
>>> df = pd.WeightedDataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), ... columns=['a', 'b']) >>> df.quantile(.1) a 1.3 b 3.7 Name: 0.1, dtype: float64 >>> df.quantile([.1, .5]) a b 0.1 1.3 3.7 0.5 2.5 55.0
Specifying method=’table’ will compute the quantile over all columns.
>>> df.quantile(.1, method="table", interpolation="nearest") a 1 b 1 Name: 0.1, dtype: int64 >>> df.quantile([.1, .5], method="table", interpolation="nearest") a b 0.1 1 1 0.5 3 100
Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.
>>> df = pd.WeightedDataFrame({'A': [1, 2], ... 'B': [pd.Timestamp('2010'), ... pd.Timestamp('2011')], ... 'C': [pd.Timedelta('1 days'), ... pd.Timedelta('2 days')]}) >>> df.quantile(0.5, numeric_only=False) A 1.5 B 2010-07-02 12:00:00 C 1 days 12:00:00 Name: 0.5, dtype: object
- sample(*args, **kwargs)[source]
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightsstr or ndarray-like, optional
Default ‘None’ results in equal probability weighting. If passed a WeightedSeries, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a WeightedDataFrame, will accept the name of a column when axis = 0. Unless weights are a WeightedSeries, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For WeightedSeries this parameter is unused and defaults to None.
- ignore_indexbool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
New in version 1.3.0.
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing n items randomly sampled from the caller object.
See also
WeightedDataFrameGroupBy.sample
Generates random samples from each group of a WeightedDataFrame object.
WeightedSeriesGroupBy.sample
Generates random samples from each group of a WeightedSeries object.
numpy.random.choice
Generates a random sample from a given 1-D numpy array.
Notes
If frac > 1, replacement should be set to True.
Examples
>>> df = pd.WeightedDataFrame({'num_legs': [2, 4, 8, 0], ... 'num_wings': [2, 0, 0, 0], ... 'num_specimen_seen': [10, 2, 1, 8]}, ... index=['falcon', 'dog', 'spider', 'fish']) >>> df num_legs num_wings num_specimen_seen falcon 2 2 10 dog 4 0 2 spider 8 0 1 fish 0 0 8
Extract 3 random elements from the WeightedSeries df['num_legs']. Note that we use random_state to ensure the reproducibility of the examples.
>>> df['num_legs'].sample(n=3, random_state=1) fish 0 spider 8 falcon 2 Name: num_legs, dtype: int64
A random 50% sample of the WeightedDataFrame with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1) num_legs num_wings num_specimen_seen dog 4 0 2 fish 0 0 8
An upsample sample of the WeightedDataFrame with replacement. Note that the replace parameter has to be True for a frac parameter > 1.
>>> df.sample(frac=2, replace=True, random_state=1) num_legs num_wings num_specimen_seen dog 4 0 2 fish 0 0 8 falcon 2 2 10 falcon 2 2 10 fish 0 0 8 dog 4 0 2 fish 0 0 8 dog 4 0 2
Using a WeightedDataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1) num_legs num_wings num_specimen_seen falcon 2 2 10 fish 0 0 8
- sem(axis=0, skipna=True)[source]
Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0), columns (1)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- WeightedSeries or WeightedDataFrame (if level specified)
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.sem().round(6) 0.57735
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra']) >>> df a b tiger 1 2 zebra 2 3 >>> df.sem() a 0.5 b 0.5 dtype: float64
Using axis=1
>>> df.sem(axis=1) tiger 0.5 zebra 0.5 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']}, ... index=['tiger', 'zebra']) >>> df.sem(numeric_only=True) a 0.5 dtype: float64
- skew(axis=0, skipna=True, *args, **kwargs)[source]
Return unbiased skew over requested axis.
Normalized by N-1.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.skew() 0.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]}, ... index=['tiger', 'zebra', 'cow']) >>> df a b c tiger 1 2 1 zebra 2 3 3 cow 3 4 5 >>> df.skew() a 0.0 b 0.0 c 0.0 dtype: float64
Using axis=1
>>> df.skew(axis=1) tiger 1.732051 zebra -1.732051 cow 0.000000 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']}, ... index=['tiger', 'zebra', 'cow']) >>> df.skew(numeric_only=True) a 0.0 dtype: float64
- std(*args, **kwargs)[source]
Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0), columns (1)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- WeightedSeries or WeightedDataFrame (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3], ... 'age': [21, 25, 62, 43], ... 'height': [1.61, 1.87, 1.49, 2.01]} ... ).set_index('person_id') >>> df age height person_id 0 21 1.61 1 25 1.87 2 62 1.49 3 43 2.01
The standard deviation of the columns can be found as follows:
>>> df.std() age 18.786076 height 0.237417 dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.std(ddof=0) age 16.269219 height 0.205609 dtype: float64
- var(axis=0, skipna=True, *args, **kwargs)[source]
Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0), columns (1)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- WeightedSeries or WeightedDataFrame (if level specified)
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3], ... 'age': [21, 25, 62, 43], ... 'height': [1.61, 1.87, 1.49, 2.01]} ... ).set_index('person_id') >>> df age height person_id 0 21 1.61 1 25 1.87 2 62 1.49 3 43 2.01
>>> df.var() age 352.916667 height 0.056367 dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.var(ddof=0) age 264.687500 height 0.042275 dtype: float64
- class anesthetic.weighted_pandas.WeightedDataFrameGroupBy(*args, **kwargs)[source]
Weighted version of pandas.core.groupby.DataFrameGroupBy.
- cov(*args, **kwargs)[source]
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a WeightedDataFrame. The returned data frame is the covariance matrix of the columns of the WeightedDataFrame.
Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to understand the relationship between different measures across time.
- Parameters:
- min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no nan is in the dataframe.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedDataFrame
The covariance matrix of the series of the WeightedDataFrame.
See also
WeightedSeries.cov
Compute covariance with another WeightedSeries.
pandas.core.window.ewm.ExponentialMovingWindow.cov
Exponential weighted sample covariance.
pandas.core.window.expanding.Expanding.cov
Expanding sample covariance.
pandas.core.window.rolling.Rolling.cov
Rolling sample covariance.
Notes
Returns the covariance matrix of the WeightedDataFrame’s time series. The covariance is normalized by N-ddof.
For WeightedDataFrames that have WeightedSeries that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member WeightedSeries.
However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
Examples
>>> df = pd.WeightedDataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], ... columns=['dogs', 'cats']) >>> df.cov() dogs cats dogs 0.666667 -1.000000 cats -1.000000 1.666667
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(1000, 5), ... columns=['a', 'b', 'c', 'd', 'e']) >>> df.cov() a b c d e a 0.998438 -0.020161 0.059277 -0.008943 0.014144 b -0.020161 1.059352 -0.008543 -0.024738 0.009826 c 0.059277 -0.008543 1.010670 -0.001486 -0.000271 d -0.008943 -0.024738 -0.001486 0.921297 -0.013692 e 0.014144 0.009826 -0.000271 -0.013692 0.977795
Minimum number of periods
This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(20, 3), ... columns=['a', 'b', 'c']) >>> df.loc[df.index[:5], 'a'] = np.nan >>> df.loc[df.index[5:10], 'b'] = np.nan >>> df.cov(min_periods=12) a b c a 0.316741 NaN -0.150812 b NaN 1.248003 0.191417 c -0.150812 0.191417 0.895202
- sample(*args, **kwargs)[source]
Return a random sample of items from each group.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.
- fracfloat, optional
Fraction of items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightslist-like, optional
Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying WeightedDataFrame or WeightedSeries object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing items randomly sampled within each group from the caller object.
See also
WeightedDataFrame.sample
Generate random samples from a WeightedDataFrame object.
numpy.random.choice
Generate a random sample from a given 1-D numpy array.
Examples
>>> df = pd.WeightedDataFrame( ... {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)} ... ) >>> df a b 0 red 0 1 red 1 2 blue 2 3 blue 3 4 black 4 5 black 5
Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:
>>> df.groupby("a").sample(n=1, random_state=1) a b 4 black 4 2 blue 2 1 red 1
Set frac to sample fixed proportions rather than counts:
>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2) 5 5 2 2 0 0 Name: b, dtype: int64
Control sample probabilities within groups by setting weights:
>>> df.groupby("a").sample( ... n=1, ... weights=[1, 1, 1, 0, 0, 1], ... random_state=1, ... ) a b 5 black 5 2 blue 2 0 red 0
- class anesthetic.weighted_pandas.WeightedGroupBy(*args, **kwargs)[source]
Weighted version of pandas.core.groupby.GroupBy.
- mean(*args, **kwargs)[source]
Compute mean of groups, excluding missing values.
- Parameters:
- numeric_onlybool, default False
Include only float, int, boolean columns.
Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.
- enginestr, default None
‘cython’: Runs the operation through C-extensions from cython.
‘numba’: Runs the operation through JIT compiled code from numba.
None: Defaults to ‘cython’ or globally setting compute.use_numba
New in version 1.4.0.
- engine_kwargsdict, default None
For ‘cython’ engine, there are no accepted engine_kwargs
For ‘numba’ engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the ‘numba’ engine is {'nopython': True, 'nogil': False, 'parallel': False}
New in version 1.4.0.
- Returns:
- pandas.WeightedSeries or pandas.WeightedDataFrame
See also
WeightedSeries.groupby
Apply a function groupby to a WeightedSeries.
WeightedDataFrame.groupby
Apply a function groupby to each row or column of a WeightedDataFrame.
Examples
>>> df = pd.WeightedDataFrame({'A': [1, 1, 2, 1, 2], ... 'B': [np.nan, 2, 3, 4, 5], ... 'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])
Groupby one column and return the mean of the remaining columns in each group.
>>> df.groupby('A').mean() B C A 1 3.0 1.333333 2 4.0 1.500000
Groupby two columns and return the mean of the remaining column.
>>> df.groupby(['A', 'B']).mean() C A B 1 2.0 2.0 4.0 1.0 2 3.0 1.0 5.0 2.0
Groupby one column and return the mean of only particular column in the group.
>>> df.groupby('A')['B'].mean() A 1 3.0 2 4.0 Name: B, dtype: float64
- median(*args, **kwargs)[source]
Compute median of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex
- Parameters:
- numeric_onlybool, default False
Include only float, int, boolean columns.
Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Median of values within each group.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b'] >>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst) >>> ser a 7 a 2 a 8 b 4 b 3 b 3 dtype: int64 >>> ser.groupby(level=0).median() a 7.0 b 3.0 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]} >>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog', ... 'mouse', 'mouse', 'mouse', 'mouse']) >>> df a b dog 1 1 dog 3 4 dog 5 8 mouse 7 4 mouse 7 4 mouse 8 2 mouse 3 1 >>> df.groupby(level=0).median() a b dog 3.0 4.0 mouse 7.0 3.0
For Resampler:
>>> ser = pd.WeightedSeries([1, 2, 3, 3, 4, 5], ... index=pd.DatetimeIndex(['2023-01-01', ... '2023-01-10', ... '2023-01-15', ... '2023-02-01', ... '2023-02-10', ... '2023-02-15'])) >>> ser.resample('MS').median() 2023-01-01 2.0 2023-02-01 4.0 Freq: MS, dtype: float64
- quantile(*args, **kwargs)[source]
Return group values at the given quantile, a la numpy.percentile.
- Parameters:
- qfloat or array-like, default 0.5 (50% quantile)
Value(s) between 0 and 1 providing the quantile(s) to compute.
- interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
Method to use when the desired quantile falls between two points.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Return type determined by caller of GroupBy object.
See also
WeightedSeries.quantile
Similar method for WeightedSeries.
WeightedDataFrame.quantile
Similar method for WeightedDataFrame.
numpy.percentile
NumPy method to compute qth percentile.
Examples
>>> df = pd.WeightedDataFrame([ ... ['a', 1], ['a', 2], ['a', 3], ... ['b', 1], ['b', 3], ['b', 5] ... ], columns=['key', 'val']) >>> df.groupby('key').quantile() val key a 2.0 b 3.0
- sem(*args, **kwargs)[source]
Compute standard error of the mean of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
- ddofint, default 1
Degrees of freedom.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Standard error of the mean of values within each group.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'b', 'b'] >>> ser = pd.WeightedSeries([5, 10, 8, 14], index=lst) >>> ser a 5 a 10 b 8 b 14 dtype: int64 >>> ser.groupby(level=0).sem() a 2.5 b 3.0 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = [[1, 12, 11], [1, 15, 2], [2, 5, 8], [2, 6, 12]] >>> df = pd.WeightedDataFrame(data, columns=["a", "b", "c"], ... index=["tuna", "salmon", "catfish", "goldfish"]) >>> df a b c tuna 1 12 11 salmon 1 15 2 catfish 2 5 8 goldfish 2 6 12 >>> df.groupby("a").sem() b c a 1 1.5 4.5 2 0.5 2.0
For Resampler:
>>> ser = pd.WeightedSeries([1, 3, 2, 4, 3, 8], ... index=pd.DatetimeIndex(['2023-01-01', ... '2023-01-10', ... '2023-01-15', ... '2023-02-01', ... '2023-02-10', ... '2023-02-15'])) >>> ser.resample('MS').sem() 2023-01-01 0.577350 2023-02-01 1.527525 Freq: MS, dtype: float64
- std(*args, **kwargs)[source]
Compute standard deviation of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
- ddofint, default 1
Degrees of freedom.
- enginestr, default None
‘cython’: Runs the operation through C-extensions from cython.
‘numba’: Runs the operation through JIT compiled code from numba.
None: Defaults to ‘cython’ or globally setting compute.use_numba
New in version 1.4.0.
- engine_kwargsdict, default None
For ‘cython’ engine, there are no accepted engine_kwargs
For ‘numba’ engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the ‘numba’ engine is {'nopython': True, 'nogil': False, 'parallel': False}
New in version 1.4.0.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Standard deviation of values within each group.
See also
WeightedSeries.groupby
Apply a function groupby to a WeightedSeries.
WeightedDataFrame.groupby
Apply a function groupby to each row or column of a WeightedDataFrame.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b'] >>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst) >>> ser a 7 a 2 a 8 b 4 b 3 b 3 dtype: int64 >>> ser.groupby(level=0).std() a 3.21455 b 0.57735 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]} >>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog', ... 'mouse', 'mouse', 'mouse', 'mouse']) >>> df a b dog 1 1 dog 3 4 dog 5 8 mouse 7 4 mouse 7 4 mouse 8 2 mouse 3 1 >>> df.groupby(level=0).std() a b dog 2.000000 3.511885 mouse 2.217356 1.500000
- var(*args, **kwargs)[source]
Compute variance of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
- ddofint, default 1
Degrees of freedom.
- enginestr, default None
‘cython’: Runs the operation through C-extensions from cython.
‘numba’: Runs the operation through JIT compiled code from numba.
None: Defaults to ‘cython’ or globally setting compute.use_numba
New in version 1.4.0.
- engine_kwargsdict, default None
For ‘cython’ engine, there are no accepted engine_kwargs
For ‘numba’ engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the ‘numba’ engine is {'nopython': True, 'nogil': False, 'parallel': False}
New in version 1.4.0.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Variance of values within each group.
See also
WeightedSeries.groupby
Apply a function groupby to a WeightedSeries.
WeightedDataFrame.groupby
Apply a function groupby to each row or column of a WeightedDataFrame.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b'] >>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst) >>> ser a 7 a 2 a 8 b 4 b 3 b 3 dtype: int64 >>> ser.groupby(level=0).var() a 10.333333 b 0.333333 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]} >>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog', ... 'mouse', 'mouse', 'mouse', 'mouse']) >>> df a b dog 1 1 dog 3 4 dog 5 8 mouse 7 4 mouse 7 4 mouse 8 2 mouse 3 1 >>> df.groupby(level=0).var() a b dog 4.000000 12.333333 mouse 4.916667 2.250000
- class anesthetic.weighted_pandas.WeightedSeries(*args, **kwargs)[source]
Weighted version of pandas.Series.
- compress(ncompress=True)[source]
Reduce the number of samples by discarding low-weights.
- Parameters:
- ncompressint, str, default=True
Degree of compression.
If True (default): reduce to the channel capacity (theoretical optimum compression), equivalent to ncompress='entropy'.
If > 0: desired number of samples after compression.
If <= 0: compress so that all remaining weights are unity.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
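A hedged sketch for a series, again assuming a weights keyword on the constructor:
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

np.random.seed(0)
s = WeightedSeries(np.random.randn(10_000), weights=np.random.rand(10_000))  # weights kwarg assumed
print(s.mean(), s.quantile(0.84))  # weight-aware statistics
s_unit = s.compress(-1)            # discard low weights; remaining weights are unity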
- corr(other, *args, **kwargs)[source]
Compute correlation with other WeightedSeries, excluding missing values.
The two WeightedSeries objects are not required to be the same length and will be aligned internally before the correlation function is applied.
- Parameters:
- otherWeightedSeries
WeightedSeries with which to compute the correlation.
- method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method used to compute correlation:
pearson : Standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable: Callable with input two 1d ndarrays and returning a float.
Warning
Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
- min_periodsint, optional
Minimum number of observations needed to have a valid result.
- Returns:
- float
Correlation with other.
See also
WeightedDataFrame.corr
Compute pairwise correlation between columns.
WeightedDataFrame.corrwith
Compute pairwise correlation with another WeightedDataFrame or WeightedSeries.
Notes
Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
Automatic data alignment: as with all pandas operations, automatic data alignment is performed for this method.
corr() automatically considers values with matching indices.
Examples
>>> def histogram_intersection(a, b): ... v = np.minimum(a, b).sum().round(decimals=1) ... return v >>> s1 = pd.WeightedSeries([.2, .0, .6, .2]) >>> s2 = pd.WeightedSeries([.3, .6, .0, .1]) >>> s1.corr(s2, method=histogram_intersection) 0.3
Pandas auto-aligns the values with matching indices
>>> s1 = pd.WeightedSeries([1, 2, 3], index=[0, 1, 2]) >>> s2 = pd.WeightedSeries([1, 2, 3], index=[2, 1, 0]) >>> s1.corr(s2) -1.0
- cov(other, *args, **kwargs)[source]
Compute covariance with WeightedSeries, excluding missing values.
The two WeightedSeries objects are not required to be the same length and will be aligned internally before the covariance is calculated.
- Parameters:
- otherWeightedSeries
WeightedSeries with which to compute the covariance.
- min_periodsint, optional
Minimum number of observations needed to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns:
- float
Covariance between WeightedSeries and other normalized by N-1 (unbiased estimator).
See also
WeightedDataFrame.cov
Compute pairwise covariance of columns.
Examples
>>> s1 = pd.WeightedSeries([0.90010907, 0.13484424, 0.62036035]) >>> s2 = pd.WeightedSeries([0.12528585, 0.26962463, 0.51111198]) >>> s1.cov(s2) -0.01685762652715874
- groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]
Group WeightedSeries using a mapper or by a WeightedSeries of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters:
- bymapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or WeightedSeries is passed, the WeightedSeries or dict VALUES will be used to determine the groups (the WeightedSeries’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For WeightedSeries this parameter is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
- as_indexbool, default True
Return object with group labels as the index. Only relevant for WeightedDataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
- sortbool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original WeightedDataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
- group_keysbool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed WeightedSeries or WeightedDataFrame. Specify group_keys explicitly to include the group keys or not.
Changed in version 2.0.0: group_keys now defaults to True.
- observedbool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
- dropnabool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
- Returns:
- pandas.api.typing.WeightedSeriesGroupBy
Returns a groupby object that contains information about the groups.
See also
pandas.Series.resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.
Examples
>>> ser = pd.WeightedSeries([390., 350., 30., 20.], ... index=['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... name="Max Speed") >>> ser Falcon 390.0 Falcon 350.0 Parrot 30.0 Parrot 20.0 Name: Max Speed, dtype: float64 >>> ser.groupby(["a", "b", "a", "b"]).mean() a 210.0 b 185.0 Name: Max Speed, dtype: float64 >>> ser.groupby(level=0).mean() Falcon 370.0 Parrot 25.0 Name: Max Speed, dtype: float64 >>> ser.groupby(ser > 100).mean() Max Speed False 25.0 True 370.0 Name: Max Speed, dtype: float64
Grouping by Indexes
We can groupby different levels of a hierarchical index using the level parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... ['Captive', 'Wild', 'Captive', 'Wild']] >>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) >>> ser = pd.WeightedSeries([390., 350., 30., 20.], index=index, name="Max Speed") >>> ser Animal Type Falcon Captive 390.0 Wild 350.0 Parrot Captive 30.0 Wild 20.0 Name: Max Speed, dtype: float64 >>> ser.groupby(level=0).mean() Animal Falcon 370.0 Parrot 25.0 Name: Max Speed, dtype: float64 >>> ser.groupby(level="Type").mean() Type Captive 210.0 Wild 185.0 Name: Max Speed, dtype: float64
We can also choose to include NA in group keys or not by defining dropna parameter, the default setting is True.
>>> ser = pd.WeightedSeries([1, 2, 3, 3], index=["a", 'a', 'b', np.nan]) >>> ser.groupby(level=0).sum() a 3 b 3 dtype: int64
>>> ser.groupby(level=0, dropna=False).sum() a 3 b 3 NaN 3 dtype: int64
>>> arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot'] >>> ser = pd.WeightedSeries([390., 350., 30., 20.], index=arrays, name="Max Speed") >>> ser.groupby(["a", "b", "a", np.nan]).mean() a 210.0 b 350.0 Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", np.nan], dropna=False).mean() a 210.0 b 350.0 NaN 20.0 Name: Max Speed, dtype: float64
- kurt(skipna=True)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- kurtosis(*args, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- mean(skipna=True)[source]
Return the mean of the values over the requested axis.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or WeightedSeries
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.mean()
2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.mean()
a    1.5
b    2.5
dtype: float64
Using axis=1
>>> df.mean(axis=1)
tiger    1.5
zebra    2.5
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                           index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a    1.5
dtype: float64
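For a weight-aware illustration (not part of the inherited docstring), the sketch below assumes the WeightedSeries constructor accepts a weights keyword and that mean() computes the weighted average sum(w*x)/sum(w); np.average is printed alongside as a reference value.

# Minimal sketch of a weighted mean, assuming the `weights` constructor keyword
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0])

ws = WeightedSeries(x, weights=w)
print(ws.mean())                 # expected (1 + 2 + 2*3) / 4 = 2.25 if weight-averaged
print(np.average(x, weights=w))  # NumPy reference value for comparison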
- median(*args, **kwargs)[source]
Return the median of the values over the requested axis.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or WeightedSeries
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.median()
2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.median()
a    1.5
b    2.5
dtype: float64
Using axis=1
>>> df.median(axis=1)
tiger    1.5
zebra    2.5
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                           index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a    1.5
dtype: float64
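A weight-aware sketch (the weights keyword and the data below are illustrative assumptions): the median of a WeightedSeries should be the 50% point of the weighted empirical distribution, so up-weighting large values pulls it upwards.

# Illustrative weighted median, assuming the `weights` constructor keyword
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 1.0, 5.0])  # heavily up-weight the largest value

ws = WeightedSeries(x, weights=w)
print(ws.median())  # pulled towards 4.0, compared with the unweighted median 2.5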
- quantile(q=0.5, interpolation='linear')[source]
Return value at the given quantile.
- Parameters:
- qfloat or array-like, default 0.5 (50% quantile)
The quantile(s) to compute, which can lie in range: 0 <= q <= 1.
- interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
- Returns:
- float or WeightedSeries
If q is an array, a WeightedSeries will be returned where the index is q and the values are the quantiles; otherwise a float will be returned.
See also
pandas.core.window.rolling.Rolling.quantile
Calculate the rolling quantile.
numpy.percentile
Returns the q-th percentile(s) of the array elements.
Examples
>>> s = pd.WeightedSeries([1, 2, 3, 4])
>>> s.quantile(.5)
2.5
>>> s.quantile([.25, .5, .75])
0.25    1.75
0.50    2.50
0.75    3.25
dtype: float64
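The following sketch (illustrative; the weights constructor keyword is an assumption carried over from the rest of this module) shows both the scalar and array forms of q on a series with non-uniform weights.

# Quantiles of a weighted series; data and weights are arbitrary
from anesthetic.weighted_pandas import WeightedSeries

ws = WeightedSeries([1.0, 2.0, 3.0, 4.0], weights=[1, 1, 1, 3])

print(ws.quantile())                   # default q=0.5, the weighted median
print(ws.quantile([0.25, 0.5, 0.75]))  # array q returns a series indexed by q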
- sample(*args, **kwargs)[source]
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightsstr or ndarray-like, optional
Default ‘None’ results in equal probability weighting. If passed a WeightedSeries, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a WeightedDataFrame, will accept the name of a column when axis = 0. Unless weights are a WeightedSeries, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For WeightedSeries this parameter is unused and defaults to None.
- ignore_indexbool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
New in version 1.3.0.
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing n items randomly sampled from the caller object.
See also
WeightedDataFrameGroupBy.sample
Generates random samples from each group of a WeightedDataFrame object.
WeightedSeriesGroupBy.sample
Generates random samples from each group of a WeightedSeries object.
numpy.random.choice
Generates a random sample from a given 1-D numpy array.
Notes
If frac > 1, replace must be set to True.
Examples
>>> df = pd.WeightedDataFrame({'num_legs': [2, 4, 8, 0],
...                            'num_wings': [2, 0, 0, 0],
...                            'num_specimen_seen': [10, 2, 1, 8]},
...                           index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8
Extract 3 random elements from the WeightedSeries df['num_legs'] (note that we use random_state to ensure the reproducibility of the examples):
>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64
A random 50% sample of the WeightedDataFrame with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8
An upsample of the WeightedDataFrame with replacement (the replace parameter has to be True for frac > 1):
>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2
Using a WeightedDataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
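As a complement to the inherited examples above, the sketch below constructs a small WeightedDataFrame (the weights constructor keyword is assumed) and draws reproducible samples; whether the stored weights are used by default is left to the implementation, so no particular output is asserted.

# Illustrative resampling of a WeightedDataFrame (weights keyword assumed)
from anesthetic.weighted_pandas import WeightedDataFrame

df = WeightedDataFrame({'x': [0.1, 0.2, 0.3, 0.4],
                        'y': [1.0, 2.0, 3.0, 4.0]},
                       weights=[0.1, 0.2, 0.3, 0.4])

print(df.sample(n=2, random_state=1))                    # two rows, reproducible draw
print(df.sample(frac=2, replace=True, random_state=1))   # frac > 1 requires replace=True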
- sem(skipna=True)[source]
Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- scalar or WeightedSeries (if level specified)
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.sem().round(6)
0.57735
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.sem()
a    0.5
b    0.5
dtype: float64
Using axis=1
>>> df.sem(axis=1)
tiger    0.5
zebra    0.5
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                           index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a    0.5
dtype: float64
- skew(skipna=True)[source]
Return unbiased skew over requested axis.
Normalized by N-1.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or WeightedSeries
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.skew()
0.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
...                           index=['tiger', 'zebra', 'cow'])
>>> df
       a  b  c
tiger  1  2  1
zebra  2  3  3
cow    3  4  5
>>> df.skew()
a    0.0
b    0.0
c    0.0
dtype: float64
Using axis=1
>>> df.skew(axis=1)
tiger    1.732051
zebra   -1.732051
cow      0.000000
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
...                           index=['tiger', 'zebra', 'cow'])
>>> df.skew(numeric_only=True)
a    0.0
dtype: float64
- std(*args, **kwargs)[source]
Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- scalar or WeightedSeries (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                            'age': [21, 25, 62, 43],
...                            'height': [1.61, 1.87, 1.49, 2.01]}
...                           ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
The standard deviation of the columns can be found as follows:
>>> df.std()
age       18.786076
height     0.237417
dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64
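To connect std with var for weighted data, the sketch below (weights constructor keyword assumed, data arbitrary) checks the expectation that the standard deviation is the square root of the variance; the exact ddof normalisation for weighted samples is left to the implementation.

# Relation between weighted std and var (weights keyword assumed)
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

ws = WeightedSeries([1.0, 2.0, 3.0, 4.0], weights=[1, 2, 2, 1])

print(ws.var())
print(ws.std())
print(np.isclose(ws.std(), np.sqrt(ws.var())))  # expected True if std = sqrt(var)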
- var(skipna=True)[source]
Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- scalar or WeightedSeries (if level specified)
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                            'age': [21, 25, 62, 43],
...                            'height': [1.61, 1.87, 1.49, 2.01]}
...                           ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
>>> df.var()
age       352.916667
height      0.056367
dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.var(ddof=0)
age       264.687500
height      0.042275
dtype: float64
- class anesthetic.weighted_pandas.WeightedSeriesGroupBy(*args, **kwargs)[source]
Weighted version of pandas.core.groupby.SeriesGroupBy.
- cov(*args, **kwargs)[source]
Compute covariance with WeightedSeries, excluding missing values.
The two WeightedSeries objects are not required to be the same length and will be aligned internally before the covariance is calculated.
- Parameters:
- otherWeightedSeries
WeightedSeries with which to compute the covariance.
- min_periodsint, optional
Minimum number of observations needed to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns:
- float
Covariance between WeightedSeries and other normalized by N-1 (unbiased estimator).
See also
WeightedDataFrame.cov
Compute pairwise covariance of columns.
Examples
>>> s1 = pd.WeightedSeries([0.90010907, 0.13484424, 0.62036035])
>>> s2 = pd.WeightedSeries([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2)
-0.01685762652715874
- sample(*args, **kwargs)[source]
Return a random sample of items from each group.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.
- fracfloat, optional
Fraction of items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightslist-like, optional
Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying WeightedDataFrame or WeightedSeries object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing items randomly sampled within each group from the caller object.
See also
WeightedDataFrame.sample
Generate random samples from a WeightedDataFrame object.
numpy.random.choice
Generate a random sample from a given 1-D numpy array.
Examples
>>> df = pd.WeightedDataFrame(
...     {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
... )
>>> df
       a  b
0    red  0
1    red  1
2   blue  2
3   blue  3
4  black  4
5  black  5
Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:
>>> df.groupby("a").sample(n=1, random_state=1)
       a  b
4  black  4
2   blue  2
1    red  1
Set frac to sample fixed proportions rather than counts:
>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2)
5    5
2    2
0    0
Name: b, dtype: int64
Control sample probabilities within groups by setting weights:
>>> df.groupby("a").sample(
...     n=1,
...     weights=[1, 1, 1, 0, 0, 1],
...     random_state=1,
... )
       a  b
5  black  5
2   blue  2
0    red  0
- class anesthetic.weighted_pandas._WeightedObject(*args, **kwargs)[source]
Common methods for WeightedSeries and WeightedDataFrame.
- reset_index(level=None, drop=False, inplace=False, *args, **kwargs)[source]
Reset the index, retaining weights.
- set_weights(weights, axis=0, inplace=False, level=None)[source]
Set sample weights along an axis.
- Parameters:
- weights1d array-like
The sample weights to put in an index.
- axisint (0,1), default=0
Whether to put weights in an index or column.
- inplacebool, default=False
Whether to operate inplace, or return a new array.
- levelint
Which level in the index to insert before. Defaults to inserting at the back.
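A short usage sketch for set_weights follows; the companion get_weights accessor and the behaviour of inplace=False (returning a new object) are assumptions based on the surrounding API rather than quotations from the docstring.

# Attaching weights to an existing series (get_weights assumed to exist)
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

ws = WeightedSeries([1.0, 2.0, 3.0])

ws2 = ws.set_weights(np.array([0.2, 0.3, 0.5]))  # inplace=False returns a new object

print(ws2.get_weights())  # assumed accessor for the stored weights
print(ws2.mean())         # statistics now use the new weights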
- anesthetic.weighted_pandas.cls
alias of WeightedSeriesGroupBy