anesthetic package

Anesthetic: nested sampling post-processing.

anesthetic subpackages

anesthetic modules

anesthetic.boundary module

Boundary correction utilities.

anesthetic.boundary.cut_and_normalise_gaussian(x, p, bw, xmin=None, xmax=None)[source]

Cut and normalise boundary correction for a Gaussian kernel.

Parameters:
x : array-like

locations for normalisation correction

p : array-like

probability densities for normalisation correction

bw : float

bandwidth of KDE

xmin, xmax : float, optional

lower/upper prior bound, default None

Returns:
p : np.array

corrected probabilities
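
Examples

A minimal sketch, assuming a kernel density estimate whose support is bounded below at zero (the grid, density, and bandwidth are illustrative):

>>> import numpy as np
>>> from scipy.stats import norm
>>> from anesthetic.boundary import cut_and_normalise_gaussian
>>> x = np.linspace(0, 4, 100)              # grid over the bounded support
>>> p = norm.pdf(x, loc=0.5)                # uncorrected density estimate
>>> p = cut_and_normalise_gaussian(x, p, bw=1, xmin=0)  # correct at x=0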

anesthetic.convert module

Tools for converting to other outputs.

anesthetic.convert.to_getdist(samples)[source]

Convert from anesthetic to getdist samples.

Parameters:
samples : anesthetic.samples.Samples

anesthetic samples to be converted

Returns:
getdist_samples : getdist.mcsamples.MCSamples

getdist equivalent samples
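
Examples

A minimal sketch, assuming getdist is installed (the synthetic data and column names are illustrative):

>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.convert import to_getdist
>>> samples = Samples(np.random.rand(1000, 2), columns=['x0', 'x1'])
>>> gd = to_getdist(samples)  # use the getdist.mcsamples.MCSamples API from here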

anesthetic.kde module

Kernel density estimation tools.

These act as a wrapper around fastKDE, but could be replaced in future by alternative kernel density estimators.

anesthetic.kde.fastkde_1d(d, xmin=None, xmax=None)[source]

Perform a one-dimensional kernel density estimation.

Wrapper around fastkde.fastKDE. Boundary corrections implemented by reflecting boundary conditions.

Parameters:
d : np.array

Data to perform kde on

xmin, xmax : float, optional

lower/upper prior bounds, default None

Returns:
x : np.array

x-coordinates of kernel density estimates

p : np.array

kernel density estimates
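
Examples

A minimal sketch, assuming the optional fastkde dependency is installed (the uniform samples are illustrative):

>>> import numpy as np
>>> from anesthetic.kde import fastkde_1d
>>> d = np.random.uniform(0, 1, 1000)     # samples with hard bounds at 0 and 1
>>> x, p = fastkde_1d(d, xmin=0, xmax=1)  # KDE with reflection at both bounds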

anesthetic.kde.fastkde_2d(d_x, d_y, xmin=None, xmax=None, ymin=None, ymax=None)[source]

Perform a two-dimensional kernel density estimation.

Wrapper around fastkde.fastKDE. Boundary corrections implemented by reflecting boundary conditions.

Parameters:
d_x, d_y : np.array

x/y coordinates of data to perform kde on

xmin, xmax, ymin, ymax : float, optional

lower/upper prior bounds in x/y coordinates, default None

Returns:
x, y : np.array

x/y-coordinates of kernel density estimates. One-dimensional array

p : np.array

kernel density estimates. Two-dimensional array

anesthetic.labelled_pandas module

Pandas DataFrame and Series with labelled columns.

class anesthetic.labelled_pandas.LabelledDataFrame(*args, **kwargs)[source]

Bases: _LabelledObject, DataFrame

Labelled version of pandas.DataFrame.

property T

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Parameters:
*args : tuple, optional

Accepted for compatibility with NumPy.

copy : bool, default False

Whether to copy the data after transposing, even for DataFrames with a single dtype.

Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
DataFrame

The transposed DataFrame.

See also

numpy.transpose

Permute the dimensions of a given array.

Notes

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.

Examples

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
transpose(copy=False)[source]

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Parameters:
*args : tuple, optional

Accepted for compatibility with NumPy.

copy : bool, default False

Whether to copy the data after transposing, even for DataFrames with a single dtype.

Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
DataFrame

The transposed DataFrame.

See also

numpy.transpose

Permute the dimensions of a given array.

Notes

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.

Examples

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
class anesthetic.labelled_pandas.LabelledSeries(*args, **kwargs)[source]

Bases: _LabelledObject, Series

Labelled version of pandas.Series.

class anesthetic.labelled_pandas._LabelledObject(*args, **kwargs)[source]

Bases: object

Common methods for LabelledSeries and LabelledDataFrame.

property at
drop_labels(axis=0)[source]

Drop the labels from an axis if present.

get_label(param, axis=0)[source]

Retrieve mapping from paramnames to labels from an axis.

get_labels(axis=0)[source]

Retrieve labels from an axis.

get_labels_map(axis=0, fill=True)[source]

Retrieve mapping from paramnames to labels from an axis.

islabelled(axis=0)[source]

Search for existence of labels.

property loc
reset_index(level=None, drop=False, inplace=False, *args, **kwargs)[source]
set_label(param, value, axis=0, inplace=False)[source]

Set a specific label to a specific value on an axis.

set_labels(labels, axis=0, inplace=False, level=None)[source]

Set labels along an axis.

xs(key, axis=0, level=None, drop_level=True)[source]
anesthetic.labelled_pandas.ac(funcs, *args)[source]

Accessor function helper.

Given a list of callables funcs, and their arguments *args, evaluate each of these, catching exceptions, and then sort results by their dimensionality, smallest first. Return the non-exceptional result with the smallest dimensionality.

anesthetic.labelled_pandas.read_csv(filename, *args, **kwargs)[source]

Read a CSV file into a LabelledDataFrame.

anesthetic.plot module

Lower-level plotting tools.

Routines that may be of use to users wishing for more fine-grained control, e.g. make_1d_axes() and make_2d_axes() to create a set of axes and legend proxies.

class anesthetic.plot.AxesDataFrame(data=None, index=None, columns=None, fig=None, lower=True, diagonal=True, upper=True, labels=None, ticks='inner', logx=None, logy=None, gridspec_kw=None, subplot_spec=None, *args, **kwargs)[source]

Bases: DataFrame

Anesthetic’s axes version of pandas.DataFrame.

Parameters:
index : list(str)

Parameters to be placed on the y-axes.

columns : list(str)

Parameters to be placed on the x-axes.

fig : matplotlib.figure.Figure
lower, diagonal, upper : bool, default=True

Whether to create 2D marginalised plots above or below the diagonal, or to create a 1D marginalised plot on the diagonal.

labels : dict(str:str), optional

Dictionary mapping params to plot labels. Default: params

ticks : str, default=’inner’

If ‘outer’, plot ticks only on the very left and very bottom. If ‘inner’, plot ticks also in inner subplots. If None, plot no ticks at all.

logx, logy : list(str), optional

Lists of parameters to be plotted on a log scale on the x-axis or y-axis, respectively.

gridspec_kw : dict, optional

Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.

subplot_spec : matplotlib.gridspec.GridSpec, default=None

GridSpec instance to plot array as part of a subfigure.

GridSpec instance to plot array as part of a subfigure.

Methods

axlines:

Add vertical and horizontal lines across all axes.

axspans:

Add vertical and horizontal spans across all axes.

scatter:

Add scatter points across all axes.

set_labels:

Set the labels for the axes.

set_margins:

Set margins across all axes.

tick_params:

Set tick parameters across all axes.

axlines(params, lower=True, diagonal=True, upper=True, **kwargs)[source]

Add vertical and horizontal lines across all axes.

Parameters:
params : dict(array_like)

Dictionary of parameter labels and desired values. Can provide more than one value per label.

lower, diagonal, upper : bool, default=True

Whether to plot the lines on the lower, diagonal, and/or upper triangle plots.

kwargs

Any kwarg that can be passed to matplotlib.axes.Axes.axvline() or matplotlib.axes.Axes.axhline().

axspans(params, lower=True, diagonal=True, upper=True, **kwargs)[source]

Add vertical and horizontal spans across all axes.

Parameters:
params : dict(array_like(2-tuple))

Dictionary of parameter labels and desired value tuples. Can provide more than one value tuple per label. Each value tuple provides the min and max value for an axis span.

lower, diagonal, upper : bool, default=True

Whether to plot the spans on the lower, diagonal, and/or upper triangle plots.

kwargs

Any kwarg that can be passed to matplotlib.axes.Axes.axvspan() or matplotlib.axes.Axes.axhspan().

scatter(params, lower=True, upper=True, **kwargs)[source]

Add scatter points across all axes.

Parameters:
params : dict(array_like)

Dictionary of parameter labels and desired values. Can provide more than one value per label, but length has to match for all parameter labels.

lower, upper : bool, default=True

Whether to plot the scatter points on the lower and/or upper triangle plots.

kwargs

Any kwarg that can be passed to matplotlib.axes.Axes.scatter().

set_labels(labels, **kwargs)[source]

Set the labels for the axes.

Parameters:
labels : dict

Dictionary of the axes labels.

kwargs

Any kwarg that can be passed to matplotlib.axes.Axes.set_xlabel() or matplotlib.axes.Axes.set_ylabel().

set_margins(m)[source]

Apply matplotlib.axes.Axes.set_xmargin() across all axes.

tick_params(*args, **kwargs)[source]

Apply matplotlib.axes.Axes.tick_params() across all axes.

class anesthetic.plot.AxesSeries(data=None, index=None, fig=None, ncol=None, labels=None, logx=None, gridspec_kw=None, subplot_spec=None, *args, **kwargs)[source]

Bases: Series

Anesthetic’s axes version of pandas.Series.

Parameters:
index : list(str)

Parameters to be placed on the y-axes.

fig : matplotlib.figure.Figure
ncol : int

Number of axes columns. Decides after how many axes the AxesSeries is split to continue in a new row.

labels : dict(str:str), optional

Dictionary mapping params to plot labels. Default: params

logx : list(str), optional

List of parameters to be plotted on a log scale.

gridspec_kw : dict, optional

Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.

subplot_spec : matplotlib.gridspec.GridSpec, default=None

GridSpec instance to plot array as part of a subfigure.

Methods

set_xlabels:

Set the labels for the x-axes.

tick_params:

Set tick parameters across all axes.

static axes_series(index, fig, ncol=None, gridspec_kw=None, subplot_spec=None)[source]

Set up subplots for AxesSeries.

set_xlabels(labels, **kwargs)[source]

Set the labels for the x-axes.

Parameters:
labels : dict

Dictionary of the axes labels.

kwargs

Any kwarg that can be passed to matplotlib.axes.Axes.set_xlabel().

tick_params(*args, **kwargs)[source]

Apply matplotlib.axes.Axes.tick_params() across all axes.

anesthetic.plot.basic_cmap(color)[source]

Construct a basic colormap from a single color.

anesthetic.plot.fastkde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]

Plot a 2d marginalised distribution as contours.

This functions as a wrapper around matplotlib.axes.Axes.contour() and matplotlib.axes.Axes.contourf() with a kernel density estimation (KDE) computation in-between. All remaining keyword arguments are passed onwards to both functions.

Parameters:
ax : matplotlib.axes.Axes

Axis object to plot on.

data_x, data_y : np.array

The x and y coordinates of uniformly weighted samples to generate kernel density estimator.

levels : list

Amount of mass within each iso-probability contour. Has to be ordered from outermost to innermost contour. Default: [0.95, 0.68]

xmin, xmax, ymin, ymax : float, default=None

The lower/upper prior bounds in x/y coordinates.

Returns:
c : matplotlib.contour.QuadContourSet

A set of contourlines or filled regions.

anesthetic.plot.fastkde_plot_1d(ax, data, *args, **kwargs)[source]

Plot a 1d marginalised distribution.

This functions as a wrapper around matplotlib.axes.Axes.plot(), with a kernel density estimation (KDE) computation provided by the package fastkde in-between. All remaining keyword arguments are passed onwards.

Parameters:
ax : matplotlib.axes.Axes

Axis object to plot on.

data : np.array

Uniformly weighted samples to generate kernel density estimator.

xmin, xmax : float, default=None

lower/upper prior bound

levels : list, optional

Values at which to draw iso-probability lines. Default: [0.95, 0.68]

q : int or float or tuple, default=5

Quantile to determine the data range to be plotted.

  • 0: full data range, i.e. q=0 –> quantile range (0, 1)

  • int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)

  • float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)

  • tuple: quantile range, e.g. (0.16, 0.84)

facecolor : bool or string, default=False

If set to True, the 1d plot will be shaded with the value of the color kwarg. Set to a string such as ‘blue’, ‘k’, ‘r’, ‘C1’ etc. to define the color of the shading directly.

Returns:
lines : matplotlib.lines.Line2D

A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).

anesthetic.plot.hist_plot_1d(ax, data, *args, **kwargs)[source]

Plot a 1d histogram.

This function is a wrapper around matplotlib.axes.Axes.hist(). All remaining keyword arguments are passed onwards.

Parameters:
ax : matplotlib.axes.Axes

Axis object to plot on.

data : np.array

Samples to generate histogram from

weights : np.array, optional

Sample weights.

q : int or float or tuple, default=5

Quantile to determine the data range to be plotted.

  • 0: full data range, i.e. q=0 –> quantile range (0, 1)

  • int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)

  • float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)

  • tuple: quantile range, e.g. (0.16, 0.84)

Returns:
patches : list or list of lists

Silent list of individual patches used to create the histogram, or list of such lists if there are multiple input datasets.

Other Parameters:
**kwargs : matplotlib.axes.Axes.hist() properties
anesthetic.plot.hist_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]

Plot a 2d marginalised distribution as a histogram.

This functions as a wrapper around matplotlib.axes.Axes.hist2d().

Parameters:
ax : matplotlib.axes.Axes

Axis object to plot on.

data_x, data_y : np.array

The x and y coordinates of uniformly weighted samples to generate a two-dimensional histogram.

levels : list, default=None

Shade iso-probability contours containing these levels of probability mass. If None defaults to usual matplotlib.axes.Axes.hist2d() colouring.

q : int or float or tuple, default=5

Quantile to determine the data range to be plotted.

  • 0: full data range, i.e. q=0 –> quantile range (0, 1)

  • int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)

  • float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)

  • tuple: quantile range, e.g. (0.16, 0.84)

Returns:
c : matplotlib.collections.QuadMesh

A set of colors.

anesthetic.plot.kde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]

Plot a 2d marginalised distribution as contours.

This functions as a wrapper around matplotlib.axes.Axes.contour() and matplotlib.axes.Axes.contourf() with a kernel density estimation (KDE) computation provided by scipy.stats.gaussian_kde in-between. All remaining keyword arguments are passed onwards to both functions.

Parameters:
ax : matplotlib.axes.Axes

Axis object to plot on.

data_x, data_y : np.array

The x and y coordinates of uniformly weighted samples to generate kernel density estimator.

weights : np.array, optional

Sample weights.

levels : list, optional

Amount of mass within each iso-probability contour. Has to be ordered from outermost to innermost contour. Default: [0.95, 0.68]

ncompress : int or str, default=’equal’

Degree of compression.

  • If int: desired number of samples after compression.

  • If False: no compression.

  • If True: compresses to the channel capacity, equivalent to ncompress='entropy'.

  • If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.

nplot_2d : int, default=1000

Number of plotting points to use.

bw_method : str, scalar or callable, optional

Forwarded to scipy.stats.gaussian_kde.

Returns:
c : matplotlib.contour.QuadContourSet

A set of contourlines or filled regions.

anesthetic.plot.kde_plot_1d(ax, data, *args, **kwargs)[source]

Plot a 1d marginalised distribution.

This functions as a wrapper around matplotlib.axes.Axes.plot(), with a kernel density estimation computation provided by scipy.stats.gaussian_kde in-between. All remaining keyword arguments are passed onwards.

Parameters:
ax : matplotlib.axes.Axes

Axis object to plot on.

data : np.array

Samples to generate kernel density estimator.

weights : np.array, optional

Sample weights.

ncompress : int or str, default=False

Degree of compression.

  • If False: no compression.

  • If True: compresses to the channel capacity, equivalent to ncompress='entropy'.

  • If int: desired number of samples after compression.

  • If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.

nplot_1d : int, default=100

Number of plotting points to use.

levels : list

Values at which to draw iso-probability lines. Default: [0.95, 0.68]

q : int or float or tuple, default=5

Quantile to determine the data range to be plotted.

  • 0: full data range, i.e. q=0 –> quantile range (0, 1)

  • int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)

  • float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)

  • tuple: quantile range, e.g. (0.16, 0.84)

facecolor : bool or string, default=False

If set to True, the 1d plot will be shaded with the value of the color kwarg. Set to a string such as ‘blue’, ‘k’, ‘r’, ‘C1’ etc. to define the color of the shading directly.

bw_method : str, scalar or callable, optional

Forwarded to scipy.stats.gaussian_kde.

beta : int or float, default=1

The value of beta used to calculate the number of effective samples.

Returns:
lines : matplotlib.lines.Line2D

A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).

anesthetic.plot.make_1d_axes(params, ncol=None, labels=None, logx=None, gridspec_kw=None, subplot_spec=None, **fig_kw)[source]

Create a set of axes for plotting 1D marginalised posteriors.

Parameters:
params : list(str)

names of parameters.

ncol : int

Number of columns of the subplot grid. Default: ceil(sqrt(num_params))

labels : dict(str:str), optional

Dictionary mapping params to plot labels. Default: params

logx : list(str), optional

List of parameters to be plotted on a log scale.

gridspec_kw : dict, optional

Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.

subplot_spec : matplotlib.gridspec.GridSpec, default=None

GridSpec instance to plot array as part of a subfigure.

**fig_kw

All additional keyword arguments are passed to the matplotlib.pyplot.figure() call. Or directly pass the figure to plot on via the keyword ‘fig’.

Returns:
fig : matplotlib.figure.Figure

New or original (if supplied) figure object.

axes : anesthetic.plot.AxesSeries

Pandas array of axes objects.
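
Examples

A minimal sketch (parameter names are illustrative; samples stands for an existing Samples object):

>>> from anesthetic.plot import make_1d_axes
>>> fig, axes = make_1d_axes(['x0', 'x1', 'x2'], ncol=3)
>>> samples.plot_1d(axes)  # fill the axes from an existing Samples object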

anesthetic.plot.make_2d_axes(params, labels=None, lower=True, diagonal=True, upper=True, ticks='inner', logx=None, logy=None, gridspec_kw=None, subplot_spec=None, **fig_kw)[source]

Create a set of axes for plotting 2D marginalised posteriors.

Parameters:
params : lists of parameters

Can be either:

  • list(str) if the x and y axes are the same

  • [list(str), list(str)] if the x and y axes are different

Strings indicate the names of the parameters.

labels : dict(str:str), optional

Dictionary mapping params to plot labels. Default: params

lower, diagonal, upper : bool, default=True

Whether to create 2D marginalised plots above or below the diagonal, or to create a 1D marginalised plot on the diagonal.

ticks : str, default=’inner’

Can be one of ‘outer’, ‘inner’, or None.

  • 'outer': plot ticks only on the very left and very bottom.

  • 'inner': plot ticks also in inner subplots.

  • None: plot no ticks at all.

logx, logy : list(str), optional

Lists of parameters to be plotted on a log scale on the x-axis or y-axis, respectively.

gridspec_kw : dict, optional

Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.

subplot_spec : matplotlib.gridspec.GridSpec, default=None

GridSpec instance to plot array as part of a subfigure.

**fig_kw

All additional keyword arguments are passed to the matplotlib.pyplot.figure() call. Or directly pass the figure to plot on via the keyword ‘fig’.

Returns:
fig : matplotlib.figure.Figure

New or original (if supplied) figure object.

axes : anesthetic.plot.AxesDataFrame

Pandas array of axes objects.
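
Examples

A minimal sketch (parameter names are illustrative; samples stands for an existing Samples object):

>>> from anesthetic.plot import make_2d_axes
>>> fig, axes = make_2d_axes(['x0', 'x1'], upper=False)  # lower triangle + diagonal
>>> samples.plot_2d(axes)  # fill the axes from an existing Samples object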

anesthetic.plot.normalize_kwargs(kwargs, alias_mapping=None, drop=None)[source]

Normalize kwarg inputs.

Works the same way as matplotlib.cbook.normalize_kwargs(), but additionally allows one to drop kwargs.

anesthetic.plot.quantile_plot_interval(q)[source]

Interpret quantile q input to quantile plot range tuple.

anesthetic.plot.scatter_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]

Plot samples from a 2d marginalised distribution.

This functions as a wrapper around matplotlib.axes.Axes.plot(), enforcing any prior bounds. All remaining keyword arguments are passed onwards.

Parameters:
ax : matplotlib.axes.Axes

axis object to plot on

data_x, data_y : np.array

x and y coordinates of uniformly weighted samples to plot.

ncompress : int or str, default=’equal’

Degree of compression.

  • If int: desired number of samples after compression.

  • If False: no compression.

  • If True: compresses to the channel capacity, equivalent to ncompress='entropy'.

  • If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.

Returns:
lines : matplotlib.lines.Line2D

A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).

anesthetic.plot.set_colors(c, fc, ec, cmap)[source]

Navigate interplay between possible color inputs {c, fc, ec, cmap}.

anesthetic.samples module

Main classes for the anesthetic module.

class anesthetic.samples.MCMCSamples(*args, **kwargs)[source]

Storage and plotting tools for MCMC samples.

Any new functionality specific to MCMC (e.g. convergence criteria etc.) should be put here.

Parameters:
data : np.array

Coordinates of samples. shape = (nsamples, ndims).

columns : array-like

reference names of parameters

weights : np.array

weights of samples.

logL : np.array

loglikelihoods of samples.

labels : dict or array-like

mapping from columns to plotting labels

label : str

Legend label

logzero : float, default=-1e30

The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.

Gelman_Rubin(params=None, per_param=False)[source]

Gelman–Rubin convergence statistic of multiple MCMC chains.

Determine the Gelman–Rubin convergence statistic R-1 by computing and comparing the within-chain variance and the between-chain variance. This follows the routine as outlined in Lewis (2013), section IV.A.

Note that this requires more than one chain. To circumvent this, you could overwrite the 'chain' column, splitting the samples into two or more sets.

Parameters:
params : list(str)

List of column names (i.e. parameters) to be included in the convergence calculation. Default: all parameters (except those parameters that contain ‘prior’, ‘chi2’, or ‘logL’ in their names)

per_param : bool or str, default=False

Whether to return the per-parameter convergence statistic R-1.

  • If False: returns only the total convergence statistic.

  • If True: returns the total convergence statistic and the per-parameter convergence statistic.

  • If 'par': returns only the per-parameter convergence statistic.

  • If 'cov': returns only the per-parameter covariant convergence statistic.

  • If 'all': returns the total convergence statistic and the per-parameter covariant convergence statistic.

Returns:
Rminus1 : float

Total Gelman–Rubin convergence statistic R-1. The smaller, the better converged. Aiming for Rminus1~0.01 should normally work well.

Rminus1_par : pandas.DataFrame

Per-parameter Gelman–Rubin convergence statistic.

Rminus1_cov : pandas.DataFrame

Per-parameter covariant Gelman–Rubin convergence statistic.
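
Examples

A minimal sketch, assuming mcmc is an MCMCSamples object whose ‘chain’ column contains more than one chain:

>>> Rminus1 = mcmc.Gelman_Rubin()  # total statistic, aim for ~0.01
>>> Rminus1, Rminus1_par = mcmc.Gelman_Rubin(per_param=True)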

remove_burn_in(burn_in, reset_index=False, inplace=False)[source]

Remove burn-in samples from each MCMC chain.

Parameters:
burn_in : int or float or array_like

Fraction or number of samples to remove or keep:

  • if 0 < burn_in < 1: remove first fraction of samples

  • elif 1 < burn_in: remove first number of samples

  • elif -1 < burn_in < 0: keep last fraction of samples

  • elif burn_in < -1: keep last number of samples

  • elif type(burn_in)==list: different burn-in for each chain

reset_index : bool, default=False

Whether to reset the index counter to start at zero or not.

inplace : bool, default=False

Indicates whether to modify the existing array or return a copy.
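
Examples

A minimal sketch, assuming mcmc is an existing MCMCSamples object:

>>> burnt = mcmc.remove_burn_in(burn_in=0.5)   # drop the first half of each chain
>>> burnt = mcmc.remove_burn_in(burn_in=-500)  # keep only the last 500 samples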

class anesthetic.samples.NestedSamples(*args, **kwargs)[source]

Storage and plotting tools for Nested Sampling samples.

We extend the Samples class with the additional methods:

  • self.live_points(logL)

  • self.set_beta(beta)

  • self.prior()

  • self.posterior_points(beta)

  • self.prior_points()

  • self.stats()

  • self.logZ()

  • self.D_KL()

  • self.d()

  • self.recompute()

  • self.gui()

  • self.importance_sample()

Parameters:
data : np.array

Coordinates of samples. shape = (nsamples, ndims).

columns : list(str)

reference names of parameters

logL : np.array

loglikelihoods of samples.

logL_birth : np.array or int

birth loglikelihoods, or number of live points.

labels : dict, optional

mapping from column names to plot labels

label : str

Legend label. Default: basename of root

beta : float

thermodynamic inverse temperature. Default: 1

logzero : float, default=-1e30

The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.

D(nsamples=None)[source]
D_KL(nsamples=None, beta=None)[source]

Kullback–Leibler divergence.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

  • If nsamples is array, nsamples is assumed to be logw

beta : float, array-like, optional

inverse temperature(s) beta=1/kT. Default self.beta

Returns:
if nsamples is array-like:

pandas.Series, index nsamples.columns

elif beta is scalar and nsamples is None:

float

elif beta is array-like and nsamples is None:

pandas.Series, index beta

elif beta is scalar and nsamples is int:

pandas.Series, index range(nsamples)

elif beta is array-like and nsamples is int:

pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)

property beta

Thermodynamic inverse temperature.

contour(logL=None)[source]

Convert contour from (index or None) to a float loglikelihood.

Convention is that live points are inclusive of the contour.

Helper function for:
  • NestedSamples.live_points,

  • NestedSamples.dead_points,

  • NestedSamples.truncate.

Parameters:
logL : float or int, optional

Loglikelihood or iteration number. If not provided, return the contour containing the last set of live points.

Returns:
logL : float

Loglikelihood of contour

d(nsamples=None)[source]
d_G(nsamples=None, beta=None)[source]

Bayesian model dimensionality.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

  • If nsamples is array, nsamples is assumed to be logw

beta : float, array-like, optional

inverse temperature(s) beta=1/kT. Default self.beta

Returns:
if nsamples is array-like:

pandas.Series, index nsamples.columns

elif beta is scalar and nsamples is None:

float

elif beta is array-like and nsamples is None:

pandas.Series, index beta

elif beta is scalar and nsamples is int:

pandas.Series, index range(nsamples)

elif beta is array-like and nsamples is int:

pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)

dead_points(logL=None)[source]

Get the dead points at a given contour.

Convention is that dead points are exclusive of the contour.

Parameters:
logL : float or int, optional

Loglikelihood or iteration number to return dead points. If not provided, return the last set of dead points.

Returns:
dead_points : Samples
Dead points at either:
  • contour logL (if input is float)

  • ith iteration (if input is integer)

  • last set of dead points if no argument provided

dlogX(nsamples=None)[source]
gui(params=None)[source]

Construct a graphical user interface for viewing samples.

importance_sample(logL_new, action='add', inplace=False)[source]

Perform importance re-weighting on the log-likelihood.

Parameters:
logL_new : np.array

New log-likelihood values. Should have the same shape as logL.

action : str, default=’add’

Can be any of {‘add’, ‘replace’, ‘mask’}.

  • add: Add the new logL_new to the current logL.

  • replace: Replace the current logL with the new logL_new.

  • mask: treat logL_new as a boolean mask and only keep the corresponding (True) samples.

inplace : bool, optional

Indicates whether to modify the existing array, or return a new frame with importance sampling applied. default: False

Returns:
samples : NestedSamples

Importance re-weighted samples.

live_points(logL=None)[source]

Get the live points within a contour.

Parameters:
logL : float or int, optional

Loglikelihood or iteration number to return live points. If not provided, return the last set of active live points.

Returns:
live_points : Samples
Live points at either:
  • contour logL (if input is float)

  • ith iteration (if input is integer)

  • last set of live points if no argument provided

logL_P(nsamples=None, beta=None)[source]

Posterior averaged loglikelihood.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

  • If nsamples is array, nsamples is assumed to be logw

beta : float, array-like, optional

inverse temperature(s) beta=1/kT. Default self.beta

Returns:
if nsamples is array-like:

pandas.Series, index nsamples.columns

elif beta is scalar and nsamples is None:

float

elif beta is array-like and nsamples is None:

pandas.Series, index beta

elif beta is scalar and nsamples is int:

pandas.Series, index range(nsamples)

elif beta is array-like and nsamples is int:

pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)

logX(nsamples=None)[source]

Log-Volume.

The log of the prior volume contained within each iso-likelihood contour.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

Returns:
if nsamples is None:

WeightedSeries like self

elif nsamples is int:

WeightedDataFrame like self, columns range(nsamples)

logZ(nsamples=None, beta=None)[source]

Log-Evidence.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

  • If nsamples is array, nsamples is assumed to be logw

beta : float, array-like, optional

inverse temperature(s) beta=1/kT. Default self.beta

Returns:
if nsamples is array-like:

pandas.Series, index nsamples.columns

elif beta is scalar and nsamples is None:

float

elif beta is array-like and nsamples is None:

pandas.Series, index beta

elif beta is scalar and nsamples is int:

pandas.Series, index range(nsamples)

elif beta is array-like and nsamples is int:

pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
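
Examples

A minimal sketch, assuming nested is an existing NestedSamples object:

>>> nested.logZ()                      # mean log-evidence (float)
>>> logZ = nested.logZ(nsamples=1000)  # pandas.Series of 1000 draws
>>> logZ.mean(), logZ.std()            # point estimate and error bar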

logdX(nsamples=None)[source]

Compute volume of shell of loglikelihood.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

Returns:
if nsamples is None:

WeightedSeries like self

elif nsamples is int:

WeightedDataFrame like self, columns range(nsamples)

logw(nsamples=None, beta=None)[source]

Log-nested sampling weight.

The logarithm of the (unnormalised) sampling weight log(L**beta*dX).

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

  • If nsamples is array, nsamples is assumed to be logw and returned (implementation convenience functionality)

beta : float, array-like, optional

inverse temperature(s) beta=1/kT. Default self.beta

Returns:
if nsamples is array-like:

WeightedDataFrame equal to nsamples

elif beta is scalar and nsamples is None:

WeightedSeries like self

elif beta is array-like and nsamples is None:

WeightedDataFrame like self, columns of beta

elif beta is scalar and nsamples is int:

WeightedDataFrame like self, columns of range(nsamples)

elif beta is array-like and nsamples is int:

WeightedDataFrame like self, MultiIndex columns the product of beta and range(nsamples)

ns_output(*args, **kwargs)[source]
posterior_points(beta=1)[source]

Get equally weighted posterior points at temperature beta.

prior(inplace=False)[source]

Re-weight samples at infinite temperature to get prior samples.

prior_points(params=None)[source]

Get equally weighted prior points.

recompute(logL_birth=None, inplace=False)[source]

Re-calculate the nested sampling contours and live points.

Parameters:
logL_birtharray-like or int, optional
  • array-like: the birth contours.

  • int: the number of live points.

  • default: use the existing birth contours to compute nlive

inplacebool, default=False

Indicates whether to modify the existing array, or return a new frame with contours resorted and nlive recomputed

set_beta(beta, inplace=False)[source]

Change the inverse temperature.

Parameters:
beta : float

Inverse temperature to set. (beta=0 corresponds to the prior distribution.)

inplace : bool, default=False

Indicates whether to modify the existing array, or return a copy with the inverse temperature changed.

stats(nsamples=None, beta=None)[source]

Compute Nested Sampling statistics.

Using nested sampling we can compute:

  • logZ: Bayesian evidence

    \[\log Z = \int L \pi d\theta\]
  • D_KL: Kullback–Leibler divergence

    \[D_{KL} = \int P \log(P / \pi) d\theta\]
  • logL_P: posterior averaged log-likelihood

    \[\langle\log L\rangle_P = \int P \log L d\theta\]
  • d_G: Gaussian model dimensionality (or posterior variance of the log-likelihood)

    \[d_G/2 = \langle(\log L)^2\rangle_P - \langle\log L\rangle_P^2\]

    see Handley and Lemos (2019) for more details on model dimensionalities.

(Note that all of these are available as individual functions with the same signature.)

In addition to point estimates nested sampling provides an error bar or more generally samples from a (correlated) distribution over the variables. Samples from this distribution can be computed by providing an integer nsamples.

Nested sampling as an athermal algorithm is also capable of producing these as a function of inverse thermodynamic temperature beta. This is provided as a vectorised function. If nsamples is also provided a MultiIndex dataframe is generated.

These obey Occam’s razor equation:

\[\log Z = \langle\log L\rangle_P - D_{KL},\]

which splits a model’s quality logZ into a goodness-of-fit logL_P and a complexity penalty D_KL. See Hergt et al. (2021) for more detail.

Parameters:
nsamples : int, optional
  • If nsamples is not supplied, calculate mean value

  • If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling

beta : float, array-like, optional

inverse temperature(s) beta=1/kT. Default self.beta

Returns:
if beta is scalar and nsamples is None:

Series, index [‘logZ’, ‘d_G’, ‘D_KL’, ‘logL_P’]

elif beta is scalar and nsamples is int:

Samples, index range(nsamples), columns [‘logZ’, ‘d_G’, ‘D_KL’, ‘logL_P’]

elif beta is array-like and nsamples is None:

Samples, index beta, columns [‘logZ’, ‘d_G’, ‘D_KL’, ‘logL_P’]

elif beta is array-like and nsamples is int:

Samples, index pandas.MultiIndex the product of beta and range(nsamples), columns [‘logZ’, ‘d_G’, ‘D_KL’, ‘logL_P’]
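
Examples

A minimal sketch, assuming nested is an existing NestedSamples object; the last line checks Occam’s razor, which should hold to within the sampling error:

>>> out = nested.stats(nsamples=1000)
>>> out['logZ'].mean(), out['logZ'].std()  # evidence with error bar
>>> (out['logZ'] - (out['logL_P'] - out['D_KL'])).mean()  # approximately zero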

truncate(logL=None)[source]

Truncate the run at a given contour.

Returns the union of the live_points and dead_points.

Parameters:
logL : float or int, optional

Loglikelihood or iteration number to truncate run. If not provided, truncate at the last set of dead points.

Returns:
truncated_run : NestedSamples
Run truncated at either:
  • contour logL (if input is float)

  • ith iteration (if input is integer)

  • last set of dead points if no argument provided

class anesthetic.samples.Samples(*args, **kwargs)[source]

Storage and plotting tools for general samples.

Extends the pandas.DataFrame by providing plotting methods and standardising sample storage.

Example plotting commands include
  • samples.plot_1d(['paramA', 'paramB'])

  • samples.plot_2d(['paramA', 'paramB'])

  • samples.plot_2d([['paramA', 'paramB'], ['paramC', 'paramD']])

Parameters:
data : np.array

Coordinates of samples. shape = (nsamples, ndims).

columns : list(str)

reference names of parameters

weights : np.array

weights of samples.

logL : np.array

loglikelihoods of samples.

labels : dict or array-like

mapping from columns to plotting labels

label : str

Legend label

logzero : float, default=-1e30

The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.

importance_sample(logL_new, action='add', inplace=False)[source]

Perform importance re-weighting on the log-likelihood.

Parameters:
logL_new : np.array

New log-likelihood values. Should have the same shape as logL.

action : str, default=’add’

Can be any of {‘add’, ‘replace’, ‘mask’}.

  • add: Add the new logL_new to the current logL.

  • replace: Replace the current logL with the new logL_new.

  • mask: treat logL_new as a boolean mask and only keep the corresponding (True) samples.

inplace : bool, default=False

Indicates whether to modify the existing array, or return a new frame with importance sampling applied.

Returns:
samples : Samples, MCMCSamples, or NestedSamples

Importance re-weighted samples.

plot_1d(axes=None, *args, **kwargs)[source]

Create an array of 1D plots.

Parameters:
axes : plotting axes, optional

If a pandas.Series is provided as an existing set of axes, then this is used for creating the plot. Otherwise, a new set of axes is created using the list or lists of strings.

If not provided, then all parameters are plotted. This is intended for plotting a sliced array (e.g. samples[[‘x0’, ‘x1’]].plot_1d()).

kind : str, default=’kde_1d’

What kind of plots to produce. Alongside the usual pandas options {‘hist’, ‘box’, ‘kde’, ‘density’}, anesthetic also provides ‘kde_1d’, ‘hist_1d’, and ‘fastkde_1d’.

Warning – while the other pandas plotting options {‘line’, ‘bar’, ‘barh’, ‘area’, ‘pie’} are also accessible, these can be hard to interpret/expensive for Samples, MCMCSamples, or NestedSamples.

logx : list(str), optional

Which parameters/columns to plot on a log scale. Needs to match if plotting on top of a pre-existing axes.

label : str, optional

Legend label added to each axis.

Returns:
axes : pandas.Series of matplotlib.axes.Axes

Pandas array of axes objects
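
Examples

A minimal sketch, assuming samples and other are existing Samples objects over the same parameters:

>>> axes = samples.plot_1d(['x0', 'x1'])    # create axes and plot
>>> other.plot_1d(axes, label='other run')  # overlay a second set of samples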

plot_2d(axes=None, *args, **kwargs)[source]

Create an array of 2D plots.

To avoid interfering with y-axis sharing, one-dimensional plots are created on a separate axis, which is monkey-patched onto the argument ax as the attribute ax.twin.

Parameters:
axes : plotting axes, optional

If a pandas.DataFrame is provided as an existing set of axes, then this is used for creating the plot. Otherwise, a new set of axes is created using the list or lists of strings.

If not provided, then all parameters are plotted. This is intended for plotting a sliced array (e.g. samples[[‘x0’, ‘x1’]].plot_2d()). It is not advisable to plot an entire frame, as it is computationally expensive, and liable to run into linear algebra errors for degenerate derived parameters.

kind/kinds : dict, optional

What kinds of plots to produce. The dictionary takes the keys ‘diagonal’ for the 1D plots and ‘lower’ and ‘upper’ for the 2D plots. The options for ‘diagonal’ are the 1D kinds (e.g. ‘kde_1d’, ‘hist_1d’, ‘fastkde_1d’), and the options for ‘lower’ and ‘upper’ are the 2D kinds (e.g. ‘kde_2d’, ‘hist_2d’, ‘scatter_2d’, ‘fastkde_2d’).

There are also a set of shortcuts provided in plot_2d_default_kinds:

  • ‘kde_1d’: 1d kde plots down the diagonal

  • ‘kde_2d’: 2d kde plots in lower triangle

  • ‘kde’: 1d & 2d kde plots in lower & diagonal

  • ‘hist_1d’: 1d histograms down the diagonal

  • ‘hist_2d’: 2d histograms in lower triangle

  • ‘hist’: 1d & 2d histograms in lower & diagonal

  • ‘scatter_2d’: 2d scatter in lower triangle

  • ‘scatter’: 1d histograms down the diagonal & 2d scatter in lower triangle

Feel free to add your own to this list! Default: {‘diagonal’: ‘kde_1d’, ‘lower’: ‘kde_2d’, ‘upper’:’scatter_2d’}

diagonal_kwargs, lower_kwargs, upper_kwargs : dict, optional

kwargs for the diagonal (1D)/lower or upper (2D) plots. This is useful when there is a conflict of kwargs for different kinds of plots. Note that any kwargs directly passed to plot_2d will overwrite any kwarg with the same key passed to *_kwargs. Default: {}

logx, logy : list(str), optional

Which parameters/columns to plot on a log scale for the x-axis and y-axis, respectively. Needs to match if plotting on top of a pre-existing axes.

label : str, optional

Legend label added to each axis.

Returns:
axes : pandas.DataFrame of matplotlib.axes.Axes

Pandas array of axes objects
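
Examples

A minimal sketch (parameter names are illustrative); the second call spells out the ‘kde’ shortcut from plot_2d_default_kinds explicitly:

>>> axes = samples.plot_2d(['x0', 'x1'], kinds='kde')
>>> axes = samples.plot_2d(['x0', 'x1'],
...                        kinds={'diagonal': 'kde_1d', 'lower': 'kde_2d'})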

plot_2d_default_kinds = {'default': {'diagonal': 'kde_1d', 'lower': 'kde_2d', 'upper': 'scatter_2d'}, 'fastkde': {'diagonal': 'fastkde_1d', 'lower': 'fastkde_2d'}, 'hist': {'diagonal': 'hist_1d', 'lower': 'hist_2d'}, 'hist_1d': {'diagonal': 'hist_1d'}, 'hist_2d': {'lower': 'hist_2d'}, 'kde': {'diagonal': 'kde_1d', 'lower': 'kde_2d'}, 'kde_1d': {'diagonal': 'kde_1d'}, 'kde_2d': {'lower': 'kde_2d'}, 'scatter': {'diagonal': 'hist_1d', 'lower': 'scatter_2d'}, 'scatter_2d': {'lower': 'scatter_2d'}}
property tex
to_hdf(path_or_buf, key, *args, **kwargs)[source]

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

In order to add another pandas.DataFrame or Series to an existing HDF file please use append mode and a different key.

Warning

One can store a subclass of pandas.DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.

For more information see the user guide.

Parameters:
path_or_buf : str or pandas.HDFStore

File path or HDFStore object.

key : str

Identifier for the group in the store.

mode : {‘a’, ‘w’, ‘r+’}, default ‘a’

Mode to open file:

  • ‘w’: write, a new file is created (an existing file with the same name would be deleted).

  • ‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ‘r+’: similar to ‘a’, but the file must already exist.

complevel : {0-9}, default None

Specifies a compression level for data. A value of 0 or None disables compression.

complib : {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’

Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

append : bool, default False

For Table formats, append the input data to the existing.

format : {‘fixed’, ‘table’, None}, default ‘fixed’

Possible values:

  • ‘fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.

  • ‘table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.

  • If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to “fixed”.

index : bool, default True

Write pandas.DataFrame index as a column.

min_itemsize : dict or int, optional

Map column names to minimum string sizes for columns.

nan_rep : Any, optional

How to represent null values as str. Not allowed with append=True.

dropna : bool, default False, optional

Remove missing values.

data_columns : list of columns or True, optional

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns for more information. Applicable only to format=’table’.

errors : str, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open for a full list of options.

encoding : str, default “UTF-8”

See also

pandas.read_hdf

Read from HDF file.

pandas.DataFrame.to_orc

Write a pandas.DataFrame to the binary orc format.

pandas.DataFrame.to_parquet

Write a pandas.DataFrame to the binary parquet format.

pandas.DataFrame.to_sql

Write to a SQL table.

pandas.DataFrame.to_feather

Write out feather-format for pandas.DataFrames.

pandas.DataFrame.to_csv

Write out to a csv file.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')  

We can add another object to the same file:

>>> s = pd.Series([1, 2, 3, 4])  
>>> s.to_hdf('data.h5', key='s')  

Reading from HDF file:

>>> pd.read_hdf('data.h5', 'df')
A  B
a  1  4
b  2  5
c  3  6
>>> pd.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64
anesthetic.samples.merge_nested_samples(runs)[source]

Merge one or more nested sampling runs.

Parameters:
runs : list(NestedSamples)

List or array-like of one or more nested sampling runs. If only a single run is provided, this recalculates the live points and as such can be used for masked runs.

Returns:
samples : NestedSamples

Merged run.

anesthetic.samples.merge_samples_weighted(samples, weights=None, label=None)[source]

Merge sets of samples with weights.

Combine two (or more) samples so the new PDF is P(x|new) = weight_A P(x|A) + weight_B P(x|B). The number of samples and internal weights do not affect the result.

Parameters:
samples : list(NestedSamples) or list(MCMCSamples)

List or array-like of one or more MCMC or nested sampling runs.

weights : list(double) or None

Weight for each run in samples (normalized internally). Can be omitted if the samples are NestedSamples, in which case exp(logZ) is used as the weight.

label : str or None, default=None

Label for the new samples.

Returns:
new_samples : Samples

Merged (weighted) run.
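
Examples

A minimal sketch, assuming run_a and run_b are existing runs over the same parameters:

>>> merged = merge_samples_weighted([run_a, run_b], weights=[0.4, 0.6])
>>> # for NestedSamples, omitted weights default to exp(logZ):
>>> merged = merge_samples_weighted([run_a, run_b])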

anesthetic.scripts module

Command-line scripts for anesthetic.

anesthetic.scripts.gui(args=None)[source]

Launch the anesthetic GUI.

See anesthetic.gui.plot.RunPlotter for details.

anesthetic.testing module

Anesthetic testing utilities.

anesthetic.testing.assert_frame_equal(left, right, *args, **kwargs)[source]

Assert frames are equal, including metadata.

anesthetic.utils module

Data-processing utility functions.

anesthetic.utils.adjust_docstrings(obj, pattern, repl, *args, **kwargs)[source]

Adjust the docstrings of a class using regular expressions.

After the first argument, the remaining arguments are identical to re.sub.

Parameters:
obj : class

class whose docstrings are to be adjusted

pattern : str

regular expression pattern

repl : str

replacement string

anesthetic.utils.compress_weights(w, u=None, ncompress=True)[source]

Compress weights to their approximate channel capacity.

anesthetic.utils.compute_insertion_indexes(death, birth)[source]

Compute the live point insertion index for each point.

For more detail, see Fowlie et al. (2020)

Parameters:
death, birth : array-like

list of birth and death contours

Returns:
indexes : np.array

live point index at which each live point was inserted

anesthetic.utils.compute_nlive(death, birth)[source]

Compute number of live points from birth and death contours.

Parameters:
death, birth : array-like

list of birth and death contours

Returns:
nlive : np.array

number of live points at each contour

anesthetic.utils.histogram(a, **kwargs)[source]

Produce a histogram for path-based plotting.

This is a cheap histogram, necessary if one wants to update the histogram dynamically, since redrawing and filling is very expensive.

This has the same arguments and keywords as numpy.histogram(), but is normalised to 1.

anesthetic.utils.histogram_bin_edges(samples, weights, bins='fd', range=None, beta='equal')[source]

Compute a good number of bins dynamically from weighted samples.

Parameters:
samples : array_like

Input data.

weights : array-like

Array of sample weights.

bins : str, default=’fd’

String defining the rule used to automatically compute a good number of bins for the weighted samples:

  • ‘fd’ : Freedman–Diaconis rule (modified for weighted data)

  • ‘scott’ : Scott’s rule (modified for weighted data)

  • ‘sqrt’ : Square root estimator (modified for weighted data)

range : (float, float), optional

The lower and upper range of the bins. If not provided, range is simply (samples.min(), samples.max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second.

beta : float or str, default=’equal’

The value of beta>0 used to calculate the number of effective samples via neff().

Returns:
bin_edges : array of dtype float

The edges to pass to numpy.histogram().

anesthetic.utils.insertion_p_value(indexes, nlive, batch=0)[source]

Compute the p-value from insertion indexes, assuming constant nlive.

Note that this function doesn’t use scipy.stats.kstest() as the latter assumes continuous distributions.

For more detail, see Fowlie et al. (2020)

For a rolling test, you should provide the optional parameter batch!=0. In this case the test computes the p-value on consecutive batches of size nlive * batch, selects the smallest one and adjusts for multiple comparisons using a Bonferroni correction.

Parameters:
indexes : array-like

list of insertion indexes, sorted by death contour

nlive : int

number of live points

batch : float

batch size in units of nlive for a rolling p-value

Returns:
ks_result : dict

Kolmogorov-Smirnov test results:

  • D: Kolmogorov-Smirnov statistic

  • sample_size: sample size

  • p-value: p-value

if batch != 0:

  • iterations: bounds of batch with minimum p-value

  • nbatches: the number of batches in total

  • uncorrected p-value: p-value without Bonferroni correction
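
Examples

A minimal sketch, assuming death and birth are loglikelihood contour arrays from a nested sampling run with 500 live points:

>>> from anesthetic.utils import compute_insertion_indexes, insertion_p_value
>>> indexes = compute_insertion_indexes(death, birth)
>>> insertion_p_value(indexes, nlive=500)['p-value']           # global test
>>> insertion_p_value(indexes, nlive=500, batch=1)['p-value']  # rolling test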

anesthetic.utils.is_int(x)[source]

Test whether x is an integer.

anesthetic.utils.iso_probability_contours(pdf, contours=[0.95, 0.68])[source]

Compute the iso-probability contour values.

anesthetic.utils.iso_probability_contours_from_samples(pdf, contours=[0.95, 0.68], weights=None)[source]

Compute the iso-probability contour values.

anesthetic.utils.logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False)[source]

Compute the log of the sum of exponentials of input elements.

This function has the same call signature as scipy.special.logsumexp() and mirrors scipy’s behaviour except for -np.inf input. If a and b are both -inf then scipy’s function will output nan whereas here we use:

\[\lim_{x \to -\infty} x \exp(x) = 0\]

Thus, if a=-inf in log(sum(b * exp(a))) then we can set b=0 such that that term is ignored in the sum.
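
Examples

A minimal sketch of the -np.inf handling (the inputs are illustrative):

>>> import numpy as np
>>> from anesthetic.utils import logsumexp
>>> a = np.array([0.0, -np.inf])
>>> b = np.array([0.5, -np.inf])
>>> logsumexp(a, b=b)  # log(0.5): the a = b = -inf term contributes 0
>>> # scipy.special.logsumexp(a, b=b) would return nan for this input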

anesthetic.utils.match_contour_to_contourf(contours, vmin, vmax)[source]

Get needed vmin, vmax to match contour colors to contourf colors.

contourf uses the arithmetic mean of contour levels to assign colors, whereas contour uses the contour level directly. To get the same colors for contour lines as for contourf faces, we need some fiddly algebra.

anesthetic.utils.mirror_1d(d, xmin=None, xmax=None)[source]

If necessary apply reflecting boundary conditions.

anesthetic.utils.mirror_2d(d_x_, d_y_, xmin=None, xmax=None, ymin=None, ymax=None)[source]

If necessary apply reflecting boundary conditions.

anesthetic.utils.neff(w, beta=1)[source]

Calculate effective number of samples.

Using the Huggins-Roy family of effective samples (https://aakinshin.net/posts/huggins-roy-ess/).

Parameters:
beta : int, float, or str, default=1

The value of beta used to calculate the number of effective samples according to

\[N_{eff} = \bigg(\sum_{i=0}^n w_i^\beta \bigg)^{\frac{1}{1-\beta}}, \qquad w_i = \frac{w_i}{\sum_j w_j}\]

Beta can take any positive value. Larger beta corresponds to a greater compression such that:

\[\beta_1 < \beta_2 \Rightarrow N_{eff}(\beta_1) > N_{eff}(\beta_2)\]

Alternatively, beta can take one of the following strings as input:

  • If ‘inf’ or ‘equal’ is supplied (equivalent to beta=inf), then the resulting number of samples is the number of samples when compressed to equal weights, and given by:

\[w_i = \frac{w_i}{\sum_j w_j}, \qquad N_{eff} = \frac{1}{\max_i[w_i]}\]
  • If ‘entropy’ is supplied (equivalent to beta=1), then the estimate is determined via the entropy based calculation, also referred to as the channel capacity:

\[p_i = \frac{w_i}{\sum_j w_j}, \qquad H = -\sum_i p_i \ln p_i, \qquad N_{eff} = e^{H}\]
  • If ‘kish’ is supplied (equivalent to beta=2), then a Kish estimate is computed (Kish, Leslie (1965). Survey Sampling. New York: John Wiley & Sons, Inc. ISBN 0-471-10949-5):

\[N_{eff} = \frac{(\sum_i w_i)^2}{\sum_i w_i^2}\]
  • str(float) input gets converted to the corresponding float value.
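A numeric sketch of this family (illustrative only; the string aliases above map to beta = inf, 1 and 2 respectively):

import numpy as np

def neff_sketch(w, beta=1):
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                    # normalise the weights
    if beta == np.inf:                 # 'inf' / 'equal'
        return 1 / np.max(w)
    if beta == 1:                      # 'entropy' (the beta -> 1 limit)
        logw = np.log(w, where=w > 0, out=np.zeros_like(w))
        return np.exp(-np.sum(w * logw))
    return np.sum(w**beta) ** (1 / (1 - beta))   # 'kish' corresponds to beta=2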

anesthetic.utils.nest_level(lst)[source]

Calculate the nesting level of a list.

anesthetic.utils.quantile(a, q, w=None, interpolation='linear')[source]

Compute the weighted quantile for a one dimensional array.
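A sketch of a weighted quantile via the inverse of a weighted empirical CDF (linear interpolation assumed; illustrative, not the exact implementation):

import numpy as np

def weighted_quantile_sketch(a, q, w=None):
    a = np.asarray(a, dtype=float)
    w = np.ones_like(a) if w is None else np.asarray(w, dtype=float)
    order = np.argsort(a)
    a, w = a[order], w[order]
    cdf = (np.cumsum(w) - 0.5 * w) / np.sum(w)   # mid-point weighted ECDF
    return np.interp(q, cdf, a)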

anesthetic.utils.sample_compression_1d(x, w=None, ncompress=True)[source]

Histogram a 1D set of weighted samples via subsampling.

This compresses the number of samples, combining weights.

Parameters:
xarray-like

x coordinate of samples for compressing

wpandas.Series, optional

weights of samples

ncompressint, str, default=True

Degree of compression.

  • If int: number of samples returned.

  • If True: compresses to the channel capacity (same as ncompress='entropy').

  • If False: no compression.

  • If str: determine number from the Huggins-Roy family of effective samples in neff() with beta=ncompress.

Returns:
x, w: array-like

Compressed samples and weights
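One simple way to picture the idea is to bin x and sum the weights per bin (a hypothetical sketch; the actual routine compresses via subsampling as described above):

import numpy as np

def compress_1d_sketch(x, w, ncompress=1000):
    edges = np.linspace(x.min(), x.max(), ncompress + 1)
    centres = 0.5 * (edges[1:] + edges[:-1])
    w_new, _ = np.histogram(x, bins=edges, weights=w)  # combine weights per bin
    return centres, w_new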

anesthetic.utils.scaled_triangulation(x, y, cov)[source]

Triangulation scaled by a covariance matrix.

Parameters:
x, yarray-like

x and y coordinates of samples

covarray-like, 2d

Covariance matrix for scaling

Returns:
matplotlib.tri.Triangulation

Triangulation with the appropriate scaling

anesthetic.utils.temporary_seed(seed)[source]

Context for temporarily setting a numpy seed.
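A minimal sketch of such a context manager (assumed behaviour, not necessarily the exact code):

import contextlib
import numpy as np

@contextlib.contextmanager
def temporary_seed_sketch(seed):
    state = np.random.get_state()    # save the global RNG state
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(state)   # restore the state on exit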

anesthetic.utils.triangular_sample_compression_2d(x, y, cov, w=None, n=1000)[source]

Histogram a 2D set of weighted samples via triangulation.

This defines bins via a triangulation of the subsamples and sums weights within triangles surrounding each point.

Parameters:
x, yarray-like

x and y coordinates of samples for compressing

covarray-like, 2d

Covariance matrix for scaling

wpandas.Series, optional

weights of samples

nint, default=1000

number of samples returned.

Returns:
tri

matplotlib.tri.Triangulation with an appropriate scaling

warray-like

Compressed weights

anesthetic.utils.unique(a)[source]

Find unique elements, retaining order.
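Equivalent behaviour can be sketched with dict insertion order (illustrative):

def unique_sketch(a):
    return list(dict.fromkeys(a))

>>> unique_sketch([3, 1, 3, 2, 1])
[3, 1, 2]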

anesthetic.weighted_labelled_pandas module

Pandas DataFrame with weights and labels.

class anesthetic.weighted_labelled_pandas.WeightedLabelledDataFrame(*args, **kwargs)[source]

Bases: WeightedDataFrame, LabelledDataFrame

pandas.DataFrame with weights and labels.

drop_labels(axis=1)[source]

Drop the labels from an axis if present.

get_label(param, axis=1)[source]

Retrieve mapping from paramnames to labels from an axis.

get_labels(axis=1)[source]

Retrieve labels from an axis.

get_labels_map(axis=1, fill=True)[source]

Retrieve mapping from paramnames to labels from an axis.

islabelled(axis=1)[source]

Search for existence of labels.

set_label(param, value, axis=1)[source]

Set a specific label to a specific value on an axis.

set_labels(labels, axis=1, inplace=False, level=None)[source]

Set labels along an axis.

class anesthetic.weighted_labelled_pandas.WeightedLabelledSeries(*args, **kwargs)[source]

Bases: WeightedSeries, LabelledSeries

Series with weights and labels.

set_label(param, value, axis=0)[source]

Set a specific label to a specific value.

anesthetic.weighted_labelled_pandas.read_csv(filename, *args, **kwargs)[source]

Read a CSV file into a WeightedLabelledDataFrame.

anesthetic.weighted_pandas module

Pandas DataFrame and Series with weighted samples.

class anesthetic.weighted_pandas.WeightedDataFrame(*args, **kwargs)[source]

Weighted version of pandas.DataFrame.

compress(ncompress=True, axis=0)[source]

Reduce the number of samples by discarding low-weights.

Parameters:
ncompressint, str, default=True

Degree of compression (a usage sketch follows this list).

  • If True (default): reduce to the channel capacity (theoretical optimum compression), equivalent to ncompress='entropy'.

  • If > 0: desired number of samples after compression.

  • If <= 0: compress so that all remaining weights are unity.

  • If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
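A hypothetical usage sketch, assuming samples is a WeightedDataFrame:

>>> equal = samples.compress()        # channel capacity (ncompress=True)
>>> fixed = samples.compress(1000)    # roughly 1000 samples
>>> unit = samples.compress(0)        # all remaining weights unity
>>> kish = samples.compress('kish')   # Huggins-Roy family with beta=2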

corr(method='pearson', skipna=True, *args, **kwargs)[source]

Compute pairwise correlation of columns, excluding NA/null values.

Parameters:
method{‘pearson’, ‘kendall’, ‘spearman’} or callable

Method of correlation:

  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

min_periodsint, optional

Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
WeightedDataFrame

Correlation matrix.

See also

WeightedDataFrame.corrwith

Compute pairwise correlation with another WeightedDataFrame or WeightedSeries.

WeightedSeries.corr

Compute the correlation between two WeightedSeries.

Notes

Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.

Examples

>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.WeightedDataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
>>> df = pd.WeightedDataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
...                   columns=['dogs', 'cats'])
>>> df.corr(min_periods=3)
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0
corrwith(other, axis=0, drop=False, method='pearson', *args, **kwargs)[source]

Compute pairwise correlation.

Pairwise correlation is computed between rows or columns of WeightedDataFrame with rows or columns of WeightedSeries or WeightedDataFrame. WeightedDataFrames are first aligned along both axes before computing the correlations.

Parameters:
otherWeightedDataFrame, WeightedSeries

Object with which to compute correlations.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.

dropbool, default False

Drop missing indices from result.

method{‘pearson’, ‘kendall’, ‘spearman’} or callable

Method of correlation:

  • pearson : standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable: callable with input two 1d ndarrays and returning a float.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
WeightedSeries

Pairwise correlations.

See also

WeightedDataFrame.corr

Compute pairwise correlation of columns.

Examples

>>> index = ["a", "b", "c", "d", "e"]
>>> columns = ["one", "two", "three", "four"]
>>> df1 = pd.WeightedDataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
>>> df2 = pd.WeightedDataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
>>> df1.corrwith(df2)
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
>>> df2.corrwith(df1, axis=1)
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64
cov(*args, **kwargs)[source]

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a WeightedDataFrame. The returned data frame is the covariance matrix of the columns of the WeightedDataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Parameters:
min_periodsint, optional

Minimum number of observations required per pair of columns to have a valid result.

ddofint, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when there are no NaN values in the data.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
WeightedDataFrame

The covariance matrix of the series of the WeightedDataFrame.

See also

WeightedSeries.cov

Compute covariance with another WeightedSeries.

pandas.core.window.ewm.ExponentialMovingWindow.cov

Exponential weighted sample covariance.

pandas.core.window.expanding.Expanding.cov

Expanding sample covariance.

pandas.core.window.rolling.Rolling.cov

Rolling sample covariance.

Notes

Returns the covariance matrix of the WeightedDataFrame’s time series. The covariance is normalized by N-ddof.

For WeightedDataFrames that have WeightedSeries that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member WeightedSeries.

However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.

Examples

>>> df = pd.WeightedDataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
>>> np.random.seed(42)
>>> df = pd.WeightedDataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.WeightedDataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
groupby(by=None, axis=_NoDefault.no_default, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]

Group WeightedDataFrame using a mapper or by a WeightedSeries of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
bymapping, function, label, pd.Grouper or list of such

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or WeightedSeries is passed, the WeightedSeries or dict VALUES will be used to determine the groups (the WeightedSeries’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Split along rows (0) or columns (1). For WeightedSeries this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.

levelint, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

as_indexbool, default True

Return object with group labels as the index. Only relevant for WeightedDataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

sortbool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original WeightedDataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.

group_keysbool, default True

When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed WeightedSeries or WeightedDataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.

observedbool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

dropnabool, default True

If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:
pandas.api.typing.WeightedDataFrameGroupBy

Returns a groupby object that contains information about the groups.

See also

pandas.DataFrame.resample

Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

Examples

>>> df = pd.WeightedDataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.WeightedDataFrame({'Max Speed': [390., 350., 30., 20.]},
...                   index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0

We can also choose to include NA in group keys or not by setting dropna parameter, the default setting is True.

>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.WeightedDataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
    a   c
b
1.0 2   3
2.0 2   5
>>> df.groupby(by=["b"], dropna=False).sum()
    a   c
b
1.0 2   3
2.0 2   5
NaN 1   4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.WeightedDataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
>>> df.groupby(by="a", dropna=False).sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
NaN 12.3   33.0

When using .apply(), use group_keys to include or exclude the group keys. The group_keys argument defaults to True (include).

>>> df = pd.WeightedDataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
          Max Speed
Animal
Falcon 0      380.0
       1      370.0
Parrot 2       24.0
       3       26.0
>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
   Max Speed
0      380.0
1      370.0
2       24.0
3       26.0
kurt(axis=0, skipna=True, *args, **kwargs)[source]

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
axis{index (0), columns (1)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
WeightedSeries or scalar

Examples

>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64
kurtosis(*args, **kwargs)[source]

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
axis{index (0), columns (1)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
WeightedSeries or scalar

Examples

>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64
mad(axis=0, skipna=True, *args, **kwargs)[source]
mean(axis=0, skipna=True, *args, **kwargs)[source]

Return the mean of the values over the requested axis.

Parameters:
axis{index (0), columns (1)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
WeightedSeries or scalar

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.mean()
2.0

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.mean()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.mean(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a   1.5
dtype: float64
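Unlike the unweighted pandas mean, this reduction folds in the sample weights; conceptually (a sketch, not the implementation):

import numpy as np

def weighted_mean_sketch(values, weights):
    # sum(w_i * x_i) / sum(w_i) along axis 0
    return np.average(values, axis=0, weights=weights)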
median(*args, **kwargs)[source]

Return the median of the values over the requested axis.

Parameters:
axis{index (0), columns (1)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
WeightedSeries or scalar

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.median()
2.0

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.median()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.median(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a   1.5
dtype: float64
quantile(q=0.5, axis=0, numeric_only=None, interpolation='linear', method=None)[source]

Return values at the given quantile over requested axis.

Parameters:
qfloat or array-like, default 0.5 (50% quantile)

Value between 0 <= q <= 1, the quantile(s) to compute.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

numeric_onlybool, default False

Include only float, int or boolean data.

Changed in version 2.0.0: The default value of numeric_only is now False.

interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}

This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

  • linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.

  • lower: i.

  • higher: j.

  • nearest: i or j whichever is nearest.

  • midpoint: (i + j) / 2.

method{‘single’, ‘table’}, default ‘single’

Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). When ‘table’, the only allowed interpolation methods are ‘nearest’, ‘lower’, and ‘higher’.

Returns:
WeightedSeries or WeightedDataFrame

If q is an array, a WeightedDataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles.

If q is a float, a WeightedSeries will be returned where the index is the columns of self and the values are the quantiles.

See also

pandas.core.window.rolling.Rolling.quantile

Rolling quantile.

numpy.percentile

Numpy function to compute the percentile.

Examples

>>> df = pd.WeightedDataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

Specifying method=’table’ will compute the quantile over all columns.

>>> df.quantile(.1, method="table", interpolation="nearest")
a    1
b    1
Name: 0.1, dtype: int64
>>> df.quantile([.1, .5], method="table", interpolation="nearest")
     a    b
0.1  1    1
0.5  3  100

Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.

>>> df = pd.WeightedDataFrame({'A': [1, 2],
...                    'B': [pd.Timestamp('2010'),
...                          pd.Timestamp('2011')],
...                    'C': [pd.Timedelta('1 days'),
...                          pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A                    1.5
B    2010-07-02 12:00:00
C        1 days 12:00:00
Name: 0.5, dtype: object
sample(*args, **kwargs)[source]

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters:
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once.

weightsstr or ndarray-like, optional

Default ‘None’ results in equal probability weighting. If passed a WeightedSeries, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a WeightedDataFrame, will accept the name of a column when axis = 0. Unless weights are a WeightedSeries, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.

random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional

If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

Changed in version 1.4.0: np.random.Generator objects now accepted

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For WeightedSeries this parameter is unused and defaults to None.

ignore_indexbool, default False

If True, the resulting index will be labeled 0, 1, …, n - 1.

New in version 1.3.0.

Returns:
WeightedSeries or WeightedDataFrame

A new object of same type as caller containing n items randomly sampled from the caller object.

See also

WeightedDataFrameGroupBy.sample

Generates random samples from each group of a WeightedDataFrame object.

WeightedSeriesGroupBy.sample

Generates random samples from each group of a WeightedSeries object.

numpy.random.choice

Generates a random sample from a given 1-D numpy array.

Notes

If frac > 1, replacement should be set to True.

Examples

>>> df = pd.WeightedDataFrame({'num_legs': [2, 4, 8, 0],
...                    'num_wings': [2, 0, 0, 0],
...                    'num_specimen_seen': [10, 2, 1, 8]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8

Extract 3 random elements from the WeightedSeries df['num_legs']. Note that we use random_state to ensure the reproducibility of the examples.

>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64

A random 50% sample of the WeightedDataFrame with replacement:

>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8

An upsampled sample of the WeightedDataFrame with replacement (note that the replace parameter has to be True for frac > 1):

>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2

Using a WeightedDataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.

>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
sem(axis=0, skipna=True)[source]

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis{index (0), columns (1)}

For WeightedSeries this parameter is unused and defaults to 0.

Warning

The behavior of WeightedDataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

Returns:
WeightedSeries or WeightedDataFrame (if level specified)

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.sem().round(6)
0.57735

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.sem()
a   0.5
b   0.5
dtype: float64

Using axis=1

>>> df.sem(axis=1)
tiger   0.5
zebra   0.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a   0.5
dtype: float64
skew(axis=0, skipna=True, *args, **kwargs)[source]

Return unbiased skew over requested axis.

Normalized by N-1.

Parameters:
axis{index (0), columns (1)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
WeightedSeries or scalar

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.skew()
0.0

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
...                   index=['tiger', 'zebra', 'cow'])
>>> df
        a   b   c
tiger   1   2   1
zebra   2   3   3
cow     3   4   5
>>> df.skew()
a   0.0
b   0.0
c   0.0
dtype: float64

Using axis=1

>>> df.skew(axis=1)
tiger   1.732051
zebra  -1.732051
cow     0.000000
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
...                   index=['tiger', 'zebra', 'cow'])
>>> df.skew(numeric_only=True)
a   0.0
dtype: float64
std(*args, **kwargs)[source]

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis{index (0), columns (1)}

For WeightedSeries this parameter is unused and defaults to 0.

Warning

The behavior of WeightedDataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

Returns:
WeightedSeries or WeightedDataFrame (if level specified)

Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).

Examples

>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01

The standard deviation of the columns can be found as follows:

>>> df.std()
age       18.786076
height     0.237417
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64
var(axis=0, skipna=True, *args, **kwargs)[source]

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis{index (0), columns (1)}

For WeightedSeries this parameter is unused and defaults to 0.

Warning

The behavior of WeightedDataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

Returns:
WeightedSeries or WeightedDataFrame (if level specified)

Examples

>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
>>> df.var()
age       352.916667
height      0.056367
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.var(ddof=0)
age       264.687500
height      0.042275
dtype: float64
class anesthetic.weighted_pandas.WeightedDataFrameGroupBy(*args, **kwargs)[source]

Weighted version of pandas.core.groupby.DataFrameGroupBy.

cov(*args, **kwargs)[source]

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a WeightedDataFrame. The returned data frame is the covariance matrix of the columns of the WeightedDataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Parameters:
min_periodsint, optional

Minimum number of observations required per pair of columns to have a valid result.

ddofint, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when there are no NaN values in the data.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
WeightedDataFrame

The covariance matrix of the series of the WeightedDataFrame.

See also

WeightedSeries.cov

Compute covariance with another WeightedSeries.

pandas.core.window.ewm.ExponentialMovingWindow.cov

Exponential weighted sample covariance.

pandas.core.window.expanding.Expanding.cov

Expanding sample covariance.

pandas.core.window.rolling.Rolling.cov

Rolling sample covariance.

Notes

Returns the covariance matrix of the WeightedDataFrame’s time series. The covariance is normalized by N-ddof.

For WeightedDataFrames that have WeightedSeries that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member WeightedSeries.

However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.

Examples

>>> df = pd.WeightedDataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
>>> np.random.seed(42)
>>> df = pd.WeightedDataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.WeightedDataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
get_weights()[source]

Return the weights of the grouped samples.

sample(*args, **kwargs)[source]

Return a random sample of items from each group.

You can use random_state for reproducibility.

Parameters:
nint, optional

Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.

fracfloat, optional

Fraction of items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once.

weightslist-like, optional

Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying WeightedDataFrame or WeightedSeries object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.

random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional

If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

Changed in version 1.4.0: np.random.Generator objects now accepted

Returns:
WeightedSeries or WeightedDataFrame

A new object of same type as caller containing items randomly sampled within each group from the caller object.

See also

WeightedDataFrame.sample

Generate random samples from a WeightedDataFrame object.

numpy.random.choice

Generate a random sample from a given 1-D numpy array.

Examples

>>> df = pd.WeightedDataFrame(
...     {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
... )
>>> df
       a  b
0    red  0
1    red  1
2   blue  2
3   blue  3
4  black  4
5  black  5

Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:

>>> df.groupby("a").sample(n=1, random_state=1)
       a  b
4  black  4
2   blue  2
1    red  1

Set frac to sample fixed proportions rather than counts:

>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2)
5    5
2    2
0    0
Name: b, dtype: int64

Control sample probabilities within groups by setting weights:

>>> df.groupby("a").sample(
...     n=1,
...     weights=[1, 1, 1, 0, 0, 1],
...     random_state=1,
... )
       a  b
5  black  5
2   blue  2
0    red  0
class anesthetic.weighted_pandas.WeightedGroupBy(*args, **kwargs)[source]

Weighted version of pandas.core.groupby.GroupBy.

get_weights()[source]

Return the weights of the grouped samples.

kurt(*args, **kwargs)[source]
kurtosis(*args, **kwargs)[source]
mean(*args, **kwargs)[source]

Compute mean of groups, excluding missing values.

Parameters:
numeric_onlybool, default False

Include only float, int, boolean columns.

Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.

enginestr, default None
  • 'cython' : Runs the operation through C-extensions from cython.

  • 'numba' : Runs the operation through JIT compiled code from numba.

  • None : Defaults to 'cython' or the global setting compute.use_numba

New in version 1.4.0.

engine_kwargsdict, default None
  • For 'cython' engine, there are no accepted engine_kwargs

  • For 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False}

New in version 1.4.0.

Returns:
pandas.WeightedSeries or pandas.WeightedDataFrame

See also

WeightedSeries.groupby

Apply a function groupby to a WeightedSeries.

WeightedDataFrame.groupby

Apply a function groupby to each row or column of a WeightedDataFrame.

Examples

>>> df = pd.WeightedDataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5],
...                    'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])

Groupby one column and return the mean of the remaining columns in each group.

>>> df.groupby('A').mean()
     B         C
A
1  3.0  1.333333
2  4.0  1.500000

Groupby two columns and return the mean of the remaining column.

>>> df.groupby(['A', 'B']).mean()
         C
A B
1 2.0  2.0
  4.0  1.0
2 3.0  1.0
  5.0  2.0

Groupby one column and return the mean of only particular column in the group.

>>> df.groupby('A')['B'].mean()
A
1    3.0
2    4.0
Name: B, dtype: float64
median(*args, **kwargs)[source]

Compute median of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex

Parameters:
numeric_onlybool, default False

Include only float, int, boolean columns.

Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.

Returns:
WeightedSeries or WeightedDataFrame

Median of values within each group.

Examples

For WeightedSeriesGroupBy:

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst)
>>> ser
a     7
a     2
a     8
b     4
b     3
b     3
dtype: int64
>>> ser.groupby(level=0).median()
a    7.0
b    3.0
dtype: float64

For WeightedDataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'])
>>> df
         a  b
  dog    1  1
  dog    3  4
  dog    5  8
mouse    7  4
mouse    7  4
mouse    8  2
mouse    3  1
>>> df.groupby(level=0).median()
         a    b
dog    3.0  4.0
mouse  7.0  3.0

For Resampler:

>>> ser = pd.WeightedSeries([1, 2, 3, 3, 4, 5],
...                 index=pd.DatetimeIndex(['2023-01-01',
...                                         '2023-01-10',
...                                         '2023-01-15',
...                                         '2023-02-01',
...                                         '2023-02-10',
...                                         '2023-02-15']))
>>> ser.resample('MS').median()
2023-01-01    2.0
2023-02-01    4.0
Freq: MS, dtype: float64
quantile(*args, **kwargs)[source]

Return group values at the given quantile, a la numpy.percentile.

Parameters:
qfloat or array-like, default 0.5 (50% quantile)

Value(s) between 0 and 1 providing the quantile(s) to compute.

interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}

Method to use when the desired quantile falls between two points.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: numeric_only now defaults to False.

Returns:
WeightedSeries or WeightedDataFrame

Return type determined by caller of GroupBy object.

See also

WeightedSeries.quantile

Similar method for WeightedSeries.

WeightedDataFrame.quantile

Similar method for WeightedDataFrame.

numpy.percentile

NumPy method to compute qth percentile.

Examples

>>> df = pd.WeightedDataFrame([
...     ['a', 1], ['a', 2], ['a', 3],
...     ['b', 1], ['b', 3], ['b', 5]
... ], columns=['key', 'val'])
>>> df.groupby('key').quantile()
    val
key
a    2.0
b    3.0
sem(*args, **kwargs)[source]

Compute standard error of the mean of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Parameters:
ddofint, default 1

Degrees of freedom.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: numeric_only now defaults to False.

Returns:
WeightedSeries or WeightedDataFrame

Standard error of the mean of values within each group.

Examples

For WeightedSeriesGroupBy:

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = pd.WeightedSeries([5, 10, 8, 14], index=lst)
>>> ser
a     5
a    10
b     8
b    14
dtype: int64
>>> ser.groupby(level=0).sem()
a    2.5
b    3.0
dtype: float64

For WeightedDataFrameGroupBy:

>>> data = [[1, 12, 11], [1, 15, 2], [2, 5, 8], [2, 6, 12]]
>>> df = pd.WeightedDataFrame(data, columns=["a", "b", "c"],
...                   index=["tuna", "salmon", "catfish", "goldfish"])
>>> df
           a   b   c
    tuna   1  12  11
  salmon   1  15   2
 catfish   2   5   8
goldfish   2   6  12
>>> df.groupby("a").sem()
      b  c
a
1    1.5  4.5
2    0.5  2.0

For Resampler:

>>> ser = pd.WeightedSeries([1, 3, 2, 4, 3, 8],
...                 index=pd.DatetimeIndex(['2023-01-01',
...                                         '2023-01-10',
...                                         '2023-01-15',
...                                         '2023-02-01',
...                                         '2023-02-10',
...                                         '2023-02-15']))
>>> ser.resample('MS').sem()
2023-01-01    0.577350
2023-02-01    1.527525
Freq: MS, dtype: float64
skew(*args, **kwargs)[source]
std(*args, **kwargs)[source]

Compute standard deviation of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Parameters:
ddofint, default 1

Degrees of freedom.

enginestr, default None
  • 'cython' : Runs the operation through C-extensions from cython.

  • 'numba' : Runs the operation through JIT compiled code from numba.

  • None : Defaults to 'cython' or the global setting compute.use_numba

New in version 1.4.0.

engine_kwargsdict, default None
  • For 'cython' engine, there are no accepted engine_kwargs

  • For 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False}

New in version 1.4.0.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: numeric_only now defaults to False.

Returns:
WeightedSeries or WeightedDataFrame

Standard deviation of values within each group.

See also

WeightedSeries.groupby

Apply a function groupby to a WeightedSeries.

WeightedDataFrame.groupby

Apply a function groupby to each row or column of a WeightedDataFrame.

Examples

For WeightedSeriesGroupBy:

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst)
>>> ser
a     7
a     2
a     8
b     4
b     3
b     3
dtype: int64
>>> ser.groupby(level=0).std()
a    3.21455
b    0.57735
dtype: float64

For WeightedDataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'])
>>> df
         a  b
  dog    1  1
  dog    3  4
  dog    5  8
mouse    7  4
mouse    7  4
mouse    8  2
mouse    3  1
>>> df.groupby(level=0).std()
              a         b
dog    2.000000  3.511885
mouse  2.217356  1.500000
var(*args, **kwargs)[source]

Compute variance of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Parameters:
ddofint, default 1

Degrees of freedom.

enginestr, default None
  • 'cython' : Runs the operation through C-extensions from cython.

  • 'numba' : Runs the operation through JIT compiled code from numba.

  • None : Defaults to 'cython' or the global setting compute.use_numba

New in version 1.4.0.

engine_kwargsdict, default None
  • For 'cython' engine, there are no accepted engine_kwargs

  • For 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False}

New in version 1.4.0.

numeric_onlybool, default False

Include only float, int or boolean data.

New in version 1.5.0.

Changed in version 2.0.0: numeric_only now defaults to False.

Returns:
WeightedSeries or WeightedDataFrame

Variance of values within each group.

See also

WeightedSeries.groupby

Apply a function groupby to a WeightedSeries.

WeightedDataFrame.groupby

Apply a function groupby to each row or column of a WeightedDataFrame.

Examples

For WeightedSeriesGroupBy:

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst)
>>> ser
a     7
a     2
a     8
b     4
b     3
b     3
dtype: int64
>>> ser.groupby(level=0).var()
a    10.333333
b     0.333333
dtype: float64

For WeightedDataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'])
>>> df
         a  b
  dog    1  1
  dog    3  4
  dog    5  8
mouse    7  4
mouse    7  4
mouse    8  2
mouse    3  1
>>> df.groupby(level=0).var()
              a          b
dog    4.000000  12.333333
mouse  4.916667   2.250000
class anesthetic.weighted_pandas.WeightedSeries(*args, **kwargs)[source]

Weighted version of pandas.Series.

compress(ncompress=True)[source]

Reduce the number of samples by discarding low-weights.

Parameters:
ncompressint, str, default=True

Degree of compression.

  • If True (default): reduce to the channel capacity (theoretical optimum compression), equivalent to ncompress='entropy'.

  • If > 0: desired number of samples after compression.

  • If <= 0: compress so that all remaining weights are unity.

  • If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.

corr(other, *args, **kwargs)[source]

Compute correlation with other WeightedSeries, excluding missing values.

The two WeightedSeries objects are not required to be the same length and will be aligned internally before the correlation function is applied.

Parameters:
otherWeightedSeries

WeightedSeries with which to compute the correlation.

method{‘pearson’, ‘kendall’, ‘spearman’} or callable

Method used to compute correlation:

  • pearson : Standard correlation coefficient

  • kendall : Kendall Tau correlation coefficient

  • spearman : Spearman rank correlation

  • callable: Callable with input two 1d ndarrays and returning a float.

Warning

Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

min_periodsint, optional

Minimum number of observations needed to have a valid result.

Returns:
float

Correlation with other.

See also

WeightedDataFrame.corr

Compute pairwise correlation between columns.

WeightedDataFrame.corrwith

Compute pairwise correlation with another WeightedDataFrame or WeightedSeries.

Notes

Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.

Automatic data alignment: as with all pandas operations, automatic data alignment is performed for this method. corr() automatically considers values with matching indices.

Examples

>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> s1 = pd.WeightedSeries([.2, .0, .6, .2])
>>> s2 = pd.WeightedSeries([.3, .6, .0, .1])
>>> s1.corr(s2, method=histogram_intersection)
0.3

Pandas auto-aligns the values with matching indices

>>> s1 = pd.WeightedSeries([1, 2, 3], index=[0, 1, 2])
>>> s2 = pd.WeightedSeries([1, 2, 3], index=[2, 1, 0])
>>> s1.corr(s2)
-1.0
cov(other, *args, **kwargs)[source]

Compute covariance with WeightedSeries, excluding missing values.

The two WeightedSeries objects are not required to be the same length and will be aligned internally before the covariance is calculated.

Parameters:
otherWeightedSeries

WeightedSeries with which to compute the covariance.

min_periodsint, optional

Minimum number of observations needed to have a valid result.

ddofint, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns:
float

Covariance between WeightedSeries and other normalized by N-1 (unbiased estimator).

See also

WeightedDataFrame.cov

Compute pairwise covariance of columns.

Examples

>>> s1 = pd.WeightedSeries([0.90010907, 0.13484424, 0.62036035])
>>> s2 = pd.WeightedSeries([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2)
-0.01685762652715874
groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]

Group WeightedSeries using a mapper or by a WeightedSeries of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
bymapping, function, label, pd.Grouper or list of such

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or WeightedSeries is passed, the WeightedSeries or dict VALUES will be used to determine the groups (the WeightedSeries’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis{0 or ‘index’, 1 or ‘columns’}, default 0

Split along rows (0) or columns (1). For WeightedSeries this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.

levelint, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

as_indexbool, default True

Return object with group labels as the index. Only relevant for WeightedDataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

sortbool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original WeightedDataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.

group_keysbool, default True

When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed WeightedSeries or WeightedDataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.

observedbool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

dropnabool, default True

If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:
anesthetic.weighted_pandas.WeightedSeriesGroupBy

Returns a groupby object that contains information about the groups.

See also

pandas.Series.resample

Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

Examples

>>> ser = pd.WeightedSeries([390., 350., 30., 20.],
...                 index=['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                 name="Max Speed")
>>> ser
Falcon    390.0
Falcon    350.0
Parrot     30.0
Parrot     20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", "b"]).mean()
a    210.0
b    185.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(ser > 100).mean()
Max Speed
False     25.0
True     370.0
Name: Max Speed, dtype: float64

Grouping by Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> ser = pd.WeightedSeries([390., 350., 30., 20.], index=index, name="Max Speed")
>>> ser
Animal  Type
Falcon  Captive    390.0
        Wild       350.0
Parrot  Captive     30.0
        Wild        20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Animal
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level="Type").mean()
Type
Captive    210.0
Wild       185.0
Name: Max Speed, dtype: float64

We can also choose whether to include NA in the group keys by setting the dropna parameter; the default is True.

>>> ser = pd.WeightedSeries([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
>>> ser.groupby(level=0).sum()
a    3
b    3
dtype: int64
>>> ser.groupby(level=0, dropna=False).sum()
a    3
b    3
NaN  3
dtype: int64
>>> arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot']
>>> ser = pd.WeightedSeries([390., 350., 30., 20.], index=arrays, name="Max Speed")
>>> ser.groupby(["a", "b", "a", np.nan]).mean()
a    210.0
b    350.0
Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", np.nan], dropna=False).mean()
a    210.0
b    350.0
NaN   20.0
Name: Max Speed, dtype: float64
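
Because the result is a WeightedSeriesGroupBy (documented below), group-level reductions can respect the sample weights carried by the series. A hedged sketch, assuming the weights propagate to each group:

>>> from anesthetic.weighted_pandas import WeightedSeries
>>> ser = WeightedSeries([390., 350., 30., 20.],
...                      index=['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                      weights=[1., 3., 1., 1.], name="Max Speed")
>>> # if the weights propagate: Falcon -> (1*390 + 3*350)/4 = 360.0, Parrot -> 25.0
>>> ser.groupby(level=0).mean()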
kurt(skipna=True)[source]

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
axis{index (0)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64
kurtosis(*args, **kwargs)[source]

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
axis{index (0)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64
mad(skipna=True)[source]

Return the mean absolute deviation of the values over the requested axis.

mean(skipna=True)[source]

Return the mean of the values over the requested axis.

Parameters:
axis{index (0)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.mean()
2.0

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.mean()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.mean(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a   1.5
dtype: float64
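
On a weighted series the mean is the weighted average, the analogue of numpy.average with the stored weights. A short sketch (the weights keyword follows the WeightedSeries constructor documented earlier in this module):

>>> import numpy as np
>>> from anesthetic.weighted_pandas import WeightedSeries
>>> s = WeightedSeries([1., 2., 3.], weights=[1., 1., 2.])
>>> s.mean()  # (1*1 + 1*2 + 2*3) / (1 + 1 + 2) = 2.25
>>> np.average([1., 2., 3.], weights=[1., 1., 2.])  # the same quantity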
median(*args, **kwargs)[source]

Return the median of the values over the requested axis.

Parameters:
axis{index (0)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.median()
2.0

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.median()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.median(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a   1.5
dtype: float64
quantile(q=0.5, interpolation='linear')[source]

Return value at the given quantile.

Parameters:
qfloat or array-like, default 0.5 (50% quantile)

The quantile(s) to compute, which can lie in range: 0 <= q <= 1.

interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}

This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

  • linear: i + (j - i) * (x-i)/(j-i), where (x-i)/(j-i) is the fractional part of the index surrounded by i and j.

  • lower: i.

  • higher: j.

  • nearest: i or j whichever is nearest.

  • midpoint: (i + j) / 2.

Returns:
float or WeightedSeries

If q is an array, a WeightedSeries will be returned where the index is q and the values are the quantiles, otherwise a float will be returned.

See also

pandas.core.window.rolling.Rolling.quantile

Calculate the rolling quantile.

numpy.percentile

Returns the q-th percentile(s) of the array elements.

Examples

>>> s = pd.WeightedSeries([1, 2, 3, 4])
>>> s.quantile(.5)
2.5
>>> s.quantile([.25, .5, .75])
0.25    1.75
0.50    2.50
0.75    3.25
dtype: float64
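
For a weighted series the quantiles are taken against the weighted CDF rather than the raw sample count. An illustration (hedged: the exact value at the boundary depends on the interpolation rule above):

>>> from anesthetic.weighted_pandas import WeightedSeries
>>> s = WeightedSeries([1., 2., 3., 4.], weights=[1., 1., 1., 3.])
>>> # half of the total mass sits on the last point alone, so the median
>>> # should shift above the unweighted value of 2.5
>>> s.quantile(.5)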
sample(*args, **kwargs)[source]

Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters:
nint, optional

Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

fracfloat, optional

Fraction of axis items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once.

weightsstr or ndarray-like, optional

Default ‘None’ results in equal probability weighting. If passed a WeightedSeries, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a WeightedDataFrame, will accept the name of a column when axis = 0. Unless weights are a WeightedSeries, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.

random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional

If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

Changed in version 1.4.0: np.random.Generator objects now accepted

axis{0 or ‘index’, 1 or ‘columns’, None}, default None

Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For WeightedSeries this parameter is unused and defaults to None.

ignore_indexbool, default False

If True, the resulting index will be labeled 0, 1, …, n - 1.

New in version 1.3.0.

Returns:
WeightedSeries or WeightedDataFrame

A new object of same type as caller containing n items randomly sampled from the caller object.

See also

WeightedDataFrameGroupBy.sample

Generates random samples from each group of a WeightedDataFrame object.

WeightedSeriesGroupBy.sample

Generates random samples from each group of a WeightedSeries object.

numpy.random.choice

Generates a random sample from a given 1-D numpy array.

Notes

If frac > 1, replacement should be set to True.

Examples

>>> df = pd.WeightedDataFrame({'num_legs': [2, 4, 8, 0],
...                    'num_wings': [2, 0, 0, 0],
...                    'num_specimen_seen': [10, 2, 1, 8]},
...                   index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8

Extract 3 random elements from the WeightedSeries df['num_legs'], using random_state to ensure reproducibility:

>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64

A random 50% sample of the WeightedDataFrame with replacement:

>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8

An upsampled version of the WeightedDataFrame with replacement (replace must be True when frac > 1):

>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2

Using a WeightedDataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.

>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
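
Nothing in the signature above states that the stored sample weights are used automatically, so when resampling according to them it is safest to pass them explicitly via get_weights() (documented under _WeightedObject below). A sketch:

>>> import numpy as np
>>> from anesthetic.weighted_pandas import WeightedSeries
>>> s = WeightedSeries(np.arange(4.), weights=[0.7, 0.1, 0.1, 0.1])
>>> # an equally-weighted resampling in which index 0 appears roughly 70% of the time
>>> s.sample(n=100, replace=True, weights=s.get_weights(), random_state=1)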
sem(skipna=True)[source]

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis{index (0)}

For WeightedSeries this parameter is unused and defaults to 0.

Warning

The behavior of WeightedDataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.sem().round(6)
0.57735

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.sem()
a   0.5
b   0.5
dtype: float64

Using axis=1

>>> df.sem(axis=1)
tiger   0.5
zebra   0.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a   0.5
dtype: float64
skew(skipna=True)[source]

Return unbiased skew over requested axis.

Normalized by N-1.

Parameters:
axis{index (0)}

Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.

For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.

New in version 2.0.0.

skipnabool, default True

Exclude NA/null values when computing the result.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

**kwargs

Additional keyword arguments to be passed to the function.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.skew()
0.0

With a WeightedDataFrame

>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
...                   index=['tiger', 'zebra', 'cow'])
>>> df
        a   b   c
tiger   1   2   1
zebra   2   3   3
cow     3   4   5
>>> df.skew()
a   0.0
b   0.0
c   0.0
dtype: float64

Using axis=1

>>> df.skew(axis=1)
tiger   1.732051
zebra  -1.732051
cow     0.000000
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
...                   index=['tiger', 'zebra', 'cow'])
>>> df.skew(numeric_only=True)
a   0.0
dtype: float64
std(*args, **kwargs)[source]

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis{index (0)}

For WeightedSeries this parameter is unused and defaults to 0.

Warning

The behavior of WeightedDataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).

Examples

>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01

The standard deviation of the columns can be found as follows:

>>> df.std()
age       18.786076
height     0.237417
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64
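
With weights attached, the ddof=0 case reduces to the square root of the weighted mean of squared deviations. A sketch under that standard definition:

>>> import numpy as np
>>> from anesthetic.weighted_pandas import WeightedSeries
>>> w = np.array([1., 2., 1.])
>>> x = np.array([1., 2., 3.])
>>> s = WeightedSeries(x, weights=w)
>>> mu = np.average(x, weights=w)
>>> np.sqrt(np.average((x - mu)**2, weights=w))  # population (ddof=0) form
>>> s.std(ddof=0)  # expected to agree with the line above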
var(skipna=True)[source]

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis{index (0)}

For WeightedSeries this parameter is unused and defaults to 0.

Warning

The behavior of WeightedDataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipnabool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddofint, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_onlybool, default False

Include only float, int, boolean columns. Not implemented for WeightedSeries.

Returns:
scalar or WeightedSeries (for WeightedDataFrame input)

Examples

>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
>>> df.var()
age       352.916667
height      0.056367
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.var(ddof=0)
age       264.687500
height      0.042275
dtype: float64
class anesthetic.weighted_pandas.WeightedSeriesGroupBy(*args, **kwargs)[source]

Weighted version of pandas.core.groupby.SeriesGroupBy.

cov(*args, **kwargs)[source]

Compute covariance with WeightedSeries, excluding missing values.

The two WeightedSeries objects are not required to be the same length and will be aligned internally before the covariance is calculated.

Parameters:
otherWeightedSeries

WeightedSeries with which to compute the covariance.

min_periodsint, optional

Minimum number of observations needed to have a valid result.

ddofint, default 1

Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns:
float

Covariance between WeightedSeries and other normalized by N-1 (unbiased estimator).

See also

WeightedDataFrame.cov

Compute pairwise covariance of columns.

Examples

>>> s1 = pd.WeightedSeries([0.90010907, 0.13484424, 0.62036035])
>>> s2 = pd.WeightedSeries([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2)
-0.01685762652715874
sample(*args, **kwargs)[source]

Return a random sample of items from each group.

You can use random_state for reproducibility.

Parameters:
nint, optional

Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.

fracfloat, optional

Fraction of items to return. Cannot be used with n.

replacebool, default False

Allow or disallow sampling of the same row more than once.

weightslist-like, optional

Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying WeightedDataFrame or WeightedSeries object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.

random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional

If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

Changed in version 1.4.0: np.random.Generator objects now accepted

Returns:
WeightedSeries or WeightedDataFrame

A new object of same type as caller containing items randomly sampled within each group from the caller object.

See also

WeightedDataFrame.sample

Generate random samples from a WeightedDataFrame object.

numpy.random.choice

Generate a random sample from a given 1-D numpy array.

Examples

>>> df = pd.WeightedDataFrame(
...     {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
... )
>>> df
       a  b
0    red  0
1    red  1
2   blue  2
3   blue  3
4  black  4
5  black  5

Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:

>>> df.groupby("a").sample(n=1, random_state=1)
       a  b
4  black  4
2   blue  2
1    red  1

Set frac to sample fixed proportions rather than counts:

>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2)
5    5
2    2
0    0
Name: b, dtype: int64

Control sample probabilities within groups by setting weights:

>>> df.groupby("a").sample(
...     n=1,
...     weights=[1, 1, 1, 0, 0, 1],
...     random_state=1,
... )
       a  b
5  black  5
2   blue  2
0    red  0
class anesthetic.weighted_pandas._WeightedObject(*args, **kwargs)[source]

Common methods for WeightedSeries and WeightedDataFrame.

drop_weights(axis=0)[source]

Drop weights.

get_weights(axis=0)[source]

Retrieve sample weights from an axis.

isweighted(axis=0)[source]

Determine if weights are actually present.

neff(axis=0, beta=1)[source]

Effective number of samples.
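
The docstring does not spell out the convention; a common choice, and an assumption here, is the Kish effective sample size for beta=1, n_eff = (sum w_i)^2 / sum w_i^2, with beta generalising the exponent on the weights:

>>> import numpy as np
>>> from anesthetic.weighted_pandas import WeightedSeries
>>> w = np.array([1., 1., 2.])
>>> s = WeightedSeries([0., 1., 2.], weights=w)
>>> w.sum()**2 / (w**2).sum()  # Kish estimate: 16/6, roughly 2.67
>>> s.neff()  # expected to be of this order under the beta=1 convention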

reset_index(level=None, drop=False, inplace=False, *args, **kwargs)[source]

Reset the index, retaining weights.

set_weights(weights, axis=0, inplace=False, level=None)[source]

Set sample weights along an axis.

Parameters:
weights1d array-like

The sample weights to put in an index.

axisint (0,1), default=0

Whether to put weights in an index or column.

inplacebool, default=False

Whether to operate inplace, or return a new array.

levelint

Which level in the index to insert before. Defaults to inserting at the back.
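
A short usage sketch tying these methods together (the WeightedDataFrame constructor is documented earlier in this module):

>>> from anesthetic.weighted_pandas import WeightedDataFrame
>>> df = WeightedDataFrame({'x': [1., 2., 3.]})
>>> df = df.set_weights([0.2, 0.3, 0.5])  # inplace=False, so a new object is returned
>>> df.isweighted()   # now True
>>> df.get_weights()  # the weights just set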


anesthetic.weighted_pandas.read_csv(filename, *args, **kwargs)[source]

Read a CSV file into a WeightedDataFrame.
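
A minimal usage sketch; 'chains.csv' is a hypothetical filename, and the round trip assumes a file written from a WeightedDataFrame so that the weights can be recovered:

>>> from anesthetic.weighted_pandas import read_csv
>>> df = read_csv("chains.csv")  # hypothetical file
>>> df.isweighted()              # check whether weights were recovered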