anesthetic package
Anesthetic: nested sampling post-processing.
anesthetic subpackages
anesthetic modules
anesthetic.boundary module
Boundary correction utilities.
- anesthetic.boundary.cut_and_normalise_gaussian(x, p, bw, xmin=None, xmax=None)[source]
Cut and normalise boundary correction for a Gaussian kernel.
- Parameters:
- x : array-like
locations for normalisation correction
- p : array-like
probability densities for normalisation correction
- bw : float
bandwidth of KDE
- xmin, xmax : float, optional, default=None
lower/upper prior bound
- Returns:
- p : np.array
corrected probabilities
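A minimal sketch of applying this correction to a naive Gaussian KDE near a hard lower bound (the data, grid, and bandwidth choice below are illustrative assumptions, not part of anesthetic):
>>> import numpy as np
>>> from scipy.stats import gaussian_kde
>>> from anesthetic.boundary import cut_and_normalise_gaussian
>>> data = np.abs(np.random.randn(1000))           # samples with a hard boundary at zero
>>> kde = gaussian_kde(data)
>>> x = np.linspace(0, 4, 200)                     # evaluation grid
>>> p = kde(x)                                     # uncorrected probability densities
>>> bw = np.sqrt(kde.covariance[0, 0])             # one reasonable bandwidth estimate
>>> p = cut_and_normalise_gaussian(x, p, bw, xmin=0)   # boundary-corrected densities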
anesthetic.convert module
Tools for converting to other outputs.
- anesthetic.convert.to_getdist(samples)[source]
Convert from anesthetic to getdist samples.
- Parameters:
- samples : anesthetic.samples.Samples
anesthetic samples to be converted
- Returns:
- getdist_samples : getdist.mcsamples.MCSamples
getdist equivalent samples
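A minimal conversion sketch, assuming getdist is installed and using a toy Samples object built from random draws (column names are illustrative):
>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.convert import to_getdist
>>> samples = Samples(np.random.randn(1000, 2), columns=['x0', 'x1'])
>>> getdist_samples = to_getdist(samples)          # getdist.mcsamples.MCSamples
>>> means = getdist_samples.getMeans()             # continue with getdist as usual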
anesthetic.kde module
Kernel density estimation tools.
These act as a wrapper around fastKDE, but could be replaced in future by alternative kernel density estimators
- anesthetic.kde.fastkde_1d(d, xmin=None, xmax=None)[source]
Perform a one-dimensional kernel density estimation.
Wrapper around fastkde.fastKDE. Boundary corrections implemented by reflecting boundary conditions.
- Parameters:
- d : np.array
Data to perform kde on
- xmin, xmax : float, optional, default=None
lower/upper prior bounds
- Returns:
- x : np.array
x-coordinates of kernel density estimates
- p : np.array
kernel density estimates
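A minimal sketch, assuming the optional fastkde dependency is installed (the data are illustrative):
>>> import numpy as np
>>> from anesthetic.kde import fastkde_1d
>>> data = np.abs(np.random.randn(10000))          # samples bounded below at zero
>>> x, p = fastkde_1d(data, xmin=0)                # boundary-corrected density estimate on a grid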
- anesthetic.kde.fastkde_2d(d_x, d_y, xmin=None, xmax=None, ymin=None, ymax=None)[source]
Perform a two-dimensional kernel density estimation.
Wrapper around fastkde.fastKDE. Boundary corrections implemented by reflecting boundary conditions.
- Parameters:
- d_x, d_y : np.array
x/y coordinates of data to perform kde on
- xmin, xmax, ymin, ymax : float, optional, default=None
lower/upper prior bounds in x/y coordinates
- Returns:
- x, y : np.array
x/y-coordinates of kernel density estimates. One-dimensional arrays.
- p : np.array
kernel density estimates. Two-dimensional array.
anesthetic.labelled_pandas module
Pandas DataFrame and Series with labelled columns.
- class anesthetic.labelled_pandas.LabelledDataFrame(*args, **kwargs)[source]
Bases: _LabelledObject, DataFrame
Labelled version of pandas.DataFrame.
- property T
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().
- Parameters:
- *args : tuple, optional
Accepted for compatibility with NumPy.
- copy : bool, default False
Whether to copy the data after transposing, even for DataFrames with a single dtype.
Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write
pd.options.mode.copy_on_write = True
- Returns:
- DataFrame
The transposed DataFrame.
See also
numpy.transpose
Permute the dimensions of a given array.
Notes
Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.
Examples
Square DataFrame with homogeneous dtype
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4
When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:
>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object
Non-square DataFrame with mixed dtypes
>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:
>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
- transpose(copy=False)[source]
Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().
- Parameters:
- *args : tuple, optional
Accepted for compatibility with NumPy.
- copy : bool, default False
Whether to copy the data after transposing, even for DataFrames with a single dtype.
Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.
Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write
pd.options.mode.copy_on_write = True
- Returns:
- DataFrame
The transposed DataFrame.
See also
numpy.transpose
Permute the dimensions of a given array.
Notes
Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.
Examples
Square DataFrame with homogeneous dtype
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4
>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4
When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:
>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object
Non-square DataFrame with mixed dtypes
>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:
>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object
- class anesthetic.labelled_pandas.LabelledSeries(*args, **kwargs)[source]
Bases: _LabelledObject, Series
Labelled version of pandas.Series.
- class anesthetic.labelled_pandas._LabelledObject(*args, **kwargs)[source]
Bases: object
Common methods for LabelledSeries and LabelledDataFrame.
- property at
- property loc
- anesthetic.labelled_pandas.ac(funcs, *args)[source]
Accessor function helper.
Given a list of callables funcs, and their arguments *args, evaluate each of these, catching exceptions, and then sort results by their dimensionality, smallest first. Return the non-exceptional result with the smallest dimensionality.
anesthetic.plot module
Lower-level plotting tools.
Routines for users wishing for more fine-grained control, e.g. make_1d_axes() and make_2d_axes() to create a set of axes and legend proxies.
- class anesthetic.plot.AxesDataFrame(data=None, index=None, columns=None, fig=None, lower=True, diagonal=True, upper=True, labels=None, ticks='inner', logx=None, logy=None, gridspec_kw=None, subplot_spec=None, *args, **kwargs)[source]
Bases: DataFrame
Anesthetic's axes version of pandas.DataFrame.
- Parameters:
- index : list(str)
Parameters to be placed on the y-axes.
- columns : list(str)
Parameters to be placed on the x-axes.
- fig : matplotlib.figure.Figure
- lower, diagonal, upper : bool, default=True
Whether to create 2D marginalised plots above or below the diagonal, or to create a 1D marginalised plot on the diagonal.
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- ticks : str, default='inner'
If 'outer', plot ticks only on the very left and very bottom. If 'inner', plot ticks also in inner subplots. If None, plot no ticks at all.
- logx, logy : list(str), optional
Lists of parameters to be plotted on a log scale on the x-axis or y-axis, respectively.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
Methods
axlines:
Add vertical and horizontal lines across all axes.
axspans:
Add vertical and horizontal spans across all axes.
scatter:
Add scatter points across all axes.
set_labels:
Set the labels for the axes.
set_margins:
Set margins across all axes.
tick_params:
Set tick parameters across all axes.
- axlines(params, lower=True, diagonal=True, upper=True, **kwargs)[source]
Add vertical and horizontal lines across all axes.
- Parameters:
- params : dict(array_like)
Dictionary of parameter labels and desired values. Can provide more than one value per label.
- lower, diagonal, upper : bool, default=True
Whether to plot the lines on the lower, diagonal, and/or upper triangle plots.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.axvline() or matplotlib.axes.Axes.axhline().
- axspans(params, lower=True, diagonal=True, upper=True, **kwargs)[source]
Add vertical and horizontal spans across all axes.
- Parameters:
- params : dict(array_like(2-tuple))
Dictionary of parameter labels and desired value tuples. Can provide more than one value tuple per label. Each value tuple provides the min and max value for an axis span.
- lower, diagonal, upper : bool, default=True
Whether to plot the spans on the lower, diagonal, and/or upper triangle plots.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.axvspan() or matplotlib.axes.Axes.axhspan().
- scatter(params, lower=True, upper=True, **kwargs)[source]
Add scatter points across all axes.
- Parameters:
- params : dict(array_like)
Dictionary of parameter labels and desired values. Can provide more than one value per label, but length has to match for all parameter labels.
- lower, upper : bool, default=True
Whether to plot the scatter points on the lower and/or upper triangle plots.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.scatter().
- set_labels(labels, **kwargs)[source]
Set the labels for the axes.
- Parameters:
- labels : dict
Dictionary of the axes labels.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.set_xlabel() or matplotlib.axes.Axes.set_ylabel().
- set_margins(m)[source]
Apply matplotlib.axes.Axes.set_xmargin() across all axes.
- tick_params(*args, **kwargs)[source]
Apply matplotlib.axes.Axes.tick_params() across all axes.
- class anesthetic.plot.AxesSeries(data=None, index=None, fig=None, ncol=None, labels=None, logx=None, gridspec_kw=None, subplot_spec=None, *args, **kwargs)[source]
Bases: Series
Anesthetic's axes version of pandas.Series.
- Parameters:
- index : list(str)
Parameters to be placed on the y-axes.
- fig : matplotlib.figure.Figure
- ncol : int
Number of axes columns. Decides after how many axes the AxesSeries is split to continue in a new row.
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- logx : list(str), optional
List of parameters to be plotted on a log scale.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
Methods
set_xlabels:
Set the labels for the x-axes.
tick_params:
Set tick parameters across all axes.
- static axes_series(index, fig, ncol=None, gridspec_kw=None, subplot_spec=None)[source]
Set up subplots for AxesSeries.
- set_xlabels(labels, **kwargs)[source]
Set the labels for the x-axes.
- Parameters:
- labels : dict
Dictionary of the axes labels.
- kwargs
Any kwarg that can be passed to matplotlib.axes.Axes.set_xlabel().
- tick_params(*args, **kwargs)[source]
Apply matplotlib.axes.Axes.tick_params() across all axes.
- anesthetic.plot.fastkde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot a 2d marginalised distribution as contours.
This functions as a wrapper around matplotlib.axes.Axes.contour() and matplotlib.axes.Axes.contourf() with a kernel density estimation (KDE) computation in-between. All remaining keyword arguments are passed onwards to both functions.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
The x and y coordinates of uniformly weighted samples to generate kernel density estimator.
- levels : list
Amount of mass within each iso-probability contour. Has to be ordered from outermost to innermost contour. Default: [0.95, 0.68]
- xmin, xmax, ymin, ymax : float, default=None
The lower/upper prior bounds in x/y coordinates.
- Returns:
- c : matplotlib.contour.QuadContourSet
A set of contour lines or filled regions.
- anesthetic.plot.fastkde_plot_1d(ax, data, *args, **kwargs)[source]
Plot a 1d marginalised distribution.
This functions as a wrapper around matplotlib.axes.Axes.plot(), with a kernel density estimation (KDE) computation provided by the package fastkde in-between. All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data : np.array
Uniformly weighted samples to generate kernel density estimator.
- xmin, xmax : float, default=None
lower/upper prior bound
- levels : list, optional
Values at which to draw iso-probability lines. Default: [0.95, 0.68]
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- facecolor : bool or string, default=False
If set to True then the 1d plot will be shaded with the value of the color kwarg. Set to a string such as 'blue', 'k', 'r', 'C1' etc. to define the color of the shading directly.
- Returns:
- lines : matplotlib.lines.Line2D
A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).
- anesthetic.plot.hist_plot_1d(ax, data, *args, **kwargs)[source]
Plot a 1d histogram.
This function is a wrapper around matplotlib.axes.Axes.hist(). All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data : np.array
Samples to generate histogram from
- weights : np.array, optional
Sample weights.
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- Returns:
- patches : list or list of lists
Silent list of individual patches used to create the histogram, or list of such lists if multiple input datasets.
- Other Parameters:
- **kwargs
matplotlib.axes.Axes.hist() properties
- anesthetic.plot.hist_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot a 2d marginalised distribution as a histogram.
This functions as a wrapper around matplotlib.axes.Axes.hist2d().
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
The x and y coordinates of uniformly weighted samples to generate a two-dimensional histogram.
- levels : list, default=None
Shade iso-probability contours containing these levels of probability mass. If None defaults to usual matplotlib.axes.Axes.hist2d() colouring.
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- Returns:
- c : matplotlib.collections.QuadMesh
A set of colors.
- anesthetic.plot.kde_contour_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot a 2d marginalised distribution as contours.
This functions as a wrapper around matplotlib.axes.Axes.contour() and matplotlib.axes.Axes.contourf() with a kernel density estimation (KDE) computation provided by scipy.stats.gaussian_kde in-between. All remaining keyword arguments are passed onwards to both functions.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
The x and y coordinates of uniformly weighted samples to generate kernel density estimator.
- weights : np.array, optional
Sample weights.
- levels : list, optional
Amount of mass within each iso-probability contour. Has to be ordered from outermost to innermost contour. Default: [0.95, 0.68]
- ncompress : int, str, default='equal'
Degree of compression.
If int: desired number of samples after compression.
If False: no compression.
If True: compresses to the channel capacity, equivalent to ncompress='entropy'.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
- nplot_2d : int, default=1000
Number of plotting points to use.
- bw_method : str, scalar or callable, optional
Forwarded to scipy.stats.gaussian_kde.
- Returns:
- c : matplotlib.contour.QuadContourSet
A set of contour lines or filled regions.
- anesthetic.plot.kde_plot_1d(ax, data, *args, **kwargs)[source]
Plot a 1d marginalised distribution.
This functions as a wrapper around matplotlib.axes.Axes.plot(), with a kernel density estimation computation provided by scipy.stats.gaussian_kde in-between. All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data : np.array
Samples to generate kernel density estimator.
- weights : np.array, optional
Sample weights.
- ncompress : int, str, default=False
Degree of compression.
If False: no compression.
If True: compresses to the channel capacity, equivalent to ncompress='entropy'.
If int: desired number of samples after compression.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
- nplot_1d : int, default=100
Number of plotting points to use.
- levels : list
Values at which to draw iso-probability lines. Default: [0.95, 0.68]
- q : int or float or tuple, default=5
Quantile to determine the data range to be plotted.
0: full data range, i.e. q=0 –> quantile range (0, 1)
int: q-sigma range, e.g. q=1 –> quantile range (0.16, 0.84)
float: percentile, e.g. q=0.8 –> quantile range (0.1, 0.9)
tuple: quantile range, e.g. (0.16, 0.84)
- facecolor : bool or string, default=False
If set to True then the 1d plot will be shaded with the value of the color kwarg. Set to a string such as 'blue', 'k', 'r', 'C1' etc. to define the color of the shading directly.
- bw_method : str, scalar or callable, optional
Forwarded to scipy.stats.gaussian_kde.
- beta : int, float, default=1
The value of beta used to calculate the number of effective samples
- Returns:
- lines : matplotlib.lines.Line2D
A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).
- anesthetic.plot.make_1d_axes(params, ncol=None, labels=None, logx=None, gridspec_kw=None, subplot_spec=None, **fig_kw)[source]
Create a set of axes for plotting 1D marginalised posteriors.
- Parameters:
- params : list(str)
Names of parameters.
- ncol : int
Number of columns of the subplot grid. Default: ceil(sqrt(num_params))
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- logx : list(str), optional
List of parameters to be plotted on a log scale.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
- **fig_kw
All additional keyword arguments are passed to the matplotlib.pyplot.figure() call. Or directly pass the figure to plot on via the keyword 'fig'.
- Returns:
- fig : matplotlib.figure.Figure
New or original (if supplied) figure object.
- axes : anesthetic.plot.AxesSeries
Pandas array of axes objects.
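A minimal usage sketch (the parameter names and data are illustrative):
>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.plot import make_1d_axes
>>> params = ['x0', 'x1', 'x2']
>>> fig, axes = make_1d_axes(params, ncol=3)
>>> samples = Samples(np.random.randn(1000, 3), columns=params)
>>> axes = samples.plot_1d(axes)                   # fill the pre-made axes
>>> fig.savefig('posterior_1d.png')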
- anesthetic.plot.make_2d_axes(params, labels=None, lower=True, diagonal=True, upper=True, ticks='inner', logx=None, logy=None, gridspec_kw=None, subplot_spec=None, **fig_kw)[source]
Create a set of axes for plotting 2D marginalised posteriors.
- Parameters:
- params : lists of parameters
Can be either:
list(str) if the x and y axes are the same
[list(str), list(str)] if the x and y axes are different
Strings indicate the names of the parameters.
- labels : dict(str:str), optional
Dictionary mapping params to plot labels. Default: params
- lower, diagonal, upper : logical, default=True
Whether to create 2D marginalised plots above or below the diagonal, or to create a 1D marginalised plot on the diagonal.
- ticks : str, default='inner'
Can be one of 'outer', 'inner', or None.
'outer': plot ticks only on the very left and very bottom.
'inner': plot ticks also in inner subplots.
None: plot no ticks at all.
- logx, logy : list(str), optional
Lists of parameters to be plotted on a log scale on the x-axis or y-axis, respectively.
- gridspec_kw : dict, optional
Dict with keywords passed to the matplotlib.gridspec.GridSpec constructor used to create the grid the subplots are placed on.
- subplot_spec : matplotlib.gridspec.GridSpec, default=None
GridSpec instance to plot array as part of a subfigure.
- **fig_kw
All additional keyword arguments are passed to the matplotlib.pyplot.figure() call. Or directly pass the figure to plot on via the keyword 'fig'.
- Returns:
- fig : matplotlib.figure.Figure
New or original (if supplied) figure object.
- axes : anesthetic.plot.AxesDataFrame
Pandas array of axes objects.
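A minimal corner-plot sketch on pre-made axes (the parameter names and data are illustrative):
>>> import numpy as np
>>> from anesthetic.samples import Samples
>>> from anesthetic.plot import make_2d_axes
>>> params = ['x0', 'x1', 'x2']
>>> fig, axes = make_2d_axes(params, upper=False)  # lower triangle and diagonal only
>>> samples = Samples(np.random.randn(1000, 3), columns=params)
>>> axes = samples.plot_2d(axes)
>>> axes.axlines({'x0': 0.0}, c='k', ls='--')      # mark a reference value on every panel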
- anesthetic.plot.normalize_kwargs(kwargs, alias_mapping=None, drop=None)[source]
Normalize kwarg inputs.
Works the same way as matplotlib.cbook.normalize_kwargs(), but additionally allows one to drop kwargs.
- anesthetic.plot.quantile_plot_interval(q)[source]
Interpret quantile q input to quantile plot range tuple.
- anesthetic.plot.scatter_plot_2d(ax, data_x, data_y, *args, **kwargs)[source]
Plot samples from a 2d marginalised distribution.
This functions as a wrapper around matplotlib.axes.Axes.plot(), enforcing any prior bounds. All remaining keyword arguments are passed onwards.
- Parameters:
- ax : matplotlib.axes.Axes
Axis object to plot on.
- data_x, data_y : np.array
x and y coordinates of uniformly weighted samples to plot.
- ncompress : int, str, default='equal'
Degree of compression.
If int: desired number of samples after compression.
If False: no compression.
If True: compresses to the channel capacity, equivalent to ncompress='entropy'.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
- Returns:
- lines : matplotlib.lines.Line2D
A list of line objects representing the plotted data (same as matplotlib.axes.Axes.plot() command).
anesthetic.samples module
Main classes for the anesthetic module.
- class anesthetic.samples.MCMCSamples(*args, **kwargs)[source]
Storage and plotting tools for MCMC samples.
Any new functionality specific to MCMC (e.g. convergence criteria etc.) should be put here.
- Parameters:
- data : np.array
Coordinates of samples. shape = (nsamples, ndims).
- columns : array-like
reference names of parameters
- weights : np.array
weights of samples.
- logL : np.array
loglikelihoods of samples.
- labels : dict or array-like
mapping from columns to plotting labels
- label : str
Legend label
- logzero : float, default=-1e30
The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.
- Gelman_Rubin(params=None, per_param=False)[source]
Gelman–Rubin convergence statistic of multiple MCMC chains.
Determine the Gelman–Rubin convergence statistic R-1 by computing and comparing the within-chain variance and the between-chain variance. This follows the routine as outlined in Lewis (2013), section IV.A.
Note that this requires more than one chain. To circumvent this, you could overwrite the 'chain' column, splitting the samples into two or more sets.
- Parameters:
- params : list(str)
List of column names (i.e. parameters) to be included in the convergence calculation. Default: all parameters (except those parameters that contain 'prior', 'chi2', or 'logL' in their names)
- per_param : bool or str, default=False
Whether to return the per-parameter convergence statistic R-1.
If False: returns only the total convergence statistic.
If True: returns the total convergence statistic and the per-parameter convergence statistic.
If 'par': returns only the per-parameter convergence statistic.
If 'cov': returns only the per-parameter covariant convergence statistic.
If 'all': returns the total convergence statistic and the per-parameter covariant convergence statistic.
- Returns:
- Rminus1 : float
Total Gelman–Rubin convergence statistic R-1. The smaller, the better converged. Aiming for Rminus1~0.01 should normally work well.
- Rminus1_par : pandas.DataFrame
Per-parameter Gelman–Rubin convergence statistic.
- Rminus1_cov : pandas.DataFrame
Per-parameter covariant Gelman–Rubin convergence statistic.
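A minimal sketch of the two-chain workaround described above, using toy MCMC samples (the data and the split into two pseudo-chains are purely illustrative):
>>> import numpy as np
>>> from anesthetic.samples import MCMCSamples
>>> samples = MCMCSamples(np.random.randn(2000, 2), columns=['x0', 'x1'])
>>> samples['chain'] = np.where(np.arange(len(samples)) < 1000, 1, 2)   # two pseudo-chains
>>> Rminus1 = samples.Gelman_Rubin(params=['x0', 'x1'])
>>> Rminus1, Rminus1_par = samples.Gelman_Rubin(params=['x0', 'x1'], per_param=True)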
- remove_burn_in(burn_in, reset_index=False, inplace=False)[source]
Remove burn-in samples from each MCMC chain.
- Parameters:
- burn_in : int or float or array_like
Fraction or number of samples to remove or keep:
if 0 < burn_in < 1: remove first fraction of samples
elif 1 < burn_in: remove first number of samples
elif -1 < burn_in < 0: keep last fraction of samples
elif burn_in < -1: keep last number of samples
elif type(burn_in)==list: different burn-in for each chain
- reset_index : bool, default=False
Whether to reset the index counter to start at zero or not.
- inplace : bool, default=False
Indicates whether to modify the existing array or return a copy.
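A short sketch of the burn-in conventions listed above, assuming an MCMCSamples object with a 'chain' column as in the previous sketch:
>>> trimmed = samples.remove_burn_in(0.3)          # drop the first 30% of each chain
>>> trimmed = samples.remove_burn_in(500)          # drop the first 500 samples of each chain
>>> trimmed = samples.remove_burn_in(-0.5)         # keep only the last 50% of each chain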
- class anesthetic.samples.NestedSamples(*args, **kwargs)[source]
Storage and plotting tools for Nested Sampling samples.
We extend the Samples class with the additional methods:
self.live_points(logL)
self.set_beta(beta)
self.prior()
self.posterior_points(beta)
self.prior_points()
self.stats()
self.logZ()
self.D_KL()
self.d()
self.recompute()
self.gui()
self.importance_sample()
- Parameters:
- data : np.array
Coordinates of samples. shape = (nsamples, ndims).
- columns : list(str)
reference names of parameters
- logL : np.array
loglikelihoods of samples.
- logL_birth : np.array or int
birth loglikelihoods, or number of live points.
- labels : dict
optional mapping from column names to plot labels
- label : str
Legend label. default: basename of root
- beta : float
thermodynamic inverse temperature. default: 1.
- logzero : float
The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf. default: -1e30
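A minimal sketch of working with a NestedSamples object, assumed to have been constructed from nested sampling output (e.g. via data, logL and logL_birth; the variable name is illustrative):
>>> stats = nested_samples.stats(nsamples=1000)    # 1000 posterior draws of the summary statistics
>>> logZ_mean, logZ_err = stats['logZ'].mean(), stats['logZ'].std()
>>> prior = nested_samples.set_beta(0)             # beta=0 recovers the prior distribution
>>> logZ_samples = nested_samples.logZ(1000)       # evidence samples alone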
- D_KL(nsamples=None, beta=None)[source]
Kullback–Leibler divergence.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- property beta
Thermodynamic inverse temperature.
- contour(logL=None)[source]
Convert contour from (index or None) to a float loglikelihood.
Convention is that live points are inclusive of the contour.
- Helper function for:
NestedSamples.live_points,
NestedSamples.dead_points,
NestedSamples.truncate.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number. If not provided, return the contour containing the last set of live points.
- Returns:
- logL : float
Loglikelihood of contour
- d_G(nsamples=None, beta=None)[source]
Bayesian model dimensionality.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- dead_points(logL=None)[source]
Get the dead points at a given contour.
Convention is that dead points are exclusive of the contour.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number to return dead points. If not provided, return the last set of dead points.
- Returns:
- dead_points : Samples
- Dead points at either:
contour logL (if input is float)
ith iteration (if input is integer)
last set of dead points if no argument provided
- importance_sample(logL_new, action='add', inplace=False)[source]
Perform importance re-weighting on the log-likelihood.
- Parameters:
- logL_new : np.array
New log-likelihood values. Should have the same shape as logL.
- action : str, default='add'
Can be any of {'add', 'replace', 'mask'}.
add: Add the new logL_new to the current logL.
replace: Replace the current logL with the new logL_new.
mask: treat logL_new as a boolean mask and only keep the corresponding (True) samples.
- inplace : bool, optional
Indicates whether to modify the existing array, or return a new frame with importance sampling applied. default: False
- Returns:
- samples : NestedSamples
Importance re-weighted samples.
- live_points(logL=None)[source]
Get the live points within a contour.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number to return live points. If not provided, return the last set of active live points.
- Returns:
- live_points : Samples
- Live points at either:
contour logL (if input is float)
ith iteration (if input is integer)
last set of live points if no argument provided
- logL_P(nsamples=None, beta=None)[source]
Posterior averaged loglikelihood.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- logX(nsamples=None)[source]
Log-Volume.
The log of the prior volume contained within each iso-likelihood contour.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
- Returns:
- if nsamples is None:
WeightedSeries like self
- elif nsamples is int:
WeightedDataFrame like self, columns range(nsamples)
- logZ(nsamples=None, beta=None)[source]
Log-Evidence.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like: pandas.Series, index nsamples.columns
- elif beta is scalar and nsamples is None: float
- elif beta is array-like and nsamples is None: pandas.Series, index beta
- elif beta is scalar and nsamples is int: pandas.Series, index range(nsamples)
- elif beta is array-like and nsamples is int: pandas.Series, pandas.MultiIndex columns the product of beta and range(nsamples)
- logdX(nsamples=None)[source]
Compute volume of shell of loglikelihood.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
- Returns:
- if nsamples is None:
WeightedSeries like self
- elif nsamples is int:
WeightedDataFrame like self, columns range(nsamples)
- logw(nsamples=None, beta=None)[source]
Log-nested sampling weight.
The logarithm of the (unnormalised) sampling weight log(L**beta*dX).
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
If nsamples is array, nsamples is assumed to be logw and returned (implementation convenience functionality)
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if nsamples is array-like:
WeightedDataFrame equal to nsamples
- elif beta is scalar and nsamples is None:
WeightedSeries like self
- elif beta is array-like and nsamples is None:
WeightedDataFrame like self, columns of beta
- elif beta is scalar and nsamples is int:
WeightedDataFrame like self, columns of range(nsamples)
- elif beta is array-like and nsamples is int:
WeightedDataFrame like self, MultiIndex columns the product of beta and range(nsamples)
- recompute(logL_birth=None, inplace=False)[source]
Re-calculate the nested sampling contours and live points.
- Parameters:
- logL_birth : array-like or int, optional
array-like: the birth contours.
int: the number of live points.
default: use the existing birth contours to compute nlive
- inplace : bool, default=False
Indicates whether to modify the existing array, or return a new frame with contours resorted and nlive recomputed
- set_beta(beta, inplace=False)[source]
Change the inverse temperature.
- Parameters:
- beta : float
Inverse temperature to set. (beta=0 corresponds to the prior distribution.)
- inplace : bool, default=False
Indicates whether to modify the existing array, or return a copy with the inverse temperature changed.
- stats(nsamples=None, beta=None)[source]
Compute Nested Sampling statistics.
Using nested sampling we can compute:
- logZ: Bayesian evidence
\[\log Z = \int L \pi d\theta\]
- D_KL: Kullback–Leibler divergence
\[D_{KL} = \int P \log(P / \pi) d\theta\]
- logL_P: posterior averaged log-likelihood
\[\langle\log L\rangle_P = \int P \log L d\theta\]
- d_G: Gaussian model dimensionality (or posterior variance of the log-likelihood)
\[d_G/2 = \langle(\log L)^2\rangle_P - \langle\log L\rangle_P^2\]
see Handley and Lemos (2019) for more details on model dimensionalities.
(Note that all of these are available as individual functions with the same signature.)
In addition to point estimates nested sampling provides an error bar or more generally samples from a (correlated) distribution over the variables. Samples from this distribution can be computed by providing an integer nsamples.
Nested sampling as an athermal algorithm is also capable of producing these as a function of inverse thermodynamic temperature beta. This is provided as a vectorised function. If nsamples is also provided a MultiIndex dataframe is generated.
These obey Occam's razor equation:
\[\log Z = \langle\log L\rangle_P - D_{KL},\]
which splits a model's quality logZ into a goodness-of-fit logL_P and a complexity penalty D_KL. See Hergt et al. (2021) for more detail.
- Parameters:
- nsamples : int, optional
If nsamples is not supplied, calculate mean value
If nsamples is integer, draw nsamples from the distribution of values inferred by nested sampling
- beta : float, array-like, optional
inverse temperature(s) beta=1/kT. Default self.beta
- Returns:
- if beta is scalar and nsamples is None:
Series, index ['logZ', 'd_G', 'D_KL', 'logL_P']
- elif beta is scalar and nsamples is int:
Samples, index range(nsamples), columns ['logZ', 'd_G', 'D_KL', 'logL_P']
- elif beta is array-like and nsamples is None:
Samples, index beta, columns ['logZ', 'd_G', 'D_KL', 'logL_P']
- elif beta is array-like and nsamples is int:
Samples, index pandas.MultiIndex the product of beta and range(nsamples), columns ['logZ', 'd_G', 'D_KL', 'logL_P']
- truncate(logL=None)[source]
Truncate the run at a given contour.
Returns the union of the live_points and dead_points.
- Parameters:
- logL : float or int, optional
Loglikelihood or iteration number to truncate run. If not provided, truncate at the last set of dead points.
- Returns:
- truncated_run : NestedSamples
- Run truncated at either:
contour logL (if input is float)
ith iteration (if input is integer)
last set of dead points if no argument provided
- class anesthetic.samples.Samples(*args, **kwargs)[source]
Storage and plotting tools for general samples.
Extends the pandas.DataFrame by providing plotting methods and standardising sample storage.
- Example plotting commands include:
samples.plot_1d(['paramA', 'paramB'])
samples.plot_2d(['paramA', 'paramB'])
samples.plot_2d([['paramA', 'paramB'], ['paramC', 'paramD']])
- Parameters:
- data : np.array
Coordinates of samples. shape = (nsamples, ndims).
- columns : list(str)
reference names of parameters
- weights : np.array
weights of samples.
- logL : np.array
loglikelihoods of samples.
- labels : dict or array-like
mapping from columns to plotting labels
- label : str
Legend label
- logzero : float, default=-1e30
The threshold for log(0) values assigned to rejected sample points. Anything equal or below this value is set to -np.inf.
- importance_sample(logL_new, action='add', inplace=False)[source]
Perform importance re-weighting on the log-likelihood.
- Parameters:
- logL_new : np.array
New log-likelihood values. Should have the same shape as logL.
- action : str, default='add'
Can be any of {'add', 'replace', 'mask'}.
add: Add the new logL_new to the current logL.
replace: Replace the current logL with the new logL_new.
mask: treat logL_new as a boolean mask and only keep the corresponding (True) samples.
- inplace : bool, default=False
Indicates whether to modify the existing array, or return a new frame with importance sampling applied.
- Returns:
- samples : Samples/MCMCSamples/NestedSamples
Importance re-weighted samples.
- plot_1d(axes=None, *args, **kwargs)[source]
Create an array of 1D plots.
- Parameters:
- axes : plotting axes, optional
Can be:
list(str) or str
If a pandas.Series is provided as an existing set of axes, then this is used for creating the plot. Otherwise, a new set of axes is created using the list or lists of strings.
If not provided, then all parameters are plotted. This is intended for plotting a sliced array (e.g. samples[['x0', 'x1']].plot_1d()).
- kind : str, default='kde_1d'
What kind of plots to produce. Alongside the usual pandas options {'hist', 'box', 'kde', 'density'}, anesthetic also provides
'hist_1d': anesthetic.plot.hist_plot_1d()
'kde_1d': anesthetic.plot.kde_plot_1d()
'fastkde_1d': anesthetic.plot.fastkde_plot_1d()
Warning – while the other pandas plotting options {'line', 'bar', 'barh', 'area', 'pie'} are also accessible, these can be hard to interpret/expensive for Samples, MCMCSamples, or NestedSamples.
- logx : list(str), optional
Which parameters/columns to plot on a log scale. Needs to match if plotting on top of a pre-existing axes.
- label : str, optional
Legend label added to each axis.
- Returns:
- axes : pandas.Series of matplotlib.axes.Axes
Pandas array of axes objects
- plot_2d(axes=None, *args, **kwargs)[source]
Create an array of 2D plots.
To avoid interfering with y-axis sharing, one-dimensional plots are created on a separate axis, which is monkey-patched onto the argument ax as the attribute ax.twin.
- Parameters:
- axes : plotting axes, optional
- Can be:
list(str) if the x and y axes are the same
[list(str), list(str)] if the x and y axes are different
If a pandas.DataFrame is provided as an existing set of axes, then this is used for creating the plot. Otherwise, a new set of axes is created using the list or lists of strings.
If not provided, then all parameters are plotted. This is intended for plotting a sliced array (e.g. samples[['x0', 'x1']].plot_2d()). It is not advisable to plot an entire frame, as it is computationally expensive, and liable to run into linear algebra errors for degenerate derived parameters.
- kind/kinds : dict, optional
What kinds of plots to produce. Dictionary takes the keys ‘diagonal’ for the 1D plots and ‘lower’ and ‘upper’ for the 2D plots. The options for ‘diagonal’ are:
‘kde_1d’:
anesthetic.plot.kde_plot_1d()
‘hist_1d’:
anesthetic.plot.hist_plot_1d()
‘fastkde_1d’:
anesthetic.plot.fastkde_plot_1d()
‘kde’:
pandas.Series.plot.kde()
‘hist’:
pandas.Series.plot.hist()
‘box’:
pandas.Series.plot.box()
‘density’:
pandas.Series.plot.density()
The options for ‘lower’ and ‘upper’ are:
‘kde_2d’:
anesthetic.plot.kde_contour_plot_2d()
‘hist_2d’:
anesthetic.plot.hist_plot_2d()
‘scatter_2d’:
anesthetic.plot.scatter_plot_2d()
‘fastkde_2d’:
anesthetic.plot.fastkde_contour_plot_2d()
‘kde’:
pandas.DataFrame.plot.kde()
‘scatter’:
pandas.DataFrame.plot.scatter()
‘hexbin’:
pandas.DataFrame.plot.hexbin()
There are also a set of shortcuts provided in plot_2d_default_kinds:
'kde_1d': 1d kde plots down the diagonal
‘kde_2d’: 2d kde plots in lower triangle
‘kde’: 1d & 2d kde plots in lower & diagonal
‘hist_1d’: 1d histograms down the diagonal
‘hist_2d’: 2d histograms in lower triangle
‘hist’: 1d & 2d histograms in lower & diagonal
‘scatter_2d’: 2d scatter in lower triangle
- ‘scatter’: 1d histograms down diagonal
& 2d scatter in lower triangle
Feel free to add your own to this list! Default: {‘diagonal’: ‘kde_1d’, ‘lower’: ‘kde_2d’, ‘upper’:’scatter_2d’}
- diagonal_kwargs, lower_kwargs, upper_kwargs : dict, optional
kwargs for the diagonal (1D)/lower or upper (2D) plots. This is useful when there is a conflict of kwargs for different kinds of plots. Note that any kwargs directly passed to plot_2d will overwrite any kwarg with the same key passed to *_kwargs. Default: {}
- logx, logy : list(str), optional
Which parameters/columns to plot on a log scale for the x-axis and y-axis, respectively. Needs to match if plotting on top of a pre-existing axes.
- label : str, optional
Legend label added to each axis.
- Returns:
- axes : pandas.DataFrame of matplotlib.axes.Axes
Pandas array of axes objects
- plot_2d_default_kinds = {'default': {'diagonal': 'kde_1d', 'lower': 'kde_2d', 'upper': 'scatter_2d'}, 'fastkde': {'diagonal': 'fastkde_1d', 'lower': 'fastkde_2d'}, 'hist': {'diagonal': 'hist_1d', 'lower': 'hist_2d'}, 'hist_1d': {'diagonal': 'hist_1d'}, 'hist_2d': {'lower': 'hist_2d'}, 'kde': {'diagonal': 'kde_1d', 'lower': 'kde_2d'}, 'kde_1d': {'diagonal': 'kde_1d'}, 'kde_2d': {'lower': 'kde_2d'}, 'scatter': {'diagonal': 'hist_1d', 'lower': 'scatter_2d'}, 'scatter_2d': {'lower': 'scatter_2d'}}
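A short sketch of the kind shortcuts in practice, assuming a Samples object as in the earlier sketches (parameter names are illustrative):
>>> axes = samples.plot_2d(['x0', 'x1'])                     # default: kde_1d, kde_2d, scatter_2d
>>> axes = samples.plot_2d(['x0', 'x1'], kinds='hist')       # 1d and 2d histograms
>>> axes = samples.plot_2d(['x0', 'x1'],
...                        kinds={'diagonal': 'hist_1d', 'lower': 'kde_2d', 'upper': 'scatter_2d'})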
- property tex
- to_hdf(path_or_buf, key, *args, **kwargs)[source]
Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.
In order to add another pandas.DataFrame or Series to an existing HDF file please use append mode and a different key.
Warning
One can store a subclass of pandas.DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.
For more information see the user guide.
- Parameters:
- path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
- key : str
Identifier for the group in the store.
- mode : {'a', 'w', 'r+'}, default 'a'
Mode to open file:
'w': write, a new file is created (an existing file with the same name would be deleted).
'a': append, an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+': similar to 'a', but the file must already exist.
- complevel : {0-9}, default None
Specifies a compression level for data. A value of 0 or None disables compression.
- complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: 'blosc:blosclz'): {'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy', 'blosc:zlib', 'blosc:zstd'}. Specifying a compression library which is not available issues a ValueError.
- append : bool, default False
For Table formats, append the input data to the existing.
- format : {'fixed', 'table', None}, default 'fixed'
Possible values:
'fixed': Fixed format. Fast writing/reading. Not-appendable, nor searchable.
'table': Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.
If None, pd.get_option('io.hdf.default_format') is checked, followed by fallback to "fixed".
- index : bool, default True
Write pandas.DataFrame index as a column.
- min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
- nan_rep : Any, optional
How to represent null values as str. Not allowed with append=True.
- dropna : bool, default False, optional
Remove missing values.
- data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns for more information. Applicable only to format='table'.
- errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled. See the errors argument for open for a full list of options.
- encoding : str, default "UTF-8"
See also
pandas.read_hdf
Read from HDF file.
pandas.DataFrame.to_orc
Write a pandas.DataFrame to the binary orc format.
pandas.DataFrame.to_parquet
Write a pandas.DataFrame to the binary parquet format.
pandas.DataFrame.to_sql
Write to a SQL table.
pandas.DataFrame.to_feather
Write out feather-format for pandas.DataFrames.
pandas.DataFrame.to_csv
Write out to a csv file.
Examples
>>> df = pandas.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                       index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')
We can add another object to the same file:
>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')
Reading from HDF file:
>>> pandas.read_hdf('data.h5', 'df')
   A  B
a  1  4
b  2  5
c  3  6
>>> pandas.read_hdf('data.h5', 's')
0    1
1    2
2    3
3    4
dtype: int64
- anesthetic.samples.merge_nested_samples(runs)[source]
Merge one or more nested sampling runs.
- Parameters:
- runs : list(NestedSamples)
List or array-like of one or more nested sampling runs. If only a single run is provided, this recalculates the live points and as such can be used for masked runs.
- Returns:
- samples : NestedSamples
Merged run.
- anesthetic.samples.merge_samples_weighted(samples, weights=None, label=None)[source]
Merge sets of samples with weights.
Combine two (or more) samples so the new PDF is P(x|new) = weight_A P(x|A) + weight_B P(x|B). The number of samples and internal weights do not affect the result.
- Parameters:
- samples : list(NestedSamples) or list(MCMCSamples)
List or array-like of one or more MCMC or nested sampling runs.
- weights : list(double) or None
Weight for each run in samples (normalized internally). Can be omitted if samples are NestedSamples, then exp(logZ) is used as weight.
- label : str or None, default=None
Label for the new samples.
- Returns:
- new_samples : Samples
Merged (weighted) run.
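A minimal sketch of a weighted model average, assuming two NestedSamples runs run_A and run_B for competing models (variable names are illustrative):
>>> from anesthetic.samples import merge_samples_weighted
>>> combined = merge_samples_weighted([run_A, run_B])        # weights default to exp(logZ)
>>> combined = merge_samples_weighted([run_A, run_B], weights=[0.7, 0.3], label='model average')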
anesthetic.scripts module
Command-line scripts for anesthetic.
- anesthetic.scripts.gui(args=None)[source]
Launch the anesthetic GUI.
See anesthetic.gui.plot.RunPlotter for details.
anesthetic.testing module
Anesthetic testing utilities.
anesthetic.utils module
Data-processing utility functions.
- anesthetic.utils.adjust_docstrings(obj, pattern, repl, *args, **kwargs)[source]
Adjust the docstrings of a class using regular expressions.
After the first argument, the remaining arguments are identical to re.sub.
- Parameters:
- cls : class
class to adjust
- pattern : str
regular expression pattern
- repl : str
replacement string
- anesthetic.utils.compress_weights(w, u=None, ncompress=True)[source]
Compresses weights to their approximate channel capacity.
- anesthetic.utils.compute_insertion_indexes(death, birth)[source]
Compute the live point insertion index for each point.
For more detail, see Fowlie et al. (2020)
- Parameters:
- death, birth : array-like
list of birth and death contours
- Returns:
- indexes : np.array
live point index at which each live point was inserted
- anesthetic.utils.compute_nlive(death, birth)[source]
Compute number of live points from birth and death contours.
- Parameters:
- death, birth : array-like
list of birth and death contours
- Returns:
- nlive : np.array
number of live points at each contour
- anesthetic.utils.histogram(a, **kwargs)[source]
Produce a histogram for path-based plotting.
This is a cheap histogram. Necessary if one wants to update the histogram dynamically, and redrawing and filling is very expensive.
This has the same arguments and keywords as numpy.histogram(), but is normalised to 1.
- anesthetic.utils.histogram_bin_edges(samples, weights, bins='fd', range=None, beta='equal')[source]
Compute a good number of bins dynamically from weighted samples.
- Parameters:
- samples : array_like
Input data.
- weights : array-like
Array of sample weights.
- bins : str, default='fd'
String defining the rule used to automatically compute a good number of bins for the weighted samples:
'fd' : Freedman–Diaconis rule (modified for weighted data)
'scott' : Scott's rule (modified for weighted data)
'sqrt' : Square root estimator (modified for weighted data)
- range : (float, float), optional
The lower and upper range of the bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second.
- beta : float, default='equal'
The value of beta>0 used to calculate the number of effective samples via neff().
- Returns:
- bin_edges : array of dtype float
The edges to pass to numpy.histogram().
- anesthetic.utils.insertion_p_value(indexes, nlive, batch=0)[source]
Compute the p-value from insertion indexes, assuming constant nlive.
Note that this function doesn't use scipy.stats.kstest() as the latter assumes continuous distributions.
For more detail, see Fowlie et al. (2020)
For a rolling test, you should provide the optional parameter batch!=0. In this case the test computes the p-value on consecutive batches of size nlive * batch, selects the smallest one and adjusts for multiple comparisons using a Bonferroni correction.
- Parameters:
- indexes : array-like
list of insertion indexes, sorted by death contour
- nlive : int
number of live points
- batch : float
batch size in units of nlive for a rolling p-value
- Returns:
- ks_result : dict
Kolmogorov-Smirnov test results:
D: Kolmogorov-Smirnov statistic
sample_size: sample size
p-value: p-value
if batch != 0:
iterations: bounds of batch with minimum p-value
nbatches: the number of batches in total
uncorrected p-value: p-value without Bonferroni correction
- anesthetic.utils.iso_probability_contours(pdf, contours=[0.95, 0.68])[source]
Compute the iso-probability contour values.
- anesthetic.utils.iso_probability_contours_from_samples(pdf, contours=[0.95, 0.68], weights=None)[source]
Compute the iso-probability contour values.
- anesthetic.utils.logsumexp(a, axis=None, b=None, keepdims=False, return_sign=False)[source]
Compute the log of the sum of exponentials of input elements.
This function has the same call signature as scipy.special.logsumexp() and mirrors scipy's behaviour except for -np.inf input. If a and b are both -inf then scipy's function will output nan whereas here we use:
\[\lim_{x \to -\infty} x \exp(x) = 0\]
Thus, if a=-inf in log(sum(b * exp(a))) then we can set b=0 such that that term is ignored in the sum.
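A small sketch of the convention described above (the arrays are illustrative):
>>> import numpy as np
>>> from anesthetic.utils import logsumexp
>>> a = np.array([-np.inf, 0.0, 1.0])
>>> b = np.array([0.0, 1.0, 1.0])
>>> logsumexp(a, b=b)      # the term with a=-inf and b=0 is safely ignored in the sum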
- anesthetic.utils.match_contour_to_contourf(contours, vmin, vmax)[source]
Get needed vmin, vmax to match contour colors to contourf colors.
contourf uses the arithmetic mean of contour levels to assign colors, whereas contour uses the contour level directly. To get the same colors for contour lines as for contourf faces, we need some fiddly algebra.
- anesthetic.utils.mirror_1d(d, xmin=None, xmax=None)[source]
If necessary apply reflecting boundary conditions.
- anesthetic.utils.mirror_2d(d_x_, d_y_, xmin=None, xmax=None, ymin=None, ymax=None)[source]
If necessary apply reflecting boundary conditions.
- anesthetic.utils.neff(w, beta=1)[source]
Calculate effective number of samples.
Using the Huggins-Roy family of effective samples (https://aakinshin.net/posts/huggins-roy-ess/).
- Parameters:
- beta : int, float, str, default=1
The value of beta used to calculate the number of effective samples according to
\[N_{eff} = \bigg(\sum_{i=0}^n w_i^\beta \bigg)^{\frac{1}{1-\beta}}, \qquad w_i = \frac{w_i}{\sum_j w_j}\]
Beta can take any positive value. Larger beta corresponds to a greater compression such that:
\[\beta_1 < \beta_2 \Rightarrow N_{eff}(\beta_1) > N_{eff}(\beta_2)\]
Alternatively, beta can take one of the following strings as input:
If 'inf' or 'equal' is supplied (equivalent to beta=inf), then the resulting number of samples is the number of samples when compressed to equal weights, and given by:
\[w_i = \frac{w_i}{\sum_j w_j}, \qquad N_{eff} = \frac{1}{\max_i[w_i]}\]
If 'entropy' is supplied (equivalent to beta=1), then the estimate is determined via the entropy based calculation, also referred to as the channel capacity:
\[H = -\sum_i p_i \ln p_i, \qquad p_i = \frac{w_i}{\sum_j w_j}, \qquad N_{eff} = e^{H}\]
If 'kish' is supplied (equivalent to beta=2), then a Kish estimate is computed (Kish, Leslie (1965). Survey Sampling. New York: John Wiley & Sons, Inc. ISBN 0-471-10949-5):
\[N_{eff} = \frac{(\sum_i w_i)^2}{\sum_i w_i^2}\]
str(float) input gets converted to the corresponding float value.
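A quick numerical sketch of how beta controls the compression (the weights are illustrative):
>>> import numpy as np
>>> from anesthetic.utils import neff
>>> w = np.exp(-np.arange(1000) / 100.)            # strongly unequal weights
>>> neff(w, beta=1)                                # channel capacity ('entropy')
>>> neff(w, beta=2)                                # Kish estimate
>>> neff(w, beta='inf')                            # compression to equal weights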
- anesthetic.utils.quantile(a, q, w=None, interpolation='linear')[source]
Compute the weighted quantile for a one dimensional array.
- anesthetic.utils.sample_compression_1d(x, w=None, ncompress=True)[source]
Histogram a 1D set of weighted samples via subsampling.
This compresses the number of samples, combining weights.
- Parameters:
- x : array-like
x coordinate of samples for compressing
- w : pandas.Series, optional
weights of samples
- ncompress : int, default=True
Degree of compression.
If int: number of samples returned.
If True: compresses to the channel capacity (same as ncompress='entropy').
If False: no compression.
If str: determine number from the Huggins-Roy family of effective samples in neff() with beta=ncompress.
- Returns:
- x, w: array-like
Compressed samples and weights
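A minimal sketch of compressing weighted samples before histogramming (the data and weights are illustrative):
>>> import numpy as np
>>> from anesthetic.utils import sample_compression_1d
>>> x = np.random.randn(100000)
>>> w = np.random.rand(100000)                     # arbitrary sample weights
>>> x_c, w_c = sample_compression_1d(x, w, ncompress=1000)   # compress to roughly 1000 points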
- anesthetic.utils.scaled_triangulation(x, y, cov)[source]
Triangulation scaled by a covariance matrix.
- Parameters:
- x, y : array-like
x and y coordinates of samples
- cov : array-like, 2d
Covariance matrix for scaling
- Returns:
- matplotlib.tri.Triangulation
Triangulation with the appropriate scaling
- anesthetic.utils.triangular_sample_compression_2d(x, y, cov, w=None, n=1000)[source]
Histogram a 2D set of weighted samples via triangulation.
This defines bins via a triangulation of the subsamples and sums weights within triangles surrounding each point
- Parameters:
- x, yarray-like
x and y coordinates of samples for compressing
- covarray-like, 2d
Covariance matrix for scaling
- w
pandas.Series
, optional weights of samples
- nint, default=1000
number of samples returned.
- Returns:
- trimatplotlib.tri.Triangulation
with an appropriate scaling
- warray-like
Compressed samples and weights
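Assumed usage, compressing a weighted 2D sample onto roughly 1000 triangulated points:
import numpy as np
from anesthetic.utils import triangular_sample_compression_2d

np.random.seed(0)
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
x, y = np.random.multivariate_normal([0, 0], cov, 50_000).T
w = np.random.rand(50_000)
tri, w_c = triangular_sample_compression_2d(x, y, cov, w, n=1000)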
anesthetic.weighted_labelled_pandas module
Pandas DataFrame with weights and labels.
- class anesthetic.weighted_labelled_pandas.WeightedLabelledDataFrame(*args, **kwargs)[source]
Bases: WeightedDataFrame, LabelledDataFrame
pandas.DataFrame with weights and labels.
- class anesthetic.weighted_labelled_pandas.WeightedLabelledSeries(*args, **kwargs)[source]
Bases: WeightedSeries, LabelledSeries
Series with weights and labels.
anesthetic.weighted_pandas module
Pandas DataFrame and Series with weighted samples.
- class anesthetic.weighted_pandas.WeightedDataFrame(*args, **kwargs)[source]
Weighted version of pandas.DataFrame.
- compress(ncompress=True, axis=0)[source]
Reduce the number of samples by discarding low-weights.
- Parameters:
- ncompressint, str, default=True
Degree of compression.
If True (default): reduce to the channel capacity (theoretical optimum compression), equivalent to ncompress='entropy'.
If > 0: desired number of samples after compression.
If <= 0: compress so that all remaining weights are unity.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
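A hedged sketch of the ncompress options; this assumes the WeightedDataFrame constructor accepts a weights keyword:
import numpy as np
from anesthetic.weighted_pandas import WeightedDataFrame

np.random.seed(0)
data = np.random.randn(10_000, 2)
weights = np.random.rand(10_000)
df = WeightedDataFrame(data, columns=['x', 'y'], weights=weights)  # weights kwarg assumed

df_entropy = df.compress()      # True (default): channel capacity
df_1000 = df.compress(1000)     # > 0: roughly 1000 samples
df_unit = df.compress(-1)       # <= 0: all remaining weights are unity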
- corr(method='pearson', skipna=True, *args, **kwargs)[source]
Compute pairwise correlation of columns, excluding NA/null values.
- Parameters:
- method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
- min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedDataFrame
Correlation matrix.
See also
WeightedDataFrame.corrwith
Compute pairwise correlation with another WeightedDataFrame or WeightedSeries.
WeightedSeries.corr
Compute the correlation between two WeightedSeries.
Notes
Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
Examples
>>> def histogram_intersection(a, b): ... v = np.minimum(a, b).sum().round(decimals=1) ... return v >>> df = pd.WeightedDataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], ... columns=['dogs', 'cats']) >>> df.corr(method=histogram_intersection) dogs cats dogs 1.0 0.3 cats 0.3 1.0
>>> df = pd.WeightedDataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)], ... columns=['dogs', 'cats']) >>> df.corr(min_periods=3) dogs cats dogs 1.0 NaN cats NaN 1.0
- corrwith(other, axis=0, drop=False, method='pearson', *args, **kwargs)[source]
Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of WeightedDataFrame with rows or columns of WeightedSeries or WeightedDataFrame. WeightedDataFrames are first aligned along both axes before computing the correlations.
- Parameters:
- otherWeightedDataFrame, WeightedSeries
Object with which to compute correlations.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.
- dropbool, default False
Drop missing indices from result.
- method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays
and returning a float.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedSeries
Pairwise correlations.
See also
WeightedDataFrame.corr
Compute pairwise correlation of columns.
Examples
>>> index = ["a", "b", "c", "d", "e"] >>> columns = ["one", "two", "three", "four"] >>> df1 = pd.WeightedDataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns) >>> df2 = pd.WeightedDataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns) >>> df1.corrwith(df2) one 1.0 two 1.0 three 1.0 four 1.0 dtype: float64
>>> df2.corrwith(df1, axis=1) a 1.0 b 1.0 c 1.0 d 1.0 e NaN dtype: float64
- cov(*args, **kwargs)[source]
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a WeightedDataFrame. The returned data frame is the covariance matrix of the columns of the WeightedDataFrame.
Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to understand the relationship between different measures across time.
- Parameters:
- min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no nan is in the dataframe.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedDataFrame
The covariance matrix of the series of the WeightedDataFrame.
See also
WeightedSeries.cov
Compute covariance with another WeightedSeries.
pandas.core.window.ewm.ExponentialMovingWindow.cov
Exponential weighted sample covariance.
pandas.core.window.expanding.Expanding.cov
Expanding sample covariance.
pandas.core.window.rolling.Rolling.cov
Rolling sample covariance.
Notes
Returns the covariance matrix of the WeightedDataFrame’s time series. The covariance is normalized by N-ddof.
For WeightedDataFrames that have WeightedSeries that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member WeightedSeries.
However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
Examples
>>> df = pd.WeightedDataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], ... columns=['dogs', 'cats']) >>> df.cov() dogs cats dogs 0.666667 -1.000000 cats -1.000000 1.666667
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(1000, 5), ... columns=['a', 'b', 'c', 'd', 'e']) >>> df.cov() a b c d e a 0.998438 -0.020161 0.059277 -0.008943 0.014144 b -0.020161 1.059352 -0.008543 -0.024738 0.009826 c 0.059277 -0.008543 1.010670 -0.001486 -0.000271 d -0.008943 -0.024738 -0.001486 0.921297 -0.013692 e 0.014144 0.009826 -0.000271 -0.013692 0.977795
Minimum number of periods
This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(20, 3), ... columns=['a', 'b', 'c']) >>> df.loc[df.index[:5], 'a'] = np.nan >>> df.loc[df.index[5:10], 'b'] = np.nan >>> df.cov(min_periods=12) a b c a 0.316741 NaN -0.150812 b NaN 1.248003 0.191417 c -0.150812 0.191417 0.895202
- groupby(by=None, axis=_NoDefault.no_default, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]
Group WeightedDataFrame using a mapper or by a WeightedSeries of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters:
- bymapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or WeightedSeries is passed, the WeightedSeries or dict VALUES will be used to determine the groups (the WeightedSeries’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For WeightedSeries this parameter is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
- as_indexbool, default True
Return object with group labels as the index. Only relevant for WeightedDataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
- sortbool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original WeightedDataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
- group_keysbool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed WeightedSeries or WeightedDataFrame. Specify group_keys explicitly to include the group keys or not.
Changed in version 2.0.0: group_keys now defaults to True.
- observedbool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
- dropnabool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
- Returns:
- pandas.api.typing.WeightedDataFrameGroupBy
Returns a groupby object that contains information about the groups.
See also
pandas.DataFrame.resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.
Examples
>>> df = pd.WeightedDataFrame({'Animal': ['Falcon', 'Falcon', ... 'Parrot', 'Parrot'], ... 'Max Speed': [380., 370., 24., 26.]}) >>> df Animal Max Speed 0 Falcon 380.0 1 Falcon 370.0 2 Parrot 24.0 3 Parrot 26.0 >>> df.groupby(['Animal']).mean() Max Speed Animal Falcon 375.0 Parrot 25.0
Hierarchical Indexes
We can groupby different levels of a hierarchical index using the level parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... ['Captive', 'Wild', 'Captive', 'Wild']] >>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) >>> df = pd.WeightedDataFrame({'Max Speed': [390., 350., 30., 20.]}, ... index=index) >>> df Max Speed Animal Type Falcon Captive 390.0 Wild 350.0 Parrot Captive 30.0 Wild 20.0 >>> df.groupby(level=0).mean() Max Speed Animal Falcon 370.0 Parrot 25.0 >>> df.groupby(level="Type").mean() Max Speed Type Captive 210.0 Wild 185.0
We can also choose to include NA in group keys or not by setting dropna parameter, the default setting is True.
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]] >>> df = pd.WeightedDataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum() a c b 1.0 2 3 2.0 2 5
>>> df.groupby(by=["b"], dropna=False).sum() a c b 1.0 2 3 2.0 2 5 NaN 1 4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]] >>> df = pd.WeightedDataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum() b c a a 13.0 13.0 b 12.3 123.0
>>> df.groupby(by="a", dropna=False).sum() b c a a 13.0 13.0 b 12.3 123.0 NaN 12.3 33.0
When using .apply(), use group_keys to include or exclude the group keys. The group_keys argument defaults to True (include).
>>> df = pd.WeightedDataFrame({'Animal': ['Falcon', 'Falcon', ... 'Parrot', 'Parrot'], ... 'Max Speed': [380., 370., 24., 26.]}) >>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x) Max Speed Animal Falcon 0 380.0 1 370.0 Parrot 2 24.0 3 26.0
>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x) Max Speed 0 380.0 1 370.0 2 24.0 3 26.0
- kurt(axis=0, skipna=True, *args, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- kurtosis(*args, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- mean(axis=0, skipna=True, *args, **kwargs)[source]
Return the mean of the values over the requested axis.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.mean() 2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra']) >>> df a b tiger 1 2 zebra 2 3 >>> df.mean() a 1.5 b 2.5 dtype: float64
Using axis=1
>>> df.mean(axis=1) tiger 1.5 zebra 2.5 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']}, ... index=['tiger', 'zebra']) >>> df.mean(numeric_only=True) a 1.5 dtype: float64
- median(*args, **kwargs)[source]
Return the median of the values over the requested axis.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.median() 2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra']) >>> df a b tiger 1 2 zebra 2 3 >>> df.median() a 1.5 b 2.5 dtype: float64
Using axis=1
>>> df.median(axis=1) tiger 1.5 zebra 2.5 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']}, ... index=['tiger', 'zebra']) >>> df.median(numeric_only=True) a 1.5 dtype: float64
- quantile(q=0.5, axis=0, numeric_only=None, interpolation='linear', method=None)[source]
Return values at the given quantile over requested axis.
- Parameters:
- qfloat or array-like, default 0.5 (50% quantile)
Value between 0 <= q <= 1, the quantile(s) to compute.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
- numeric_onlybool, default False
Include only float, int or boolean data.
Changed in version 2.0.0: The default value of numeric_only is now False.
- interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
- method{‘single’, ‘table’}, default ‘single’
Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). When ‘table’, the only allowed interpolation methods are ‘nearest’, ‘lower’, and ‘higher’.
- Returns:
- WeightedSeries or WeightedDataFrame
If q is an array, a WeightedDataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles.
If q is a float, a WeightedSeries will be returned where the index is the columns of self and the values are the quantiles.
See also
pandas.core.window.rolling.Rolling.quantile
Rolling quantile.
numpy.percentile
Numpy function to compute the percentile.
Examples
>>> df = pd.WeightedDataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), ... columns=['a', 'b']) >>> df.quantile(.1) a 1.3 b 3.7 Name: 0.1, dtype: float64 >>> df.quantile([.1, .5]) a b 0.1 1.3 3.7 0.5 2.5 55.0
Specifying method=’table’ will compute the quantile over all columns.
>>> df.quantile(.1, method="table", interpolation="nearest") a 1 b 1 Name: 0.1, dtype: int64 >>> df.quantile([.1, .5], method="table", interpolation="nearest") a b 0.1 1 1 0.5 3 100
Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.
>>> df = pd.WeightedDataFrame({'A': [1, 2], ... 'B': [pd.Timestamp('2010'), ... pd.Timestamp('2011')], ... 'C': [pd.Timedelta('1 days'), ... pd.Timedelta('2 days')]}) >>> df.quantile(0.5, numeric_only=False) A 1.5 B 2010-07-02 12:00:00 C 1 days 12:00:00 Name: 0.5, dtype: object
- sample(*args, **kwargs)[source]
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightsstr or ndarray-like, optional
Default ‘None’ results in equal probability weighting. If passed a WeightedSeries, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a WeightedDataFrame, will accept the name of a column when axis = 0. Unless weights are a WeightedSeries, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For WeightedSeries this parameter is unused and defaults to None.
- ignore_indexbool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
New in version 1.3.0.
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing n items randomly sampled from the caller object.
See also
WeightedDataFrameGroupBy.sample
Generates random samples from each group of a WeightedDataFrame object.
WeightedSeriesGroupBy.sample
Generates random samples from each group of a WeightedSeries object.
numpy.random.choice
Generates a random sample from a given 1-D numpy array.
Notes
If frac > 1, replacement should be set to True.
Examples
>>> df = pd.WeightedDataFrame({'num_legs': [2, 4, 8, 0], ... 'num_wings': [2, 0, 0, 0], ... 'num_specimen_seen': [10, 2, 1, 8]}, ... index=['falcon', 'dog', 'spider', 'fish']) >>> df num_legs num_wings num_specimen_seen falcon 2 2 10 dog 4 0 2 spider 8 0 1 fish 0 0 8
Extract 3 random elements from the WeightedSeries df['num_legs']. Note that we use random_state to ensure the reproducibility of the examples.
>>> df['num_legs'].sample(n=3, random_state=1) fish 0 spider 8 falcon 2 Name: num_legs, dtype: int64
A random 50% sample of the WeightedDataFrame with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1) num_legs num_wings num_specimen_seen dog 4 0 2 fish 0 0 8
An upsample sample of the WeightedDataFrame with replacement. Note that the replace parameter has to be True for a frac parameter > 1.
>>> df.sample(frac=2, replace=True, random_state=1) num_legs num_wings num_specimen_seen dog 4 0 2 fish 0 0 8 falcon 2 2 10 falcon 2 2 10 fish 0 0 8 dog 4 0 2 fish 0 0 8 dog 4 0 2
Using a WeightedDataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1) num_legs num_wings num_specimen_seen falcon 2 2 10 fish 0 0 8
- sem(axis=0, skipna=True)[source]
Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0), columns (1)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- WeightedSeries or WeightedDataFrame (if level specified)
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.sem().round(6) 0.57735
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra']) >>> df a b tiger 1 2 zebra 2 3 >>> df.sem() a 0.5 b 0.5 dtype: float64
Using axis=1
>>> df.sem(axis=1) tiger 0.5 zebra 0.5 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']}, ... index=['tiger', 'zebra']) >>> df.sem(numeric_only=True) a 0.5 dtype: float64
- skew(axis=0, skipna=True, *args, **kwargs)[source]
Return unbiased skew over requested axis.
Normalized by N-1.
- Parameters:
- axis{index (0), columns (1)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- WeightedSeries or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 3]) >>> s.skew() 0.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]}, ... index=['tiger', 'zebra', 'cow']) >>> df a b c tiger 1 2 1 zebra 2 3 3 cow 3 4 5 >>> df.skew() a 0.0 b 0.0 c 0.0 dtype: float64
Using axis=1
>>> df.skew(axis=1) tiger 1.732051 zebra -1.732051 cow 0.000000 dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']}, ... index=['tiger', 'zebra', 'cow']) >>> df.skew(numeric_only=True) a 0.0 dtype: float64
- std(*args, **kwargs)[source]
Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0), columns (1)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- WeightedSeries or WeightedDataFrame (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3], ... 'age': [21, 25, 62, 43], ... 'height': [1.61, 1.87, 1.49, 2.01]} ... ).set_index('person_id') >>> df age height person_id 0 21 1.61 1 25 1.87 2 62 1.49 3 43 2.01
The standard deviation of the columns can be found as follows:
>>> df.std() age 18.786076 height 0.237417 dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.std(ddof=0) age 16.269219 height 0.205609 dtype: float64
- var(axis=0, skipna=True, *args, **kwargs)[source]
Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0), columns (1)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- WeightedSeries or WeightedDataFrame (if level specified)
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3], ... 'age': [21, 25, 62, 43], ... 'height': [1.61, 1.87, 1.49, 2.01]} ... ).set_index('person_id') >>> df age height person_id 0 21 1.61 1 25 1.87 2 62 1.49 3 43 2.01
>>> df.var() age 352.916667 height 0.056367 dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.var(ddof=0) age 264.687500 height 0.042275 dtype: float64
- class anesthetic.weighted_pandas.WeightedDataFrameGroupBy(*args, **kwargs)[source]
Weighted version of pandas.core.groupby.DataFrameGroupBy.
- cov(*args, **kwargs)[source]
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a WeightedDataFrame. The returned data frame is the covariance matrix of the columns of the WeightedDataFrame.
Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to understand the relationship between different measures across time.
- Parameters:
- min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no nan is in the dataframe.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: The default value of numeric_only is now False.
- Returns:
- WeightedDataFrame
The covariance matrix of the series of the WeightedDataFrame.
See also
WeightedSeries.cov
Compute covariance with another WeightedSeries.
pandas.core.window.ewm.ExponentialMovingWindow.cov
Exponential weighted sample covariance.
pandas.core.window.expanding.Expanding.cov
Expanding sample covariance.
pandas.core.window.rolling.Rolling.cov
Rolling sample covariance.
Notes
Returns the covariance matrix of the WeightedDataFrame’s time series. The covariance is normalized by N-ddof.
For WeightedDataFrames that have WeightedSeries that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member WeightedSeries.
However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
Examples
>>> df = pd.WeightedDataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], ... columns=['dogs', 'cats']) >>> df.cov() dogs cats dogs 0.666667 -1.000000 cats -1.000000 1.666667
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(1000, 5), ... columns=['a', 'b', 'c', 'd', 'e']) >>> df.cov() a b c d e a 0.998438 -0.020161 0.059277 -0.008943 0.014144 b -0.020161 1.059352 -0.008543 -0.024738 0.009826 c 0.059277 -0.008543 1.010670 -0.001486 -0.000271 d -0.008943 -0.024738 -0.001486 0.921297 -0.013692 e 0.014144 0.009826 -0.000271 -0.013692 0.977795
Minimum number of periods
This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:
>>> np.random.seed(42) >>> df = pd.WeightedDataFrame(np.random.randn(20, 3), ... columns=['a', 'b', 'c']) >>> df.loc[df.index[:5], 'a'] = np.nan >>> df.loc[df.index[5:10], 'b'] = np.nan >>> df.cov(min_periods=12) a b c a 0.316741 NaN -0.150812 b NaN 1.248003 0.191417 c -0.150812 0.191417 0.895202
- sample(*args, **kwargs)[source]
Return a random sample of items from each group.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.
- fracfloat, optional
Fraction of items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightslist-like, optional
Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying WeightedDataFrame or WeightedSeries object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing items randomly sampled within each group from the caller object.
See also
WeightedDataFrame.sample
Generate random samples from a WeightedDataFrame object.
numpy.random.choice
Generate a random sample from a given 1-D numpy array.
Examples
>>> df = pd.WeightedDataFrame( ... {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)} ... ) >>> df a b 0 red 0 1 red 1 2 blue 2 3 blue 3 4 black 4 5 black 5
Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:
>>> df.groupby("a").sample(n=1, random_state=1) a b 4 black 4 2 blue 2 1 red 1
Set frac to sample fixed proportions rather than counts:
>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2) 5 5 2 2 0 0 Name: b, dtype: int64
Control sample probabilities within groups by setting weights:
>>> df.groupby("a").sample( ... n=1, ... weights=[1, 1, 1, 0, 0, 1], ... random_state=1, ... ) a b 5 black 5 2 blue 2 0 red 0
- class anesthetic.weighted_pandas.WeightedGroupBy(*args, **kwargs)[source]
Weighted version of pandas.core.groupby.GroupBy.
- mean(*args, **kwargs)[source]
Compute mean of groups, excluding missing values.
- Parameters:
- numeric_onlybool, default False
Include only float, int, boolean columns.
Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.
- enginestr, default None
‘cython’: Runs the operation through C-extensions from cython.
‘numba’: Runs the operation through JIT compiled code from numba.
None: Defaults to ‘cython’ or globally setting compute.use_numba
New in version 1.4.0.
- engine_kwargsdict, default None
For ‘cython’ engine, there are no accepted engine_kwargs
For ‘numba’ engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the ‘numba’ engine is {'nopython': True, 'nogil': False, 'parallel': False}
New in version 1.4.0.
- Returns:
- pandas.WeightedSeries or pandas.WeightedDataFrame
See also
WeightedSeries.groupby
Apply a function groupby to a WeightedSeries.
WeightedDataFrame.groupby
Apply a function groupby to each row or column of a WeightedDataFrame.
Examples
>>> df = pd.WeightedDataFrame({'A': [1, 1, 2, 1, 2], ... 'B': [np.nan, 2, 3, 4, 5], ... 'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])
Groupby one column and return the mean of the remaining columns in each group.
>>> df.groupby('A').mean() B C A 1 3.0 1.333333 2 4.0 1.500000
Groupby two columns and return the mean of the remaining column.
>>> df.groupby(['A', 'B']).mean() C A B 1 2.0 2.0 4.0 1.0 2 3.0 1.0 5.0 2.0
Groupby one column and return the mean of only particular column in the group.
>>> df.groupby('A')['B'].mean() A 1 3.0 2 4.0 Name: B, dtype: float64
- median(*args, **kwargs)[source]
Compute median of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex
- Parameters:
- numeric_onlybool, default False
Include only float, int, boolean columns.
Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Median of values within each group.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b'] >>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst) >>> ser a 7 a 2 a 8 b 4 b 3 b 3 dtype: int64 >>> ser.groupby(level=0).median() a 7.0 b 3.0 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]} >>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog', ... 'mouse', 'mouse', 'mouse', 'mouse']) >>> df a b dog 1 1 dog 3 4 dog 5 8 mouse 7 4 mouse 7 4 mouse 8 2 mouse 3 1 >>> df.groupby(level=0).median() a b dog 3.0 4.0 mouse 7.0 3.0
For Resampler:
>>> ser = pd.WeightedSeries([1, 2, 3, 3, 4, 5], ... index=pd.DatetimeIndex(['2023-01-01', ... '2023-01-10', ... '2023-01-15', ... '2023-02-01', ... '2023-02-10', ... '2023-02-15'])) >>> ser.resample('MS').median() 2023-01-01 2.0 2023-02-01 4.0 Freq: MS, dtype: float64
- quantile(*args, **kwargs)[source]
Return group values at the given quantile, a la numpy.percentile.
- Parameters:
- qfloat or array-like, default 0.5 (50% quantile)
Value(s) between 0 and 1 providing the quantile(s) to compute.
- interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
Method to use when the desired quantile falls between two points.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Return type determined by caller of GroupBy object.
See also
WeightedSeries.quantile
Similar method for WeightedSeries.
WeightedDataFrame.quantile
Similar method for WeightedDataFrame.
numpy.percentile
NumPy method to compute qth percentile.
Examples
>>> df = pd.WeightedDataFrame([ ... ['a', 1], ['a', 2], ['a', 3], ... ['b', 1], ['b', 3], ['b', 5] ... ], columns=['key', 'val']) >>> df.groupby('key').quantile() val key a 2.0 b 3.0
- sem(*args, **kwargs)[source]
Compute standard error of the mean of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
- ddofint, default 1
Degrees of freedom.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Standard error of the mean of values within each group.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'b', 'b'] >>> ser = pd.WeightedSeries([5, 10, 8, 14], index=lst) >>> ser a 5 a 10 b 8 b 14 dtype: int64 >>> ser.groupby(level=0).sem() a 2.5 b 3.0 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = [[1, 12, 11], [1, 15, 2], [2, 5, 8], [2, 6, 12]] >>> df = pd.WeightedDataFrame(data, columns=["a", "b", "c"], ... index=["tuna", "salmon", "catfish", "goldfish"]) >>> df a b c tuna 1 12 11 salmon 1 15 2 catfish 2 5 8 goldfish 2 6 12 >>> df.groupby("a").sem() b c a 1 1.5 4.5 2 0.5 2.0
For Resampler:
>>> ser = pd.WeightedSeries([1, 3, 2, 4, 3, 8], ... index=pd.DatetimeIndex(['2023-01-01', ... '2023-01-10', ... '2023-01-15', ... '2023-02-01', ... '2023-02-10', ... '2023-02-15'])) >>> ser.resample('MS').sem() 2023-01-01 0.577350 2023-02-01 1.527525 Freq: MS, dtype: float64
- std(*args, **kwargs)[source]
Compute standard deviation of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
- ddofint, default 1
Degrees of freedom.
- enginestr, default None
‘cython’: Runs the operation through C-extensions from cython.
‘numba’: Runs the operation through JIT compiled code from numba.
None: Defaults to ‘cython’ or globally setting compute.use_numba
New in version 1.4.0.
- engine_kwargsdict, default None
For ‘cython’ engine, there are no accepted engine_kwargs
For ‘numba’ engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the ‘numba’ engine is {'nopython': True, 'nogil': False, 'parallel': False}
New in version 1.4.0.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Standard deviation of values within each group.
See also
WeightedSeries.groupby
Apply a function groupby to a WeightedSeries.
WeightedDataFrame.groupby
Apply a function groupby to each row or column of a WeightedDataFrame.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b'] >>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst) >>> ser a 7 a 2 a 8 b 4 b 3 b 3 dtype: int64 >>> ser.groupby(level=0).std() a 3.21455 b 0.57735 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]} >>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog', ... 'mouse', 'mouse', 'mouse', 'mouse']) >>> df a b dog 1 1 dog 3 4 dog 5 8 mouse 7 4 mouse 7 4 mouse 8 2 mouse 3 1 >>> df.groupby(level=0).std() a b dog 2.000000 3.511885 mouse 2.217356 1.500000
- var(*args, **kwargs)[source]
Compute variance of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
- ddofint, default 1
Degrees of freedom.
- enginestr, default None
‘cython’: Runs the operation through C-extensions from cython.
‘numba’: Runs the operation through JIT compiled code from numba.
None: Defaults to ‘cython’ or globally setting compute.use_numba
New in version 1.4.0.
- engine_kwargsdict, default None
For ‘cython’ engine, there are no accepted engine_kwargs
For ‘numba’ engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the ‘numba’ engine is {'nopython': True, 'nogil': False, 'parallel': False}
New in version 1.4.0.
- numeric_onlybool, default False
Include only float, int or boolean data.
New in version 1.5.0.
Changed in version 2.0.0: numeric_only now defaults to False.
- Returns:
- WeightedSeries or WeightedDataFrame
Variance of values within each group.
See also
WeightedSeries.groupby
Apply a function groupby to a WeightedSeries.
WeightedDataFrame.groupby
Apply a function groupby to each row or column of a WeightedDataFrame.
Examples
For WeightedSeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b'] >>> ser = pd.WeightedSeries([7, 2, 8, 4, 3, 3], index=lst) >>> ser a 7 a 2 a 8 b 4 b 3 b 3 dtype: int64 >>> ser.groupby(level=0).var() a 10.333333 b 0.333333 dtype: float64
For WeightedDataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]} >>> df = pd.WeightedDataFrame(data, index=['dog', 'dog', 'dog', ... 'mouse', 'mouse', 'mouse', 'mouse']) >>> df a b dog 1 1 dog 3 4 dog 5 8 mouse 7 4 mouse 7 4 mouse 8 2 mouse 3 1 >>> df.groupby(level=0).var() a b dog 4.000000 12.333333 mouse 4.916667 2.250000
- class anesthetic.weighted_pandas.WeightedSeries(*args, **kwargs)[source]
Weighted version of pandas.Series.
- compress(ncompress=True)[source]
Reduce the number of samples by discarding low-weights.
- Parameters:
- ncompressint, str, default=True
Degree of compression.
If True (default): reduce to the channel capacity (theoretical optimum compression), equivalent to ncompress='entropy'.
If > 0: desired number of samples after compression.
If <= 0: compress so that all remaining weights are unity.
If str: determine number from the Huggins-Roy family of effective samples in anesthetic.utils.neff() with beta=ncompress.
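A hedged sketch for a series, again assuming a weights keyword on the constructor:
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

np.random.seed(0)
s = WeightedSeries(np.random.randn(10_000), weights=np.random.rand(10_000))  # weights kwarg assumed
print(s.mean(), s.quantile(0.84))  # weight-aware statistics
s_unit = s.compress(-1)            # discard low weights; remaining weights are unity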
- corr(other, *args, **kwargs)[source]
Compute correlation with other WeightedSeries, excluding missing values.
The two WeightedSeries objects are not required to be the same length and will be aligned internally before the correlation function is applied.
- Parameters:
- otherWeightedSeries
WeightedSeries with which to compute the correlation.
- method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method used to compute correlation:
pearson : Standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable: Callable with input two 1d ndarrays and returning a float.
Warning
Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
- min_periodsint, optional
Minimum number of observations needed to have a valid result.
- Returns:
- float
Correlation with other.
See also
WeightedDataFrame.corr
Compute pairwise correlation between columns.
WeightedDataFrame.corrwith
Compute pairwise correlation with another WeightedDataFrame or WeightedSeries.
Notes
Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
Automatic data alignment: as with all pandas operations, automatic data alignment is performed for this method.
corr() automatically considers values with matching indices.
Examples
>>> def histogram_intersection(a, b): ... v = np.minimum(a, b).sum().round(decimals=1) ... return v >>> s1 = pd.WeightedSeries([.2, .0, .6, .2]) >>> s2 = pd.WeightedSeries([.3, .6, .0, .1]) >>> s1.corr(s2, method=histogram_intersection) 0.3
Pandas auto-aligns the values with matching indices
>>> s1 = pd.WeightedSeries([1, 2, 3], index=[0, 1, 2]) >>> s2 = pd.WeightedSeries([1, 2, 3], index=[2, 1, 0]) >>> s1.corr(s2) -1.0
- cov(other, *args, **kwargs)[source]
Compute covariance with WeightedSeries, excluding missing values.
The two WeightedSeries objects are not required to be the same length and will be aligned internally before the covariance is calculated.
- Parameters:
- otherWeightedSeries
WeightedSeries with which to compute the covariance.
- min_periodsint, optional
Minimum number of observations needed to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns:
- float
Covariance between WeightedSeries and other normalized by N-1 (unbiased estimator).
See also
WeightedDataFrame.cov
Compute pairwise covariance of columns.
Examples
>>> s1 = pd.WeightedSeries([0.90010907, 0.13484424, 0.62036035]) >>> s2 = pd.WeightedSeries([0.12528585, 0.26962463, 0.51111198]) >>> s1.cov(s2) -0.01685762652715874
- groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]
Group WeightedSeries using a mapper or by a WeightedSeries of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters:
- bymapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or WeightedSeries is passed, the WeightedSeries or dict VALUES will be used to determine the groups (the WeightedSeries’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For WeightedSeries this parameter is unused and defaults to 0.
Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
- levelint, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
- as_indexbool, default True
Return object with group labels as the index. Only relevant for WeightedDataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
- sortbool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original WeightedDataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth(), and in transformations (see the transformations in the user guide).
Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
- group_keysbool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.
Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed WeightedSeries or WeightedDataFrame. Specify group_keys explicitly to include the group keys or not.
Changed in version 2.0.0: group_keys now defaults to True.
- observedbool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
- dropnabool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.
- Returns:
- pandas.api.typing.WeightedSeriesGroupBy
Returns a groupby object that contains information about the groups.
See also
pandas.Series.resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.
Examples
>>> ser = pd.WeightedSeries([390., 350., 30., 20.], ... index=['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... name="Max Speed") >>> ser Falcon 390.0 Falcon 350.0 Parrot 30.0 Parrot 20.0 Name: Max Speed, dtype: float64 >>> ser.groupby(["a", "b", "a", "b"]).mean() a 210.0 b 185.0 Name: Max Speed, dtype: float64 >>> ser.groupby(level=0).mean() Falcon 370.0 Parrot 25.0 Name: Max Speed, dtype: float64 >>> ser.groupby(ser > 100).mean() Max Speed False 25.0 True 370.0 Name: Max Speed, dtype: float64
Grouping by Indexes
We can groupby different levels of a hierarchical index using the level parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ... ['Captive', 'Wild', 'Captive', 'Wild']] >>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) >>> ser = pd.WeightedSeries([390., 350., 30., 20.], index=index, name="Max Speed") >>> ser Animal Type Falcon Captive 390.0 Wild 350.0 Parrot Captive 30.0 Wild 20.0 Name: Max Speed, dtype: float64 >>> ser.groupby(level=0).mean() Animal Falcon 370.0 Parrot 25.0 Name: Max Speed, dtype: float64 >>> ser.groupby(level="Type").mean() Type Captive 210.0 Wild 185.0 Name: Max Speed, dtype: float64
We can also choose to include NA in group keys or not by defining dropna parameter, the default setting is True.
>>> ser = pd.WeightedSeries([1, 2, 3, 3], index=["a", 'a', 'b', np.nan]) >>> ser.groupby(level=0).sum() a 3 b 3 dtype: int64
>>> ser.groupby(level=0, dropna=False).sum() a 3 b 3 NaN 3 dtype: int64
>>> arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot'] >>> ser = pd.WeightedSeries([390., 350., 30., 20.], index=arrays, name="Max Speed") >>> ser.groupby(["a", "b", "a", np.nan]).mean() a 210.0 b 350.0 Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", np.nan], dropna=False).mean() a 210.0 b 350.0 NaN 20.0 Name: Max Speed, dtype: float64
- kurt(skipna=True)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- kurtosis(*args, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or scalar
Examples
>>> s = pd.WeightedSeries([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse']) >>> s cat 1 dog 2 dog 2 mouse 3 dtype: int64 >>> s.kurt() 1.5
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]}, ... index=['cat', 'dog', 'dog', 'mouse']) >>> df a b cat 1 3 dog 2 4 dog 2 4 mouse 3 4 >>> df.kurt() a 1.5 b 4.0 dtype: float64
With axis=None
>>> df.kurt(axis=None).round(6) -0.988693
Using axis=1
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]}, ... index=['cat', 'dog']) >>> df.kurt(axis=1) cat -6.0 dog -6.0 dtype: float64
- mean(skipna=True)[source]
Return the mean of the values over the requested axis.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or WeightedSeries
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.mean()
2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.mean()
a    1.5
b    2.5
dtype: float64
Using axis=1
>>> df.mean(axis=1)
tiger    1.5
zebra    2.5
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                           index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a    1.5
dtype: float64
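For a weight-aware illustration (not part of the inherited docstring), the sketch below assumes the WeightedSeries constructor accepts a weights keyword and that mean() computes the weighted average sum(w*x)/sum(w); np.average is printed alongside as a reference value.

# Minimal sketch of a weighted mean, assuming the `weights` constructor keyword
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0])

ws = WeightedSeries(x, weights=w)
print(ws.mean())                 # expected (1 + 2 + 2*3) / 4 = 2.25 if weight-averaged
print(np.average(x, weights=w))  # NumPy reference value for comparison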
- median(*args, **kwargs)[source]
Return the median of the values over the requested axis.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or WeightedSeries
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.median()
2.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.median()
a    1.5
b    2.5
dtype: float64
Using axis=1
>>> df.median(axis=1)
tiger    1.5
zebra    2.5
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                           index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a    1.5
dtype: float64
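A weight-aware sketch (the weights keyword and the data below are illustrative assumptions): the median of a WeightedSeries should be the 50% point of the weighted empirical distribution, so up-weighting large values pulls it upwards.

# Illustrative weighted median, assuming the `weights` constructor keyword
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0, 1.0, 5.0])  # heavily up-weight the largest value

ws = WeightedSeries(x, weights=w)
print(ws.median())  # pulled towards 4.0, compared with the unweighted median 2.5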
- quantile(q=0.5, interpolation='linear')[source]
Return value at the given quantile.
- Parameters:
- qfloat or array-like, default 0.5 (50% quantile)
The quantile(s) to compute, which can lie in range: 0 <= q <= 1.
- interpolation{‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use when the desired quantile lies between two data points i and j:
linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
lower: i.
higher: j.
nearest: i or j whichever is nearest.
midpoint: (i + j) / 2.
- Returns:
- float or WeightedSeries
If q is an array, a WeightedSeries will be returned where the index is q and the values are the quantiles; otherwise a float will be returned.
See also
pandas.core.window.rolling.Rolling.quantile
Calculate the rolling quantile.
numpy.percentile
Returns the q-th percentile(s) of the array elements.
Examples
>>> s = pd.WeightedSeries([1, 2, 3, 4])
>>> s.quantile(.5)
2.5
>>> s.quantile([.25, .5, .75])
0.25    1.75
0.50    2.50
0.75    3.25
dtype: float64
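The following sketch (illustrative; the weights constructor keyword is an assumption carried over from the rest of this module) shows both the scalar and array forms of q on a series with non-uniform weights.

# Quantiles of a weighted series; data and weights are arbitrary
from anesthetic.weighted_pandas import WeightedSeries

ws = WeightedSeries([1.0, 2.0, 3.0, 4.0], weights=[1, 1, 1, 3])

print(ws.quantile())                   # default q=0.5, the weighted median
print(ws.quantile([0.25, 0.5, 0.75]))  # array q returns a series indexed by q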
- sample(*args, **kwargs)[source]
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
- fracfloat, optional
Fraction of axis items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightsstr or ndarray-like, optional
Default ‘None’ results in equal probability weighting. If passed a WeightedSeries, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a WeightedDataFrame, will accept the name of a column when axis = 0. Unless weights are a WeightedSeries, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For WeightedSeries this parameter is unused and defaults to None.
- ignore_indexbool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
New in version 1.3.0.
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing n items randomly sampled from the caller object.
See also
WeightedDataFrameGroupBy.sample
Generates random samples from each group of a WeightedDataFrame object.
WeightedSeriesGroupBy.sample
Generates random samples from each group of a WeightedSeries object.
numpy.random.choice
Generates a random sample from a given 1-D numpy array.
Notes
If frac > 1, replace must be set to True.
Examples
>>> df = pd.WeightedDataFrame({'num_legs': [2, 4, 8, 0],
...                            'num_wings': [2, 0, 0, 0],
...                            'num_specimen_seen': [10, 2, 1, 8]},
...                           index=['falcon', 'dog', 'spider', 'fish'])
>>> df
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8
Extract 3 random elements from the WeightedSeries df['num_legs'] (note that we use random_state to ensure the reproducibility of the examples):
>>> df['num_legs'].sample(n=3, random_state=1)
fish      0
spider    8
falcon    2
Name: num_legs, dtype: int64
A random 50% sample of the WeightedDataFrame with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1)
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8
An upsample of the WeightedDataFrame with replacement (the replace parameter has to be True for frac > 1):
>>> df.sample(frac=2, replace=True, random_state=1)
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2
Using a WeightedDataFrame column as weights. Rows with larger value in the num_specimen_seen column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
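As a complement to the inherited examples above, the sketch below constructs a small WeightedDataFrame (the weights constructor keyword is assumed) and draws reproducible samples; whether the stored weights are used by default is left to the implementation, so no particular output is asserted.

# Illustrative resampling of a WeightedDataFrame (weights keyword assumed)
from anesthetic.weighted_pandas import WeightedDataFrame

df = WeightedDataFrame({'x': [0.1, 0.2, 0.3, 0.4],
                        'y': [1.0, 2.0, 3.0, 4.0]},
                       weights=[0.1, 0.2, 0.3, 0.4])

print(df.sample(n=2, random_state=1))                    # two rows, reproducible draw
print(df.sample(frac=2, replace=True, random_state=1))   # frac > 1 requires replace=True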
- sem(skipna=True)[source]
Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- scalar or WeightedSeries (if level specified)
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.sem().round(6)
0.57735
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a  b
tiger  1  2
zebra  2  3
>>> df.sem()
a    0.5
b    0.5
dtype: float64
Using axis=1
>>> df.sem(axis=1)
tiger    0.5
zebra    0.5
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                           index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a    0.5
dtype: float64
- skew(skipna=True)[source]
Return unbiased skew over requested axis.
Normalized by N-1.
- Parameters:
- axis{index (0)}
Axis for the function to be applied on. For WeightedSeries this parameter is unused and defaults to 0.
For WeightedDataFrames, specifying axis=None will apply the aggregation across both axes.
New in version 2.0.0.
- skipnabool, default True
Exclude NA/null values when computing the result.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns:
- scalar or WeightedSeries
Examples
>>> s = pd.WeightedSeries([1, 2, 3])
>>> s.skew()
0.0
With a WeightedDataFrame
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
...                           index=['tiger', 'zebra', 'cow'])
>>> df
       a  b  c
tiger  1  2  1
zebra  2  3  3
cow    3  4  5
>>> df.skew()
a    0.0
b    0.0
c    0.0
dtype: float64
Using axis=1
>>> df.skew(axis=1)
tiger    1.732051
zebra   -1.732051
cow      0.000000
dtype: float64
In this case, numeric_only should be set to True to avoid getting an error.
>>> df = pd.WeightedDataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
...                           index=['tiger', 'zebra', 'cow'])
>>> df.skew(numeric_only=True)
a    0.0
dtype: float64
- std(*args, **kwargs)[source]
Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- scalar or WeightedSeries (if level specified)
Notes
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                            'age': [21, 25, 62, 43],
...                            'height': [1.61, 1.87, 1.49, 2.01]}
...                           ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
The standard deviation of the columns can be found as follows:
>>> df.std()
age       18.786076
height     0.237417
dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64
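To connect std with var for weighted data, the sketch below (weights constructor keyword assumed, data arbitrary) checks the expectation that the standard deviation is the square root of the variance; the exact ddof normalisation for weighted samples is left to the implementation.

# Relation between weighted std and var (weights keyword assumed)
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

ws = WeightedSeries([1.0, 2.0, 3.0, 4.0], weights=[1, 2, 2, 1])

print(ws.var())
print(ws.std())
print(np.isclose(ws.std(), np.sqrt(ws.var())))  # expected True if std = sqrt(var)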
- var(skipna=True)[source]
Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
- Parameters:
- axis{index (0)}
For WeightedSeries this parameter is unused and defaults to 0.
Warning
The behavior of WeightedDataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- ddofint, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_onlybool, default False
Include only float, int, boolean columns. Not implemented for WeightedSeries.
- Returns:
- scalar or WeightedSeries (if level specified)
Examples
>>> df = pd.WeightedDataFrame({'person_id': [0, 1, 2, 3],
...                            'age': [21, 25, 62, 43],
...                            'height': [1.61, 1.87, 1.49, 2.01]}
...                           ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01
>>> df.var()
age       352.916667
height      0.056367
dtype: float64
Alternatively, ddof=0 can be set to normalize by N instead of N-1:
>>> df.var(ddof=0)
age       264.687500
height      0.042275
dtype: float64
- class anesthetic.weighted_pandas.WeightedSeriesGroupBy(*args, **kwargs)[source]
Weighted version of pandas.core.groupby.SeriesGroupBy.
- cov(*args, **kwargs)[source]
Compute covariance with WeightedSeries, excluding missing values.
The two WeightedSeries objects are not required to be the same length and will be aligned internally before the covariance is calculated.
- Parameters:
- otherWeightedSeries
WeightedSeries with which to compute the covariance.
- min_periodsint, optional
Minimum number of observations needed to have a valid result.
- ddofint, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- Returns:
- float
Covariance between WeightedSeries and other normalized by N-1 (unbiased estimator).
See also
WeightedDataFrame.cov
Compute pairwise covariance of columns.
Examples
>>> s1 = pd.WeightedSeries([0.90010907, 0.13484424, 0.62036035])
>>> s2 = pd.WeightedSeries([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2)
-0.01685762652715874
- sample(*args, **kwargs)[source]
Return a random sample of items from each group.
You can use random_state for reproducibility.
- Parameters:
- nint, optional
Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.
- fracfloat, optional
Fraction of items to return. Cannot be used with n.
- replacebool, default False
Allow or disallow sampling of the same row more than once.
- weightslist-like, optional
Default None results in equal probability weighting. If passed a list-like then values must have the same length as the underlying WeightedDataFrame or WeightedSeries object and will be used as sampling probabilities after normalization within each group. Values must be non-negative with at least one positive element within each group.
- random_stateint, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
Changed in version 1.4.0: np.random.Generator objects now accepted
- Returns:
- WeightedSeries or WeightedDataFrame
A new object of same type as caller containing items randomly sampled within each group from the caller object.
See also
WeightedDataFrame.sample
Generate random samples from a WeightedDataFrame object.
numpy.random.choice
Generate a random sample from a given 1-D numpy array.
Examples
>>> df = pd.WeightedDataFrame(
...     {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
... )
>>> df
       a  b
0    red  0
1    red  1
2   blue  2
3   blue  3
4  black  4
5  black  5
Select one row at random for each distinct value in column a. The random_state argument can be used to guarantee reproducibility:
>>> df.groupby("a").sample(n=1, random_state=1)
       a  b
4  black  4
2   blue  2
1    red  1
Set frac to sample fixed proportions rather than counts:
>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2)
5    5
2    2
0    0
Name: b, dtype: int64
Control sample probabilities within groups by setting weights:
>>> df.groupby("a").sample(
...     n=1,
...     weights=[1, 1, 1, 0, 0, 1],
...     random_state=1,
... )
       a  b
5  black  5
2   blue  2
0    red  0
- class anesthetic.weighted_pandas._WeightedObject(*args, **kwargs)[source]
Common methods for WeightedSeries and WeightedDataFrame.
- reset_index(level=None, drop=False, inplace=False, *args, **kwargs)[source]
Reset the index, retaining weights.
- set_weights(weights, axis=0, inplace=False, level=None)[source]
Set sample weights along an axis.
- Parameters:
- weights1d array-like
The sample weights to put in an index.
- axisint (0,1), default=0
Whether to put weights in an index or column.
- inplacebool, default=False
Whether to operate inplace, or return a new array.
- levelint
Which level in the index to insert before. Defaults to inserting at the back.
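A short usage sketch for set_weights follows; the companion get_weights accessor and the behaviour of inplace=False (returning a new object) are assumptions based on the surrounding API rather than quotations from the docstring.

# Attaching weights to an existing series (get_weights assumed to exist)
import numpy as np
from anesthetic.weighted_pandas import WeightedSeries

ws = WeightedSeries([1.0, 2.0, 3.0])

ws2 = ws.set_weights(np.array([0.2, 0.3, 0.5]))  # inplace=False returns a new object

print(ws2.get_weights())  # assumed accessor for the stored weights
print(ws2.mean())         # statistics now use the new weights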
- anesthetic.weighted_pandas.cls
alias of WeightedSeriesGroupBy