Data.outlier_detection(column=None, window=7, shape='boxcar', certainty=3.0, action='mask', width=1, func=None, **kargs)

Detect outliers in a column of data.


column (column index) – The column in which to look for outliers. If not set, defaults to the column currently set as y.

Keyword Arguments
  • window (int) – Size of the data window used for anomaly detection.

  • shape (str) – The name of a scipy.signal windowing function to use when averaging the data. Defaults to ‘boxcar’ for a flat average.

  • certainty (float) – Detection threshold in standard deviations, e.g. 3 flags data more than 3 standard deviations from the average.

  • action (str or callable) – What to do with outlying points. Options are:
      • ‘mask’ – outlier points are masked (default)
      • ‘mask row’ – outlier rows are masked
      • ‘delete’ – outlier rows are deleted
      • callable – the value of the action keyword is called with the outlier row
      • anything else – do nothing

  • width (odd integer) – Number of rows that an outlying spike could occupy. Defaults to 1.

  • func (callable) – A function that determines if the current row is an outlier.

  • action_args (tuple) – If action is callable, then action_args can be used to pass extra arguments to the action callable.

  • action_kargs (dict) – If action is callable, then action_kargs can be used to pass extra keyword arguments to the action callable.


(Stoner.Data) – The newly modified Data object.

outlier_detection adds the row numbers of the detected outliers to the metadata of the Data object and then performs the requested action, e.g. ‘mask’ or ‘delete’ (any unrecognised action does nothing).

The detection looks at a window of the data, takes the average, and tests whether the current data point lies more than certainty standard deviations away from that average.
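The test described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper name is_outlier is hypothetical, and the choices to centre the window on the point, clip it at the ends of the data, and leave the point under test out of the average are assumptions that the text above does not specify.

```python
from statistics import mean, stdev


def is_outlier(values, i, window=7, certainty=3.0):
    """Return True if values[i] is more than `certainty` standard
    deviations from the mean of its neighbours.

    Assumptions (not taken from the library): the window is centred
    on i, clipped at the ends, and excludes the point being tested.
    """
    half = window // 2
    lo, hi = max(0, i - half), min(len(values), i + half + 1)
    neighbours = [v for j, v in enumerate(values[lo:hi], start=lo) if j != i]
    if len(neighbours) < 2:
        return False  # not enough points to estimate a spread
    return abs(values[i] - mean(neighbours)) > certainty * stdev(neighbours)


# A flat series with a single spike at index 4
data = [1.0, 1.1, 0.9, 1.0, 9.0, 1.1, 1.0, 0.9, 1.0]
outliers = [i for i in range(len(data)) if is_outlier(data, i)]
```

Here only the spike at index 4 is flagged. Its neighbours are not, because the spike inflates the standard deviation of any window that contains it.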

The outlier detection function has the signature:

def outlier(row, column, window, certainty, **kargs):
    return True # or False

All extra keyword arguments are passed to the outlier detector.
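As an illustration of a custom detector with this signature, the sketch below uses the median and the median absolute deviation instead of the mean and standard deviation. The name mad_outlier and the scale keyword are hypothetical. It also assumes that row is the row under test, column is an integer index into a row, and window is a sequence of neighbouring rows; the documentation above does not spell out these types, so check them against the library before relying on this.

```python
from statistics import median


def mad_outlier(row, column, window, certainty, **kargs):
    """Flag `row` if its value in `column` is far from the window median.

    Assumed argument types (not confirmed by the documentation):
    row is one row of data, column an integer index, window a
    sequence of rows. `scale` is a hypothetical extra keyword.
    """
    scale = kargs.get("scale", 1.4826)  # MAD-to-sigma factor for normal data
    values = [r[column] for r in window]
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return False  # no spread in the window, nothing to compare against
    return abs(row[column] - centre) > certainty * scale * mad


# Stand-in window of (x, y) rows; column 1 holds the y values
window = [(0, 1.0), (1, 1.2), (2, 0.8), (3, 1.1), (4, 0.9)]
spike = mad_outlier((5, 5.0), 1, window, 3.0)
normal = mad_outlier((5, 1.05), 1, window, 3.0)
```

Since extra keyword arguments are forwarded to the detector, a value for scale would presumably be supplied alongside func in the call to outlier_detection.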

If action is a callable function then it should take the form of:

def action(i, column, data, *action_args, **action_kargs):
    pass # act on outlier row i

where i is the number of the outlier row, column the same value as above and data is the complete set of data.

In all cases the indices of the outlier rows are added to the ‘outlier’ metadata.
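A callable action of this form might look like the sketch below, which records each outlier in a list supplied by the caller. The name log_outlier and the label keyword are hypothetical, and a plain nested list stands in for the data on the assumption that it can be indexed by row and then by column.

```python
def log_outlier(i, column, data, *action_args, **action_kargs):
    """Append (label, row number, value) for an outlier to a caller's list.

    Assumptions: the first extra positional argument is the list to
    append to, and data[i][column] gives the outlying value.
    """
    log = action_args[0]
    label = action_kargs.get("label", "outlier")
    log.append((label, i, data[i][column]))


# Stand-in data: rows of (x, y), with an outlier in row 2
rows = [[0.0, 1.0], [1.0, 1.1], [2.0, 9.0], [3.0, 0.9]]
found = []
log_outlier(2, 1, rows, found, label="spike")
```

With the keywords documented above, the list and label would be supplied as action_args=(found,) and action_kargs={"label": "spike"}.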


"""Detect outlying points from a line."""
import numpy as np

from Stoner import Data
from Stoner.analysis.utils import poly_outlier

x = np.linspace(0, 100, 201)
y = 0.01 * x ** 2 + 5 * np.sin(x / 10.0)

i = np.random.randint(len(x) - 20, size=20) + 10
y[i] += np.random.normal(size=len(i), scale=20)

d = Data(np.column_stack((x, y)), column_headers=["x", "y"], setas="xy")
d.plot(fmt="b.", label="raw data")
e = d.clone
e.outlier_detection(window=5, action="delete")
e.plot(fmt="r-", label="Default Outliers removed")
h = d.clone
h.outlier_detection(window=3, action="delete", shape="hanning")
h.plot(color="orange", label="Default Outliers removed with Hanning window")
f = d.clone
f.outlier_detection(
    window=21, order=3, certainty=2, width=3, action="delete", func=poly_outlier
)
f.plot(fmt="g-", label="Poly Outliers removed")
g = d.clone
g = g.outlier_detection(
    window=21, order=3, certainty=3, width=3, action="delete", func=poly_outlier
)
g.plot(color="purple", label="Masked outliers")
g = d.clone
e.title = "Outlier detection test"
