cleanup_data

agabpylib.stats.robustrollingstats.cleanup_data(dframe, colname, window=50)

Remove outliers from the data frame by applying a filter based on the rolling median and the rolling robust scatter estimate (RSE). Any point in the series that lies further than 3*RSE from the local median is removed.

Parameters:
  • dframe (Pandas DataFrame) – Pandas data frame with the data. ASSUMED TO HAVE BEEN SORTED IN THE PROPER ORDER.

  • colname (str) – Name of column for which the data is to be cleaned.

  • window (int) – Number of data points from which to calculate the local median. Defaults to 50.

Returns:

  • clean_dframe (Pandas DataFrame) – The cleaned-up data frame.

  • rmedian (Pandas Series) – The rolling median series.

  • rolling_rse (Pandas Series) – The rolling RSE series.

  • cleanset (Pandas Series) – Boolean series indicating which points were selected for the clean data set.
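
The filtering rule described above can be illustrated with a minimal pandas sketch. This is not the library's implementation: the function name cleanup_sketch, the nsigma argument, the centred window, and min_periods=1 are all assumptions made for illustration, and the RSE is assumed to be 0.390152 times the difference between the 90th and 10th percentiles (which equals the standard deviation for Gaussian data).

```python
import numpy as np
import pandas as pd

# Assumed RSE scaling: 0.390152 * (P90 - P10) equals one standard
# deviation for normally distributed data.
RSE_FACTOR = 0.390152


def cleanup_sketch(dframe, colname, window=50, nsigma=3.0):
    """Hypothetical stand-in for the rolling median / RSE filter."""
    series = dframe[colname]
    # Assumption: centred windows, truncated at the edges of the series.
    roll = series.rolling(window, center=True, min_periods=1)
    rmedian = roll.median()
    rolling_rse = RSE_FACTOR * (roll.quantile(0.9) - roll.quantile(0.1))
    # Keep only points within nsigma * RSE of the local median.
    cleanset = (series - rmedian).abs() <= nsigma * rolling_rse
    return dframe[cleanset], rmedian, rolling_rse, cleanset


# Synthetic, already-sorted data: unit Gaussian noise plus one gross outlier.
rng = np.random.default_rng(0)
df = pd.DataFrame({"flux": rng.normal(0.0, 1.0, 200)})
df.loc[100, "flux"] = 50.0

clean, rmedian, rolling_rse, cleanset = cleanup_sketch(df, "flux", window=50)
print(len(df), len(clean), bool(cleanset.iloc[100]))
```

The four return values mirror the order listed under Returns, so the sketch can be swapped for the real function when experimenting. Because the window is evaluated by row position, the input must already be sorted, as the dframe parameter description requires.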