Preprocessing

preprocessing.get_config_path() → str

Returns the full path to the config.ini file in the working directory.

preprocessing.import_data(path: str, name: str = None, sensor: str = 'dexcom', id_template: str = None, glucose: str = None, time: str = None, interval: int | None = None, max_gap: int = 45, output=<built-in function print>) → DataFrame

Returns a multi-indexed Pandas DataFrame containing all of the CSV data found at the given path. The path can point to a directory, a .zip file, or a single .csv file. The returned DataFrame holds columns for timestamps and glucose values and is indexed by patient identification.

Parameters:
  • path (str) – the path of the directory, .zip file, or .csv file to parse

  • sensor (str, optional) – the CGM device model used (either dexcom, freestyle libre pro, or freestyle libre 2 / freestyle libre 3), defaults to ‘dexcom’

  • id_template (str, optional) – regex dictating how to parse each CSV file’s name for the proper patient identification, defaults to None

  • glucose (str, optional) – the name of the column containing the glucose values in the .csv files (if different than the default for the CGM sensor being used), defaults to None

  • time (str, optional) – the name of the column containing the timestamps in the .csv files (if different than the default for the CGM sensor being used), defaults to None

  • interval (int | None, optional) – the resampling interval (in minutes) that the data should follow. If None, uses the interval from config, defaults to None

  • max_gap (int, optional) – the maximum duration (in minutes) of a gap in the data that can be interpolated, defaults to 45 (filling a longer gap would be considered extrapolation)

Returns:

A Pandas DataFrame containing the preprocessed data found at the given path. This DataFrame holds columns for timestamps, glucose values, weekday/weekend chunking, and waking/sleeping time chunking.

Return type:

pandas.DataFrame

Example:

>>> path_to_data = "datasets/patient_data.csv"
>>> df = import_data(path_to_data)
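
The id_template parameter is described as a regex applied to each CSV file's name. The following is a minimal sketch of that idea, not the library's own parsing code: the helper extract_id, the file-name layout, and the use of a named group called "id" are all assumptions made for illustration, since the source does not say how the match is consumed.

```python
import re
from pathlib import Path

# Hypothetical pattern: file names are assumed to look like "patient_017_export.csv".
# The named group "id" is an illustrative convention, not confirmed by the source.
ID_TEMPLATE = r"patient_(?P<id>\d+)"


def extract_id(csv_path: str, id_template: str = ID_TEMPLATE) -> str:
    """Return the patient ID parsed from a CSV file name, or the bare file stem if nothing matches."""
    stem = Path(csv_path).stem
    match = re.search(id_template, stem)
    return match.group("id") if match else stem


print(extract_id("datasets/patient_017_export.csv"))
```

Falling back to the file stem when the pattern does not match is a design choice of this sketch; it keeps every file identifiable even when a name deviates from the expected layout.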

preprocessing.load_config() → ConfigParser

Loads the working-directory ‘config.ini’ if present; otherwise loads the default config from the package and writes it out to the working dir. Returns a ConfigParser object.
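
The load-or-create behavior described above can be sketched with the standard-library configparser module. This is an illustration under stated assumptions rather than the package's implementation: load_config_sketch, the DEFAULTS dictionary, and the "variables"/"interval" section and key names are hypothetical stand-ins for the bundled default config.

```python
import configparser
import os
import tempfile

# Hypothetical defaults standing in for the package's bundled config;
# the section and key names are assumptions.
DEFAULTS = {"variables": {"interval": "5"}}


def load_config_sketch(directory: str = ".") -> configparser.ConfigParser:
    """Read config.ini from `directory`, creating it from DEFAULTS when it is absent."""
    path = os.path.join(directory, "config.ini")
    config = configparser.ConfigParser()
    if os.path.exists(path):
        config.read(path)
    else:
        config.read_dict(DEFAULTS)
        with open(path, "w") as handle:
            config.write(handle)
    return config


workdir = tempfile.mkdtemp()          # scratch directory so the example touches no real files
first = load_config_sketch(workdir)   # no file yet: the defaults are written out
second = load_config_sketch(workdir)  # the file now exists: it is read back
print(second["variables"]["interval"])
```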

preprocessing.preprocess_data(df: DataFrame, interval: int | None = None, max_gap: int = 45) → DataFrame

Returns a Pandas DataFrame containing the preprocessed CGM data from the given DataFrame. During preprocessing, the data is converted to the proper data types, resampled, interpolated, chunked, and indexed by identification. In addition, all ‘Low’ and ‘High’ readings are replaced and all null values at the edges are dropped.

Parameters:
  • df (pandas.DataFrame) – the Pandas DataFrame containing the CGM data to preprocess

  • interval (int | None, optional) – the resampling interval (in minutes) that the data should follow. If None, uses the interval from config, defaults to None

  • max_gap (int, optional) – the maximum duration (in minutes) of a gap in the data that should be interpolated, defaults to 45

Returns:

A Pandas DataFrame containing the preprocessed CGM data. This DataFrame is indexed by identification and holds columns for timestamps, glucose values, day chunking, and time chunking.

Return type:

pandas.DataFrame

Example:

>>> # 'df' is a Pandas DataFrame already containing your CGM data, with columns for glucose values, timestamps, and identification
>>> preprocessed_df = preprocess_data(df)
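
To make the roles of interval and max_gap concrete, here is a rough sketch of resampling followed by gap-limited linear interpolation in plain pandas. It is not the library's code: resample_and_fill is a hypothetical helper, and measuring a gap as the time between the two known readings on either side of it is an assumption.

```python
import pandas as pd


def resample_and_fill(series: pd.Series, interval: int = 5, max_gap: int = 45) -> pd.Series:
    """Resample to an `interval`-minute grid and linearly fill only gaps of at most `max_gap` minutes."""
    grid = series.resample(f"{interval}min").mean()
    is_null = grid.isna()
    # Give every run of consecutive missing (or consecutive present) samples its own label.
    run_id = is_null.astype(int).diff().ne(0).cumsum()
    # Number of missing samples in the run each row belongs to (0 for runs of present data).
    run_len = is_null.astype(int).groupby(run_id).transform("sum")
    # Assumption: k missing samples span (k + 1) * interval minutes between known readings.
    too_long = is_null & ((run_len + 1) * interval > max_gap)
    # Interpolate everything, then blank out the gaps that exceed max_gap.
    return grid.interpolate(method="linear", limit_area="inside").mask(too_long)


times = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:05",
    "2024-01-01 00:20",  # 15-minute gap before this reading
    "2024-01-01 01:20",  # 60-minute gap before this reading
])
glucose = pd.Series([100.0, 105.0, 120.0, 180.0], index=times)
result = resample_and_fill(glucose)
print(result)
```

Masking whole runs, rather than passing limit= to interpolate, avoids partially filling the start of a long gap: a gap is either filled completely or left empty.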

preprocessing.save_config(config: ConfigParser)

Helper to save the current state of the config back to ‘config.ini’ in the working directory.
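
A typical round trip changes a value on the ConfigParser object and then writes it back. The sketch below shows that pattern with the standard library only; save_config_sketch and the "variables"/"interval" names are hypothetical, and the real helper takes no directory argument.

```python
import configparser
import os
import tempfile


def save_config_sketch(config: configparser.ConfigParser, directory: str = ".") -> str:
    """Write `config` to config.ini inside `directory` and return the file path."""
    path = os.path.join(directory, "config.ini")
    with open(path, "w") as handle:
        config.write(handle)
    return path


config = configparser.ConfigParser()
config.read_dict({"variables": {"interval": "5"}})  # assumed section and key names
config["variables"]["interval"] = "15"              # change a setting in memory

workdir = tempfile.mkdtemp()                        # scratch directory for the example
saved_path = save_config_sketch(config, workdir)

reloaded = configparser.ConfigParser()
reloaded.read(saved_path)
print(reloaded["variables"]["interval"])
```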

preprocessing.segment_data(path: str, df: DataFrame) → DataFrame

Splits patients’ data into multiple segments based on a given .csv file containing IDs and datetimes.

Parameters:
  • path (str) – path of the .csv file containing identifications and timestamps indicating where to split the given DataFrame

  • df (pandas.DataFrame) – the DataFrame to split based on the given .csv file

Returns:

A Pandas DataFrame with the data split accordingly.

Return type:

pandas.DataFrame
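
One way to picture the splitting step is shown below. This is a sketch under several assumptions, not the library's method: segment_sketch is a hypothetical helper, the split points are built in memory instead of being read from a .csv file, both frames are assumed to carry "id" and "time" columns, and the result is assumed to mark segments with an integer "segment" column.

```python
import pandas as pd


def segment_sketch(df: pd.DataFrame, splits: pd.DataFrame) -> pd.DataFrame:
    """Label each row with a per-patient segment number; a new segment starts at every split time."""
    out = df.copy()
    out["segment"] = 0
    for patient, times in splits.groupby("id")["time"]:
        rows = out["id"] == patient
        cuts = times.sort_values().to_numpy()
        # searchsorted(side="right") counts how many split times are <= each timestamp.
        out.loc[rows, "segment"] = cuts.searchsorted(out.loc[rows, "time"].to_numpy(), side="right")
    return out


data = pd.DataFrame({
    "id": ["A", "A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 10:00", "2024-01-01 11:00",
        "2024-01-01 08:00", "2024-01-01 09:00",
    ]),
    "glucose": [100, 110, 120, 130, 95, 105],
})
split_points = pd.DataFrame({"id": ["A"], "time": pd.to_datetime(["2024-01-01 09:30"])})

segmented = segment_sketch(data, split_points)
print(segmented)
```

Patients with no entry in the split table keep a single segment numbered 0.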