Preprocessing
- preprocessing.get_config_path() → str
Returns the full path to the config.ini file in the working directory.
- preprocessing.import_data(path: str, name: str = None, sensor: str = 'dexcom', id_template: str = None, glucose: str = None, time: str = None, interval: int | None = None, max_gap: int = 45, output=<built-in function print>) DataFrame
Returns a MultiIndexed Pandas DataFrame containing all of the CSV data found at the given path. The path can lead to a directory, a .zip file, or a .csv file. The returned DataFrame holds columns for timestamps and glucose values and is indexed by patient identification.
- Parameters:
path (str) – the path of the directory/zip/csv to be parsed through
sensor (str, optional) – the CGM device model used (either dexcom, freestyle libre pro, or freestyle libre 2 / freestyle libre 3), defaults to ‘dexcom’
id_template (str, optional) – regex dictating how to parse each CSV file’s name for the proper patient identification, defaults to None
glucose (str, optional) – the name of the column containing the glucose values in the .csv files (if different than the default for the CGM sensor being used), defaults to None
time (str, optional) – the name of the column containing the timestamps in the .csv files (if different than the default for the CGM sensor being used), defaults to None
interval (int | None, optional) – the resampling interval (in minutes) that the data should follow. If None, uses the interval from config, defaults to None
max_gap (int, optional) – the maximum duration (in minutes) of a gap in the data that should be interpolated, defaults to 45 (filling a longer gap would be considered extrapolation)
- Returns:
A Pandas DataFrame containing the preprocessed data found at the given path. This DataFrame holds columns for timestamps, glucose values, weekday/weekend chunking, and waking/sleeping time chunking.
- Return type:
pandas.DataFrame
- Example:
>>> path_to_data = "datasets/patient_data.csv"
>>> df = import_data(path_to_data)
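The id_template parameter is described above only as a regex applied to each CSV file's name. The following is a minimal sketch of that idea using the standard library alone. The helper `extract_id`, the example pattern, and the file name are all hypothetical, and the library's actual matching rules (for instance, how capture groups are treated or what happens when nothing matches) may differ.

```python
import re
from pathlib import Path


def extract_id(file_path: str, id_template: str) -> str:
    """Hypothetical helper: derive a patient ID from a CSV file name."""
    # File name without its directory or extension
    stem = Path(file_path).stem
    match = re.search(id_template, stem)
    if match is None:
        # Assumption: fall back to the whole stem when the pattern does not match
        return stem
    # Prefer the first capture group when the pattern defines one
    return match.group(1) if match.groups() else match.group(0)


# Hypothetical template: the digits that follow "patient_"
template = r"patient_(\d+)"
print(extract_id("datasets/patient_042_2023.csv", template))
```

Testing a template against a few file names this way before passing it to import_data can catch a pattern that matches nothing or captures too much.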
- preprocessing.load_config() → ConfigParser
Loads ‘config.ini’ from the working directory if it is present; otherwise loads the default config from the package and writes it out to the working directory. Returns a ConfigParser object.
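The load-or-create behaviour described for load_config can be illustrated with Python's standard configparser module. This is only a sketch of the pattern, not the library's code: the function name `load_or_create_config`, the `DEFAULT_CONFIG` dictionary, and the section and key names are hypothetical stand-ins for whatever the package actually ships.

```python
import configparser
import os
import tempfile

# Hypothetical defaults standing in for the package's bundled config
DEFAULT_CONFIG = {"variables": {"interval": "5"}}


def load_or_create_config(directory: str) -> configparser.ConfigParser:
    """Read config.ini from `directory`, creating it from defaults if absent."""
    path = os.path.join(directory, "config.ini")
    config = configparser.ConfigParser()
    if os.path.exists(path):
        # A local file exists, so it takes precedence
        config.read(path)
    else:
        # No local file: start from the defaults and write them out
        config.read_dict(DEFAULT_CONFIG)
        with open(path, "w") as f:
            config.write(f)
    return config


# Usage in a throwaway directory so nothing is written to the real working dir
with tempfile.TemporaryDirectory() as tmp:
    cfg = load_or_create_config(tmp)
    print(cfg["variables"]["interval"])
```

Writing the defaults to disk on first use gives the user a file to edit, and later calls then pick up those edits.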
- preprocessing.preprocess_data(df: DataFrame, interval: int | None = None, max_gap: int = 45) → DataFrame
Returns a Pandas DataFrame containing the preprocessed CGM data within the given DataFrame. During preprocessing, the data is converted into the proper data types, resampled, interpolated, chunked, and indexed by identification. In addition, all ‘Low’ and ‘High’ readings are replaced and all null values at the edges of the data are dropped.
- Parameters:
df (pandas.DataFrame) – the Pandas DataFrame containing the CGM data to preprocess
interval (int | None, optional) – the resampling interval (in minutes) that the data should follow. If None, uses the interval from config, defaults to None
max_gap (int, optional) – the maximum duration (in minutes) of a gap in the data that should be interpolated, defaults to 45
- Returns:
A Pandas DataFrame containing the preprocessed CGM data. This DataFrame is indexed by identification and holds columns for timestamps, glucose values, day chunking, and time chunking.
- Return type:
pandas.DataFrame
- Example:
>>> # 'df' is a Pandas DataFrame already containing your CGM data, with columns for glucose values, timestamps, and identification
>>> preprocessed_df = preprocess_data(df)
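To show how the interval and max_gap parameters might interact, here is a rough pandas sketch of resampling followed by gap-limited interpolation. It is not the library's implementation. The function name `resample_and_fill` is hypothetical. It also assumes that a gap is measured as the time between the two real readings on either side of it, and that a gap is either filled completely or left empty.

```python
import pandas as pd


def resample_and_fill(series: pd.Series, interval: int = 5, max_gap: int = 45) -> pd.Series:
    """Hypothetical sketch: regular grid plus interpolation of short gaps only."""
    # Put readings on a regular grid; bins without readings become NaN
    regular = series.resample(f"{interval}min").mean()
    missing = regular.isna()
    # Give each run of consecutive missing (or present) points its own label
    run_id = (missing != missing.shift()).cumsum()
    # Length of each run, repeated for every point in that run
    run_len = missing.groupby(run_id).transform("size")
    # Assumption: n missing points span (n + 1) * interval minutes between readings
    fillable = missing & ((run_len + 1) * interval <= max_gap)
    # Interpolate only between existing readings, never past the edges
    interpolated = regular.interpolate(method="time", limit_area="inside")
    # Keep original values, taking interpolated ones only inside short gaps
    return regular.where(~fillable, interpolated)


# Example readings with one 15-minute gap and one 70-minute gap
index = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:20",
    "2024-01-01 01:30", "2024-01-01 01:35",
])
glucose = pd.Series([100.0, 110.0, 140.0, 200.0, 210.0], index=index)
print(resample_and_fill(glucose))
```

The run lengths are computed explicitly because `Series.interpolate(limit=...)` on its own would fill the first few points of a long gap rather than skipping the gap entirely.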
- preprocessing.save_config(config: ConfigParser)
Helper to save the current state of the config back to ‘config.ini’ in the working directory.
- preprocessing.segment_data(path: str, df: DataFrame) → DataFrame
Splits patients’ data into multiple segments based on a given .csv file containing IDs and datetimes.
- Parameters:
path (str) – path of the .csv file containing identifications and timestamps indicating where to split the given DataFrame
df (pandas.DataFrame) – the DataFrame to split based on the given .csv file
- Returns:
A Pandas DataFrame with the data split accordingly.
- Return type:
pandas.DataFrame
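The general idea of splitting each patient's data at given timestamps can be sketched in pandas as follows. The function `segment_by_timestamps`, the column names (`id`, `timestamp`, `glucose`, `segment`), and the flat, non-indexed layout are assumptions for illustration. The format segment_data actually expects for the .csv file and the shape of its output are not specified above and may differ.

```python
import pandas as pd


def segment_by_timestamps(df: pd.DataFrame, splits: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: label each row with a per-patient segment number."""
    out = df.copy()
    out["segment"] = 0  # patients with no split points stay in segment 0
    for patient_id, group in splits.groupby("id"):
        cut_points = group["timestamp"].sort_values().to_numpy()
        rows = out["id"] == patient_id
        # Count how many cut points fall at or before each reading
        out.loc[rows, "segment"] = cut_points.searchsorted(
            out.loc[rows, "timestamp"].to_numpy(), side="right"
        )
    return out


# Toy data; in practice `splits` might come from pd.read_csv(path, parse_dates=[...])
data = pd.DataFrame({
    "id": ["A", "A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00",
        "2024-01-01 03:00", "2024-01-01 00:00", "2024-01-01 01:00",
    ]),
    "glucose": [100, 105, 110, 115, 90, 95],
})
splits = pd.DataFrame({
    "id": ["A"],
    "timestamp": pd.to_datetime(["2024-01-01 01:30"]),
})
print(segment_by_timestamps(data, splits))
```

With `side="right"`, a reading taken exactly at a split time falls into the later segment. That is a choice made for this sketch, not documented behaviour.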