mcetl.functions

Contains the classes for Functions objects.

There are three main types of Functions:
  1. PreprocessFunction: preprocesses the imported data entry; for example, can
     separate into multiple data entries or remove data columns.

  2. CalculationFunction: performs a calculation on each of the entries within
     each sample within each dataset.

  3. SummaryFunction: performs a calculation once per sample or once per
     dataset.

@author: Donald Erb
Created on Jul 31, 2020

Module Contents

Classes

CalculationFunction

Function that performs a calculation for every entry in each sample.

PreprocessFunction

Function for processing data before performing any calculations.

SummaryFunction

Calculation that is only performed once per sample or once per dataset.

class mcetl.functions.CalculationFunction(name, target_columns, functions, added_columns, function_kwargs=None)

Bases: mcetl.functions._FunctionBase

Function that performs a calculation for every entry in each sample.

Parameters
  • name (str) -- The string representation for this object.

  • target_columns (str or list(str) or tuple(str)) -- A string or list/tuple of strings designating the target columns for this object.

  • functions (Callable or list(Callable, Callable) or tuple(Callable, Callable)) -- The functions that this object uses to process data. If only one function is given, it is used both for the data to be written to Excel and for the data to be used in Python. If a list/tuple of two functions is given, the first function processes the data to write to Excel, and the second processes the data to be used in Python. Each function should take the following arguments: a list(list(pd.DataFrame)); the target indices (a list of lists of lists of numbers, corresponding to the column index in each dataframe for each of the target columns); the added columns (a list of lists of numbers, corresponding to the column index in the dataframe for each added column); the Excel columns (a list of column names corresponding to the column in Excel for each added column, e.g. 'A', 'B', if doing the Excel calculations, otherwise None); and the first row (the row number of the first row of data in Excel, e.g. 3). Each function should return a list of lists of pd.DataFrames. The Excel columns and first row values are meant to ease the writing of formulas for Excel.

  • added_columns (int or str or list(str) or tuple(str)) -- The columns that will be acted upon by this object's functions. If the input is an integer, the functions act on columns that need to be added, and the integer is the number of columns to add. If the input is a string or list/tuple of strings, the functions change the contents of one or more existing columns, whose column names are the inputs.

  • function_kwargs (dict or list(dict, dict), optional) -- A dictionary or a list of two dictionaries containing keyword arguments to be passed to the functions. If a list of two dictionaries is given, the first dictionary holds the keyword arguments for the function that processes the data to write to Excel, and the second holds those for the function that processes the data to be used in Python. If a single dictionary is given, it is used for both functions. The default is None, which passes an empty dictionary to both functions.

Raises

ValueError -- Raised if there is an issue with added_columns or target_columns, or if any key in the input function_kwargs is within self._forbidden_keys.
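
As an illustration of the callable signature described under functions, the following is a minimal sketch of a Python-side calculation function. It is not taken from mcetl itself. The name double_target is hypothetical, and the nesting order is an assumption that the description above does not spell out: dataframes[sample][entry], target_indices[target][sample][entry], and added_columns[sample][entry]. It also assumes that the added column already exists as a placeholder in each DataFrame. The Excel arguments are accepted but unused, since the sketch only covers the data used in Python.

```python
import pandas as pd


def double_target(dataframes, target_indices, added_columns,
                  excel_columns, first_row, **kwargs):
    """Hypothetical calculation: added column = 2 * first target column."""
    for i, sample in enumerate(dataframes):
        for j, entry in enumerate(sample):
            target = target_indices[0][i][j]  # assumed [target][sample][entry]
            added = added_columns[i][j]       # assumed [sample][entry]
            # to_numpy() avoids index alignment during positional assignment
            entry.iloc[:, added] = (entry.iloc[:, target] * 2).to_numpy()
    return dataframes


# One sample holding two entries; column 2 is a placeholder for the result.
entries = [
    pd.DataFrame({'x': [0.0, 1.0], 'y': [1.0, 2.0], 'calc': [float('nan')] * 2}),
    pd.DataFrame({'x': [0.0, 1.0], 'y': [3.0, 4.0], 'calc': [float('nan')] * 2}),
]
# excel_columns is None because no Excel formulas are being written here.
result = double_target([entries], [[[1, 1]]], [[2, 2]], None, 3)
```

A function that also writes Excel formulas would presumably use excel_columns and first_row to build cell references; that part is left out here because the exact formula conventions are not given in this section.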

class mcetl.functions.PreprocessFunction(name, target_columns, function, function_kwargs=None, deleted_columns=None)

Bases: mcetl.functions._FunctionBase

Function for processing data before performing any calculations.

For example, it can separate a single data entry into multiple entries depending on a criterion, or delete unneeded columns.

Parameters
  • name (str) -- The string representation for this object.

  • target_columns (str or list(str) or tuple(str)) -- A string or list/tuple of strings designating the target columns for this object.

  • function (Callable) -- The function that this object uses to process data. The function should take args of dataframe and target indices (a list of numbers, corresponding to the column index in the dataframe for each of the target columns), and should return a list of dataframes.

  • function_kwargs (dict, optional) -- A dictionary of keywords and values to be passed to the function. The default is None.

  • deleted_columns (str or list(str) or tuple(str), optional) -- The names of columns that will be deleted by this object's function.

Raises

ValueError -- Raised if there is an issue with the input name or target_columns.
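
The function argument described above takes a dataframe and a list of target column indices, and returns a list of dataframes. A minimal sketch of such a function follows; the name split_on_reset, the splitting rule, and the sample data are made up for illustration and are not part of mcetl.

```python
import pandas as pd


def split_on_reset(df, target_indices, **kwargs):
    """Hypothetical preprocessor: start a new entry when the target column drops."""
    target = df.iloc[:, target_indices[0]]
    # diff() < 0 marks rows where the value fell; cumsum() numbers the segments
    segment_ids = (target.diff() < 0).cumsum().to_numpy()
    return [
        segment.reset_index(drop=True)
        for _, segment in df.groupby(segment_ids)
    ]


# The 'time' column restarts at row 3, so two entries are expected.
data = pd.DataFrame({'time': [0, 1, 2, 0, 1], 'signal': [5, 6, 7, 8, 9]})
pieces = split_on_reset(data, [0])
```

Following the class signature documented above, such a function would be supplied as the function argument, for example PreprocessFunction('split_on_reset', 'time', split_on_reset), where the target column name 'time' is an assumption made for this example.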

class mcetl.functions.SummaryFunction(name, target_columns, functions, added_columns, function_kwargs=None, sample_summary=True)

Bases: mcetl.functions.CalculationFunction

Calculation that is only performed once per sample or once per dataset.

Parameters
  • name (str) -- The string representation for this object.

  • target_columns (str or list(str) or tuple(str)) -- A string or list/tuple of strings designating the target columns for this object.

  • functions (Callable or list(Callable, Callable) or tuple(Callable, Callable)) -- The functions that this object uses to process data. If only one function is given, it is used both for the data to be written to Excel and for the data to be used in Python. If a list/tuple of two functions is given, the first function processes the data to write to Excel, and the second processes the data to be used in Python. Each function should take the following arguments: a list(list(pd.DataFrame)); the target indices (a list of lists of lists of numbers, corresponding to the column index in each dataframe for each of the target columns); the added columns (a list of lists of numbers, corresponding to the column index in the dataframe for each added column); the Excel columns (a list of column names corresponding to the column in Excel for each added column, e.g. 'A', 'B', if doing the Excel calculations, otherwise None); and the first row (the row number of the first row of data in Excel, e.g. 3). Each function should return a list of lists of pd.DataFrames. The Excel columns and first row values are meant to ease the writing of formulas for Excel.

  • added_columns (int or str or list(str) or tuple(str)) -- The columns that will be acted upon by this object's functions. If the input is an integer, the functions act on columns that need to be added, and the integer is the number of columns to add. If the input is a string or list/tuple of strings, the functions change the contents of one or more existing columns, whose column names are the inputs. Further, SummaryFunctions can only modify other SummaryFunction columns with matching sample_summary attributes.

  • function_kwargs (dict or list(dict, dict), optional) -- A dictionary or a list of two dictionaries containing keyword arguments to be passed to the functions. If a list of two dictionaries is given, the first dictionary holds the keyword arguments for the function that processes the data to write to Excel, and the second holds those for the function that processes the data to be used in Python. If a single dictionary is given, it is used for both functions. The default is None, which passes an empty dictionary to both functions.

  • sample_summary (bool, optional) -- If True (default), denotes that the SummaryFunction summarizes a sample; if False, denotes that the SummaryFunction summarizes a dataset.

Raises

ValueError -- Raised if there is an issue with added_columns or target_columns, or if any key in the input function_kwargs is within self._forbidden_keys.
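
The argument conventions shared by CalculationFunction and SummaryFunction (one item or a pair for functions and function_kwargs, an integer or names for added_columns) can be summarized in a short sketch. The helpers below are hypothetical and written only to restate the documented rules; they do not reproduce mcetl's internal code or its validation, including the forbidden-key check.

```python
def pair_functions(functions):
    """Return (excel_function, python_function)."""
    if callable(functions):
        return functions, functions  # one function serves both roles
    excel_function, python_function = functions  # list/tuple of two
    return excel_function, python_function


def pair_kwargs(function_kwargs=None):
    """Return (excel_kwargs, python_kwargs) as two separate dictionaries."""
    if function_kwargs is None:
        return {}, {}  # default: empty dictionary for both functions
    if isinstance(function_kwargs, dict):
        return dict(function_kwargs), dict(function_kwargs)
    excel_kwargs, python_kwargs = function_kwargs  # list of two dictionaries
    return dict(excel_kwargs), dict(python_kwargs)


def describe_added_columns(added_columns):
    """Separate 'add N new columns' from 'modify these existing columns'."""
    if isinstance(added_columns, int):
        return {'new_columns': added_columns, 'modified': []}
    if isinstance(added_columns, str):
        added_columns = [added_columns]
    return {'new_columns': 0, 'modified': list(added_columns)}
```

For instance, pair_kwargs({'offset': 1}) yields two equal but independent dictionaries, and describe_added_columns('y') reports that the existing column 'y' is modified rather than a new column being added.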