mcetl.utils

Provides utility functions, classes, and constants.

Useful functions are put here in order to prevent circular importing within the other files.

The functions contained within this module ease the use of user interfaces, selecting options for opening files, and working with Excel.

@author: Donald Erb

Created on Jul 15, 2020

mcetl.utils.PROCEED_COLOR

The button color for all buttons that proceed to the next window. The default is ('white', '#00A949'), where '#00A949' is a bright green.

Type

tuple(str, str)

Module Contents

Functions

check_availability

Checks whether an optional dependency is available to import.

doc_lru_cache

Decorator that allows keeping a function's docstring when using functools.lru_cache.

excel_column_name

Converts 1-based index to the Excel column name.

get_min_size

Returns the minimum size for a GUI element to match the screen size.

open_multiple_files

Creates a prompt to open multiple files and add their contents to a dataframe.

optimize_memory

Optimizes dataframe memory usage by converting data types.

raw_data_import

Used to import data from the specified file into pandas DataFrames.

safely_close_window

Closes a PySimpleGUI window and removes the window and its layout.

select_file_gui

GUI to select a file and input the necessary options to import its data.

series_to_numpy

Tries to convert a pandas Series to a numpy array with the desired dtype.

set_dpi_awareness

Sets DPI awareness for Windows operating system so that GUIs are not blurry.

show_dataframes

Used to show data to help select the right columns or datasets from the data.

string_to_unicode

Converts strings to unicode by replacing '\\' with '\'.

stringify_backslash

Fixes strings containing backslash, such as '\n', so that they display properly in GUIs.

validate_inputs

Validates entries from a PySimpleGUI window and converts to the desired type.

validate_sheet_name

Ensures that the desired Excel sheet name is valid.

exception mcetl.utils.WindowCloseError

Bases: Exception

Custom exception to allow exiting a GUI window to stop the program.

Initialize self. See help(type(self)) for accurate signature.

with_traceback()

Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.

mcetl.utils.check_availability(module)

Checks whether an optional dependency is available to import.

Does not check the module version since it is assumed that the parent module will do a version check if the module is actually usable.

Parameters

module (str) -- The name of the module.

Returns

True if the module can be imported, False if it cannot.

Return type

bool

Notes

It is faster to use importlib to check the availability of the module rather than doing a try-except block to try and import the module, since importlib does not actually import the module.
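
The note above can be illustrated with a minimal sketch. The helper name is_importable is hypothetical and is not part of mcetl; it only assumes that the check is done with importlib.util.find_spec, which locates a top-level module without running its code.

```python
import importlib.util


def is_importable(module):
    """Return True if a top-level module can be located, without importing it.

    Hypothetical helper for illustration; not the mcetl implementation.
    """
    # find_spec searches for the module but does not execute its code,
    # which is why it is cheaper than a try/except around an import.
    return importlib.util.find_spec(module) is not None


print(is_importable('json'))
print(is_importable('a_module_that_should_not_exist_xyz'))
```

A dotted name such as 'package.submodule' would cause the parent package to be imported, so this sketch is best suited to top-level module names.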

mcetl.utils.doc_lru_cache(function=None, **lru_cache_kwargs)

Decorator that allows keeping a function's docstring when using functools.lru_cache.

Parameters
  • function (Callable) -- The function to use. If used as a decorator and lru_cache_kwargs are specified, then function will be None.

  • **lru_cache_kwargs -- Any keyword arguments to pass to functools.lru_cache (maxsize and/or typed, as of Python 3.9).

Examples

A basic usage of this decorator would look like:

>>> @doc_lru_cache(maxsize=200)
... def function(arg, kwarg=1):
...     return arg + kwarg

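
One possible way to build such a decorator is sketched below. The name doc_preserving_lru_cache is hypothetical, and the approach (wrapping the cached function with functools.wraps) is an assumption rather than the confirmed mcetl implementation.

```python
import functools


def doc_preserving_lru_cache(function=None, **lru_cache_kwargs):
    """Cache a function with functools.lru_cache while keeping its docstring.

    Hypothetical sketch; not the mcetl implementation.
    """
    if function is None:
        # Called as @doc_preserving_lru_cache(maxsize=...): return a decorator
        # that remembers the keyword arguments and waits for the function.
        return functools.partial(doc_preserving_lru_cache, **lru_cache_kwargs)

    cached_function = functools.lru_cache(**lru_cache_kwargs)(function)

    # The plain wrapper copies the name and docstring of the original
    # function and forwards every call to the cached version.
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        return cached_function(*args, **kwargs)

    return wrapper


@doc_preserving_lru_cache(maxsize=200)
def add_numbers(arg, kwarg=1):
    """Adds arg and kwarg."""
    return arg + kwarg


print(add_numbers.__doc__)
```

The sketch accepts both the bare form (@doc_preserving_lru_cache) and the form with keyword arguments, mirroring the function=None signature documented above.
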
mcetl.utils.excel_column_name(index)

Converts 1-based index to the Excel column name.

Parameters

index (int) -- The column number. Must be 1-based, i.e., the first column number is 1 rather than 0.

Returns

col_name -- The column name for the input index, e.g., an index of 1 returns 'A'.

Return type

str

Raises

ValueError -- Raised if the input index is not in the range 1 <= index <= 18278, meaning the column name is not within 'A'...'ZZZ'.

Notes

Caches the result so that any repeated index lookups are faster, and uses recursion to make better usage of the cache.

chr(64 + remainder) converts the remainder to a character, where 64 denotes ord('A') - 1, so if remainder = 1, chr(65) = 'A'.
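
The cached, recursive conversion described in these notes could look roughly like the following. The function name column_name is hypothetical, and the exact handling of the remainder is an assumption; only the range check, the caching, and the chr(64 + remainder) step come from the text above.

```python
import functools


@functools.lru_cache(maxsize=None)
def column_name(index):
    """Convert a 1-based column index to an Excel column name.

    Hypothetical sketch; not the mcetl implementation.
    """
    if not 1 <= index <= 18278:
        raise ValueError('index must satisfy 1 <= index <= 18278.')

    quotient, remainder = divmod(index, 26)
    if remainder == 0:
        # A multiple of 26 ends in 'Z', so borrow one from the quotient.
        quotient -= 1
        remainder = 26

    # The leading letters are themselves a column name, so recursing on the
    # quotient lets repeated prefixes be served from the cache.
    prefix = column_name(quotient) if quotient > 0 else ''
    # 64 is ord('A') - 1, so a remainder of 1 gives 'A' and 26 gives 'Z'.
    return prefix + chr(64 + remainder)


for number in (1, 26, 27, 702, 703, 18278):
    print(number, column_name(number))
```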

mcetl.utils.get_min_size(default_size, scale, dimension='both')

Returns the minimum size for a GUI element to match the screen size.

Parameters
  • default_size (int) -- The default number of pixels to use. Needed because sg.Window.get_screen_size() can return the total screen size when using multiple screens on some Linux systems.

  • scale (float) -- The scale factor to apply to the screen size as reported by sg.Window.get_screen_size. For example, if the element size was desired to be at most 50% of the minimum screen dimension, then the scale factor is 0.5.

  • dimension (str) -- The screen dimension to compare. Can be either 'width', 'height', or 'both'.

Returns

The minimum pixel count among scale * screen height, scale * screen width, and default_size.

Return type

int
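
The size calculation can be sketched without a GUI by passing the screen size in explicitly. The function min_size_for_screen and its screen_size parameter are hypothetical; the real function is described as reading the screen size from sg.Window.get_screen_size.

```python
def min_size_for_screen(default_size, scale, screen_size, dimension='both'):
    """Return the smallest of default_size and the scaled screen dimensions.

    Hypothetical sketch; screen_size is a (width, height) tuple in pixels
    that stands in for the value reported by the GUI toolkit.
    """
    width, height = screen_size
    dimensions = {
        'width': [width],
        'height': [height],
        'both': [width, height],
    }
    if dimension not in dimensions:
        raise ValueError("dimension must be 'width', 'height', or 'both'.")

    scaled_sizes = [scale * value for value in dimensions[dimension]]
    # default_size caps the result in case the reported screen size is too large.
    return int(min([default_size] + scaled_sizes))


print(min_size_for_screen(1000, 0.5, (1920, 1080)))
```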

mcetl.utils.open_multiple_files()

Creates a prompt to open multiple files and add their contents to a dataframe.

Returns

dataframes -- A list of dataframes containing the imported data from the selected files.

Return type

list

mcetl.utils.optimize_memory(dataframe, convert_objects=False)

Optimizes dataframe memory usage by converting data types.

Optimizes object dtypes by trying to convert to other dtypes, if convert_objects is True and the pandas version is >= 1.0.0. Optimizes numerical dtypes by downcasting to the most appropriate dtype.

Parameters
  • dataframe (pd.DataFrame) -- The dataframe to optimize.

  • convert_objects (bool, optional) -- If True, will attempt to convert columns with object dtype if the pandas version is >= 1.0.0.

Returns

dataframe -- The memory-optimized dataframe.

Return type

pd.DataFrame

Notes

convert_objects is needed because, when object columns are converted to a dtype of string, the column's values become a StringArray object, which does not implement the tolist() method (as of pandas version 1.0.5). openpyxl's dataframe_to_rows method uses each series's series.values.tolist() method to convert the dataframe into a generator of rows, so a StringArray without a tolist method causes an exception when using openpyxl's dataframe_to_rows.

Do not convert object dtypes to pandas's Int and Float dtypes since they do not mesh well with other modules.

Iterates through columns one at a time rather than using dataframe.select_dtypes so that each column is overwritten immediately, rather than making a copy of all the selected columns, which reduces memory usage.

mcetl.utils.raw_data_import(window_values, file, show_popup=True)

Used to import data from the specified file into pandas DataFrames.

Also used to show how data will look after using certain import values.

Parameters
  • window_values (dict) -- A dictionary with keys 'row_start', 'row_end', 'columns', 'separator', and optionally 'sheet'.

  • file (str or pathlib.Path or pd.ExcelFile) -- A string or Path for the file to be imported, or a pandas ExcelFile, to use for reading spreadsheet data.

  • show_popup (bool) -- If True, will display a popup window showing a table of the data.

Returns

dataframes -- A list of dataframes containing the data after importing if show_popup is False, otherwise returns None.

Return type

list(pd.DataFrame) or None

Notes

If using a spreadsheet format ('xls', 'xlsx', 'odf', etc.), allows using any of the available engines for pandas.read_excel, and will just let pandas notify the user if the proper engine is not installed.

Optimizes the memory usage of the imported data before returning.

mcetl.utils.safely_close_window(window)

Closes a PySimpleGUI window and removes the window and its layout.

Used when exiting a window early by manually closing the window. Ensures that the window is properly closed and then raises a WindowCloseError exception, which can be used to determine that the window was manually closed.

Parameters

window (sg.Window) -- The window that will be closed.

Raises

WindowCloseError -- Custom exception to notify that the window has been closed earlier than expected.
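
The close-then-raise pattern can be sketched with a stand-in window object so that no GUI is needed. FakeWindow, close_and_signal, and the local WindowCloseError class are all hypothetical stand-ins for sg.Window, safely_close_window, and mcetl.utils.WindowCloseError.

```python
class WindowCloseError(Exception):
    """Stand-in for mcetl.utils.WindowCloseError (illustration only)."""


class FakeWindow:
    """Minimal stand-in for a GUI window that only tracks whether it is closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def close_and_signal(window):
    """Close the window, then raise so callers know it was closed early."""
    window.close()
    raise WindowCloseError('The window was closed earlier than expected.')


window = FakeWindow()
try:
    close_and_signal(window)
except WindowCloseError:
    # The caller can treat this as "the user closed the window" and stop cleanly.
    handled = True

print(window.closed, handled)
```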

mcetl.utils.select_file_gui(data_source=None, file=None, previous_inputs=None, assign_columns=False)

GUI to select a file and input the necessary options to import its data.

Parameters
  • data_source (DataSource, optional) -- The DataSource object used for opening the file.

  • file (str, optional) -- A string containing the path to the file to be imported.

  • previous_inputs (dict, optional) -- A dictionary containing the values from a previous usage of this function, that will be used to overwrite the defaults. Note, if opening Excel files, the previous_inputs will have no effect.

  • assign_columns (bool, optional) -- If True, designates that the columns for each unique variable in the data source need to be identified. If False (or if data_source is None), then will not prompt user to select columns for variables.

Returns

values -- A dictionary containing the items necessary for importing data from the selected file.

Return type

dict

Notes

If using a spreadsheet format ('xls', 'xlsx', 'odf', etc.), allows using any of the available engines for pandas.read_excel, and will just let pandas notify the user if the proper engine is not installed. The file selection window, however, will only show 'xlsx', 'xlsm', 'csv', 'txt', and potentially 'xls', so that users are not steered towards selecting a format that does not work with the default mcetl libraries.

mcetl.utils.series_to_numpy(series, dtype=float)

Tries to convert a pandas Series to a numpy array with the desired dtype.

If initial conversion does not work, tries to convert series to object first. If that is not successful and if the first item is a string, assumes the first item is a header, converts it to None, and tries the conversion. If that is still unsuccessful, then an array of the series is returned without changing the dtype.

Parameters
  • series (pd.Series) -- The series to convert to numpy with the desired dtype.

  • dtype (type, optional) -- The dtype to use in the numpy array of the series. Default is float.

Returns

output -- The input series with the specified dtype if conversion was successful. Otherwise, the output is an ndarray of the input series without dtype conversion.

Return type

np.ndarray

Notes

This function is needed because pandas's pd.NA and extension arrays do not work well with other modules and can be difficult to convert.

mcetl.utils.set_dpi_awareness(awareness_level=1)

Sets DPI awareness for Windows operating system so that GUIs are not blurry.

Fixes blurry tkinter GUIs caused by DPI scaling on Windows. Other operating systems are ignored.

Parameters

awareness_level ({1, 0, 2}) -- The dpi awareness level to set. 0 turns off dpi awareness, 1 sets dpi awareness to scale with the system dpi and automatically changes when the system dpi changes, and 2 sets dpi awareness per monitor and does not change when system dpi changes. Default is 1.

Raises

ValueError -- Raised if awareness_level is not 0, 1, or 2.

Notes

Will only work on Windows 8.1 or Windows 10. Not sure if earlier versions of Windows have this issue anyway.
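
A minimal sketch of how DPI awareness might be set is given below. It assumes the Windows SetProcessDpiAwareness function from shcore.dll is called through ctypes; the function name set_awareness and its boolean return value are hypothetical and not taken from mcetl.

```python
import ctypes
import sys


def set_awareness(awareness_level=1):
    """Try to set the process DPI awareness; return True if a call was made.

    Hypothetical sketch; not the mcetl implementation.
    """
    if awareness_level not in (0, 1, 2):
        raise ValueError('awareness_level must be 0, 1, or 2.')

    if sys.platform != 'win32':
        # Other operating systems are ignored.
        return False

    try:
        # shcore.dll provides SetProcessDpiAwareness on Windows 8.1 and later.
        ctypes.windll.shcore.SetProcessDpiAwareness(awareness_level)
    except (AttributeError, OSError):
        # Older Windows versions do not provide the library or the function.
        return False
    return True


print(set_awareness(1))
```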

mcetl.utils.show_dataframes(dataframes, title='Raw Data')

Used to show data to help select the right columns or datasets from the data.

Parameters
  • dataframes (list or pd.DataFrame) -- Either (1) a pandas DataFrame, (2) a list of DataFrames, or (3) a list of lists of DataFrames. The layout of the window will depend on the input type.

  • title (str, optional) -- The title for the popup window.

Returns

window -- If no exceptions occur, a PySimpleGUI window will be returned; otherwise, None will be returned.

Return type

sg.Window or None

mcetl.utils.string_to_unicode(input_list)

Converts strings to unicode by replacing '\\' with '\'.

Necessary because user input from text elements in GUIs is a raw string, so any '\' input by the user is converted to '\\', which will not be converted to the desired unicode. If the string already has unicode characters, it will be left alone.

Also converts things like '\\n' and '\\t' to '\n' and '\t', respectively, so that inputs are correctly interpreted.

Parameters

input_list ((list, tuple) or str) -- A container of strings or a single string.

Returns

output -- A container of strings or a single string, depending on the input, with the unicode correctly converted.

Return type

(list, tuple) or str

Notes

Uses raw_unicode_escape encoding to ensure that any existing unicode is correctly decoded; otherwise, it would translate incorrectly.

If using mathtext in matplotlib and you want to do something like $\nu$, input $\\nu$ in the GUI, which gets converted to $\\\\nu$ by the GUI, and in turn will be converted back to $\\nu$ by this function, which matplotlib considers equivalent to $\nu$.
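
The conversion described above can be sketched as an encode/decode round trip. The names unescape and unescape_all are hypothetical; pairing raw_unicode_escape with a unicode_escape decode is an assumption about how the escapes are resolved, since the notes only name the encoding step.

```python
def unescape(text):
    """Turn escaped sequences such as a backslash followed by n into a newline.

    Hypothetical sketch; not the mcetl implementation.
    """
    # raw_unicode_escape keeps backslashes as-is and protects characters that
    # are already unicode; unicode_escape then resolves the escape sequences.
    return text.encode('raw_unicode_escape').decode('unicode_escape')


def unescape_all(input_list):
    """Apply unescape to a single string or to each string in a list or tuple."""
    if isinstance(input_list, str):
        return unescape(input_list)
    return type(input_list)(unescape(entry) for entry in input_list)


# The first entry is backslash + n (two characters); it becomes one newline.
print(repr(unescape_all(['a\\nb', 'ν'])))
```

A string that ends in a single backslash cannot be decoded this way and would raise an error, so real code would likely need to handle that case.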

mcetl.utils.stringify_backslash(input_string)

Fixes strings containing backslash, such as '\n', so that they display properly in GUIs.

Parameters

input_string (str) -- The string that potentially contains a backslash character.

Returns

output_string -- The string after replacing various backslash characters with their double backslash versions.

Return type

str

Notes

It is necessary to replace multiple characters because things like '\n' are considered unique characters, so simply replacing the '\' would not work.
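
A rough sketch of this replacement is shown below. The name escape_for_display and the particular set of characters handled are assumptions; in particular, whether the real function also doubles a lone backslash is not stated above.

```python
def escape_for_display(input_string):
    """Replace special characters with their visible, escaped spellings.

    Hypothetical sketch; not the mcetl implementation.
    """
    # The lone backslash is handled first so that the backslashes added by
    # the later replacements are not doubled again.
    replacements = (
        ('\\', '\\\\'),
        ('\n', '\\n'),
        ('\t', '\\t'),
        ('\r', '\\r'),
    )
    output_string = input_string
    for character, escaped in replacements:
        output_string = output_string.replace(character, escaped)
    return output_string


print(escape_for_display('first\nsecond'))
```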

mcetl.utils.validate_inputs(window_values, integers=None, floats=None, strings=None, user_inputs=None, constraints=None)

Validates entries from a PySimpleGUI window and converts to the desired type.

Parameters
  • window_values (dict) -- A dictionary of values from a PySimpleGUI window, generated by using window.read().

  • integers (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be integers.

  • floats (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be floats.

  • strings (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be non-empty strings.

  • user_inputs (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be a certain data type; the values are first determined by splitting the entry using ',' (default) or the separator given as the last item of the list.

  • constraints (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be ints or floats constrained between upper and lower bounds.

Returns

True if all data in the window_values dictionary is correct. False if there is any error with the values in the window_values dictionary.

Return type

bool

Notes

Inputs for integers, floats, and strings are

[[key, display text],].

For example: [['peak_width', 'peak width']]

Inputs for user_inputs are

[[key, display text, data type, allow_empty_input (optional), separator (optional)],],

where separator is a string, and allow_empty_input is a boolean. If no separator is given, it is assumed to be a comma (','), and if no allow_empty_input value is given, it is assumed to be False. user_inputs can also be used to run the inputs through a function by setting the data type to a custom function. Use None as the separator if only a single value is wanted. For example:

[
    ['peak_width', 'peak width', float],  # ensures each entry is a float
    ['peak_width_2', 'peak width 2', int, False, ';'],  # uses ';' as the separator
    ['peak_width_3', 'peak width 3', function, False, None],  # no separator, verify with function
    ['peak_width_4', 'peak width 4', function, True, None]  # allows empty input
]

Inputs for constraints are

[[key, display text, lower bound, upper bound (optional)],],

where lower and upper bounds are strings with the operator and bound, such as "> 10". If lower bound or upper bound is None, then the operator and bound is assumed to be >=, -np.inf and <=, np.inf, respectively. For example:

[
    ['peak_width', 'peak width', '> 10', '< 20'],  # 10 < peak_width < 20
    ['peak_width_2', 'peak width 2', None, '<= 5'],  # -inf <= peak_width_2 <= 5
    ['peak_width_3', 'peak width 3', '> 1']  # 1 < peak_width_3 <= inf
]

The display text will be the text that is shown to the user if the value of window_values[key] fails the validation.
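
The constraint strings described in these notes could be checked along the following lines. The helper names parse_bound and within_constraints are hypothetical, and the sketch assumes the operator and the number are separated by whitespace, as in "> 10".

```python
import math
import operator

# Maps the operator text used in a constraint string to a comparison function.
COMPARISONS = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
}


def parse_bound(bound, default_symbol, default_value):
    """Split a string such as '> 10' into a comparison function and a number."""
    if bound is None:
        return COMPARISONS[default_symbol], default_value
    symbol, number = bound.split()
    return COMPARISONS[symbol], float(number)


def within_constraints(value, lower=None, upper=None):
    """Return True if value satisfies both bounds (hypothetical sketch)."""
    # A missing bound falls back to >= -inf or <= inf, as in the notes above.
    lower_check, lower_value = parse_bound(lower, '>=', -math.inf)
    upper_check, upper_value = parse_bound(upper, '<=', math.inf)
    return lower_check(value, lower_value) and upper_check(value, upper_value)


print(within_constraints(15, '> 10', '< 20'))
```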

#TODO eventually collect all errors so they can all be fixed at once.

mcetl.utils.validate_sheet_name(sheet_name)

Ensures that the desired Excel sheet name is valid.

Parameters

sheet_name (str) -- The desired sheet name.

Returns

sheet_name -- The input sheet name. Only returned if it is valid.

Return type

str

Raises

ValueError -- Raised if the sheet name is greater than 31 characters or if it contains any of the following: \, /, ?, *, [, ], :
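
The validation rules listed above can be sketched as follows. The name check_sheet_name and the error messages are hypothetical; only the 31-character limit and the list of forbidden characters come from the description.

```python
FORBIDDEN_CHARACTERS = '\\/?*[]:'


def check_sheet_name(sheet_name):
    """Return sheet_name if it is a valid Excel sheet name, else raise ValueError.

    Hypothetical sketch; not the mcetl implementation.
    """
    if len(sheet_name) > 31:
        raise ValueError('Sheet name must be 31 characters or fewer.')

    found = [char for char in FORBIDDEN_CHARACTERS if char in sheet_name]
    if found:
        raise ValueError(f'Sheet name contains invalid characters: {found}')

    return sheet_name


print(check_sheet_name('Raw Data'))
```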