mcetl Documentation


mcetl is a small-scale, GUI-based Extract-Transform-Load framework focused on materials characterization.

mcetl is focused on reducing the time required to repeatedly process data files and write the results to Excel. It does this by allowing the user to define DataSource objects for each separate source of data. Each DataSource contains information such as the options needed to import data from files, the calculations that will be performed on the data, and the options for writing the data to Excel. Once a DataSource is created, it can be selected within mcetl's main user interface.

In addition, mcetl provides fitting and plotting user interfaces that can be used without any prior setup.

Introduction

Purpose

The aim of mcetl is to ease the repeated processing of data files. Contrary to its name, mcetl can process any tabulated files (txt, csv, tsv, xlsx, etc.), and does not require that the files originate from materials characterization. However, the focus on materials characterization was selected because:

  • Most data files from materials characterization are relatively small in size (a few kB or MB).

  • Materials characterization files are typically cleanly tabulated and do not require handling messy or missing data.

  • It was the author's area of use, and naming things is hard...

mcetl requires only a very basic understanding of Python to use, and allows a single person to create a tool that their entire group can use to process data and produce Excel files with a consistent style. mcetl can create new Excel files when processing data or saving peak fitting results, or it can append to an existing Excel file to easily work with already created files.

Limitations

  • Data from files is fully loaded into memory for processing, so mcetl is not suited for processing files whose total memory size is large (e.g. cannot load a 10 GB file on a computer with only 8 GB of RAM).

  • mcetl does not provide any resources for processing data files directly from characterization equipment (such as .XRDML, .PAR, etc.). Other libraries such as xylib already exist and are capable of converting many such files to a format mcetl can use (txt, csv, etc.).

  • The peak fitting and plotting modules in mcetl are not as feature-complete as other alternatives such as Origin, fityk, SciDAVis, etc. The modules are included in mcetl in case those better alternatives are not available, and the author highly recommends using those alternatives over mcetl if available.

Installation

Dependencies

mcetl requires Python version 3.7 or later and several third-party libraries, including numpy, pandas, matplotlib, lmfit, and openpyxl.

All of the required libraries should be automatically installed when installing mcetl using either of the two installation methods below.

Additionally, mcetl can optionally use Pillow to allow for additional options when saving figures in the plotting GUI.
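
If desired, Pillow can be installed using pip:

pip install Pillow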

Stable Release

mcetl is easily installed from PyPI using pip by running the following command in your terminal:

pip install --upgrade mcetl

This is the preferred method to install mcetl, as it will always install the most recent stable release. Note that the --upgrade flag is used to ensure that the most recent version of mcetl is downloaded and installed, even if an older version is currently installed.

Development Version

The sources for mcetl can be downloaded from the GitHub repository.

The public repository can be cloned using:

git clone https://github.com/derb12/mcetl.git

Once the repository is downloaded, it can be installed with:

cd mcetl
python setup.py install

Quick Start

The sections below give a quick introduction to using mcetl, requiring no setup. For a more detailed introduction, refer to the tutorials section of the documentation.

Note

On Windows operating systems, the GUIs can appear blurry due to how dpi scaling is handled. To fix, simply do:

import mcetl
mcetl.set_dpi_awareness()

The above code must be called before opening any GUIs, or else the dpi scaling will be incorrect.

Main GUI

The main GUI for mcetl contains options for processing data, fitting, plotting, writing data to Excel, and moving files.

Before using the main GUI, DataSource objects must be created. Each DataSource contains the information for reading files for that DataSource (such as what separator to use, which rows and columns to use, labels for the columns, etc.), the calculations that will be performed on the data, and the options for writing the data to Excel (formatting, placement in the worksheet, etc.).

The following will create a DataSource named 'tutorial' with the default settings, and will then open the main GUI.

import mcetl

simple_datasource = mcetl.DataSource(name='tutorial')
mcetl.launch_main_gui([simple_datasource])

Fitting Data

To use the fitting module in mcetl, simply do:

from mcetl import fitting
fitting.launch_fitting_gui()

A window will then appear to select the data file(s) to be fit and the Excel file for saving the results. No other setup is required for doing fitting.

After doing the fitting, the fit results and plots will be saved to Excel.

Plotting

To use the plotting module in mcetl, simply do:

from mcetl import plotting
plotting.launch_plotting_gui()

Similar to fitting, a window will then appear to select the data file(s) to be plotted, and no other setup is required for doing plotting.

When plotting, the image of the plots can be saved to all formats supported by Matplotlib, including tiff, jpg, png, svg, and pdf.

In addition, the layout of the plots can be saved to apply to other figures later, and the data for the plots can be saved so that the entire plot can be recreated.

To reopen a figure saved through mcetl, do:

plotting.load_previous_figure()

Tutorials

Tutorials are available for using the main, fitting, and plotting GUIs, as well as for importing data, changing the style of the Excel output, and generating example raw data.

Main GUI

The Main GUI for mcetl contains options for processing data, fitting data, plotting data, writing data to Excel, and moving files. Before using the Main GUI, DataSource and Function objects must be created.

Each DataSource object contains the information for reading files for that DataSource (such as what separator to use, which rows and columns to use, labels for the columns, etc.), the calculations that will be performed on the data, and the options for writing the data to Excel (formatting, placement in the worksheet, etc.).

Function objects come in three types:

  • PreprocessFunctions: process each data entry as soon as it is imported from the files

  • CalculationFunctions: process each entry in each sample in each dataset

  • SummaryFunctions: process each sample or each dataset; performed last

Naming Conventions

Within the tutorials, data is sectioned into datasets, samples, and entries. This is done to help understand the grouping of data. A dataset can be considered a single, large grouping of data whose contents are all related somehow. If writing to Excel, each dataset will occupy its own sheet. A sample is a grouping within a dataset, and can contain multiple data entries. Each entry can be considered as the contents of a single file, but could be broken down further if desired.

For example, consider a study of the effects of processing temperature and carbon percentage on the mechanical properties of steel. If there are two processing temperatures (700°C and 800°C), five different carbon amounts (0 wt%, 1 wt%, 2 wt%, 3 wt%, and 4 wt%), and three measurements per sample, then the following grouping could be made (a code sketch of this nesting follows the list):

  • Dataset 1 (700°C)

    • Sample 1 (0 wt% carbon)

      • Entry 1 (first measurement file)

      • Entry 2 (second measurement file)

      • Entry 3 (third measurement file)

    • Sample 2 (1 wt% carbon)

      • Entry 1 (first measurement file)

      • Entry 2 (second measurement file)

      • Entry 3 (third measurement file)

    • Sample 3 etc...

  • Dataset 2 (800°C)

    • Sample 1 (0 wt% carbon)

      • Entry 1 (first measurement file)

      • Entry 2 (second measurement file)

      • Entry 3 (third measurement file)

    • Sample 2 etc...
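
This hierarchy mirrors how mcetl groups data internally: nested lists, with each dataset containing samples and each sample containing entries. A conceptual sketch of the grouping above, using placeholder strings in place of the pandas DataFrames that mcetl actually uses:

# conceptual sketch only; each entry is really a pandas.DataFrame
dataset_1 = [  # 700°C
    ['entry 1', 'entry 2', 'entry 3'],  # Sample 1 (0 wt% carbon)
    ['entry 1', 'entry 2', 'entry 3'],  # Sample 2 (1 wt% carbon)
]
dataset_2 = [  # 800°C
    ['entry 1', 'entry 2', 'entry 3'],  # Sample 1 (0 wt% carbon)
]
all_data = [dataset_1, dataset_2]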

Creating Function Objects

Function objects are simple wrappers around functions that allow referencing the function by a name, specifying which columns the function should act on, and other relevant information about the function's behavior. Function objects come in three types:

  • PreprocessFunctions: process each data entry as soon as it is imported from the files

  • CalculationFunctions: process each entry in each sample in each dataset

  • SummaryFunctions: process each sample or each dataset; performed last

Note that all functions will process pandas DataFrames. To speed up calculations within the functions, it is suggested to make use of vectorization of the pandas DataFrame or Series or of the underlying numpy arrays, since vectorization can be significantly faster than iterating over each index using a for loop.
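
As a minimal illustration of the difference (the column labels here are hypothetical):

import numpy as np
import pandas as pd

df = pd.DataFrame({0: np.arange(10000.0)})

# slow: iterating over every value with a Python-level loop
slow_result = [value * 2 + 1 for value in df[0]]

# fast: one vectorized operation over the entire Series
df[1] = df[0] * 2 + 1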

This tutorial will cover some basic examples for the three Function types. To see more advanced examples, see the example programs in the GitHub repository. The use_main_gui.py example program shows many different uses of the three Function objects, and the creating_functions.py example program walks through the internals of how launch_main_gui() calls the Functions so that each individual step can be understood.

Note

For functions of all three Function types, it is a good idea to add **kwargs to the function signature. Later versions of mcetl may pass additional items to functions, but they will always be added as keyword arguments (i.e., passed as name=value). By adding **kwargs, any unwanted keyword arguments will be ignored and will not cause issues when upgrading mcetl.

PreprocessFunctions

A PreprocessFunction will perform its function on each data entry individually. Example uses of a PreprocessFunction include splitting a single data entry into multiple entries, collecting information on all data entries in each sample in each dataset for later use, or performing simple operations that are easier to do per entry rather than after each dataset is grouped together.

The function for a PreprocessFunction must take at least two arguments: the dataframe containing the data for the entry, and a list of indices that tell which columns of the dataframe contain the data used by the function. The function must return a list of dataframes after processing, even if only one dataframe is used within the function.

A simple function to split dataframes based on the segment number, and then remove the segment number column is shown below.

import mcetl
import numpy as np

def split_segments(df, target_indices, **kwargs):
    """
    Preprocess function that separates each entry based on the segment number.

    Also removes the segment column after processing since it is not needed
    in the final output.

    Parameters
    ----------
    df : pandas.DataFrame
        The dataframe of the entry.
    target_indices : list(int)
        The indices of the target columns.

    Returns
    -------
    output_dataframes : list(pandas.DataFrame)
        The list of dataframes after splitting by segment number.

    """

    segment_index = target_indices[0]
    segment_col = df[df.columns[segment_index]].to_numpy()
    # + 1 so each split index marks the first row of a new segment
    mask = np.where(segment_col[:-1] != segment_col[1:])[0] + 1

    output_dataframes = np.array_split(df, mask)

    for dataframe in output_dataframes:
        # drop the segment column by its integer label, along the column axis
        dataframe.drop(segment_index, axis=1, inplace=True)

    return output_dataframes

# targets the 'segment' column from the imported data
segment_separator = mcetl.PreprocessFunction(
    name='segment_separator', target_columns='segment',
    function=split_segments, deleted_columns=['segment']
)

In addition, PreprocessFunctions can be used for procedures that are easier to perform on each data entry separately, rather than when all of the data is collected together. For example, sorting each entry based on the values in one of its columns.

def sort_columns(df, target_indices, **kwargs):
    """Sorts the dataframe based on the values of the target column."""
    return [df.sort_values(target_indices[0])]

# targets the 'diameter' column from the imported data
pore_preprocess = mcetl.PreprocessFunction('sort_data', 'diameter', sort_columns)

CalculationFunctions

A CalculationFunction will perform its function on each merged dataset. Each merged dataset is composed of all data entries for each sample concatenated together, resembling how the dataset will look when written to an Excel sheet. This makes the functions more difficult to create since target columns are given as nested lists of lists for each sample, but allows access to all data in a dataset, if required for complex calculations.

Each CalculationFunction can have two functions: one for performing the calculations on data for Excel, and one for performing calculations on data for Python. This way, the Excel calculations can create strings that use Excel-specific functions, like '=SUM(A1:A3)', to make the data dynamic within the Excel workbook, while the Python functions calculate the actual numerical data (e.g. sum(data[0:3]) to match the previous Excel formula, since A1:A3 spans three cells). If only a single, numerical calculation is desired, regardless of whether the data is being output to Excel or Python, then only a single function needs to be specified (an example of such a function is given below).

The functions for a CalculationFunction must take at least three arguments: the dataframe containing the data for the dataset, a list of lists of integers that tell which columns in the dataframe contain the data used by the function, and a list of lists of integers that tell which columns in the dataframe are used for the output of the function. Additionally, two keyword arguments are passed to the function: excel_columns, which is a list of strings corresponding to the columns in Excel used by the dataset (e.g. ['A', 'B', 'C', 'D']) when doing Excel functions and None when doing Python functions, and first_row, which is an integer telling the first row in Excel that the data will begin on (3 by default, since the first row is the sample name and the second is the column name). The functions must return a DataFrame after processing, even if all changes to the input DataFrame were done in place.

A simple function that adds a column of offset data is shown below.

import mcetl
import numpy as np

def offset_data_excel(df, target_indices, calc_indices, excel_columns,
                      first_row, offset, **kwargs):
    """Creates a string that will add an offset to data in Excel."""

    total_count = 0
    for i, sample in enumerate(calc_indices):
        for j, calc_col in enumerate(sample):
            y = df[target_indices[0][i][j]]
            y_col = excel_columns[target_indices[0][i][j]]
            calc = [
                f'= {y_col}{k + first_row} + {offset * total_count}' for k in range(len(y))
            ]
            # use np.where(~np.isnan(y)) so that the calculation works for unequally-sized
            # datasets
            df[calc_col] = np.where(~np.isnan(y), calc, None)
            total_count += 1

    return df

def offset_data_python(df, target_indices, calc_indices, first_row, **kwargs):
    """Adds an offset to data."""

    total_count = 0
    for i, sample in enumerate(calc_indices):
        for j, calc_col in enumerate(sample):
            y = df[target_indices[0][i][j]]
            df[calc_col] = y + (kwargs['offset'] * total_count)
            total_count += 1

    return df

# targets the 'data' column from the imported data
offset = mcetl.CalculationFunction(
    name='offset', target_columns='data', functions=(offset_data_excel, offset_data_python),
    added_columns=1, function_kwargs={'offset': 10}
)

Alternatively, the two functions could be combined into one, and the calculation route could be decided by examining the value of the excel_columns input, which is a list of strings if processing for Excel, and None when processing for Python.

import mcetl
import numpy as np

def offset_data(df, target_indices, calc_indices, excel_columns,
                first_row, offset, **kwargs):
    """Adds an offset to data."""

    total_count = 0
    for i, sample in enumerate(calc_indices):
        for j, calc_col in enumerate(sample):
            if excel_columns is not None: # do Excel calculations
                y = df[target_indices[0][i][j]]
                y_col = excel_columns[target_indices[0][i][j]]
                calc = [
                    f'= {y_col}{k + first_row} + {offset * total_count}' for k in range(len(y))
                ]
                df[calc_col] = np.where(~np.isnan(y), calc, None)

            else: # do Python calculations
                y = df[target_indices[0][i][j]]
                df[calc_col] = y + (offset * total_count)
            total_count += 1

    return df

# targets the 'data' column from the imported data
offset = mcetl.CalculationFunction(
    name='offset', target_columns='data', functions=offset_data,
    added_columns=1, function_kwargs={'offset': 10}
)

To modify the contents of an existing column, the input for added_columns for CalculationFunction should be a string designating the target: either a variable from the imported data, or the name of a CalculationFunction.

import mcetl
import numpy as np

def normalize(df, target_indices, calc_indices, excel_columns, first_row, **kwargs):
    """Performs a min-max normalization to bound values between 0 and 1."""

    for i, sample in enumerate(calc_indices):
        for j, calc_col in enumerate(sample):
            if excel_columns is not None:
                y = df[target_indices[0][i][j]]
                y_col = excel_columns[target_indices[0][i][j]]
                end = y.count() + 2  # last Excel row of the data, since the data begins on row 3
                calc = [
                    (f'=({y_col}{k + first_row} - MIN({y_col}$3:{y_col}${end})) / '
                     f'(MAX({y_col}$3:{y_col}${end}) - MIN({y_col}$3:{y_col}${end}))')
                    for k in range(len(y))
                ]

                df[calc_col] = np.where(~np.isnan(y), calc, None)

            else:
                y_col = df.columns[target_indices[0][i][j]]
                min_y = df[y_col].min()
                max_y = df[y_col].max()

                df[calc_col] = (df[y_col] - min_y) / (max_y - min_y)

    return df

def offset_normalized(df, target_indices, calc_indices, excel_columns,
                      offset, **kwargs):
    """Adds an offset to normalized data."""

    total_count = 0
    for i, sample in enumerate(calc_indices):
        for j, calc_col in enumerate(sample):
            y_col = df[target_indices[0][i][j]]
            offset_amount = offset * total_count
            if excel_columns is not None:
                df[calc_col] = y_col + f' + {offset_amount}'
            else:
                df[calc_col] = y_col + offset_amount
            total_count += 1

    return df

# targets the 'data' column from the imported data
normalize_func = mcetl.CalculationFunction(
    name='normalize', target_columns='data',
    functions=normalize, added_columns=1
)
# targets the 'normalize' column from the 'normalize' CalculationFunction
# and also alters its contents
offset_func = mcetl.CalculationFunction(
    name='offset', target_columns='normalize', functions=offset_normalized,
    added_columns='normalize', function_kwargs={'offset': 10}
)

If the CalculationFunction does the same calculation, regardless of whether the data is going to Excel or for later processing in Python, then a mutable object, like a list, can be used in function_kwargs to signify that the calculation has been performed to prevent processing twice.

import mcetl

def offset_numerical(df, target_indices, calc_indices, excel_columns, **kwargs):
    """Adds a numerical offset to data."""

    # Add this section to prevent doing numerical calculations twice.
    if excel_columns is None and kwargs['processed'][0]:
        return df # return to prevent processing twice
    elif excel_columns is not None:
        kwargs['processed'][0] = True

    # Regular calculation section
    offset = kwargs['offset']
    total_count = 0
    for i, sample in enumerate(calc_indices):
        for j, calc_col in enumerate(sample):
            df[calc_col] = df[target_indices[0][i][j]] + (offset * total_count)
            total_count += 1

    return df

# targets the 'data' column from the imported data
numerical_offset = mcetl.CalculationFunction(
    name='numerical offset', target_columns='data', functions=offset_numerical,
    added_columns=1, function_kwargs={'offset': 10, 'processed': [False]}
)

SummaryFunctions

A SummaryFunction is very similar to a CalculationFunction: it performs its functions on each merged dataset and must likewise return a single DataFrame. However, SummaryFunctions differ from CalculationFunctions in that their added columns are not within the data entries themselves. Instead, a SummaryFunction can either be a sample SummaryFunction (by using sample_summary=True when creating the object), which is equivalent to appending a data entry to each sample in each dataset, or a dataset SummaryFunction (by using sample_summary=False), which is equivalent to appending a sample to each dataset.

For example, consider calculating the elastic modulus from tensile tests. Each sample in the dataset will have multiple measurements/entries, so a sample SummaryFunction could be used to calculate the average elastic modulus for each sample, and a dataset SummaryFunction could be used to create a table listing the average elastic modulus of each sample in the dataset for easy referencing.

import mcetl
import numpy as np
import pandas as pd
from scipy import optimize

def stress_model(strain, modulus):
    """
    The linear estimate of the stress-strain curve using the strain and estimated modulus.

    Parameters
    ----------
    strain : array-like
        The array of experimental strain values, unitless (with cancelled
        units, such as mm/mm).
    modulus : float
        The estimated elastic modulus for the data, with units of GPa (Pa * 10**9).

    Returns
    -------
    array-like
        The estimated stress data following the linear model, with units of Pa.

    """
    return strain * modulus * 1e9


def tensile_calculation(df, target_indices, calc_indices, excel_columns, **kwargs):
    """Calculates the elastic modulus from the stress-strain curve for each entry."""

    if excel_columns is None and kwargs['processed'][0]:
        return df # return to prevent processing twice
    elif excel_columns is not None:
        kwargs['processed'][0] = True

    num_columns = 2 # the number of calculation columns per entry
    for i, sample in enumerate(calc_indices):
        for j in range(len(sample) // num_columns):
            strain_index = target_indices[0][i][j]
            stress_index = target_indices[1][i][j]
            nan_mask = (~np.isnan(df[strain_index])) & (~np.isnan(df[stress_index]))

            # convert strain from % to unitless
            strain = df[strain_index].to_numpy()[nan_mask] / 100
            # convert stress from MPa to Pa
            stress = df[stress_index].to_numpy()[nan_mask] * 1e6

            # only use data where stress varies linearly with respect to strain
            linear_mask = (
                (strain >= kwargs['lower_limit']) & (strain <= kwargs['upper_limit'])
            )
            initial_guess = 80 # initial guess of the elastic modulus, in GPa

            modulus, covariance = optimize.curve_fit(
                stress_model, strain[linear_mask], stress[linear_mask],
                p0=[initial_guess]
            )

            df[sample[0 + (j * num_columns)]] = pd.Series(('Value', 'Standard Error'))
            df[sample[1 + (j * num_columns)]] = pd.Series(
                (modulus[0], np.sqrt(np.diag(covariance)[0]))
            )

    return df


def tensile_sample_summary(df, target_indices, calc_indices, excel_columns, **kwargs):
    """Summarizes the mechanical properties for each sample."""

    if excel_columns is None and kwargs['processed'][0]:
        return df # to prevent processing twice

    num_cols = 2 # the number of calculation columns per entry from tensile_calculation
    for i, sample in enumerate(calc_indices):
        if not sample: # skip empty lists
            continue

        entries = []
        for j in range(len(target_indices[0][i]) // num_cols):
            entries.append(target_indices[0][i][j * num_cols:(j + 1) * num_cols])

        df[sample[0]] = pd.Series(['Elastic Modulus (GPa)'])
        df[sample[1]] = pd.Series([np.mean([df[entry[1]][0] for entry in entries])])
        df[sample[2]] = pd.Series([np.std([df[entry[1]][0] for entry in entries])])

    return df


def tensile_dataset_summary(df, target_indices, calc_indices, excel_columns, **kwargs):
    """Summarizes the mechanical properties for each dataset."""

    if excel_columns is None and kwargs['processed'][0]:
        return df # to prevent processing twice

    # the number of samples is the number of lists in calc_indices - 1
    num_samples = len(calc_indices[:-1])

    # calc index is -1 since only the last dataframe is the dataset summary dataframe
    df[calc_indices[-1][0]] = pd.Series(
        [''] + [f'Sample {num + 1}' for num in range(num_samples)]
    )
    df[calc_indices[-1][1]] = pd.Series(
        ['Average'] + [df[indices[1]][0] for indices in target_indices[0][:-1]]
    )
    df[calc_indices[-1][2]] = pd.Series(
        ['Standard Deviation'] + [df[indices[2]][0] for indices in target_indices[0][:-1]]
    )

    return df

# share the keyword arguments between all function objects
tensile_kwargs = {'lower_limit': 0.0015, 'upper_limit': 0.005, 'processed': [False]}

# targets the 'strain' and 'stress' columns from the imported data
tensile_calc = mcetl.CalculationFunction(
    name='tensile calc', target_columns=['strain', 'stress'],
    functions=tensile_calculation, added_columns=2,
    function_kwargs=tensile_kwargs
)
# targets the columns from the 'tensile calc' CalculationFunction
stress_sample_summary = mcetl.SummaryFunction(
    name='tensile sample summary', target_columns=['tensile calc'],
    functions=tensile_sample_summary, added_columns=3,
    function_kwargs=tensile_kwargs, sample_summary=True
)
# targets the columns from the 'tensile sample summary' SummaryFunction
stress_dataset_summary = mcetl.SummaryFunction(
    name='tensile dataset summary', target_columns=['tensile sample summary'],
    functions=tensile_dataset_summary, added_columns=3,
    function_kwargs=tensile_kwargs, sample_summary=False
)

Creating DataSources

DataSource objects are the main controlling object when using mcetl's main GUI.

DataSource Inputs

This section groups the inputs for DataSource by their usage, and explains possible inputs. The only necessary input for a DataSource is its name, which is displayed in the GUI when selecting which DataSource to use.

Note that most inputs for DataSource will supply defaults for entering information into the GUIs, and can be overwritten when using the main GUI.

Using Raw Data Files
File Searching

The keyword file_type should be a string of the file extension of data files for the DataSource, such as 'csv' or 'txt'. When searching for files, either manually or using a keyword search, the file extension is used to limit the number of files to select.

The keyword num_files should be an integer telling how many files should be associated with each sample for the DataSource. It is only used when using a keyword search for files.

Importing Data

column_numbers should be a list of integers telling which columns to import from the raw data file. The order of columns follows the ordering within column_numbers, so [0, 1, 2] would import the first, second, and third columns from the data file and keep their order, while [1, 2, 0] would still import the three columns but rearrange them so that the first column is now the third column.

start_row and end_row should be integers telling which row in the raw data file should be the first and last row, respectively. Note that end_row counts up from the bottom of the data file, so that end_row=0 would be the last row, end_row=1 would be the second to last row, etc.

separator should be a string designating the separator used in the raw data file to separate columns. For example, comma-separated files (csv) have ',' as the separator.

See the Importing Data section of the tutorial for more in depth discussion on importing data.

Processing Data
Functions

The keyword functions should be a list or tuple of all the PreprocessFunctions, CalculationFunctions, and/or SummaryFunctions that are needed for processing data for the DataSource. PreprocessFunctions will be performed first, followed by CalculationFunctions and then SummaryFunctions. For each Function type, the order in which the functions are performed is according to their ordering in the functions input.

The keyword unique_variables should be a list of strings designating the names to give to columns that contain necessary data. The Function objects of the DataSource will then be able to set their target_columns to those columns. For example, in the previous tutorial section on creating Function objects, the example Function objects had target_columns such as 'data' and 'diameter' from imported data. The 'data' and 'diameter' columns would be the DataSource's unique_variables.

unique_variable_indices should be a list of integers telling which indices within column_numbers belong to the unique_variables. For example, if column_numbers was [1, 2, 0] and Column 2 contained the data for the unique variable, then unique_variable_indices would be [1], since Column 2 is at index 1 within column_numbers.
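
A minimal sketch of that example (the DataSource name and variable name are placeholders):

import mcetl

# Column 2 of the raw file holds the 'data' variable; within
# column_numbers=[1, 2, 0], that column sits at index 1.
indices_example = mcetl.DataSource(
    'indices example',
    column_numbers=[1, 2, 0],
    unique_variables=['data'],
    unique_variable_indices=[1],
)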

Column Names

The keyword column_labels should be a list of strings designating the default column labels for a single data entry. The column labels should accommodate both the names of the imported data columns and all of the columns added by CalculationFunctions and SummaryFunctions.

To aid setting up the column_labels, a DataSource instance has the method print_column_labels_template(), which will tell how many labels belong to imported data, CalculationFunctions, sample-summary SummaryFunctions, and dataset-summary SummaryFunctions, and give a template for the column_labels input.

The example below shows creating a DataSource for tensile testing, and uses the tensile_calc, stress_sample_summary, and stress_dataset_summary Function objects created in the previous tutorial section on creating Function objects.

import mcetl

tensile = mcetl.DataSource(
    'Tensile Test',
    functions=[tensile_calc, stress_sample_summary, stress_dataset_summary],
    unique_variables=['stress', 'strain'],
    column_numbers=[4, 3, 0, 1, 2],
    unique_variable_indices=[1, 0],
    # column_labels only currently has labels for the imported data, not the Functions
    column_labels=['Strain (%)', 'Stress (MPa)', 'Time (s)', 'Extension (mm)', 'Load (kN)'],
)

tensile.print_column_labels_template()

The label_entries keyword can be set to True to append a number to the column labels of each entry in a sample if there is more than one entry. For example, the column label 'data' would become 'data, 1', 'data, 2', etc.
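
A minimal sketch (the DataSource name is a placeholder):

import mcetl

labeled_source = mcetl.DataSource('labeled example', label_entries=True)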

Figure Appearance

The figure_rcparams keyword can be used to set the style of figures for fitting and plotting using Matplotlib's rcParams. The input should be a dictionary, like below:

example_rcparams = {
    'font.serif': 'Times New Roman',
    'font.family': 'serif',
    'font.size': 12,
    'xtick.direction': 'in',
    'ytick.direction': 'in'
}
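
The dictionary can then be passed to a DataSource through the figure_rcparams keyword; a minimal sketch (the DataSource name is a placeholder):

import mcetl

styled_source = mcetl.DataSource('styled example', figure_rcparams=example_rcparams)
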
Excel Output
Location within the Workbook

Where data is placed within the output Excel file is determined by the excel_column_offset and excel_row_offset keywords, which expect integers. By default, data is placed in the workbook starting at cell A1. To change the starting cell to D8, for example, the excel_column_offset would be 3 and the excel_row_offset would be 7.
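
For example, a hypothetical DataSource whose output starts at cell D8 could be created as follows:

import mcetl

offset_example = mcetl.DataSource(
    'offset example', excel_column_offset=3, excel_row_offset=7
)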

Spacing between Samples and Entries

To place empty columns in the Excel workbook to help visually separate the various samples and entries, use entry_separation and sample_separation. For example, to place one empty column between each entry in a sample and two empty columns between each sample in a dataset, use entry_separation=1 and sample_separation=2.

Excel Plot Options

The xy_plot_indices keyword should be a list of two integers telling the indices of the columns to use for the x- and y-axes when plotting each entry in Excel. Note that the indices can refer to columns from the imported data or to columns added by CalculationFunctions. For example, if three columns were imported and two additional columns were added by CalculationFunctions, then the total column indices would be [0, 1, 2, 3, 4], and the plot indices could be [0, 1] or [1, 4], etc.

Sample SummaryFunctions and dataset SummaryFunctions are also allowed to be plotted in Excel and will use the xy_plot_indices.

Specifying Excel Styles

The styles used in the output Excel file are set using the excel_writer_styles keyword. See the Excel Styles section of the tutorial for discussion on how to change the style of the output Excel file.

Examples

The examples below show creating DataSources using some of the keyword inputs discussed above as well as the Function objects created in the previous tutorial section on creating Function objects.

import mcetl

xrd = mcetl.DataSource(
    'X-ray Diffraction',
    column_labels=['2\u03B8 (\u00B0)', 'Intensity (Counts)', 'Offset Intensity (a.u.)'],
    functions=[offset],
    column_numbers=[1, 2],
    unique_variables=['data'],
    unique_variable_indices=[1],
    start_row=1,
    end_row=0,
    separator=',',
    xy_plot_indices=[0, 2],
    file_type='csv',
    num_files=1,
    entry_separation=1,
    sample_separation=2,
)

tensile = mcetl.DataSource(
    'Tensile Test',
    functions=[tensile_calc, stress_sample_summary, stress_dataset_summary],
    column_numbers=[4, 3, 0, 1, 2],
    unique_variables=['stress', 'strain'],
    unique_variable_indices=[1, 0],
    xy_plot_indices=[0, 1],
    column_labels=[
        'Strain (%)', 'Stress (MPa)', 'Time (s)', 'Extension (mm)', 'Load (kN)', # imported data
        '', 'Elastic Modulus (GPa)', # CalculationFunction
        'Property', 'Average', 'Standard Deviation', # sample SummaryFunction
        'Sample', 'Elastic Modulus (GPa)', '' # dataset SummaryFunction
    ],
    start_row=6,
    separator=',',
    file_type='txt',
    num_files=3,
    entry_separation=2,
    sample_separation=3
)

Using the Main GUI

Once all of the desired DataSource objects are created, simply group the DataSource objects together in a list or tuple, and then call launch_main_gui(). For example, using the two DataSource objects created from the last section gives:

import mcetl

all_datasources = (xrd, tensile)
mcetl.launch_main_gui(all_datasources)

The main GUI has the following steps:

  • Main Menu (Choosing the DataSource and Processing Steps)

  • File Selection

  • Importing Data from Files

  • Naming Samples & Columns

  • Processing

File Selection
Manually Selecting Files

Manual selection of files is easy and fast. To start, the number of datasets needs to be entered, and then the number of samples per dataset. Each dataset will be one sheet when writing to Excel.

Next, a window will appear to select the files for each sample in each dataset (see the file selection window in the gallery section of the documentation). Simply press 'Add Files', and select all of the files for each sample. To remove a file, just select it and press 'Del Files'.

Searching Using Keywords

Files can also be searched for using keywords. However, this is usually more difficult than manually selecting the files and should only be used when searching within a deeply nested folder structure. Further explanation of keyword searching is given within the main GUI itself; it is not covered here since its usage is discouraged.

Note

After selecting files, a file called "previous_files_{DataSource.name}.json" is locally saved, where {DataSource.name} is the name of the selected DataSource. This allows quickly bypassing file selection in case of repeated processing of the same files.

To get the file path where the files are saved, simply do:

print(str(mcetl.main_gui.SAVE_FOLDER))

Importing Data from Files

See the Importing Data section of the tutorial for the explanation of how mcetl imports raw data.

Note that numeric data is downcast to the lowest possible representation after importing (for example, float may be converted to numpy.float32) to reduce memory usage. If this is an issue, be sure to include an appropriate Function object to convert to the desired dtype.
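
For example, a PreprocessFunction sketched as below (the names are hypothetical) could be used to convert a target column back to float64:

import mcetl
import numpy as np

def restore_float64(df, target_indices, **kwargs):
    """Hypothetical preprocess function that converts the target column back to float64."""
    column = df.columns[target_indices[0]]
    df[column] = df[column].astype(np.float64)
    # preprocess functions must return a list of dataframes
    return [df]

# assumes 'data' is one of the DataSource's unique_variables
upcast = mcetl.PreprocessFunction('upcast', 'data', restore_float64)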

Naming Samples & Columns

The name of each sample in each dataset can be specified, as well as the name of each column in the dataset. Column names are generated using the column_labels input of the selected DataSource. The gallery section of the documentation shows an example of the sample and column naming window.

Processing

Processing includes any processing steps selected in the main menu, including processing the data using the Function objects for the selected DataSource, fitting the data, plotting the data, writing to Excel, or moving files. Each step should be self-explanatory.

The output of the launch_main_gui function will be a single dictionary with the following keys and values:

'dataframes': list or None

A list of lists of pandas.DataFrame, with each dataframe containing the data imported from a raw data file; will be None if the function fails before importing data, or if the only processing step taken was moving files.

'fit_results': list or None

A nested list of lists of lmfit ModelResult objects, with each ModelResult pertaining to the fit of a single data entry, each list of ModelResults containing all of the fits for a single sample, and each list of lists pertaining to the data within one dataset. Will be None if fitting is not done, or only partially filled if the fitting process ends early.

'plot_results': list or None

A list of lists, with one entry per dataset. Each interior list is composed of a matplotlib.Figure object and a dictionary of matplotlib.Axes objects. Will be None if plotting is not done, or only partially filled if the plotting process ends early.

'writer': pandas.ExcelWriter or None

The pandas ExcelWriter used to create the output Excel file; will be None if the output results were not saved to Excel.
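
As a sketch, the output could be unpacked as follows (reusing all_datasources from the earlier example):

output = mcetl.launch_main_gui(all_datasources)

dataframes = output['dataframes']      # list of lists of pandas.DataFrame, or None
fit_results = output['fit_results']    # nested lists of lmfit ModelResults, or None
plot_results = output['plot_results']  # Figure and axes dictionary per dataset, or None
writer = output['writer']              # pandas.ExcelWriter, or None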

Fitting GUI

The fitting submodule of mcetl provides functions and GUIs for fitting data. mcetl uses the lmfit library for fitting data using non-linear least-squares minimization. For in-depth discussion on the theory and available methods/models for curve fitting, the lmfit documentation is highly recommended.

Please note that the fitting submodule is included in mcetl to provide a fairly basic curve fitting option. If other curve fitting software is available, such as fityk or Origin, the author highly recommends their usage instead of mcetl.

Basic Usage

To use the fitting GUI in mcetl, simply do:

from mcetl import fitting
fitting.launch_fitting_gui()

A window will then appear to select the data file(s) to be fit and the Excel file for saving the results. No other setup is required.

Usage of the fitting GUI is fairly straightforward, and the GUI should notify the user if any issues occur and allow correction without terminating the program.

After doing the fitting, the fit results and plots will be saved to Excel.

Note

Currently, mcetl only provides functions and a GUI for performing peak fitting. A later release of mcetl (v0.5 or v0.6) is slated to add general fitting routines, to allow fitting arbitrary models to data. Further, that release should also add the option to save the fitting options from the GUI to a file so that the options can be reused for fitting other data, similar to how the plotting GUI does it.

Advanced Usage

To be added.

Plotting GUI

The plotting module of mcetl provides a GUI for plotting data. mcetl uses the Matplotlib library for plotting, which allows for saving high quality images in several formats.

Please note that the plotting submodule is included in mcetl to provide a fairly basic plotting option. Further, its development will be slower than the main GUI or the fitting submodule since it requires covering many more options and is not a priority of the author. If other plotting software is available, such as SciDAVis, Veusz, or Origin, the author highly recommends their usage instead of mcetl.

Summary of Features

The plotting GUI has the following features:

  • plot data using line and/or scatter plots, and allow customization of the colors, style, and size of the markers/lines.

  • specify complex layouts composed of multiple rows and/or columns of axes and inset axes

  • add twin axes to plot multiple dependent responses (such as viscosity and density as functions of temperature)

  • add secondary axes to plot the same data on different scales, using user-specified conversions (such as plotting temperature in both Fahrenheit and Celsius)

  • specify figure dimensions and dots per inch (dpi)

  • add formatted annotations (text, arrows, and lines) anywhere on the figure

  • allow specifying peak positions, which will optionally place markers and/or labels above each peak position on each dataset.

In addition, changes to Matplotlib's rcParams can be specified, which allows additional control over the style and formatting of the figures.

Basic Usage

To use the plotting GUI in mcetl, simply do:

from mcetl import plotting

# add some changes to Matplotlib's rcParams
changes = {
    'font.serif': 'Times New Roman',
    'font.family': 'serif',
    'font.size': 12,
    'xtick.direction': 'in',
    'ytick.direction': 'in'
}
plotting.launch_plotting_gui(mpl_changes=changes)

Similar to fitting, a window will then appear to select the data file(s) to be plotted, and no other setup is required for doing plotting.

When plotting, the image of the plots can be saved to all formats supported by Matplotlib, including tiff, jpg, png, svg, and pdf. If the Pillow library is installed, additional options are given to allow saving compressed files for some formats, such as tiff, in order to reduce the file size.

Advanced Usage

To be added.

Saving/Reopening Plots

Using the GUI, the layout of the plots can be saved to apply to other figures later, which is referred to in the GUI as saving the Figure Theme. The necessary data will be saved with the file extension ".figjson", which is just a json file.

Further, the data for the plots can be saved to a csv file so that the entire plot can be recreated.

To reopen a figure saved through mcetl, do:

plotting.load_previous_figure()

which will open a window to select the csv and (optionally) figjson files to use.

Importing Data

mcetl allows importing data from various sources such as text files, csv files, fixed-width files, and spreadsheet-like files (xlsx, xls, xlsm, xlsb, ods, etc.). Note that the default installation of mcetl only supports importing from spreadsheet-like files with either xlsx or xlsm file extensions (the file types supported by openpyxl). To import data from other spreadsheet-like files, the appropriate Python library must be installed. For example, to read xls files, the xlrd library would need to be installed.
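
For example, support for xls files could be added with:

pip install xlrd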

Figure 1 below shows the Data Import window. The meanings of the fields are as follows:

  • Columns to import: which columns in the raw data file to import

  • Start row: the first row to use from the raw data file

  • End row: the last row to use from the raw data file

In addition, if using the Main GUI through mcetl.launch_main_gui() and processing data, then a section will be added to the bottom of the import window to select the column numbers for each unique variable ('x' and 'y' in Figure 1) for the selected DataSource.

_images/data_import_11.png

Figure 1: Options for importing raw data. Columns for unique variables for a DataSource are also selected.

Figure 2 shows examples of different start rows, end rows, data columns, and separators to help show the meanings of these terms. In Python, counting starts at 0, so if you wished to use the first row, for example, you would put 0 as the start row. For the end row, it is counting up from the bottom, so the bottom-most row is row 0, the row above it is row 1, etc.

_images/data_import_4.png

Figure 2: Three raw data files with different import options.

For most file types, the only other unique item that needs to be filled out is the separator. Common separators for files are commas (,), semicolons (;), tabs (\t), and whitespace (\s+). Regular expressions can also be used for the separator.

For fixed-width files, the widths of each column also need to be specified, as seen in Figure 3.

_images/data_import_2.png

Figure 3: Importing raw data from a fixed-width file.

For importing data from existing spreadsheet-like files (xlsx, xls, etc.), there are additional options to choose (Figure 4). The assumption when importing data from a spreadsheet-like file is that the data has a common, repeating layout. The options are as follows:

  • Sheet to use: which sheet in the file to use

  • First Column: the first column in the sheet to use

  • Last Column: the last column in the sheet to use

  • Number of columns per entry: how many columns are associated with one set of data in the selected sheet; the total number of data entries extracted from the file will be equal to (Last Column + 1 – First Column) / Number of columns per entry. For example, if the First Column is column 0, the Last Column is column 11, and there are 4 columns per entry, then the total number of data entries is (11 + 1 – 0) / 4 = 3.

It is recommended to start from the top of the Data Import window and work down, because various fields are filled out automatically when selecting the sheet to use, the number of columns per entry, etc.

_images/data_import_3.png

Figure 4: Importing raw data from a spreadsheet-like file.

At any time, the “Test Import” button can be pressed to look at how the imported data will look. When the data source is a spreadsheet-like file and the columns per entry is less than the total columns, two windows will show up, with the first window corresponding to the total set of data starting at the first column and ending at the last column, and the second window showing the individual data entries. Any errors that would have occurred during data importing will simply give a pop-up warning window when using “Test Import”, rather than causing any issues, so it is best to use “Test Import” if you are unsure if the import options are correct.

Writing to Excel

mcetl gives the option to write results to Excel when using either mcetl.launch_main_gui() or fitting.launch_fitting_gui() from mcetl.fitting. This section covers some information that is good to know when writing to Excel.

Appending to Existing File

mcetl allows appending to existing Excel files, but this should be done with caution for two reasons: high memory usage, and the potential loss of Excel objects.

Memory Usage

When appending to an existing Excel file, mcetl loads the Excel workbook using the Python library openpyxl; however, the conversion from Excel to Python can increase the required memory by more than an order of magnitude. For example, an Excel file that is 10 MB on disk can require several hundred MB of RAM when opened in Python. Users should be aware of this so that a MemoryError does not occur when loading existing files. If in doubt, write to a new file using mcetl, and then copy the sheets to the desired file within Excel.

Losing Excel Objects

openpyxl can convert the majority of objects within an Excel file, such as cell values and styles, defined styles for the workbook, and charts. If the existing Excel file is simply tabulated data and charts, then there should be no issues. However, there are some Excel objects that openpyxl cannot read, such as shapes and inserted equations, so these non-convertible objects will be lost when appending to a file.

If there is any doubt whether openpyxl can read an object within an Excel file, it is a good idea to create a copy of the file before appending to it with mcetl to test whether all objects are transferred, or use mcetl to write to a new file and then copy its contents to the existing file using Excel.

Excel Styles

When writing to Excel using either mcetl.launch_main_gui() (with a DataSource to set the style) or fitting.launch_fitting_gui(), the style of the cells in the Excel workbook can be customized. The gallery section of the documentation shows two spreadsheets with the default styles used for the launch_main_gui and launch_fitting_gui functions.

There are seven different styles that can be specified, each with an even and odd variant (e.g. 'header_even' and 'header_odd'), allowing fourteen total styles to be specified and making the Excel output visually distinct. The base names for the seven styles are:

  • 'header': The style used for the headers.

  • 'subheader': The style used for the subheaders.

  • 'columns': The style for all of the data columns.

  • 'fitting_header': The style used for the headers when writing results from fitting.

  • 'fitting_subheader': The style used for the subheaders when writing fit results.

  • 'fitting_columns': The style for the data and parameter columns for fit results.

  • 'fitting_descriptors': The style for the two columns specifying descriptions about the fit (whether it was successful, the goodness of fit, etc.).

The styles with the 'fitting' prefix are only used if writing results from fitting to Excel.

The styles are specified using a dictionary whose keys are the above base names with '_even' or '_odd' appended (e.g. 'header_even'), and whose values are either a nested dictionary or an openpyxl NamedStyle object.

Using Named Styles

To make the styles usable in the output Excel file, there are two options. The first is to use openpyxl NamedStyle objects in the dictionary. The second is to use a dictionary, and include the 'name' key in the dictionary to set the NamedStyle's name.

Note

Once a Named Style is added to an Excel workbook, it cannot be overwritten using mcetl. That means that if appending to an existing workbook and trying to set a Named Style to the name of an existing style in the workbook, the format of the existing style will be used rather than the input style. To fix this, just delete or rename any styles that need to be changed within the Excel workbook before writing more data using mcetl.

Some examples of valid inputs that create NamedStyles are shown below:

from openpyxl.styles import (
    Alignment, Border, Font, NamedStyle, PatternFill, Side
)

partial_styles = {
    # can use an openpyxl NamedStyle
    'header_even': NamedStyle(
        name='Even Header',
        font=Font(size=12, bold=True),
        fill=PatternFill(fill_type='solid', start_color='F9B381', end_color='F9B381'),
        border=Border(bottom=Side(style='thin')),
        alignment=Alignment(horizontal='center', vertical='center', wrap_text=True),
        number_format='0.00'
    ),
    # or use a dictionary with a 'name' key
    'header_odd': {
        'name': 'Odd Header',
        'font': Font(size=12, bold=True),
        'fill': PatternFill(fill_type='solid', start_color='73A2DB', end_color='73A2DB'),
        'border': Border(bottom=Side(style='thin')),
        'alignment': Alignment(horizontal='center', vertical='center', wrap_text=True),
        'number_format': '0.00'
    },
    # can replace all openpyxl objects with dict to not even need to import openpyxl
    'subheader_odd': {
        'name': 'Odd Subheader',
        'font': dict(size=12, bold=True),
        'fill': dict(fill_type='solid', start_color='73A2DB', end_color='73A2DB'),
        'border': dict(bottom=dict(style='thin')),
        'alignment': dict(horizontal='center', vertical='center', wrap_text=True),
        'number_format': '0.00'
    },
    # can also reference already created NamedStyles
    'subheader_even': 'Odd Subheader'
}

Using Anonymous Styles

Anonymous styles will properly format the cells in the output Excel file, but their names will not be available as named styles within Excel. In addition, anonymous styles have a much faster write time than Named Styles, taking ~50% less time to write, so if processing speed is a concern, anonymous styles are a good choice.

An easy way to create anonymous styles is to first create the NamedStyle, like above, and then replace NamedStyle with dict and remove the 'name' key. Doing so with the styles from the previous section gives:

# (assumes the openpyxl.styles imports from the previous example)
partial_styles = {
    # replace NamedStyle with dict and remove name=''
    'header_even': dict(
        font=Font(size=12, bold=True),
        fill=PatternFill(fill_type='solid', start_color='F9B381', end_color='F9B381'),
        border=Border(bottom=Side(style='thin')),
        alignment=Alignment(horizontal='center', vertical='center', wrap_text=True),
        number_format='0.00'
    ),
    # remove the 'name' key
    'header_odd': {
        'font': Font(size=12, bold=True),
        'fill': PatternFill(fill_type='solid', start_color='73A2DB', end_color='73A2DB'),
        'border': Border(bottom=Side(style='thin')),
        'alignment': Alignment(horizontal='center', vertical='center', wrap_text=True),
        'number_format': '0.00'
    },
    # remove 'name' and replace all openpyxl objects with dict to not even need to import openpyxl
    'subheader_odd': {
        'font': dict(size=12, bold=True),
        'fill': dict(fill_type='solid', start_color='73A2DB', end_color='73A2DB'),
        'border': dict(bottom=dict(style='thin')),
        'alignment': dict(horizontal='center', vertical='center', wrap_text=True),
        'number_format': '0.00'
    }
}

Using Unformatted Styles

To make a style unformatted (use the Excel default format), simply set its value to either None or an empty dictionary.

# both produce the same default style in the output Excel file
partial_styles = {'header_even': {}, 'header_odd': None}

To make all styles unformatted, do one of the following dictionary comprehensions:

from mcetl import DataSource

unformatted_styles = {style: {} for style in DataSource.excel_styles.keys()}
unformatted_styles2 = {style: None for style in DataSource.excel_styles.keys()}

Validate Styles

To ensure that an input style dictionary is valid, DataSource provides the test_excel_styles() static method, which will indicate the keys of any styles that are not valid, along with their error tracebacks:

from mcetl import DataSource
from openpyxl.styles import Font, NamedStyle

good_styles = {
    'header_even': NamedStyle(
            name='header_even',
            font=Font(size=12, bold=True),
        ),
    'header_odd': {
        'font': dict(size=12, bold=True),
    }
}

bad_styles = {
    'header_even': dict(
            name='Header',
            font=Font(size=12, bold=True),
            number_format=1 # wrong since number_format must be a string or None
        ),
    'header_odd': {
        'font': dict(size='string'), # wrong since font size must be a float
    }
}

DataSource.test_excel_styles(good_styles) # returns True
DataSource.test_excel_styles(bad_styles) # returns False

Example Data & Programs

mcetl comes with the ability to generate raw data to emulate several different materials characterization techniques. Further, example programs are available to show basic usage of mcetl.

Generating Example Data

Files for example data from characterization techniques can be created using:

from mcetl import raw_data
raw_data.generate_raw_data()

Data produced by the generate_raw_data function covers the following characterization techniques:

  • X-ray diffraction (XRD)

  • Fourier-transform infrared spectroscopy (FTIR)

  • Raman spectroscopy

  • Thermogravimetric analysis (TGA)

  • Differential scanning calorimetry (DSC)

  • Rheometry

  • Uniaxial tensile tests

  • Pore size measurements

Example Programs

Example programs are available to show basic usage of mcetl. The examples include:

  • Generating raw data

  • Using the main GUI

  • Using the fitting GUI

  • Using the plotting GUI

  • Reopening a figure saved with the plotting GUI

The example program for using the main GUI contains all of the necessary inputs for processing the example raw data generated by the generate_raw_data function described above, and is an excellent resource for creating new DataSource objects.

API Reference

Warning

mcetl is going to switch GUI backends in a later version (v0.5 or v0.6), so any API that is not referenced in the Quick Start or Tutorial sections should be used with caution, since it is liable to be changed or removed.

mcetl

mcetl provides user interfaces for processing, fitting, and plotting data.

mcetl is focused on reducing the time required to process data files. It does this by allowing the user to define DataSource objects, which contain the information for reading files for that DataSource (such as what separator to use, which rows and columns to use, labels for the columns, etc.), the calculations that will be performed on the data, and the options for writing the data to Excel (formatting, placement in the worksheet, etc.).

In addition, mcetl provides fitting and plotting user interfaces that can be used without any prior setup.

Subpackages include:

mcetl.fitting

Contains functions that ease the fitting of data. The main entry is through mcetl.fitting.launch_fitting_gui, but the module also contains useful functions that can be used without launching a GUI.

mcetl.plotting

Contains functions that ease the plotting of data. The main entry is through mcetl.plotting.launch_plotting_gui. Can also reopen a previously saved figure using mcetl.plotting.load_previous_figure.

@author: Donald Erb Created on Jul 15, 2020

Subpackages

mcetl.fitting

Contains functions that ease the fitting of data. The main entry is through mcetl.fitting.launch_fitting_gui, but the module also contains useful functions that can be used without launching a GUI.

@author: Donald Erb Created on Nov 15, 2020

Submodules
mcetl.fitting.fitting_gui

Provides a GUI to fit data using lmfit Models and save the results to Excel.

@author: Donald Erb Created on May 24, 2020

Notes

openpyxl is imported within fit_to_excel to reduce the import time of the module, and is only imported if saving fit results to Excel.

Module Contents
Classes

ResultsPlot

Shows the results of a fit to allow user to decide if fit was acceptable.

SimpleEmbeddedFigure

A window containing just an embedded figure and a close button.

Functions

fit_dataframe

Creates a GUI to select data from a dataframe for fitting.

fit_to_excel

Outputs the relevant data from peak fitting to an Excel file.

launch_fitting_gui

Convenience function to fit dataframe(s) and write their results to Excel.

class mcetl.fitting.fitting_gui.ResultsPlot(fit_result)

Bases: mcetl.plot_utils.EmbeddedFigure

Shows the results of a fit to allow user to decide if fit was acceptable.

Parameters

fit_result (lmfit.ModelResult) -- The fit result to display.

Initialize self. See help(type(self)) for accurate signature.

event_loop(self)

Handles the event loop for the GUI.

Returns

Returns True if the Continue button was pressed, otherwise False.

Return type

bool

class mcetl.fitting.fitting_gui.SimpleEmbeddedFigure(dataframe, gui_values)

Bases: mcetl.plot_utils.EmbeddedFigure

A window containing just an embedded figure and a close button.

Parameters
  • dataframe (pd.DataFrame) -- The dataframe that contains the x and y data.

  • gui_values (dict) -- A dictionary of values needed for plotting.

Initialize self. See help(type(self)) for accurate signature.

event_loop(self)

Handles the event loop for the GUI.

Notes

This function should typically be overridden by a subclass, which should return any desired values from the embedded figure.

This simple implementation makes the window visible, and closes the window as soon as anything in the window is clicked.

mcetl.fitting.fitting_gui.fit_dataframe(dataframe, user_inputs=None)

Creates a GUI to select data from a dataframe for fitting.

Parameters
  • dataframe (pd.DataFrame) -- A pandas dataframe containing the data to be fit.

  • user_inputs (dict, optional) -- Values to use as the default inputs in the GUI.

Returns

  • fit_result (lmfit.model.ModelResult or None) -- A lmfit.ModelResult object, which gives information for the fit done on the dataset. Is None if fitting was skipped.

  • values_df (pd.DataFrame or None) -- The dataframe containing the x and y data, the y data for every individual model, the summed y data of all models, and the background, if present. Is None if fitting was skipped.

  • params_df (pd.DataFrame or None) -- The dataframe containing the value and standard error associated with all of the parameters in the fitting (eg. coefficients for the baseline, areas and sigmas for each peak). Is None if fitting was skipped.

  • descriptors_df (pd.DataFrame or None) -- The dataframe which contains some additional information about the fitting. Currently has the adjusted r squared, reduced chi squared, the Akaike information criteria, the Bayesian information criteria, and the minimization method used for fitting. Is None if fitting was skipped.

  • gui_values (dict or None) -- The values selected in the GUI for all of the various fields, which can be used to reuse the values from a past interation. Is None if fitting was skipped.
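
As a minimal sketch (the Gaussian data below is a placeholder), the returned values can be unpacked as follows; the call opens the fitting GUI:

import numpy as np
import pandas as pd
from mcetl.fitting import fitting_gui

x = np.linspace(-5, 5, 100)
data = pd.DataFrame({'x': x, 'y': np.exp(-x**2 / 2)})
# opens the fitting GUI; all five outputs are None if fitting is skipped
fit_result, values_df, params_df, descriptors_df, gui_values = fitting_gui.fit_dataframe(data)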

mcetl.fitting.fitting_gui.fit_to_excel(values_dataframe, params_dataframe, descriptors_dataframe, excel_writer_handler, sheet_name=None, plot_excel=False)

Outputs the relevant data from peak fitting to an Excel file.

Parameters
  • values_dataframe (pd.DataFrame) -- The dataframe containing the x and y data, the y data for every individual model, the summed y data of all models, and the background, if present.

  • params_dataframe (pd.DataFrame) -- The dataframe containing the value and standard error associated with all of the parameters in the fitting (eg. coefficients for the baseline, areas and sigmas for each peak).

  • descriptors_dataframe (pd.DataFrame) -- The dataframe which contains some additional information about the fitting. Currently has the adjusted r squared, reduced chi squared, the Akaike information criteria, the Bayesian information criteria, and the minimization method used for fitting.

  • excel_writer_handler (mcetl.excel_writer.ExcelWriterHandler) -- The ExcelWriterHandler that contains the pandas ExcelWriter object for writing to Excel, and the styles for styling the cells in Excel.

  • sheet_name (str, optional) -- The Excel sheet name.

  • plot_excel (bool, optional) -- If True, will create a simple plot in Excel that plots the raw x and y data, the data for each peak, the total of all peaks, and the background if it is present.
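
A sketch of writing the fit output, assuming values_df, params_df, and descriptors_df are the dataframes returned by fit_dataframe above, and that 'fit_results.xlsx' is a hypothetical file name:

from mcetl.excel_writer import ExcelWriterHandler
from mcetl.fitting.fitting_gui import fit_to_excel

# values_df, params_df, and descriptors_df come from fit_dataframe
handler = ExcelWriterHandler(file_name='fit_results.xlsx', new_file=True)
fit_to_excel(values_df, params_df, descriptors_df, handler,
             sheet_name='Fit 1', plot_excel=True)
handler.save_excel_file()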

mcetl.fitting.fitting_gui.launch_fitting_gui(dataframe=None, gui_values=None, excel_writer=None, save_excel=True, plot_excel=True, mpl_changes=None, save_when_done=True, excel_formats=None)

Convenience function to fit dataframe(s) and write their results to Excel.

Parameters
  • dataframe (pd.DataFrame or list/tuple, optional) -- The dataframe or list/tuple of dataframes to fit.

  • gui_values (dict, optional) -- A dictionary containing the default gui values to pass to fit_dataframe.

  • excel_writer (pd.ExcelWriter, optional) -- The Excel writer used to save the results to Excel. If input, the engine must be "openpyxl".

  • save_excel (bool, optional) -- If True (default), then the fit results will be saved to an Excel file.

  • plot_excel (bool, optional) -- If True (default), then the fit results will be plotted in the Excel file (if saving).

  • mpl_changes (dict, optional) -- A dictionary of changes to apply to matplotlib's rcParams file, which affects how plots look.

  • save_when_done (bool, optional) -- If True (default), then the Excel file will be saved once all dataframes are fit.

  • excel_formats (dict, optional) --

    A dictionary of formats to use when writing to Excel. The dictionary must have one of the following keys:

    'fitting_header_even', 'fitting_header_odd', 'fitting_subheader_even', 'fitting_subheader_odd', 'fitting_columns_even', 'fitting_columns_odd', 'fitting_descriptors_even', 'fitting_descriptors_odd'

    The values for each key must be either a dictionary, with keys in this internal dictionary representing keyword arguments for openpyxl's NamedStyle, or openpyxl.styles.NamedStyle objects. See mcetl.excel_writer.ExcelWriterHandler.styles as an example for the dictionary.

Returns

  • fit_results (list(lmfit.models.ModelResult)) -- A list of lmfit.ModelResult objects, which give information for each of the fits done on the dataframes.

  • gui_values (dict) -- The values selected in the GUI, which can be passed back as the gui_values input to reuse them in a later session.

  • proceed (bool) -- True if the fitting gui was not exited from prematurely, otherwise, the value is False. Useful when calling this function from an outside function that needs to know whether to continue doing peak fitting.
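
For example, a minimal sketch that fits a single dataframe (placeholder Gaussian data) through the GUI:

import numpy as np
import pandas as pd
from mcetl import fitting

x = np.linspace(-5, 5, 100)
data = pd.DataFrame({'x': x, 'y': np.exp(-x**2 / 2)})
fit_results, gui_values, proceed = fitting.launch_fitting_gui(dataframe=data)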

mcetl.fitting.fitting_utils

Provides utility functions, classes, and constants for the fitting module.

Useful functions are put here in order to prevent circular importing within the other files.

@author : Donald Erb Created on Nov 18, 2020

Notes

The functions get_model_name, get_model_object, and get_gui_name are to be used, rather than referring to the constants _TOTAL_MODELS and _GUI_MODELS because the implementation of these constants may change in the future while the output of the functions can be kept constant.

Module Contents
Functions

get_gui_name

Returns the name used in GUIs for the input model, eg. 'GaussianModel' -> 'Gaussian'.

get_is_peak

Determines if the input model is registered as a peak function.

get_model_name

Converts the model name used in GUIs to the model class name.

get_model_object

Returns the model object given a model name, eg. 'Gaussian' -> lmfit.models.GaussianModel.

numerical_area

Computes the numerical area of a peak using the trapezoidal method.

numerical_extremum

Computes the numerical maximum or minimum for a peak.

numerical_fwhm

Computes the numerical full-width-at-half-maximum of a peak.

numerical_mode

Computes the numerical mode (x-value at which the extremum occurs) of a peak.

print_available_models

Prints out a dictionary of all models supported by mcetl.

r_squared

Calculates r^2 and adjusted r^2 for a fit.

r_squared_model_result

Calculates r^2 and adjusted r^2 for a fit, given an lmfit.ModelResult.

subtract_linear_background

Returns y-values after subtracting a linear background constructed from points.

mcetl.fitting.fitting_utils.get_gui_name(model)

Returns the name used in GUIs for the input model, eg. 'GaussianModel' -> 'Gaussian'.

Parameters

model (str) -- The input model string.

Returns

The model's name as it appears in GUIs.

Return type

str

Notes

This is a convenience function to be used so that the internals of how model names are specified can change while code can always use this function.

mcetl.fitting.fitting_utils.get_is_peak(model)

Determines if the input model is registered as a peak function.

Parameters

model (str) -- The name of the model. Can either be the GUI name (eg. 'Gaussian') or the class name (eg. 'GaussianModel').

Returns

is_peak -- True if the input model is within _TOTAL_MODELS and _TOTAL_MODELS[model]['is_peak'] is True. If the model cannot be found, or if _TOTAL_MODELS[model]['is_peak'] is False, then returns False.

Return type

bool

mcetl.fitting.fitting_utils.get_model_name(model)

Converts the model name used in GUIs to the model class name.

For example, converts 'Gaussian' to 'GaussianModel' and 'Skewed Gaussian' to 'SkewedGaussianModel'.

Useful so that code can always refer to the original model name and not be affected if the name of the model used in GUIs changes. Also ensures that user-input model names are correctly interpreted.

Parameters

model (str) -- The model name used within a GUI or input by a user.

Returns

output -- The class name of the model, which can give other information by using _TOTAL_MODELS[output].

Return type

str

Raises

KeyError -- Raised if the input model name is not valid.

mcetl.fitting.fitting_utils.get_model_object(model)

Returns the model object given a model name, eg. 'Gaussian' -> lmfit.models.GaussianModel.

Parameters

model (str) -- The name of the model desired. Can either be the model class name, such as 'GaussianModel', or the name used in GUIs, such as 'Gaussian'.

Returns

output -- The class corresponding to the input model name.

Return type

lmfit.Model
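
A brief sketch showing how the three naming helpers relate (expected outputs noted in the comments):

from mcetl.fitting import fitting_utils

fitting_utils.get_gui_name('GaussianModel')   # 'Gaussian'
fitting_utils.get_model_name('Gaussian')      # 'GaussianModel'
model_class = fitting_utils.get_model_object('Gaussian')
model = model_class()  # an lmfit.models.GaussianModel instance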

mcetl.fitting.fitting_utils.numerical_area(x, y)

Computes the numerical area of a peak using the trapezoidal method.

Parameters
  • x (array-like) -- The x-values for the peak model.

  • y (array-like) -- The y-values for the peak model.

Returns

The integrated area of the peak, using the trapezoidal method.

Return type

float

mcetl.fitting.fitting_utils.numerical_extremum(y)

Computes the numerical maximum or minimum for a peak.

Parameters

y (array-like) -- The y-values for the peak model.

Returns

extremum -- The extremum with the highest absolute value from the input y data. Assumes the peak is negative if abs(min(y)) > abs(max(y)).

Return type

float

Notes

Uses np.nanmin and np.nanmax instead of min and max in order to convert the output to a numpy dtype. This way, all of the numerical calculations using the output of this function will work, even if y is a list or tuple.

mcetl.fitting.fitting_utils.numerical_fwhm(x, y)

Computes the numerical full-width-at-half-maximum of a peak.

Parameters
  • x (array-like) -- The x-values for the peak model.

  • y (array-like) -- The y-values for the peak model.

Returns

The calculated full-width-at-half-max of the peak. If there are not at least two x-values at which y = extremum_y / 2, then None is returned.

Return type

float or None

Notes

First finds the x-values where y - extremum_y / 2 changes signs, and then uses linear interpolation to approximate the x-values at which y - extremum_y / 2 = 0.

mcetl.fitting.fitting_utils.numerical_mode(x, y)

Computes the numerical mode (x-value at which the extremum occurs) of a peak.

Parameters
  • x (array-like) -- The x-values for the peak model.

  • y (array-like) -- The y-values for the peak model.

Returns

The x-value at which the extremum in y occurs.

Return type

float
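
As a quick sanity check of these numerical helpers on a unit-height Gaussian (approximate expected values in the comments):

import numpy as np
from mcetl.fitting import fitting_utils

x = np.linspace(-5, 5, 501)
y = np.exp(-x**2 / 2)  # Gaussian peak with sigma=1 and height=1

fitting_utils.numerical_area(x, y)      # ~sqrt(2 * pi) ~= 2.507
fitting_utils.numerical_extremum(y)     # ~1.0
fitting_utils.numerical_fwhm(x, y)      # ~2.355, i.e. 2 * sqrt(2 * ln(2))
fitting_utils.numerical_mode(x, y)      # ~0.0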

mcetl.fitting.fitting_utils.print_available_models()

Prints out a dictionary of all models supported by mcetl.

Also prints out details for each model, including its class, the name used when displaying in GUIs, its parameters, and whether it is considered a peak function.

mcetl.fitting.fitting_utils.r_squared(y_data, y_fit, num_variables=1)

Calculates r^2 and adjusted r^2 for a fit.

Parameters
  • y_data (array-like) -- The experimental y data.

  • y_fit (array-like) -- The calculated y from fitting.

  • num_variables (int, optional) -- The number of variables used by the fitting model.

Returns

  • r_sq (float) -- The r squared value for the fitting.

  • r_sq_adj (float) -- The adjusted r squared value for the fitting, which takes into account the number of variables in the fitting model.
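
A short sketch with placeholder data:

import numpy as np
from mcetl.fitting import fitting_utils

y_data = np.array([1.1, 1.9, 3.2, 3.9])
y_fit = np.array([1.0, 2.0, 3.0, 4.0])
r_sq, r_sq_adj = fitting_utils.r_squared(y_data, y_fit, num_variables=2)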

mcetl.fitting.fitting_utils.r_squared_model_result(fit_result)

Calculates r^2 and adjusted r^2 for a fit, given an lmfit.ModelResult.

Parameters

fit_result (lmfit.ModelResult) -- The ModelResult object from a fit.

Returns

The r^2 and adjusted r^2 values for the fitting.

Return type

tuple(float, float)

mcetl.fitting.fitting_utils.subtract_linear_background(x, y, background_points)

Returns y-values after subtracting a linear background constructed from points.

Parameters
  • x (array-like) -- The x-values of the data.

  • y (array-like) -- The y-values of the data.

  • background_points (list(list(float, float))) -- A list containing the [x, y] values for each point representing the background.

Returns

y_subtracted -- The input y-values, after subtracting the background.

Return type

np.ndarray

Notes

Assumes the background is represented by lines connecting each of the specified background points.
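
A sketch with placeholder data, anchoring the background at the two edges of the data:

import numpy as np
from mcetl.fitting import fitting_utils

x = np.linspace(0, 10, 101)
y = 0.5 * x + np.exp(-(x - 5)**2)  # a peak sitting on a linear background

# [x, y] points defining the background (placeholder values)
points = [[0.0, 0.0], [10.0, 5.0]]
y_subtracted = fitting_utils.subtract_linear_background(x, y, points)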

mcetl.fitting.models

Provides additional models for fitting data.

@author : Donald Erb Created on Nov 29, 2020

Module Contents
Classes

BreitWignerFanoModel

A modified version of lmfit's BreitWignerModel.

Functions

breit_wigner_fano_alt

An alternate Breit-Wigner-Fano lineshape that uses height rather than amplitude.

class mcetl.fitting.models.BreitWignerFanoModel(independent_vars=['x'], prefix='', nan_policy='raise', **kwargs)

Bases: lmfit.model.Model

A modified version of lmfit's BreitWignerModel.

Initializes with q=-3 rather than q=1, which gives a better peak. Also defines terms for fwhm, height, x_mode, and maximum that were not included in lmfit's implementation, where x_mode is the x position at the maximum y-value, and maximum is the maximum y-value. Further, the lmfit implementation uses amplitude as the value of bwf as abs(x) approaches infinity, which is different from most other peaks, where amplitude is defined as the peak area.

Compared to lmfit's implementation, height = amplitude * q**2, and sigma (this model) = sigma (lmfit) / 2.

guess(self, data, x=None, negative=False, **kwargs)

Estimate initial model parameter values from data.

mcetl.fitting.models.breit_wigner_fano_alt(x, height=1.0, center=0.0, sigma=1.0, q=-3.0)

An alternate Breit-Wigner-Fano lineshape that uses height rather than amplitude.

breit_wigner_fano(x, height, center, sigma, q) =

height * (1 + (x - center) / (q * sigma))**2 / (1 + ((x - center) / sigma)**2)
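
A minimal sketch that generates data with breit_wigner_fano_alt and fits it with BreitWignerFanoModel (the parameter values are placeholders):

import numpy as np
from mcetl.fitting.models import BreitWignerFanoModel, breit_wigner_fano_alt

x = np.linspace(-10, 10, 500)
y = breit_wigner_fano_alt(x, height=2.0, center=1.0, sigma=0.5, q=-3.0)

model = BreitWignerFanoModel()
params = model.guess(y, x=x)
result = model.fit(y, params, x=x)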

mcetl.fitting.peak_fitting

Functions for creating and fitting a model with peaks and a background and plotting the results.

Also contains two classes that create windows to allow selection of peak positions and background points.

@author: Donald Erb Created on Sep 14, 2019

Module Contents
Classes

BackgroundSelector

A window for selecting points to define the background of data.

PeakSelector

A window for selecting peaks on a plot, along with peak width and type.

Functions

find_peak_centers

Creates a list containing the peaks found and peaks accepted.

fit_peaks

Takes x,y data, finds the peaks, fits the peaks, and returns all relevant information.

plot_confidence_intervals

Plot the data, the fit, and the fit +- n_sig * sigma confidence intervals.

plot_fit_results

Plot the raw data, best fit, and residuals.

plot_individual_peaks

Plots each individual peak in the composite model and the total model.

plot_peaks_for_model

Plot the peaks found or added, as well as the ones found but rejected from the fitting.

class mcetl.fitting.peak_fitting.BackgroundSelector(x, y, click_list=None)

Bases: mcetl.plot_utils.EmbeddedFigure

A window for selecting points to define the background of data.

Parameters
  • x (array-like) -- The x-values to be plotted.

  • y (array-like) -- The y-values to be plotted.

  • click_list (list(list(float, float)), optional) -- A list of selected points (lists of x, y values) on the plot.

axis_2

The secondary axis on the figure which has no events.

Type

plt.Axes

Initialize self. See help(type(self)) for accurate signature.

event_loop(self)

Handles the event loop for the GUI.

Returns

A list of the [x, y] values for the selected points on the figure.

Return type

list(list(float, float))

class mcetl.fitting.peak_fitting.PeakSelector(x, y, click_list=None, initial_peak_width=1, subtract_background=False, background_type='PolynomialModel', background_kwargs=None, bkg_min=-np.inf, bkg_max=np.inf, default_model=None)

Bases: mcetl.plot_utils.EmbeddedFigure

A window for selecting peaks on a plot, along with peak width and type.

Parameters
  • x (array-like) -- The x-values to be plotted and fitted.

  • y (array-like) -- The y-values to be plotted and fitted.

  • click_list (list, optional) -- A nested list, with each entry corresponding to a peak. Each entry has the following layout: [[lmfit model, sigma function, amplitude function], [peak center, peak height, peak width]] where lmfit model is something like 'GaussianModel'. The first entry in the list comes directly from the mcetl.peak_fitting.peak_transformer function.

  • initial_peak_width (int or float) -- The initial peak width input in the plot.

  • subtract_background (bool) -- If True, will subtract the background before showing the plot.

  • background_type (str) -- String corresponding to a model in lmfit.models; used to fit the background.

  • background_kwargs (dict) -- Any keyword arguments needed to initialize the background model

  • bkg_min (float or int) -- Minimum x value to use for initially fitting the background.

  • bkg_max (float or int) -- Maximum x value to use for initially fitting the background.

  • default_model (str) -- The initial model to have selected on the plot, corresponds to a model in lmfit.models.

background

The y-values for the background function.

Type

array-like

Initialize self. See help(type(self)) for accurate signature.

event_loop(self)

Handles the event loop for the GUI.

Returns

A nested list, with each entry corresponding to a peak. Each entry has the following layout: [model, peak center, peak height, peak width] where model is something like 'Gaussian'. The first entry in the list comes directly from the mcetl.peak_fitting.peak_transformer function.

Return type

list

mcetl.fitting.peak_fitting.find_peak_centers(x, y, additional_peaks=None, height=None, prominence=np.inf, x_min=-np.inf, x_max=np.inf)

Creates a list containing the peaks found and peaks accepted.

Parameters
  • x, y (array-like) -- The x and y values for the fitting.

  • additional_peaks (list, optional) -- Peak centers that are input by the user and automatically accepted as peaks if within x_min and x_max.

  • height (int or float, optional) -- Height that a peak must have for it to be considered a peak by scipy's find_peaks.

  • prominence (int or float) -- Prominence that a peak must have for it to be considered a peak by scipy's find_peaks.

  • x_min (int or float) -- Minimum x value used in fitting the data.

  • x_max (int or float) -- Maximum x value used in fitting the data.

Returns

  • peaks_found (list) -- A list of all the peak centers found.

  • peaks_accepted (list) -- A list of peak centers that were accepted because they were within x_min and x_max.

Notes

Uses scipy's signal.find_peaks to find peaks matching the specifications, and adds those peaks to a list of additionally specified peaks.

mcetl.fitting.peak_fitting.fit_peaks(x, y, height=None, prominence=np.inf, center_offset=1.0, peak_width=1.0, default_model='PseudoVoigtModel', subtract_background=False, bkg_min=-np.inf, bkg_max=np.inf, min_sigma=0.0, max_sigma=np.inf, min_method='least_squares', x_min=-np.inf, x_max=np.inf, additional_peaks=None, model_list=None, background_type='PolynomialModel', background_kwargs=None, fit_kws=None, vary_voigt=False, fit_residuals=False, num_resid_fits=5, min_resid=0.05, debug=False, peak_heights=None)

Takes x,y data, finds the peaks, fits the peaks, and returns all relevant information.

Parameters
  • x, y (array-like) -- The x and y values for the fitting.

  • height (int or float) -- Height that a peak must have for it to be considered a peak by scipy's find_peaks.

  • prominence (int or float) -- Prominence that a peak must have for it to be considered a peak by scipy's find_peaks.

  • center_offset (int or float) -- Value that determines the min and max parameters for the center of each peak. min = center - center_offset, max = center + center_offset.

  • peak_width (int or float) -- A guess at the full-width-half-max of the peaks. When first estimating the peak parameters, only the data from x - peak_width / 2 to x + peak_width / 2 is used to fit the peak.

  • default_model (str) -- Model used if the model_list is None or does not have enough values for the number of peaks found. Must correspond to one of the built-in models in lmfit.models.

  • subtract_background (bool) -- If True, it will fit the background with the model in background_type.

  • bkg_min (int or float) -- Minimum x value to use for initially fitting the background.

  • bkg_max (int or float) -- Maximum x value to use for initially fitting the background.

  • min_sigma (int or float) -- Minimum value for the sigma for each peak; typically better to not set to any value other than 0.

  • max_sigma (int or float) -- maximum value for the sigma for each peak; typically better to not set to any value other than infinity.

  • min_method (str) -- Minimization method used for fitting.

  • x_min (int or float) -- Minimum x value used in fitting the data.

  • x_max (int or float) -- Maximum x value used in fitting the data.

  • additional_peaks (list) -- Peak centers that are input by the user and automatically accepted as peaks.

  • model_list (list) -- List of strings, with each string corresponding to one of the models in lmfit.models.

  • background_type (str) -- String corresponding to a model in lmfit.models; used to fit the background.

  • background_kwargs (dict, optional) -- Any keyword arguments needed to initialize the background model.

  • fit_kws (dict) -- Keywords to be passed on to the minimizer.

  • vary_voigt (bool) -- If True, will allow the gamma parameter in the Voigt model to be varied as an additional variable; if False, gamma will be set equal to sigma.

  • fit_residuals (bool) -- If True, it will attempt to fit the residuals after the first fitting to find hidden peaks, add them to the model, and fit the data again.

  • num_resid_fits (int) -- Maximum number of times the program will loop to fit the residuals.

  • min_resid (int or float) -- Used as the prominence when finding peaks in the residuals; the prominence is set to min_resid * (y_max - y_min).

  • debug (bool) -- If True, will create plots at various portions of the code showing the peak initialization, initial fits and backgrounds, and residuals.

  • peak_heights (list) -- A list of peak heights.

Returns

output -- A dictionary of lists. For most lists, each entry corresponds to the results of a peak fitting. The dictionary has the following keys (in this order if unpacked into a list):

'resid_peaks_found':

All peak centers found during fitting of the residuals. Each list entry corresponds to a separate fitting.

'resid_peaks_accepted':

Peak centers found during fitting of the residuals which were accepted as true peak centers. Each list entry corresponds to a separate fitting.

'peaks_found':

All peak centers found during the initial peak fitting of the original data.

'peaks_accepted':

Peak centers found during the initial peak fitting of the original data which were accepted as true peak centers.

'initial_fits':

List of initial fits. Each list entry corresponds to a separate fitting.

'fit_results':

List of lmfit's ModelResult objects, which contain the majority of information needed. Each list entry corresponds to a separate fitting.

'individual_peaks':

Nested list of y-data values for each peak. Each list entry corresponds to a separate fitting.

'best_values':

Nested list of best fit parameters (such as amplitude, fwhm, etc.) for each fitting. Each list entry corresponds to a separate fitting.

Return type

dict

Notes

Uses several of the functions within this module to directly take (x,y) data and do peak fitting. All relevant data is returned from this function, so it has quite a dense input and output, but it is well worth it!

For the minimization method (min_method), the 'least_squares' method gives fast performance compared to most other methods while still having good convergence. The 'leastsq' method is also very good, and seems less susceptible to getting caught in local minima compared to 'least_squares', but takes longer to evaluate. A best practice could be to use 'least_squares' during initial testing and then use 'leastsq' for final calculations.

Global optimizers do not seem to work on complicated models (or rather I cannot get them to work), so using the local optimizers 'least_squares' or 'leastsq' and adjusting the inputs will have to suffice.

Most relevant data is contained within output['fit_results'], such as the parameters for each peak, as well as all the error associated with the fitting.

Within this function:

params == parameters for all of the peaks
bkg_params == parameters for the background
composite_params == parameters for peaks + background

model == CompositeModel object for all of the peaks
background == Model object for the background
composite_model == CompositeModel object for the peaks + background

peaks_accepted are not necessarily the peak centers after fitting, just the peak centers that were found. These values can be used for understanding the peak selection process.
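
A minimal sketch with two synthetic Gaussian peaks (the height and prominence values are placeholders to tune for real data):

import numpy as np
from mcetl.fitting import peak_fitting

x = np.linspace(0, 100, 1000)
y = 5 * np.exp(-(x - 40)**2 / 8) + 3 * np.exp(-(x - 60)**2 / 18)

output = peak_fitting.fit_peaks(x, y, height=1, prominence=0.5)
fit_result = output['fit_results'][-1]  # ModelResult of the final fitting
peak_fitting.plot_fit_results(fit_result)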

mcetl.fitting.peak_fitting.plot_confidence_intervals(fit_result, n_sig=3, return_figure=False)

Plot the data, the fit, and the fit +- n_sig * sigma confidence intervals.

Parameters
  • fit_result (lmfit.ModelResult) -- The ModelResult object for the fitting.

  • n_sig (float, optional) -- The multiple of the standard error to use as the plotted error. The default is 3.

  • return_figure (bool, optional) -- If True, will return the created figure; otherwise, it will display the figure using plt.show(block=False) (default).

Returns

If return_figure is True, the figure is returned; otherwise, None is returned.

Return type

plt.Figure or None

Notes

This function assumes that the independent variable in the fit_result was labeled 'x', as is standard for all built-in lmfit models.

mcetl.fitting.peak_fitting.plot_fit_results(fit_result, label_rsq=False, plot_initial=False, return_figure=False)

Plot the raw data, best fit, and residuals.

Parameters
  • fit_result (lmfit.ModelResult or list(lmfit.ModelResult)) -- A ModelResult object from lmfit; can be a list of ModelResults, in which case, the initial fit will use fit_result[0], and the best fit will use fit_result[-1].

  • label_rsq (bool, optional) -- If True, will put a label with the adjusted r squared value of the fitting.

  • plot_initial (bool, optional) -- If True, will plot the initial fitting as well as the best fit.

  • return_figure (bool, optional) -- If True, will return the created figure; otherwise, it will display the figure using plt.show(block=False) (default).

Returns

If return_figure is True, the figure is returned; otherwise, None is returned.

Return type

plt.Figure or None

Notes

This function assumes that the independent variable in the fit_result was labeled 'x', as is standard for all built-in lmfit models.

mcetl.fitting.peak_fitting.plot_individual_peaks(fit_result, individual_peaks, background_subtracted=False, plot_subtract_background=False, plot_separate_background=False, plot_w_background=False, return_figures=False)

Plots each individual peak in the composite model and the total model.

Parameters
  • fit_result (lmfit.ModelResult) -- The ModelResult object from the fitting.

  • individual_peaks (dict) -- A dictionary with keys corresponding to the model prefixes in the fitting, and their values corresponding to their y values.

  • background_subtracted (bool) -- Whether or not the background was subtracted in the fitting.

  • plot_subtract_background (bool) -- If True, subtracts the background from the raw data as well as all the peaks, so everything is nearly flat.

  • plot_separate_background (bool) -- If True, subtracts the background from just the individual peaks, while also showing the raw data with the composite model and background.

  • plot_w_background (bool) -- If True, has the background added to all peaks, i.e. it shows exactly how the composite model fits the raw data.

  • return_figures (bool, optional) -- If True, will return the created figures; otherwise, it will display the figures using plt.show(block=False) (default).

Returns

If return_figures is True, the list of created figures is returned. Otherwise, None is returned.

Return type

list(plt.Figure) or None

Notes

This function assumes that the independent variable in the fit_result was labeled 'x', as is standard for all built-in lmfit models.

mcetl.fitting.peak_fitting.plot_peaks_for_model(x, y, x_min, x_max, peaks_found, peaks_accepted, additional_peaks)

Plot the peaks found or added, as well as the ones found but rejected from the fitting.

Parameters
  • x (array-like) -- x data used in the fitting.

  • y (array-like) -- y data used in the fitting.

  • x_min (float or int) -- Minimum x values used for the fitting procedure.

  • x_max (float or int) -- Maximum x values used for the fitting procedure.

  • peaks_found (list) -- A list of x values corresponding to all peaks found throughout the peak fitting and peak finding process, as well as those input by the user.

  • peaks_accepted (list) -- A list of x values corresponding to peaks found throughout the peak fitting and peak finding process that were accepted into the model.

  • additional_peaks (list) -- A list of peak centers that were input by the user.

mcetl.plotting

Contains functions that ease the plotting of data. The main entry is through mcetl.plotting.launch_plotting_gui. Can also reopen a previously saved figure using mcetl.plotting.load_previous_figure.

@author: Donald Erb Created on Nov 15, 2020

Submodules
mcetl.plotting.plotting_gui

GUIs to plot data using various plot layouts and save the resulting figures.

@author: Donald Erb Created on Jun 28, 2020

mcetl.plotting.plotting_gui.COLORS

A tuple with values that are used in GUIs to select the color to plot with in matplotlib. The default is ('None', 'Black', 'Blue', 'Red', 'Green', 'Chocolate', 'Magenta', 'Cyan', 'Orange', 'Coral', 'Dodgerblue').

Type

tuple(str, ..)

mcetl.plotting.plotting_gui.LINE_MAPPING

A dictionary with keys that are displayed in GUIs, and values that are used by matplotlib to specify the line style. The default is:

{
    'None': '',
    'Solid': '-',
    'Dashed': '--',
    'Dash-Dot': '-.',
    'Dot': ':',
    'Dash-Dot-Dot': (0, [0.75 * plt.rcParams['lines.dashdot_pattern'][0]]
                     + plt.rcParams['lines.dashdot_pattern'][1:]
                     + plt.rcParams['lines.dashdot_pattern'][-2:])
}

Type

dict

mcetl.plotting.plotting_gui.MARKERS

A tuple of strings for the default markers to use for plotting. The default is (' None', 'o Circle', 's Square', '^ Triangle-Up', 'D Diamond', 'v Triangle-Down', 'p Pentagon', '< Triangle-Left', '> Triangle-Right', '* Star').

Type

tuple(str, ..)

mcetl.plotting.plotting_gui.TIGHT_LAYOUT_PAD

The padding placed between the edge of the figure and the edge of the canvas; used by matplotlib's tight_layout option. Default is 0.3.

Type

float

mcetl.plotting.plotting_gui.TIGHT_LAYOUT_H_PAD

The height (vertical) padding between axes in a figure; used by matplotlib's tight_layout option. Default is 0.6.

Type

float

mcetl.plotting.plotting_gui.TIGHT_LAYOUT_W_PAD

The width (horizontal) padding between axes in a figure; used by matplotlib's tight_layout option. Default is 0.6.

Type

float

Module Contents
Functions

launch_plotting_gui

Convenience function to plot lists of dataframes with matplotlib.

load_previous_figure

Load the options and the data to recreate a figure.

mcetl.plotting.plotting_gui.launch_plotting_gui(dataframes=None, mpl_changes=None, input_fig_kwargs=None, input_axes=None, input_values=None)

Convenience function to plot lists of dataframes with matplotlib.

Wraps the plotting in a context manager that applies the changes to the matplotlib rcParams.

Parameters
  • dataframes (list(list(pd.DataFrame))) -- A nested list of lists of pandas DataFrames. Each list of DataFrames will create one figure.

  • mpl_changes (dict) -- Changes to matplotlib's rcParams file to alter the figure.

  • input_fig_kwargs (dict, optional) -- The fig_kwargs from a previous session. Only used if reloading a figure.

  • input_axes (dict, optional) -- A dictionary of plt.Axes objects from a reloaded session.

  • input_values (dict, optional) -- The values needed to recreate the previous gui window from a reloaded figure.

Returns

figures -- A nested list of lists, with each entry containing the matplotlib Figure, and a dictionary containing the Axes.

Return type

list(list(plt.Figure, dict(str, plt.Axes)))
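
A minimal sketch with placeholder data; each inner list of dataframes creates one figure:

import pandas as pd
from mcetl import plotting

data = pd.DataFrame({'x': [0, 1, 2, 3], 'y': [0, 1, 4, 9]})
# opens the plotting GUI with one figure containing the single dataframe
figures = plotting.launch_plotting_gui(dataframes=[[data]])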

mcetl.plotting.plotting_gui.load_previous_figure(filename=None, new_rc_changes=None)

Load the options and the data to recreate a figure.

Parameters
  • filename (str, optional) -- The filepath string to the csv data file to be opened.

  • new_rc_changes (dict, optional) -- New changes to matplotlib's rcParams file to alter the saved figure.

Returns

figures -- A list of figures (with len=1) using the loaded data. If no file is selected, then figures = None.

Return type

list or None

Notes

Will load the data from the csv file specified by filename. If there also exists a .figjson file with the same name as the csv file, it will be loaded to set the figure layout. Otherwise, a new figure is created.
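
A sketch assuming a hypothetical 'saved_figure.csv' produced by the plotting GUI:

from mcetl import plotting

# 'saved_figure.csv' is a hypothetical file saved by the plotting GUI; a
# matching 'saved_figure.figjson' file, if present, restores the layout
figures = plotting.load_previous_figure('saved_figure.csv')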

Submodules

mcetl.data_source

The DataSource class contains all needed information for importing, processing, and saving data.

@author: Donald Erb Created on Jul 31, 2020

Module Contents
Classes

DataSource

Used to give default settings for importing data and various functions based on the source.

class mcetl.data_source.DataSource(name, *, functions=None, column_labels=None, column_numbers=None, start_row=0, end_row=0, separator=None, file_type=None, num_files=1, unique_variables=None, unique_variable_indices=None, xy_plot_indices=None, figure_rcparams=None, excel_writer_styles=None, excel_row_offset=0, excel_column_offset=0, entry_separation=0, sample_separation=0, label_entries=True)

Used to give default settings for importing data and various functions based on the source.

Parameters
  • name (str) -- The name of the DataSource. Used when displaying the DataSource in a GUI.

  • functions (list or tuple, optional) -- A list or tuple of various Function objects (CalculationFunction or PreprocessFunction or SummaryFunction) that will be used to process data for the DataSource. The order the Functions are performed in is as follows: PreprocessFunctions, CalculationFunctions, SummaryFunctions, with functions of the same type being performed in the same order as input.

  • column_labels (tuple(str) or list(str), optional) -- A list/tuple of strings that will be used to label columns in the Excel output, and to label the pandas DataFrame columns for the data.

  • column_numbers (tuple(int) or list(int), optional) -- The indices of the columns to import from raw data files.

  • start_row (int, optional) -- The first row of data to use when importing from raw data files.

  • end_row (int, optional) -- The last row of data to use when importing from raw data files. Counts up from the last row, so the last row is 0, the second to last row is 1, etc.

  • separator (str, optional) -- The separator or delimiter to use to separate data columns when importing from raw data files. For example, ',' for csv files.

  • file_type (str, optional) -- The file extension associated with the data files for the DataSource. For example, 'txt' or 'csv'.

  • num_files (int, optional) -- The number of data files per sample for the DataSource. Only used when using keyword search for files.

  • unique_variables (list(str) or tuple(str), optional) -- The names of all columns from the imported raw data that are needed for calculations. For example, if importing thermogravimetric analysis (TGA) data, the unique_variables could be ['temperature', 'mass'].

  • unique_variable_indices (list(int) or tuple(int), optional) -- The indices of the columns within column_numbers that correspond with each of the input unique_variables.

  • xy_plot_indices (list(int, int) or tuple(int, int), optional) -- The indices of the columns after processing that will be the default columns for plotting in Excel.

  • figure_rcparams (dict, optional) -- A dictionary containing any changes to Matplotlib's rcParams to use if fitting or plotting.

  • excel_writer_styles (dict(str, None or dict or str or openpyxl.styles.named_styles.NamedStyle), optional) --

    A dictionary of styles used to format the output Excel workbook. The following keys are used when writing data from files to Excel:

    'header_even', 'header_odd', 'subheader_even', 'subheader_odd', 'columns_even', 'columns_odd'

    The following keys are used when writing data fit results to Excel:

    'fitting_header_even', 'fitting_header_odd', 'fitting_subheader_even', 'fitting_subheader_odd', 'fitting_columns_even', 'fitting_columns_odd', 'fitting_descriptors_even', 'fitting_descriptors_odd'

    The values for the dictionaries must be either dictionaries, with keys corresponding to keyword inputs for openpyxl's NamedStyle, or NamedStyle objects.

  • excel_row_offset (int, optional) -- The first row to use when writing to Excel. A value of 0 would start at row 1 in Excel, 1 would start at row 2, etc.

  • excel_column_offset (int, optional) -- The first column to use when writing to Excel. A value of 0 would start at column 'A' in Excel, 1 would start at column 'B', etc.

  • entry_separation (int, optional) -- The number of blank columns to insert between data entries when writing to Excel.

  • sample_separation (int, optional) -- The number of blank columns to insert between samples when writing to Excel.

  • label_entries (bool, optional) -- If True, will add a number to the column labels for each entry in a sample if there is more than one entry. For example, the column label 'data' would become 'data, 1', 'data, 2', etc.

excel_styles

A nested dictionary of dictionaries, used to create openpyxl NamedStyle objects to format the output Excel file.

Type

dict(dict)

lengths

A list of lists of lists of integers, corresponding to the number of columns in each individual entry in the total dataframes for the DataSource. Used to split the concatenated dataframe back into individual dataframes for each dataset.

Type

list

references

A list of dictionaries, with each dictionary containing the column numbers for each unique variable and calculation for the merged dataframe of each dataset.

Type

list

Raises
  • ValueError -- Raised if the input name is a blank string, or if either excel_row_offset or excel_column_offset is < 0.

  • TypeError -- Raised if one of the input functions is not a valid mcetl.FunctionBase object.

  • IndexError -- Raised if the number of data columns is less than the number of unique variables.

merge_datasets(self, dataframes)

Merges all entries and samples into one dataframe for each dataset.

Also sets the lengths attribute, which will later be used to separate each merged dataframe back into individual dataframes for each entry.

Parameters

dataframes (list(list(list(pd.DataFrame)))) -- A nested list of list of lists of dataframes.

Returns

merged_dataframes -- A list of dataframes.

Return type

list(pd.DataFrame)

print_column_labels_template(self)

Convenience function that will print a template for all the column headers.

Column headers account for all of the columns imported from raw data, the columns added by CalculationFunctions, and the columns added by SummaryFunctions.

Returns

label_template -- The list of strings that serves as a template for the necessary input for column_labels for the DataSource.

Return type

list(str)

split_into_entries(self, merged_dataframes)

Splits the merged dataset dataframes back into dataframes for each entry.

Parameters

merged_dataframes (list(pd.DataFrame)) -- A list of dataframes. Each dataframe will be split into lists of lists of dataframes.

Returns

split_dataframes -- A list of lists of lists of dataframes, corresponding to entries and samples within each dataset.

Return type

list(list(list(pd.DataFrame)))

static test_excel_styles(styles)

Tests whether the input styles create valid Excel styles with openpyxl.

Parameters

styles (dict(str, None or dict or str or openpyxl.styles.named_styles.NamedStyle)) -- The dictionary of styles to test. Values in the dictionary can either be None, a nested dictionary with the necessary keys and values to create an openpyxl NamedStyle, a string (which would refer to another NamedStyle.name), or openpyxl.styles.NamedStyle objects.

Returns

Returns True if all input styles successfully create openpyxl NamedStyle objects; otherwise, returns False.

Return type

bool

Notes

This is just a wrapper of ExcelWriterHandler.test_excel_styles(), and is included because DataSource is the main-facing object of mcetl and will be used more often.
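
As a minimal sketch, a DataSource for two-column csv files might look like the following (the name, labels, and layout are hypothetical):

from mcetl import DataSource

csv_source = DataSource(
    name='Example CSV',
    column_numbers=[0, 1],
    column_labels=['x data', 'y data'],
    start_row=1,                       # skip a single header row
    separator=',',
    file_type='csv',
    unique_variables=['x', 'y'],
    unique_variable_indices=[0, 1],
    xy_plot_indices=[0, 1],
)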

mcetl.excel_writer

ExcelWriterHandler class, used to safely open and close files and apply styles.

@author : Donald Erb Created on Dec 9, 2020

Notes

openpyxl is imported within methods of ExcelWriterHandler in order to reduce the import time of mcetl since the writer is only needed if writing to Excel.

Module Contents
Classes

ExcelWriterHandler

A helper for pandas's ExcelWriter for opening/saving files and applying styles.

class mcetl.excel_writer.ExcelWriterHandler(file_name=None, new_file=False, styles=None, writer=None, **kwargs)

A helper for pandas's ExcelWriter for opening/saving files and applying styles.

This class is used for ensuring that an existing file is closed before saving and/or writing, if appending, and that all desired styles are ready for usage before writing to Excel. Styles can either be openpyxl NamedStyle objects in order to make the style available in the Excel workbook, or dictionaries detailing the style, such as {'font': Font(...), 'border': Border(...), ...}.

Parameters
  • file_name (str or Path) -- The file name or path for the Excel file to be created.

  • new_file (bool, optional) -- If False (default), will append to an existing file. If True, or if no file currently exists with the given file_name, a new file will be created, even if a file with the same name currently exists.

  • styles (dict(str, None or dict or str or openpyxl.styles.named_styles.NamedStyle)) -- A dictionary of either nested dictionaries used to create openpyxl style objects (Alignment, Font, etc.), a string indicating the name of an openpyxl NamedStyle to use, a NamedStyle object or None (will use the default style if None). All styles in the dictionary will be added to the Excel workbook if they do not currently exist in the workbook. See Examples below to see various valid inputs for styles.

  • writer (pd.ExcelWriter or None) -- The ExcelWriter (_OpenpyxlWriter from pandas) used for writing to Excel. If it is a pd.ExcelWriter, its engine must be "openpyxl".

  • **kwargs -- Any additional keyword arguments to pass to pd.ExcelWriter.

styles

A nested dictionary of dictionaries, used to create openpyxl NamedStyle objects to include in self.writer.book. The styles are used as a class attribute to ensure that the necessary styles are always included in the Excel book.

Type

dict(str, dict)

style_cache

The currently implemented styles within the Excel workbook. Used to quickly apply styles to cells without having to constantly set all of the cell attributes (cell.font, cell.fill, etc.). The dictionary value is a tuple of the cell attribute name and the value to set. Will either be ('style', string indicating NamedStyle.name) if using NamedStyles or ('_style', openpyxl StyleArray) to indicate an anonymous style. The call to set the cell attribute for a desired key would be setattr(cell, *style_cache[key]).

Type

dict(str, tuple(str, str or openpyxl.styles.cell_style.StyleArray))

writer

The ExcelWriter (_OpenpyxlWriter from pandas) used for writing to Excel.

Type

pd.ExcelWriter

Notes

Either file_name or writer must be specified at initialization.

Examples

Below is a partial example of various allowable input styles. Can be an openpyxl NamedStyle, str, None, or dictionary. (Note that NamedStyle, Font, Border, and Side are imported from openpyxl.styles.)

>>> styles = {
        # Would make the style 'Even Header' available in the output Excel file
        'fitting_header_even': NamedStyle(
            name='Even Header',
            font=Font(size=12, bold=True),
            border=Border(bottom=Side(style='thin')),
            number_format='0.0'
        ),
        # Would use same format as 'fitting_header_even'
        'fitting_header_odd': 'Even Header',
        # Basically just replaces NamedStyle from 'fitting_header_even' with
        # dict and removes the 'name' key. A new style would not be created
        # in the output Excel file.
        'fitting_subheader_even': dict(
            font=Font(size=12, bold=True),
            border=Border(bottom=Side(style='thin')),
            number_format='0.0'
        ),
        # Same as 'fitting_subheader_even', but doesn't require importing
        # from openpyxl. Basically just replaces all openpyxl objects with dict.
        'fitting_subheader_odd': dict(
            font=dict(size=12, bold=True),
            border=dict(bottom=dict(style='thin')),
            number_format='0.0'
        ),
        # Same as 'fitting_subheader_odd', but will create a NamedStyle (and
        # add the style to the Excel file) since 'name' is within the dictionary.
        'fitting_columns_even': dict(
            name='New Style',
            font=dict(size=12, bold=True),
            border=dict(bottom=dict(style='thin')),
            number_format='0.0'
        ),
        # Would use default style set by openpyxl
        'fitting_columns_odd': {},
        # Would also use default style
        'fitting_descriptors_even': None
    }
Raises
  • TypeError -- Raised if both file_name and writer are None.

  • ValueError -- Raised if the engine of the input writer is not "openpyxl".

add_styles(self, styles)

Adds styles to the Excel workbook and update self.style_cache.

Parameters

styles (dict(str, None or dict or str or openpyxl.styles.named_styles.NamedStyle)) -- A dictionary of either nested dictionaries used to create openpyxl style objects (Alignment, Font, etc.), a string indicating the name of an openpyxl NamedStyle to use, a NamedStyle object or None (will use the default style if None). All styles in the dictionary will be added to the Excel workbook if they do not currently exist in the workbook.

Notes

The ordering of items within styles will be preserved, so that if two NamedStyles are input with the same name, the one appearing first in the dictionary will be created, and the second will be made to refer to the first.

save_excel_file(self)

Tries to save the Excel file, and handles any PermissionErrors.

Saving can be cancelled if other changes to self.writer are desired before saving, or if saving is no longer desired (the file must be open while trying to save to allow cancelling the save).

classmethod test_excel_styles(cls, styles)

Tests whether the input styles create valid Excel styles using openpyxl.

Parameters

styles (dict(str, None or dict or str or openpyxl.styles.named_styles.NamedStyle)) -- The dictionary of styles to test. Values in the dictionary can either be None, a nested dictionary with the necessary keys and values to create an openpyxl NamedStyle, a string (which would refer to another NamedStyle.name), or openpyxl.styles.NamedStyle objects.

Returns

success -- Returns True if all input styles successfully create openpyxl NamedStyle objects; otherwise, returns False.

Return type

bool
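
A minimal usage sketch (the file name is hypothetical); any pandas to_excel call can target handler.writer:

import pandas as pd
from mcetl.excel_writer import ExcelWriterHandler

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
handler = ExcelWriterHandler(file_name='output.xlsx', new_file=True)
df.to_excel(handler.writer, sheet_name='Data', index=False)
handler.save_excel_file()  # handles PermissionErrors if the file is open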

mcetl.file_organizer

Provides GUIs to find files containing combinations of keywords and move files.

@author: Donald Erb Created on Sep 2, 2019

Module Contents
Functions

file_finder

Finds files that match the given keywords and file type using a GUI.

file_mover

Takes in a list of file paths and moves a copy of each file to the new folder.

manual_file_finder

Allows manual selection for the files for the selected samples and datasets.

mcetl.file_organizer.file_finder(file_directory=None, file_type=None, num_files=None)

Finds files that match the given keywords and file type using a GUI.

Parameters
  • file_directory (str) -- String for the topmost folder under which all files are searched.

  • file_type (str) -- The file extension that is being searched, eg. csv, txt, pdf.

  • num_files (int) -- The default maximum and minimum number of files to be associated with each search term.

Returns

output_list -- A nested list of lists containing the file locations as strings for the files that matched the search term. len(output_list) is equal to the number of datasets, len(output_list[i]) is equal to the number of unique keywords for dataset i, and len(output_list[i][j]) is equal to the number of files for dataset i and unique keyword j.

Return type

list

mcetl.file_organizer.file_mover(file_list, new_folder=None, skip_same_files=True)

Takes in a list of file paths and moves a copy of each file to the new folder.

Parameters
  • file_list (list, tuple, or str) -- A list of strings corresponding to file paths, all of which will have their copies moved.

  • new_folder (str or Path) -- The folder into which copies of all the files in file_list will be moved.

  • skip_same_files (bool) -- If True, will not move any copied files if they already exist in the destination folder; if False, will rename the copied file and move it to the destination folder.

Returns

new_folder -- The string of the destination folder location.

Return type

str
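
A sketch combining file_finder and file_mover (the directory and folder names are hypothetical):

from mcetl import file_organizer

# launches the keyword-search GUI under a (hypothetical) root folder
found_files = file_organizer.file_finder(
    file_directory='data', file_type='csv', num_files=1
)
# copy the files for the first dataset and first keyword into a new folder
file_organizer.file_mover(found_files[0][0], new_folder='sorted_data')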

mcetl.file_organizer.manual_file_finder(file_type=None)

Allows manual selection for the files for the selected samples and datasets.

Parameters

file_type (str, optional) -- The desired file extension for all files.

Returns

files -- A list of lists of lists of file paths. Each list of lists corresponds to a dataset, and each internal list corresponds to a sample in the dataset.

Return type

list(list(list(str)))

mcetl.functions

Contains the classes for Functions objects.

There are three main types of Functions:
  1. PreprocessFunction: preprocesses the imported data entry; for example, can separate into multiple data entries or remove data columns.

  2. CalculationFunction: performs a calculation on each of the entries within each sample within each dataset.

  3. SummaryFunction: performs a calculation once per sample or once per dataset.

@author: Donald Erb Created on Jul 31, 2020

Module Contents
Classes

CalculationFunction

Function that performs a calculation for every entry in each sample.

PreprocessFunction

Function for processing data before performing any calculations.

SummaryFunction

Calculation that is only performed once per sample or once per dataset.

class mcetl.functions.CalculationFunction(name, target_columns, functions, added_columns, function_kwargs=None)

Bases: mcetl.functions._FunctionBase

Function that performs a calculation for every entry in each sample.

Parameters
  • name (str) -- The string representation for this object.

  • target_columns (str or list(str) or tuple(str)) -- A string or list/tuple of strings designating the target columns for this object.

  • functions (Callable or list(Callable, Callable) or tuple(Callable, Callable)) -- The functions that this object uses to process data. If only one function is given, it is assumed that the same function is used for both calculations for the data to be written to Excel and the data to be used in python. If a list/tuple of functions are given, it is assumed that the first function is used for processing the data to write to Excel, and the second function is used for processing the data to be used in python. The function should take args of list(list(pd.DataFrame)), target indices (a list of lists of lists of numbers, corresponding to the column index in each dataframe for each of the target columns), added columns (a list of lists of numbers, corresponding to the column index in the dataframe for each added column), the Excel columns (a list of column names corresponding to the column in Excel for each added column, eg 'A', 'B', if doing the Excel calculations, otherwise None), the first row (the row number of the first row of data in Excel, eg 3). The function should output a list of lists of pd.DataFrames. The Excel columns and first row values are meant to ease the writing of formulas for Excel.

  • added_columns (int or str or list(str) or tuple(str)) -- The columns that will be acted upon by this object's functions. If the input is an integer, then it denotes that the functions act on columns that need to be added, with the number of columns affected by the functions being equal to the input integer. If the input is a string or list/tuple of strings, it denotes that the functions will change the contents of an existing column(s), whose column names are the inputs.

  • function_kwargs (dict or list(dict, dict), optional) -- A dictionary or a list of two dictionaries containing keyword arguments to be passed to the functions. If a list of two dictionaries is given, the first and second dictionaries will be the keyword arguments to pass to the function for processing the data to write to Excel and the function for processing the data to be used in python, respectively. If a single dictionary is given, then it is used for both functions. The default is None, which passes an empty dictionary to both functions.

Raises

ValueError -- Raised if there is an issue with added_columns or target_columns, or if any key in the input function_kwargs is within self._forbidden_keys.
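
A heavily hedged sketch of a CalculationFunction; the exact nesting of the index arguments is assumed from the parameter descriptions above, and the normalization itself is hypothetical, so treat this as an illustration rather than a definitive implementation:

from mcetl import CalculationFunction

# hypothetical normalization: divides the target column by its maximum and
# stores the result in the single added column of each entry
def normalize(dataframes, target_indices, added_indices, excel_columns, first_row):
    # dataframes: list of samples, each sample a list of entry DataFrames;
    # the index nesting below is assumed from the descriptions above
    for i, sample in enumerate(dataframes):
        for j, entry in enumerate(sample):
            target = entry.iloc[:, target_indices[i][j][0]]
            entry.iloc[:, added_indices[i][j]] = target / target.max()
    return dataframes

normalize_y = CalculationFunction(
    name='normalize',
    target_columns='y',      # assumes 'y' is a unique_variable of the DataSource
    functions=normalize,     # one function -> used for both Excel and python data
    added_columns=1,         # one new column receives the normalized values
)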

class mcetl.functions.PreprocessFunction(name, target_columns, function, function_kwargs=None, deleted_columns=None)

Bases: mcetl.functions._FunctionBase

Function for processing data before performing any calculations.

For example, can separate a single data entry into multiple entries depending on a criteria or delete unneeded columns.

Parameters
  • name (str) -- The string representation for this object.

  • target_columns (str or list(str) or tuple(str)) -- A string or list/tuple of strings designating the target columns for this object.

  • function (Callable) -- The function that this object uses to process data. The function should take args of dataframe and target indices (a list of numbers, corresponding to the column index in the dataframe for each of the target columns), and should return a list of dataframes.

  • function_kwargs (dict, optional) -- A dictionary of keywords and values to be passed to the function. The default is None.

  • deleted_columns (str or list(str) or tuple(str), optional) -- The names of columns that will be deleted by this object's function.

Raises

ValueError -- Raised if there is an issue with the input name or target_columns.
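
For example, a minimal sketch of a PreprocessFunction that splits one data entry into two; the 'temperature' column and the threshold logic are illustrative assumptions:

>>> from mcetl.functions import PreprocessFunction
>>> def split_by_threshold(df, target_indices, threshold=100):
        # target_indices[0] is the column index of the first target column
        mask = df.iloc[:, target_indices[0]] > threshold
        return [df[~mask], df[mask]]  # must return a list of dataframes
>>> splitter = PreprocessFunction('splitter', 'temperature', split_by_threshold,
                                  function_kwargs={'threshold': 150})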

class mcetl.functions.SummaryFunction(name, target_columns, functions, added_columns, function_kwargs=None, sample_summary=True)

Bases: mcetl.functions.CalculationFunction

Calculation that is only performed once per sample or once per dataset.

Parameters
  • name (str) -- The string representation for this object.

  • target_columns (str or list(str) or tuple(str)) -- A string or list/tuple of strings designating the target columns for this object.

  • functions (Callable or list(Callable, Callable) or tuple(Callable, Callable)) -- The functions that this object uses to process data. If only one function is given, it is assumed that the same function is used both for the data to be written to Excel and for the data to be used in Python. If a list/tuple of two functions is given, the first function processes the data to write to Excel and the second processes the data to be used in Python. Each function should take the following args: the data (list(list(pd.DataFrame))); the target indices (a list of lists of lists of numbers, giving the column index in each dataframe for each of the target columns); the added columns (a list of lists of numbers, giving the column index in the dataframe for each added column); the Excel columns (a list of column names, eg. 'A', 'B', giving the column in Excel for each added column if doing the Excel calculations, otherwise None); and the first row (the row number of the first row of data in Excel, eg. 3). The function should output a list of lists of pd.DataFrames. The Excel columns and first row values are meant to ease the writing of formulas for Excel.

  • added_columns (int or str or list(str) or tuple(str)) -- The columns that will be acted upon by this object's functions. If the input is an integer, then it denotes that the functions act on columns that need to be added, with the number of columns affected by the functions being equal to the input integer. If the input is a string or list/tuple of strings, it denotes that the functions will change the contents of an existing column(s), whose column names are the inputs. Further, SummaryFunctions can only modify other SummaryFunction columns with matching sample_summary attributes.

  • function_kwargs (dict or list(dict, dict), optional) -- A dictionary or a list of two dictionaries containing keyword arguments to be passed to the functions. If a list of two dictionaries is given, the first and second dictionaries will be the keyword arguments to pass to the function for processing the data to write to Excel and the function for processing the data to be used in python, respectively. The default is None, which passes an empty dictionary to both functions.

  • sample_summary (bool, optional) -- If True (default), denotes that the SummaryFunction summarizes a sample; if False, denotes that the SummaryFunction summarizes a dataset.

Raises

ValueError -- Raised if there is an issue with added_columns or target_columns, or if any key in the input function_kwargs is within self._forbidden_keys.
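
For example, a minimal sketch of a per-sample SummaryFunction; the names and the placeholder function body are illustrative assumptions:

>>> from mcetl.functions import SummaryFunction
>>> def summarize(data, target_indices, added_columns, excel_columns, first_row):
        # A real function would write one summary value per sample into the
        # added column; this placeholder returns the data unchanged.
        return data
>>> max_stress = SummaryFunction('max_stress', 'stress', summarize, 1,
                                 sample_summary=True)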

mcetl.main_gui

Provides GUIs to import data depending on the data source used, process and/or fit the data, and save everything to Excel.

@author: Donald Erb
Created on May 5, 2020

Notes

The imports for the fitting and plotting GUIs are within their respective functions to reduce the time it takes for this module to be imported. Likewise, openpyxl is imported within _write_to_excel.

mcetl.main_gui.SAVE_FOLDER

The file path to the folder in which all 'previous_files_{DataSource.name}.json' files are saved. Depends on operating system.

Type

pathlib.Path

Module Contents
Functions

launch_main_gui

Goes through all steps to find files, process/fit/plot the imported data, and save to Excel.

mcetl.main_gui.launch_main_gui(data_sources, fitting_mpl_params=None)

Goes through all steps to find files, process/fit/plot the imported data, and save to Excel.

Parameters
  • data_sources (list(DataSource) or tuple(DataSource) or DataSource) -- A list or tuple of mcetl.DataSource objects, or a single DataSource.

  • fitting_mpl_params (dict, optional) -- A dictionary of changes for Matplotlib's rcParams to use during fitting. If None, will use the selected DataSource's figure_rcparams attribute.

Returns

output --

A dictionary containing the following keys and values:

'dataframes': list or None

A list of lists of dataframes, with each dataframe containing the data imported from a raw data file; will be None if the function fails before importing data, or if the only processing step taken was moving files.

'fit_results': list or None

A nested list of lists of lmfit.ModelResult objects, with each ModelResult pertaining to the fitting of a data entry, each list of ModelResults containing all of the fits for a single sample, and each list of lists pertaining to the data within one dataset. Will be None if fitting is not done, or only partially filled if the fitting process ends early.

'plot_results': list or None

A list of lists, with one entry per dataset. Each interior list is composed of a matplotlib.Figure object and a dictionary of matplotlib.Axes objects. Will be None if plotting is not done, or only partially filled if the plotting process ends early.

'writer': pd.ExcelWriter or None

The pandas ExcelWriter used to create the output Excel file; will be None if the output results were not saved to Excel.

Return type

dict

Notes

The entire function is wrapped in a try-except block. If the user exits the program early by exiting out of a GUI, a custom WindowCloseError exception is raised and simply passed, allowing the program to close without error. If other exceptions occur, their traceback is printed.
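
A typical usage could look like the following sketch, assuming a configured DataSource; the bare 'XRD' source here is only a placeholder:

>>> import mcetl
>>> from mcetl.main_gui import launch_main_gui
>>> xrd = mcetl.DataSource('XRD')  # real sources would define import/calculation options
>>> output = launch_main_gui([xrd])
>>> dataframes = output['dataframes']  # None if the run ended before importing data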

mcetl.plot_utils

Provides utility functions, classes, and constants for plotting.

Separated from utils.py to reduce import load time since matplotlib imports are not needed for base usage. Useful functions are put here in order to prevent circular importing within the other files.

@author: Donald Erb
Created on Nov 11, 2020

mcetl.plot_utils.CANVAS_SIZE

A tuple specifying the size (in pixels) of the figure canvas used in various GUIs for mcetl. This can be modified if the user wishes a larger or smaller canvas. The default is (800, 800).

Type

tuple(int, int)

Module Contents
Classes

EmbeddedFigure

Class defining a PySimpleGUI window with an embedded matplotlib Figure.

PlotToolbar

Custom toolbar without the subplots and save figure buttons.

Functions

determine_dpi

Gives the correct dpi to fit the figure within the GUI canvas.

draw_figure_on_canvas

Places the figure and toolbar onto the canvas.

get_dpi_correction

Calculates the correction factor needed to create a figure with the desired dpi.

scale_axis

Calculates the new bounds to scale the current axis bounds.

class mcetl.plot_utils.EmbeddedFigure(x, y, click_list=None, enable_events=True, enable_keybinds=True, toolbar_class=NavigationToolbar2Tk)

Class defining a PySimpleGUI window with an embedded matplotlib Figure.

Class to be used to define BackgroundSelector and PeakSelector classes, which share common functions.

Parameters
  • x (array-like) -- The x-values to be plotted.

  • y (array-like) -- The y-values to be plotted.

  • click_list (list, optional) -- A list of selected points on the plot.

  • enable_events (bool, optional) -- If True (default), will connect self.events (defaults to self._on_click, self._on_pick, and self._on_key) to the figure when it is drawn on the canvas. If False, the figure will have no connected events.

  • enable_keybinds (bool, optional) -- If True (default), will connect the matplotlib keybind events to the figure.

  • toolbar_class (NavigationToolbar2Tk, optional) -- The class of the toolbar to place in the window. The default is NavigationToolbar2Tk.

x

The x-values to be plotted.

Type

array-like

y

The y-values to be plotted.

Type

array-like

click_list

A list of selected points on the plot.

Type

list

figure

The figure embedded in the window.

Type

plt.Figure

axis

The main axis on the figure which owns the events.

Type

plt.Axes

canvas

The PySimpleGUI canvas element in the window which contains the figure.

Type

sg.Canvas

toolbar_canvas

The PySimpleGUI canvas element that contains the toolbar for the figure.

Type

sg.Canvas

picked_object

The selected Artist object on the figure. Useful for pick events.

Type

plt.Artist

xaxis_limits

The x axis limits when the figure is first created. The values are used to determine the size of the Ellipse placed by the _create_circle function. The initial limits are used so that zooming on the figure does not change the size of the Ellipse.

Type

tuple

yaxis_limits

The y axis limits when the figure is first created.

Type

tuple

events

A list containing the events for the figure. Each item in the list is a list or tuple with the first item being the matplotlib event, such as 'pick_event', and the second item is a callable (function) to be executed when the event occurs.

Type

list(tuple(str, Callable))

canvas_size

The size, in pixels, of the figure to be created. Default is (CANVAS_SIZE[0], CANVAS_SIZE[1] - 100), which is (800, 700).

Type

tuple(float, float)

toolbar_class

The class of the toolbar to place in the window. The default is NavigationToolbar2Tk.

Type

NavigationToolbar2Tk

window

The PySimpleGUI window containing the figure.

Type

sg.Window

enable_keybinds

If True, will allow using matplotlib's keybinds to trigger events on the figure.

Type

bool

Notes

This class allows easy subclassing to create simple windows with embedded matplotlib figures.

A typical __init__ for a subclass should create the figure and axes, create the window, and then place the figure within the window's canvas. For example (note that plt designates matplotlib.pyplot)

>>> class SimpleEmbeddedFigure(EmbeddedFigure):
        def __init__(self, x, y, **kwargs):
            super().__init__(x, y, **kwargs)
            self.figure, self.axis = plt.subplots()
            self.axis.plot(self.x, self.y)
            self._create_window()
            self._place_figure_on_canvas()

The only function that should be publicly available is the event_loop method, which should return the desired output.

To close the window, use the self._close() method, which ensures that both the window and the figure are correctly closed.

event_loop(self)

Handles the event loop for the GUI.

Notes

This function should typically be overwritten by a subclass, and should typically return any desired values from the embedded figure.

This simple implementation makes the window visible, and closes the window as soon as anything in the window is clicked.
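
A sketch of a typical override is shown below, assuming the window layout contains a 'Close' button and that the selected points are the desired output (sg designates PySimpleGUI; both are assumptions for this example):

>>> class SelectorFigure(EmbeddedFigure):
        def event_loop(self):
            while True:
                event, values = self.window.read()
                if event in (sg.WIN_CLOSED, 'Close'):
                    break
            self._close()  # closes both the window and the figure
            return self.click_list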

class mcetl.plot_utils.PlotToolbar(fig_canvas, canvas, **kwargs)

Bases: matplotlib.backends.backend_tkagg.NavigationToolbar2Tk

Custom toolbar without the subplots and save figure buttons.

Ensures that saving is done through the save menu in the window, which gives better options for output image quality and ensures the figure dimensions are correct. The subplots button is removed so that the user does not mess with the plot layout since it is handled by using matplotlib's tight layout.

Parameters
  • fig_canvas (matplotlib.FigureCanvas) -- The figure canvas on which to operate.

  • canvas (tkinter.Canvas) -- The Canvas element which owns this toolbar.

  • **kwargs -- Any additional keyword arguments to pass to NavigationToolbar2Tk.

mcetl.plot_utils.determine_dpi(fig_height, fig_width, dpi, canvas_size=CANVAS_SIZE)

Gives the correct dpi to fit the figure within the GUI canvas.

Parameters
  • fig_height (float) -- The figure height.

  • fig_width (float) -- The figure width.

  • dpi (float) -- The desired figure dpi.

  • canvas_size (tuple(int, int), optional) -- The size of the canvas that the figure will be placed on. Defaults to CANVAS_SIZE.

Returns

The correct dpi to fit the figure onto the GUI canvas.

Return type

float

Notes

When not saving, the dpi needs to be scaled to fit the figure on the GUI's canvas, and that scaling is called size_scale. For example, if the desired size was 1600 x 1200 pixels with a dpi of 300, the figure would be scaled down to 800 x 600 pixels to fit onto the canvas, so the dpi would be changed to 150, with a size_scale of 0.5.

A dpi_scale correction is needed because the qt5Agg backend will change the dpi to 2x the specified dpi when the display scaling in Windows is not 100%. I am not sure how it works on non-Windows operating systems.

The final dpi when not saving is equal to dpi * size_scale * dpi_scale.
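
Using the example from these Notes, the arithmetic (leaving out dpi_scale, since it depends on the backend and display scaling) would be:

>>> size_scale = 800 / 1600  # canvas width / desired pixel width
>>> 300 * size_scale  # dpi * size_scale, before any dpi_scale correction
150.0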

mcetl.plot_utils.draw_figure_on_canvas(canvas, figure, toolbar_canvas=None, toolbar_class=NavigationToolbar2Tk, kwargs=None)

Places the figure and toolbar onto the canvas.

Parameters
  • canvas (tkinter.Canvas) -- The tkinter Canvas element for the figure.

  • figure (plt.Figure) -- The figure to be placed on the canvas.

  • toolbar_canvas (tkinter.Canvas, optional) -- The tkinter Canvas element for the toolbar.

  • toolbar_class (NavigationToolbar2Tk, optional) -- The toolbar class used to create the toolbar for the figure. The default is NavigationToolbar2Tk.

  • kwargs (dict, optional) -- Keyword arguments designating how to pack the figure into the window. Relevant keys are 'canvas' and 'toolbar', with values being dictionaries containing keyword arguments to pass to pack.

Returns

  • figure_canvas (FigureCanvasTkAgg or None) -- The canvas containing the figure. Is None if drawing the canvas caused an exception.

  • toolbar (NavigationToolbar2Tk or None) -- The created toolbar. Is None if toolbar_canvas is None, or if there was an error when drawing figure_canvas.

Notes

The canvas children are destroyed after drawing the figure canvas so that there is a seamless transition from the old canvas to the new canvas.

The toolbar is packed before the figure canvas because the figure canvas should be more flexible to resizing if they are on the same canvas.

mcetl.plot_utils.get_dpi_correction(dpi)

Calculates the correction factor needed to create a figure with the desired dpi.

Necessary because some matplotlib backends (namely qt5Agg) will adjust the dpi of the figure after creation.

Parameters

dpi (float or int) -- The desired figure dpi.

Returns

dpi_correction -- The scaling factor needed to create a figure with the desired dpi.

Return type

float

Notes

The matplotlib dpi correction occurs when the operating system display scaling is set to any value not equal to 100% (at least on Windows, other operating systems are unknown). This may cause issues when using UHD monitors, but I cannot test.

To get the desired dpi, simply create a figure with a dpi equal to dpi * dpi_correction.
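
Following these Notes, a sketch of creating a figure with a true dpi of 150 (plt designates matplotlib.pyplot):

>>> import matplotlib.pyplot as plt
>>> from mcetl.plot_utils import get_dpi_correction
>>> desired_dpi = 150
>>> fig = plt.figure(dpi=desired_dpi * get_dpi_correction(desired_dpi))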

mcetl.plot_utils.scale_axis(axis_bounds, lower_scale=None, upper_scale=None)

Calculates the new bounds to scale the current axis bounds.

The new bounds are calculated by multiplying the desired scale factor by the current difference of the upper and lower bounds, and then adding (for upper bound) or subtracting (for lower bound) from the current bound.

Parameters
  • axis_bounds (tuple(float, float)) -- The current lower and upper axis bounds.

  • lower_scale (float, optional) -- The desired fraction by which to scale the current lower bound. If None, will not scale the current lower bounds.

  • upper_scale (float, optional) -- The desired fraction by which to scale the current upper bound. If None, will not scale the current upper bounds.

Returns

  • lower_bound (float) -- The lower bound after scaling.

  • upper_bound (float) -- The upper bound after scaling.
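
For example, padding both bounds of (0, 10) by 5% of the range (10 - 0 = 10) should give, per the formula above:

>>> from mcetl.plot_utils import scale_axis
>>> scale_axis((0, 10), lower_scale=0.05, upper_scale=0.05)
(-0.5, 10.5)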

mcetl.raw_data

Creates folders and files with simulated data for various characterization techniques

Notes

All data is made up and does not correspond to the materials listed. The data is meant to simply emulate real data and allow for basic analysis.

@author: Donald Erb
Created on Jun 15, 2020

Module Contents
Functions

generate_raw_data

Generates data for all of the techniques in this file.

mcetl.raw_data.generate_raw_data(directory=None, num_files=None, show_plots=False)

Generates data for all of the techniques in this file.

Convenience function to generate data for all techniques rather than calling the functions one at a time.

Parameters
  • directory (str, optional) -- The file path to place the Raw Data folder.

  • num_files (int, optional) -- The number of files to create per characterization technique.

  • show_plots (bool, optional) -- If True, will show plots of the created data. If False (default), will close the created figures and not show the plots.

Notes

Currently supported characterization techniques include:

XRD, FTIR, Raman, TGA, DSC, Rheometry, Uniaxial tensile test, Pore Size Analysis
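
For example, to create files for all techniques in the current directory (the argument values here are only illustrative):

>>> from mcetl import raw_data
>>> raw_data.generate_raw_data(directory='.', num_files=3, show_plots=False)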

mcetl.utils

Provides utility functions, classes, and constants.

Useful functions are put here in order to prevent circular importing within the other files.

The functions contained within this module ease the use of user-interfaces, selecting options for opening files, and working with Excel.

@author: Donald Erb
Created on Jul 15, 2020

mcetl.utils.PROCEED_COLOR

The button color for all buttons that proceed to the next window. The default is ('white', '#00A949'), where '#00A949' is a bright green.

Type

tuple(str, str)

Module Contents
Functions

check_availability

Checks whether an optional dependency is available to import.

doc_lru_cache

Decorator that allows keeping a function's docstring when using functools.lru_cache.

excel_column_name

Converts 1-based index to the Excel column name.

get_min_size

Returns the minimum size for a GUI element to match the screen size.

open_multiple_files

Creates a prompt to open multiple files and add their contents to a dataframe.

optimize_memory

Optimizes dataframe memory usage by converting data types.

raw_data_import

Used to import data from the specified file into pandas DataFrames.

safely_close_window

Closes a PySimpleGUI window and removes the window and its layout.

select_file_gui

GUI to select a file and input the necessary options to import its data.

series_to_numpy

Tries to convert a pandas Series to a numpy array with the desired dtype.

set_dpi_awareness

Sets DPI awareness for Windows operating system so that GUIs are not blurry.

show_dataframes

Used to show data to help select the right columns or datasets from the data.

string_to_unicode

Converts strings to unicode by replacing '\\' with '\'.

stringify_backslash

Fixes strings containing backslash, such as '\n', so that they display properly in GUIs.

validate_inputs

Validates entries from a PySimpleGUI window and converts to the desired type.

validate_sheet_name

Ensures that the desired Excel sheet name is valid.

exception mcetl.utils.WindowCloseError

Bases: Exception

Custom exception to allow exiting a GUI window to stop the program.

mcetl.utils.check_availability(module)

Checks whether an optional dependency is available to import.

Does not check the module version since it is assumed that the parent module will do a version check if the module is actually usable.

Parameters

module (str) -- The name of the module.

Returns

True if the module can be imported, False if it cannot.

Return type

bool

Notes

It is faster to use importlib to check the availability of the module than to use a try-except block that imports the module, since importlib can check availability without actually importing the module.
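
A typical guard for an optional dependency, using Pillow (whose import name is PIL) as the example:

>>> from mcetl.utils import check_availability
>>> if check_availability('PIL'):
        from PIL import Image  # safe: the optional dependency is installed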

mcetl.utils.doc_lru_cache(function=None, **lru_cache_kwargs)

Decorator that allows keeping a function's docstring when using functools.lru_cache.

Parameters
  • function (Callable) -- The function to use. If used as a decorator and lru_cache_kwargs are specified, then function will be None.

  • **lru_cache_kwargs -- Any keyword arguments to pass to functools.lru_cache (maxsize and/or typed, as of Python 3.9).

Examples

A basic usage of this decorator would look like:

>>> @doc_lru_cache(maxsize=200)
    def function(arg, kwarg=1):
        return arg + kwarg

mcetl.utils.excel_column_name(index)

Converts 1-based index to the Excel column name.

Parameters

index (int) -- The column number. Must be 1-based, ie. the first column number is 1 rather than 0.

Returns

col_name -- The column name for the input index, eg. an index of 1 returns 'A'.

Return type

str

Raises

ValueError -- Raised if the input index is not in the range 1 <= index <= 18278, meaning the column name is not within 'A'...'ZZZ'.

Notes

Caches the result so that any repeated index lookups are faster, and uses recursion to make better usage of the cache.

chr(64 + remainder) converts the remainder to a character, where 64 denotes ord('A') - 1, so if remainder = 1, chr(65) = 'A'.
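
The following is only an illustrative sketch of the recursive scheme these Notes describe; the actual implementation also caches results:

>>> def excel_column_name(index):
        quotient, remainder = divmod(index, 26)
        if remainder == 0:  # eg. 26 -> 'Z': borrow from the quotient
            quotient, remainder = quotient - 1, 26
        suffix = chr(64 + remainder)  # 64 is ord('A') - 1
        return excel_column_name(quotient) + suffix if quotient else suffix
>>> excel_column_name(28)
'AB'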

mcetl.utils.get_min_size(default_size, scale, dimension='both')

Returns the minimum size for a GUI element to match the screen size.

Parameters
  • default_size (int) -- The default number of pixels to use. Needed because sg.Window.get_screen_size() can return the total screen size when using multiple screens on some linux systems.

  • scale (float) -- The scale factor to apply to the screen size as reported by sg.Window.get_screen_size. For example, if the element size was desired to be at most 50% of the minimum screen dimension, then the scale factor is 0.5.

  • dimension (str) -- The screen dimension to compare. Can be either 'width', 'height', or 'both'.

Returns

The minimum pixel count among scale * screen height, scale * screen width, and default_size.

Return type

int

mcetl.utils.open_multiple_files()

Creates a prompt to open multiple files and add their contents to a dataframe.

Returns

dataframes -- A list of dataframes containing the imported data from the selected files.

Return type

list

mcetl.utils.optimize_memory(dataframe, convert_objects=False)

Optimizes dataframe memory usage by converting data types.

Optimizes object dtypes by trying to convert to other dtypes, if the pandas version is greater than 1.0.0. Optimizes numerical dtypes by downcasting to the most appropriate dtype.

Parameters
  • dataframe (pd.DataFrame) -- The dataframe to optimize.

  • convert_objects (bool, optional) -- If True, will attempt to convert columns with object dtype if the pandas version is >= 1.0.0.

Returns

dataframe -- The memory-optimized dataframe.

Return type

pd.DataFrame

Notes

convert_objects is needed because currently, when object columns are converted to a dtype of string, the underlying data becomes a StringArray object, which does not have the tolist() method currently implemented (as of pandas version 1.0.5). openpyxl's dataframe_to_rows method uses each series's series.values.tolist() method to convert the dataframe into a generator of rows, so having a StringArray without a tolist method causes an exception when using openpyxl's dataframe_to_rows.

Do not convert object dtypes to pandas's Int and Float dtypes since they do not mesh well with other modules.

Iterate through columns one at a time rather than using dataframe.select_dtypes so that each column is overwritten immediately, rather than making a copy of all the selected columns, reducing memory usage.
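
A sketch of the per-column numerical downcasting described above (illustrative only, not the exact mcetl internals):

>>> import pandas as pd
>>> def downcast_numeric(dataframe):
        for col in dataframe.columns:  # one column at a time to limit copies
            if pd.api.types.is_integer_dtype(dataframe[col]):
                dataframe[col] = pd.to_numeric(dataframe[col], downcast='integer')
            elif pd.api.types.is_float_dtype(dataframe[col]):
                dataframe[col] = pd.to_numeric(dataframe[col], downcast='float')
        return dataframe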

mcetl.utils.raw_data_import(window_values, file, show_popup=True)

Used to import data from the specified file into pandas DataFrames.

Also used to show how data will look after using certain import values.

Parameters
  • window_values (dict) -- A dictionary with keys 'row_start', 'row_end', 'columns', 'separator', and optionally 'sheet'.

  • file (str or pathlib.Path or pd.ExcelFile) -- A string or Path for the file to be imported, or a pandas ExcelFile, to use for reading spreadsheet data.

  • show_popup (bool) -- If True, will display a popup window showing a table of the data.

Returns

dataframes -- A list of dataframes containing the data after importing if show_popup is False, otherwise returns None.

Return type

list(pd.DataFrame) or None

Notes

If using a spreadsheet format ('xls', 'xlsx', 'odf', etc.), allows using any of the available engines for pandas.read_excel, and will just let pandas notify the user if the proper engine is not installed.

Optimizes the memory usage of the imported data before returning.

mcetl.utils.safely_close_window(window)

Closes a PySimpleGUI window and removes the window and its layout.

Used when exiting a window early by manually closing the window. Ensures that the window is properly closed and then raises a WindowCloseError exception, which can be used to determine that the window was manually closed.

Parameters

window (sg.Window) -- The window that will be closed.

Raises

WindowCloseError -- Custom exception to notify that the window has been closed earlier than expected.
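
A typical event-loop usage could look like the following sketch (sg designates PySimpleGUI; the window layout is a placeholder):

>>> import PySimpleGUI as sg
>>> from mcetl.utils import safely_close_window
>>> window = sg.Window('Example', [[sg.Button('Proceed')]])
>>> event, values = window.read()
>>> if event == sg.WIN_CLOSED:  # the user manually closed the window
        safely_close_window(window)  # raises WindowCloseError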

mcetl.utils.select_file_gui(data_source=None, file=None, previous_inputs=None, assign_columns=False)

GUI to select a file and input the necessary options to import its data.

Parameters
  • data_source (DataSource, optional) -- The DataSource object used for opening the file.

  • file (str, optional) -- A string containing the path to the file to be imported.

  • previous_inputs (dict, optional) -- A dictionary containing the values from a previous usage of this function, that will be used to overwrite the defaults. Note, if opening Excel files, the previous_inputs will have no effect.

  • assign_columns (bool, optional) -- If True, designates that the columns for each unique variable in the data source need to be identified. If False (or if data_source is None), then will not prompt user to select columns for variables.

Returns

values -- A dictionary containing the items necessary for importing data from the selected file.

Return type

dict

Notes

If using a spreadsheet format ('xls', 'xlsx', 'odf', etc.), allows using any of the available engines for pandas.read_excel, and will just let pandas notify the user if the proper engine is not installed. The file selection window, however, will only show 'xlsx', 'xlsm', 'csv', 'txt', and potentially 'xls', so that users are not steered towards selecting a format that does not work with the default mcetl libraries.

mcetl.utils.series_to_numpy(series, dtype=float)

Tries to convert a pandas Series to a numpy array with the desired dtype.

If initial conversion does not work, tries to convert series to object first. If that is not successful and if the first item is a string, assumes the first item is a header, converts it to None, and tries the conversion. If that is still unsuccessful, then an array of the series is returned without changing the dtype.

Parameters
  • series (pd.Series) -- The series to convert to numpy with the desired dtype.

  • dtype (type, optional) -- The dtype to use in the numpy array of the series. Default is float.

Returns

output -- The input series with the specified dtype if conversion was successful. Otherwise, the output is an ndarray of the input series without dtype conversion.

Return type

np.ndarray

Notes

This function is needed because pandas's pd.NA and extension arrays do not work well with other modules and can be difficult to convert.

mcetl.utils.set_dpi_awareness(awareness_level=1)

Sets DPI awareness for Windows operating system so that GUIs are not blurry.

Fixes blurry tkinter GUIs due to weird dpi scaling in Windows os. Other operating systems are ignored.

Parameters

awareness_level ({1, 0, 2}) -- The dpi awareness level to set. 0 turns off dpi awareness, 1 sets dpi awareness to scale with the system dpi and automatically changes when the system dpi changes, and 2 sets dpi awareness per monitor and does not change when system dpi changes. Default is 1.

Raises

ValueError -- Raised if awareness_level is not 0, 1, or 2.

Notes

Will only work on Windows 8.1 or Windows 10. Not sure if earlier versions of Windows have this issue anyway.
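
Intended usage is a single call before opening any GUIs:

>>> import mcetl
>>> mcetl.set_dpi_awareness()  # ignored on non-Windows operating systems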

mcetl.utils.show_dataframes(dataframes, title='Raw Data')

Used to show data to help select the right columns or datasets from the data.

Parameters
  • dataframes (list or pd.DataFrame) -- Either (1) a pandas DataFrame, (2) a list of DataFrames, or (3) a list of lists of DataFrames. The layout of the window will depend on the input type.

  • title (str, optional) -- The title for the popup window.

Returns

window -- If no exceptions occur, a PySimpleGUI window will be returned; otherwise, None will be returned.

Return type

sg.Window or None

mcetl.utils.string_to_unicode(input_list)

Converts strings to unicode by replacing '\\' with '\'.

Necessary because user input from text elements in GUIs are raw strings and will convert any '\' input by the user to '\\', which will not be converted to the desired unicode. If the string already has unicode characters, it will be left alone.

Also converts things like '\\n' and '\\t' to '\n' and '\t', respectively, so that inputs are correctly interpreted.

Parameters

input_list ((list, tuple) or str) -- A container of strings or a single string.

Returns

output -- A container of strings or a single string, depending on the input, with the unicode correctly converted.

Return type

(list, tuple) or str

Notes

Uses raw_unicode_escape encoding to ensure that any existing unicode is correctly decoded; otherwise, it would translate incorrectly.

If using mathtext in matplotlib and want to do something like $\nu$, input $\\nu$ in the GUI, which gets converted to $\\\\nu$ by the GUI, and in turn will be converted back to $\\nu$ by this function, which matplotlib considers equivalent to $\nu$.
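
For example, the raw GUI string '\\u03b8' should decode to the Greek letter theta, per the behavior described above:

>>> from mcetl.utils import string_to_unicode
>>> string_to_unicode('\\u03b8')
'θ'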

mcetl.utils.stringify_backslash(input_string)

Fixes strings containing backslash, such as '\n', so that they display properly in GUIs.

Parameters

input_string (str) -- The string that potentially contains a backslash character.

Returns

output_string -- The string after replacing various backslash characters with their double backslash versions.

Return type

str

Notes

It is necessary to replace multiple characters because things like '\n' are considered unique characters, so simply replacing the '\' would not work.

mcetl.utils.validate_inputs(window_values, integers=None, floats=None, strings=None, user_inputs=None, constraints=None)

Validates entries from a PySimpleGUI window and converts to the desired type.

Parameters
  • window_values (dict) -- A dictionary of values from a PySimpleGUI window, generated by using window.read().

  • integers (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be integers.

  • floats (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be floats.

  • strings (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be non-empty strings.

  • user_inputs (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be a certain data type; the values are first determined by separating each value using ',' (default) or the last index.

  • constraints (list, optional) -- A list of lists (see Notes below), with each key corresponding to a key in the window_values dictionary and whose values should be ints or floats constrained between upper and lower bounds.

Returns

True if all data in the window_values dictionary is correct. False if there is any error with the values in the window_values dictionary.

Return type

bool

Notes

Inputs for integers, floats, and strings are

[[key, display text],]

For example: [['peak_width', 'peak width']]

Inputs for user_inputs are

[[key, display text, data type, allow_empty_input (optional), separator (optional)],]

where separator is a string and allow_empty_input is a boolean. If no separator is given, it is assumed to be a comma (','), and if no allow_empty_input value is given, it is assumed to be False. user_inputs can also be used to run the inputs through a function by setting the data type to a custom function. Use None as the separator if only a single value is wanted. For example:

[
    ['peak_width', 'peak width', float],  # ensures each entry is a float
    ['peak_width_2', 'peak width 2', int, False, ';'],  # uses ';' as the separator
    ['peak_width_3', 'peak width 3', function, False, None],  # no separator, verify with function
    ['peak_width_4', 'peak width 4', function, True, None]  # allows empty input
]

Inputs for constraints are

[[key, display text, lower bound, upper bound (optional)],]

where lower and upper bounds are strings with the operator and bound, such as "> 10". If the lower bound or upper bound is None, then the operator and bound are assumed to be >=, -np.inf and <=, np.inf, respectively. For example:

[
    ['peak_width', 'peak width', '> 10', '< 20'],  # 10 < peak_width < 20
    ['peak_width_2', 'peak width 2', None, '<= 5'],  # -inf <= peak_width_2 <= 5
    ['peak_width_3', 'peak width 3', '> 1']  # 1 < peak_width_3 <= inf
]

The display text will be the text that is shown to the user if the value of window_values[key] fails the validation.

#TODO eventually collect all errors so they can all be fixed at once.
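
A sketch of the intended call pattern; the keys and values here are illustrative assumptions, and window_values would normally come from window.read():

>>> from mcetl.utils import validate_inputs
>>> window_values = {'num_files': '3', 'dpi': '300'}
>>> validate_inputs(window_values,
                    integers=[['num_files', 'number of files']],
                    constraints=[['dpi', 'dpi', '> 0', '<= 1200']])
True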

mcetl.utils.validate_sheet_name(sheet_name)

Ensures that the desired Excel sheet name is valid.

Parameters

sheet_name (str) -- The desired sheet name.

Returns

sheet_name -- The input sheet name. Only returned if it is valid.

Return type

str

Raises

ValueError -- Raised if the sheet name is greater than 31 characters or if it contains any of the following: \, /, ?, *, [, ], :

API reference documentation was auto-generated using sphinx-autoapi.

Contributing

Contributions are welcomed and greatly appreciated.

Bugs Reports/Feedback

Report bugs or give feedback by filing an issue at https://github.com/derb12/mcetl/issues.

If you are reporting a bug, please include:

  • Your operating system, Python version, and mcetl version.

  • Any further details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce/identify the bug, including the code used and any tracebacks.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible to make it easier to implement.

Pull Requests

Pull requests are welcomed for this project, but please note that unsolicited pull requests are discouraged. Please file an issue first, so that details can be discussed/finalized before a pull request is created.

Any new code or documentation must be unique (not copied from somewhere else) and able to be covered by the BSD 3-clause license. Further, any new code should follow PEP 8 standards as closely as possible and be fully documented using Numpy style docstrings. If implementing a new feature, please provide documentation discussing its implementation, and any necessary tests.

To clone the GitHub repository and install the needed libraries for development:

git clone https://github.com/derb12/mcetl.git
pip install -r mcetl/requirements/requirements-development.txt

When submitting a pull request, follow similar procedures for a feature request, namely:

  • Explain in detail how it works.

  • Keep the scope as narrow as possible to make it easier to incorporate.

Changelog

Version 0.4.2 (2021-02-02)

This is a minor patch with a critical bug fix.

Bug Fixes

  • Fixed a crash when merging styled cells due to a change in openpyxl v3.0.6. Now cells are merged first, and then styled. The fix is fully compatible with all covered versions of openpyxl (any >= v2.4).

Version 0.4.1 (2021-01-26)

This is a minor patch with critical bug fixes.

Bug Fixes

  • Fixed grouping of column labels for the main GUI when there are more than 9 entries for a sample.

  • Fixed processing of regex separators, such as \s+ when importing data.

Version 0.4.0 (2021-01-24)

This is a minor version with new features, bug fixes, deprecations, and documentation improvements.

New Features

  • The data import window now supports importing from any spreadsheet-like file format supported by pandas.read_excel (eg. xls, xlsm, xlsb, ods, etc.), rather than just xlsx. Spreadsheet formats not supported by openpyxl, and thus not supported by the default mcetl installation, require installing the appropriate library. For example, to read xls files, the xlrd library would need to be installed.

  • Greatly improved load time of spreadsheets when using the data import window. Now, only one sheet is loaded into memory at a time (if supported by the engine used for reading the spreadsheet).

  • The data import window now supports importing from fixed-width files.

  • Column labels and Excel plot indices are now specified individually when using the main GUI. This allows processing uneven-sized data and working with already processed data.

  • Column labels are now added to the dataframes in launch_main_gui if any processing step is specified besides moving files.

  • Excel styles no longer have to be NamedStyles, which allows using styles without adding them to the output Excel workbook. Also, non-named styles are ~50% faster for writing to Excel.

  • Added test_excel_styles method to DataSource, to allow testing whether the input styles will create valid Excel styles.

  • Added a more usable window for manual file selection, which is used for launch_main_gui, launch_fitting_gui, and launch_plotting_gui.

  • The main GUI will now save the previous search for each DataSource separately, with a filename of previous_files_(DataSource.name).json (eg. previous_files_XRD.json).

  • Allow using any model available from lmfit or mcetl as the background for fitting. The few models that need kwargs during initialization now allow selecting them in the GUI.

  • Added several functions to mcetl.fitting.fitting_utils to allow using the lmfit model names within code while displaying shortened versions in GUIs. Also, a full list of models available through lmfit and mcetl are generated at runtime, so that the available models match for different versions.

  • Created an ExcelWriterHandler class, which is used to correctly handle opening and saving existing Excel files and to apply styles to the Excel workbook.

  • Allow specifying changes to Matplotlib's rcParams for fitting when using launch_main_gui. If None are specified, will use the selected DataSource's figure_rcparams attribute.

  • Allow columns to be removed by PreprocessFunctions, which also correctly updates the unique variable references.

  • When importing data, columns of the imported data are now sorted following the user-input column numbers (eg. if the user-specified columns were 1, 0, 2, then the data would first be imported with columns ordered as 0, 1, 2 and then the columns are rearranged to 1, 0, 2. Note that column names are still specified as ascending numbers, so even though the data corresponds to columns 1, 0, 2 in the data file, the column names in the pandas DataFrame are still 0, 1, 2). Note that this feature was already implemented when importing from spreadsheets, but is now extended to all other formats.

  • Allow using Matplotlib keybindings for most embedded figures, such as within the fitting GUI. This allows doing things like switching axis scales from linear to log.

  • Added more fit descriptors to the saved Excel file from fitting, such as whether the fit converged and the number of independent variables.

  • Added numerical calculations for area, extremum, mode, and full-width-at-half-maximum when doing data fitting. These calculations are performed on the peak models after fitting, and are included to at least give an estimate of their values for models that do not have analytical solutions, like Breit-Wigner-Fano, Moffat, and skewed Gaussian/Voigt.

  • Made the file import options more flexible by allowing user to select whether or not to use same import options for all files (option is ignored for spreadsheet-like files). Also allow inputting previous inputs for the file import function, so that any changes to the defaults can be maintained, especially helpful when manually opening files for fitting or plotting.

  • Added a modified Breit-Wigner-Fano peak model for fitting.

  • The fitting GUI now allows confirmation of the fit results before proceeding. A plot of the fit results and a printed out fit report are given, and the user can select to go back and redo the fit.

  • Allow inputting Excel styles in launch_fitting_gui so that custom styles can be used when writing fit results to Excel when using launch_fitting_gui directly.

  • Created an EmbeddedFigure class in plot_utils for easily creating windows with embedded Matplotlib figures that optionally have events. It is easily subclassed to create custom windows with embedded figures and event loops.

Bug Fixes

  • DPI awareness is no longer set immediately upon importing mcetl on Windows operating systems. That way, users can decide whether or not to set dpi awareness. To set dpi awareness, users must call mcetl.set_dpi_awareness() before opening any GUIs.

  • Correctly handle PermissionError for the main GUI when deciding which folder to save file searches to and when writing file searches to file. PermissionError is still raised if read access is not granted, so that user is made aware that they need to set the folder to something that grants access.

  • Added a function in utils for converting a pandas Series to numpy with a specific dtype to reduce the chance of errors occurring during conversion.

  • Fixed an issue when plotting in Excel with a log scale if one of the specified bounds was <= 0.

  • The data import window will only attempt to assign indices for a DataSource's unique variables if processing.

  • The entry and sample separation columns that are added when using the main GUI if processing and writing to Excel are now removed when splitting the merged dataframes back into individual entries, so that the returned DataFrames contain only the actual data.

  • Ensured that Excel sheet names are valid.

  • Simplified writing to csv for plotting GUI. Removed the column indices when reading/writing csv data within the plotting GUI. Now, columns are just directly taken from the data.

  • Switched to using df.iloc[:, col_number] rather than df[col_number] to get columns by their indices so that dataframes with repeated column names will not produce errors.

  • Made it so that '.' is removed from the user-input file extension when doing file searching only if the '.' is the first character in the string. This way, file types with multiple extensions, like tar.gz, are now possible to use.

  • The raw data generated for Raman was accidentally being saved as a csv rather than as a tab-separated txt file.

  • Fixed issue when using lmfit.models.ConstantModel for fitting, which gives a single value rather than an array. Now, replace the single value with an array with the same size as the data being fit so that it does not cause errors when plotting.

  • Fixed IndexError that occurred when using the fitting GUI and trying to fit residuals.

  • Fixed issue where Voigt models with manual peak selection and vary gamma parameter set to True would not set an initial value for gamma.

Other Changes

  • Reduced import time of mcetl. On my machine, the import time for version 0.4.0 is ~80% less than version 0.3.0.

  • Replaced sympy with asteval for parsing user expressions when creating secondary axes for the plotting GUI. This requires the user to input forward and backward expressions, but otherwise requires no changes. Also, it technically drops a requirement for mcetl, since asteval is already required for lmfit.

  • Reordered package layout. Moved all fitting related files to mcetl.fitting, and moved all plotting related files to mcetl.plotting. This will allow expansion of the fitting and plotting sections without burdening the main folder.

  • Renamed peak_fitting_gui to fitting_gui since I intend to extend the fitting beyond just peak fitting.

  • Made all of the methods that are only internally used private for DataSource and the Function objects, so that users do not use them.

  • Updated required versions for several dependencies.

  • Added Python 3.9 to the supported Python versions.

  • Created mcetl.fitting.models, which can be filled later with any additional models. Put the modified Breit-Wigner-Fano function in fitting.models.

  • Created mcetl.plot_utils that contains all helper functions and classes for plotting.

  • The plotting GUI switched back to using "utf-8" encoding when saving data to a csv file (was made to use "raw_unicode_escape" in v0.3.0).

Deprecations/Breaking Changes

  • Renamed SeparationFunction to PreprocessFunction to make its usage more clear.

  • Changed the file extension for the theme files for the plotting GUI from ".figtheme" to ".figjson" to make it more clear that it is just a json file. Converting existing files should be easy, just change the extension.

  • mcetl.launch_peak_fitting_gui() and mcetl.launch_plotting_gui() are no longer valid. Instead, use 'from mcetl import fitting, plotting; fitting.launch_fitting_gui(); plotting.launch_plotting_gui()'.

  • The keyword arguments 'excel_writer_formats' and 'figure_rcParams' for DataSource were changed to 'excel_writer_styles' and 'figure_rcparams', respectively.

  • DataSource only accepts keyword arguments besides the first argument, which is the DataSource's name.

  • The keyword argument 'peaks_dataframe' for mcetl.fitting.fit_to_excel was changed to 'values_dataframe' to make its usage more clear.

  • mcetl.fitting.peak_fitting.fit_peaks no longer takes the keyword 'poly_n' as an argument. Instead, the function takes the keyword 'background_kwargs' which is a dictionary for background keyword arguments, allowing any model to be used as the background. For example, to get the same behavior as with the old 'poly_n' keyword, the new input would be background_kwargs={'degree': 1}.

  • Renamed datasource.py to data_source.py. This should have little effect on user code since the DataSource object is available through the main mcetl namespace.

  • Renamed the keyword argument vary_Voigt for mcetl.fitting.peak_fitting.fit_peaks to vary_voigt.

  • The constants mcetl.main_gui.SAVE_FOLDER and mcetl.fitting.peak_fitting._PEAK_TRANSFORMS are used instead of the functions mcetl.main_gui.get_save_location (now _get_save_location) and mcetl.fitting.peak_fitting.peak_transformer (now _peak_transformer), respectively. This way, the functions do not need to be called repeatedly, and their contents can be altered by users, if desired.

Documentation/Examples

  • Improved the api documentation, added tutorials, and improved the overall documentation.

  • Updated example programs for all of the new changes in version 0.4.0.

  • Added an example program showing how to use just mcetl.fitting.fit_peaks to do peak fitting instead of using the fitting GUI.

  • Changed the readthedocs config to create static htmlzip files in addition to pdf files each time the documentation is built.

Version 0.3.0 (2020-11-08)

This is a minor version with new features, bug fixes, deprecations, and documentation improvements.

New Features

  • Added functions to generate_raw_data.py to create data for pore size analysis (emulating the output of the ImageJ software when analyzing images), uniaxial tensile tests, and rheometry.

  • The plotting GUI now uses "raw_unicode_escape" encoding when saving data to a csv file. This has no impact on the data after reloading, but it makes any Unicode more readable in the csv file. The module still uses "utf-8" encoding as the default when loading csv files, but will fall back to "raw_unicode_escape" in the event "utf-8" encoding errors.

  • Validation of user-input in the GUIs now converts the string inputs into the desired data type during validation, rather than requiring further processing after validation. Updated all modules for this new change.

  • Added the ability to use constraints in the data validation function for user-inputs, allowing user-inputs to be bounded between two values.

Bug Fixes

  • Fixed issue where an additional set of data entry column labels was erroneously created when using a SummaryCalculation object for summarizing data for a sample.

  • Fixed issue using sorted() with strings rather than integers when sorting the indices of datasets to be deleted when using the plotting GUI.

  • Fixed the naming of the standard error for parameters from peak fitting in the output Excel file from "standard deviation" to "standard error".

Other Changes

  • The output of the launch_main_gui function is now a single dictionary. This will allow potential changes to the output in later versions to not cause breaking changes.

  • The output of launch_main_gui now includes the ExcelWriter object used when saving to Excel. This allows access to the Excel file in Python after running the launch_main_gui function, in case further processing is desired.

  • The peak_fitting_gui module now includes full coverage for the data validation of user-inputs for all events.

Deprecations/Breaking Changes

  • The output of the launch_main_gui function was changed from a tuple of items to a single, dictionary output.

Documentation/Examples

  • Added DataSource objects to the use_main_gui.py example program for the three new raw data types. These analyses are more in-depth than the existing DataSource objects, and involve both CalculationFunction and SummaryFunction objects.

  • Changed the Changelog to group changes into categories rather than labelling each change with FEATURE, BUG, etc.

Version 0.2.0 (2020-10-05)

This is a minor version with new features, bug fixes, deprecations, and documentation improvements.

New Features

  • Allow marking and labelling peaks in the plotting GUI.

  • File searching is more flexible, allowing for different numbers of samples and files for each dataset.

  • The window location for the plotting GUI is maintained when reopening the window.

  • The json files (previous_search.json and the figure theme files saved by the plotting GUI) now have indentation, making them more easily read and edited.

  • Figure theme files for the plotting GUI now contain a single dictionary with all relevant sections as keys. This allows expanding the data saved to the file in later versions without making breaking changes.

  • Allow selecting which characterization techniques are used when generating raw data.

Bug Fixes

  • Changed save location for previous_search.json to an OS-dependent location, so that the file is not overwritten when updating the package.

  • Allow doing peak fitting without saving to Excel.

Other Changes

  • Changed the Excel start row sent to user-defined functions by adding 2 to account for the header and subheader rows. Now formulas can directly use the start row variable, rather than having to manually add 2 each time. Changed the use_main_gui.py example program to reflect this change.

Deprecations/Breaking Changes

  • Figure theme (.figtheme) files saved with the plotting GUI in versions < 0.2.0 will not be compatible with versions >= 0.2.0.

Documentation/Examples

  • Switched from using plt.pause and a while loop to using plt.show(block=True) to keep the peak_fitting and generate_raw_data example programs running while the plots are open.

  • Made all the documentation figures have the same file extension, and made them wider so they look better in the README where their dimensions cannot be modified.

Version 0.1.2 (2020-09-15)

This is a minor patch with a critical bug fix.

Bug Fixes

  • Fixed issue using reversed() with a dictionary causing the plotting GUI to fail with Python 3.7. Used reversed(list(dictionary.keys())) instead.

Version 0.1.1 (2020-09-14)

This is a minor patch with new features, bug fixes, and documentation improvements.

New Features

  • Extended the Unicode conversion to cover any input with "\". This mainly helps with text in the plotting GUI, such as allowing multiline text using "\n", while still giving the correct behavior when using mathtext with Matplotlib.

Bug Fixes

  • Fixed how the plotting GUI handles twin axes. Now, the main axis is plotted after the twin axes so that the bounds, tick params, and grid lines work correctly for all axes.

  • Fixed an error that occurred when a DataSource object would define Excel plot indices that were larger than the number of imported and calculation columns.

  • New DataSource objects that do not provide a unique_variables input will simply have no unique variables, rather than default "x" and "y" variables.

  • Fixed an error where column labels were assigned before performing separation functions, which potentially creates labels for less data entries than there actually are.

Documentation/Examples

  • Added a more in-depth summary for the package, more explanation on the usage of the package, and screenshots of some of the GUIs and program outputs to the documentation.

  • Added DataSource objects with correct calculations to the example program use_main_gui.py for each of the characterization techniques covered by mcetl's raw_data.generate_raw_data function.

Version 0.1.0 (2020-09-12)

  • First release on PyPI.

License

mcetl is open source and available under the BSD 3-clause license. For more information, refer to the license.
