maui.eda.heatmap_analysis

maui.eda.heatmap_analysis(df, x_axis, y_axis, color_continuous_scale='Viridis', show_plot=True, **kwargs)

Generates a heatmap to analyze the relationship between two categorical variables in a DataFrame.

This function groups the data by the specified x_axis and y_axis categories, counts the occurrences of each group, and then creates a heatmap visualization of these counts using Plotly Express. The heatmap intensity is determined by the count of occurrences, with an option to customize the color scale.
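The group-and-count step described above can be approximated with plain pandas. This is a minimal sketch, not maui's actual implementation: the toy data, the `count` column name, and the pivot step are assumptions made for illustration.

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
df = pd.DataFrame(
    {
        "landscape": ["forest", "forest", "pasture", "pasture", "pasture"],
        "environment": ["day", "night", "day", "day", "night"],
        "file_path": ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"],
    }
)

# Count the file_path entries in each (x_axis, y_axis) group.
# The output column name "count" is an assumption, not maui's naming.
df_group = (
    df.groupby(["landscape", "environment"])["file_path"]
    .count()
    .reset_index(name="count")
)

# Reshape to a matrix (rows = y_axis, columns = x_axis) suitable for a heatmap.
# Combinations that never occur are filled with 0.
matrix = df_group.pivot(
    index="environment", columns="landscape", values="count"
).fillna(0)
```

A matrix like this could be rendered with a Plotly Express call such as `px.imshow(matrix, color_continuous_scale="Viridis")`; which Plotly Express function maui uses internally is not stated here.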

Parameters:
df : pandas.DataFrame

The input DataFrame containing the data to be analyzed. Must include the columns specified by x_axis and y_axis, as well as a ‘file_path’ column used for counting occurrences.

x_axis : str

The name of the column in df to be used as the x-axis in the heatmap.

y_axis : str

The name of the column in df to be used as the y-axis in the heatmap.

color_continuous_scale : str, optional

The name of the color scale to use for the heatmap. Defaults to ‘Viridis’. For more options, refer to Plotly’s documentation on color scales.

show_plot : bool, optional

If True (default), displays the heatmap plot. If False, the plot is not displayed but is still returned.

**kwargs : dict

Additional arguments for plot customization, such as height and width.

Returns:
tuple

A tuple containing:

- df_group (pandas.DataFrame): A DataFrame with the grouped counts for each combination of x_axis and y_axis values.
- fig (plotly.graph_objs._figure.Figure): A Plotly figure object containing the heatmap.

Notes

The ‘file_path’ column in the input DataFrame is used to count occurrences of each group formed by the specified x_axis and y_axis values. This function is useful for visualizing the distribution and relationship between two categorical variables.
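Because the function depends on three columns being present, it can help to check the DataFrame before calling it. The helper below is a hypothetical sketch; `missing_columns` and `REQUIRED_COUNT_COLUMN` are illustrative names and not part of maui.

```python
import pandas as pd

# Column that heatmap_analysis uses for counting, per the notes above.
REQUIRED_COUNT_COLUMN = "file_path"


def missing_columns(df, x_axis, y_axis):
    """Return the required column names that are absent from df.

    Hypothetical helper for illustration; not part of maui.
    """
    required = [x_axis, y_axis, REQUIRED_COUNT_COLUMN]
    return [name for name in required if name not in df.columns]


# A DataFrame that lacks the file_path column.
df = pd.DataFrame({"landscape": ["forest"], "environment": ["day"]})
print(missing_columns(df, "landscape", "environment"))  # prints ['file_path']
```

An empty list means all three columns are present and the DataFrame meets the column requirement described above.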

Examples

>>> from maui import samples, eda
>>> df = samples.get_audio_sample(dataset="leec")
>>> df_group, fig = eda.heatmap_analysis(df, 'landscape', 'environment')