maui.eda.heatmap_analysis
- maui.eda.heatmap_analysis(df, x_axis, y_axis, color_continuous_scale='Viridis', show_plot=True, **kwargs)
Generates a heatmap to analyze the relationship between two categorical variables in a DataFrame.
This function groups the data by the specified x_axis and y_axis categories, counts the occurrences of each group, and then creates a heatmap visualization of these counts using Plotly Express. The heatmap intensity is determined by the count of occurrences, with an option to customize the color scale.
- Parameters:
- df : pandas.DataFrame
The input DataFrame containing the data to be analyzed. Must include the columns specified by x_axis and y_axis, as well as a ‘file_path’ column used for counting occurrences.
- x_axis : str
The name of the column in df to be used as the x-axis in the heatmap.
- y_axis : str
The name of the column in df to be used as the y-axis in the heatmap.
- color_continuous_scale : str, optional
The name of the color scale to use for the heatmap. Defaults to ‘Viridis’. For more options, refer to Plotly’s documentation on color scales.
- show_plot : bool, optional
If True (default), displays the heatmap plot. If False, the plot is not displayed but is still returned.
- **kwargs : dict
Additional arguments for plot customization, such as height and width.
- Returns:
- tuple
A tuple containing:
- df_group (pandas.DataFrame): A DataFrame with the grouped counts for each combination of x_axis and y_axis values.
- fig (plotly.graph_objs._figure.Figure): A Plotly figure object containing the heatmap.
Notes
The ‘file_path’ column in the input DataFrame is used to count occurrences of each group formed by the specified x_axis and y_axis values. This function is useful for visualizing the distribution and relationship between two categorical variables.
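The counting step described above can be approximated with plain pandas. This is a minimal sketch, not the library's actual implementation: the toy values, the "count" column name, and the pivot step are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
df = pd.DataFrame(
    {
        "landscape": ["A", "A", "B", "B", "B"],
        "environment": ["forest", "forest", "pasture", "swamp", "swamp"],
        "file_path": ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"],
    }
)

# Count rows per (x_axis, y_axis) pair using the file_path column.
# The output column name "count" is an assumption of this sketch.
df_group = (
    df.groupby(["landscape", "environment"])["file_path"]
    .count()
    .reset_index(name="count")
)

# Pivot into the grid a heatmap would display; pairs that never occur become 0.
matrix = (
    df_group.pivot(index="environment", columns="landscape", values="count")
    .fillna(0)
    .astype(int)
)
```

Each cell of matrix holds the number of rows that share one x_axis value and one y_axis value, which is the quantity the heatmap color encodes.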
Examples
>>> from maui import samples, eda
>>> df = samples.get_audio_sample(dataset="leec")
>>> df_group, fig = eda.heatmap_analysis(df, 'landscape', 'environment')