maui.eda.duration_analysis

maui.eda.duration_analysis(df, category_column, duration_column, show_plot=True)

Generates a box plot visualizing the distribution of durations across different categories.

This function takes a DataFrame and draws one box per category, showing how the durations (or any other numerical column) are distributed within each category. Each box summarizes the central tendency (median), dispersion (interquartile range and whiskers), and skewness of the data, and the plot flags potential outliers.

Parameters:
df : pandas.DataFrame

The DataFrame containing the data to be analyzed. It should include at least two columns: one for the category and one for the duration (or any numerical data to be analyzed).

category_column : str

The name of the column in df that contains the categorical data. This column will be used to group the numerical data into different categories for the box plot.

duration_column : str

The name of the column in df that contains the numerical data to be analyzed. Its values are split into one box per category of category_column.

show_plot : bool, optional

If True (default), the function will display the generated box plot. If False, the plot will not be displayed, but the figure object will still be returned.

Returns:
plotly.graph_objs._figure.Figure

The generated Plotly figure object containing the box plot. This object can be used for further customization or to display the plot at a later time if show_plot is False.

Notes

The box plot generated by this function shows the range, interquartile range, median, and potential outliers within each category. Placing the boxes side by side makes it easy to compare how the distribution of the numerical column differs between groups, for example whether one category tends to have longer or more variable durations than another.
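The quantities a box plot displays can also be computed directly, which helps when reading the figure or reporting numbers alongside it. The sketch below uses pandas with linear quantile interpolation and flags outliers with the common 1.5 × IQR rule (Tukey's fences). That rule is an assumption made for illustration: the exact whisker and outlier settings used by the plot may differ. The helper box_summary and the toy data are hypothetical.

```python
import pandas as pd


def box_summary(df: pd.DataFrame, category_column: str, duration_column: str) -> pd.DataFrame:
    """Hypothetical helper: per-category box-plot statistics with a 1.5 * IQR outlier rule."""
    grouped = df.groupby(category_column)[duration_column]
    q1 = grouped.quantile(0.25)  # lower edge of the box
    q3 = grouped.quantile(0.75)  # upper edge of the box
    iqr = q3 - q1                # box height
    summary = pd.DataFrame(
        {
            "min": grouped.min(),
            "q1": q1,
            "median": grouped.median(),
            "q3": q3,
            "max": grouped.max(),
            "iqr": iqr,
        }
    )
    # Map each row to its category's fences; values outside them are potential outliers.
    lower = df[category_column].map(q1 - 1.5 * iqr)
    upper = df[category_column].map(q3 + 1.5 * iqr)
    is_outlier = (df[duration_column] < lower) | (df[duration_column] > upper)
    summary["n_outliers"] = is_outlier.groupby(df[category_column]).sum()
    return summary


toy = pd.DataFrame(
    {
        "landscape": ["forest"] * 4 + ["urban"] * 5,
        "duration": [60.0, 58.5, 61.2, 59.8, 30.0, 29.5, 31.0, 30.5, 120.0],
    }
)
summary = box_summary(toy, "landscape", "duration")
print(summary)
```

In this toy data the 120.0 s urban recording lies far above the urban box (q3 + 1.5 × IQR = 32.5), so it is the one value counted as an outlier.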

Examples

>>> from maui import samples, eda
>>> df = samples.get_audio_sample(dataset="leec")
>>> fig = eda.duration_analysis(df, 'landscape', 'duration')