maui.eda.card_summary

maui.eda.card_summary(df, categories, show_plot=True)

Generates a summary card, and optionally a plot, for a DataFrame. The function computes the number of samples, the number of distinct days, and the total and mean duration (in minutes) of the entries in df. Each name listed in categories adds one further entry to the summary and to the plot. If show_plot is True, the statistics and categories are rendered as a Plotly figure.

Parameters:

df (pandas.DataFrame): The input DataFrame. It must contain at least the columns ‘file_path’, ‘dt’, and ‘duration’, plus one column for each name listed in categories.
categories (list of str): Category names (column names in df) to include in the summary and plot. At most two categories can be specified.
show_plot (bool, optional): If True (default), the function generates and shows a Plotly plot of the calculated statistics and specified categories. If False, no plot is displayed.

Returns:

tuple

Returns a tuple containing:

card_dict (dict): A dictionary with the keys ‘n_samples’, ‘distinct_days’, ‘total_time_duration’, ‘mean_time_duration’, and one key per category specified. The values are the respective computed statistics.
fig (plotly.graph_objs._figure.Figure): A Plotly figure object with indicators for each of the statistics and categories specified. Only returned if show_plot is True.

Raises:

Exception: Raised if more than two categories are specified, because the plot cannot display more than two.
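The constraints described above (required columns, at most two categories) can be checked before calling the function. The following is a minimal sketch of such a check; check_card_summary_inputs is a hypothetical helper and not part of maui, and it raises ValueError rather than the generic Exception documented above.

```python
# Columns the documentation lists as required in df.
REQUIRED_COLUMNS = ("file_path", "dt", "duration")
# Limit stated in the documentation for card_summary.
MAX_CATEGORIES = 2


def check_card_summary_inputs(columns, categories):
    """Hypothetical pre-flight check (not part of maui).

    columns: the column names available, e.g. df.columns.
    categories: the list intended for card_summary.
    """
    if len(categories) > MAX_CATEGORIES:
        raise ValueError(
            f"at most {MAX_CATEGORIES} categories are supported, "
            f"got {len(categories)}"
        )
    # Every required column and every category must exist as a column.
    missing = [
        name
        for name in (*REQUIRED_COLUMNS, *categories)
        if name not in columns
    ]
    if missing:
        raise ValueError(f"missing columns: {missing}")
```

With a DataFrame at hand, the call would look like check_card_summary_inputs(df.columns, categories), followed by the call to eda.card_summary.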

Notes

The function is designed to work with data pertaining to durations and occurrences across different categories. It’s particularly useful for analyzing time series or event data. The ‘duration’ column is expected to be in seconds, while the total and mean durations in the summary are reported in minutes.
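To illustrate the unit handling, here is a minimal sketch of how the four base statistics could be derived from durations in seconds. It is not maui's implementation: plain Python records stand in for DataFrame rows, summarize is a hypothetical name, and it assumes that n_samples counts rows and that distinct_days counts unique calendar dates in ‘dt’.

```python
from datetime import datetime


def summarize(records):
    """Hypothetical stand-in for the base statistics (not maui code).

    records: list of dicts with 'file_path', 'dt' (datetime) and
    'duration' (seconds), mirroring the required DataFrame columns.
    """
    total_minutes = sum(r["duration"] for r in records) / 60  # seconds -> minutes
    return {
        # Assumption: one sample per record.
        "n_samples": len(records),
        # Assumption: a "day" is the calendar date of 'dt'.
        "distinct_days": len({r["dt"].date() for r in records}),
        "total_time_duration": total_minutes,
        "mean_time_duration": total_minutes / len(records) if records else 0.0,
    }


records = [
    {"file_path": "a.wav", "dt": datetime(2024, 1, 1, 6, 0), "duration": 60},
    {"file_path": "b.wav", "dt": datetime(2024, 1, 1, 18, 0), "duration": 120},
    {"file_path": "c.wav", "dt": datetime(2024, 1, 2, 6, 0), "duration": 180},
]
summary = summarize(records)
# 360 seconds in total -> 6.0 minutes overall, 2.0 minutes on average
```

The file names and timestamps are invented for the example.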

Examples

>>> from maui import samples, eda
>>> df = samples.get_audio_sample(dataset="leec")
>>> categories = ['landscape', 'environment']
>>> card_dict, fig = eda.card_summary(df, categories)