maui.eda.duration_distribution
- maui.eda.duration_distribution(df, time_unit='s', show_plot=True)
Generates a distribution plot for the ‘duration’ column in the provided DataFrame.
This function creates a distribution plot, including a histogram and a kernel density estimate (KDE), for the ‘duration’ column in the input DataFrame. It is designed to give a visual understanding of the distribution of duration values across the dataset.
- Parameters:
- df : pandas.DataFrame
The DataFrame containing the data to be analyzed. It must include a column named ‘duration’, which contains numeric data.
- time_unit : str, optional
The time unit of the values in the ‘duration’ column. It is used to label the visualization so that the time unit is explicit. Default: ‘s’
- show_plot : bool, optional
If True (default), the function will display the generated plot. If False, the plot will not be displayed but will still be returned.
- Returns:
- plotly.graph_objs._figure.Figure
A Plotly figure object representing the distribution plot of the ‘duration’ column. The plot includes both a histogram of the data and a kernel density estimate (KDE) curve.
Notes
The function uses Plotly’s create_distplot function from the plotly.figure_factory module, offering a detailed visual representation of data distribution. It’s particularly useful for analyzing the spread and skewness of numeric data. The KDE curve provides insight into the probability density of the durations, complementing the histogram’s discrete bins.
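To make the relationship between the histogram and the KDE curve concrete, here is a minimal standard-library sketch of both estimates. It is not how maui or Plotly compute the plot. The helper names (`histogram_density`, `gaussian_kde`), the sample durations, and the hand-picked bandwidth and bin size are all assumptions made for illustration.

```python
import math


def histogram_density(samples, bin_size):
    """Return {bin_start: density} so that the bar areas sum to 1."""
    counts = {}
    for s in samples:
        # Assign each sample to the bin whose left edge is a multiple of bin_size.
        start = math.floor(s / bin_size) * bin_size
        counts[start] = counts.get(start, 0) + 1
    n = len(samples)
    return {start: c / (n * bin_size) for start, c in sorted(counts.items())}


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density at x.

    Each sample contributes one Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical durations in seconds: four short clips and one long one.
durations = [1.0, 1.2, 0.9, 1.1, 5.0]

hist = histogram_density(durations, bin_size=1.0)  # discrete bins
kde = gaussian_kde(durations, bandwidth=0.3)       # smooth curve

# The KDE is higher near the cluster of short clips than in the empty gap.
near_cluster = kde(1.0)
in_gap = kde(3.0)
```

The histogram changes in steps at bin edges, while the KDE varies smoothly and depends on the bandwidth rather than on where the bins start. Both integrate to 1, which is why they can share one density axis.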
Examples
>>> from maui import samples, eda
>>> df = samples.get_audio_sample(dataset="leec")
>>> fig = eda.duration_distribution(df)