When starting any data science or machine learning project, one of the first and most important steps is Exploratory Data Analysis (EDA). EDA helps you understand your dataset before you build models. Within EDA, one of the most useful techniques is univariate analysis.
Univariate analysis focuses on analyzing one variable at a time. It helps you understand data distribution, identify outliers, detect skewness, check frequencies, and discover patterns that may impact your final model.
Whether you are a beginner in Python, machine learning, or data analytics, mastering univariate analysis will improve your workflow significantly.
What is Univariate Analysis?
Univariate Analysis means analyzing a single column or feature in a dataset.
Examples:
- Age
- Salary
- Gender
- Marks
- Product Category
The goal is to understand:
- How values are distributed
- Whether data is balanced
- Whether outliers exist
- Whether data is skewed
- Frequency of categories
This gives clarity before moving to bivariate or multivariate analysis.
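Before drawing any plot, most of these questions can be answered with a few pandas calls. The sketch below uses a small made-up dataset (the values are illustrative, not from a real source) and shows one way to summarize a numerical and a categorical column.

```python
import pandas as pd

# Small made-up dataset; column names mirror the examples above.
df = pd.DataFrame({
    "Age": [22, 25, 25, 27, 30, 31, 35, 38, 41, 67],
    "Gender": ["F", "M", "M", "F", "M", "M", "M", "F", "M", "M"],
})

# Numerical column: centre, spread, and skewness.
age_summary = df["Age"].describe()  # count, mean, std, min, quartiles, max
age_skew = df["Age"].skew()         # a positive value means a longer right tail

# Categorical column: frequency and share of each category.
gender_counts = df["Gender"].value_counts()
gender_share = df["Gender"].value_counts(normalize=True)

print(age_summary)
print(f"Skewness of Age: {age_skew:.2f}")
print(gender_counts)
print(gender_share)
```

Here the single age of 67 pulls the mean above the median, which is the kind of skew a histogram or boxplot would make visible.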
Why Univariate Analysis is Important
Many beginners jump directly into machine learning models. That is a mistake.
If you don’t understand your data first:
- Your model may perform poorly
- You may overlook missing values
- Outliers can affect predictions
- Wrong assumptions waste time
Benefits:
- Better feature understanding
- Better preprocessing decisions
- Better feature engineering
- Better model performance
- Faster debugging
Best Plots for Univariate Analysis
1. Histogram
Best For: Numerical continuous data
Use When: You want to understand data distribution.
Shows:
- Frequency of values
- Whether the shape is roughly normal (bell-shaped)
- Skewness
- Gaps in data
Code:
sns.histplot(df["Age"])
2. KDE Plot
Best For: Smooth density curve
Use When: You want a smoother alternative to a histogram.
Shows:
- Data concentration
- Shape of distribution
- Peaks in data
Code:
sns.kdeplot(df["Salary"])
3. Boxplot
Best For: Outlier detection
Use When: You want to quickly detect extreme values.
Shows:
- Median
- Quartiles
- Outliers
- Spread
Code:
sns.boxplot(x=df["Price"])
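By default, seaborn and Matplotlib boxplots draw points as outliers when they lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The same rule can be applied directly in pandas to list those values; the prices below are made up for illustration.

```python
import pandas as pd

# Made-up prices with one obviously extreme value.
prices = pd.Series([12, 14, 15, 15, 16, 17, 18, 19, 21, 95], name="Price")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Fences used by the default boxplot whiskers (1.5 * IQR).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = prices[(prices < lower) | (prices > upper)]
print(outliers.tolist())  # [95]
```

The 1.5 multiplier is a convention, not a law; whether a flagged value is an error or a genuine extreme still needs a look at the data itself.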
4. Count Plot
Best For: Categorical data
Use When: You want category frequency counts.
Shows:
- Category frequency
- Imbalanced data
Code:
sns.countplot(x=df["Gender"])
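A count plot is easier to read when the bars are sorted by frequency, and the same counts give a quick imbalance check. The sketch below assumes a made-up Gender column with a 7-to-3 split.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up, deliberately imbalanced categorical column.
df = pd.DataFrame({"Gender": ["M"] * 7 + ["F"] * 3})

counts = df["Gender"].value_counts()  # sorted from most to least frequent
share = counts / counts.sum()         # proportion of each category
order = counts.index                  # bar order for the plot

fig, ax = plt.subplots()
sns.countplot(data=df, x="Gender", order=order, ax=ax)

print(share)
```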
5. Bar Plot
Best For: Comparing values across categories
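Use When: You want to compare an average or other summary of a numerical column across categories. Note that this involves two columns, so it goes slightly beyond strictly univariate analysis.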
Code:
sns.barplot(x="Category", y="Sales", data=df)
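By default, sns.barplot sets each bar's height to the mean of the y values in that category. The pandas sketch below, on made-up sales figures, computes the same per-category means so you can see what the bars represent.

```python
import pandas as pd

# Made-up sales records (illustrative values only).
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "C"],
    "Sales": [100, 120, 140, 80, 90, 200],
})

# One mean per category: the value each bar would show by default.
mean_sales = df.groupby("Category")["Sales"].mean()
print(mean_sales)
```

Category C's mean rests on a single row, which is worth keeping in mind: a bar plot alone does not show how many observations stand behind each bar.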
6. Pie Chart
Best For: Simple proportion comparison
Use When: Small number of categories
Code:
counts = df["Gender"].value_counts()  # one slice per category
plt.pie(counts, labels=counts.index)
7. Violin Plot
Best For: Distribution + Density + Spread
Use When: You need more detail than boxplot.
Code:
sns.violinplot(x=df["Score"])
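One case where the extra detail matters is a column with two peaks. In the sketch below, the scores are synthetic and deliberately bimodal (an assumption for illustration); drawing a boxplot and a violin plot of the same column makes the difference easy to compare.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic scores drawn from two groups centred near 45 and 80.
rng = np.random.default_rng(2)
scores = np.concatenate([rng.normal(45, 5, 200), rng.normal(80, 5, 200)])
df = pd.DataFrame({"Score": scores})

fig, axes = plt.subplots(2, 1, sharex=True)
sns.boxplot(data=df, x="Score", ax=axes[0])
sns.violinplot(data=df, x="Score", ax=axes[1])
axes[0].set_title("Boxplot")
axes[1].set_title("Violin plot")
```

The boxplot summarizes the column as one box with a median between the two groups, while the violin's outline shows the two separate bulges.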
8. Rug Plot
Best For: Showing individual data points
Use When: You want to see exact value placement.
Code:
sns.rugplot(df["Age"])
How to Choose the Right Plot
If your column is numerical:
- Histogram
- KDE Plot
- Boxplot
- Violin Plot
If your column is categorical:
- Count Plot
- Bar Plot
- Pie Chart
If you want exact observations:
- Rug Plot
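The guide above can be sketched as a small helper. The function name suggest_plots and the cutoff of five categories for a pie chart are hypothetical choices made for this example, not a standard rule.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def suggest_plots(series: pd.Series, max_pie_categories: int = 5) -> list[str]:
    """Suggest univariate plots for one column (hypothetical helper)."""
    # Booleans count as numeric in pandas, but they behave like categories.
    if is_numeric_dtype(series) and not is_bool_dtype(series):
        return ["histogram", "KDE plot", "boxplot", "violin plot", "rug plot"]

    plots = ["count plot", "bar plot"]
    # Pie charts only stay readable with a few slices (assumed cutoff).
    if series.nunique() <= max_pie_categories:
        plots.append("pie chart")
    return plots


print(suggest_plots(pd.Series([1.5, 2.0, 3.2])))
print(suggest_plots(pd.Series(["a", "b", "a"])))
```

Treat the output as a starting point; a numerical column with only a handful of distinct values, such as a 1 to 5 rating, may still read better as a count plot.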
Final Thoughts
Univariate analysis may look simple, but it is one of the most important parts of EDA. If you understand each variable on its own, the rest of the analysis becomes easier.
Good EDA = Better Decisions = Better Models.
Start mastering univariate analysis today.