Table of contents

1. Why Should You Care About Missing Data?

2. Why Does Data Go Missing?

3. Different Types of Missing Data

Missing Completely at Random (MCAR)
Missing at Random (MAR)
Missing Not at Random (MNAR)

4. How to Handle Missing Data: Easy Techniques

Removing Missing Data
Filling Missing Data (Imputation)
Using Algorithms That Handle Missing Data
Creating a New Category for Missing Data
Adding a Missing Data Flag

5. Simple Example Using Python (pandas)

6. Best Tips for Handling Missing Data

7. Mistakes to Avoid

8. Tools You Can Use

9. Final Thoughts

Have you ever looked at a dataset and noticed some empty boxes or blank spaces? That’s called missing data. It happens often and can create problems in your analysis or machine learning models.

In this blog by Apponix Academy, let’s learn how to handle missing data easily, using simple language and practical tips. By the end, you’ll feel confident cleaning your data for any project.

Why Should You Care About Missing Data?

You might think, “Why does it matter if some data is missing?”

Here’s why:

Missing data can change your results and give wrong insights.
Many machine learning models don’t work if there are missing values.
Your reports and decisions will be more reliable when the data is clean.

That’s why data scientists spend a lot of time fixing missing data before any analysis.

Why Does Data Go Missing?

Data can be missing for many reasons:

Someone forgot to enter it.
There was a technical issue while recording it.
People didn’t want to share that information.

Knowing why data is missing helps you decide how to handle it properly.

Different Types of Missing Data

Here are the three types of missing data:

1. Missing Completely at Random (MCAR)

The data is missing purely by chance, and the gaps have nothing to do with any value in your dataset. For example, a sensor stops working at random and skips recording the temperature for one hour.

2. Missing at Random (MAR)

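Whether a value is missing depends on other data you do have, not on the missing value itself. For example, income is missing more often for people with certain job titles. Because you know each person's job title, you can use it to estimate the missing income.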

3. Missing Not at Random (MNAR)

Whether a value is missing depends on the missing value itself. For example, people with very high incomes may not want to reveal their salary, so high incomes are the ones most likely to be blank.
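To make the three types concrete, here is a minimal simulation sketch. The `people` table, the `senior` column, and all the probabilities are made-up assumptions for illustration; the point is only to show what each kind of missingness depends on.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 1000

# Hypothetical data: income depends on seniority.
people = pd.DataFrame({"senior": rng.integers(0, 2, size=n)})
people["income"] = 30_000 + 40_000 * people["senior"] + rng.normal(0, 5_000, size=n)

is_senior = (people["senior"] == 1).to_numpy()
high_income = (people["income"] > 60_000).to_numpy()

# MCAR: every row has the same 20% chance of losing its income value.
mcar_mask = rng.random(n) < 0.20

# MAR: the chance depends on an observed column (senior), not on income itself.
mar_mask = rng.random(n) < np.where(is_senior, 0.40, 0.05)

# MNAR: the chance depends on the income value that goes missing.
mnar_mask = rng.random(n) < np.where(high_income, 0.60, 0.05)

true_mean = people["income"].mean()
for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    observed_mean = people.loc[~mask, "income"].mean()
    print(f"{name}: observed mean minus true mean = {observed_mean - true_mean:,.0f}")
```

In this made-up setup, the average of the values that remain drifts away from the true average once the gaps stop being completely random, which is why the type of missingness matters when you choose a technique.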

How to Handle Missing Data: Easy Techniques

1. Removing Missing Data

If only a few rows or columns have missing data, you can simply remove them.

Drop rows with missing values if they are not important.
Drop columns if almost all the data is missing in them.

But remember, don’t remove too much data. You might lose important information.
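Here is a small sketch of both kinds of removal in pandas. The `orders` table and the 50% cut-off for dropping a column are assumptions chosen for this example, not fixed rules.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
orders = pd.DataFrame({
    "customer": ["A", "B", "C", "D"],
    "amount": [120.0, np.nan, 80.0, 45.0],
    "coupon": [np.nan, np.nan, np.nan, "SAVE10"],
})

# Drop columns where more than half of the values are missing.
missing_share = orders.isna().mean()
trimmed = orders.loc[:, missing_share <= 0.5]

# Then drop the remaining rows that still contain a missing value.
cleaned = trimmed.dropna()

print(f"rows kept: {len(cleaned)} of {len(orders)}")
```

Printing how many rows survive is a quick way to check that you have not removed too much.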

2. Filling Missing Data (Imputation)

If you don’t want to remove data, you can fill it with other values. This is called imputation.

a. Mean or Median

For numbers, you can fill in missing values with:

Mean: The average value
Median: The middle value (better if the data has extreme values)

Example: If someone’s age is missing, fill it with the average or median age of the group.
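A quick sketch of the difference, using a made-up `ages` column that contains one extreme value:

```python
import numpy as np
import pandas as pd

# Hypothetical ages; 95 is an extreme value and the last entry is missing.
ages = pd.Series([22, 25, 27, 30, 95, np.nan], name="Age")

mean_age = ages.mean()      # pulled upward by the extreme value
median_age = ages.median()  # middle of the observed values

filled_with_mean = ages.fillna(mean_age)
filled_with_median = ages.fillna(median_age)

print("mean:", mean_age, "median:", median_age)
```

Comparing the two printed numbers shows how far a single extreme value can pull the mean, which is the reason the median is usually the safer fill for skewed data.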

b. Mode

For categories like city or gender, fill in missing values with the most common value.

Example: If many people live in Bangalore and one entry is missing the city, fill it with Bangalore.
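The same idea as a short pandas sketch, with a made-up `cities` column:

```python
import pandas as pd

# Hypothetical city column with one missing entry.
cities = pd.Series(["Bangalore", "Mysore", "Bangalore", None, "Bangalore"], name="City")

# mode() returns a Series because ties are possible; take the first entry.
most_common = cities.mode()[0]
cities_filled = cities.fillna(most_common)

print(cities_filled.tolist())
```

If two values are tied for most common, `mode()` returns both, so taking `[0]` is a simplification you may want to revisit for your own data.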

c. Forward Fill / Backward Fill

For time-based data like stock prices, use:

Forward fill: Fill the missing value with the last available value
Backward fill: Fill the missing value with the next available value
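A minimal sketch of both fills on a made-up daily price series (the dates and prices are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with gaps.
prices = pd.Series(
    [100.0, np.nan, np.nan, 103.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="close",
)

forward = prices.ffill()   # each gap takes the last value seen before it
backward = prices.bfill()  # each gap takes the next value seen after it

print(pd.DataFrame({"original": prices, "ffill": forward, "bfill": backward}))
```

Note that backward fill copies a later value into an earlier row, so the final gap stays empty, and it can leak future information if you are building a forecasting model.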

d. Constant Value

Sometimes, you can fill missing values with a constant like:

“Unknown” for categories
0 for numbers (only if it makes sense)
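In pandas you can pass one constant per column in a single call. The `survey` table below is a made-up example, and filling `referrals` with 0 assumes that a blank really means "no referrals".

```python
import numpy as np
import pandas as pd

# Hypothetical survey data.
survey = pd.DataFrame({
    "city": ["Bangalore", None, "Chennai"],
    "referrals": [2.0, np.nan, 0.0],
})

# One constant per column: a label for the category, 0 for the count.
survey_filled = survey.fillna({"city": "Unknown", "referrals": 0})

print(survey_filled)
```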

e. Predicting Missing Values

This is an advanced method where you use the other columns to predict the missing values, for example with a regression model or KNN (k-nearest neighbors). It takes extra effort, but it often gives more realistic values than filling everything with a single mean or median.
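As a rough sketch of the regression idea, the example below fits a straight line with NumPy's `polyfit` and uses it to fill the gaps. The `staff` table and its column names are made up, and a real project would likely use more than one predictor column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: salary (in thousands) is missing for two people.
staff = pd.DataFrame({
    "years_experience": [1, 2, 3, 4, 5, 6],
    "salary": [30.0, 35.0, np.nan, 45.0, 50.0, np.nan],
})

known = staff["salary"].notna()

# Fit salary = slope * years_experience + intercept on the complete rows.
slope, intercept = np.polyfit(
    staff.loc[known, "years_experience"].to_numpy(),
    staff.loc[known, "salary"].to_numpy(),
    deg=1,
)

# Predict for every row, but only use the prediction where salary is missing.
predicted = slope * staff["years_experience"] + intercept
staff["salary_imputed"] = staff["salary"].fillna(predicted)

print(staff)
```

Keeping the result in a separate `salary_imputed` column makes it easy to compare the filled values with the original ones before you commit to them.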

3. Using Algorithms That Handle Missing Data

Some machine learning algorithms, like XGBoost, can handle missing data on their own. Even so, it is usually worth checking and cleaning the missing values yourself, so that you stay in control of how they are treated.

4. Creating a New Category for Missing Data

For categories, you can create a new value called “Missing” or “Unknown”. This way, you keep the data and let your model know it was missing.
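One detail to watch in pandas: if the column uses the `category` dtype, the new label has to be registered as a category before you can fill with it. A minimal sketch, using a made-up `plan` column:

```python
import pandas as pd

# Hypothetical subscription plans stored as a categorical column.
plan = pd.Series(["basic", None, "premium", "basic"], dtype="category", name="plan")

# A categorical column only accepts known categories, so add "Missing" first.
plan_filled = plan.cat.add_categories("Missing").fillna("Missing")

print(plan_filled.value_counts())
```

For ordinary text columns this extra step is not needed, and a plain `fillna("Missing")` is enough.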

5. Adding a Missing Data Flag

Create a new column showing if data was missing (1) or not (0). This helps your model learn patterns related to missing data.
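A short sketch of the flag, assuming a made-up `customers` table. The order matters: the flag is created first, because once the gaps are filled you can no longer tell which values were missing.

```python
import numpy as np
import pandas as pd

# Hypothetical customer incomes with two gaps.
customers = pd.DataFrame({"income": [52_000.0, np.nan, 61_000.0, np.nan]})

# Record the flag before imputing, otherwise the information is lost.
customers["income_missing"] = customers["income"].isna().astype(int)

# Now fill the gaps (median used here as one possible choice).
customers["income"] = customers["income"].fillna(customers["income"].median())

print(customers)
```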

Simple Example Using Python (pandas)

Here’s how you can handle missing data using Python:

import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Check how many values are missing in each column
print(df.isnull().sum())

# Fill missing ages with the median
# (assign the result back; fillna(inplace=True) on a selected column
# triggers warnings in recent pandas versions and may not update df)
df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill missing city with the mode (most common value)
df['City'] = df['City'].fillna(df['City'].mode()[0])

# Drop rows where purchase is missing
df = df.dropna(subset=['Purchase'])

In the data science course in Bangalore by Apponix Academy, you will practice such techniques with real datasets.

Best Tips for Handling Missing Data

Check why the data is missing before deciding what to do.
Look at missing data patterns using tools like missingno in Python.
Don’t blindly fill in values without thinking about the business meaning.
Keep notes of what you did for future reference.
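If you want a quick look at missing data patterns before installing a plotting tool, plain pandas can already give a rough summary. This is a pandas-only stand-in, not the missingno library, and the `records` table is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical records with gaps in two columns.
records = pd.DataFrame({
    "age": [34, np.nan, 29, np.nan],
    "income": [50_000, np.nan, np.nan, np.nan],
    "city": ["Bangalore", "Pune", "Delhi", "Pune"],
})

# Share of missing values per column, largest first.
missing_share = records.isna().mean().sort_values(ascending=False)

# How often each combination of missing columns occurs across rows.
missing_patterns = records.isna().value_counts()

print(missing_share)
print(missing_patterns)
```

Columns that tend to be missing together show up as a frequent combination in `missing_patterns`, which can hint at a shared cause.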

Mistakes to Avoid

Ignoring missing data completely
Removing too many rows and losing useful data
Using the mean for skewed data instead of the median
Filling the target (output) variable with guesses – never do this

Tools You Can Use

Here are some tools that make handling missing data easier:

Excel: For small datasets
Python (pandas, scikit-learn): For powerful data cleaning
R (dplyr, tidyr, mice): For statistical imputations
Power Query (Excel/Power BI): For BI workflows
OpenRefine: For text data cleaning

At Apponix Academy, we teach these tools step by step to build your practical confidence.

Final Thoughts

Missing data is a normal part of working with real-world data. Don’t be scared of it. Just remember:

Understand why data is missing
Choose the best way to handle it
Keep your data clean for better results

If you want to learn data cleaning, data preparation, and data science skills from scratch, join the data science course in Bangalore offered by Apponix Academy. You will practice these techniques with expert guidance and real projects.

Key Takeaways

Missing data can affect your analysis results.
Handle missing data by removing it, filling it, predicting it, or adding missing-data flags.
Practice these techniques to become confident in data preparation.

How to Handle Missing Data: Best Practices in Data Preparation